Saturday, November 28, 2020

Visualizing a Decision Tree in Python (with code included)

By VIJAY YADAV November 28, 2020 Data Visualization, Machine learning, Python

The libraries below need to be installed (if they are not already). Here the installation is done from inside a Jupyter Notebook. From a terminal, use "pip install library-name" instead.

#import sys

#!{sys.executable} -m pip install numpy

#!{sys.executable} -m pip install pandas

#!{sys.executable} -m pip install scikit-learn

#!{sys.executable} -m pip install hvplot

#!{sys.executable} -m pip install six

#!{sys.executable} -m pip install pydotplus

#!{sys.executable} -m pip install graphviz

Importing libraries:

import numpy as np

import pandas as pd

from sklearn.tree import DecisionTreeClassifier

from sklearn import tree

import hvplot.pandas

About the dataset

Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X, and Drug Y.

Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The feature sets of this dataset are Age, Sex, Blood Pressure, and Cholesterol of patients, and the target is the drug that each patient responded to.

This is an example of a multiclass classifier (there are five possible drugs). You can use the training part of the dataset to build a decision tree, and then use it to predict the class of an unknown patient, that is, which drug to prescribe to a new patient.

Let's look at a sample of the dataset:

The shape of the dataset is (200, 6).

Pre-processing

Using my_data as the Drug.csv data read by pandas, declare the following variables:

X as the Feature Matrix (data of my_data)
y as the response vector (target)
Remove the target column from the feature matrix, since it doesn't contain numeric values.

As you may have noticed, some features in this dataset, such as Sex and BP, are categorical. Unfortunately, scikit-learn decision trees do not handle categorical variables directly, but we can convert these features to numerical values. pandas.get_dummies() converts categorical variables into dummy/indicator variables.

Now we can fill the target variable.
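The pre-processing steps above can be sketched as follows. This is only an illustration: the four rows and the label spellings (drugA, drugY, ...) are made up to stand in for Drug.csv, and only the columns named in this post are used.

```python
import pandas as pd

# Hypothetical miniature stand-in for Drug.csv; the rows are invented.
my_data = pd.DataFrame({
    "Age": [23, 47, 61, 35],
    "Sex": ["F", "M", "F", "M"],
    "BP": ["HIGH", "LOW", "NORMAL", "HIGH"],
    "Cholesterol": ["HIGH", "HIGH", "NORMAL", "NORMAL"],
    "Drug": ["drugY", "drugC", "drugX", "drugA"],
})

# Feature matrix: every column except the target.
X = my_data.drop(columns=["Drug"])
# One-hot encode the categorical columns; Age passes through unchanged.
X = pd.get_dummies(X, columns=["Sex", "BP", "Cholesterol"])
# Response vector: the drug each patient responded to.
y = my_data["Drug"]

print(X.columns.tolist())
print(X.shape, y.shape)
```

Each categorical column becomes one indicator column per category, so the encoded matrix is wider than the original one.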

Setting up the Decision Tree

We will use a train/test split to evaluate our decision tree. Let's import train_test_split from sklearn.model_selection.

from sklearn.model_selection import train_test_split

train_test_split returns four arrays. We will name them:
X_trainset, X_testset, y_trainset, y_testset

The train_test_split will need the parameters:
X, y, test_size=0.3, and random_state=3.

X and y are the arrays to be split, test_size is the fraction of the data held out for testing, and random_state ensures that we obtain the same split on every run.
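A minimal sketch of the split, using placeholder arrays of the same length as the dataset (200 rows) instead of the real features. The array contents are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the encoded features and the target.
X = np.arange(400).reshape(200, 2)
y = np.arange(200) % 5

X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)
print(X_trainset.shape, X_testset.shape)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=3
)
same_split = np.array_equal(X_testset, X_test_again)
print(same_split)
```

With test_size=0.3 and 200 rows, 60 rows go to the test set and the remaining 140 to the training set.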

Modeling

We first create an instance of DecisionTreeClassifier called drugTree.
Inside the classifier, specify criterion="entropy" so that we can see the information gain at each node.

We can use the "gini" criterion as well; on this dataset the result is the same. Gini is the default criterion for the decision tree classifier.
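The modeling step could look roughly like the sketch below. Since Drug.csv is not reproduced here, the data is synthetic and the rule that assigns drugs is invented, so the printed scores say nothing about the real dataset. The names predTree and scores are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (values and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Age": rng.integers(15, 75, size=n),
    "BP": rng.choice(["HIGH", "LOW", "NORMAL"], size=n),
})
data["Drug"] = np.where(
    data["BP"] == "HIGH",
    np.where(data["Age"] < 50, "drugA", "drugB"),
    np.where(data["BP"] == "LOW", "drugC", "drugX"),
)

X = pd.get_dummies(data[["Age", "BP"]], columns=["BP"])
y = data["Drug"]
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

# Fit one tree per criterion and compare test accuracy.
scores = {}
for criterion in ("entropy", "gini"):
    drugTree = DecisionTreeClassifier(
        criterion=criterion, max_depth=4, random_state=0
    )
    drugTree.fit(X_trainset, y_trainset)
    predTree = drugTree.predict(X_testset)
    scores[criterion] = metrics.accuracy_score(y_testset, predTree)
    print(criterion, scores[criterion])
```

Running both criteria side by side is a quick way to check whether the choice matters for a given dataset.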

From the graph below, we can see that the accuracy score is highest at max_depth=4 and remains constant thereafter, so we have used max_depth=4 in this case.

We can also plot max-depth vs accuracy score for both testset and trainset:

For testset:

For trainset:

Now, plotting the graph of max-depth vs accuracy score for both trainset and testset:
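The numbers behind such a plot can be collected with a simple loop over max_depth. The sketch below uses synthetic data with an invented labelling rule, so the depth at which accuracy levels off will differ from the real dataset. The results list is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (values and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Age": rng.integers(15, 75, size=n),
    "BP": rng.choice(["HIGH", "LOW", "NORMAL"], size=n),
})
data["Drug"] = np.where(
    data["BP"] == "HIGH",
    np.where(data["Age"] < 50, "drugA", "drugB"),
    np.where(data["BP"] == "LOW", "drugC", "drugX"),
)
X = pd.get_dummies(data[["Age", "BP"]], columns=["BP"])
y = data["Drug"]
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

# One row per depth: (max_depth, train accuracy, test accuracy).
results = []
for depth in range(1, 7):
    tree_d = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=0
    )
    tree_d.fit(X_trainset, y_trainset)
    train_acc = metrics.accuracy_score(y_trainset, tree_d.predict(X_trainset))
    test_acc = metrics.accuracy_score(y_testset, tree_d.predict(X_testset))
    results.append((depth, train_acc, test_acc))
    print(depth, round(train_acc, 3), round(test_acc, 3))
```

The collected rows can then be put into a DataFrame and handed to a plotting library to draw both curves on one chart.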

Visualization

The final image of the decision tree is given below:
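One common way to produce such an image is to export the fitted tree to Graphviz DOT source and render it. The sketch below is an assumption about the workflow rather than the exact code behind the image: it uses a four-row toy dataset, and the rendering step is left commented out because it needs pydotplus and the Graphviz binaries. The file name drugtree.png is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Toy data: two features, two classes (invented for illustration).
X = [[25, 1], [30, 0], [55, 1], [60, 0]]
y = ["drugA", "drugA", "drugB", "drugB"]
drugTree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_data = export_graphviz(
    drugTree,
    out_file=None,
    feature_names=["Age", "BP_HIGH"],
    class_names=list(drugTree.classes_),
    filled=True,
    rounded=True,
)
print(dot_data[:60])

# Optional rendering step (requires pydotplus and Graphviz installed):
# import pydotplus
# pydotplus.graph_from_dot_data(dot_data).write_png("drugtree.png")
```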

Another easy method to generate the decision tree:

Drawing the decision path (a more readable form of the decision tree):
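A text rendering of the decision rules is available through sklearn.tree.export_text. Whether this is the same method used for the figure here is an assumption; the sketch only shows what such output looks like on a four-row toy dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: two features, two classes (invented for illustration).
X = [[25, 1], [30, 0], [55, 1], [60, 0]]
y = ["drugA", "drugA", "drugB", "drugB"]
drugTree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Each indented line is a test on a feature; leaves show the predicted class.
rules = export_text(drugTree, feature_names=["Age", "BP_HIGH"])
print(rules)
```

Reading the output from top to bottom follows the path a sample takes from the root to a leaf.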

Important features for classification

Listing the features in their rank of importance for classifying the data.
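Feature rankings of this kind can be read from the fitted tree's feature_importances_ attribute. The sketch below uses a six-row toy dataset in which Age alone determines the label, so the ranking is known in advance. The variable name ranked is hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data in which Age fully separates the two classes (invented values).
X = pd.DataFrame({
    "Age": [25, 30, 55, 60, 28, 58],
    "BP_HIGH": [1, 0, 1, 0, 0, 1],
})
y = ["drugA", "drugA", "drugB", "drugB", "drugA", "drugB"]
drugTree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Pair each importance with its column name and sort from most to least important.
ranked = pd.Series(drugTree.feature_importances_, index=X.columns)
ranked = ranked.sort_values(ascending=False)
print(ranked)
```

The importances sum to 1, and a feature that is never used in a split gets an importance of 0.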

The GitHub link to the program can be found here.

Tuesday, November 10, 2020

Python code to extract Temporal Expression from a text (Using Regular Expression)

By VIJAY YADAV November 10, 2020 Python

#Method 1

Code:

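import re

print('Enter the text:')
text = input()

# Month names, full or abbreviated (used inside re3)
months='(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)'

# re1: parts of the day and relative days; re2: relative markers; re3: dates with month names
re1=r'\w?((mor|eve)(?:ning)|after(?:noon)?|(mid)?night|today|tomorrow|(yester|every)(?:day))'
re2=r'\d?\w*(ago|after|before|now)'
re3=r'((\d{1,2}(st|nd|rd|th)\s?)?(%s\s?)(\d{1,2})?)' % months
# re4: clock times with am/pm; re5: durations; re6: numeric dates; re7: 24-hour times; re8: years
re4=r'\d{1,2}\s?[ap]m'
re5=r'(\d{1,2}(:\d{2})?\s?((hour|minute|second|hr|min|sec)(?:s)?))'
re6=r'(\d{1,2}/\d{1,2}/\d{4})|(\d{4}/\d{1,2}/\d{1,2})'
re7=r'(([0-1]?[0-9]|2[0-3]):[0-5][0-9])'
re8=r'\d{4}'

relist= [re1, re2, re3, re4, re5, re6, re7, re8]

print("\n\nTemporal expressions are listed below:\n")

# Print every match of every pattern, one pattern at a time
for exp in relist:
    match = re.findall(exp,text)
    for x in match:
        print(x)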

Output:

Enter the text:
I get up in the morning at 6 am. I have been playing cricket since 2004.  My birth date is 1997/9/8. It has been 2 hours since I am studying. The time right now is  12:45. He is coming today. I study till midnight everyday. It takes 2 mins to solve this problem. He is coming in January. He went abroad on 2nd March, 1999. He was working here 5 years ago. 


Temporal expressions are listed below:

('morning', 'mor', '', '')
('today', '', '', '')
('midnight', '', 'mid', '')
('everyday', '', '', 'every')
now
ago
('January', '', '', 'January', 'January', '', '')
('2nd March', '2nd ', 'nd', 'March', 'March', '', '')
6 am
('2 hours', '', 'hours', 'hour')
('2 mins', '', 'mins', 'min')
('', '1997/9/8')
('12:45', '12')
2004
1997
1999
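Several lines of the output above are tuples because re.findall returns one tuple of all capturing groups whenever a pattern contains more than one group. If only the matched text is wanted, re.finditer with group(0) is one option. The sketch below reuses the re5 pattern on two sentences from the sample text; the variable names are assumptions.

```python
import re

re5 = r'(\d{1,2}(:\d{2})?\s?((hour|minute|second|hr|min|sec)(?:s)?))'
text = ("It has been 2 hours since I am studying. "
        "It takes 2 mins to solve this problem.")

# findall: one tuple per match, one element per capturing group.
as_tuples = re.findall(re5, text)
print(as_tuples)

# finditer + group(0): only the full matched text.
whole_matches = [m.group(0) for m in re.finditer(re5, text)]
print(whole_matches)
```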

#Method 2

Code:

import re

print('Enter the text:')
text = input()
text=list(text.split("."))
months='(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)'

re1=r'\w?((mor|eve)(?:ning)|after(?:noon)?|(mid)?night|today|tomorrow|(yester|every)(?:day))'
re2=r'\d?\w*(ago|after|before|now)'
re3=r'((\d{1,2}(st|nd|rd|th)\s?)?(%s\s?)(\d{1,2})?)' % months
re4=r'\d{1,2}\s?[ap]m'
re5=r'(\d{1,2}(:\d{2})?\s?((hour|minute|second|hr|min|sec)(?:s)?))'
re6=r'(\d{1,2}/\d{1,2}/\d{4})|(\d{4}/\d{1,2}/\d{1,2})'
re7=r'(([0-1]?[0-9]|2[0-3]):[0-5][0-9])'
re8=r'\d{4}'

relist= [re1, re2, re3, re4, re5, re6, re7, re8]

re_compiled = re.compile("(%s|%s|%s|%s|%s|%s|%s|%s)" % (re1, re2, re3, re4, re5, re6, re7, re8))
print("\n\n Output(with temporal expression enclosed within square bracket:\n")

for s in text:
    print (re.sub(re_compiled, r'[\1]', s))

Output:

Enter the text:
I get up in the morning at 6 am. I have been playing cricket since 2004.  My birth date is 1997/9/8. It has been 2 hours since I am studying. The time right now is  12:45. He is coming today. I study till midnight everyday. It takes 2 mins to solve this problem. He is coming in January. He went abroad on 2nd March, 1999. He was working here 5 years ago. 


 Output(with temporal expression enclosed within square bracket:

I get up in the [morning] at [6 am]
 I have been playing cricket since [2004]
  My birth date is [1997/9/8]
 It has been [2 hours] since I am studying
 The time right [now] is  [12:45]
 He is coming [today]
 I study till [midnight] [everyday]
 It takes [2 mins] to solve this problem
 He is coming in [January]
 He went abroad on [2nd March], [1999]
 He was working here 5 years [ago]
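Method 2 wraps all eight patterns in one extra outer group so that \1 refers to the whole match. An alternative, sketched here with a much smaller hypothetical pattern, is to use \g<0> in the replacement string, which always stands for the entire match and needs no outer group.

```python
import re

# Small illustrative pattern: four-digit years or am/pm clock times.
pattern = re.compile(r'\d{4}|\d{1,2}\s?[ap]m')
sentence = "I get up at 6 am since 2004"

# \g<0> inserts the full match, so no extra capturing group is required.
bracketed = pattern.sub(r'[\g<0>]', sentence)
print(bracketed)
```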
