Saturday, November 28, 2020

Visualizing Decision tree in Python (with codes included)

The libraries below must be installed if they are not already present. The commented commands install them from inside a Jupyter Notebook; from a terminal, use "pip install library-name" instead.

#import sys

#!{sys.executable} -m pip install numpy

#!{sys.executable} -m pip install pandas

#!{sys.executable} -m pip install scikit-learn

#!{sys.executable} -m pip install hvplot

#!{sys.executable} -m pip install six

#!{sys.executable} -m pip install pydotplus

#!{sys.executable} -m pip install graphviz

Importing libraries:

import numpy as np 

import pandas as pd

from sklearn.tree import DecisionTreeClassifier

from sklearn import tree

import hvplot.pandas

About the dataset

Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications: Drug A, Drug B, Drug C, Drug X, and Drug Y.

Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The feature sets of this dataset are Age, Sex, Blood Pressure, and Cholesterol of patients, and the target is the drug that each patient responded to.

This is a sample of a multiclass classifier (five possible drugs). You can use the training part of the dataset to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.

Let's look at a sample of the dataset:

The shape of the dataset is (200, 6), i.e. 200 rows and 6 columns.

Pre-processing

Using my_data as the Drug.csv data read by pandas, declare the following variables:

  • X as the Feature Matrix (data of my_data)
  • y as the response vector (target)
  • Remove the column containing the target name from X, since it doesn't contain numeric values.

As you may have noticed, some features in this dataset, such as Sex or BP, are categorical. Unfortunately, Sklearn Decision Trees do not handle categorical variables directly, but we can still convert these features to numerical values: pandas.get_dummies() converts categorical variables into dummy/indicator variables.

Now we can fill the target variable.
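The pre-processing steps above can be sketched as follows. This is only an illustration: the four rows are made up, and the column names (Age, Sex, BP, Cholesterol, Drug) are assumptions based on the description of the dataset, not the actual contents of Drug.csv.

```python
import pandas as pd

# Made-up rows standing in for Drug.csv; column names are assumptions.
my_data = pd.DataFrame({
    "Age": [23, 47, 61, 35],
    "Sex": ["F", "M", "M", "F"],
    "BP": ["HIGH", "LOW", "NORMAL", "HIGH"],
    "Cholesterol": ["HIGH", "HIGH", "NORMAL", "NORMAL"],
    "Drug": ["drugY", "drugC", "drugX", "drugA"],
})

# Feature matrix: every column except the target, with the categorical
# columns expanded into 0/1 indicator columns.
X = pd.get_dummies(
    my_data.drop(columns=["Drug"]),
    columns=["Sex", "BP", "Cholesterol"],
    dtype=int,
)

# Response vector: the drug each patient responded to.
y = my_data["Drug"]

print(X.columns.tolist())
print(X.shape, y.shape)
```

Each categorical column becomes one indicator column per category (for example BP_HIGH, BP_LOW, BP_NORMAL), while Age stays numeric.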











Setting up the Decision Tree


We will be using train/test split on our decision tree. Let's import train_test_split from sklearn.model_selection.


from sklearn.model_selection import train_test_split


Now train_test_split will return four arrays. We will name them:
X_trainset, X_testset, y_trainset, y_testset

The train_test_split will need the parameters:
X, y, test_size=0.3, and random_state=3.

The X and y are the arrays required before the split, the test_size represents the ratio of the testing dataset, and the random_state ensures that we obtain the same splits.
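A minimal sketch of the split described above. The X and y here are small placeholder arrays, not the drug dataset; only the parameter values (test_size=0.3, random_state=3) and the four variable names come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples with 2 features each (not the drug dataset).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)
print(X_trainset.shape, X_testset.shape)

# Calling it again with the same random_state reproduces the same split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.3, random_state=3)
same_split = np.array_equal(X_testset, X_test2)
print(same_split)
```

With 10 samples and test_size=0.3, three samples go to the test set and seven to the training set.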




Modeling


We will first create an instance of the DecisionTreeClassifier called drugTree.
Inside of the classifier, specify criterion="entropy" so we can see the information gain of each node.

We can use the "gini" criterion as well; the results are usually very similar, though not guaranteed to be identical. Gini is the default criterion for the decision tree classifier.
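The modeling step can be sketched like this. Since Drug.csv is not reproduced here, the sketch generates a synthetic stand-in: the column names and the rule that assigns a drug to each patient are made up purely for illustration. It fits the tree once with each criterion so the two can be compared.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)

X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

# Fit one tree per criterion and record the accuracy on the test set.
scores = {}
for criterion in ("entropy", "gini"):
    drugTree = DecisionTreeClassifier(criterion=criterion, max_depth=4,
                                      random_state=0)
    drugTree.fit(X_trainset, y_trainset)
    predTree = drugTree.predict(X_testset)
    scores[criterion] = metrics.accuracy_score(y_testset, predTree)

print(scores)
```

On real data the two scores may differ slightly, which is why it is worth checking both rather than assuming they agree.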












From the graph below, we can see that the accuracy score is highest at max_depth=4 and remains constant thereafter, so we have used max_depth=4 in this case.















We can also plot max-depth vs accuracy score for both testset and trainset:

For testset:


















For trainset:


















Now, plotting the graph of max-depth vs accuracy score for both trainset and testset:
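A hedged sketch of how the max-depth vs accuracy numbers could be computed for both sets. It uses a synthetic stand-in for Drug.csv (made-up columns and labelling rule), so the depth at which accuracy levels off will not necessarily match the graphs in this post. The resulting table can be plotted with any plotting library.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)
X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

# Train one tree per depth and record accuracy on both sets.
rows = []
for depth in range(1, 11):
    model = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                   random_state=0)
    model.fit(X_trainset, y_trainset)
    rows.append({
        "max_depth": depth,
        "train_accuracy": metrics.accuracy_score(
            y_trainset, model.predict(X_trainset)),
        "test_accuracy": metrics.accuracy_score(
            y_testset, model.predict(X_testset)),
    })

scores = pd.DataFrame(rows)
print(scores)
```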


















Visualization
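One common way to draw the tree is to export it to Graphviz DOT format with tree.export_graphviz and then render that text to an image. The sketch below only produces the DOT text, using a synthetic stand-in for Drug.csv (made-up columns and labelling rule). Turning it into a PNG additionally needs the Graphviz program installed, so that step is left as a comment.

```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)
X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]

drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                  random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_data = tree.export_graphviz(
    drugTree,
    out_file=None,
    feature_names=list(X.columns),
    class_names=list(drugTree.classes_),
    filled=True,
    rounded=True,
)
print(dot_data[:200])

# Rendering (needs Graphviz installed), e.g. with pydotplus:
#   import pydotplus
#   pydotplus.graph_from_dot_data(dot_data).write_png("drugtree.png")
```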

















The final image of decision tree is given below:






























Another easy method to generate decision tree:
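A simpler route that avoids Graphviz altogether is tree.plot_tree, which draws the tree with matplotlib. This is a sketch under the same assumptions as before: the data is a synthetic stand-in for Drug.csv, and the non-interactive "Agg" backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)
X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]

drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                  random_state=0).fit(X, y)

# plot_tree draws one annotated box per node onto the given axes.
fig, ax = plt.subplots(figsize=(12, 7))
annotations = tree.plot_tree(
    drugTree,
    feature_names=list(X.columns),
    class_names=list(drugTree.classes_),
    filled=True,
    ax=ax,
)

# Save the figure to an in-memory PNG (use a file name to write to disk).
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```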
























Drawing Decision path (more readable form of decision tree):
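One readable, text-only view of the decision rules is tree.export_text, which prints each split as an indented rule. This may not be the exact method used for the image in this post; it is shown here as a lightweight alternative, again on a synthetic stand-in for Drug.csv.

```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)
X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]

drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                  random_state=0).fit(X, y)

# Each line is one test on a feature; leaves end with the predicted class.
rules = tree.export_text(drugTree, feature_names=list(X.columns))
print(rules)
```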

























Important features for classification

Listing the features in their rank of importance for classifying the data.
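A fitted DecisionTreeClassifier exposes feature_importances_, one value per input column, which can be sorted to get such a ranking. The sketch below does this on a synthetic stand-in for Drug.csv (made-up columns and labelling rule), so the ranking it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Drug.csv (columns and labelling rule are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(15, 75, n),
    "Sex": rng.choice(["F", "M"], n),
    "BP": rng.choice(["LOW", "NORMAL", "HIGH"], n),
    "Cholesterol": rng.choice(["NORMAL", "HIGH"], n),
})
df["Drug"] = np.where(
    df["BP"] == "HIGH",
    np.where(df["Age"] < 50, "drugA", "drugB"),
    np.where(df["Cholesterol"] == "HIGH", "drugC", "drugX"),
)
X = pd.get_dummies(df.drop(columns=["Drug"]),
                   columns=["Sex", "BP", "Cholesterol"], dtype=int)
y = df["Drug"]

drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                  random_state=0).fit(X, y)

# Pair each importance with its column name and sort, highest first.
importances = pd.Series(
    drugTree.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

The importances are normalized, so they add up to 1 across all features.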












































The github link to the program can be found here.

Tuesday, November 10, 2020

Python code to extract Temporal Expression from a text (Using Regular Expression)

#Method 1

Code:

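import re

print('Enter the text:')
text = input()

# Month names, either abbreviated or in full (e.g. "Jan" or "January")
months='(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)'

# re1: parts of the day and relative days; re2: relative markers such as "ago" or "now"
re1=r'\w?((mor|eve)(?:ning)|after(?:noon)?|(mid)?night|today|tomorrow|(yester|every)(?:day))'
re2=r'\d?\w*(ago|after|before|now)'
# re3: dates written with a month name, e.g. "2nd March"
re3=r'((\d{1,2}(st|nd|rd|th)\s?)?(%s\s?)(\d{1,2})?)' % months
# re4: clock times such as "6 am" ([ap] matches only "a" or "p")
re4=r'\d{1,2}\s?[ap]m'
# re5: durations such as "2 hours"; re6: numeric dates; re7: 24-hour times; re8: years
re5=r'(\d{1,2}(:\d{2})?\s?((hour|minute|second|hr|min|sec)(?:s)?))'
re6=r'(\d{1,2}/\d{1,2}/\d{4})|(\d{4}/\d{1,2}/\d{1,2})'
re7=r'(([0-1]?[0-9]|2[0-3]):[0-5][0-9])'
re8=r'\d{4}'

relist= [re1, re2, re3, re4, re5, re6, re7, re8]

print("\n\nTemporal expressions are listed below:\n")

# Apply each pattern in turn and print everything it finds
for exp in relist:
    match = re.findall(exp,text)
    for x in match:
        print(x)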


Output:

Enter the text:
I get up in the morning at 6 am. I have been playing cricket since 2004.  My birth date is 1997/9/8. It has been 2 hours since I am studying. The time right now is  12:45. He is coming today. I study till midnight everyday. It takes 2 mins to solve this problem. He is coming in January. He went abroad on 2nd March, 1999. He was working here 5 years ago. 


Temporal expressions are listed below:

('morning', 'mor', '', '')
('today', '', '', '')
('midnight', '', 'mid', '')
('everyday', '', '', 'every')
now
ago
('January', '', '', 'January', 'January', '', '')
('2nd March', '2nd ', 'nd', 'March', 'March', '', '')
6 am
('2 hours', '', 'hours', 'hour')
('2 mins', '', 'mins', 'min')
('', '1997/9/8')
('12:45', '12')
2004
1997
1999
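Many results above are printed as tuples because re.findall returns the contents of the capturing groups, not the full match, whenever a pattern contains groups. A possible way to get only the matched text is to use non-capturing groups (?:...) together with re.finditer and group(0). The three patterns below are simplified stand-ins for re1, re4 and re8, written only for this illustration.

```python
import re

text = "I get up in the morning at 6 am. He went abroad on 2nd March, 1999."

# Simplified patterns using non-capturing groups (?:...) only.
part_of_day = r'(?:mor|eve)ning|afternoon|(?:mid)?night|today|tomorrow|(?:yester|every)day'
clock_time = r'\d{1,2}\s?[ap]m'
year = r'\b\d{4}\b'

pattern = re.compile('|'.join([part_of_day, clock_time, year]))

# group(0) is always the whole matched text, regardless of grouping.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['morning', '6 am', '1999']
```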


#Method 2
Code:
import re
print('Enter the text:')
text = input()
text=list(text.split("."))
months='(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)'

re1=r'\w?((mor|eve)(?:ning)|after(?:noon)?|(mid)?night|today|tomorrow|(yester|every)(?:day))'
re2=r'\d?\w*(ago|after|before|now)'
re3=r'((\d{1,2}(st|nd|rd|th)\s?)?(%s\s?)(\d{1,2})?)' % months
re4=r'\d{1,2}\s?[ap]m'
re5=r'(\d{1,2}(:\d{2})?\s?((hour|minute|second|hr|min|sec)(?:s)?))'
re6=r'(\d{1,2}/\d{1,2}/\d{4})|(\d{4}/\d{1,2}/\d{1,2})'
re7=r'(([0-1]?[0-9]|2[0-3]):[0-5][0-9])'
re8=r'\d{4}'

relist= [re1, re2, re3, re4, re5, re6, re7, re8]

re_compiled = re.compile("(%s|%s|%s|%s|%s|%s|%s|%s)" % (re1, re2, re3, re4, re5, re6, re7, re8))
print("\n\n Output(with temporal expression enclosed within square bracket:\n")

for s in text:
    print (re.sub(re_compiled, r'[\1]', s))


Output:
Enter the text:
I get up in the morning at 6 am. I have been playing cricket since 2004.  My birth date is 1997/9/8. It has been 2 hours since I am studying. The time right now is  12:45. He is coming today. I study till midnight everyday. It takes 2 mins to solve this problem. He is coming in January. He went abroad on 2nd March, 1999. He was working here 5 years ago. 


 Output(with temporal expression enclosed within square bracket:

I get up in the [morning] at [6 am]
 I have been playing cricket since [2004]
  My birth date is [1997/9/8]
 It has been [2 hours] since I am studying
 The time right [now] is  [12:45]
 He is coming [today]
 I study till [midnight] [everyday]
 It takes [2 mins] to solve this problem
 He is coming in [January]
 He went abroad on [2nd March], [1999]
 He was working here 5 years [ago]
  


Sunday, September 13, 2020

Raptor tutorial 3: Using loop in raptor

Raptor tutorial 2: If/else statement in raptor

Raptor tutorial 1: Find the sum of two numbers

Thursday, September 3, 2020

Solved: Program not working in Codeblocks

Make attractive CV online for absolutely free. No watermarks added

Tuesday, August 25, 2020

Shotcut: Easiest way to add scrolling text and credits in video

Monday, August 24, 2020

Problem solved: Codeblocks compiler not working or not found? Install co...

Saturday, August 22, 2020

Create facebook home page using HTML and CSS

Best online Course for an absolute beginner of Data Science with certifi...

When and how to draw scatter plot in excel

When and how to draw line chart in excel?

When and how to draw histogram in excel

Get free Online courses from above 50 leading websites like Oracle, IBM,...

How to Hardcode Subtitle to Video?

100% working: How to recover the permanently deleted files for free?

Make online CV and Cover letter for free

10 best websites to get free online courses from udemy, lynda, bitdegree...

How to get Udemy courses for free with certificates?

Jobs portal web application

Solved! search.yahoo.com browser hijacker

Easy way Compress video without decreasing quality of the video

easily migrate a django DB from SQLite to MYSQL

Could not find any executable java binary. Please install java in your P...

Cricket database project (with CRUD operations) developed using php and ...

Add different types of page no. in a single document of MS Word.

How to make Picture gallery using Lightbox library ?

Solved: Fatal error: Uncaught Error: Call to undefined function mysqli_...

Advanced Hospital Management System Project with Data Visualization tools

Wednesday, August 19, 2020

Get courses of Asian Development Bank and Microsoft virtual internship f...

Get fully funded scholarships for studying Masters/ PHD in USA, Europe, ...

Sunday, August 16, 2020

Various features of Markdown in Jupyter Notebook


Output of Emphasis:




Output of List:



Output of Links:




Output of Images:





Output of table:




Output of Blockquotes:



Output of Horizontal Rule:



Output of Youtube links:




Output of Headers:



Get the Github link here.

Web scraping using a single line of code in python

We will scrape data from Wikipedia using a single line of code in python. Pandas alone can do the job, although pd.read_html relies on an HTML parser such as lxml being installed alongside it.

Step 1: Install and import pandas library

import pandas as pd

Step 2: Read the data of web (here Wikipedia website) using pd.read_html('Website link here')[integer]

df = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory')[1]

Step 3: View the data scraped from the web

Step 4: If a web page contains multiple tables, change the index to another integer, starting from 0, until you get the required data (i.e. [0], [1], [2], [3] and so on).
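To see how the index picks a table, here is a small offline sketch. Instead of fetching a live page it parses a made-up HTML snippet containing two tables, so the table contents are purely illustrative. It assumes an HTML parser such as lxml is installed, which pd.read_html needs.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables, standing in for a real web page.
html = """
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>20</td></tr>
</table>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>X</td><td>1000</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
print(len(tables))

# Index 0 is the first table on the page, index 1 the second, and so on.
df = tables[1]
print(df)
```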

Build a colorful Word Cloud in python using mask image

A word cloud is a data visualization tool in data science. It is an effective way to visualize the words in a text according to how frequently they occur: the more often a word appears, the larger it is drawn. Stopwords are ignored in the visualization. A text file called "skill.txt" has been used as the input, and a mask image of the map of Nepal gives the word cloud its shape.

The libraries required are:

Reading the text file "alice.txt" whose word cloud will be formed. After reading the text file, we set the stopwords.


Generating a word cloud and storing it into "skillwc" variable.

Importing libraries and creating a simple image of word cloud (without using mask image).



Now we use a mask image of the map of Nepal to create the word cloud. First of all, we open the image, save it in a variable "mask_image", and then view the mask image without superimposing the text onto it.






Click here to download the collection of mask image.

Finally, we will impose the text from 'alice.txt' onto the image shown above, using the original colors of the image for the word cloud instead of the default colors.




Get the Github link here.