
Sunday, August 16, 2020

Various features of Markdown in Jupyter Notebook


Output of Emphasis:




Output of List:



Output of Links:




Output of Images:





Output of Table:




Output of Blockquotes:



Output of Horizontal Rule:



Output of YouTube Links:




Output of Headers:



Get the Github link here.

Web scraping using a single line of code in Python

We will scrape data from Wikipedia using a single line of Python code. Only pandas is needed in our code, although pd.read_html relies on an HTML parser such as lxml (or BeautifulSoup4 with html5lib) being installed.

Step 1: Install the pandas library (together with an HTML parser, e.g. pip install pandas lxml) and import it

import pandas as pd

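Step 2: Read the data from the web (here, a Wikipedia page) using pd.read_html('Website link here')[integer]. pd.read_html returns a list of DataFrames, one for each table on the page, and the integer index selects one of them.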

df = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory')[1]

Step 3: View the data scraped from the web (for example with df.head()).

Step 4: If there are multiple tables on the web page, change the index to another integer, starting from 0, until you get the data you need (i.e. [0], [1], [2], [3] and so on).
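Instead of trying indices one by one, Step 4 can be automated. The sketch below assumes you know a word that appears in the header of the table you want; find_table is a hypothetical helper name, not part of pandas. It takes the list returned by pd.read_html and returns the index and table whose column labels first mention that word.

```python
import pandas as pd


def find_table(tables, keyword):
    """Return (index, table) for the first table whose column labels mention `keyword`.

    `tables` is the list returned by pd.read_html(). Column labels may be plain
    strings or tuples (multi-index headers), so each label is converted to a
    string before searching, case-insensitively.
    """
    for i, table in enumerate(tables):
        labels = " ".join(str(col) for col in table.columns)
        if keyword.lower() in labels.lower():
            return i, table
    raise ValueError(f"No table has a column mentioning {keyword!r}")
```

For example, tables = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory') followed by index, df = find_table(tables, 'Deaths') should pick the country table, assuming its header still contains that word when you run it.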

Build a colorful Word Cloud in python using mask image

A word cloud is a data visualization tool used in data science. It is an efficient way to visualize the words in a text according to how often each one is repeated. Stopwords are ignored in the visualization. A text file called "skill.txt" has been used as the input, and a mask image of the map of Nepal gives the word cloud its shape.
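Under the hood, a word cloud sizes each word by how often it occurs once stopwords are removed. The sketch below shows only that counting step in plain Python; the tiny stopword set and the name word_frequencies are illustrative assumptions, while the wordcloud package ships its own, much larger STOPWORDS set and does its own tokenizing.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); wordcloud.STOPWORDS is far larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each non-stopword appears in `text`, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


sample = "Python and data science: the data, the charts and the Python code."
print(word_frequencies(sample).most_common(3))
# [('python', 2), ('data', 2), ('science', 1)]
```

A dictionary of counts like this can be passed to WordCloud(...).generate_from_frequencies() if you want to control the counting yourself instead of calling generate() on the raw text.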

The libraries required are:

Reading the text file "alice.txt" whose word cloud will be formed, and then setting the stopwords.


Generating a word cloud and storing it in the variable "skillwc".

Importing libraries and creating a simple word cloud image (without using a mask image).



Now we use a mask image of the map of Nepal to create the word cloud. First, we open the image and save it in a variable "mask_image", and then view the mask image before superimposing the text onto it.






Click here to download the collection of mask images.
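The wordcloud documentation states that pure white (255) entries of a mask are "masked out" and every other entry may be drawn on. Many map images instead have a black or transparent background that loads as 0, so they need converting first. The sketch below assumes such a background value of 0; to_wordcloud_mask is a hypothetical helper name, not part of the wordcloud package.

```python
import numpy as np


def to_wordcloud_mask(image_array, background=0):
    """Return a copy of `image_array` in which background pixels are pure white.

    Works for 2-D (grayscale) and 3-D (RGB) uint8 arrays. WordCloud will then
    leave the white area empty and draw words only inside the shape.
    """
    mask = np.asarray(image_array, dtype=np.uint8).copy()
    if mask.ndim == 3:
        # An RGB pixel counts as background only if every channel matches.
        is_bg = np.all(mask == background, axis=-1)
    else:
        is_bg = mask == background
    mask[is_bg] = 255  # sets all channels of background pixels to white
    return mask
```

With Pillow, something like to_wordcloud_mask(np.array(Image.open("nepal_map.png").convert("RGB"))) would produce a mask to pass as WordCloud(mask=...); the file name is a placeholder, and transparent pixels only become (0, 0, 0) after convert("RGB") if their hidden colour is black.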

Finally, we superimpose the words of 'alice.txt' onto the image shown above, coloring the word cloud with the original colors of the image instead of the default colors.




Get the Github link here.


Friday, August 14, 2020

Covid-19 Data Visualization across the World using Choropleth map

Introduction

This project visualizes the Covid-19 data (i.e. total cases, deaths and recoveries) across the countries of the world as of 12th August, 2020. A GeoJSON file of the world's countries has been used, and the Python library Folium has been used to generate Choropleth maps whose geo_data value is this world GeoJSON.

The libraries imported are:

Data description:

Covid-19 data of countries across the world were scraped from Wikipedia.

Click here to go to the Wikipedia page.

A simple one-line piece of code can scrape the table from Wikipedia. We will store the scraped data in a dataframe called 'df'.

df = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory')[1]

View of original data:

Data Wrangling/ Cleaning:

Step 1: 

Selecting only the required columns from the data above into a new dataframe, df1.



Step 2:

Converting the multi-index columns into single-index columns.
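The code for this step is not shown in the text, so here is one possible way to do it, as a sketch rather than the author's exact method. pd.read_html often produces two-level headers in which the top label is repeated, or the lower level is an auto-generated 'Unnamed: ...' placeholder; the hypothetical helper below drops those parts and joins what remains into one string per column.

```python
import pandas as pd


def flatten_columns(df):
    """Collapse multi-index column labels into single strings.

    Repeated sub-labels (e.g. ('Cases', 'Cases')) and pandas' 'Unnamed: ...'
    placeholders are dropped before the remaining parts are joined with a space.
    """
    if not isinstance(df.columns, pd.MultiIndex):
        return df  # already single-index, nothing to do
    flat = []
    for col in df.columns:
        parts = []
        for part in map(str, col):
            if part and not part.startswith("Unnamed") and part not in parts:
                parts.append(part)
        flat.append(" ".join(parts))
    out = df.copy()
    out.columns = flat
    return out
```

Usage would be df1 = flatten_columns(df1); the resulting names depend on the live table's headers, so check df1.columns before renaming anything.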

Step 3:

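Removing the footnote markers (e.g. "[a]") attached to the end of each country name in the dataframe above.

df1['Countries'] = df1['Countries'].str.replace(r'\[.*?\]$', '', regex=True)  # regex=True is required since pandas 2.0 changed the default to regex=False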

Step 4:

Changing the country name 'United States' to 'United States of America' to match the name used in the GeoJSON file.

df1['Countries'] = df1['Countries'].replace('United States', 'United States of America')
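Folium matches dataframe rows to map shapes through its key_on argument (for example 'feature.properties.name'), so any country name that differs from the GeoJSON is simply left uncoloured. A quick way to find such names before plotting is sketched below; the property key "name" and the helper names are assumptions, since GeoJSON files differ in which property holds the country name.

```python
import json


def load_geojson(path):
    """Read a GeoJSON file into a Python dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def unmatched_names(names, geojson, key="name"):
    """Return sorted names that have no feature whose properties[key] equals them.

    `geojson` is a parsed FeatureCollection. The default key "name" is an
    assumption; inspect your file's feature properties to pick the right one.
    """
    geo_names = {f["properties"].get(key) for f in geojson["features"]}
    return sorted(set(names) - geo_names)
```

Running unmatched_names(df1['Countries'], load_geojson('world_countries.json')) (placeholder file name) before and after this step should show 'United States' disappearing from the list, along with any other names that still need renaming.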

Step 5:

The last 3 rows of the dataframe are not required, so we drop them.

df1=df1[:-3]

Step 6:

Replacing the value 'No data' with 0 (zero) in each numeric column.

for col in ['Cases', 'Recovered', 'Deaths']:
    df1[col] = df1[col].replace('No data', '0')


Step 7:

Changing the data type of columns Cases, Recovered and Deaths to integer.
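The conversion code itself appears only as a screenshot. One way to do Steps 6 and 7 together is sketched below, as an alternative rather than the author's code: pd.to_numeric with errors='coerce' turns 'No data' (or any other non-numeric text) into NaN, which is then filled with 0, mirroring the choice made in Step 6. Stripping thousands separators first is an assumption about how the scraped values may look.

```python
import pandas as pd


def to_int_columns(df, columns):
    """Convert `columns` to integers, treating 'No data' and other non-numeric text as 0."""
    out = df.copy()
    for col in columns:
        # Remove thousands separators such as "1,234" before parsing.
        cleaned = out[col].astype(str).str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)
    return out
```

For this dataframe the call would be df1 = to_int_columns(df1, ['Cases', 'Recovered', 'Deaths']). Note that treating missing values as 0 makes those countries look like they had no recoveries on the map, which is worth keeping in mind when reading it.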

After changing the datatypes:

Visualizing the data across the world:

For cases:


Similarly, it can be done for Recovered and Deaths.

For recovered:

For deaths:


Get the Github link here.

Covid-19 Data Visualization of Nepal using Choropleth map

Introduction

This project visualizes the Covid-19 data (i.e. total cases, deaths and recoveries) across the various provinces and districts of Nepal as of 9th August, 2020. A GeoJSON file of Nepal's provinces and districts has been used, and the Python library Folium has been used to generate Choropleth maps whose geo_data value is the GeoJSON of Nepal.

The libraries imported are:

Data description:

Covid-19 data of the various provinces and districts were scraped from Wikipedia.

Click here to go to the Wikipedia page.

A simple one-line piece of code can scrape the table from Wikipedia. We will store the scraped data in a dataframe called 'df'.

df = pd.read_html('https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/Nepal_medical_cases_by_province_and_district')[1]

Original view of data:

Data Wrangling/ Cleaning:

Step 1: 

The columns in the data above are multi-index columns, so we convert them into single-index columns.

Step 2:

Dropping the 'Index Case' column.

df.drop(columns=['Index Case'], inplace=True)  # columns= already implies axis=1

Step 3:

The rows with index 84 (the grand total for Nepal) and 85 are not required, so we drop them.

df=df[:-2]   # Getting all rows except the last two rows 

Step 4:

In the data above, the columns Cases, Recovered and Deaths do not have the desired data type, so we convert them to integers.

Step 5:

Our dataframe (df) after cleaning looks like this:


The data of provinces and districts are together in a single dataframe, so we need to separate them into different dataframes.

Creating a dataframe of Provinces only:

Step 1:

Extracting the data of only provinces into a new dataframe called df_prov.

We can use two methods to do so.

Method 1

#df_prov=df.iloc[[0, 15, 24, 38, 50, 63, 74],:]


Method 2 (more robust, because the row positions used in Method 1 may change if the source page is updated)

df_prov = df[df['Location'].str.contains('Province|Bagmati|Gandaki|Karnali|Sudurpashchim')]  # '|' means "or" in the regex pattern

Step 2:

Resetting the index of newly formed dataframe.

df_prov.reset_index(drop=True, inplace=True)

View of new dataframe:
Step 3:
Creating a copy of this dataframe, to be used later when creating the dataframe of districts.

df_backup=df_prov.copy()

Step 4:
Renaming the Provinces to match them with the name of Provinces mentioned in Geojson file of Nepal.

Step 5:
Renaming the column 'Location' to 'Province'.

df_prov.rename(columns={'Location':'Province'}, inplace=True)

Final view of dataframe of Provinces:

Visualizing the data of provinces:

For Cases

Reading the Geojson file and creating a plain map of Nepal:

Defining a Choropleth map in a plain map created above:

Map of Nepal as seen for Cases in various Provinces:
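A choropleth map colours each region according to the class (interval) its value falls into; folium's Choropleth handles this through options such as fill_color and bins. To make the idea concrete, the sketch below computes simple equal-width classes with NumPy. It is only an illustration of the classing principle under that assumption, not folium's internal code, and the helper name is hypothetical.

```python
import numpy as np


def equal_width_classes(values, n_bins=6):
    """Assign each value to one of `n_bins` equal-width classes (0 = lowest).

    Returns (classes, edges). Regions in the same class share a fill colour.
    """
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Digitize against the inner edges only, so the maximum value lands in
    # the top class (n_bins - 1) instead of one past it.
    classes = np.digitize(values, edges[1:-1])
    return classes, edges
```

With heavily skewed counts (one province far ahead of the rest), equal-width classes put most regions in the lowest class, which is why custom bin edges or quantile-based classes are often preferred for data like this.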

Similarly, we can visualize the Recovered and Deaths figures for the provinces of Nepal as follows:

Map for recovered:


Map for deaths:


Creating a dataframe of districts only:


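1. Creating a new dataframe for districts called 'df_dist' by concatenating df and df_backup (the copy of the province rows made before renaming the 'Location' column) and dropping every row that appears in both, so that only the district rows remain.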

df_dist=pd.concat([df, df_backup]).drop_duplicates(keep=False)

2. Renaming the column 'Location' to 'District'. 

df_dist.rename(columns={"Location":"District"}, inplace=True)

3. Resetting the index of dataframe df_dist.

df_dist.reset_index(drop=True, inplace=True)

4. Final dataframe of districts i.e. df_dist looks like this:
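An alternative to the concat and drop_duplicates approach is to build one boolean mask and use it together with its negation (~), so both dataframes come from the same test and no backup copy is needed. The sketch below reuses the province pattern from Method 2; the helper name is hypothetical.

```python
import pandas as pd

# Same province pattern as Method 2 above.
PROVINCE_PATTERN = "Province|Bagmati|Gandaki|Karnali|Sudurpashchim"


def split_provinces_districts(df, column="Location"):
    """Split rows into (provinces, districts) using one mask and its negation."""
    is_province = df[column].str.contains(PROVINCE_PATTERN, na=False)
    provinces = df[is_province].reset_index(drop=True)
    districts = df[~is_province].reset_index(drop=True)
    return provinces, districts
```

A side benefit of this design is that drop_duplicates(keep=False) would also discard any two district rows that happened to be identical, whereas the mask keeps every row in exactly one of the two outputs.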

Visualizing data of districts:

For cases:


Map for cases:

Map for recovered:

Map for deaths:

Get the Github link here.