Projects
The code below analyzes the illiteracy rate of Vietnam's population in 2020.
In my opinion this is my most interesting project, so I am featuring it here.
My other projects can be found here (link)
Visualizing the illiteracy rate among provinces in Vietnam
In this notebook, I visualize the literacy rate of each province in Vietnam. This is the complement of the illiteracy rate in the title: a literacy rate of 91.58% corresponds to an illiteracy rate of 8.42%. The sources for the data and for the GeoJSON file used to draw the map are listed in the reference section here.
The data comes as a CSV file from the General Statistics Office of Vietnam. It covers the population aged 15 and over, from 2006 to early 2021 (the merged table further down has no columns for 2007 or 2008).
Getting the GeoJSON file and the illiteracy dataframe to merge took me a really long time. The GeoJSON file is dominated by long lists of boundary coordinates, which makes the province names and IDs hard to find and compare.
All in all, I made it work. It is not the best visualization, but it was a good start for learning more about the process of data analysis and visualization.
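A minimal sketch of what `sep=";"` does when loading the CSV, assuming the file has a `name` column followed by one column per year (the layout suggested by the merged tables below; the two sample rows reuse values from those tables). pandas reads the header row as text, so year columns are labelled with strings such as `"2020"`, not the integer `2020`, which is why the plotting code uses `column='2020'`.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the assumed layout: semicolon-separated,
# a "name" column, then one column per year
sample_csv = """name;2020
Kiên Giang;91.58
Sơn La;80.67
"""

# sep=";" tells pandas to split fields on semicolons instead of commas
sample = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(sample.columns.tolist())  # header labels are read as strings
print(sample.dtypes)
```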
Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import seaborn as sns
import pandas as pd
import geopandas as gpd
Loading the data
# Load the literacy data (the CSV is semicolon-separated)
data_path = "illiteracy.csv"
data = pd.read_csv(data_path, sep=";")
# Load the GeoJSON file containing the Vietnam boundaries
vietnam_geojson_path_1 = "vietnam.geojson"
vietnam_map = gpd.read_file(vietnam_geojson_path_1)
vietnam_map
| | cartodb_id | id_1 | name | slug | geometry |
---|---|---|---|---|---|
0 | 17 | 33 | Kiên Giang | vietnam-kiengiang | MULTIPOLYGON (((105.40141 10.04024, 105.53898 ... |
1 | 62 | 49 | Quảng Ninh | vietnam-quangninh | MULTIPOLYGON (((106.53680 21.05216, 106.43977 ... |
2 | 47 | 11 | Bình Phước | vietnam-binhphuoc | MULTIPOLYGON (((106.75164 11.46867, 106.70483 ... |
3 | 44 | 12 | Bình Thuận | vietnam-binhthuan | MULTIPOLYGON (((107.50771 11.01104, 107.39390 ... |
4 | 26 | 18 | Cà Mau | vietnam-camau | MULTIPOLYGON (((105.26105 9.17828, 105.28011 9... |
... | ... | ... | ... | ... | ... |
58 | 39 | 56 | Thừa Thiên - Huế | vietnam-thuathienhue | MULTIPOLYGON (((107.57778 16.57250, 107.64472 ... |
59 | 46 | 57 | Thanh Hóa | vietnam-thanhhoa | MULTIPOLYGON (((105.17656 19.89632, 105.15601 ... |
60 | 55 | 52 | Sơn La | vietnam-sonla | MULTIPOLYGON (((104.64836 21.38327, 104.73264 ... |
61 | 41 | 47 | Quảng Nam | vietnam-quangnam | MULTIPOLYGON (((108.47603 15.67618, 108.59167 ... |
62 | 53 | 45 | Phú Yên | vietnam-phuyen | MULTIPOLYGON (((109.30447 13.12111, 109.41975 ... |
63 rows × 5 columns
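The coordinate lists in the `geometry` column are what make the GeoJSON hard to read; the province names sit separately in each feature's `properties`. A minimal sketch with the standard `json` module, assuming a standard FeatureCollection layout. The single feature below is made up (a triangle, not a real boundary); only its property names and values are copied from the table above.

```python
import json

# Made-up FeatureCollection with one feature; a real province file holds
# 63 features with many coordinate pairs each
geojson_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"cartodb_id": 17, "id_1": 33, "name": "Kiên Giang",
                   "slug": "vietnam-kiengiang"},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[105.0, 10.0], [105.5, 10.0],
                                  [105.2, 10.4], [105.0, 10.0]]]}}
 ]}
"""

collection = json.loads(geojson_text)

# Keep only the attributes and skip the coordinate-heavy "geometry" member
attributes = [feature["properties"] for feature in collection["features"]]
names = [props["name"] for props in attributes]
print(names)
```

With geopandas, `vietnam_map.drop(columns="geometry")` gives a similar attribute-only view of the loaded map.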
vietnam_map_2
| | Name | Note | geometry |
---|---|---|---|
0 | An Giang | NaN | MULTIPOLYGON (((105.18712 10.91317, 105.18719 ... |
1 | Ba Ria - Vung Tau | NaN | MULTIPOLYGON (((106.08110 8.57754, 106.08069 8... |
2 | Bac Giang | NaN | MULTIPOLYGON (((106.18304 21.60530, 106.18034 ... |
3 | Bac Kan | NaN | MULTIPOLYGON (((106.18298 22.38830, 106.18589 ... |
4 | Bac Lieu | NaN | MULTIPOLYGON (((105.37226 9.59691, 105.37281 9... |
... | ... | ... | ... |
60 | Yen Bai | NaN | MULTIPOLYGON (((104.84701 22.18620, 104.84696 ... |
61 | Da Nang | Hoang Sa (Da Nang city) | MULTIPOLYGON (((111.21578 15.77342, 111.20997 ... |
62 | Da Nang | NaN | MULTIPOLYGON (((107.91821 16.20767, 107.92228 ... |
63 | Khanh Hoa | Truong Sa (Khanh Hoa) | MULTIPOLYGON (((113.59087 6.95462, 113.58326 6... |
64 | Khanh Hoa | NaN | MULTIPOLYGON (((109.21996 11.77342, 109.21888 ... |
65 rows × 3 columns
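The first map has 63 rows, one per province, which is correct. The second map has 65 rows because Da Nang and Khanh Hoa each appear twice: once for the mainland and once for the Hoang Sa and Truong Sa archipelagos (see the Note column). Let's move on.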
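The extra rows in the second map can be found by listing names that occur more than once. A sketch with pandas on a small hypothetical excerpt that copies the `Name` and `Note` values shown above (geometry left out):

```python
import pandas as pd

# Excerpt of the Name/Note columns of vietnam_map_2 (geometry omitted)
excerpt = pd.DataFrame({
    "Name": ["Yen Bai", "Da Nang", "Da Nang", "Khanh Hoa", "Khanh Hoa"],
    "Note": [None, "Hoang Sa (Da Nang city)", None, "Truong Sa (Khanh Hoa)", None],
})

# keep=False marks every copy of a repeated name, not only the later ones
repeated = excerpt[excerpt["Name"].duplicated(keep=False)]
print(sorted(repeated["Name"].unique()))
```

On the real GeoDataFrame, `vietnam_map_2.dissolve(by="Name")` would combine each province's mainland and island rows into one geometry, leaving one row per province.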
data.name
0 Hà Nội
1 Vĩnh Phúc
2 Bắc Ninh
3 Quảng Ninh
4 Hải Dương
...
58 Cần Thơ
59 Hậu Giang
60 Sóc Trăng
61 Bạc Liêu
62 Cà Mau
Name: name, Length: 63, dtype: object
full_data.name
0 Hà Nội
1 Vĩnh Phúc
2 Bắc Ninh
3 Quảng Ninh
4 Hải Dương
...
58 Cần Thơ
59 Hậu Giang
60 Sóc Trăng
61 Bạc Liêu
62 Cà Mau
Name: name, Length: 63, dtype: object
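Equal row counts do not guarantee that names will match in a merge. The first GeoJSON spells provinces with diacritics (Kiên Giang), while the second drops them (An Giang, Ba Ria - Vung Tau). If two such sources ever need to be joined on name, a normalized key helps. A minimal standard-library sketch; `normalize_province` is a hypothetical helper, not part of any library. `Đ`/`đ` is a separate letter in Unicode and does not decompose, so it is replaced by hand:

```python
import unicodedata


def normalize_province(name: str) -> str:
    """Hypothetical helper: lowercase, accent-free key for a province name."""
    # Đ/đ have no Unicode decomposition, so map them to d explicitly
    name = name.replace("Đ", "D").replace("đ", "d")
    # NFD splits letters such as "ế" into "e" plus combining accent marks
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse repeated whitespace and lowercase
    return " ".join(stripped.split()).lower()


print(normalize_province("Thừa Thiên - Huế"))  # thua thien - hue
print(normalize_province("Đà Nẵng"))           # da nang
```

Normalizing both sides into a new key column and merging on that column avoids editing the original names.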
The dataframe from the CSV file also has the correct number of provinces. Before merging the two dataframes, I'll take a look at the plain map plot.
vietnam_map.plot()
plt.show()
print(vietnam_map.shape)
print(data.shape)
merged = vietnam_map.merge(data, on="name")  # inner join on the shared "name" column
merged
(63, 5)
(63, 2)
| | cartodb_id | id_1 | name | slug | geometry | 2020 |
---|---|---|---|---|---|---|
0 | 17 | 33 | Kiên Giang | vietnam-kiengiang | MULTIPOLYGON (((105.40141 10.04024, 105.53898 ... | 91.58 |
1 | 62 | 49 | Quảng Ninh | vietnam-quangninh | MULTIPOLYGON (((106.53680 21.05216, 106.43977 ... | 97.06 |
2 | 47 | 11 | Bình Phước | vietnam-binhphuoc | MULTIPOLYGON (((106.75164 11.46867, 106.70483 ... | 94.29 |
3 | 44 | 12 | Bình Thuận | vietnam-binhthuan | MULTIPOLYGON (((107.50771 11.01104, 107.39390 ... | 94.45 |
4 | 26 | 18 | Cà Mau | vietnam-camau | MULTIPOLYGON (((105.26105 9.17828, 105.28011 9... | 96.24 |
... | ... | ... | ... | ... | ... | ... |
58 | 39 | 56 | Thừa Thiên - Huế | vietnam-thuathienhue | MULTIPOLYGON (((107.57778 16.57250, 107.64472 ... | 93.09 |
59 | 46 | 57 | Thanh Hóa | vietnam-thanhhoa | MULTIPOLYGON (((105.17656 19.89632, 105.15601 ... | 96.92 |
60 | 55 | 52 | Sơn La | vietnam-sonla | MULTIPOLYGON (((104.64836 21.38327, 104.73264 ... | 80.67 |
61 | 41 | 47 | Quảng Nam | vietnam-quangnam | MULTIPOLYGON (((108.47603 15.67618, 108.59167 ... | 95.72 |
62 | 53 | 45 | Phú Yên | vietnam-phuyen | MULTIPOLYGON (((109.30447 13.12111, 109.41975 ... | 94.50 |
63 rows × 6 columns
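`merge` defaults to an inner join on the shared column, so a province spelled differently in the two tables would be dropped without any warning. Here both inputs and the result have 63 rows, which indicates every name matched. A sketch of making that check explicit with the `indicator` and `validate` options of `DataFrame.merge`, on made-up frames where one spelling deliberately differs:

```python
import pandas as pd

# Made-up inputs: the map side spells one province without its hyphen
map_side = pd.DataFrame({"name": ["Kiên Giang", "Sơn La", "Thừa Thiên Huế"]})
rate_side = pd.DataFrame({"name": ["Kiên Giang", "Sơn La", "Thừa Thiên - Huế"],
                          "2020": [91.58, 80.67, 93.09]})

# how="outer" keeps unmatched rows, indicator=True adds a "_merge" column
# telling which side each row came from, and validate raises if a name repeats
check = map_side.merge(rate_side, on="name", how="outer",
                       indicator=True, validate="one_to_one")

unmatched = check[check["_merge"] != "both"]
print(unmatched[["name", "_merge"]])
```

An empty `unmatched` frame confirms that the inner join lost nothing.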
merged_full = vietnam_map.merge(full_data, on="name")  # same inner join, all years
merged_full
| | cartodb_id | id_1 | name | slug | geometry | 2006 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | Early 2021 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17 | 33 | Kiên Giang | vietnam-kiengiang | MULTIPOLYGON (((105.40141 10.04024, 105.53898 ... | 89.9 | 91.3 | 92.5 | 92.6 | 92.3 | 91.7 | 91.9 | 91.3 | 90.7 | 91.4 | 90.9 | 93.4 | 91.58 | 93.75 |
1 | 62 | 49 | Quảng Ninh | vietnam-quangninh | MULTIPOLYGON (((106.53680 21.05216, 106.43977 ... | 96.3 | 95.5 | 96.9 | 96.9 | 95.5 | 95.5 | 97.2 | 96.8 | 97.4 | 97.1 | 96.8 | 97.0 | 97.06 | 97.15 |
2 | 47 | 11 | Bình Phước | vietnam-binhphuoc | MULTIPOLYGON (((106.75164 11.46867, 106.70483 ... | 94.3 | 91.8 | 94.1 | 94.9 | 92.8 | 93.3 | 92.7 | 93.4 | 95.3 | 92.9 | 91.1 | 93.8 | 94.29 | 93.44 |
3 | 44 | 12 | Bình Thuận | vietnam-binhthuan | MULTIPOLYGON (((107.50771 11.01104, 107.39390 ... | 93.8 | 92.0 | 92.8 | 93.9 | 93.3 | 93.9 | 93.0 | 93.3 | 93.7 | 94.0 | 93.2 | 94.6 | 94.45 | 95.09 |
4 | 26 | 18 | Cà Mau | vietnam-camau | MULTIPOLYGON (((105.26105 9.17828, 105.28011 9... | 96.1 | 95.5 | 95.9 | 95.7 | 96.0 | 95.5 | 96.0 | 95.6 | 95.3 | 95.8 | 96.4 | 96.6 | 96.24 | 96.07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
58 | 39 | 56 | Thừa Thiên - Huế | vietnam-thuathienhue | MULTIPOLYGON (((107.57778 16.57250, 107.64472 ... | 91.7 | 91.3 | 91.0 | 92.1 | 93.5 | 92.9 | 92.5 | 92.6 | 93.2 | 92.5 | 92.2 | 94.0 | 93.09 | 93.65 |
59 | 46 | 57 | Thanh Hóa | vietnam-thanhhoa | MULTIPOLYGON (((105.17656 19.89632, 105.15601 ... | 95.5 | 95.4 | 94.5 | 95.3 | 95.4 | 95.6 | 95.8 | 96.0 | 95.9 | 96.3 | 95.9 | 97.3 | 96.92 | 96.69 |
60 | 55 | 52 | Sơn La | vietnam-sonla | MULTIPOLYGON (((104.64836 21.38327, 104.73264 ... | 80.4 | 76.4 | 77.0 | 78.3 | 74.9 | 75.5 | 75.8 | 77.7 | 77.1 | 76.6 | 77.5 | 78.9 | 80.67 | 80.32 |
61 | 41 | 47 | Quảng Nam | vietnam-quangnam | MULTIPOLYGON (((108.47603 15.67618, 108.59167 ... | 93.9 | 94.6 | 92.1 | 93.7 | 95.1 | 94.9 | 94.5 | 94.8 | 95.3 | 95.7 | 95.0 | 96.2 | 95.72 | 95.84 |
62 | 53 | 45 | Phú Yên | vietnam-phuyen | MULTIPOLYGON (((109.30447 13.12111, 109.41975 ... | 95.1 | 94.1 | 94.9 | 94.1 | 95.2 | 95.2 | 93.5 | 93.2 | 93.7 | 93.8 | 93.2 | 94.7 | 94.50 | 95.04 |
63 rows × 19 columns
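`merged_full` is in wide format, with one column per year. For comparing years (a trend line per province, or one map per year), a long format with one row per province and period is often easier to work with. A minimal sketch with `DataFrame.melt`, on a two-province excerpt copied from the table above (geometry and ID columns left out, and only three of the year columns kept):

```python
import pandas as pd

# Excerpt of merged_full: the name column plus three period columns
wide = pd.DataFrame({
    "name": ["Kiên Giang", "Sơn La"],
    "2006": [89.9, 80.4],
    "2020": [91.58, 80.67],
    "Early 2021": [93.75, 80.32],
})

# One row per (province, period); the old column labels become values
long = wide.melt(id_vars="name", var_name="period", value_name="literacy_rate")
print(long)

# Change in literacy rate between 2006 and 2020 for each province
by_name = wide.set_index("name")
change = by_name["2020"] - by_name["2006"]
print(change.round(2))
```

The label column is called `period` rather than `year` because "Early 2021" is not a plain year.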
Actually plotting the map with annotations
A static map showing the 2020 statistics
# Choropleth of the 2020 literacy rate; keep the Axes that plot() returns
ax = merged.plot(column="2020", legend=True,
                 legend_kwds={"label": "Literacy rate (%)", "orientation": "vertical"},
                 cmap="ocean", figsize=(15, 15))
ax.set_title("Literacy Rate of 15+ Population in Vietnamese Provinces in 2020")
ax.set_xticks([])
ax.set_yticks([])
# Label each province at the centroid of its shape
for idx, row in merged.iterrows():
    ax.annotate(text=row["name"], xy=row["geometry"].centroid.coords[0],
                horizontalalignment="center", fontsize=5, color="black")
plt.show()
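The labels are placed at each province's `centroid`. For a concave shape (a province wrapping around a bay, or a MULTIPOLYGON spread over islands), the area centroid can fall outside the shape itself, so a label may land on a neighbouring province. A standard-library sketch of the shoelace centroid and a ray-casting containment test on a made-up U-shaped polygon shows the effect:

```python
def polygon_centroid(points):
    """Area centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Cx = sum / (6A) and A = area2 / 2, hence the factor 3 * area2
    return cx / (3 * area2), cy / (3 * area2)


def contains(points, x, y):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up concave polygon: a 3x3 square with a notch cut from the top middle
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
cx, cy = polygon_centroid(u_shape)
print((cx, round(cy, 3)), contains(u_shape, cx, cy))  # (1.5, 1.357) False
```

Shapely geometries also offer `representative_point()`, which returns a point guaranteed to lie inside the geometry, so `xy=row["geometry"].representative_point().coords[0]` is an alternative anchor for the annotations.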