Introduction
According to the majority of scientific evidence, climate change is directly related to the amount of CO2 in the atmosphere: the more pollution, the more CO2 is released, contaminating the air we breathe in addition to contributing to global warming and extreme weather. Chronic respiratory diseases are diseases of the airways and other structures of the lung. Some of the most common are asthma, chronic obstructive pulmonary disease (COPD), occupational lung diseases and pulmonary hypertension. Approximately fifteen million Americans have been diagnosed with COPD.
In this project, we compare CO2 levels with deaths attributed to pollution and to respiratory diseases. The data come from the UN data portal (http://data.un.org/Default.aspx) and cover pollution, CO2 levels and respiratory diseases across the nations of the world.
The data sets come as separate files, which were merged and munged using the appropriate packages and functions in R. For the years covered by all three data sets, the data are then queried, sorted, grouped and visualized for interpretation.
Data Mining and Graphing
Pollution and mortality data:
Chronic respiratory disease data:
Data sourcing and munging
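# Load the packages used below and set the working directory
library(reshape2)  # melt()
library(sqldf)     # SQL queries on data frames
library(dplyr)     # filter()
setwd('C:/SI/CUNY-MS/Fall_2016/608/Module 6_Project')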
# Load carbon dioxide (CO2) emissions data
CO2 <- read.csv("CO2.csv", header = TRUE,
sep = ",", check.names = FALSE)
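# Transform the data set from wide to long format so that the year columns (2000-2012) become a single Year column
mCO2 <- melt(CO2, id.vars = c("Reference_Area", "Reference_Area_Code"))
# Rename the newly created columns
names(mCO2)[3] <- "Year"
names(mCO2)[4] <- "CO2_Metric_Tons_Per_Capita"
# melt() returns the former column names as a factor; store Year as an integer so the joins on Year compare numbers
mCO2$Year <- as.integer(as.character(mCO2$Year))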
# loading Pollution data. Mortality rate attributed to household and ambient air pollution
pollution <- read.csv("Pollution.csv", header = TRUE,
sep = ",", check.names = FALSE)
# loading respiratory disease deaths data
respiratory <- read.csv("Resp_Disease.csv", header = TRUE,
sep = ",", check.names = FALSE)
# Merge the three data sets into one, keeping only the years 2000 and 2012
df1 <- sqldf("SELECT mCO2.Reference_Area, mCO2.Reference_Area_Code, mCO2.Year, mCO2.CO2_Metric_Tons_Per_Capita, pollution.Pollution_Mortality_Per_100K_Pop FROM mCO2 LEFT JOIN pollution ON mCO2.Reference_Area_Code = pollution.Reference_Area_Code AND mCO2.Year = pollution.Year WHERE mCO2.Year IN (2000, 2012)")
df2 <- sqldf("SELECT df1.Reference_Area, df1.Reference_Area_Code, df1.Year, df1.CO2_Metric_Tons_Per_Capita, df1.Pollution_Mortality_Per_100K_Pop, respiratory.Resp_Disease_Death_Rate FROM df1 LEFT JOIN respiratory ON df1.Reference_Area_Code = respiratory.Reference_Area_Code AND df1.Year=respiratory.Year")
# Let's select the top 10 economies of the world for our analysis (three-letter ISO codes, so the UK is 'GBR')
countries <- c('USA', 'CAN', 'CHN', 'JPN', 'GBR', 'DEU', 'IND', 'ITA', 'BRA', 'FRA')
df_top <- filter(df2, Reference_Area_Code %in% countries)
# The top 10 economies have no pollution data for 2000, so pick 10 nations at random that have data for both 2000 and 2012
countries_2 <- c('LBR','MLI','MOZ','NER','PNG','RWA','SLE','NGA','SUR','TUR')
df_select <- filter(df2, Reference_Area_Code %in% countries_2)
# Split the top 10 economies into 2000 and 2012 subsets
df_top_2000 <- subset(df_top, Year == 2000)
df_top_2012 <- subset(df_top, Year == 2012)
# Split the selected 10 nations into 2000 and 2012 subsets
df_select_2000 <- subset(df_select, Year == 2000)
df_select_2012 <- subset(df_select, Year == 2012)
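For readers without reshape2, sqldf or dplyr, the same munging steps can be sketched in base R. This is a swapped-in, hypothetical sketch rather than the author's code: it runs on small made-up tables (the codes AAA, BBB, CCC, ZZZ and every value are invented) that reuse the report's column names. It uses reshape() for the wide-to-long step, merge(all.x = TRUE) for the two LEFT JOINs, and split() for the per-year subsets, and it reports any country code that matches no rows.

```r
# Toy wide-format CO2 table mimicking the CO2.csv layout (hypothetical codes and values).
CO2_toy <- data.frame(
  Reference_Area      = c("Country A", "Country B", "Country C"),
  Reference_Area_Code = c("AAA", "BBB", "CCC"),
  `2000` = c(1.5, 0.2, 4.0),
  `2012` = c(1.8, 0.3, 3.9),
  check.names = FALSE
)
# Toy long-format pollution and respiratory tables (hypothetical values).
pollution_toy <- data.frame(
  Reference_Area_Code = c("AAA", "BBB", "CCC"),
  Year = c(2012L, 2012L, 2012L),
  Pollution_Mortality_Per_100K_Pop = c(40, 90, 25)
)
respiratory_toy <- data.frame(
  Reference_Area_Code = c("AAA", "BBB", "CCC", "CCC"),
  Year = c(2000L, 2000L, 2000L, 2012L),
  Resp_Disease_Death_Rate = c(30, 55, 20, 22)
)

# 1. Wide to long: one row per (country, year), with Year stored as an integer.
year_cols <- setdiff(names(CO2_toy), c("Reference_Area", "Reference_Area_Code"))
mCO2_toy <- reshape(
  CO2_toy,
  direction = "long",
  varying   = year_cols,
  v.names   = "CO2_Metric_Tons_Per_Capita",
  timevar   = "Year",
  times     = as.integer(year_cols),
  idvar     = "Reference_Area_Code"
)
rownames(mCO2_toy) <- NULL

# 2. Left joins on (Reference_Area_Code, Year): every CO2 row is kept, unmatched values become NA.
keys <- c("Reference_Area_Code", "Year")
df1_toy <- merge(subset(mCO2_toy, Year %in% c(2000L, 2012L)), pollution_toy,
                 by = keys, all.x = TRUE)
df2_toy <- merge(df1_toy, respiratory_toy, by = keys, all.x = TRUE)

# 3. Keep a list of countries and report any code that matched no rows (e.g. a mistyped code).
countries_toy <- c("AAA", "CCC", "ZZZ")
missing_codes <- setdiff(countries_toy, df2_toy$Reference_Area_Code)
if (length(missing_codes) > 0) {
  message("No rows for: ", paste(missing_codes, collapse = ", "))
}
df_sel_toy <- df2_toy[df2_toy$Reference_Area_Code %in% countries_toy, ]

# 4. One split() call instead of one subset() call per year.
by_year <- split(df_sel_toy, df_sel_toy$Year)
by_year[["2012"]]
```

The missing-code check is cheap insurance: a code that does not exist in the data silently filters to zero rows otherwise.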
1. GeoChart Views for Years 2000 and 2012
1.1 Top 10 economies with CO2 metric tons per capita and respiratory disease death rate info.
1.2 Selected 10 nations with CO2 metric tons per capita and respiratory disease death rate info.
1.3 Selected 10 nations with CO2 metric tons per capita and pollution mortality info.
2. Plot Views
2.1 Among the top 10 economies, China, Brazil and India were the only nations where CO2 levels went up.
Line plot for the selected 10 nations
2.2 CO2 levels remained almost the same for the underdeveloped nations; Turkey, a developing nation, stands out.
2.3 Deaths attributed to pollution came down for all of these nations over the decade; for Turkey the drop was drastic.
2.4 Deaths tied to respiratory diseases were at the same levels a decade later, except in Turkey, which saw an increase.
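The statements in this section compare each nation's 2000 value with its 2012 value. The comparison can also be made numeric by reshaping to one row per nation and taking the difference. This is a hypothetical sketch on invented data for three made-up nations (AAA, BBB, CCC); swapping the v.names column applies it to CO2 or respiratory death rates instead.

```r
# Hypothetical long-format data for three made-up nations (values invented for illustration).
long_toy <- data.frame(
  Reference_Area_Code = rep(c("AAA", "BBB", "CCC"), each = 2),
  Year = rep(c(2000L, 2012L), times = 3),
  Pollution_Mortality_Per_100K_Pop = c(120, 100, 80, 78, 150, 60)
)

# One row per nation, one column per year (names like "Pollution_Mortality_Per_100K_Pop.2000").
wide_toy <- reshape(
  long_toy,
  direction = "wide",
  idvar   = "Reference_Area_Code",
  timevar = "Year",
  v.names = "Pollution_Mortality_Per_100K_Pop"
)
rownames(wide_toy) <- NULL

before <- wide_toy[["Pollution_Mortality_Per_100K_Pop.2000"]]
after  <- wide_toy[["Pollution_Mortality_Per_100K_Pop.2012"]]

# Absolute and relative change over the decade; negative values mean fewer deaths.
wide_toy$Change     <- after - before
wide_toy$Pct_Change <- round(100 * wide_toy$Change / before, 1)

# Largest relative drop first.
wide_toy[order(wide_toy$Pct_Change), ]
```

Reporting the relative change alongside the absolute one helps when nations start from very different baselines.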
3. Bubble Chart Views
3.1 For the top 10 economies, India and China stand out, whereas levels remained almost the same over the decade for the other nations.
3.2 For the selected 10 nations, Turkey, a developing nation, shows progress in reducing the number of deaths attributed to pollution.
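The bubble charts in this section place each nation by two measures and encode a third as bubble size. A static version can be sketched with base R graphics. This is a hypothetical sketch on invented data for three made-up nations, and the mapping (x = CO2 per capita, y = respiratory death rate, bubble area = pollution mortality) is an assumption rather than the report's exact chart configuration.

```r
# Hypothetical values for three made-up nations (invented for illustration).
bubble_toy <- data.frame(
  Reference_Area_Code = c("AAA", "BBB", "CCC"),
  CO2_Metric_Tons_Per_Capita = c(0.2, 1.5, 4.0),
  Resp_Disease_Death_Rate = c(55, 30, 20),
  Pollution_Mortality_Per_100K_Pop = c(90, 40, 25)
)

# symbols() takes radii, so use a square root to make bubble *area* proportional to the value.
radius <- sqrt(bubble_toy$Pollution_Mortality_Per_100K_Pop / pi)

pdf(NULL)  # draw to a null device; use png("bubbles.png") instead to save an image
symbols(
  bubble_toy$CO2_Metric_Tons_Per_Capita,
  bubble_toy$Resp_Disease_Death_Rate,
  circles = radius,
  inches  = 0.4,  # radius of the largest bubble, in inches
  bg      = "lightblue",
  xlab    = "CO2 metric tons per capita",
  ylab    = "Respiratory disease death rate"
)
text(bubble_toy$CO2_Metric_Tons_Per_Capita, bubble_toy$Resp_Disease_Death_Rate,
     labels = bubble_toy$Reference_Area_Code)
invisible(dev.off())
```

Scaling by radius instead of area would make large values look disproportionately bigger, which is why the square root is taken.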
Summary/Conclusion
- As seen in the visualizations above, there is no clear direct correlation between CO2 levels and deaths tied to pollution and respiratory diseases.
- We did see some cases where developing nations showed increases in CO2 levels and corresponding increases in death rates. However, there were exceptions such as Turkey, where death rates attributed to pollution came down over the years.
- The data were not exhaustive: we did not have respiratory disease death rates for the years between 2000 and 2012.
- To keep the visualizations readable, the data sets were filtered to a few nations: the top economies and a random selection of 10 nations.
- A broader and more detailed set of visualizations could be produced as a future extension for a more thorough analysis and interpretation.
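The correlation conclusion above is drawn from visual inspection. One way to put a number on it is a rank correlation between CO2 per capita and a death rate for a single year. This is a suggested check, not part of the original analysis, shown on invented data for made-up nations. With only about 10 nations per group and missing years, such a test has little power, so a non-significant result would not prove the absence of a relationship, and a significant one would not establish causation.

```r
# Hypothetical cross-section: one row per made-up nation for a single year.
df_toy <- data.frame(
  Reference_Area_Code = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  CO2_Metric_Tons_Per_Capita = c(0.1, 0.3, 0.5, 2.0, 4.1, NA),
  Resp_Disease_Death_Rate    = c(40, 55, 35, 60, 45, 50)
)

# Spearman's rank correlation makes no linearity assumption; cor.test() drops
# incomplete pairs (here FFF, whose CO2 value is missing).
res <- cor.test(df_toy$CO2_Metric_Tons_Per_Capita,
                df_toy$Resp_Disease_Death_Rate,
                method = "spearman")

n_pairs <- sum(complete.cases(df_toy$CO2_Metric_Tons_Per_Capita,
                              df_toy$Resp_Disease_Death_Rate))
cat("pairs used:", n_pairs,
    " rho:", round(unname(res$estimate), 2),
    " p-value:", signif(res$p.value, 2), "\n")
```

Reporting the number of complete pairs next to rho makes it obvious how much of the sample survived the missing data.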