Introduction

This project compares CO2 levels with deaths attributed to pollution and respiratory diseases. The pollution and respiratory disease data come from the UN (http://data.un.org/Default.aspx) and the CO2 data from the World Bank, covering the nations of the world.
The data sets come as separate files, which were merged and munged using appropriate packages and functions in R. For the years in which the data sets overlap, the information is queried, sorted, grouped and visualized for interpretation.

Data Mining and Graphing

Pollution and mortality data:

Mortality rate attributed to household and ambient air pollution.

Source:
http://data.un.org/Data.aspx?q=pollution&d=SDGs&f=series%3aSH_STA_AIRP

CO2 emissions data:

CO2 emissions (metric tons per capita). Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.

Source:
http://databank.worldbank.org/data/reports.aspx?source=2&series=EN.ATM.CO2E.PC&country=

Chronic respiratory disease data:

Number of deaths attributed to chronic respiratory disease. Data are available for the years 2000 and 2012 only.

Source:
http://data.un.org/Data.aspx?q=chronic+respiratory+disease+&d=SDGs&f=series%3aSH_DTH_CRESPD

Data sourcing and munging

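# Packages assumed from the functions used below: melt (reshape2), sqldf, filter (dplyr), charts (googleVis)
library(reshape2)
library(sqldf)
library(dplyr)
library(googleVis)

# Set the working directory that holds the data files
setwd('C:/SI/CUNY-MS/Fall_2016/608/Module 6_Project')

# Load carbon dioxide emissions data
CO2 <- read.csv("CO2.csv", header = TRUE,
    sep = ",", check.names = FALSE)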


# Transform the data set from wide to long format so that the year columns (2000-2012) become a single Year column
mCO2 <- melt(CO2, id = c("Reference_Area", "Reference_Area_Code"))

# Rename the newly created columns 
names(mCO2)[3]<-"Year"
names(mCO2)[4]<-"CO2_Metric_Tons_Per_Capita"


# loading Pollution data. Mortality rate attributed to household and ambient air pollution

pollution <- read.csv("Pollution.csv", header = TRUE, 
    sep = ",", check.names = FALSE)


# loading respiratory disease deaths data
respiratory <- read.csv("Resp_Disease.csv", header = TRUE, 
    sep = ",", check.names = FALSE)


# Merge the three data sets into one and keep the years 2000 and 2012 only

df1 <- sqldf("SELECT mCO2.Reference_Area, mCO2.Reference_Area_Code, mCO2.Year, mCO2.CO2_Metric_Tons_Per_Capita, pollution.Pollution_Mortality_Per_100K_Pop FROM mCO2 LEFT JOIN pollution ON mCO2.Reference_Area_Code = pollution.Reference_Area_Code AND mCO2.Year = pollution.Year WHERE mCO2.Year IN (2000, 2012)")

df2 <- sqldf("SELECT df1.Reference_Area, df1.Reference_Area_Code, df1.Year, df1.CO2_Metric_Tons_Per_Capita, df1.Pollution_Mortality_Per_100K_Pop, respiratory.Resp_Disease_Death_Rate FROM df1 LEFT JOIN respiratory ON df1.Reference_Area_Code = respiratory.Reference_Area_Code AND df1.Year=respiratory.Year")

# Let's select the top 10 economies of the world for our analysis
# Codes are assumed to be ISO alpha-3 like the others, so the United Kingdom is 'GBR'
countries <- c('USA', 'CAN', 'CHN', 'JPN', 'GBR', 'DEU', 'IND', 'ITA', 'BRA', 'FRA')
df_top <- filter(df2, Reference_Area_Code %in% countries)

# The top 10 economies do not have pollution data for the year 2000, so pick a random 10 nations that have data for both 2000 and 2012
countries_2 <- c('LBR','MLI','MOZ','NER','PNG','RWA','SLE','NGA','SUR','TUR')
df_select <- filter(df2, Reference_Area_Code %in% countries_2)


# Filter data for Years 2000 & 2012 only
df_top_2000 <- subset(df_top, Year==2000)
df_top_2012 <- subset(df_top, Year==2012)

# Filter for Year 2000 & 2012
df_select_2000 <- subset(df_select, Year==2000)
df_select_2012 <- subset(df_select, Year==2012)
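Because the merge relies on LEFT JOINs and hand-typed lists of country codes, a quick check for codes that matched nothing and for rows left without mortality values can catch silent gaps. The sketch below is only an illustration: it uses a small made-up data frame (`join_toy`) in place of `df2`, a hypothetical selection list (`wanted`), and the column names used above.

```r
# Made-up stand-in for df2; the values are for illustration only.
join_toy <- data.frame(
  Reference_Area_Code = c("USA", "USA", "GBR", "GBR", "TUR", "TUR"),
  Year = c(2000, 2012, 2000, 2012, 2000, 2012),
  CO2_Metric_Tons_Per_Capita = c(20, 16, 9, 7, 3, 4),
  Pollution_Mortality_Per_100K_Pop = c(NA, 12, NA, 14, 60, 35),
  stringsAsFactors = FALSE
)

# Hypothetical selection list containing one code that is not in the data
wanted <- c("USA", "UK", "TUR")

# Codes in the selection list that never appear in the data
unmatched <- setdiff(wanted, unique(join_toy$Reference_Area_Code))
print(unmatched)

# Number of rows per year with no pollution mortality value after the join
missing_by_year <- tapply(
  is.na(join_toy$Pollution_Mortality_Per_100K_Pop),
  join_toy$Year,
  sum
)
print(missing_by_year)
```

Running the same two checks against `df2` with `countries` and `countries_2` would show whether every selected nation is present and which years lack values.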

1. GeoChart Views for Years 2000 and 2012

1.1 Top 10 economies: CO2 (metric tons per capita) and respiratory disease death rate.

Data: various • Chart ID: MergedID17f88ca43380 • googleVis-0.6.1
R version 3.3.1 (2016-06-21) • Google Terms of Use • Data Policy: See individual charts
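Side-by-side GeoChart views for two years can be built with `gvisGeoChart` and combined with `gvisMerge` from the googleVis package. The following is a minimal sketch and not necessarily how the charts in this report were generated: the data frames `geo_2000` and `geo_2012` hold made-up values, and the chart options are assumptions.

```r
# Made-up sample rows; the real analysis would use df_top_2000 and df_top_2012.
geo_2000 <- data.frame(
  Country = c("United States", "China", "India"),
  CO2 = c(20, 3, 1),
  stringsAsFactors = FALSE
)
geo_2012 <- data.frame(
  Country = c("United States", "China", "India"),
  CO2 = c(16, 7, 2),
  stringsAsFactors = FALSE
)

merged_chart <- NULL
# Only build the charts when googleVis is installed
if (requireNamespace("googleVis", quietly = TRUE)) {
  chart_2000 <- googleVis::gvisGeoChart(
    geo_2000, locationvar = "Country", colorvar = "CO2",
    options = list(width = 400, height = 300)
  )
  chart_2012 <- googleVis::gvisGeoChart(
    geo_2012, locationvar = "Country", colorvar = "CO2",
    options = list(width = 400, height = 300)
  )
  # Place the two maps next to each other in one output
  merged_chart <- googleVis::gvisMerge(chart_2000, chart_2012, horizontal = TRUE)
  # plot(merged_chart)  # would open the interactive chart in a browser
}
```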

1.2 Selected 10 nations: CO2 (metric tons per capita) and respiratory disease death rate.

1.3 Selected 10 nations: CO2 (metric tons per capita) and pollution mortality.

2. Plot Views

2.1 Among the top 10 economies, China, Brazil and India were the only nations where CO2 levels went up.

Line Plot for the select 10 economies

2.2 CO2 levels remained almost the same for the underdeveloped nations; Turkey, as a developing nation, stands out.

2.3 Deaths attributed to pollution came down for all these nations over the decade; for Turkey the change was drastic.

2.4 Deaths tied to respiratory diseases were at the same levels a decade later, although Turkey saw an increase.
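Statements about which nations rose or fell between the two years can be checked numerically by computing the 2012 minus 2000 difference per country. This is a minimal sketch in base R, assuming a long-format data frame with the column names used in the merged data; `co2_toy` and its values are made up.

```r
# Made-up long-format data: one row per country and year.
co2_toy <- data.frame(
  Reference_Area_Code = rep(c("CHN", "IND", "USA"), each = 2),
  Year = rep(c(2000, 2012), times = 3),
  CO2_Metric_Tons_Per_Capita = c(2.5, 7.0, 1.0, 1.5, 20.0, 16.0),
  stringsAsFactors = FALSE
)

# One row per country, with one CO2 column per year
wide <- reshape(co2_toy, idvar = "Reference_Area_Code", timevar = "Year",
                direction = "wide")

# Positive values mean per-capita emissions rose over the period
wide$CO2_Change <- wide$CO2_Metric_Tons_Per_Capita.2012 -
  wide$CO2_Metric_Tons_Per_Capita.2000

# Countries whose per-capita emissions increased
increased <- wide$Reference_Area_Code[wide$CO2_Change > 0]
print(wide[, c("Reference_Area_Code", "CO2_Change")])
```

The same difference can be computed for the mortality columns to back up the statements about pollution and respiratory disease deaths.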

3. Bubble Chart Views

3.1 For the top 10 economies, India and China stand out, whereas the levels remained almost the same over the decade for the other nations.

3.2 For the selected 10 nations, Turkey as a developing nation shows progress in reducing the number of deaths attributed to pollution.

4. Density Plot Views

4.1 The density curves for CO2 levels look the same for 2000 and 2012. However, the maximum per-capita value fell from 20 to 16.

4.2 The density curves for respiratory disease death rates look the same for 2000 and 2012. However, the maximum rate fell from around 1500 to around 1250.
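A comparison of distributions and maxima like the one described in this section can be reproduced with base R's `density` and `max`. The sketch below runs on randomly generated placeholder values (`dens_toy`), not on the report's data, and the plot styling is an assumption.

```r
set.seed(1)
# Placeholder per-capita values for two years; not real data.
dens_toy <- data.frame(
  Year = rep(c(2000, 2012), each = 50),
  CO2 = c(rexp(50, rate = 0.3), rexp(50, rate = 0.35))
)

# Split the values by year
values_by_year <- split(dens_toy$CO2, dens_toy$Year)

# Kernel density estimate for each year
dens_by_year <- lapply(values_by_year, density)

# Maximum observed value for each year
max_by_year <- sapply(values_by_year, max)
print(max_by_year)

# Overlay the two curves for a visual comparison
plot(dens_by_year[["2000"]], main = "CO2 per capita: 2000 vs 2012")
lines(dens_by_year[["2012"]], lty = 2)
legend("topright", legend = names(dens_by_year), lty = 1:2)
```

Since a maximum depends on a single observation, it is worth reading it together with the overall shape of the curve.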

Summary/Conclusion

- As seen in the visualizations above, there is no direct correlation between CO2 levels and deaths tied to pollution and respiratory diseases.
- We did see some indicators where developing nations showed an increase in CO2 levels and a corresponding increase in death rates. However, there were instances, such as Turkey, where the death rates attributed to pollution came down over the years.
- The data were not exhaustive across the board: there were no data on respiratory disease related death rates for the years between 2000 and 2012.
- To keep the visualizations readable, the data sets were filtered down to a few nations: the top economies and a random selection of 10 nations.
- For a thorough analysis and interpretation, a wider and more detailed visualization could be done as a future extension.
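The visual impression about correlation could be complemented with a numeric check. The sketch below computes Pearson and Spearman correlations with base R's `cor`; the data frame `corr_toy` holds made-up values and only mirrors the column names used in this report.

```r
# Made-up values; only the column names follow the merged data set.
corr_toy <- data.frame(
  CO2_Metric_Tons_Per_Capita = c(0.2, 0.1, 0.3, 4.1, 16.3, 9.2, 1.6),
  Resp_Disease_Death_Rate = c(310, 280, 150, 420, 95, 130, 900),
  Pollution_Mortality_Per_100K_Pop = c(90, NA, 80, 35, 12, 14, 130)
)

# Pairwise correlations; rows with NA in either column of a pair are skipped
pearson <- cor(corr_toy, use = "pairwise.complete.obs", method = "pearson")
spearman <- cor(corr_toy, use = "pairwise.complete.obs", method = "spearman")

print(round(pearson, 2))
print(round(spearman, 2))
```

With only a handful of nations per group, such coefficients are unstable, so they are best read as a rough cross-check of the charts.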