Introduction

This project compares CO2 emission levels with deaths attributed to pollution and to respiratory diseases. The data come from UN Data (http://data.un.org/Default.aspx) and, for the CO2 series, the World Bank DataBank, and cover pollution, CO2 levels and respiratory diseases across the nations of the world.
The data sets come in separate files, which were merged and munged using appropriate packages and functions in R. For the years in which the data sets overlap, the information is queried, sorted, grouped and visualized for interpretation.

Data Mining and Graphing

Pollution and mortality data:

Mortality rate attributed to household and ambient air pollution.

Source:
http://data.un.org/Data.aspx?q=pollution&d=SDGs&f=series%3aSH_STA_AIRP

CO2 emissions data:

CO2 emissions (metric tons per capita). Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.

Source:
http://databank.worldbank.org/data/reports.aspx?source=2&series=EN.ATM.CO2E.PC&country=

Chronic respiratory disease data:

Number of deaths attributed to chronic respiratory disease. Data are available for the years 2000 and 2012 only.

Source:
http://data.un.org/Data.aspx?q=chronic+respiratory+disease+&d=SDGs&f=series%3aSH_DTH_CRESPD

Data sourcing and munging

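# Load the packages used below (assumed: melt() from reshape2, sqldf() from sqldf, filter() from dplyr)
library(reshape2)
library(sqldf)
library(dplyr)

# Set the working directory to the folder holding the CSV files
setwd('C:/SI/CUNY-MS/Fall_2016/608/Module 6_Project')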

# loading carbon dioxide emissions data
CO2 <- read.csv("CO2.csv", header = TRUE, 
    sep = ",", check.names = FALSE)


# Transform the data set from wide to long format so that the year columns (2000-2012) become a single column
mCO2 <- melt(CO2, id = c("Reference_Area", "Reference_Area_Code"))

# Rename the newly created columns 
names(mCO2)[3]<-"Year"
names(mCO2)[4]<-"CO2_Metric_Tons_Per_Capita"


# loading Pollution data. Mortality rate attributed to household and ambient air pollution

pollution <- read.csv("Pollution.csv", header = TRUE, 
    sep = ",", check.names = FALSE)


# loading respiratory disease deaths data
respiratory <- read.csv("Resp_Disease.csv", header = TRUE, 
    sep = ",", check.names = FALSE)


# Merge the three data sets into one and keep only the years 2000 and 2012

df1 <- sqldf("
  SELECT mCO2.Reference_Area, mCO2.Reference_Area_Code, mCO2.Year,
         mCO2.CO2_Metric_Tons_Per_Capita,
         pollution.Pollution_Mortality_Per_100K_Pop
  FROM mCO2
  LEFT JOIN pollution
    ON mCO2.Reference_Area_Code = pollution.Reference_Area_Code
   AND mCO2.Year = pollution.Year
  WHERE mCO2.Year IN (2000, 2012)
")

df2 <- sqldf("
  SELECT df1.Reference_Area, df1.Reference_Area_Code, df1.Year,
         df1.CO2_Metric_Tons_Per_Capita,
         df1.Pollution_Mortality_Per_100K_Pop,
         respiratory.Resp_Disease_Death_Rate
  FROM df1
  LEFT JOIN respiratory
    ON df1.Reference_Area_Code = respiratory.Reference_Area_Code
   AND df1.Year = respiratory.Year
")

# Select the top 10 economies of the world for our analysis
# Assumes Reference_Area_Code holds ISO 3166-1 alpha-3 codes, so the United Kingdom is 'GBR' rather than 'UK'
countries <- c('USA', 'CAN', 'CHN', 'JPN', 'GBR', 'DEU', 'IND', 'ITA', 'BRA', 'FRA')
df_top <- filter(df2, Reference_Area_Code %in% countries)

# The top 10 economies have no pollution data for the year 2000, so pick 10 other nations that have data for both 2000 and 2012
countries_2 <- c('LBR','MLI','MOZ','NER','PNG','RWA','SLE','NGA','SUR','TUR')
df_select <- filter(df2, Reference_Area_Code %in% countries_2)


# Split the top 10 economies into one data frame per year (2000 and 2012)
df_top_2000 <- subset(df_top, Year==2000)
df_top_2012 <- subset(df_top, Year==2012)

# Split the 10 selected nations into one data frame per year (2000 and 2012)
df_select_2000 <- subset(df_select, Year==2000)
df_select_2012 <- subset(df_select, Year==2012)
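
The wide-to-long step above can be sanity-checked on a tiny table without any extra packages. The sketch below is a base-R stand-in for melt(), not the code used in this project; the two areas, their codes and all values are made up for illustration, and the year column is converted to an integer, which the original code does not do.

```r
# Toy wide table shaped like the CO2 file (hypothetical areas and values).
wide <- data.frame(
  Reference_Area      = c("Country A", "Country B"),
  Reference_Area_Code = c("AAA", "BBB"),
  `2000` = c(1.5, 4.0),
  `2012` = c(2.5, 3.0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

id_cols   <- c("Reference_Area", "Reference_Area_Code")
year_cols <- setdiff(names(wide), id_cols)

# Stack one block of rows per year column (base-R stand-in for melt()).
long <- do.call(rbind, lapply(year_cols, function(y) {
  data.frame(
    wide[id_cols],
    Year = as.integer(y),
    CO2_Metric_Tons_Per_Capita = wide[[y]],
    stringsAsFactors = FALSE
  )
}))
rownames(long) <- NULL

print(long)
```

Each area now appears once per year, which is the shape the later joins on Reference_Area_Code and Year rely on.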

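The LEFT JOIN used for the merge keeps every CO2 row and fills the pollution column with NA where no match exists, which is how gaps such as the missing 2000 pollution figures show up. A minimal sketch of the same join semantics with base R's merge(), assuming toy data (the codes and values below are hypothetical, and merge() is swapped in for sqldf here):

```r
# Toy long-format CO2 table (hypothetical codes and values).
co2 <- data.frame(
  Reference_Area_Code = c("AAA", "AAA", "BBB", "BBB"),
  Year = c(2000L, 2012L, 2000L, 2012L),
  CO2_Metric_Tons_Per_Capita = c(1.5, 2.5, 4.0, 3.0),
  stringsAsFactors = FALSE
)

# Toy pollution table: area BBB has no row for 2000.
poll <- data.frame(
  Reference_Area_Code = c("AAA", "AAA", "BBB"),
  Year = c(2000L, 2012L, 2012L),
  Pollution_Mortality_Per_100K_Pop = c(80, 60, 20),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every CO2 row, like LEFT JOIN in the SQL version.
joined <- merge(co2, poll, by = c("Reference_Area_Code", "Year"), all.x = TRUE)

# Rows that found no pollution match.
no_match <- is.na(joined$Pollution_Mortality_Per_100K_Pop)
missing_pollution <- joined[no_match, c("Reference_Area_Code", "Year")]

# Codes with pollution data in every year present, i.e. candidates for a
# two-year comparison.
has_data <- tapply(!no_match, joined$Reference_Area_Code, all)
complete_codes <- names(which(has_data))

print(missing_pollution)
print(complete_codes)
```

Listing the complete codes this way is one possible route to a set of nations with data for both years; the project itself picked its ten nations by hand.
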
1. GeoChart Views for Years 2000 and 2012

1.1 Top 10 economies: CO2 metric tons per capita and respiratory disease death rate

[Interactive GeoChart not reproduced. Chart generated with googleVis 0.6.1 on R version 3.3.1 (2016-06-21).]

1.2 Select 10 nations: CO2 metric tons per capita and respiratory disease death rate

[Interactive GeoChart not reproduced.]

1.3 Select 10 nations: CO2 metric tons per capita and pollution mortality

[Interactive GeoChart not reproduced.]
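
The chart code itself is not shown in this report. As a rough sketch of how side-by-side GeoCharts like those in this section could be built with googleVis, assuming one data frame per year with a country-name column: the data frame chart_data, its country rows and its values below are placeholders, not results from the data sets.

```r
# Placeholder data for a single year (names and values are illustrative only).
chart_data <- data.frame(
  Reference_Area = c("Canada", "Brazil", "India"),
  CO2_Metric_Tons_Per_Capita = c(1.0, 2.0, 3.0),
  Resp_Disease_Death_Rate = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Build the charts only if googleVis is installed.
if (requireNamespace("googleVis", quietly = TRUE)) {
  # One map coloured by CO2 per capita, one by respiratory death rate.
  co2_map <- googleVis::gvisGeoChart(
    chart_data,
    locationvar = "Reference_Area",
    colorvar    = "CO2_Metric_Tons_Per_Capita"
  )
  resp_map <- googleVis::gvisGeoChart(
    chart_data,
    locationvar = "Reference_Area",
    colorvar    = "Resp_Disease_Death_Rate"
  )

  # Place the two maps next to each other in a single chart object.
  merged <- googleVis::gvisMerge(co2_map, resp_map, horizontal = TRUE)

  # In an interactive session, plot(merged) opens the result in a browser.
}
```

Country names are used for locationvar here because GeoChart matches regions by name or two-letter code; whether the three-letter Reference_Area_Code values would resolve is not something this sketch assumes.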