Gasoline prices and incomes rise year over year in today’s economy as a result of inflation. Through visualizations, we examine the monthly and yearly trends in both quantities and how the two relate to each other. We anticipate that as the average yearly income of a region increases, the average yearly gasoline price of that region also increases: consumers with more money to spend on gasoline for their vehicles raise demand, and by supply and demand the price rises. With these visualizations, New York State residents who own a vehicle can judge which city would be the most economical to live in, or whether New York State would be the most economical region in the country for vehicle owners.

First, the CSV files were downloaded from https://data.ny.gov/Energy-Environment/Gasoline-Retail-Prices-Weekly-Average-by-Region-Be/nqur-w4p7 and https://data.ny.gov/Economic-Development/Quarterly-Census-of-Employment-and-Wages-Annual-Da/shc7-xcbw. Then, the CSV files were loaded into R. The zoo and lubridate packages were used to convert the date column in each dataset to year-month format or to year format.
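The date conversion can be sketched as follows. This is a minimal illustration rather than the code behind the report: the sample rows are made up, and the column name `Date` and the month/day/year text format are assumptions about the downloaded file.

```r
library(zoo)        # as.yearmon()
library(lubridate)  # mdy(), year()

# Hypothetical sample rows; the real file has one price column per region.
gas <- data.frame(
  Date   = c("01/07/2013", "01/14/2013", "02/04/2013"),
  Albany = c(3.71, 3.69, 3.85),
  stringsAsFactors = FALSE
)

gas$Date      <- mdy(gas$Date)         # parse month/day/year strings into Date
gas$YearMonth <- as.yearmon(gas$Date)  # year-month value, for monthly averages
gas$Year      <- year(gas$Date)        # numeric year, for yearly averages
```

The year-month column serves the monthly charts and the year column serves the yearly charts.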

Several dplyr functions were used to transform both datasets. The price columns of the gasoline prices dataset were converted to numeric format and then aggregated so that the prices could be averaged by date.
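A minimal sketch of this conversion and aggregation is shown below. The data frame and its values are made up for illustration, and `across()` (dplyr 1.0.0 or later) is one possible way to apply the same function to every region column; the report does not state which dplyr functions were actually used.

```r
library(dplyr)

# Hypothetical weekly prices, read in as text, with the year already extracted.
gas <- data.frame(
  Year   = c(2013, 2013, 2014, 2014),
  Albany = c("3.70", "3.90", "3.60", "3.40"),
  Utica  = c("3.80", "4.00", "3.70", "3.50"),
  stringsAsFactors = FALSE
)

gas_yearly <- gas %>%
  mutate(across(-Year, as.numeric)) %>%                    # text prices to numbers
  group_by(Year) %>%
  summarise(across(everything(), mean), .groups = "drop")  # one mean per region per year
```

Grouping by a year-month column instead of `Year` would give the monthly averages in the same way.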

The yearly salaries dataset, on the other hand, was filtered and subset so that only the total salary across all industries from 2007 onward was included for each region, leaving only the year, the annual average salary, and the area name to work with. The salary column was then mutated to remove the dollar sign from each value, and the rows were arranged by year and by region.
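The steps above can be sketched as follows. The column names (`Area`, `NAICS.Title`, `Year`, `Annual.Average.Salary`), the "Total, All Industries" label, and all values are assumptions made for illustration and may not match the downloaded file exactly.

```r
library(dplyr)

# Hypothetical rows in the shape assumed for the wage dataset.
wages <- data.frame(
  Area        = c("Erie County", "Erie County", "Erie County", "Albany County"),
  NAICS.Title = c("Total, All Industries", "Total, All Industries",
                  "Construction", "Total, All Industries"),
  Year        = c(2006, 2008, 2008, 2007),
  Annual.Average.Salary = c("$36,500", "$38,900", "$45,100", "$44,200"),
  stringsAsFactors = FALSE
)

salaries <- wages %>%
  filter(Year >= 2007, NAICS.Title == "Total, All Industries") %>%  # all-industry totals, 2007 onward
  select(Year, Area, Annual.Average.Salary) %>%                     # keep only the three needed columns
  mutate(Annual.Average.Salary =
           as.numeric(gsub("[$,]", "", Annual.Average.Salary))) %>% # strip "$" and thousands separators
  arrange(Year, Area)
```

The pattern `"[$,]"` also removes thousands separators, which would otherwise make `as.numeric()` return `NA`.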

For the salaries dataset, unlike the gasoline prices dataset, the reshape2 package was also used, first to convert the dataset to a molten data frame and then to split the mean salaries column into one column per region. The average salary for each city of interest was then taken to be the average salary of the county in which the city is located; for example, the salary used for Buffalo was that of Erie County, because Buffalo is in Erie County. This substitution was made because some cities are in the same vicinity.
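A minimal sketch of the melt-and-cast step and the county-to-city substitution follows. The data frame, its column names, and its values are hypothetical; only the Erie County to Buffalo pairing comes from the text above.

```r
library(reshape2)

# Hypothetical cleaned salaries, one row per year and county.
salaries <- data.frame(
  Year = c(2007, 2007, 2008, 2008),
  Area = c("Albany County", "Erie County", "Albany County", "Erie County"),
  Annual.Average.Salary = c(44200, 37800, 45600, 38900)
)

# Molten form: one row per (Year, Area, variable, value).
molten <- melt(salaries, id.vars = c("Year", "Area"))

# Cast back to wide form with one salary column per area.
wide <- dcast(molten, Year ~ Area, value.var = "value")

# Use the county's salary as the salary of the city located in it.
names(wide)[names(wide) == "Erie County"] <- "Buffalo"
```

After the rename, the salary columns can be matched by name to the gasoline price regions.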

Then, the tidyr package was used on both datasets to gather all of the values into three columns: Year/Date, Region, and Average. Finally, the ggplot2 package was used on both datasets to render bar charts of average salary and average gasoline price, faceted by region.
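The gathering and faceting can be sketched as below, using a made-up two-region table. `gather()` is used because the text names it; newer tidyr versions recommend `pivot_longer()` for the same job. The axis labels are illustrative choices.

```r
library(tidyr)
library(ggplot2)

# Hypothetical yearly averages in wide form: one column per region.
wide <- data.frame(
  Year   = c(2013, 2014),
  Albany = c(3.8, 3.5),
  Utica  = c(3.9, 3.6)
)

# Long form with three columns: Year, Region, Average.
long <- gather(wide, key = "Region", value = "Average", -Year)

# One bar chart panel per region.
p <- ggplot(long, aes(x = factor(Year), y = Average)) +
  geom_col() +
  facet_wrap(~ Region) +
  labs(x = "Year", y = "Average gasoline price ($/gal)")
```

Printing `p` draws the chart; the same pattern applies to the salary table with a different y-axis label.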

Additionally, the ggpmisc package was used to render scatterplots relating gasoline prices to income. First, the column containing the average annual salaries was bound to the dataset containing the gasoline prices averaged by year and region. Then, scatterplots of the two variables were rendered, faceted by region and annotated with the corresponding fitted equations.
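A sketch of the binding and plotting steps is given below. All values are made up, `cbind()` stands in for whichever binding function was actually used, and `stat_poly_eq()` from ggpmisc is assumed to be the layer that adds the equation to each panel.

```r
library(ggplot2)
library(ggpmisc)  # stat_poly_eq()

# Hypothetical yearly gasoline averages in long form.
gas_yearly <- data.frame(
  Year    = rep(2013:2016, times = 2),
  Region  = rep(c("Albany", "Utica"), each = 4),
  Average = c(3.8, 3.5, 2.6, 2.3, 3.9, 3.6, 2.7, 2.4)
)

# Hypothetical salaries, sorted in the same Region/Year order as gas_yearly.
salary <- c(47000, 48100, 49500, 50200, 37500, 38200, 39000, 39600)

# Column binding matches rows by position, so both inputs must share one ordering.
combined <- cbind(gas_yearly, Salary = salary)

p <- ggplot(combined, aes(x = Salary, y = Average)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # line of best fit
  stat_poly_eq(aes(label = after_stat(eq.label)),
               formula = y ~ x, parse = TRUE) +              # fitted equation per panel
  facet_wrap(~ Region, scales = "free_x")
```

Because binding by position silently misaligns rows if the two tables are sorted differently, joining on year and region would be a safer alternative.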

In the following bar charts depicting the monthly trend in average gasoline prices, the trend appears the same for all eight regions of New York State: Albany, Binghamton, Buffalo, Nassau, New York City, Rochester, Syracuse, and Utica.

In the following bar charts depicting the yearly trend in average gasoline prices, the trend again appears the same for all eight regions of New York State: Albany, Binghamton, Buffalo, Nassau, New York City, Rochester, Syracuse, and Utica.

In the following bar charts, the yearly change in average salary is depicted for each region. Generally, the change in average salary per year is minimal: salaries do increase, but at a slow and constant rate.

In the following correlation plots, all of the correlation coefficients are close to zero. Syracuse shows the lowest correlation because its line of best fit is essentially horizontal. New York City has the steepest line of best fit, meaning that its average gasoline price changes the most per unit of average annual income. A steeper slope alone does not imply a higher correlation coefficient, however, because the coefficient also depends on how tightly the points cluster around the line.
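One caveat when reading these plots is that the slope of the line of best fit and the correlation coefficient measure different things: for a simple linear fit, the slope equals the correlation coefficient multiplied by the ratio of the standard deviations of the two variables. The sketch below illustrates this with made-up numbers that are not taken from the datasets.

```r
# Hypothetical incomes (in thousands of dollars) and two hypothetical price series.
income_k    <- c(40, 42, 44, 46, 48)
price_flat  <- 3 + 0.01 * (income_k - 40)  # shallow slope, points exactly on a line
price_steep <- c(3.0, 3.6, 2.9, 3.7, 3.8)  # steeper slope, points scattered

slope_flat  <- unname(coef(lm(price_flat ~ income_k))[2])
slope_steep <- unname(coef(lm(price_steep ~ income_k))[2])

r_flat  <- cor(income_k, price_flat)
r_steep <- cor(income_k, price_steep)

# The steeper line has the weaker correlation in this example.
slope_steep > slope_flat  # TRUE
r_steep < r_flat          # TRUE
```

Comparing regions by correlation coefficient and by slope can therefore rank them differently, so the two quantities are best reported separately.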

Therefore, for any New York State resident who wants to own a vehicle, the best region to live in would be Syracuse, where gasoline prices are least tied to income.

However, for residents living in New York City, a vehicle is not necessary because frequent public transportation is available.