Gapminder Analysis

gapminder country comparison

Aim: to analyse the data analyst’s favourite data set, gapminder. This is my first serious piece of R coding.

The famous gapminder dataset has data on life expectancy, population, and GDP per capita for 142 countries from 1952 to 2007.

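library(tidyverse)  # dplyr (filter, %>%, glimpse) and ggplot2
library(gapminder)  # the gapminder data set

glimpse(gapminder)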
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
head(gapminder, 20) # look at the first 20 rows of the dataframe
## # A tibble: 20 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## 11 Afghanistan Asia       2002    42.1 25268405      727.
## 12 Afghanistan Asia       2007    43.8 31889923      975.
## 13 Albania     Europe     1952    55.2  1282697     1601.
## 14 Albania     Europe     1957    59.3  1476505     1942.
## 15 Albania     Europe     1962    64.8  1728137     2313.
## 16 Albania     Europe     1967    66.2  1984060     2760.
## 17 Albania     Europe     1972    67.7  2263554     3313.
## 18 Albania     Europe     1977    68.9  2509048     3533.
## 19 Albania     Europe     1982    70.4  2780097     3631.
## 20 Albania     Europe     1987    72    3075321     3739.

How has life expectancy changed over the years for the country and the continent I come from?

country_data <- gapminder %>% 
            filter(country == "Turkey") 

head(country_data)
## # A tibble: 6 × 6
##   country continent  year lifeExp      pop gdpPercap
##   <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Turkey  Europe     1952    43.6 22235677     1969.
## 2 Turkey  Europe     1957    48.1 25670939     2219.
## 3 Turkey  Europe     1962    52.1 29788695     2323.
## 4 Turkey  Europe     1967    54.3 33411317     2826.
## 5 Turkey  Europe     1972    57.0 37492953     3451.
## 6 Turkey  Europe     1977    59.5 42404033     4269.
continent_data <- gapminder %>% 
            filter(continent == "Europe") # Turkey is listed under Europe in this data set

head(continent_data)
## # A tibble: 6 × 6
##   country continent  year lifeExp     pop gdpPercap
##   <fct>   <fct>     <int>   <dbl>   <int>     <dbl>
## 1 Albania Europe     1952    55.2 1282697     1601.
## 2 Albania Europe     1957    59.3 1476505     1942.
## 3 Albania Europe     1962    64.8 1728137     2313.
## 4 Albania Europe     1967    66.2 1984060     2760.
## 5 Albania Europe     1972    67.7 2263554     3313.
## 6 Albania Europe     1977    68.9 2509048     3533.
plot1 <- ggplot(data = country_data , mapping = aes(x = year, y = lifeExp))+
   geom_point() +
   geom_smooth(se = FALSE)+
   NULL 

plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

 plot1 <- plot1 +
   labs(title = "Life Expectancy Over Time in Turkey",
        x = "Year",
        y = "Life Expectancy") +
   NULL


 plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Producing a plot for all countries in the continent I come from.

 ggplot(continent_data, mapping = aes(x = year , y = lifeExp , colour = country, group = country)) +
   geom_point() + 
   geom_smooth(se = FALSE) +
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Producing a life expectancy over time graph, grouped by continent.

 ggplot(gapminder , mapping = aes(x = year , y = lifeExp, colour = continent))+
   geom_point() +
   geom_smooth(se = FALSE) +
   facet_wrap(~continent) +
   theme(legend.position="none") + #remove all legends
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Conclusions

A positive life expectancy trend is evident in all continents

First of all, generally speaking, life expectancy has shown a positive trend over the years for every continent. This trend can be tied to worldwide developments in medicine, biomedical engineering and technology.

Africa’s stagnation after the 1990s

Africa’s graph is especially interesting: life expectancy starts at an average of about 40 years, rises to about 50, and then stagnates after 1990. Further research could establish what happened around 1990, or how resources were negatively affected. As far as I know, and after only a little confirmatory research, Europe and the Americas had been contributing to Africa’s well-being through benefit concerts and aid packages before the 1990s, and stopped doing so afterwards. This might be one of the reasons and should be investigated further.
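
The stagnation can also be checked numerically instead of being read off the plot. The following is a minimal sketch, assuming the dplyr and gapminder packages are installed; `africa_by_year` is a name introduced here for illustration, and the mean is a simple unweighted average across countries (it is not population-weighted).

```r
library(dplyr)
library(gapminder)

# Mean life expectancy across African countries for each survey year,
# plus the change relative to the previous survey (five years earlier).
africa_by_year <- gapminder %>%
  filter(continent == "Africa") %>%
  group_by(year) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  mutate(change = mean_lifeExp - lag(mean_lifeExp))  # NA for the first year

print(africa_by_year, n = Inf)
```

If the trend really flattens after 1990, the `change` column should drop towards zero (or go negative) for the 1992 to 2002 rows.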

Increase in longevity from 1950s to 2007

Oceania is the continent with the highest longevity. It starts off well, with an average of about 70 years, and ends at about 80. Since there are not many countries in Oceania, its graph is fairly uniform and does not contain many data points. Europe follows Oceania in longevity, starting at about 63 and rising to about 77 years. The Americas rank third, followed by Asia and Africa. The difference in longevity between continents might be related to the overall GDP of each continent, as well as to advances in medicine and technology.
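
The suggested link between GDP and longevity can be given a rough first check. This is only a sketch under stated assumptions: it looks at a single year (2007), uses the median GDP per capita of the countries in each continent rather than a true continent-wide GDP, and the names `gdp_2007` and `cor_2007` are introduced here for illustration.

```r
library(dplyr)
library(gapminder)

# Per-continent summary for 2007: typical GDP per capita and mean life expectancy.
gdp_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(median_gdpPercap = median(gdpPercap),
            mean_lifeExp = mean(lifeExp)) %>%
  arrange(desc(mean_lifeExp))

gdp_2007

# Country-level correlation between log GDP per capita and life expectancy in 2007.
cor_2007 <- with(filter(gapminder, year == 2007),
                 cor(log10(gdpPercap), lifeExp))
cor_2007
```

A correlation on its own does not show that GDP causes the difference, so this supports the hypothesis without settling it.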

The approximate increase in average life expectancy in each continent from 1952 to 2007 is as follows:

  • Africa: 40 to 54 -> increase by ~14
  • Americas: 54 to 74 -> increase by ~20
  • Asia: 47 to 70 -> increase by ~23
  • Europe: 63 to 77 -> increase by ~14
  • Oceania: 70 to 80 -> increase by ~10

Roughly speaking, Asia and the Americas are the continents that showed the largest increase in longevity from the 1950s to 2007. This makes sense, since the world has been witnessing a technology and production boom from the Americas and Asia for a very long time.
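
The figures above were estimated by eye from the plots. They could be computed directly as well. Here is a minimal sketch, assuming the dplyr and gapminder packages; `continent_change` and its column names are introduced here for illustration, and each value is an unweighted mean over the countries in a continent.

```r
library(dplyr)
library(gapminder)

# Mean life expectancy per continent in the first (1952) and last (2007)
# survey years, and the difference between the two.
continent_change <- gapminder %>%
  group_by(continent) %>%
  summarise(start_1952 = mean(lifeExp[year == 1952]),
            end_2007 = mean(lifeExp[year == 2007])) %>%
  mutate(increase = end_2007 - start_1952) %>%
  arrange(desc(increase))

continent_change
```

Sorting by `increase` puts the continents with the largest gains at the top, which makes the ranking discussed above easy to verify.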

About the variations in each continent

Europe’s data is what caught my eye first. There is not much variance among the European countries, but there seems to be one outlier in the graph and, not surprisingly, it is my country, Turkey. Whether Turkey belongs to Europe or Asia has always been debated. Since this data set is relatively old, Turkey is recorded under Europe, as was the case during my childhood, before all the political unrest that started with the new government (in other words, geopolitics). Since Turkey’s resources and GDP are not as high as those of the other European countries, its data points stick out. If data collection had continued after the 2010s, Turkey would probably have been moved to Asia, where it might fit in well with the high variance in longevity among Asian countries. Or we should all give up, create a Eurasia level in the continent factor, and put Turkey and Russia there. :-)
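
Whether Turkey really is the country that sticks out can be checked against the rest of Europe year by year. This is a minimal sketch, assuming the dplyr and gapminder packages; `turkey_vs_europe` and its column names are introduced here for illustration.

```r
library(dplyr)
library(gapminder)

# For each survey year, compare Turkey with the European mean and minimum,
# and record which European country has the lowest life expectancy.
turkey_vs_europe <- gapminder %>%
  filter(continent == "Europe") %>%
  group_by(year) %>%
  summarise(europe_mean = mean(lifeExp),
            europe_min = min(lifeExp),
            turkey = lifeExp[country == "Turkey"],
            lowest_country = as.character(country[which.min(lifeExp)]))

print(turkey_vs_europe, n = Inf)
```

If `lowest_country` is Turkey in most rows, that would support reading it as the outlier in the European panel.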

Asia shows the highest variance among all the continents, since GDP is not as uniform across Asian countries as it is across European countries. The Americas and Africa also show high variance, since they contain many different countries with varying GDPs and levels of technological advancement.
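
The comparison of variance between continents is based on how the plots look. One way to quantify it is sketched below for a single year. It assumes the dplyr and gapminder packages; the choice of 2007 and the name `spread_2007` are assumptions made here for illustration, and other years may rank the continents differently.

```r
library(dplyr)
library(gapminder)

# Spread of life expectancy across the countries of each continent in 2007.
spread_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(n_countries = n(),
            sd_lifeExp = sd(lifeExp),
            range_lifeExp = max(lifeExp) - min(lifeExp)) %>%
  arrange(desc(sd_lifeExp))

spread_2007
```

The `n_countries` column is worth reading alongside the spread, since a continent with very few countries (such as Oceania) will naturally look uniform.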