Making use of vintage data

Data vintages are necessary when assessing the real-time performance of data analytics. Whether the aim is to back-test a model or to judge whether a historical decision was the correct one, having the information that was available at the time is essential.

EconData has been designed to make this kind of analysis possible. In this tutorial we cover how to access the various vintages of GDP and how to construct a real-time time series of GDP growth.

As usual, we start by loading the necessary packages. If this is your first time using the EconData R package, you can find the installation instructions here.
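The package-loading code is not reproduced here, so the following is a minimal sketch of what the rest of the post appears to need. The package name econdatar is an assumption based on the functions used later (read_release, read_econdata); the tidyverse packages are inferred from the calls to tibble, mutate, pivot_wider and write_csv.

```r
# Assumed name of the EconData R package (provides read_release, read_econdata)
library(econdatar)

# Tidyverse packages used by the code in this post
library(tibble)   # tibble()
library(dplyr)    # mutate(), arrange(), %>%
library(tidyr)    # pivot_wider()
library(readr)    # write_csv()
```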


Next, we create a helper function that converts the output of the EconData R package into a tibble with one row per quarter.

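format_gdp <- function(x) {
  # Label a date with its calendar quarter, e.g. "2020Q1"
  as.yearqtr <- function(y)
    paste0(format(y, format = "%Y"), quarters(y))
  tibble(Value = x$KBP6006.R.S$OBS_VALUE) %>%
    mutate(Date = x$KBP6006.R.S %>%
           rownames() %>%
           as.Date() %>%
           as.yearqtr()) %>%
    # `release` is not an argument: it is looked up in the global
    # environment, where the download loop sets it on each iteration
    mutate(Vintage = release$Description)
}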
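To see what the helper produces without calling EconData, here is a small sketch on mock data. The shape of mock_result (a list with one data frame per series key, dates as row names, values in OBS_VALUE) is inferred from how the helper indexes its input; the values, the to_quarter name and the vintage label are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for one read_econdata() result: a list holding a
# data frame for the series key, with observation dates as row names.
mock_result <- list(
  KBP6006.R.S = data.frame(
    OBS_VALUE = c(100, 101, 103),
    row.names = c("2020-01-01", "2020-04-01", "2020-07-01")
  )
)

# Same date conversion as in the helper: year followed by calendar quarter
to_quarter <- function(d) paste0(format(d, "%Y"), quarters(d))

mock_tibble <- tibble(Value = mock_result$KBP6006.R.S$OBS_VALUE) %>%
  mutate(Date = to_quarter(as.Date(rownames(mock_result$KBP6006.R.S))),
         Vintage = "Hypothetical release")  # made-up vintage label

print(mock_tibble)
```

Each row carries the value, the quarter label and the vintage it came from, which is the long format the later pivot step relies on.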

We then retrieve all the vintages (releases) of GDP from EconData. As usual, more information on the read_release function can be found by typing ?read_release in the R console.

releases <- read_release(id = "NATL_ACC",
                         agencyid = "ECONDATA",
                         version = "all",
                         provideragencyid = "ECONDATA",
                         providerid = "STATSSA")

This call returns a nested list. The parent list contains one entry for each version of the StatsSA data: there is a version for each base year of GDP. Each version contains a child list detailing the releases for that version. We loop over both lists to extract all the releases. On each iteration we request the data for that release from EconData and append the result to a tibble using the helper function defined earlier.
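The nested structure can be pictured with a small mock list. The field names (DataSet, Flowref, Releases, Description) are taken from the loop in this post; the version numbers and release descriptions are hypothetical, and treating the third Flowref element as the version is inferred from how the loop uses it.

```r
# Hypothetical stand-in for the output of read_release(): one entry per
# version (base year), each with its own list of releases.
mock_releases <- list(
  list(DataSet = list(Flowref = list("ECONDATA", "NATL_ACC", "1.0.0")),
       Releases = list(list(Description = "Hypothetical release A"),
                       list(Description = "Hypothetical release B"))),
  list(DataSet = list(Flowref = list("ECONDATA", "NATL_ACC", "2.0.0")),
       Releases = list(list(Description = "Hypothetical release C")))
)

# Walk the outer list (versions) and the inner list (releases)
found <- character(0)
for (dataset in mock_releases) {
  version <- dataset$DataSet$Flowref[[3]]
  for (release in dataset$Releases) {
    found <- c(found, paste(version, release$Description, sep = ": "))
  }
}

print(found)
```

Every combination of version and release description visited here corresponds to one request to EconData in the real loop.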

# Start from an empty tibble and append one release at a time
gdp_vintages <- tibble()
for (dataset in releases) {
  for (release in dataset$Releases) {
    # Request the GDP series as it stood at this release
    natl_acc <- read_econdata(agencyid = "ECONDATA",
                              id = "NATL_ACC",
                              version = dataset$DataSet$Flowref[[3]],
                              key = "KBP6006.R.S",
                              provideragencyid = "ECONDATA",
                              providerid = "STATSSA",
                              releasedescription = release$Description)
    gdp_vintages <- bind_rows(gdp_vintages, format_gdp(natl_acc))
  }
}

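Finally, we pivot the data so that each vintage becomes a column, with one row per quarter, and save the result as a CSV file.

# Date identifies the rows; id_cols must not include the Vintage or Value columns
write_csv(gdp_vintages %>%
          arrange(Vintage, Date) %>%
          pivot_wider(id_cols = Date,
                      names_from = Vintage,
                      values_from = Value),
          "Realtime GDP.csv")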
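From the long-format vintages, one way to build a real-time growth series is to take, for each vintage, the growth rate of the latest quarter available in that vintage. The sketch below does this on mock data; the values and vintage names are hypothetical, and simple quarter-on-quarter growth is an assumption rather than the definition used by the authors.

```r
library(tibble)
library(dplyr)

# Hypothetical vintages in the same long format as gdp_vintages
mock_vintages <- tribble(
  ~Vintage,    ~Date,    ~Value,
  "Release 1", "2020Q1", 100,
  "Release 1", "2020Q2", 102,
  "Release 2", "2020Q1", 100,
  "Release 2", "2020Q2", 101,
  "Release 2", "2020Q3", 104.03
)

realtime_growth <- mock_vintages %>%
  arrange(Vintage, Date) %>%
  group_by(Vintage) %>%
  # Quarter-on-quarter growth in percent, computed within each vintage
  mutate(Growth = 100 * (Value / dplyr::lag(Value) - 1)) %>%
  # Keep only the most recent quarter known at the time of each release
  slice_tail(n = 1) %>%
  ungroup() %>%
  select(Vintage, Date, Growth)

print(realtime_growth)
```

Note that the mock vintage names happen to sort chronologically; with real release descriptions the vintages may need to be ordered by release date instead.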

As usual, all the code in this blog post can be found on GitHub, along with the code for the previous posts. We hope that you found this slightly more advanced tutorial useful and that you get value from EconData's vintage data.

The Codera Analytics team
