Importing EEA AQ Data with {euroaq} • euroaq

Downloading Data

Before you download any data, it may be useful to access some key metadata. The following functions are available to do so:

import_eea_stations() accesses data flow D (station metadata).
import_eea_pollutants() accesses pollutant names and identifiers.
import_eea_countries() accesses country names and identifiers.
import_eea_cities() accesses city names in a given country or countries.

Each of these functions will return R data frames, making them convenient for further analysis and filtering.

To actually download data, you may use import_eea_monitoring(). This function has four arguments:

countries, requiring a vector of code(s) from import_eea_countries().
cities, requiring a vector of code(s) from import_eea_cities().
pollutants, requiring a vector of code(s) from import_eea_pollutants().
dataset, requiring a value of either 1 (unratified, up-to-date data from data flow E2a), 2 (ratified data from data flow E1a) or 3 (Historical Airbase data).

It is recommended that users provide at least one country, as leaving this as NULL is likely to import more data than a typical R session can handle.

import_eea_monitoring() is quite an opinionated function; it uses a specific method to acquire data, renames columns to convenient English names, and binds on commonly useful metadata columns from import_eea_stations(). If a user wants more flexibility, they may wish to access the EEA’s API directly.

Data Access via API Endpoints

For more flexible data access, including asynchronous data access, there are three options, mirroring the three options outlined in https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf. Each of these has options to control where data is imported from (countries & cities), specific pollutants to import, and the datasets of origin. Two of the three also allow for users to specify a start and end date, as well as an aggregation type.

download_eea_parquet_files() will download a zip folder of parquet files to a user-specified file, returning the file path.

download_eea_parquet_files(file = "mydata.zip")

zip::unzip(zipfile = "mydata.zip", exdir = "parquets")

paths <- dir("parquets", recursive = TRUE, full.names = TRUE)

tables <- purrr::map(.x = paths, .f = arrow::read_parquet)

aqdata <- dplyr::bind_rows(tables)

download_eea_parquet_async() will initiate the construction of a zip folder of parquet files and returns a URL at which a zip file will be eventually available.

zippath <- download_eea_parquet_async()

# wait for zip to be populated

download.file(url = zippath, destfile = "mydata.zip")

zip::unzip(zipfile = "mydata.zip", exdir = "parquets")

paths <- dir("parquets", recursive = TRUE, full.names = TRUE)

tables <- purrr::map(.x = paths, .f = arrow::read_parquet)

aqdata <- dplyr::bind_rows(tables)

download_eea_parquet_urls() will return an R character vector of a list of URLs at which individual parquet files can be found. This function ignores the datetime and aggregation type arguments.

urls <- download_eea_parquet_urls()

tables <- purrr::map(.x = urls, .f = arrow::read_parquet)

aqdata <- dplyr::bind_rows(tables)