Downloading Data
Before you download any data, it may be useful to access some key metadata. The following functions are available to do so:
import_eea_stations()
accesses data flow D (station metadata).import_eea_pollutants()
accesses pollutant names and identifiers.import_eea_countries()
accesses country names and identifiers.import_eea_cities()
accesses city names in a given country or countries.
Each of these functions will return R data frames, making them convenient for further analysis and filtering.
To actually download data, you may use
import_eea_monitoring()
. This function has four
arguments:
countries
, requiring a vector of code(s) fromimport_eea_countries()
.cities
, requiring a vector of code(s) fromimport_eea_cities()
.pollutants
, requiring a vector of code(s) fromimport_eea_pollutants()
.dataset
, requiring a value of either1
(unratified, up-to-date data from data flow E2a),2
(ratified data from data flow E1a) or3
(Historical Airbase data).
It is recommended that users provide at least one country, as leaving
this as NULL
is likely to import more data than a typical R
session can handle.
import_eea_monitoring()
is quite an opinionated
function; it uses a specific method to acquire data, renames columns to
convenient English names, and binds on commonly useful metadata columns
from import_eea_stations()
. If a user wants more
flexibility, they may wish to access the EEAβs API directly.
Data Access via API Endpoints
For more flexible data access, including asynchronous data access, there are three options, mirroring the three options outlined in https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf. Each of these has options to control where data is imported from (countries & cities), specific pollutants to import, and the datasets of origin. Two of the three also allow for users to specify a start and end date, as well as an aggregation type.
-
download_eea_parquet_files()
will download a zip folder of parquet files to a user-specified file, returning the file path.
download_eea_parquet_files(file = "mydata.zip")
zip::unzip(zipfile = "mydata.zip", exdir = "parquets")
paths <- dir("parquets", recursive = TRUE, full.names = TRUE)
tables <- purrr::map(.x = paths, .f = arrow::read_parquet)
aqdata <- dplyr::bind_rows(tables)
-
download_eea_parquet_async()
will initiate the construction of a zip folder of parquet files and returns a URL at which a zip file will be eventually available.
zippath <- download_eea_parquet_async()
# wait for zip to be populated
download.file(url = zippath, destfile = "mydata.zip")
zip::unzip(zipfile = "mydata.zip", exdir = "parquets")
paths <- dir("parquets", recursive = TRUE, full.names = TRUE)
tables <- purrr::map(.x = paths, .f = arrow::read_parquet)
aqdata <- dplyr::bind_rows(tables)
-
download_eea_parquet_urls()
will return an R character vector of a list of URLs at which individual parquet files can be found. This function ignores the datetime and aggregation type arguments.
urls <- download_eea_parquet_urls()
tables <- purrr::map(.x = urls, .f = arrow::read_parquet)
aqdata <- dplyr::bind_rows(tables)