3 Accessing UK Air Quality Data
3.1 Accessing data
The UK has a surprisingly large amount of air quality data that is publicly accessible. The main UK AURN archive and the regional networks (England, Scotland, Wales and Northern Ireland), together with Imperial College London’s London Air Quality Network (LAQN), are large and important databases that allow free public access. Storing and managing data in this way has many advantages, including a consistent data format and high-quality underlying methods to process and store the data.
openair has a core function that provides users with extensive access to UK air quality data: importUKAQ(). WSP (formerly Ricardo) has provided .RData files (R workspaces) for several important air quality networks in the UK. These files are updated on a daily basis, and accessing them requires an Internet connection. The work of Trevor Davies at WSP in making all the data available is greatly appreciated. The networks available through importUKAQ() include:
- The UK national network, the Automatic Urban and Rural Network (AURN). This is the main UK network.
- The UK “devolved” networks:
  - Air Quality Scotland network.
  - Air Quality Wales network.
  - Air Quality England network of sites.
  - Northern Ireland network of sites.
- Locally managed AQ networks in England. These are sites operated in most cases by Local Authorities but may also include monitoring from other programmes, industry and airports. The location and purpose of these sites differ from the national network, which is governed by strict rules of the air quality directives. As a result there is a broad range of site types, equipment and data quality practices. For more information see here. These data represent information from about 15 different local air quality networks.
Also available is importImperial() for accessing data from the sites operated by Imperial College London1, primarily including the London Air Quality Network.
When analysing UK air quality data, it may be useful to compare it to data from the UK’s neighbours in Europe. Historically, the saqgetr package allowed access to European monitoring data, and openair provided the importEurope() function as a simplified interface to the same data source. Unfortunately, as of February 2024, this service is no longer supported.
The European Environment Agency (EEA) now provides an air quality download service (https://eeadmz1-downloads-webapp.azurewebsites.net/), which includes an API. The openair family now includes the euroaq package, which allows you to easily access this through R - see https://openair-project.github.io/euroaq/ for more information.
Many users download hourly data from GOV UK. This is a useful facility but does have some limitations and frustrations, many of which are overcome using openair to download the data:
- It is quick to select a range of sites, pollutants and periods, and it is possible to download several long time series in one go.
- Storing the data as .RData objects is very efficient as they are about four times smaller than .csv files (which are already small). This means the data downloads quickly and saves bandwidth.
- The function completely avoids any need for data manipulation or setting time formats, time zones etc.
Some examples of usage are shown below. First, let's load the packages we need.
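A minimal sketch of that setup, assuming openair and dplyr have already been installed from CRAN. Loading dplyr is an assumption made here because glimpse() and filter() are used in later examples; any package that provides them would do.

```r
# Assumed setup: both packages installed beforehand, e.g. with install.packages()
library(openair) # importMeta(), importUKAQ(), importImperial()
library(dplyr)   # glimpse() and filter(), used in later examples
```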
3.2 Site Meta Data
3.2.1 National networks
The first question is, what sites are available and what do they measure? Users can access the details of air pollution monitoring sites using the importMeta() function. The user only needs to provide the network name and (optionally) whether all data should be returned and whether certain periods should be considered. By default only site type, latitude and longitude are returned.
aurn_meta <- importMeta(source = "aurn")
aurn_meta
# A tibble: 324 × 6
source site code latitude longitude site_type
<chr> <chr> <chr> <dbl> <dbl> <chr>
1 aurn Aberdeen ABD 57.2 -2.09 Urban Backgro…
2 aurn Aberdeen Erroll Park ABD9 57.2 -2.09 Urban Backgro…
3 aurn Aberdeen Union Street Roadside ABD7 57.1 -2.11 Urban Traffic
4 aurn Aberdeen Wellington Road ABD8 57.1 -2.09 Urban Traffic
5 aurn Armagh Roadside ARM6 54.4 -6.65 Urban Traffic
6 aurn Aston Hill AH 52.5 -3.03 Rural Backgro…
7 aurn Auchencorth Moss ACTH 55.8 -3.24 Rural Backgro…
8 aurn Aylesbury A4157 AYLA 51.8 -0.794 Urban Traffic
9 aurn Ballymena Antrim Road BAAR 54.9 -6.27 Urban Traffic
10 aurn Ballymena Ballykeel BALM 54.9 -6.25 Urban Backgro…
# ℹ 314 more rows
Alternatively, add the option all = TRUE to return much more detailed data, including which pollutants are measured at each site and the site start / end dates.
aurn_meta <- importMeta(source = "aurn", all = TRUE)
# what comes back?
glimpse(aurn_meta)
Rows: 3,062
Columns: 14
$ source <chr> "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", "aurn"…
$ code <chr> "ABD", "ABD", "ABD", "ABD", "ABD", "ABD", "ABD", "ABD"…
$ site <chr> "Aberdeen", "Aberdeen", "Aberdeen", "Aberdeen", "Aberd…
$ site_type <chr> "Urban Background", "Urban Background", "Urban Backgro…
$ latitude <dbl> 57.15736, 57.15736, 57.15736, 57.15736, 57.15736, 57.1…
$ longitude <dbl> -2.094278, -2.094278, -2.094278, -2.094278, -2.094278,…
$ variable <chr> "O3", "NO", "NO2", "NOx", "SO2", "CO", "PM10", "NV10",…
$ Parameter_name <chr> "Ozone", "Nitric oxide", "Nitrogen dioxide", "Nitrogen…
$ start_date <dttm> 2003-08-01, 1999-09-18, 1999-09-18, 1999-09-18, 2001-…
$ end_date <dttm> 2021-09-20, 2021-09-20, 2021-09-20, 2021-09-20, 2007-…
$ ratified_to <dttm> 2021-09-20, 2021-09-20, 2021-09-20, 2021-09-20, 2007-…
$ zone <chr> "North East Scotland", "North East Scotland", "North E…
$ agglomeration <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ local_authority <chr> "Aberdeen City", "Aberdeen City", "Aberdeen City", "Ab…
Note that importMeta() can import information for several networks at once e.g. source = c("aurn", "saqn"). Further, a convenience value for source is "ukaq", which returns metadata for all networks accessible via importUKAQ() (AURN, AQE, SAQN, WAQN, NI, and local networks).
Often it is useful to consider sites that were open in a particular year or were open for a duration of years. This can be done using the year argument. When year is a range such as year = 2010:2020, only sites that were open across that range of years will be returned. This option is especially useful for trend analysis when there might be an interest in extracting only sites that were measuring over the period of interest. Furthermore, if all = TRUE is used, supplying a year (or years) will select only specific pollutants that were measured during the period of interest.2
For example, to check the number of sites that were open from 2010 to 2022 in the AURN and SAQN combined:
sites_2010_2022 <- importMeta(
source = c("aurn", "saqn"),
year = 2010:2022
)
nrow(sites_2010_2022)
[1] 141
importMeta() also supports filtering by pollutant directly, without needing to filter the returned data frame manually. The pollutant argument accepts one or more pollutant names (case-insensitive), and "hc" can be used as a shorthand for all hydrocarbons. For example, to find all Urban Traffic sites on the AURN that measure NO2:
no2_sites <- importMeta(
source = "aurn",
all = TRUE,
pollutant = "no2",
site_type = "Urban Traffic"
)
nrow(no2_sites)
[1] 93
Additional filtering arguments include site (pattern-matched site name), code (exact site code), and duplicate (set TRUE to retain sites that appear in multiple networks rather than de-duplicating them).
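These arguments might be used as in the sketch below. The argument names are taken from the description above, but their exact behaviour may depend on the installed openair version; the site name "Aberdeen" and the code "KC1" are simply values that appear elsewhere in this chapter. An Internet connection is required.

```r
library(openair)

# Sites whose name matches "Aberdeen" (pattern-matched on the site name)
aberdeen_sites <- importMeta(source = "aurn", site = "Aberdeen")

# A single site selected by its exact site code
kc1_meta <- importMeta(source = "aurn", code = "KC1")

# Keep sites that appear in more than one network rather than de-duplicating
aurn_saqn_all <- importMeta(source = c("aurn", "saqn"), duplicate = TRUE)
```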
importMeta() as a way to select sites to import
One of the most useful aspects of importMeta() is to use it as a basis to identify site codes to then import data. For example, to import data from the AURN for sites that have been in operation from 2005 to 2020:
sites_2005_2020 <- importMeta(
source = "aurn",
year = 2005:2020
)
all_aq_data <- importUKAQ(
site = sites_2005_2020$code,
source = sites_2005_2020$source,
year = 2005:2020
)
3.2.2 Finding sites near a location
importMeta() can also find sites near a specific location by supplying lat and lng coordinates. The returned data will include a distance_km column and be sorted by distance. This is useful when assessing pollution exposure at a receptor (e.g., a school or hospital) or when looking for monitors near a potential pollution source (e.g., an industrial installation or quarry).
The max_dist argument limits results to sites within a given radius (in km), and max_n caps the number of sites returned. These can be combined to find, say, the four nearest sites within 5 km. openairmaps provides the convertPostcode() function as a convenient way to obtain coordinates from a UK postcode. The example below finds the four closest monitoring stations within 2 km of Great Ormond Street Hospital, a famous children’s hospital in London.
hospital_coords <- list(
lat = 51.52221,
lng = -0.119898
)
importMeta(
source = c("aurn", "aqe"),
year = 2022,
lat = hospital_coords$lat,
lng = hospital_coords$lng,
max_dist = 2,
max_n = 4
)
# A tibble: 4 × 7
source site code latitude longitude site_type distance_km
<chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 aurn London Bloomsbury CLL2 51.5 -0.126 Urban Ba… 0.415
2 aqe Camden - Euston Road CD009 51.5 -0.129 Urban Tr… 0.888
3 aqe Westminster - Covent Ga… WMS05 51.5 -0.122 Urban Ba… 1.13
4 aqe Farringdon Street CT2 51.5 -0.105 Urban Tr… 1.35
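To give a feel for what a column such as distance_km represents, the sketch below computes a great-circle (haversine) distance between two coordinate pairs. This is only an illustration: the text does not say which formula importMeta() uses, and haversine_km() is a hypothetical helper, not an openair function. The second point is London N. Kensington (KC1), using the coordinates reported in the site metadata later in this chapter.

```r
# Haversine great-circle distance (km) between two points in decimal degrees.
# `haversine_km` is a hypothetical helper and is not part of openair.
haversine_km <- function(lat1, lng1, lat2, lng2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Great Ormond Street Hospital (same coordinates as above)
hospital_coords <- list(lat = 51.52221, lng = -0.119898)

# London N. Kensington (KC1), as reported in the AURN metadata
kc1_coords <- list(lat = 51.52105, lng = -0.213419)

dist_km <- haversine_km(
  hospital_coords$lat, hospital_coords$lng,
  kc1_coords$lat, kc1_coords$lng
)
dist_km
```

The result is well beyond the 2 km radius, so KC1 would not be returned by the importMeta() call above.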
3.2.3 Plot Sites on a Map
To easily visualise entire monitoring networks, consider using the openairmaps R package. This package can be installed from CRAN, similar to openair.
install.packages("openairmaps")
This package contains the networkMap() function which acts as a wrapper around importMeta() and returns a detailed map, with many options for customisation. For example, sites can be clustered together to avoid clutter, and an optional “control menu” can be added to filter for certain sites (e.g., different site types, shown below). All of the filtering arguments described above (year, pollutant, site_type, lat, lng, max_dist, max_n) can be passed directly to networkMap() and are forwarded on to importMeta().
library(openairmaps)
networkMap(source = c("aurn", "aqe"), control = "site_type")
[Figure: interactive map of monitoring sites, produced by the openairmaps package.]
When spatial filtering is applied, networkMap() clearly shows the ‘target’ coordinate as a white marker and, if relevant, the max_dist radius as a circle.
networkMap(
source = c("aurn", "aqe"),
year = 2022,
lat = hospital_coords$lat,
lng = hospital_coords$lng,
max_dist = 2
)
As well as providing all of the functionality of importMeta(), networkMap() provides several additional arguments for customisation:
- control: Any column of the equivalent importMeta() dataset, which is used to create a “layer control” menu to allow readers to filter for certain sites. The control option is quite opinionated, and selects an appropriate style of layer control depending on the column selected (e.g., pollutants are switched between, whereas multiple site types can be selected at once).
- cluster: By default, if more than 25 markers are plotted, they are clustered together until users zoom in close. This default behaviour improves the appearance and performance of the HTML map widget. The cluster argument allows you to turn this feature on and off.
- provider: Any number of the leaflet providers (see leaflet::providers). The default is a typical ‘street map’ and a satellite map, which can be toggled between.
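As a sketch of how these arguments might be combined, the call below maps AURN sites with clustering switched off and a single base map. "CartoDB.Positron" is just one entry of leaflet::providers chosen for illustration, and the exact defaults may differ between openairmaps versions. An Internet connection is required.

```r
library(openairmaps)

# AURN sites only, with a layer control for site type, no marker clustering,
# and one base map taken from leaflet::providers
aurn_map <- networkMap(
  source = "aurn",
  control = "site_type",
  cluster = FALSE,
  provider = "CartoDB.Positron"
)

# Printing the object displays the interactive map
aurn_map
```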
3.3 Monitoring Data
3.3.1 Hourly data
To import data, you can use the importUKAQ() function. Some examples are below.
# import all pollutants from Marylebone Rd from 2000:2005
mary <- importUKAQ(site = "my1", year = 2000:2005)
# import nox, no2, o3 from Marylebone Road and Nottingham Centre for 2000
thedata <- importUKAQ(
site = c("my1", "nott"),
year = 2000,
pollutant = c("nox", "no2", "o3")
)
# import over 30 years of Mace Head O3 data!
o3 <- importUKAQ(site = "mh", year = 1987:2019)
# import hydrocarbon data from Marylebone Road
hc <- importUKAQ(site = "my1", year = 2008, hc = TRUE)
# Import data from the AQE network (York data in this case)
yk13 <- importUKAQ(site = "yk13", year = 2018, source = "aqe")
# Import data from the AURN *and* AQE network! Different networks are
# automatically detected
duo <- importUKAQ(site = c("my1", "yk13"), year = 2020)
# Or, `source` can be specified - useful for ambiguous codes (some 'locally
# managed' sites may have overlapping site codes w/ nationally managed
# sites, e.g., "AD1")
ad1_saqn <- importUKAQ(site = "ad1", year = 2020, source = "saqn")
ad1_eng <- importUKAQ(site = "ad1", year = 2020, source = "local")
And to include basic meta data when importing air pollution data:
## default metadata
kc1 <- importUKAQ(site = "kc1", year = 2018, meta = TRUE)
glimpse(kc1)
Rows: 8,760
Columns: 18
$ source <chr> "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", "aur…
$ site <chr> "London N. Kensington", "London N. Kensington", "London N. K…
$ code <chr> "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1…
$ date <dttm> 2018-01-01 00:00:00, 2018-01-01 01:00:00, 2018-01-01 02:00:…
$ co <dbl> 0.114872, 0.111043, 0.112000, 0.100512, 0.091897, 0.100512, …
$ nox <dbl> 8.32519, 8.89934, 9.41967, 9.36584, 7.21277, 7.64339, 10.173…
$ no2 <dbl> 8.11153, 8.54325, 8.99235, 8.93852, 6.94570, 7.26948, 10.013…
$ no <dbl> 0.13935, 0.23224, 0.27869, 0.27869, 0.17418, 0.24386, 0.1045…
$ o3 <dbl> 70.98040, 67.52118, 69.69982, 70.49810, 71.74542, 70.49810, …
$ so2 <dbl> NA, 2.40953, 2.49812, 2.12606, 2.39181, 2.28551, 2.23236, 2.…
$ pm10 <dbl> 12.425, 7.375, 5.625, 3.200, 3.875, 5.050, 9.400, 12.400, 15…
$ pm2.5 <dbl> 8.892, 4.363, 3.137, 1.792, 2.146, 2.618, 4.575, 6.109, 7.05…
$ ws <dbl> 5.5, 5.0, 4.8, 4.8, 5.3, 5.3, 4.4, 3.0, 2.6, 1.6, 1.6, 1.1, …
$ wd <dbl> 263.3, 256.4, 251.0, 246.8, 248.4, 248.0, 245.8, 239.5, 232.…
$ air_temp <dbl> 5.5, 5.1, 4.9, 4.7, 4.9, 5.0, 5.0, 4.6, 4.2, 3.7, 5.4, 5.7, …
$ site_type <chr> "Urban Background", "Urban Background", "Urban Background", …
$ latitude <dbl> 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, …
$ longitude <dbl> -0.213419, -0.213419, -0.213419, -0.213419, -0.213419, -0.21…
## custom metadata - anything in `importMeta()`
kc1_zagglom <- importUKAQ(
site = "kc1",
year = 2018,
meta = TRUE,
meta_columns = c("zone", "agglomeration")
)
glimpse(kc1_zagglom)
Rows: 8,760
Columns: 17
$ source <chr> "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", …
$ site <chr> "London N. Kensington", "London N. Kensington", "London …
$ code <chr> "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", …
$ date <dttm> 2018-01-01 00:00:00, 2018-01-01 01:00:00, 2018-01-01 02…
$ co <dbl> 0.114872, 0.111043, 0.112000, 0.100512, 0.091897, 0.1005…
$ nox <dbl> 8.32519, 8.89934, 9.41967, 9.36584, 7.21277, 7.64339, 10…
$ no2 <dbl> 8.11153, 8.54325, 8.99235, 8.93852, 6.94570, 7.26948, 10…
$ no <dbl> 0.13935, 0.23224, 0.27869, 0.27869, 0.17418, 0.24386, 0.…
$ o3 <dbl> 70.98040, 67.52118, 69.69982, 70.49810, 71.74542, 70.498…
$ so2 <dbl> NA, 2.40953, 2.49812, 2.12606, 2.39181, 2.28551, 2.23236…
$ pm10 <dbl> 12.425, 7.375, 5.625, 3.200, 3.875, 5.050, 9.400, 12.400…
$ pm2.5 <dbl> 8.892, 4.363, 3.137, 1.792, 2.146, 2.618, 4.575, 6.109, …
$ ws <dbl> 5.5, 5.0, 4.8, 4.8, 5.3, 5.3, 4.4, 3.0, 2.6, 1.6, 1.6, 1…
$ wd <dbl> 263.3, 256.4, 251.0, 246.8, 248.4, 248.0, 245.8, 239.5, …
$ air_temp <dbl> 5.5, 5.1, 4.9, 4.7, 4.9, 5.0, 5.0, 4.6, 4.2, 3.7, 5.4, 5…
$ zone <chr> "Greater London", "Greater London", "Greater London", "G…
$ agglomeration <chr> "Greater London Urban Area", "Greater London Urban Area"…
By default, the function returns data where each pollutant is in a separate column. However, it is possible to return the data in a tidy format (column for pollutant name, column for value) by using the option to_narrow:
my1 <- importUKAQ("my1", year = 2018, to_narrow = TRUE)
It is also possible to return information on whether the data have been ratified or not using the option ratified (FALSE by default). So, add the option ratified = TRUE if you want this information.
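A minimal sketch of the ratified option, assuming an Internet connection. The way the ratification status is reported (for example, the names of any extra columns) is not described here, so the example simply inspects the returned column names instead of assuming them.

```r
library(openair)

# Same site and year as above, with ratification information requested
my1_ratified <- importUKAQ(site = "my1", year = 2018, ratified = TRUE)

# Inspect the column names to see how the ratification status is reported
names(my1_ratified)
```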
3.3.2 Annual and other statistics
By default, all the functions above return hourly data. However, there is often a need for statistics such as annual means over a long period of time. importUKAQ() (but not importImperial()) can return data for other averaging times: annual, monthly, daily and, for SO2, 15-minute. The annual and monthly data also provide valuable information on data capture rates. The averaging time is selected with the data_type option, which can take the following values:
- “hourly” This is the default and specific site(s) must be provided.
- “daily” Daily means returned and specific site(s) must be provided. Note that for PM10 and PM2.5, daily values can be derived from hourly measurements (using instruments such as TEOM, BAM and FIDAS) or come from daily gravimetric measurements such as from a Partisol. In the returned data the gravimetric daily measurements are shown as gr_pm10 and gr_pm2.5, respectively.
- “monthly” Monthly means returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
- “annual” Annual means returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
- “15_min” 15-minute SO2 concentrations returned for a specific site(s).
- “8_hour” Rolling 8-hour concentrations returned for a specific site(s) for O3 and CO.
- “24_hour” Rolling 24-hour concentrations returned for a specific site(s) for PM10 and PM2.5.
- “daily_max_8” Daily maximum of the rolling 8-hour concentrations for O3 and CO.
- “daqi” Daily Air Quality Index (DAQI). See here for more details of how the index is defined.
Note that for annual and monthly statistics all network data is returned and the site option has no effect.
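The sketch below requests a few of these averaging times, assuming an Internet connection. The site ("my1", Marylebone Road) and the year are chosen only for illustration; the data_type values are those listed above.

```r
library(openair)

# Daily means for Marylebone Road; a site must be supplied for "daily"
my1_daily <- importUKAQ(site = "my1", year = 2018, data_type = "daily")

# Rolling 8-hour concentrations (O3 and CO) for the same site
my1_8hour <- importUKAQ(site = "my1", year = 2018, data_type = "8_hour")

# Monthly means: no site is given, as all AURN sites for the year are returned
aurn_monthly <- importUKAQ(year = 2018, data_type = "monthly", source = "aurn")
```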
As an example, to import 5 years of annual mean data from the AURN:
uk_annual <- importUKAQ(year = 2016:2020, data_type = "annual", source = "AURN")
By default, this will return data in “wide” format with a pollutant and its data capture rate in separate columns. Often it is more useful to have “narrow” format data, which is possible to select with the to_narrow option. Furthermore, it is also possible to return site meta data (site type, latitude and longitude) at the same time.
Below is an example of obtaining annual mean data for 2020.
uk_2020 <- importUKAQ(
year = 2020,
source = "aurn",
data_type = "annual",
meta = TRUE,
to_narrow = TRUE
)
uk_2020
# A tibble: 3,268 × 10
source code site date species value data_capture site_type
<chr> <chr> <chr> <dttm> <chr> <dbl> <dbl> <chr>
1 aurn ABD Aberd… 2020-01-01 00:00:00 o3 45.5 0.610 Urban Ba…
2 aurn ABD Aberd… 2020-01-01 00:00:00 o3.dai… 57.5 NA Urban Ba…
3 aurn ABD Aberd… 2020-01-01 00:00:00 o3.aot… NA NA Urban Ba…
4 aurn ABD Aberd… 2020-01-01 00:00:00 o3.aot… NA NA Urban Ba…
5 aurn ABD Aberd… 2020-01-01 00:00:00 somo35 502. 0.607 Urban Ba…
6 aurn ABD Aberd… 2020-01-01 00:00:00 no 4.60 0.945 Urban Ba…
7 aurn ABD Aberd… 2020-01-01 00:00:00 no2 13.5 0.945 Urban Ba…
8 aurn ABD Aberd… 2020-01-01 00:00:00 nox 20.5 0.945 Urban Ba…
9 aurn ABD Aberd… 2020-01-01 00:00:00 so2 NA NA Urban Ba…
10 aurn ABD Aberd… 2020-01-01 00:00:00 co NA NA Urban Ba…
# ℹ 3,258 more rows
# ℹ 2 more variables: latitude <dbl>, longitude <dbl>
The pollutants returned include:
unique(uk_2020$species)
 [1] "o3" "o3.daily.max.8hour" "o3.aot40v"
[4] "o3.aot40f" "somo35" "no"
[7] "no2" "nox" "so2"
[10] "co" "pm10" "nv10"
[13] "v10" "pm2.5" "nv2.5"
[16] "v2.5" "gr10" "gr2.5"
[19] "o3.summer"
Now it is easy, for example, to select annual mean data from 2020 for NO2 with a data capture rate of at least 80%:
filter(uk_2020, species == "no2", data_capture >= 0.8)
# A tibble: 144 × 10
source code site date species value data_capture site_type
<chr> <chr> <chr> <dttm> <chr> <dbl> <dbl> <chr>
1 aurn ABD Aberde… 2020-01-01 00:00:00 no2 13.5 0.945 Urban Ba…
2 aurn ABD7 Aberde… 2020-01-01 00:00:00 no2 23.6 0.982 Urban Tr…
3 aurn ABD8 Aberde… 2020-01-01 00:00:00 no2 25.1 0.995 Urban Tr…
4 aurn AH Aston … 2020-01-01 00:00:00 no2 2.81 0.983 Rural Ba…
5 aurn ARM6 Armagh… 2020-01-01 00:00:00 no2 21.1 0.960 Urban Tr…
6 aurn BAAR Ballym… 2020-01-01 00:00:00 no2 15.6 0.893 Urban Tr…
7 aurn BALM Ballym… 2020-01-01 00:00:00 no2 10.3 0.993 Urban Ba…
8 aurn BAR3 Barnsl… 2020-01-01 00:00:00 no2 11.9 0.970 Urban Ba…
9 aurn BBRD Birken… 2020-01-01 00:00:00 no2 16.9 0.984 Urban Tr…
10 aurn BDMA Bradfo… 2020-01-01 00:00:00 no2 34.4 0.805 Urban Tr…
# ℹ 134 more rows
# ℹ 2 more variables: latitude <dbl>, longitude <dbl>
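A common next step is to summarise the filtered annual means by site type. The sketch below does this on a small offline table built from the first seven rows of the printout above, using the rounded values as displayed. The site types are written out in full from the truncated display, which is an assumption. With an Internet connection, the same pipeline could be applied to uk_2020 directly.

```r
library(dplyr)

# Rounded values copied from the first seven rows printed above; the full
# site type names are expanded from the truncated display (an assumption)
no2_2020 <- tibble(
  code = c("ABD", "ABD7", "ABD8", "AH", "ARM6", "BAAR", "BALM"),
  value = c(13.5, 23.6, 25.1, 2.81, 21.1, 15.6, 10.3),
  data_capture = c(0.945, 0.982, 0.995, 0.983, 0.960, 0.893, 0.993),
  site_type = c(
    "Urban Background", "Urban Traffic", "Urban Traffic", "Rural Background",
    "Urban Traffic", "Urban Traffic", "Urban Background"
  )
)

# Mean annual NO2 and number of sites per site type, for data capture >= 80%
no2_by_type <- no2_2020 %>%
  filter(data_capture >= 0.8) %>%
  group_by(site_type) %>%
  summarise(n_sites = n(), mean_no2 = mean(value), .groups = "drop")

no2_by_type
```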
For the AURN, AQE, SAQN, WAQN and NI networks, it is also possible to return the DAQI (Daily Air Quality Index) by pollutant to save deriving it.
daqi_2020 <- importUKAQ(
year = 2020,
source = "aurn",
data_type = "daqi",
meta = TRUE
)
daqi_2020
# A tibble: 148,513 × 12
source code site pollutant date concentration poll_index
<chr> <chr> <chr> <chr> <dttm> <dbl> <int>
1 aurn ABD Aberdeen no2 2020-01-01 00:00:00 39.2 1
2 aurn ABD Aberdeen pm10 2020-01-01 00:00:00 10 1
3 aurn ABD Aberdeen pm2.5 2020-01-01 00:00:00 9 1
4 aurn ABD7 Aberdeen… no2 2020-01-01 00:00:00 42.4 1
5 aurn ABD8 Aberdeen… no2 2020-01-01 00:00:00 30.6 1
6 aurn ACTH Auchenco… o3 2020-01-01 00:00:00 57 2
7 aurn ACTH Auchenco… pm10 2020-01-01 00:00:00 14 1
8 aurn ACTH Auchenco… pm2.5 2020-01-01 00:00:00 12 2
9 aurn AGRN Birmingh… no2 2020-01-01 00:00:00 23.2 1
10 aurn AGRN Birmingh… o3 2020-01-01 00:00:00 35 2
# ℹ 148,503 more rows
# ℹ 5 more variables: band <fct>, measurement_period <chr>, site_type <chr>,
# latitude <dbl>, longitude <dbl>
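Because the index is returned by pollutant, an overall index per site and day still has to be derived. The sketch below takes the highest pollutant index for each site and day, which is the usual DAQI convention, but check it against the index definition before relying on it. It runs offline on rows 1 to 8 of the printout above; with an Internet connection, the same grouping could be applied to daqi_2020.

```r
library(dplyr)

# Rows 1-8 of the printout above (all for 1 January 2020)
daqi_rows <- tibble(
  code = c("ABD", "ABD", "ABD", "ABD7", "ABD8", "ACTH", "ACTH", "ACTH"),
  date = as.Date("2020-01-01"),
  pollutant = c("no2", "pm10", "pm2.5", "no2", "no2", "o3", "pm10", "pm2.5"),
  poll_index = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L)
)

# Overall index per site and day: the highest index across pollutants
overall_daqi <- daqi_rows %>%
  group_by(code, date) %>%
  summarise(daqi = max(poll_index), .groups = "drop")

overall_daqi
```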
