Applies an adaptive Kolmogorov-Zurbenko filter to one or more pollutant columns in a data frame at multiple window sizes and returns the decomposed components. The KZA filter uses a standard KZ filter to detect structural breaks and shrinks the window near those breaks to preserve sharp features.
Arguments
- mydata
A data frame containing a date field in Date or POSIXct format. The input time series must be regular, e.g., hourly or daily.
- pollutant
The name of a pollutant, e.g., pollutant = "o3". More than one pollutant can be supplied as a vector, e.g., pollutant = c("o3", "nox").
- m
Integer vector of maximum window sizes. Defaults to c(25, 169, 721, 8761) (suited to hourly data). Values of m should be odd; even values will produce a symmetric window of m + 1 points rather than m.
- k
Integer. The number of iterations for the baseline KZ filter used to detect structural breaks.
- sensitivity
Numeric. Controls how aggressively the window shrinks at structural breaks (higher = more aggressive).
- data.thresh
Numeric (0–1). Minimum fraction of valid (non-NA) values required within a window for a filtered value to be returned; otherwise NA is returned. Applies to the actual window size, which is smaller near the series edges. Default is 0.5 (50%).
- type
Used for splitting the data further. Passed to cutData().
- components
Logical. If TRUE (default) and more than one m value is supplied, component columns are added by differencing adjacent filtered series.
- comp.names
Character vector of names for the component columns. Must have length length(m) + 1. Defaults to c("short", "synoptic", "intermediate", "seasonal", "trend") to match the default m values. If the length does not match, numbered names (comp_1, comp_2, ...) are used with a warning.
- to_narrow
Logical. If TRUE, return the data in tidy (long) format with a component column and a value column instead of one column per component. Intermediate filter columns (kza_*) are dropped. Default is FALSE. Ignored when components = FALSE or a single m value is supplied.
- ...
Passed to cutData() for use with type.
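The window-related arguments (m and data.thresh) can be illustrated with a minimal sketch of a single centred moving-average pass. This is not the package's implementation: ma_pass() is a hypothetical helper, the half-width floor(m / 2) is taken from the description of m above, and the use of >= for the threshold comparison is an assumption.

```r
# Hypothetical helper, NOT part of the package: one pass of a centred
# moving average whose window is truncated at the series edges.
ma_pass <- function(x, m, data.thresh = 0.5) {
  half <- floor(m / 2)   # an even m gives 2 * half + 1 = m + 1 points
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    idx <- max(1, i - half):min(n, i + half)  # shorter window near edges
    w <- x[idx]
    # Assumed rule: keep the value only if the fraction of non-NA points
    # in the actual (possibly truncated) window reaches data.thresh
    if (mean(!is.na(w)) >= data.thresh) {
      out[i] <- mean(w, na.rm = TRUE)
    }
  }
  out
}

x <- c(1, 2, NA, 4, 5, 6, 7)
res_odd  <- ma_pass(x, m = 5)
res_even <- ma_pass(x, m = 4)   # same half-width as m = 5
identical(res_odd, res_even)

# Too few valid values in the window gives NA
sparse <- ma_pass(c(NA, NA, 3), m = 3)
sparse
```

Under these assumptions an even m behaves exactly like the next odd value, which is why odd values are recommended.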
Value
When to_narrow = FALSE (default), a tibble with the original
columns plus intermediate filter columns (kza_{m}) and component
columns. When to_narrow = TRUE, a tidy tibble with a
component column (factor ordered fast to slow) and a value
column.
Details
With the default window sizes of 25, 169, 721 and 8761 (suited to hourly data), the function returns four intermediate filtered columns and five physical components derived by differencing:
- short: daily cycle and short-term variations within it (pollutant - kza_25)
- synoptic: 2–7 day weather systems (kza_25 - kza_169)
- intermediate: weekly to monthly variability (kza_169 - kza_721)
- seasonal: monthly to annual variability (kza_721 - kza_8761)
- trend: multi-year trend (kza_8761)
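Because each component is the difference between adjacent filtered series, the components sum back to the original pollutant series. The sketch below illustrates that differencing step only. It uses a plain moving average (the hypothetical run_mean()) as a stand-in for the adaptive filter, a synthetic series, and smaller window sizes than the defaults, so the numbers themselves carry no meaning.

```r
# Stand-in smoother, NOT the KZA filter: centred moving average with
# truncated edge windows. Only the differencing logic is of interest here.
run_mean <- function(x, m) {
  half <- floor(m / 2)
  n <- length(x)
  vapply(
    seq_len(n),
    function(i) mean(x[max(1, i - half):min(n, i + half)]),
    numeric(1)
  )
}

set.seed(1)
x <- cumsum(rnorm(500))   # synthetic series, for illustration only
m <- c(5, 25, 101)        # illustrative window sizes

filtered <- lapply(m, function(mi) run_mean(x, mi))
series <- c(list(x), filtered)   # raw series, then increasingly smooth

# Difference adjacent series; the slowest filtered series is kept as is
comps <- lapply(seq_along(m), function(j) series[[j]] - series[[j + 1]])
comps[[length(m) + 1]] <- filtered[[length(m)]]

# The components telescope back to the original series
recon <- Reduce(`+`, comps)
isTRUE(all.equal(recon, x))
```

This also shows why length(m) window sizes yield length(m) + 1 components.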
Edge effects
At the start and end of the series the filter window is silently truncated
rather than padded, so no NAs are introduced. However, values within
the affected boundary zone are averaged over fewer points than the interior
and should be interpreted with caution.
The affected length at each end of the series for a single filter pass is
floor(m / 2) observations. Because the filter is iterated k
times (each pass consuming the output of the previous one), the total
affected zone at each end is approximately k * floor(m / 2)
observations. With the default m = c(25, 169, 721, 8761) and
k = 5, the affected zones are roughly 60 h (~2.5 days), 420 h
(~17 days), 1,800 h (~75 days), and 21,900 h (~2.5 years) at each end
respectively. Since about 2.5 years are affected at each end, the trend
component requires more than about 5 years of data (in practice 5–6
years or longer) for any interior estimates to be unaffected.
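The boundary-zone figures quoted above follow directly from k * floor(m / 2). A short check of that arithmetic, assuming hourly data and a 365-day year for the conversion:

```r
m <- c(25, 169, 721, 8761)   # default window sizes (hours)
k <- 5                       # number of iterations

edge_hours <- k * floor(m / 2)
edge_hours                   # 60 420 1800 21900

edge_days <- edge_hours / 24
edge_days                    # 2.5 17.5 75.0 912.5

# Series length (years) consumed by both boundary zones of the largest window
both_ends_years <- 2 * edge_hours[4] / (24 * 365)
both_ends_years              # 5
```

The same expression can be used to estimate the affected zone for any other choice of m and k.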
Examples
if (FALSE) { # \dontrun{
# Default: 4 window sizes, 5 descriptively named components returned
mydata <- kzaFilter(mydata, pollutant = "nox")
# Tidy long format
mydata <- kzaFilter(mydata, pollutant = "nox", to_narrow = TRUE)
# Single window size (no component decomposition)
mydata <- kzaFilter(mydata, pollutant = "nox", m = 25, k = 5)
} # }
