Utility function to filter a data frame by a date range or specific date periods (month, year, etc.). All options are applied in turn, meaning this function can be used to select quite complex dates simply.
Utility function that makes it easier to select periods from a data frame before passing it to another function.
Usage
selectByDate(
mydata,
start = "1/1/2008",
end = "31/12/2008",
year = 2008,
month = 1,
day = "weekday",
hour = 1
)
Arguments
- mydata
A data frame containing a date field in hourly or high resolution format.
- start
A start date string in the form d/m/yyyy, e.g. "1/2/1999", or in R format, i.e. "YYYY-mm-dd", e.g. "1999-02-01".
- end
See start for format.
- year
A year or years to select, e.g. year = 1998:2004 to select 1998-2004 inclusive, or year = c(1998, 2004) to select 1998 and 2004.
- month
A month or months to select. Can be numeric, e.g. month = 1:6 to select months 1-6 (January to June), or by name, e.g. month = c("January", "December"). Names can be abbreviated to 3 letters and be in lower or upper case.
- day
A day name or days to select. day can be numeric (1 to 31) or character. For example, day = c("Monday", "Wednesday") or day = 1:10 (to select the 1st to 10th of each month). Names can be abbreviated to 3 letters and be in lower or upper case. Also accepts "weekday" (Monday - Friday) and "weekend" for convenience.
- hour
An hour or hours to select from 0-23, e.g. hour = 0:12 to select hours 0 to 12 inclusive.
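To make the name matching for month concrete, here is a minimal base R sketch of how full, abbreviated and mixed-case month names could be mapped to month numbers. The helper month_to_number is hypothetical and is not part of the package; it only illustrates the accepted input forms, not how selectByDate is implemented internally.

```r
# Hypothetical helper (not part of openair): map month names or numbers to 1-12.
month_to_number <- function(month) {
  if (is.numeric(month)) {
    return(as.integer(month))
  }
  # Compare the first three letters, ignoring case, against the English
  # abbreviations stored in the built-in constant month.abb.
  key <- substr(tolower(month), 1, 3)
  match(key, tolower(month.abb))
}

month_to_number(c("January", "DEC", "feb"))
month_to_number(1:6)
```

Unrecognised names come back as NA from match(), so a real implementation would probably want to check for that and raise an error.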
Details
This function makes it much easier to select periods of interest from a data frame based on dates in a British format. Selecting date/times in R format can be intimidating for new users. This function can be used to select quite complex dates simply - see examples below.
Dates are assumed to be inclusive, so start = "1/1/1999" means that times are selected from hour zero. Similarly, end = "31/12/1999" will include all hours of the 31st December. start and end can also be in standard R format as a string, i.e. "YYYY-mm-dd", so start = "1999-01-01" is fine.
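The inclusive behaviour can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: example_data is made-up hourly data, and parse_day and select_range are hypothetical helpers. Comparing calendar days rather than timestamps is what makes every hour of the end date fall inside the selection.

```r
# Made-up hourly data that starts before and ends after 1999.
dates <- seq(
  as.POSIXct("1998-12-31 22:00", tz = "UTC"),
  as.POSIXct("2000-01-01 02:00", tz = "UTC"),
  by = "hour"
)
example_data <- data.frame(date = dates, value = seq_along(dates))

# Hypothetical helper: accept d/m/yyyy first, then fall back to YYYY-mm-dd.
parse_day <- function(x) {
  as.Date(x, tryFormats = c("%d/%m/%Y", "%Y-%m-%d"))
}

# Hypothetical helper: keep rows whose calendar day lies in [start, end].
select_range <- function(data, start, end) {
  day <- as.Date(data$date, tz = "UTC")
  keep <- day >= parse_day(start) & day <= parse_day(end)
  data[keep, , drop = FALSE]
}

british <- select_range(example_data, "1/1/1999", "31/12/1999")
r_style <- select_range(example_data, "1999-01-01", "1999-12-31")

range(british$date)  # first and last timestamps kept
nrow(british)        # number of hourly rows kept
```

Both calls select the same rows, running from hour zero of the start date to the last hour of the end date.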
All options are applied in turn, so a row is kept only if it satisfies every option supplied. This makes it possible to select quite complex dates.
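As a rough illustration of options being applied one after another, the base R sketch below filters made-up hourly data first by weekday and then by hour, which is the effect of day = "weekday", hour = 7:19. The data frame example_data and the intermediate objects are assumptions for illustration only; the package may implement these steps differently.

```r
# Made-up hourly data for one full week, starting Monday 4 January 1999.
dates <- seq(
  as.POSIXct("1999-01-04 00:00", tz = "UTC"),
  by = "hour",
  length.out = 7 * 24
)
example_data <- data.frame(date = dates)

# Step 1: keep weekdays. POSIXlt stores wday as 0 (Sunday) to 6 (Saturday).
lt <- as.POSIXlt(example_data$date, tz = "UTC")
weekdays_only <- example_data[lt$wday %in% 1:5, , drop = FALSE]

# Step 2: from the rows that remain, keep hours 7 to 19 inclusive.
lt_weekdays <- as.POSIXlt(weekdays_only$date, tz = "UTC")
selected <- weekdays_only[lt_weekdays$hour %in% 7:19, , drop = FALSE]

nrow(selected)  # 5 weekdays x 13 hours = 65 rows
```

Because each step only ever removes rows, the order in which the options are applied does not change the final selection.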
Examples
## select all of 1999
data.1999 <- selectByDate(mydata, start = "1/1/1999", end = "31/12/1999")
head(data.1999)
#> # A tibble: 6 × 10
#> date ws wd nox no2 o3 pm10 so2 co pm25
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
#> 1 1999-01-01 00:00:00 5.04 140 88 35 4 21 3.84 1.02 18
#> 2 1999-01-01 01:00:00 4.08 160 132 41 3 17 5.24 2.7 11
#> 3 1999-01-01 02:00:00 4.8 160 168 40 4 17 6.51 2.87 8
#> 4 1999-01-01 03:00:00 4.92 150 85 36 3 15 4.18 1.62 10
#> 5 1999-01-01 04:00:00 4.68 150 93 37 3 16 4.25 1.02 11
#> 6 1999-01-01 05:00:00 3.96 160 74 29 5 14 3.88 0.725 NA
tail(data.1999)
#> # A tibble: 6 × 10
#> date ws wd nox no2 o3 pm10 so2 co pm25
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
#> 1 1999-12-31 18:00:00 4.68 190 226 39 NA 29 5.46 2.38 23
#> 2 1999-12-31 19:00:00 3.96 180 202 37 NA 27 4.78 2.15 23
#> 3 1999-12-31 20:00:00 3.36 190 246 44 NA 30 5.88 2.45 23
#> 4 1999-12-31 21:00:00 3.72 220 231 35 NA 28 5.28 2.22 23
#> 5 1999-12-31 22:00:00 4.08 200 217 41 NA 31 4.79 2.17 26
#> 6 1999-12-31 23:00:00 3.24 200 181 37 NA 28 3.48 1.78 22
# or...
data.1999 <- selectByDate(mydata, start = "1999-01-01", end = "1999-12-31")
# easier way
data.1999 <- selectByDate(mydata, year = 1999)
# more complex use: select weekdays between the hours of 7 am to 7 pm
sub.data <- selectByDate(mydata, day = "weekday", hour = 7:19)
# select weekends between the hours of 7 am to 7 pm in winter (Dec, Jan, Feb)
sub.data <- selectByDate(mydata, day = "weekend", hour = 7:19, month =
c("dec", "jan", "feb"))