The competition runs from 16.10 to 30.10 (both dates included).
The report is due on 30.10.
Remember to submit each forecast at least one day before the date for which it is intended, by 11pm at the latest.
Send your forecast before 8pm to receive feedback.
Use our discussion forum for questions related to methods, implementations and potential bugs.
The goal of this competition is to forecast the number of times an R package will be downloaded from CRAN (one of the major repositories for R packages). We are interested in particular in the fpp2 package.
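The accuracy of the forecasts is measured by the mean squared error, that is, the average over the forecast days of the squared errors $$(x_t - \hat{x}_t)^2,$$ where $x_t$ is the observed number of downloads on day $t$ and $\hat{x}_t$ is the corresponding forecast.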
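As a minimal sketch of how this criterion can be computed in R, the snippet below averages the squared errors over a few days. The function name `mean_squared_error` and the numbers are illustrative assumptions, not real download counts or the official scoring code.

```r
# Hypothetical helper: mean of the squared errors (x_t - x_hat_t)^2
mean_squared_error <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  mean((actual - forecast)^2)
}

# Illustrative values only (not real download counts)
actual   <- c(100, 120, 90)
forecast <- c(110, 115, 95)

mse_value <- mean_squared_error(actual, forecast)
print(mse_value)
```

Because the errors are squared, a single badly missed day weighs much more than several small misses.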
The objectives of this case study are the following:
The number of downloads of packages on CRAN is easily obtained using the cranlogs package.
To install it, you simply need to install the package of the course; the cranlogs package will then be installed automatically.
Otherwise, you can follow this example:
# if not installed, install the package of the course
devtools::install_github("SMAC-Group/fc2018")
# install the cranlogs package
devtools::install_github("metacran/cranlogs")
You can check the GitHub page of this package for more information and examples. Here are some examples:
# load the cranlogs package
library(cranlogs)
# the total number of downloads of all packages from last week
cran_downloads(when = "last-week")
# the total number of downloads of all packages for a specific window of time
cran_downloads(from = "2017-08-01", to = "2018-08-01")
# the total number of downloads of `ggplot2` for a specific window of time
cran_downloads(from = "2018-07-01", to = "2018-08-01", packages = "ggplot2")
Let’s turn our attention to the fpp2 package, which is the one we will forecast.
If you check the package’s archives, you will see that it was created on 23 February 2017, so there is no point in retrieving data from before that date. Let’s obtain the data up to today (12 September 2018) and plot it:
library(ggplot2)
# daily downloads of fpp2 since its creation
fpp2 <- cran_downloads(packages = "fpp2", from = "2017-02-23", to = "2018-09-12")
# plot the time series
ggplot(fpp2, aes(x = date, y = count)) +
  geom_line() +
  xlab("Dates") +
  ylab("Number of downloads on CRAN") +
  ggtitle("Number of downloads of 'fpp2'")
You can notice peaks and zeros in the series. These issues will be addressed in a future lecture.
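Until then, one simple way to locate such days is sketched below. It is only an assumption about how one might proceed, not the method of the course: the helper `flag_anomalies`, the threshold `k = 5` and the toy counts are all hypothetical. A day is flagged as unusually high when it exceeds the median by more than `k` times the median absolute deviation.

```r
# Hypothetical helper: flag zero-download days and unusually high days.
# A day is flagged as high when its count exceeds median + k * MAD.
# If the MAD is 0, every count above the median is flagged.
flag_anomalies <- function(counts, k = 5) {
  centre <- median(counts)
  spread <- mad(counts)  # median absolute deviation (scaled by 1.4826)
  data.frame(
    count   = counts,
    is_zero = counts == 0,
    is_peak = counts > centre + k * spread
  )
}

# Toy series (illustrative values, not real download counts)
toy_counts <- c(120, 135, 0, 128, 140, 900, 131, 125)
flags <- flag_anomalies(toy_counts)
print(flags)
```

With the real data, the same function could be applied to `fpp2$count`. Whether flagged days should be corrected, removed or modelled explicitly is a separate question.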
Date | Max | Lumia |
---|---|---|
16.10 | 77.31 | 131.07 |
17.10 | 91.73 | 84.98 |
18.10 | 110.93 | 268.83 |
19.10 | $\emptyset$ | 104.14 |
20.10 | 0.01 | 22.87 |
21.10 | 147 | 100 |
22.10 | $\emptyset$ | 128.33 |
23.10 | 20.17 | 7.40 |
24.10 | 78.74 | 97 |
25.10 | $\emptyset$ | 55 |
26.10 | 18.47 | 155.89 |
27.10 | 156.78 | 11 |
28.10 | 20.77 | 30 |
Table: Absolute difference between forecast and realization.
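To compare the two columns at a glance, the table can be summarised in R as sketched below. The values are copied from the table. Treating $\emptyset$ as a missing forecast (`NA`) is an assumption, as are the object names.

```r
# Absolute differences copied from the table; the empty-set symbol is
# assumed to mean "no forecast" and is encoded as NA.
abs_diff <- data.frame(
  date  = c("16.10", "17.10", "18.10", "19.10", "20.10", "21.10", "22.10",
            "23.10", "24.10", "25.10", "26.10", "27.10", "28.10"),
  Max   = c(77.31, 91.73, 110.93, NA, 0.01, 147, NA,
            20.17, 78.74, NA, 18.47, 156.78, 20.77),
  Lumia = c(131.07, 84.98, 268.83, 104.14, 22.87, 100, 128.33,
            7.40, 97, 55, 155.89, 11, 30)
)

scores <- abs_diff[, c("Max", "Lumia")]

# Number of days with a recorded forecast and mean absolute difference
n_forecasts   <- colSums(!is.na(scores))
mean_abs_diff <- colMeans(scores, na.rm = TRUE)

print(n_forecasts)
print(round(mean_abs_diff, 2))
```

Note that averaging only over the available days favours a column with missing entries if the missed days were hard to forecast, so the two means are not strictly comparable.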