Setup a Play-by-Play Database with nflfastR
Nolan MacDonald
Source:vignettes/setup_pbp_db.Rmd
setup_pbp_db.Rmd
Install Packages
Data from nflfastR
can be stored in a database to access
locally instead of pulling play-by-play information each time. The
database takes up a significant amount of memory, as it stores data
dating back to the 1999 season. Working with the database is beneficial
as is allows you to only bring into memory the data you actually need.
Using R, it is relatively easy to work with databases as long as you are
comfortable with dplyr
functions to manipulate and tidy
data.
To use the database, the DBI
and RSQLite
packages should be installed. To install these required packages, use
the following commands in the R console:
install.packages("DBI")
install.packages("RSQLite")
Play-By-Play Database
Now that the packages have been installed, the database can be built.
nflfastR
uses the function update_db()
to work
with databases. Using update_db()
with no arguments will
build a database with default arguments, where the SQLite database will
be stored as pbp_db
that contains all of the play-by-play
data in a table called nflfastR_pbp
. To store the database
in a specified directory, use the argument dbdir
. To
specify the file name use dbname
and to change the table
name use tblname
.
If there is already an existing database and you want to rebuild it
from scratch, use the argument force_rebuild=TRUE
. This is
typically performed when there is an update to play-by-play data from
fixing a bug or you want to wipe the database and update it. If you want
to rebuild only specific seasons, you can use the command
force_rebuild=2023
or for multiple seasons,
force_rebuild=c(2022, 2023)
.
Summary of Database and Arguments:
-
update_db()
: Build or update play-by-play database-
update_db(dbdir = "path/to/save/db")
: Build play-by-play database and save to path -
update_db(dbname="db_name")
: Build play-by-play database with custom name (Default:nflfastR_pbp
) -
update_db(tblname="tblname")
: Build play-by-play database with custom table name (Default:nflfastR_pbp
) -
update_db(force_rebuild=TRUE)
: Rebuild entire play-by-play database -
update_db(force_rebuild=c(2022, 2023))
: Rebuild play-by-play database 2022-2023 seasons
-
-
Create Database
To first create and dump a database you need to load the libraries
that were recently installed. After loading the libraries,
update_db()
is utilized to create or update the database.
In this example, the database is saved in the repository’s directory,
data/pbp_db
.
Load Database
The database is pbp_db
loaded from the directory
data/pbp_db
. Once connected to the database with
dbConnect()
, the tables stored in the database are obtained
with dbListTables
. There should only be one table with the
newly constructed database, nflfastR_pbp
, as previously
discussed since the default naming schemes were used.
# Connect to database
connect <- DBI::dbConnect(RSQLite::SQLite(), "./data/pbp_db")
# List tables
tables <- DBI::dbListTables(connect)
# Close the database when finished
DBI::dbDisconnect(connect)
tables
Another option is to look at the fields (columns) that are stored in
the table. Use dbListFields()
to show the data available in
the table. For the example below, the fields are saved in a list and
head()
is used to show only the first 10 fields since there
is a large amount of data (372 fields!).
# Connect to database
connect <- DBI::dbConnect(RSQLite::SQLite(), "./data/pbp_db")
# List fields
fields <- DBI::dbListFields(connect, "nflfastR_pbp")
# Close the database when finished
DBI::dbDisconnect(connect)
head(fields, 10)
Finally, the play-by-play table is loaded from the database with
dplyr::tbl()
. Use the play-by-play information to write
code and conduct the analysis, and make sure to disconnect the database
when you are finished.
# Connect to database
connect <- DBI::dbConnect(RSQLite::SQLite(), "./data/pbp_db")
# Load pbp_db
pbp_db <- dplyr::tbl(connect, "nflfastR_pbp")
# Additional code here...
# Close the database when finished
DBI::dbDisconnect(connect)
Calling Functions with Database
Now that a database has been created and saved,
nuclearff
functions can be utilized. For example, try
obtaining advanced RB stats for a season but using the function
get_rb_adv_stats_season()
. The default arguments is to look
for a play-by-play database stored in data/pbp_db
named
pbp_db
with a table called nflfastR_pbp
. If
the user database is saved elsewhere or with different names, then
define the argument with a string. After loading the required packages,
the function can be called.
Additional Information
To determine if any data from nflverse
has been updated,
check the nflverse
Data Update and Availability Schedule.
For more information, refer to the nflfastR
package
[1] or the nflfastR documentation.