---
output: github_document
editor_options:
  markdown:
    wrap: 80
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%", fig.align = "center",
  dpi = 500
)
```
# rfars <img src="man/figures/logo.svg" align="right" width="120"/>
<!-- badges: start -->
[](https://cran.r-project.org/package=rfars)
[](https://github.com/s87jackson/rfars/actions/workflows/R-CMD-check.yaml)
[](https://CRAN.R-project.org/package=rfars)
<!-- badges: end -->
The goal of `rfars` is to facilitate transportation safety analysis by
simplifying the process of extracting data from official crash databases. The
[National Highway Traffic Safety Administration](https://www.nhtsa.gov/)
collects and publishes a census of fatal crashes in the [Fatality Analysis
Reporting
System](https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars)
and a sample of fatal and non-fatal crashes in the [Crash Report Sampling
System](https://www.nhtsa.gov/crash-data-systems/crash-report-sampling-system)
(an evolution of the [General Estimates
System](https://www.nhtsa.gov/national-automotive-sampling-system/nass-general-estimates-system)).
The [Fatality and Injury Reporting System Tool](https://cdan.dot.gov/query)
allows users to query these databases, and can produce simple tables and graphs.
This suffices for simple analysis, but often leaves researchers wanting more.
Digging any deeper, however, involves a time-consuming process of downloading
annual ZIP files and attempting to stitch them together - after first combing
through immense data dictionaries to determine the required variables and table
names.
`rfars` allows users to download the last 10+ years of FARS and GES/CRSS data
with just one line of code. The result is a full, rich dataset ready for
mapping, modeling, and other downstream analysis. Codebooks with variable
definitions and value labels support an informed analysis of the data (see
`vignette("Searchable Codebooks", package = "rfars")` for more information).
Helper functions are also provided to produce common counts and comparisons.
## Installation
You can install the latest version of `rfars` from [GitHub](https://github.com/)
with:
``` r
# install.packages("devtools")
devtools::install_github("s87jackson/rfars")
```
or the CRAN stable release with:
``` r
install.packages("rfars")
```
Then load `rfars` and some helpful packages:
```{r, echo=TRUE, warning=FALSE, message=FALSE}
library(rfars)
library(dplyr)
```
## Getting and Using Data
`get_fars()` and `get_gescrss()` are the primary functions of the `rfars`
package. They can download and process data files directly from [NHTSA's FTP
Site](https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/), load prepared
data stored on your local machine, or (as of Version 2.0) pull prepared data
from Zenodo. The data files hosted on Zenodo are stable, have DOIs, and
replicate the data that `get_fars()` and `get_gescrss()` would produce from the
raw NHTSA files, but load in a fraction of the time.
`get_fars()` and `get_gescrss()` take the parameters `years` and `states` (FARS)
or `regions` (GES/CRSS). As the source data files follow an annual structure,
`years` determines how many file sets are downloaded or loaded, and
`states`/`regions` filters the resulting dataset. Downloading and processing
these files can take several minutes. Before downloading, `rfars` informs you
that it is about to download files and asks your permission to do so. To skip
this dialog, set `proceed = TRUE`. You can use the `dir` and `cache` parameters
to save an RDS file to your local machine. The `dir` parameter specifies the
directory, and `cache` names the file (be sure to include the .rds file
extension).
Executing the code below will download the prepared FARS and GES/CRSS databases
for 2014-2023.
```{r, eval=F}
myFARS <- get_fars(proceed = TRUE)
myCRSS <- get_gescrss(proceed = TRUE)
```
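The parameters described above can be combined in a single call. The sketch
below is illustrative rather than definitive: it assumes that `years` accepts a
vector of years and that `states` accepts a full state name, and the object name
`myFARS_va` and file name `fars_va.rds` are arbitrary. Check `?get_fars` for the
accepted values.

```{r, eval=FALSE}
# Illustrative call; the years, state, and file name are arbitrary choices
myFARS_va <- get_fars(
  years   = 2021:2023,      # one annual file set per year requested
  states  = "Virginia",     # assumed input format; see ?get_fars
  proceed = TRUE,           # skip the download permission dialog
  dir     = getwd(),        # directory in which to save the RDS file
  cache   = "fars_va.rds"   # file name, including the .rds extension
)
```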
```{r, echo=FALSE}
vignette_data <- rfars:::vignette_data$myFARS_sample
```
`get_fars()` and `get_gescrss()` return a list with six dataframes: `flat`,
`multi_acc`, `multi_veh`, `multi_per`, `events`, and `codebook`.
The tables below show records for randomly selected crashes to illustrate the
content and structure of the data. The tables are transposed for readability.
Each row in the `flat` dataframe corresponds to a person involved in a crash. As
there may be multiple people and/or vehicles involved in one crash, some
variable values are repeated within a crash or vehicle. Each crash is uniquely
identified with `id`, which is a combination of `year` and `st_case`. Note that
`st_case` is not unique across years: for example, `st_case` 510001 will appear
in each year. The `id` variable attempts to avoid this issue. The GES/CRSS data
includes a `weight` variable that indicates how many crashes each row
represents.
```{r, results='asis', echo=FALSE}
vignette_data$flat %>%
  t() %>%
  knitr::kable(format = "html", caption = "The 'flat' dataframe (transposed for readability)")
```
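To see why `id` matters when counting crashes, consider the toy data below
(made-up rows, not real FARS records). The `id` values are built with `paste0()`
purely for illustration; the package's actual `id` format may differ.

```{r}
library(dplyr)

# Toy stand-in for `flat`: one row per person involved in a crash
toy_flat <- data.frame(
  year    = c(2021, 2021, 2021, 2022),
  st_case = c(510001, 510001, 510002, 510001),
  per_no  = c(1, 2, 1, 1)
) %>%
  mutate(id = paste0(year, st_case))  # hypothetical id construction

n_people     <- nrow(toy_flat)                # rows are people, not crashes
n_by_st_case <- n_distinct(toy_flat$st_case)  # undercounts: st_case repeats across years
n_crashes    <- n_distinct(toy_flat$id)       # one id per crash

c(people = n_people, by_st_case = n_by_st_case, crashes = n_crashes)
```

Counting distinct `st_case` values alone merges the 2021 and 2022 crashes that
share case number 510001, whereas counting distinct `id` values keeps them
separate.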
The `multi_` dataframes contain those variables for which there may be a varying
number of values for any entity (e.g., driver impairments, vehicle events,
weather conditions at time of crash). Each dataframe includes the variables
needed to identify its entity: `multi_acc` includes `st_case` and `year`,
`multi_veh` adds `veh_no` (vehicle number), and `multi_per` adds `per_no`
(person number).
```{r, results='asis', echo=FALSE}
vignette_data$multi_acc %>%
  arrange(st_case) %>%
  #t() %>%
  knitr::kable(format = "html", caption = "The 'multi_acc' dataframe")
```
```{r, results='asis', echo=FALSE}
vignette_data$multi_veh %>%
  arrange(st_case) %>%
  #t() %>%
  knitr::kable(format = "html", caption = "The 'multi_veh' dataframe")
```
```{r, results='asis', echo=FALSE}
vignette_data$multi_per %>%
  arrange(st_case) %>%
  #t() %>%
  knitr::kable(format = "html", caption = "The 'multi_per' dataframe")
```
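Because the `multi_` dataframes can hold several rows per crash, vehicle, or
person, joining them directly to `flat` can duplicate rows. The sketch below
shows one possible way to handle this with toy data: collapse to one row per
crash first, then join on `year` and `st_case`. The `name` and `value` columns
and their contents are hypothetical stand-ins, not the package's documented
layout.

```{r}
library(dplyr)

# Toy stand-in for `flat`: one row per person
toy_flat <- data.frame(
  year    = c(2021, 2021, 2022),
  st_case = c(510001, 510001, 510001),
  veh_no  = c(1, 2, 1),
  per_no  = c(1, 1, 1)
)

# Toy stand-in for `multi_acc`: the 2021 crash has two weather values
toy_multi_acc <- data.frame(
  year    = c(2021, 2021, 2022),
  st_case = c(510001, 510001, 510001),
  name    = c("weather", "weather", "weather"),  # hypothetical column
  value   = c("Rain", "Fog", "Clear"),           # hypothetical column
  stringsAsFactors = FALSE
)

# Collapse to one row per crash so the join cannot multiply person rows
acc_summary <- toy_multi_acc %>%
  group_by(year, st_case) %>%
  summarise(weather = paste(value, collapse = "; "), .groups = "drop")

toy_joined <- left_join(toy_flat, acc_summary, by = c("year", "st_case"))
toy_joined
```

Joining on both `year` and `st_case` matters here for the same reason `id`
exists: `st_case` alone would match the 2021 and 2022 crashes to each other.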
The `events` dataframe provides a sequence of events for each vehicle in each
crash. See `vignette("Crash Sequence of Events", package = "rfars")` for more
information.
```{r, results='asis', echo=FALSE}
vignette_data$events %>%
  arrange(st_case) %>%
  #t() %>%
  knitr::kable(format = "html", caption = "The 'events' dataframe")
```
The `codebook` dataframe provides a searchable codebook for the data, useful if
you know what concept you're looking for but not the variable that describes it.
`rfars` also includes pre-loaded codebooks for FARS and GES/CRSS
(`rfars::fars_codebook` and `rfars::gescrss_codebook`). See
`vignette("Searchable Codebooks", package = "rfars")` for more information.
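A keyword search is one simple way to go from a concept to a variable name. The
sketch below uses a toy codebook whose column names (`variable`, `label`) and
rows are hypothetical; inspect `names(myFARS$codebook)` to find the real column
names before adapting it.

```{r}
library(dplyr)

# Toy stand-in for a codebook; columns and rows are made up for illustration
toy_codebook <- data.frame(
  variable = c("weather", "alc_res", "drinking"),
  label    = c("Atmospheric Conditions",
               "Alcohol Test Result",
               "Police Reported Alcohol Involvement"),
  stringsAsFactors = FALSE
)

# Case-insensitive search of the labels for a concept of interest
alcohol_vars <- toy_codebook %>%
  filter(grepl("alcohol", label, ignore.case = TRUE)) %>%
  pull(variable)

alcohol_vars
```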
## Counts
See `vignette("Counts", package = "rfars")` for information on the pre-loaded
`annual_counts` dataframe and the `counts()` and `compare_counts()` functions.
Also see `vignette("Alcohol Counts", package = "rfars")` for details on how BAC
values are imputed and reported in *Traffic Safety Facts*.
## Helpful Links
- [National Highway Traffic Safety Administration
(NHTSA)](https://www.nhtsa.gov/)
- [Fatality Analysis Reporting System
(FARS)](https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars)
- [Fatality and Injury Reporting System Tool
(FIRST)](https://cdan.dot.gov/query)
- [FARS Analytical User's
Manual](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813794)
- [General Estimates System
(GES)](https://www.nhtsa.gov/national-automotive-sampling-system/nass-general-estimates-system)
- [Crash Report Sampling System
(CRSS)](https://www.nhtsa.gov/crash-data-systems/crash-report-sampling-system)
- [CRSS Analytical User's
Manual](https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813796)
- [NCSA and Other Data
Sources](https://cdan.dot.gov/Homepage/MotorVehicleCrashDataOverview.htm)
- [NHTSA FTP Site](https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/)
```{r, include=FALSE}
unlink(paste0(getwd(),"/FARS data"), recursive = TRUE)
```