I am a big fan of the data.table package in R for data manipulation. I use the csv reader function fread all the time and enjoy the concise syntax for many basic queries of the data. There are some special symbols associated with the package that I’ve used before, especially .N and .SD. However, I had never used the .GRP symbol until I faced a particular query for a class assignment recently. I am documenting how I used it in this post.
The data come from the pnwflights14 package, which is modeled on the nycflights13 package.
library(data.table)
data("flights", package = "pnwflights14")
setDT(flights)
# just analyze flights originating from Portland
flights <- flights[origin == "PDX"]
The analysis question is: by month, which routes were added or dropped compared with the previous month? Only the dest and month columns are necessary to answer this query. The special symbol .GRP is an integer counter for the current group in a grouped data.table query: 1 for the first group, 2 for the second, and so on. Because I grouped the data by month in ascending order, and the groups run from January onward without gaps, .GRP matches the month number of the current group, so .GRP - 1 identifies the previous month for comparison.
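To make the behaviour of .GRP concrete, here is a minimal sketch on a made-up table (the name toy and its values are invented for illustration and are not part of the flights data). It suggests that .GRP is simply a running group counter, so it only coincides with the month number when the groups start at month 1 and have no gaps.

```r
library(data.table)

# Toy data, invented for illustration: only months 3, 5 and 6 are present
toy <- data.table(
  month = c(3L, 3L, 5L, 6L, 6L),
  dest  = c("SEA", "SFO", "SEA", "SFO", "LAX")
)

# .GRP numbers the groups in the order they appear (1, 2, 3, ...),
# regardless of the value of the grouping column
grp_index <- toy[order(month), .(grp = .GRP, n_flights = .N), by = month]
print(grp_index)
```

In this toy table the counter runs 1, 2, 3 while the months are 3, 5, 6, so the two would not line up. In the flights data they do, because every month from January to December is present.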
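# routes dropped: destinations served in the previous month (.GRP - 1) but not in the current month
dropped <- flights[order(month), .(dropped = setdiff(flights[month == .GRP - 1, dest], dest)), by = month]
# routes added: destinations served in the current month but not in the previous one;
# January is excluded because it has no previous month in the data
added <- flights[order(month), .(added = setdiff(dest, flights[month == .GRP - 1, dest])), by = month][month != 1]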
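For comparison, the same question can also be approached without relying on .GRP lining up with the month number. The sketch below is not the method used in this post: it is an alternative based on data.table anti-joins, run on a made-up table (toy, routes, prev, added_toy and dropped_toy are all hypothetical names) that is small enough to check by hand.

```r
library(data.table)

# Toy data, invented for illustration
toy <- data.table(
  month = c(1L, 1L, 2L, 2L, 3L, 3L, 3L),
  dest  = c("SEA", "SFO", "SEA", "LAX", "SEA", "LAX", "SFO")
)

# One row per (month, dest) route
routes <- unique(toy[, .(month, dest)])

# Previous month's routes, relabelled with the month they are compared against
prev <- routes[, .(month = month + 1L, dest)]

# Added: served this month but not in the previous one
# (the first month has no predecessor, so it is excluded)
added_toy <- routes[!prev, on = .(month, dest)][month != min(routes$month)]

# Dropped: served in the previous month but not this one
# (rows for the month after the last observed month are excluded)
dropped_toy <- prev[!routes, on = .(month, dest)][month <= max(routes$month)]

print(added_toy)
print(dropped_toy)
```

Here the comparison is made on the month values themselves, so it should behave the same way even if the data did not start in January. The trade-off is that it is a little longer than the .GRP one-liners.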
I used the excellent gt package for creating a formatted table.
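The post does not show how the long results were reshaped before formatting, so the following is only a sketch of one way it might be done with data.table. The object names are hypothetical, and the values are the Feb, Mar and May rows from the table in this post.

```r
library(data.table)

# Long-format results with the same shape as `dropped` and `added`
dropped_long <- data.table(month = c(2L, 5L, 5L), dropped = c("ABQ", "KOA", "LIH"))
added_long   <- data.table(month = c(3L, 3L), added = c("BOI", "ABQ"))

# Collapse each month's destinations into one comma-separated string
dropped_wide <- dropped_long[, .(Dropped = toString(dropped)), by = month]
added_wide   <- added_long[, .(Added = toString(added)), by = month]

# Full outer join keeps months that appear in only one of the two tables
changes <- merge(dropped_wide, added_wide, by = "month", all = TRUE)

# Add the abbreviated month name used as the row label
changes[, Month := month.abb[month]]
print(changes)
```

A table like changes could then be passed to gt::gt() for formatting. The specific gt styling calls behind the published table are not given in the post, so they are left out here.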
Routes added and dropped

Changes in destinations from previous month¹

Month | Dropped | Added |
---|---|---|
Feb | ABQ | — |
Mar | — | BOI, ABQ |
Apr | — | PHL |
May | KOA, LIH | — |
Jun | — | FAI, HOU, AUS, BWI, STL |
Jul | BOI, RNO, PSP, LMT | — |
Aug | — | — |
Sep | HOU, AUS | — |
Oct | RDM, BWI, EUG, FAI, STL | PSP |
Nov | PHL | KOA, LIH |
Dec | — | — |

Source: pnwflights R package

¹ For flights departing PDX in the year 2014.