Department of Transportation Airline DB1A and DB1B Data
Downloads
Important Information
The data available on this page are only for academic research. Use of
these data for consulting or other for-profit activities is not permitted
under the terms of use.
The domestic DB1A/DB1B data are now available without charge from the
Department of Transportation website for 1993-present.
NOTE: The Data Bank 1A was replaced in 2003 by Data Bank 1B.
These datasets are identical except for (1) DB1B identifies both
the operating and the ticketing carrier while DB1A assumed they were
the same and (2) DB1B has changed the codes for seat classes. The
data sets here are based on the DB1A data through 2003Q4 and DB1B
data starting with 2003Q1, and use consistent codes for seat classes.
Below they are referred to collectively
as DB1A/DB1B. Both are a 10% sample of nearly all tickets sold,
where selection of the sample is based on the last digit of the
ticket identifier being a zero.
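The sampling rule can be sketched in a few lines of Python. The function name and the example ticket numbers are hypothetical; the only fact taken from the text above is that a ticket enters the sample when its identifier ends in zero.

```python
def in_sample(ticket_id):
    """Return True if a ticket identifier ends in the digit 0."""
    return str(ticket_id).strip().endswith("0")

# Hypothetical ticket numbers, for illustration only.
tickets = ["0012345670", "0012345671", "0012345680"]
sampled = [t for t in tickets if in_sample(t)]
```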
All of the data available here are translations/refinements of the
U.S. Department of Transportation's O&D Data Bank 1A Ticket Dollar
Value Database (DB1A and DB1B) by Severin Borenstein.
The data available here are only for domestic U.S. tickets. The full
DB1A/DB1B includes a sample of international tickets for U.S. carriers,
but access to those data requires being a U.S. citizen and receiving
authorization.
You can purchase the full DB1A/DB1B data directly from the DOT (if
you are a U.S. citizen and after you receive authorization from DOT) for a
cost of about $350 per CD. Many quarters of data fit on a CD. For more
information on this, go to the DOT's airline data webpage. Also, for more
information on the DB1A/DB1B, go to the DOT's website. The DB1A/DB1B is a
quarterly dataset. The quarter in which a ticket appears in the sample is
based on the first date of travel of the ticket. There is no further
information in the dataset about dates of travel.
The data available here cover 1979Q1 through 2016Q3; however,
there are data reliability issues for the first few years, with the worst
being 1980Q4 when EA and DL appear to have significantly under-reported.
Through much of the 1980s there are airport reporting problems with some
airlines reporting city names rather than the particular airport (e.g.,
NYC rather than LGA) for some of their tickets. These problems are not
corrected in these files.
There are two forms of the data available here:
1. Quarterly Ticket-Level Domestic DB1A/DB1B
This is a translation of the DOT DB1A/DB1B that drops the more
unusual tickets (e.g., any ticket with more than 4 coupons, one-way
ticket with more than 2 coupons) and all international tickets.
Size: About 100-200 MB per quarter uncompressed. About 10-30 MB per
quarter zipped.
Each file is in fixed format with the following record layout:
1 = Point of Purchase (B=Base Airport, R=Reference Airport)
2-4 = 3-letter code for base airport
5-7 = 3-letter code for reference airport
8-10 = 3-letter code for change-of-plane airport if there is one
11-12 = 2-letter code for first-segment carrier
13-14 = 2-letter code for second-segment carrier
15-16 = 2-letter fare code for first segment
17-18 = 2-letter fare code for second segment
19-22 = Fare
23-26 = First-segment distance
27-30 = Second-segment distance
31-34 = Base-to-reference nonstop distance
35-40 = Number of passengers
41-42 = Reporting carrier (assumed to be first segment carrier in DB1A)
43-45 = 3-digit numerical code for base airport (see airports.lst)
46-48 = 3-digit numerical code for reference airport (see airports.lst)
49-51 = 3-digit numerical code for change-of-plane airport (see airports.lst)
52 = Ticket type code
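For readers not using the Stata dictionary file, the layout above can be read with plain string slicing. The following is a minimal Python sketch, not the author's code. The field names are invented for illustration, the sample line is made up, and the sketch assumes that fare, distances and passenger counts are stored as whole numbers padded with blanks.

```python
# Column ranges follow the record layout above (1-based, inclusive).
# Field names are hypothetical; only the positions come from the layout.
FIELDS = [
    ("point_of_purchase", 1, 1),
    ("base_airport", 2, 4),
    ("ref_airport", 5, 7),
    ("cop_airport", 8, 10),
    ("carrier1", 11, 12),
    ("carrier2", 13, 14),
    ("fare_code1", 15, 16),
    ("fare_code2", 17, 18),
    ("fare", 19, 22),
    ("dist1", 23, 26),
    ("dist2", 27, 30),
    ("nonstop_dist", 31, 34),
    ("passengers", 35, 40),
    ("reporting_carrier", 41, 42),
    ("base_num", 43, 45),
    ("ref_num", 46, 48),
    ("cop_num", 49, 51),
    ("ticket_type", 52, 52),
]

# Assumption: these fields hold integers (blank is treated as 0).
NUMERIC = {"fare", "dist1", "dist2", "nonstop_dist", "passengers"}


def parse_record(line):
    """Split one 52-character fixed-format line into a dict of fields."""
    line = line.rstrip("\r\n").ljust(52)
    rec = {}
    for name, start, end in FIELDS:
        raw = line[start - 1:end].strip()
        rec[name] = (int(raw) if raw else 0) if name in NUMERIC else raw
    return rec


# Made-up one-coupon record, assembled field by field so widths are visible.
sample = "".join([
    "B", "ORD", "LGA", "   ", "UA", "  ", "Y ", "  ",
    " 250", " 733", "   0", " 733", "     3",
    "UA", "001", "002", "000", "O",
])
rec = parse_record(sample)
```

To read a whole quarter, the same function could be applied to each line of the unzipped file.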
SCREENING
The following records are dropped from the original DB1A:
- Any record that includes an airport outside the 50-state U.S.
- Any record with more than 4 coupons
- Any one-way ticket with more than 2 coupons and any 3 or 4
coupon ticket with more than two trip-break points
OTHER NOTES:
- Round-trip and open-jaw tickets are broken into two records,
one for each directional trip. For round-trip (closed-jaw)
tickets, the fare in DB1A is divided in half for each of the
directional trips. For open-jaw tickets, fare is divided in
proportion to the ticket miles in each of the directional trips.
- Some records contain airport codes that are not included in the DOT's
Database 5 or for which no location information is present. For these
records, no distances are calculated. The records are still included
with the 3-letter airport codes, but distances are set to 0.
For these records, round-trip open-jaw fares are divided in half,
rather than weighted by share of distance, since distance is not known.
- No screening of data based on fare reasonableness has been
done.
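The fare-splitting rules in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the files: the function name and arguments are hypothetical, and a distance of 0 in either direction is taken to mean that distance is unknown.

```python
def split_fare(total_fare, out_miles, ret_miles, open_jaw=False):
    """Return (outbound_fare, return_fare) for a two-direction ticket."""
    # Closed-jaw round trips are split evenly. Open-jaw tickets fall back
    # to an even split when distance is unknown (assumed here to be
    # signalled by a distance of 0).
    if not open_jaw or out_miles == 0 or ret_miles == 0:
        half = total_fare / 2
        return half, half
    # Open-jaw tickets with known distances: weight by share of miles.
    share = out_miles / (out_miles + ret_miles)
    return total_fare * share, total_fare * (1 - share)


round_trip = split_fare(400, 1000, 1000)
open_jaw = split_fare(400, 1000, 3000, open_jaw=True)
unknown = split_fare(400, 0, 0, open_jaw=True)
```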
TRIP TYPE CODES
- O = One-way ticket.
- R = One direction of a round-trip ticket.
- U = One direction of an "unbalanced" ticket (i.e., round-trip ticket
with 2 coupons in one direction and 1 coupon in the other direction).
- I = Interline ticket. Change of carrier within at least one of the
directional trips on the ticket. Used only on round-trip tickets,
because interline on one-way tickets is evident from carrier listings.
Note: Not used if outbound trip entirely on one carrier and return
entirely on a different carrier. Such tickets are not distinguishable
from one-carrier round-trip tickets in this dataset.
[Supersedes U or R].
- J = Open Jaw ticket. Trip destination on second directional trip of
the ticket is not equal to trip origination on first directional trip
of the ticket. [Supersedes U, R or I].
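The precedence among the codes above (J over I, I over U or R) can be written as a short decision chain. The Python sketch below is illustrative only; the boolean flags are hypothetical inputs that a reader would have to derive from the original ticket, since they are not fields in the files offered here.

```python
def trip_type(one_way, open_jaw=False, interline=False, unbalanced=False):
    """Return the one-letter trip type code for a directional record."""
    if one_way:
        return "O"
    if open_jaw:      # J takes precedence over U, R and I
        return "J"
    if interline:     # I takes precedence over U and R
        return "I"
    if unbalanced:
        return "U"
    return "R"


codes = [
    trip_type(one_way=True),
    trip_type(one_way=False),
    trip_type(one_way=False, unbalanced=True),
    trip_type(one_way=False, unbalanced=True, interline=True),
    trip_type(one_way=False, interline=True, open_jaw=True),
]
```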
In addition to the fixed format dataset, each zip file contains four more files:
- AIRPORTS.LST is a listing of U.S. domestic airports in the order that
corresponds to the numerical codes for airports in the dataset (e.g.,
ORD is the first airport listed and has the numerical code 001). The
ordering is based on a list of airports by airport size from the 1980s.
- AIRPORTCODES.DOCX is a listing of all airports worldwide and their
codes as of 2019. This is taken from http://airportcodes.org .
- NBOE.DCT is a Stata dictionary file that reads the fixed format data
set into Stata.
- NBOEYYQ.RPT is a file for quarter YYQ that reports various statistics
on the production of this file from the full DB1A/DB1B, including the
share of tickets that fall into various categories.
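Because the numerical airport codes are simply positions in AIRPORTS.LST, a lookup table can be built from the airport codes in file order. The Python sketch below assumes the 3-letter codes have already been extracted from the file into a list; the exact layout of AIRPORTS.LST is not described here, so that step is left out. Apart from ORD, the entries in the example list are placeholders.

```python
def numeric_airport_codes(airports_in_file_order):
    """Map zero-padded 3-digit codes ("001", "002", ...) to airport codes."""
    return {
        f"{position:03d}": code
        for position, code in enumerate(airports_in_file_order, start=1)
    }


# ORD is first per the description above; "AAA" and "BBB" are placeholders.
lookup = numeric_airport_codes(["ORD", "AAA", "BBB"])
```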
2. Aggregation of Domestic DB1A/DB1B into Market-Carrier Dataset
This is a relatively compact summary of the domestic DB1A/DB1B.
It compresses all O&D information for a carrier on a route into
one record, giving direct and change-of-plane passenger counts and
average fares for the given carrier on the route. The Market Data
file is available as a Stata dataset. It is created from the
domestic airline ticket data from the DOT DB1A/DB1B, a 10% sample of
all tickets collected by US carriers.
Size: About 500 MB uncompressed for all data, 1979Q1 to 2016Q3
(200 MB compressed).
OTHER NOTES
- Tickets with an international segment are excluded.
- First-class tickets are excluded.
- Tickets must be one-way or round-trip; open-jaw, circle trips, etc.
are excluded.
- A ticket must have no more than 2 coupons for a one-way trip, no more
than 4 coupons (and no more than 2 coupons each way) for a round-trip
ticket.
- Tickets with fares less than $20 or above $9998 are excluded.
- Tickets with fares more than 5 times USDOT's Standard Industry Fare
Level for observed trip distance during observed quarter are excluded.
- Records are for one-way trips, so round-trip tickets are split into
two one-way observations.
- route is a pair of airports without regard to direction.
- carrier-set is one carrier and a blank if the trip is one-coupon. If
the trip is two-coupon, carrier-set is the pair of airlines with codes
listed in alphabetical order. Information about the order of flights is
not retained.
- dir-cop is a distinction between one-coupon (direct) and two-coupon
(change-of-plane) tickets. On two-coupon tickets, the location of the
change-of-plane is not retained, though the average total routing distance
for all c-o-p tickets collapsed into a single record is reported.
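The fare screens and the direction-free keys described in the notes above can be sketched in Python as follows. This is an illustration, not the code used to build the dataset. The function names are hypothetical, and sifl_fare stands for the Standard Industry Fare Level fare for the trip's distance and quarter, which the caller must supply because its formula is not given here.

```python
def passes_fare_screens(fare, sifl_fare):
    """Apply the $20 / $9998 bounds and the 5-times-SIFL ceiling."""
    return 20 <= fare <= 9998 and fare <= 5 * sifl_fare


def route_key(airport_a, airport_b):
    """Airport pair without regard to direction (alphabetical order)."""
    return tuple(sorted((airport_a, airport_b)))


def carrier_set(first_carrier, second_carrier=""):
    """One carrier plus a blank for one-coupon trips; otherwise the pair
    of carriers in alphabetical order (flight order is not retained)."""
    if not second_carrier:
        return (first_carrier, "")
    return tuple(sorted((first_carrier, second_carrier)))


key = route_key("ORD", "LGA") + carrier_set("UA", "DL")
```

With these helpers, ORD-LGA and LGA-ORD trips on the same carriers collapse to the same key, which is the behavior the notes describe.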
Record Layout:
yr -- year
qtr -- quarter
ap1 -- 3-letter alpha code of the first airport (by alphabetical ordering)
ap2 -- 3-letter alpha code of the second airport (by alphabetical
ordering) - blank if one-coupon ticket
cr1 -- 2-letter alpha code of the first carrier (by alphabetical ordering)
cr2 -- 2-letter alpha code of the second carrier (by alphabetical
ordering) - blank if one-coupon ticket
pax -- number of passengers reported in record
nsdst -- non-stop distance from airport ap1 to airport ap2
avdst -- average total routing distance of passengers in this record -
equal to nsdst if one-coupon ticket
avprc -- average one-way equivalent price paid by passengers reported in
this record
cop -- 0 if one-coupon ticket, 1 if two-coupon ticket
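As one example of working with this layout, the Python sketch below combines several records for the same route and quarter into a passenger-weighted average price and a change-of-plane share. The records are made up and the function name is hypothetical; only the variable names pax, avprc and cop come from the layout above.

```python
def route_summary(records):
    """Combine market-carrier records into route-level totals."""
    total_pax = sum(r["pax"] for r in records)
    # Weight each record's average price by its passenger count.
    avg_price = sum(r["pax"] * r["avprc"] for r in records) / total_pax
    cop_pax = sum(r["pax"] for r in records if r["cop"] == 1)
    return {
        "pax": total_pax,
        "avprc": avg_price,
        "cop_share": cop_pax / total_pax,
    }


# Made-up records for a single route and quarter.
records = [
    {"cr1": "UA", "cr2": "", "pax": 100, "avprc": 200.0, "cop": 0},
    {"cr1": "DL", "cr2": "UA", "pax": 50, "avprc": 140.0, "cop": 1},
]
summary = route_summary(records)
```

Averaging avprc without the pax weights would overstate the influence of thinly traveled carrier-sets.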
Questions may be sent to data@nber.org
Last modified 8 December 2019 by drf