Although the NBER NPI page is no longer maintained, those files will continue to be available. That offering does not include information about providers that were not current in April of 2019.
CMS offers a complete file of currently eligible providers each month, but does not offer a file that includes the full content of older deactivated records. Here we offer a new dataset created by concatenating 15 monthly files from April 2007 to the present day, with roughly 12-month spacing between files. Our collection of monthly files was not perfectly regular; however, any provider that was active for at least a year since April 2007 will be included, along with some others. We are certainly interested in obtaining files for 2005 and 2006. Although the omission of short-lived providers may be a source of bias in certain applications, such as studies of fraudulent providers, the presence of all reasonably persistent providers with their historical data is an improvement over using only survivors. The most recent file can be downloaded from the CMS website at https://download.cms.gov/nppes/NPI_Files.html
We included a variable, source, which gives the filename of the source file for each included record. If our deduplication represents a loss of information, please contact us with an explanation and we will try to do better.
Note that there may be no record for year t if nothing changed that year. To extract all records valid for year 2018, try the following code:
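As a minimal sketch of that extraction in Python (not the authors' own code), the idea is to keep, for each provider, the most recent record dated 2018 or earlier, since a year with no change has no record of its own. The column names npi and year are assumptions made for illustration, as is the sample data; substitute the actual variable names in the files.

```python
# Hypothetical column names: "npi" identifies the provider and "year" is the
# year of the source file in which this version of the record appears.
def records_valid_in(records, target_year):
    """Return, per NPI, the latest record dated on or before target_year."""
    latest = {}
    for rec in records:
        year = int(rec["year"])
        if year > target_year:
            continue  # this version did not exist yet in target_year
        best = latest.get(rec["npi"])
        if best is None or year > int(best["year"]):
            latest[rec["npi"]] = rec
    return list(latest.values())


# Made-up rows, for illustration only.
sample = [
    {"npi": "1000000001", "year": 2015, "city": "BOSTON"},
    {"npi": "1000000001", "year": 2019, "city": "CAMBRIDGE"},
    {"npi": "1000000002", "year": 2018, "city": "AUSTIN"},
    {"npi": "1000000003", "year": 2020, "city": "DENVER"},
]
valid_2018 = records_valid_in(sample, 2018)
```

This sketch carries the last known record forward unconditionally. A provider deactivated before 2018 would still appear, so filtering on a deactivation date, if the data carries one, would be a further step.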
All files are zipped. They are very fluffy. The full file takes 86GB in Stata but only 1.8GB compressed. The core variables (all but the multiplicative variables) take 18GB in Stata, but 1.8GB compressed. Merging individual multiplicative datasets with the core dataset as needed is probably the most practical way to proceed.
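One way to picture that merge is as a left join of the core records with a single multiplicative dataset, so that only the variables actually needed are loaded. The sketch below is an illustration under stated assumptions: the join key npi and the example column taxonomy_1 are hypothetical, and the real files may need a compound key such as provider plus year.

```python
from collections import defaultdict


def attach(core_rows, extra_rows, keys=("npi",)):
    """Left-join extra_rows onto core_rows; unmatched core rows are kept."""
    index = defaultdict(list)
    for row in extra_rows:
        index[tuple(row[k] for k in keys)].append(row)
    merged = []
    for core in core_rows:
        # A core row with no match is kept once, without the extra columns.
        matches = index.get(tuple(core[k] for k in keys)) or [{}]
        for extra in matches:
            merged.append({**core, **extra})
    return merged


# Made-up rows, for illustration only.
core = [{"npi": "1000000001", "state": "MA"}, {"npi": "1000000002", "state": "TX"}]
extra = [{"npi": "1000000001", "taxonomy_1": "207Q00000X"}]
combined = attach(core, extra)
```

Joining one multiplicative dataset at a time keeps memory use close to that of the core file rather than the full 86GB file.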
In April of 2020 we were able to obtain weekly files back to March 9, 2015, suggesting that CMS retains these files for 5 years. They are not linked on the CMS website, but can be obtained by guessing the URL (which differs only in the date fields). We did not use the weekly files in this round.
The original .csv files have a header with variable descriptions that are not suitable as variable names in a database or statistical package. Therefore we have created variable names and turned the supplied header into variable labels.
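To illustrate the kind of conversion involved, here is a minimal Python sketch that derives a short identifier from each descriptive header and keeps the original text as the label. It is not the procedure actually used to name the variables in these files: the functions to_varname and name_and_label are hypothetical, and the generated names will not necessarily match the names in the dataset. The 32-character limit reflects Stata's maximum variable name length.

```python
import re


def to_varname(header, max_len=32):
    """Turn a descriptive CSV header into a lowercase identifier."""
    name = re.sub(r"[^0-9a-z]+", "_", header.lower()).strip("_")
    if not name or name[0].isdigit():
        name = "v_" + name  # identifiers cannot start with a digit
    return name[:max_len].rstrip("_")


def name_and_label(headers):
    """Map each unique variable name to its original header (the label)."""
    labels = {}
    for header in headers:
        base = to_varname(header)
        name, n = base, 1
        while name in labels:  # truncation can make two headers collide
            n += 1
            name = f"{base[:28]}_{n}"
        labels[name] = header
    return labels


labels = name_and_label(["NPI", "Provider Organization Name (Legal Business Name)"])
```

Keeping the full header as the label means no descriptive information is lost when the name is truncated.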
We have not updated the crosswalks; presumably they are unaffected, since UPINs have not been issued for many years.
We are very interested in speaking with users of this data, especially users of the older offering. Please write or call Daniel Feenberg (email@example.com, 617-863-0343). We expect to provide a more comprehensive file once we have discussed users' needs with them.