Covid-19 Data Sources for Programmers

I’ve been analyzing Covid cases to try to understand what to expect in terms of lockdown length and disease progress, especially in Colorado and Brazil, the places where I spend the most time. There are a lot of data sources around, and it took me a few hours to find and test a number of them. I hope this saves time for anyone interested in crunching Covid numbers. If you have suggestions or tips on data sources, please open a PR or issue in my GitHub repo. Here we go.

John Burn-Murdoch and his team at the Financial Times have done a great job of reporting visually on the pandemic. They publish fewer and simpler charts than many other sites, but those charts are exquisitely crafted and distill a lot of data into the clearest picture available of each country’s situation, plus a few of the regional hotspots around the world.

Our World in Data is a wonderful project based at Oxford University that attempts to explain the world using rigorous data sources and beautiful charts. They have been producing a lot of great Covid content since the pandemic broke out. If you have some time, I suggest exploring the non-Covid areas of the site as well (and if you enjoy that, I highly recommend the book Factfulness). All of their work is open source.

The OWID data is in a GitHub repo. Their main source is the European CDC, which publishes confirmed cases and deaths aggregated by date and country for most of the world (not just Europe) in JSON, CSV, and XML files.
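Because these files are flat tables with one row per country per day, a common first step is to pull out a single country and smooth its daily counts, which makes the trend easier to read through day-of-week reporting noise. Below is a minimal sketch using only Python's standard library. The column names location, date and new_cases are an assumption for illustration, so check the header of the file you actually download. It also assumes one row per consecutive day.

```python
import csv
import io
from collections import deque


def rolling_new_cases(csv_text, location, window=7):
    """Return (date, trailing mean of new_cases) pairs for one location.

    Assumes hypothetical columns 'location', 'date' (ISO YYYY-MM-DD) and
    'new_cases', with one row per consecutive day and no blank counts.
    """
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r["location"] == location]
    rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
    recent = deque(maxlen=window)       # holds only the last `window` values
    result = []
    for r in rows:
        recent.append(float(r["new_cases"]))
        if len(recent) == window:       # emit once the window is full
            result.append((r["date"], sum(recent) / window))
    return result


# Usage with made-up numbers:
sample = """location,date,new_cases
Brazil,2020-04-01,10
Brazil,2020-04-02,20
Brazil,2020-04-03,30
Spain,2020-04-01,100
"""
print(rolling_new_cases(sample, "Brazil", window=3))  # [('2020-04-03', 20.0)]
```

A 7-day window is a popular default because it spans a full weekly reporting cycle; shorter windows react faster but stay noisier.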

Johns Hopkins University has built a wildly popular dashboard available in desktop and mobile versions. Their repository is public and aggregates data from a variety of sources into easy-to-use CSV files (there’s also a JSON mirror). In addition to worldwide national totals, data is available for individual US states and counties. It includes the number of cases and deaths along with recovered and active patients. Since they aggregate data from the US CDC, China CDC, European CDC and several other national institutions, this is a great way to get your hands on worldwide data.
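Some countries in aggregated datasets like this one are split into provinces or states, so you often need to sum rows to get a national figure. The sketch below assumes a "wide" layout (Province/State, Country/Region, Lat, Long, then one cumulative column per date), which is how I remember the JHU time series files, but treat that layout as an assumption and verify it against the repo before relying on it. The function name country_totals is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed number of identifier columns before the first date column.
ID_COLUMNS = 4


def country_totals(csv_text):
    """Sum a wide time series by country.

    Assumes columns Province/State, Country/Region, Lat, Long followed by
    one cumulative-count column per date label. Returns
    {country: {date_label: total}}.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[ID_COLUMNS:]
    totals = defaultdict(lambda: defaultdict(int))
    for row in reader:
        country = row[1]  # Country/Region column
        for date, value in zip(dates, row[ID_COLUMNS:]):
            totals[country][date] += int(value)  # add each province's count
    return {country: dict(series) for country, series in totals.items()}
```

Date labels are kept as the raw header strings; if they use a month/day/year format they will not sort chronologically as text, so keep the header order or parse them before sorting.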

The New York Times offers a plethora of high-quality Covid maps and visualizations. That’s no surprise given that Mike Bostock, creator of the D3.js library, used to work there. The NYT open sourced a repository providing high-quality, painstakingly verified data for US cases at both the state and county level. This is probably the best source of data for analyzing the number of cases and deaths in the US.
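Sources like this often report cumulative totals, while many analyses need daily new cases. A minimal sketch of that conversion, assuming per-state rows with date (YYYY-MM-DD), state and a cumulative cases column; the column names and the function name daily_new_cases are assumptions for illustration:

```python
import csv
import io
from collections import defaultdict


def daily_new_cases(csv_text):
    """Convert cumulative case counts into daily new cases per state.

    Assumes columns 'date' (YYYY-MM-DD), 'state' and a cumulative 'cases'
    column. Returns {state: [(date, new_cases), ...]} in date order.
    """
    by_state = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_state[row["state"]].append((row["date"], int(row["cases"])))
    result = {}
    for state, series in by_state.items():
        series.sort()  # ISO dates sort chronologically
        previous = 0   # the first reported day counts entirely as new
        daily = []
        for date, cumulative in series:
            # Data revisions can make this negative; keep it visible.
            daily.append((date, cumulative - previous))
            previous = cumulative
        result[state] = daily
    return result
```

Negative values are left in deliberately: they usually signal a correction in the source data, which is worth seeing before you smooth or plot anything.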

Another outstanding US data source is the Covid Tracking Project, powered by dozens of volunteers collecting data on the number of tests performed, positive and negative results, hospitalizations, patients in the ICU and on ventilators, and so on. They face a severe shortage of information and a complete lack of centralized reporting in the US, but they’re making the best of it. If you want to attempt more sophisticated analysis, this is a good source. But mind the gaps.
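One practical way to mind the gaps is to keep missing values explicit instead of letting them turn into zeros. The sketch below computes a daily test positivity rate and returns None whenever a count is missing or no tests were reported. The keys date, positive and negative, and the idea that they hold daily (not cumulative) counts, are assumptions for illustration; check the project's field definitions before using real data.

```python
def positivity_rate(rows):
    """Share of test results that were positive, per day, skipping gaps.

    `rows` is a list of dicts with hypothetical keys 'date', 'positive'
    and 'negative' holding daily counts, where None marks a missing value.
    Days with a missing count or zero tests map to None, not to a guess.
    """
    rates = {}
    for row in rows:
        pos, neg = row.get("positive"), row.get("negative")
        if pos is None or neg is None or pos + neg == 0:
            rates[row["date"]] = None  # a gap in the data, not zero positivity
        else:
            rates[row["date"]] = pos / (pos + neg)
    return rates
```

Returning None rather than 0 means a later plot or average has to decide how to treat the gap, instead of silently reporting a day with no data as a day with no positive tests.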

I’m sure you’ve hit Worldometer while googling Covid information. They provide encyclopedic amounts of data about Covid infection worldwide through an effective bare-bones interface with good charts. Data is aggregated by country and includes deaths and active cases, both daily and cumulative.

Finally, if you are interested in more regional data for other countries, there are great repositories for Spain and Italy. It’s not easy to aggregate UK data, but Tom White has a good repo. Álvaro Justen has done the same for Brazil, while research lab Fiocruz has a good web UI for Brazilian data.

If you know of other high-quality regional repos, please send a PR or GitHub issue. I’d love to expand this post with the best repos for each region.
