Data
There are a ton of places online to find data related to education, public policy, and administration (as well as data on pretty much any topic you want):
Data is Plural newsletter: Jeremy Singer-Vine sends a weekly newsletter of the most interesting public datasets he’s found. You should subscribe to it. He also has an archive of all the datasets he’s highlighted.
Google Dataset Search: Google indexes thousands of public datasets; search for them here.
J-PAL Catalog of Administrative Datasets: To assist researchers in screening potential data sources, J-PAL North America has cataloged a number of key US data sets.
Kaggle: Kaggle hosts machine learning competitions where people compete to create the fastest, most efficient, most predictive algorithms. A byproduct of these competitions is a host of fascinating datasets that are generally free and open to the public. See, for example, the European Soccer Database, the Salem Witchcraft Dataset or results from an Oreo flavors taste test.
360Giving: Dozens of British foundations follow a standard file format for sharing grant data and have made that data available online.
US City Open Data Census: More than 100 US cities have committed to sharing dozens of types of data, including data about crime, budgets, campaign finance, lobbying, transit, and zoning. This site from the Sunlight Foundation and Code for America collects this data and rates cities by how well they’re doing.
The Stanford Education Data Archive (SEDA) is “an initiative aimed at harnessing data to help us—scholars, policymakers, educators, parents—learn how to improve educational opportunity for all children.”
The Common Core of Data (CCD) is “the Department of Education’s primary database on public elementary and secondary education in the United States.”
Data from large-scale assessments such as NAEP, PISA, TIMSS, and PIRLS.
Replication files: Many academic journals now require that data (and code) be posted online, along with an article.
- Journal websites: If you read an academic article online, look for “supplemental materials” or the article’s “data appendix”. Here is a list of journals you may find helpful; many of them require data and code deposits (or strongly encourage them).
- General interest journals, including Nature and its partner journals Nature Human Behaviour and npj Science of Learning, as well as Science, PLOS ONE, and PNAS
- Psychology journals, including Psychological Science, Journal of Experimental Psychology, and Collabra
- Education policy journals, including the Journal of Research on Educational Effectiveness and Educational Evaluation and Policy Analysis
- Economics journals, including the American Economic Review, Quarterly Journal of Economics, AEJ: Applied Economics, AEJ: Economic Policy, AER: Insights, Economic Journal, Journal of the European Economic Association, Journal of Development Economics, and Journal of Human Resources
- Open Science Framework: Many researchers use this platform to register their studies; some upload articles and datasets. The OSF website includes a search function as well.
- The World Bank maintains a Reproducible Research Repository with packages from its Policy Research Working Papers, journal articles, and reports.
- The AEA RCT Registry allows you to download its metadata, so you can filter the Registry for education experiments with publicly available data.
- Harvard Dataverse: This repository includes thousands of datasets from academic articles, including the repository for articles by J-PAL affiliates.
- Many researchers also post links to data and code on their websites—take a look at their lists of publications. Here is mine.
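If you want to try the AEA RCT Registry tip mentioned in the list above, here is a minimal sketch of the filtering step in Python. It assumes the metadata has been saved as a CSV file. The column names `Keywords` and `Public data URL` are hypothetical placeholders (check the header row of the file you actually download), and the three sample rows are made up purely so the example runs on its own.

```python
import csv
import io

# Hypothetical column names: replace them with the real headers
# from the metadata file you download.
KEYWORD_COLUMN = "Keywords"
DATA_URL_COLUMN = "Public data URL"


def trials_with_public_data(rows, topic="education"):
    """Keep rows whose keywords mention `topic` and whose data URL is non-empty."""
    matches = []
    for row in rows:
        keywords = (row.get(KEYWORD_COLUMN) or "").lower()
        data_url = (row.get(DATA_URL_COLUMN) or "").strip()
        if topic in keywords and data_url:
            matches.append(row)
    return matches


# A tiny made-up sample standing in for the downloaded metadata.
SAMPLE_CSV = """Title,Keywords,Public data URL
Trial A,"education, labor",https://example.org/a
Trial B,health,https://example.org/b
Trial C,Education,
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = trials_with_public_data(sample_rows)
print([row["Title"] for row in selected])  # ['Trial A']
```

To run this on a real download, you would pass `csv.DictReader(open(path, newline="", encoding="utf-8"))` to the function instead of the sample rows. Matching on a keyword substring is a deliberately crude choice; the actual file may offer a cleaner field to filter on.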