Many of our readers will have heard of the push for evidence-based policymaking at the federal level in the United States. The recent Foundations for Evidence-Based Policymaking Act and the Federal Data Strategy have given social scientists in general, and economists in particular, a new opportunity to highlight the value of their data and their empirical work. Similar opportunities have appeared in other countries.
A major challenge in highlighting the value of data, however, is that it is currently almost impossible to find out which datasets are used by which researchers on which topics. RePEc is partnering with a new initiative that combines natural language processing and machine learning techniques to automate dataset search and discovery from social science and economics publications. Starting this month, some authors will receive an email from Christian Zimmermann asking them to validate the results of the machine learning models. They can also contribute any additional links to the corpus right away at this link.
We hope eventually to automate the search and discovery of datasets and to highlight their value as a scholarly contribution, in the same way we collect information about publications and citations. The results should help government agencies understand the value of the data they produce and work with, help empirical researchers discover valuable datasets and data experts in their scientific fields, and help policy makers realize the value of supporting investments in data.
Thank you in advance for your support!
I was invited to the beta test. It really should offer something like autocomplete for standard data such as the Penn World Tables, World Bank Indicators, IPUMS, etc., and an option for unique data used only in this particular paper.
Good point, but such a list of defaults would first need to be established, and it may be quite long.