RePEc in June 2022

July 11, 2022

CitEc is continuing a remarkable effort at citation matching, adding several million in a month. We welcomed a few new archives: International Association on Public and NonProfit Marketing, Yildiz Social Science Review, Bulletin of Political Economy, Spanish National Markets and Competition Commission (CNMC), Universidad ORT Uruguay. We counted 428,920 file downloads and 1,735,689 abstract views last month. And we reached the following milestones:
25,000,000 matched citations
1,000,000 journal articles with extracted references
600,000 working papers with extracted references
30,000 books available online
30,000 book chapters with citations
20,000 books with citations

How publishers can ensure their data looks right on RePEc

July 4, 2022

All material indexed in RePEc is provided by the respective publishers. They make this information available using a metadata syntax that RePEc defined in 1997 and that has not changed since, except for a few additions. Adhering to this syntax is important: errors disqualify items from indexing, and other problems may lead to various issues. If something is amiss or missing, every IDEAS or EconPapers page lists an email contact for alerting the maintainer of the relevant data.

That said, RePEc helps maintainers in various ways so that they can proactively address any problems. Each month they receive an email with various statistics and a link to their “problems” page on the EconPapers checker (add the three-letter archive code to the URL to get more details), which shows data download problems, detected syntax issues, and bad URLs to full text. EconPapers and IDEAS also provide FAQs. Re-reading the initial setup instructions or the ones for new maintainers can also prove useful.
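For readers who have not looked at the metadata syntax in a while, the Python sketch below shows roughly what a record looks like and how it breaks down into fields. It is an illustration only: the archive code "aaa", the series "wpaper", the names, and the simplified parsing rules are all assumptions made for this example, and this is not the reader that the RePEc services actually use.

```python
import re

# Illustrative record: archive code "aaa", series "wpaper" and all values are made up.
SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Author-Name: Jane Doe
Author-Name: John Smith
Title: An Example Working Paper
Abstract: First line of the abstract
 continued on a second line.
Creation-Date: 2022-06
Number: 001
Handle: RePEc:aaa:wpaper:001
"""

# Simplifying assumption: a field starts at column 0 with "Name:"; anything else
# that is not blank is treated as a continuation of the previous field.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_templates(text):
    """Split text into templates, each a list of (field, value) pairs."""
    templates = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            name, value = match.group(1), match.group(2).strip()
            if name.lower() == "template-type":
                templates.append([])  # every record starts with Template-Type
            if templates:
                templates[-1].append((name, value))
        elif line.strip() and templates:
            # Continuation line: append it to the value of the last field seen.
            name, value = templates[-1][-1]
            templates[-1][-1] = (name, (value + " " + line.strip()).strip())
    return templates


record = parse_templates(SAMPLE)[0]
authors = [value for name, value in record if name == "Author-Name"]
print(authors)
```

Note that each author sits in a separate Author-Name field and that the handle appears exactly once per record; the points further down rely on both.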

The most frequent issues that appear in the EconPapers checker are:

  • RePEc archive has moved from http to https: the maintainer needs to change the URL line in the archive template and alert someone in the RePEc team about the new location to fix the download process.
  • A series or journal is missing the corresponding series template.
  • A handle (identifier) is used multiple times. Handles are supposed to uniquely and permanently define any item in RePEc. Re-using them is a source of major problems.
  • Missing end-of-line that merges two fields.
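Maintainers who want to catch the last two issues before uploading could run a local check along the lines of the Python sketch below. It is a rough heuristic, not the EconPapers checker: the list of field names is a partial one chosen for this example, the sample data is made up, handles are compared as exact strings, and a title that legitimately contains text such as "Abstract:" would be flagged as well.

```python
import re
from collections import Counter

# Partial, illustrative list of field names; extend it to match your own templates.
KNOWN_FIELDS = ["Template-Type", "Author-Name", "Title", "Abstract",
                "Creation-Date", "File-URL", "Number", "Handle"]

HANDLE_RE = re.compile(r"^Handle:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)
FIELD_NAME_RE = re.compile("(" + "|".join(map(re.escape, KNOWN_FIELDS)) + "):")


def duplicate_handles(text):
    """Return the handles that occur more than once in the text."""
    counts = Counter(HANDLE_RE.findall(text))
    return sorted(handle for handle, n in counts.items() if n > 1)


def merged_field_lines(text):
    """Return (line number, field name) where a field name starts mid-line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in FIELD_NAME_RE.finditer(line):
            # Text before the field name suggests a missing end-of-line.
            if line[:match.start()].strip():
                hits.append((lineno, match.group(1)))
    return hits


# Made-up data: line 2 lacks an end-of-line, and both records share one handle.
SAMPLE = (
    "Template-Type: ReDIF-Paper 1.0\n"
    "Author-Name: Jane DoeTitle: First Paper\n"
    "Handle: RePEc:aaa:wpaper:001\n"
    "\n"
    "Template-Type: ReDIF-Paper 1.0\n"
    "Author-Name: John Smith\n"
    "Title: Second Paper\n"
    "Handle: RePEc:aaa:wpaper:001\n"
)

print(duplicate_handles(SAMPLE))
print(merged_field_lines(SAMPLE))
```

Running the check over all files of an archive at once, rather than file by file, matters for the handle test, since duplicates often come from two files describing the same items.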

Other problems cannot be detected through an automated process. Here, maintainers need to follow appropriate conventions or check that the visuals on the RePEc sites look right. Examples are:

  • Inappropriate use of a data field. Examples are putting a working paper number in a title, adding affiliations to an author name, putting an abstract in a title, or putting keywords and JEL classifications in the abstract. Each piece of information has its own field so that appropriate bibliographic records can be created.
  • Each author needs to be in their own author name field. Lumping them together in one field makes it impossible to attribute the work to registered authors.
  • When some work is available in multiple languages or is translated, each title goes into its own title field instead of being merged into one. Also, the mention of the language goes into the language field, not in the title.
  • Errors in character encoding lead to records with funny-looking characters. This happens when strings are cut and pasted from a file in one encoding into a file with a different encoding. Characters with accents (é, ñ, ü, ç, å), ligatures (ff, fi, ffl, æ, ß), non-Latin character sets (Cyrillic, Arabic), and other special characters (long hyphens, Windows quotation marks and apostrophes) are especially problematic. They also make author or citation matching more difficult. The solutions are to fix these individually in the RePEc files and, if those are encoded as UTF-8, to use a .redif extension instead of .rdf (be careful not to have both files in the RePEc archive, as that leads to duplicated handles).
  • No HTML markup should be present: the result in RePEc services and sites is unpredictable. The same applies to LaTeX or TeX markup. The only exception is the HTML paragraph tag, which may be used to separate paragraphs in an abstract.
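Some of these conventions can at least be screened for before a human looks at the pages. The Python sketch below is one possible set of heuristics, not an official RePEc tool. It makes several assumptions: that a mis-encoded string is UTF-8 text that was read as Windows-1252, that the one permitted HTML tag is the paragraph tag, and that "and", "&" or ";" in an author field signals several names. It will produce both false alarms and misses, so treat its output as a list of records to check by eye.

```python
import re

TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")  # HTML-like tags
TEX_RE = re.compile(r"\\[A-Za-z]+")                         # TeX commands such as \beta
MULTI_AUTHOR_RE = re.compile(r";|&|\band\b", re.IGNORECASE)


def looks_like_mojibake(value):
    """True if value reads as UTF-8 text that was decoded as Windows-1252."""
    try:
        repaired = value.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return False  # not representable that way, so probably fine
    return repaired != value


def check_field(name, value):
    """Return a list of suspected problems for one field of a record."""
    problems = []
    # U+FFFD is the replacement character left behind by a failed decoding.
    if looks_like_mojibake(value) or "\ufffd" in value:
        problems.append("possible encoding error")
    tags = [tag.lower() for tag in TAG_RE.findall(value)]
    if any(tag != "p" for tag in tags):  # assumption: only the paragraph tag is allowed
        problems.append("HTML markup")
    if TEX_RE.search(value):
        problems.append("TeX markup")
    if name == "Author-Name" and MULTI_AUTHOR_RE.search(value):
        problems.append("several authors in one field")
    return problems


# Made-up field values for illustration.
print(check_field("Title", "Caf\u00c3\u00a9 prices"))      # e-acute mangled into two characters
print(check_field("Author-Name", "Jane Doe and John Smith"))
print(check_field("Abstract", "A <b>bold</b> claim"))
```

The encoding test works by reversing the suspected mistake: if re-encoding the text as Windows-1252 yields valid UTF-8 that differs from the input, the string was most likely decoded with the wrong encoding at some point.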