How publishers can ensure their data looks right on RePEc

All material indexed in RePEc is provided by the respective publishers. They make this information available using a metadata syntax that RePEc defined in 1997 and that has not changed since, except for a few additions. Adhering to this syntax is important: errors disqualify items from indexing, and other problems may lead to various issues. If something is amiss or missing, every IDEAS or EconPapers page lists an email contact for alerting the maintainer of the relevant data.

That said, RePEc helps maintainers in various ways so that they can address problems proactively. Each month they receive an email with various statistics and a link to their “problems” page on the EconPapers checker (add the three-letter archive code to the URL to get more details), which shows data download problems, detected syntax issues, and bad URLs to full text. EconPapers and IDEAS also provide FAQs. Re-reading the initial setup instructions or the ones for new maintainers can also prove useful.

The most frequent issues that appear in the EconPapers checker are:

  • RePEc archive has moved from http to https: the maintainer needs to change the URL line in the archive template and alert someone in the RePEc team about the new location to fix the download process.
  • A series or journal is missing the corresponding series template.
  • A handle (identifier) is used multiple times. Handles are supposed to uniquely and permanently define any item in RePEc. Re-using them is a source of major problems.
  • Missing end-of-line that merges two fields.
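
Two of these issues, re-used handles and merged fields, can be caught locally before the files are uploaded. What follows is only a rough sketch, not the checker that EconPapers runs: the sample records, the archive code "aaa", the series "wpaper", and the function names are all made up for illustration, and the merged-field test is a simple heuristic over a handful of common field names.

```python
import re
from collections import Counter

# Hypothetical sample records; archive "aaa" and series "wpaper" are made up.
SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Title: First paper
Author-Name: Jane Doe
Handle: RePEc:aaa:wpaper:001

Template-Type: ReDIF-Paper 1.0
Title: Second paper
Author-Name: John Roe
Handle: RePEc:aaa:wpaper:001
"""

# A Handle line: field name at the start of a line, value up to the first space.
HANDLE_RE = re.compile(r"^[ \t]*Handle:[ \t]*(\S+)", re.MULTILINE)

# A few common field names to look for in the middle of a line (not exhaustive).
FIELD_NAMES = ("Template-Type", "Author-Name", "Title", "Abstract", "Handle", "File-URL")


def duplicate_handles(text):
    """Return the handles that occur more than once in the given text."""
    counts = Counter(HANDLE_RE.findall(text))
    return sorted(handle for handle, n in counts.items() if n > 1)


def merged_fields(text):
    """Return 1-based line numbers where a second field name appears after
    the first one, which suggests a missing end-of-line (heuristic only)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        _, sep, rest = line.partition(":")
        if sep and any(f"{name}:" in rest for name in FIELD_NAMES):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    print(duplicate_handles(SAMPLE))
    print(merged_fields("Title: A paperAuthor-Name: Jane Doe\n"))
```

In practice one would run such a check over all template files of an archive at once, since a handle may be duplicated across files and not only within one.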

Other problems cannot be detected through an automated process. Here, maintainers need to follow appropriate conventions or check that the visuals on the RePEc sites look right. Examples are:

  • Inappropriate use of a data field. Examples are putting a working paper number in a title, adding affiliations to an author name, putting an abstract in a title, or putting keywords and JEL classifications in the abstract. Each piece of information has its own field so that appropriate bibliographic records can be created.
  • Each author needs to be in their own author name field. Lumping them together in one field makes it impossible to attribute the work to registered authors.
  • When a work is available in multiple languages or is translated, each title goes into its own title field instead of being merged into one. Also, the mention of the language goes into the language field, not in the title.
  • Errors in character encoding lead to records with funny-looking characters. This happens when strings are cut and pasted from a file in one encoding into a file with a different encoding. Characters with accents (é, ñ, ü, ç, å), ligatures (ff, fi, ffl, æ, ß), non-Latin character sets (Cyrillic, Arabic), and other special characters (long hyphens, Windows quotation marks and apostrophes) are especially problematic. They also make author or citation matching more difficult. The solutions are to fix these individually in the RePEc files and, if those are encoded as UTF-8, to use the .redif extension instead of .rdf (be careful not to have both files in the RePEc archive, as that leads to duplicated handles).
  • No HTML markup should be present. The result in RePEc services and sites is unpredictable. The only exception is a paragraph tag used to separate paragraphs in an abstract. The same applies to LaTeX or TeX markup.
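
Although these problems escape the automated checker, a maintainer can still screen for some of them with a few rough heuristics before looking at the pages by eye. The sketch below is one possible way to do that, under stated assumptions: the function name `check_field` and all patterns are invented for this illustration, they will produce both false alarms and misses, and the paragraph-separator exception for abstracts is not handled.

```python
import re

# Anything that looks like an HTML tag. The one exception allowed in the text
# (a paragraph separator inside an abstract) is NOT whitelisted here.
HTML_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")

# A backslash followed by letters, as in TeX/LaTeX commands such as \emph.
TEX_RE = re.compile(r"\\[A-Za-z]+")

# Typical traces of UTF-8 text that was read as Latin-1 or Windows-1252,
# plus the Unicode replacement character.
MOJIBAKE_RE = re.compile("Ã[\u0080-\u00bf]|â€|\ufffd")

# Separators hinting at several people in one field. A single comma is
# tolerated because "Last, First" is a common way to write one name.
MULTI_AUTHOR_RE = re.compile(r";|&|\band\b|,.*,")


def check_field(name, value):
    """Return a list of suspected problems for one field (heuristic only)."""
    problems = []
    if name.lower() == "author-name" and MULTI_AUTHOR_RE.search(value):
        problems.append("several authors in one field")
    if HTML_TAG_RE.search(value):
        problems.append("HTML markup")
    if TEX_RE.search(value):
        problems.append("TeX markup")
    if MOJIBAKE_RE.search(value):
        problems.append("possible encoding damage")
    return problems


if __name__ == "__main__":
    # Simulate encoding damage: UTF-8 bytes wrongly decoded as Latin-1.
    damaged = "José".encode("utf-8").decode("latin-1")
    print(check_field("Author-Name", "Jane Doe and John Roe"))
    print(check_field("Title", damaged))
```

None of this replaces checking how the records actually display on IDEAS or EconPapers; it only narrows down where to look.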
