RePEc in October 2022

November 8, 2022

We had a first last month: not a single new RePEc archive. Does this mean that every working paper provider and every publisher is now participating in RePEc? We do not believe so and hope to see many more come on board. Still, we got good traffic, with 476,515 file downloads and 1,878,675 abstract views in October 2022. And we reached the following major milestone:

400,000,000 cumulative abstract views on IDEAS


How to create a (good) PDF

November 4, 2022

You may read the title of this blog post and think, “Elementary.” Before making that assumption based on years of experience creating PDFs for sharing papers, take a moment to consider the notion of a good PDF. Despite its name, Portable Document Format, PDF isn’t fully portable. The look and feel, and sometimes even the meaning, of a document won’t transfer across operating systems unless everything needed to render it is contained in the file. For instance, if a font is neither embedded in the file nor installed on the system rendering it, the viewer substitutes a different font. A document created with ITC Symbol Medium, a proprietary TrueType font, may therefore render differently between PDF viewers, losing the intended meaning. Let’s avoid this embarrassing mishap and create a good PDF.

Page from Federal Reserve Bank of Dallas 1999 Annual Report, rendered in Firefox v.103.0.1 PDF viewer. Text appears in Latin script and characters are spaced appropriately.

Page from Federal Reserve Bank of Dallas 1999 Annual Report, rendered with PDF.js iframe v.2.9.359. Text appears in Greek rather than Latin script and characters are spaced appropriately.

Page from Federal Reserve Bank of Dallas 1999 Annual Report, rendered with Preview v.11.0 (used on macOS). Text appears in Latin script and characters are spaced far apart.
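To make the font problem concrete, here is a minimal Python sketch that reports, for each font descriptor it can find in a PDF, whether a font program is embedded. Treat it as a rough heuristic, not a validator: it assumes the file stores its objects uncompressed (as PDF 1.4 files do, since they have no object streams), and the function name `font_embedding_report` is made up for this illustration.

```python
import re

# One indirect object, e.g. "12 0 obj ... endobj" (assumes uncompressed objects).
_OBJ = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
_IS_DESCRIPTOR = re.compile(rb"/Type\s*/FontDescriptor\b")
_FONT_NAME = re.compile(rb"/FontName\s*/([^\s/<>\[\]()]+)")
# A font program is attached through /FontFile, /FontFile2, or /FontFile3.
_FONT_FILE = re.compile(rb"/FontFile[23]?\b")


def font_embedding_report(pdf_bytes: bytes) -> dict[str, bool]:
    """Map each font descriptor's name to True if a font program is attached."""
    report = {}
    for match in _OBJ.finditer(pdf_bytes):
        body = match.group(1)
        if not _IS_DESCRIPTOR.search(body):
            continue  # not a font descriptor
        name = _FONT_NAME.search(body)
        if name is None:
            continue  # descriptor without a readable name; skip it
        report[name.group(1).decode("latin-1")] = bool(_FONT_FILE.search(body))
    return report
```

Passing in the bytes of a file, for instance `font_embedding_report(Path("paper.pdf").read_bytes())` with `paper.pdf` as a placeholder name, returns a dictionary of font names. Any entry marked `False` is a font that depends on whatever the reader's system substitutes.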

A short history of PDF

What is PDF? In 1991, John Warnock, co-founder of Adobe Inc., conceived a universal format to communicate visually meaningful information across operating systems. Adobe realized this technology in 1992 as the Portable Document Format, or PDF. In 2008, Adobe’s proprietary file format was standardized as ISO 32000, which is based on PDF v. 1.7. The latest version of ISO 32000 was released in 2020 and details PDF v. 2.0. PDFs are electronic documents that are either digitized or born digital: digital surrogates of a physical document, or documents created with digital editing software, respectively.

Key components

  1. Portable look and feel between operating systems—inherent to PDF
  2. Embedded structure and semantics—enabled with Tagged PDF
  3. Fully self-contained—defined by PDF/A

The world of PDF is vast. For the purpose of this post, we’re thinking about PDF as a format for disseminating born digital scholarly papers, and a good PDF is understood as one that is accessible and self-contained.

About the Good PDF

In addition to the standard PDF, there are PDF extensions or subsets, including PDF/E, PDF/VT, PDF/X, PDF/UA, and PDF/A. PDF/E, PDF/VT, and PDF/X specify requirements that optimize publishing and printing and are largely focused on handling complex graphics and layout, whereas PDF/UA and PDF/A apply to any type of content and focus on how that content exists and is presented in the PDF.

Standardized in 2012 as ISO 14289, PDF/UA (“Universal Accessibility”) is a PDF variant that requires content blocks to be tagged, making them navigable for screenreaders. Tagged PDF defines structure and semantics so that the content is not only machine readable but also meaningfully ordered.

PDF/A, or PDF/Archival, is standardized in ISO 19005 as a format for long-term preservation of electronic documents. PDF/A is defined in four versions, as well as three levels of conformance to those versions. Together, a version and a conformance level make up a flavor of PDF/A. These flavors do not imply an order of preference; they are simply variants that allow different degrees of flexibility as to what can or cannot be contained within the file. In addition to content tagging, PDF/A limits the types of content, or objects, that can be included in a PDF.

As mentioned above, in the world of PDF there are two types of documents: digitized and born digital. Tagging a digitized document has to be done manually, which is time consuming and often not possible because of how documents are digitized. As such, digitized documents generally forgo the Tagged PDF requirement and take the PDF/A-b (basic) conformance level, which does not require tagging. This is an inherent vice of print material. Born digital content, however, can easily be tagged, as meaningful structure is built into word processing software and can be understood by PDF creation software. When possible, born digital documents should conform to PDF/A-a (accessible).

Now that you’ve decided on the conformance level, what version should you use? Subsequent versions of ISO 19005 consider the evolution of documents and standards. For example, PDF/A-1 does not permit embedding of certain image and content objects, including JPEG2000 and 3D images, and CAD drawings. These embedded objects are permitted in later versions of ISO 19005, as standardization, support, and uptake around those previously prohibited types of content increased. Repositories may prefer earlier versions and may even prohibit later versions of PDF/A, because the later versions are more flexible and, thus, have been considered less preservation-friendly.

How to create a Good PDF

Considering the type of content and your born digital file, PDF/A-1a (version 1, accessible) is the preferred PDF/A flavor for working papers. However, due to the landscape of version preference and software support, PDF/A-1b (version 1, basic) may be the only flavor you can achieve: not all software supports Tagged PDF, and adding that structure afterward would need to be done with PDF creation software.
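The rule of thumb above is small enough to write down as code. The following Python sketch only restates this post's recommendation; the function name `choose_pdfa_flavor` and its two parameters are invented for illustration, and a repository's own policy may call for something else.

```python
def choose_pdfa_flavor(born_digital: bool, tagging_supported: bool) -> str:
    """Pick a PDF/A-1 flavor following the rule of thumb in this post."""
    # Tagged (accessible) output is only realistic when the source file
    # carries structure and the software can write Tagged PDF.
    if born_digital and tagging_supported:
        return "PDF/A-1a"
    # Otherwise fall back to the basic level, which does not require tagging.
    return "PDF/A-1b"
```

For a working paper written in software that can export Tagged PDF, `choose_pdfa_flavor(True, True)` gives `"PDF/A-1a"`; a scanned document, or a toolchain without tagging support, falls back to `"PDF/A-1b"`.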

Word processing software, such as LaTeX, Microsoft Word, and LibreOffice, has built-in functions that create PDF derivatives from the native file format (.tex, .docx, and other word-processed formats). Additional steps are needed to create a PDF/A; listed below are some guides for creating your PDF/A-1a or -1b.

LaTeX

Microsoft Word

LibreOffice

Because software and software uptake change, there is no universal guide for creating a PDF/A. These guides should send you in the right direction. While software may create a seemingly good PDF/A, you can complete manual and automated validation to ensure that your PDF/A is compliant.

Validation software

Manual checklist

  • Is meaningful descriptive metadata embedded?
  • Did fonts embed as expected or are there visual discrepancies?
  • Is the content ordered correctly so that the document can be read by a screenreader?
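Parts of this checklist can be pre-screened with a few lines of Python. The sketch below is only a first look under stated assumptions: it reads what the file claims in its XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties and the `dc:title` entry) and whether the markers of Tagged PDF are present. It assumes the metadata and document catalog are stored uncompressed, as PDF/A-1 requires for metadata. The name `quick_checks` is hypothetical, and a claim is not proof of conformance, so dedicated validation software remains necessary.

```python
import re

# XMP may record the PDF/A claim as elements or as attributes.
_PART = re.compile(rb"pdfaid:part(?:>|\s*=\s*[\"'])\s*(\d)")
_LEVEL = re.compile(rb"pdfaid:conformance(?:>|\s*=\s*[\"'])\s*([A-Za-z])")
_TITLE_BLOCK = re.compile(rb"<dc:title>(.*?)</dc:title>", re.DOTALL)
_TITLE_ITEM = re.compile(rb"<rdf:li[^>]*>([^<]*)</rdf:li>")


def _xmp_title(pdf_bytes: bytes):
    """Return the first non-empty dc:title entry, or None."""
    block = _TITLE_BLOCK.search(pdf_bytes)
    if block is None:
        return None
    item = _TITLE_ITEM.search(block.group(1))
    if item is None:
        return None
    return item.group(1).decode("utf-8", "replace").strip() or None


def quick_checks(pdf_bytes: bytes) -> dict:
    """Report the claimed PDF/A flavor, the embedded title, and tag markers."""
    part = _PART.search(pdf_bytes)
    level = _LEVEL.search(pdf_bytes)
    claimed = None
    if part:
        suffix = level.group(1).decode().lower() if level else ""
        claimed = f"PDF/A-{part.group(1).decode()}{suffix}"
    # Tagged PDF: a structure tree plus the /Marked true flag in the catalog.
    tagged = (b"/StructTreeRoot" in pdf_bytes
              and re.search(rb"/Marked\s+true", pdf_bytes) is not None)
    return {"claimed_flavor": claimed, "title": _xmp_title(pdf_bytes), "tagged": tagged}
```

A result with `"tagged": True` says only that tags exist. Whether the reading order makes sense to a screenreader user still has to be checked by a person, as do the visual font discrepancies in the second item.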

Next steps

Create a good PDF and be a steward of accessible and sustainable research dissemination throughout the working paper lifecycle.

Further reading

Oettler, A. (2013). PDF/A in a Nutshell 2.0. PDF Association. https://www.pdfa.org/resource/pdfa-in-a-nutshell-2-0/