How to contribute data to CitEc

November 26, 2019

CitEc is the RePEc citation indexing service. CitEc extracts reference data directly from documents or from data about references provided by publishers. CitEc then links references to find citations between documents available in RePEc. All data produced by CitEc is freely available. It is distributed to other RePEc services to enrich the services provided to researchers and authors, and it is used to build citation profiles for registered authors and for working paper series and journals. I created CitEc in 1998. Nowadays, it contains over 41 million references and 14 million citations. These may be impressive numbers, but the data comes from 1,376,000 papers, while there are 2,737,000 downloadable items in RePEc. Thus, I have been able to process only about 50% of the downloadable items.

There is still a lot of work to do. You can help. Let’s see how.

1.- Providing references

One important approach to getting references is to use the full text of documents. In many cases, we can extract the references from the PDF files when we have them. However, PDFs are often behind toll-gated portals. That is often for economic reasons, such as paid licenses. Sometimes it is due to technical barriers, such as the archive maintainer not providing a URL with direct access to the PDF file.

In these cases, you can help by submitting the full list of references cited. You do not need to be the author of the paper to submit the references. You may, for instance, submit references for a document that cites your work.

Thus, go ahead and use the web user input form. Thanks!

2.- Providing citations

Citations are relationships between two documents. Both the citing and the cited document must have a RePEc handle, that is, they must be indexed in RePEc. Most citations are identified automatically by the CitEc software. In addition, there are several ways to contribute citations to the database when the system has failed to find them.

A.- Register with the RePEc Author Service and use the search engine to add citations to your profile

B.- If you already know the paper which cites your work, follow the instructions in our FAQ (3.5)

C.- Otherwise, you can use the main CitEc search engine to look for the cited work in the references database. Just enter the surname of one of its authors and its publication year. You will get two types of results: citations and references. A citation means that CitEc has been able to match the reference to the cited document in RePEc. A reference means that CitEc has been unable to find the cited document in RePEc. In some cases, the cited document is in RePEc but the system has not found it. You can help us solve the problem by providing the link to the cited document: just click on the “add citation now” link and give the handle of the cited document.
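The difference between a citation and a reference can be illustrated with a small matching step. The Python sketch below is a deliberately simplified stand-in, not CitEc’s actual matching algorithm: it assumes a hypothetical in-memory catalog of RePEc items with invented handles (such as `RePEc:xxx:wpaper:1`) and treats a reference as a citation only when the first author’s surname, the year and the normalized title all agree. Anything it cannot match stays a plain reference, which is exactly the case where “add citation now” lets a user supply the missing handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """A reference parsed from a citing paper (simplified to three fields)."""
    surname: str  # surname of the first author
    year: int
    title: str


# HYPOTHETICAL catalog of RePEc items: handle -> (first-author surname, year, title).
# Handles and records are invented for illustration only.
CATALOG = {
    "RePEc:xxx:wpaper:1": ("Smith", 2001, "A model of widgets"),
    "RePEc:xxx:wpaper:2": ("Jones", 2005, "Trade and gadgets"),
}


def normalize(text: str) -> str:
    """Lower-case the text and collapse punctuation and spacing."""
    kept = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(kept.split())


def match_handle(ref: Reference) -> Optional[str]:
    """Return the handle of the catalog item matching the reference, or None."""
    for handle, (surname, year, title) in CATALOG.items():
        if (normalize(surname) == normalize(ref.surname)
                and year == ref.year
                and normalize(title) == normalize(ref.title)):
            return handle
    return None


def classify(ref: Reference) -> str:
    """A matched reference becomes a citation; an unmatched one stays a reference."""
    handle = match_handle(ref)
    if handle:
        return f"citation -> {handle}"
    return "reference (unmatched)"


print(classify(Reference("smith", 2001, "A Model of Widgets.")))  # citation -> RePEc:xxx:wpaper:1
print(classify(Reference("Brown", 1999, "An unknown paper")))     # reference (unmatched)
```

Exact matching like this misses references with typos or abbreviated titles, which is one reason why links supplied by users remain useful.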

Many thanks for your contribution. If you have any questions, contact us at citechelp. Also, if you would like to get more involved with citation analysis, contact us!


CitEc API

February 11, 2016

The CitEc project has launched an Application Programming Interface (API) that enables external applications to query the CitEc database and obtain citation data through a simple web interface. It allows clients to retrieve three types of data for each document: plain, AMF (Academic Metadata Format) and citedby.

  1. Plain: XML data about the citations of a single document. This data should be processed by the API client before being presented to the user.
  2. AMF: metadata for the citations and references (if available) of the document. The XML response is an AMF record. More details about the AMF schema are available at: http://amf.openlib.org/doc/ebisu.html.
  3. Citedby: the citations of the document. By default, the XML output is transformed through an XSLT style sheet to generate a human-readable page.
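A client of the plain format has to process the XML itself before showing anything to users. The Python sketch below shows one such processing step: collecting the handles of citing documents so that, for example, an archive could display a citation count. The response layout used here (the `document` and `citedBy` elements and the `handle` attribute) is entirely hypothetical and made up for illustration; the real schema is documented at http://citec.repec.org/api.html.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL response: element and attribute names are placeholders,
# not the documented CitEc API schema (see http://citec.repec.org/api.html).
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<document handle="RePEc:xxx:wpaper:1">
  <citedBy handle="RePEc:yyy:journl:10"/>
  <citedBy handle="RePEc:zzz:wpaper:7"/>
</document>"""


def citing_handles(xml_text: str) -> list:
    """Return the handles of citing documents found in the response."""
    root = ET.fromstring(xml_text)
    # findall("citedBy") looks only at direct children of the root element.
    return [el.get("handle") for el in root.findall("citedBy")]


handles = citing_handles(SAMPLE_RESPONSE)
print(f"Cited {len(handles)} times")  # Cited 2 times
```

Parsing on the client side keeps the presentation under the archive’s control; only the element names would need to change once adapted to the real schema.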

The CitEc API is aimed mainly at:

  • Institutions providing data to RePEc (RePEc archives). They could use the API to display the number of citations of each document on their web pages.
  • Researchers who want to use CitEc data in their bibliometric research. The API provides an easy way to get basic data about documents and citations. Note that such researchers could also ask us to provide the data in the customized format they need, to reduce processing time even further.

Look at http://citec.repec.org/api.html for more information and examples.

Note that, beyond CitEc, IDEAS also provides an API for other parts of the RePEc database.

Enjoy!


New format for CitEc author and series profiles

May 14, 2014

In the past weeks, CitEc, RePEc’s citation analysis website, has released new author and series citation profiles with improved features.

Series profiles:

  • Data coverage: 1990 – 2013
  • New indicators: Cumulative number of documents published until year y, Cumulative number of citations to papers published until year y, Cumulative impact factor.
  • New graphs: Citations by publication year, cumulative citations and cumulative documents published.
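The cumulative series indicators can be sketched as running sums over publication years. The Python below is an illustration with made-up numbers. It assumes `docs_by_year[y]` holds the documents a series published in year y and `cites_by_year[y]` the citations received by those documents. The cumulative impact factor is computed here as cumulative citations divided by cumulative documents; that formula is an assumption, since the post does not state how CitEc defines it.

```python
from itertools import accumulate


def cumulative_indicators(docs_by_year: dict, cites_by_year: dict) -> dict:
    """Return {year: (cumulative documents, cumulative citations, cumulative impact factor)}.

    docs_by_year[y]: documents the series published in year y
    cites_by_year[y]: citations received by documents published in year y
    """
    years = sorted(docs_by_year)
    cum_docs = list(accumulate(docs_by_year[y] for y in years))
    cum_cites = list(accumulate(cites_by_year.get(y, 0) for y in years))
    # ASSUMPTION: cumulative impact factor = cumulative citations / cumulative documents.
    cum_if = [c / d if d else 0.0 for c, d in zip(cum_cites, cum_docs)]
    return {y: (d, c, f) for y, d, c, f in zip(years, cum_docs, cum_cites, cum_if)}


docs = {2011: 10, 2012: 20, 2013: 10}   # made-up example data
cites = {2011: 30, 2012: 30, 2013: 0}
for year, row in cumulative_indicators(docs, cites).items():
    print(year, row)
# 2011 (10, 30, 3.0)
# 2012 (30, 60, 2.0)
# 2013 (40, 60, 1.5)
```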

Author profiles:

  • New profile layout
  • New indicator: i10-index. Number of works with at least 10 citations.
  • Included related authors: in addition to co-author relationships, we now include links to researchers citing and cited by the author being analyzed.
  • Added a new section with recent citing documents. It is possible to identify who has cited the author in the last two years.
  • New graphs: evolution of author’s h-index and citations received by publication year.
  • Authors can upload a picture to complete their profiles
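The two author indicators listed above have simple definitions. The Python sketch below computes them from a list of citation counts per work, using the standard definition of the h-index (the largest h such that h works have at least h citations each) and the i10-index as described above. The counts are made up, and the sketch does not attempt CitEc’s own treatment of paper versions or self-citations.

```python
def h_index(citations: list) -> int:
    """Largest h such that h works have at least h citations each."""
    h = 0
    # Rank works from most to least cited; h grows while the k-th work has >= k citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def i10_index(citations: list) -> int:
    """Number of works with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


works = [25, 12, 10, 8, 3, 0]  # made-up citation counts, one per work
print(h_index(works), i10_index(works))  # 4 3
```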

You can have a look at some examples of the new profiles:


CitEc has a new sponsor

December 12, 2013

Like other RePEc services, the CitEc citation project is based on the volunteer work of its developers. Following the business model of the open source movement, CitEc is thus able to work without funding. The only costs of the project are those related to hosting the server.

Since the beginning the server has been a physical machine owned by CitEc and hosted in a research institution. This year CitEc has moved to cloud computing by renting a server in a commercial company. We hope this new approach will improve the management of CitEc by reducing the problems related to technical restrictions imposed by the hosting institutions.

Over the past five years the hosting services were provided by the Valencian Economic Research Institute. We are very grateful to them for this support and look forward to continuing the collaboration in the future.

Starting this year the new sponsor for CitEc is INOMICS. INOMICS is an international service for students and professionals in economics and finance. They offer a search for conferences, jobs, programs, courses and economics resources that can be accessed online (including searching through the RePEc database), or you can have your customized updates delivered to your inbox via their weekly email alert service.

We expect this partnership to be long and successful. Thanks, INOMICS, for your support!


New CitEc features

September 24, 2012

In the past months we have added some new features to the Citations in Economics service:

References input service

Many documents in CitEc cannot be automatically processed for a variety of reasons: they are not open access, they are not in PDF format, or the PDF file cannot be converted to text. Although some publishers provide us access to gated references, many are still missing. Often we get requests from authors asking why a citation to one of their papers is not included in CitEc. The answer is always the same: because the citing paper has not been processed. If this is your case, it is now possible to provide CitEc with the missing references and they will be processed. We ask, though, that all references from the citing paper be provided. Incomplete reference lists will not be considered. The lists of references and the contributor will be made public. The input form can be found here or from any IDEAS abstract page.

Add citation now

In some cases a paper cites a document available in RePEc, but the system is not able to identify it as a RePEc item. For each reference not automatically linked by the system, users may now add the handle of the cited document. All citations submitted through this feature are checked for correctness. A link to this form can be found on any IDEAS abstract page.

Citation profiles for authors

CitEc now provides citation profiles for authors. For each author registered in the RePEc Author Service, we provide a profile with her scientific production and the number of citations of each paper. We also provide some indicators, such as the h-index, and information about recent co-authors. For an example, look at: http://citec.repec.org/p/z/pzi1.html. Note that this is work in progress, and the statistics on this page are not yet adjusted the way they are for the ranking statistics (versioning, self-citations).

New design for series pages

We have changed the format of the citation and production graphs. The bibliographic data of the papers is also presented more clearly. An example: http://citec.repec.org/s/2010/miewpaper.html

Included historical data for series pages

The time series of series citation data now goes back to 1990. Citations, document production and impact factor are provided for all years.

Use of persistent URLs

It is now possible to access the citation data for authors and documents using short, persistent URLs such as http://citec.repec.org/RePEc:mie:wpaper:382 or http://citec.repec.org/pdu7. To create such a URL, simply append the paper/article handle or the RePEc Author Service short-ID to http://citec.repec.org/.
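Building such a URL is plain string concatenation. The Python sketch below adds a light sanity check before appending the identifier. The check (handles start with `RePEc:`, short-IDs are lower-case letters followed by digits, like `pdu7` or `pzi1`) is an assumption made for illustration, not the official RePEc Author Service rule.

```python
import re

BASE_URL = "http://citec.repec.org/"

# ASSUMPTION: short-IDs look like "pdu7" (letters then digits); illustrative only.
SHORT_ID = re.compile(r"[a-z]+[0-9]+")


def citec_url(identifier: str) -> str:
    """Build a persistent CitEc URL from a RePEc handle or an author short-ID."""
    identifier = identifier.strip()
    if not (identifier.startswith("RePEc:") or SHORT_ID.fullmatch(identifier)):
        raise ValueError(f"not a RePEc handle or short-ID: {identifier!r}")
    return BASE_URL + identifier


print(citec_url("RePEc:mie:wpaper:382"))  # http://citec.repec.org/RePEc:mie:wpaper:382
print(citec_url("pdu7"))                  # http://citec.repec.org/pdu7
```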


The citation extraction process in CitEc

January 16, 2008

CitEc is an experimental autonomous citation index, that is, a software system able to automatically extract references from the full texts of documents and create links between citing references and cited papers.

With its latest update, the CitEc database has reached almost three million references and more than one million citations between documents available in RePEc. This is an important threshold, but it is still far from a complete set of citations. There are some limits to the reference extraction process:

First, the system needs open access to an electronic version of the document’s full text. Many journals listed in RePEc have restricted access and are therefore excluded from CitEc unless they grant special access or push the citations to RePEc in other ways. We are working with some publishers that kindly provide us with metadata about references. We try to get as many publishers on board as possible, but unfortunately not all of them are willing to collaborate with us at this time. As a result, the data set is still made up mainly of references extracted from working papers. This has the advantage of providing the most up-to-date citation data, since working papers contain the most recent research results.

Second, the URL provided by the RePEc archive maintainer must be correct and must point to the PDF file containing the document’s full text, not to an intermediate abstract page or similar. Some archives provide this kind of link to force researchers to pass through their institutional web pages. The system is unable to follow the links to these hidden papers, so they are missed in the reference extraction process.

The third limit is more technical. To extract references, the PDF files need to be converted into plain ASCII text. This step is key to completing the process successfully, since a good-quality text representation of the document makes the identification of references easier. There is a wide variety of PDF files, created in different ways, and not all of them can be converted.

Finally, the system parses the references section, which first needs to be isolated, to identify each reference and split it into its parts: title, author, year, etc. The parsing is done with pattern-matching techniques, which in some cases are not able to identify the full list of existing references.
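The isolate-then-parse steps can be illustrated with a toy pattern matcher. The Python sketch below is not CitEc’s parser. It assumes that the references section starts at a line reading “References” or “Bibliography”, that each reference sits on its own line, and that references follow a single author-year style, “Authors (Year). Title. Source”. Lines that do not fit the pattern come back as `None`, which mirrors how pattern matching can miss part of the reference list.

```python
import re
from typing import Optional

# ASSUMPTION: the section starts at a line that is just "References" or "Bibliography".
HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE)

# ASSUMPTION: a single author-year style, "Authors (Year). Title. Source".
PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})[a-z]?\)\.?\s*"
    r"(?P<title>[^.]+)\.\s*(?P<rest>.*)$"
)


def isolate_references(text: str) -> str:
    """Return the text after the last references heading, or '' if there is none."""
    matches = list(HEADING.finditer(text))
    return text[matches[-1].end():] if matches else ""


def parse_reference(line: str) -> Optional[dict]:
    """Split one reference into its parts, or return None if the pattern fails."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    authors = re.split(r"\s+and\s+|;\s*", m.group("authors"))
    return {
        "authors": [a.strip() for a in authors if a.strip()],
        "year": int(m.group("year")),
        "title": m.group("title").strip(),
        "source": m.group("rest").strip(),
    }


def parse_references(text: str) -> list:
    """Isolate the section, then parse it assuming one reference per line."""
    lines = [ln for ln in isolate_references(text).splitlines() if ln.strip()]
    return [parse_reference(ln) for ln in lines]


paper = """Introduction
As shown by Smith and Jones (2005), gadgets matter.
References
Smith, J. and Jones, K. (2005). Trade and gadgets. Journal of Stuff 12, 1-20.
Brown, A. 1999, An unusual format
"""
for ref in parse_references(paper):
    print(ref)
```

Real reference lists mix many styles, wrap across lines and contain titles with periods, so a working parser needs far more patterns than this one.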

As of the last update, on December 31, 2007, the CitEc numbers are as follows: 527,357 articles and working papers are available in RePEc. Of them, 343,441 cannot be processed by the system due to the limitations mentioned in the first two points above, namely:

  • 101,886 have no electronic representation
  • 216,110 have restricted access
  • 19,174 have no direct link to the document’s full text
  • 6,271 have a wrong URL

That leaves 183,916 documents available to be processed by CitEc. Of them, the process was successfully completed for 134,130 papers, that is, 73% of the available documents. The complete list of sources and the number of processed documents for each series or journal is available here.
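The figures above are internally consistent. A quick recomputation in Python, using only the numbers reported in this post, shows how the 73% is obtained:

```python
total = 527_357  # articles and working papers in RePEc
excluded = {
    "no electronic representation": 101_886,
    "restricted access": 216_110,
    "no direct link to full text": 19_174,
    "wrong URL": 6_271,
}
not_processable = sum(excluded.values())
available = total - not_processable
processed = 134_130
share = processed / available

print(not_processable, available, f"{share:.0%}")  # 343441 183916 73%
```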

All the previous considerations should be taken into account when CitEc data is used for scientific evaluation purposes. We still consider the data to be experimental.

From the point of view of RePEc archive maintainers there are a few basic steps they can take to improve the situation. For example:

  • provide direct and correct URLs to the documents’ full text
  • make use of the X-File-Ref to give the system an ASCII version of the references section of a particular document
  • help us lobby the publishers and editors of restricted journals, asking them to send us metadata about references.