Buzz Blog

arXiv-ing the History of Preprints in Physics

Tuesday, July 28, 2020

By: Hannah Pell

“The American Physical Society (APS) has a vision of the future of physics publishing, in 2020 or so.” So begins a 1993 Science article titled “Publication by Electronic Mail Takes Physics by Storm.” Burton Richter, then-president of APS and former head of the Stanford Linear Accelerator Center (SLAC), elaborated: “Any physicist, any place in the country, can turn on his computer and for free browse through the table of contents of any APS journal. [The browser] can select those things about which he wants to see an abstract, and then, after deciding what he might read, ask for the article itself and eventually pay for it like you pay your telephone bill.”

What was then a vision would in fact become our reality in 2020. In the early 1990s, physicists were at the cutting edge of revolutionizing how academic papers were shared and published. Scientists were working amid seismic shifts in computational technology and the early foundations of the internet. In fact, physicists were deeply involved in these developments: Tim Berners-Lee invented the World Wide Web in 1989 while working at CERN, and it was originally designed to automate information sharing between universities around the world. (You can even visit a recreation of the first-ever website: info.cern.ch.)

Homepage of the first website — info.cern.ch — recreated by CERN.

Today, many of the browsers whom Richter described back in 1993 head to the arXiv (pronounced “archive”) as their source. The arXiv is the central repository for preprints: full drafts of research papers shared before their official publication in academic journals. The arXiv is open-access, meaning that the preprints are freely accessible to anyone who wants to read them.

How did the arXiv come to play such a central role in how physicists distribute their research?

How did arXiv.org get started?
Around 1991, theoretical physicist Joanne Cohn started circulating preprints to colleagues over email as TeX files, which were a relatively new format at the time. “I'd ask the authors for their papers, send them to the people who had asked, and then send them as well to people who had asked for other papers previously, or whom I thought might be interested,” Cohn recalls. “I began to systematically expand the number of names on the list I was sending papers to. I also expanded the role of the mailing list from just a list which received papers I had, to a group of people who both received and contributed papers. In this way, it became a way for people to exchange papers more generally.” Cohn’s mailing list eventually reached almost 180 fellow physicists in more than 20 countries. Those numbers may seem trivial by today’s standards, but back then, an email list of that size quickly became difficult to handle.

There had to be a more efficient way. During the summer of 1991, Cohn was attending a string theory workshop at the Aspen Center for Physics. Paul Ginsparg, a fellow theoretical physicist then based at Los Alamos National Laboratory, was also in attendance. After hearing about Cohn’s efforts, he offered to automate the system. Rather than sharing the papers among one another over email (which often flooded physicists’ disk storage allocations), Ginsparg envisioned an online repository in which all preprints could be stored and freely accessed. That very August, Ginsparg created hep-th@xxx.lanl.gov, an online bulletin board to which physicists could submit their preprints or browse a list of preprints that were already in the system. The arXiv was born.

“Day one, something happened, day two, something happened. Day three, Ed Witten posted a paper,” said Ginsparg, who has been based at Cornell University since 2001. “That was when the entire community joined.”

A screenshot of the arXiv taken in 1994. Photo from cs.cornell.edu.

Costs before arXiv
By the time the arXiv was up and running, physicists had already long relied on preprints for keeping up with the latest findings and research trends. A “preprint culture” was already in place, especially so in the high-energy physics sub-discipline. Before Cohn’s email list and Ginsparg’s automated server, preprints were printed as physical copies and mailed around to various physics departments at different institutions. Ginsparg pointed out in a 1994 article for the Computers in Physics journal that high-energy groups “typically spent between $15,000 to $20,000 per year on photocopy, postage, and labor costs for their preprint distribution.”

“The paper preprint system had a serious shortcoming. If your institution wasn’t on other institutions’ mailing lists, you wouldn’t see the preprint. With its free online distribution, arXiv is fairer and more democratic,” Charles Day, editor-in-chief of Physics Today, noted in 2018.

In addition to the monetary costs, there were opportunity costs. Physically printing and distributing the papers could take several months, and the publication timeline for traditional journals sometimes stretched to years. Even with more recent advances in digital publishing, a 2016 article published in Nature noted that the median review time for several journals was actually increasing. All the while, other researchers could be working on identical experiments or discovering new, related insights that could impact the work.

The arXiv has been invaluable for decreasing (even nullifying) these costs. Preprint servers have grown so substantially that they directly challenge aspects of the traditional publishing process. Many publishers were pressured to experiment with their own open-access models in order to compete. The establishment of the arXiv has since brought forth new opportunities and challenges within the world of science publishing.

The Pros and Cons of Preprint Servers
One of the most frequently cited benefits of open-access preprint servers is how they’ve increased access to new information and democratized it. Before the arXiv, access to preprints was limited to those within institutional physics departments or those who were able to join the email lists. Afterward, anyone with an internet connection could follow the latest developments. “[The arXiv] had an immediate impact on physicists in less developed countries, who reported feeling finally in the loop, both for timely receipt of research ideas and for equitable reading of their own contributions,” Ginsparg wrote in 2011 on the 20th anniversary of the arXiv. “I still receive messages reporting that the system provides to them more assistance than any international organization.”

However, new systems bring about new challenges. Preprints are not formally peer-reviewed before being shared. The “About the arXiv” page reminds users that “the content of arXiv submissions are wholly the responsibility of the submitter and are presented ‘as is’ without any warranty or guarantee,” yet this doesn’t guarantee that all browsers will be able to separate the “bad” science from the good. Although the peer review process is by no means perfect, it remains an essential step in the traditional journal publishing process, and it is intertwined with the journals’ prestige.

Despite the various pros and cons, one thing remains clear: the arXiv and other preprint servers have ignited debates about open access and digital publishing, and their influence has only continued to grow.

The arXiv Today

As of June 28, 2020, the arXiv boasts 1,723,404 total uploads. The arXiv now includes papers from several other disciplines, including computer science, electrical engineering, economics, statistics, mathematics, and quantitative biology and finance. In recent years, other versions of the arXiv have been created for biology (bioRxiv) and health sciences (medRxiv). The sheer quantity and breadth of openly accessible science in these repositories is simply beyond comprehension.


Submissions to the arXiv as of June 28, 2020: 1,723,404.

The development of the arXiv ushered physics scholarship into the future. It flourished because the “preprint culture” was already well-established within physics and because it allows physicists all over the world to share their work freely, quickly, and efficiently. Although not without its imperfections, the arXiv is the central location for physics research today. One can’t help but wonder: will there someday be another revolution in scientific information-sharing?







Posted by Rose Villatoro