by Anne E. Egger, Ph.D., Anthony Carpi, Ph.D.
Think about something you know and understand very well. Maybe you know everything about your favorite musical group, and when your friend asks you about them, you can list all of their songs and the band members’ names and maybe even something about their history. Maybe you even predict when their next big hit will come out, based on what you know. Your friend asks how you know so much, and you admit that you read a book about them, and have all their albums, and you keep up on their tour dates on their web page. You’ve been to their concerts and seen them perform. You are referencing your sources, explaining how you know the facts, and why you are so comfortable making a prediction about them – and your friend trusts your knowledge and thus gives your opinion some weight.
Scientists use references in much the same way, drawing on available information to conduct their research. But unlike you when you share your opinion about your favorite band, scientists are obligated to provide the details about where they got that information. The scientific literature is designed to be a reliable archive of scientific research, providing a growing, stable base for new research investigations. When scientists present their new ideas and results to the community, they are expected to support their ideas with knowledge of the scientific literature and the work that has come before them. If they don’t show their understanding of the literature, it’s like you telling your friend that you love everything a particular band has done, even though you’ve only heard one of their songs. In short, the scientific literature is of central importance to the growth and development of science as a whole.
In its earliest stages, the scientific literature took the form of letters, books, and other writings produced and published by individuals for the purpose of sharing their research. For example, the Babylonians recorded significant astronomical events like lunar eclipses on clay tablets as early as the 6th century BCE (see our Research Methods: Description module). The Persian scientist Alhazen hand-wrote a seven-volume treatise on his experiments in the field of optics while he was under house arrest in Cairo, Egypt between 1011 and 1021 CE (see our Research Methods: Experimentation module). Much of Galileo Galilei’s ground-breaking work was published as a series of letters, such as his Letters on Sunspots or the Letter to Grand Duchess Christina. Isaac Newton’s landmark Philosophiæ Naturalis Principia Mathematica was published as a series of books in 1687, largely paid for from the personal fortune of the English astronomer Edmund Halley.
Figure 1: Title page of the first issue of Le Journal des sçavans.
Today, although scientists still publish books and letters, the vast majority of the scientific literature is published in the form of journal articles, a practice that started in the mid-1600s. In March 1665, the Royal Society of London (see our Scientific Institutions and Societies module) began publishing Philosophical Transactions of the Royal Society of London. The serial not only included a description of events that occurred at the weekly meetings of the Society, but it also included results from scientific investigations conducted outside of the Society meetings by its members. This publication was made available to other scientists as well as the general public, and thus it helped establish an archive of scientific research. Other journals in which scientists could publish their findings appeared around the same time. The French Journal des sçavans (translated as Journal of the savants – a “savant” is a member of a scholarly society) actually began publishing a few months before Philosophical Transactions, but it did not carry scientific research reports until afterward (Figure 1). The Italian journal Saggi di naturali esperienze (Essays of natural experiments) was first published in 1667 by the Accademia del Cimento in Florence. By the mid-eighteenth century, most major European cities had their own scientific society, each with its own scientific publication.
As the number of scientific journals expanded, they helped promote the progress of science itself. Whereas Newton had to seek a wealthy donor to fund the publication of his research, it was no longer the wealthiest or best-known individuals who had the ability to publish their findings. As a result, many more individuals were encouraged to take up the study of science and publish their own research. This in turn led to an explosion in the number of scientific studies that were conducted and the resulting knowledge that was generated from this research.
However, the expansion of the scientific literature also created challenges. As the knowledge base of science grew, it became more difficult to keep track of the discoveries that were made. By the eighteenth century, many journals also included abstracts or short summaries of scientific research papers published in other journals so that their readers could stay current with the latest scientific advances.
In 1945, Vannevar Bush, an American scientist and statesman, highlighted the importance of the archive of research contained within the scientific literature when, in an essay first published in The Atlantic Monthly, he wrote, “A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.” Inspired by Bush’s essay, Eugene Garfield, an American scientist, founded the Institute for Scientific Information (ISI). In 1964, ISI introduced Science Citation Index, the first citation index for scientific scholarly journals. Science Citation Index makes use of the inherent linking characteristics of scientific papers: a single scientific paper contains citations to any number of earlier studies on which that work builds, and eventually it too is cited by future research studies. Thus, each published manuscript is one node in a network of citations. In making these networks explicit, Science Citation Index emphasizes a key aspect of the scientific literature – the way that it is continuously extended and builds on itself. Evidence that scientists consult that continuously growing record is seen in the reference list that accompanies every scientific journal article. Understanding how scientists utilize the scientific literature is a key component of understanding how science works.
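The idea of a citation network can be made concrete with a small sketch. The example below is only an illustration, not how Science Citation Index is actually built: the paper names and reference lists are hypothetical. It starts from the reference list of each paper (which earlier work it cites) and inverts those lists to answer the question a citation index is designed for: which later papers cite a given one?

```python
from collections import defaultdict

# Hypothetical reference lists: each paper maps to the earlier papers it cites.
references = {
    "paper_C": ["paper_A", "paper_B"],
    "paper_D": ["paper_B", "paper_C"],
    "paper_E": ["paper_C"],
}


def build_cited_by(references):
    """Invert reference lists so each paper maps to the later papers citing it."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    # Sort for a stable, readable result and return a plain dict.
    return {paper: sorted(citers) for paper, citers in cited_by.items()}


cited_by = build_cited_by(references)
print(cited_by["paper_C"])  # ['paper_D', 'paper_E']
```

Following `references` leads backward in time to the work a paper builds on; following `cited_by` leads forward to the work that builds on it.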
In a lecture discussing the connections between scientific writing and scientific discovery, Frederic Holmes, an American biologist and historian of science, said, “When scientists refer to the ‘literature’ of their fields, they have in mind something very different from what we mean when we talk of literature in general. The literature of a scientific specialty area is the accumulated corpus of research articles contained in the journals of the field, and it is regarded as the primary repository of the knowledge that defines the state of that field” (Holmes, 1987).
As Vannevar Bush noted, that literature is only useful if it is consulted, and scientists must make it clear in their own work when they have, in fact, consulted that “accumulated corpus of research articles”. You are probably familiar with the notion of citing sources, the way that, for example, a journalist indicates the experts that he or she consulted to write a news article. When scientists cite sources in their scientific journal articles, they are doing more than just showing which experts they consulted, however. Scientists consult the literature to learn all they can about a specific area of study, and then cite those articles to both acknowledge the authors as the originators of the idea they are discussing and also to help readers understand their line of reasoning in coming to their own conclusions.
Using the literature is an ongoing, iterative process for all scientists. For example, when beginning to conduct a geologic field investigation in the Warner Range in northeastern California, Anne Egger first did a search in GeoRef, a geosciences-themed database of journal articles, to see if anyone had published geologic maps or other investigations in this region. She did not want to duplicate any work that had already been done, and also wanted to see what information was already available. She first came across a paper published in 1986 by two geologists from the U.S. Geological Survey, in which they presented their work on determining the ages of volcanic rocks in the region (Duffield & McKee, 1986). These data would be very useful in understanding the volcanic history of the region. In addition, she used a technique that many scientists use when searching the literature: she consulted the reference list in this paper, as it provided a wealth of additional papers for her to search. One such paper was a publication entitled “Basin Range Structure and Stratigraphy of the Warner Range, Northeastern California,” by Richard Joel Russell, published by the University of California Press in 1928 – this appeared to be the first published scientific investigation in this region (Russell, 1928). The USGS geologists had added more detail to Russell’s work, but only in the southern part of the range. Therefore, these and other resources helped Egger and her colleagues decide to focus on the central and northern parts of the range, where less was known about the geology. In addition, they helped define where there were still unanswered questions.
One such unanswered question was the origin of the sedimentary rock layers in the Warner Range (see Figure 2). Several geologists had noted the presence of granite cobbles in these sedimentary rock layers. Cobbles in general indicate that the sediments were deposited by a large river, but the presence of granite cobbles indicates something else: although granite is common in other parts of California, there is none nearby, so the cobbles had to be carried a long distance by that ancient river. By looking at the age and chemical make-up of the granite cobbles, Egger and her colleagues could compare them to granite in other areas and try to determine where the cobbles came from. They collected data in the field and in the laboratory, eventually preparing a scientific journal article about the work they did entitled “Provenance and paleogeographic implications of Eocene-Oligocene sedimentary rocks in the northwestern Basin and Range” (Egger, Colgan, & York, 2009).
Figure 2: Sedimentary rocks in the Warner Range. Photo by Anne Egger
The authors recognized that a number of different names had been applied to the sedimentary rocks they were investigating, and they wanted to make it clear to others how the terminology they were using fit into what others had done. In the excerpt that follows, they explain the historical progression of work in the region starting with the first investigation in 1928, and referring to articles along the way in order to show how their new work utilizes the previously established names:
|The Warner Range exposes a thick sequence of … sedimentary and volcanic rocks... The base of this sequence is primarily sedimentary and volcaniclastic; it was originally called the Lower Cedarville Formation by Russell (1928). Based on detailed field mapping in a portion of the range between Cedarville and Lake City, Martz (1970) subdivided the Lower Cedarville Formation into five units and mapped at least one unconformity within it. In their mapping of the South Warner Wilderness area between Granger Canyon and Eagleville, Duffield et al. (1976) did not subdivide the sedimentary sequence, though they alluded to the presence of at least three recognizable units based on composition, color, and vegetation. Myers (1998) and (2006) retained the nomenclature of Martz (1970) in paleofloral analyses of fossil assemblages in this sequence (Myers, 1998; 2006). Our new mapping in 2004 and 2005 confirmed the formation boundaries suggested by Martz (1970) and extended these subdivisions to the south between Cedar Pass and the South Warner Wilderness, and thus here we use those formation names.
This explicit acknowledgement of other scientists’ work shows that the authors examined the research archive in order to build on it, making use of the accumulated knowledge and understanding about the region in order to ask new questions about the sedimentary rocks. Later in the paper, the authors wanted to establish the age of the rocks they were describing. One kind of data that can help make this determination is the fossils present in the rock, but the authors did not collect fossil data themselves. In this case, they cite papers where other scientists did look closely at the fossils:
|The Steamboat Formation includes two fossiliferous layers... At its base north of Cedarville, a well-documented floral assemblage marks the transition from the latest Eocene to Oligocene (Myers, 2006). The fossils occur in a 1 m-thick … siltstone that extends laterally (mainly to the south) approximately 7 km (Myers, 2006). … [and] include ferns and conifers that occur throughout the sequence…
Myers’ data about the fossils helped establish the age of the sedimentary rocks (Eocene to Oligocene, about 35 million years old). Building on this existing data, Egger and her co-authors could then show what the rivers were like in the region during that time. One of the kinds of data that they collected in the field is paleocurrent indicators, or measurements that show which direction the currents that deposited the sediments were flowing. In this case, they measured the orientation of granite cobbles in a channel, called imbrication (see Figure 3).
Figure 3: Joe Colgan measuring imbrication in cobbles in the field (left), and a close-up view of imbrication (right). The red lines indicate the orientations that the authors measured. Photos by Anne Egger
|Imbrication directions were largely consistent within a single … channel, but varied as much as 180 degrees between different channels. Data from Cottonwood Canyon exemplify this relationship: 17 measurements in a channel near the base of the exposure show a strong paleocurrent direction towards the NW, while 16 measurements in a bed approximately 30 m stratigraphically higher in the sequence show a bit more variability with an average paleocurrent to the ESE (Fig. 2). While braided rivers tend to display more consistency in their paleocurrent directions, a spread in paleocurrent directions of 180° is expected in a coarse alluvial fan or alluvial plain (e.g. Miall, 1977).
In the passage quoted above, the authors describe their own data (the measurements of the paleocurrent indicators), then suggest a possible reason or interpretation for this data – that this large variability in the orientation of the cobbles is typical of a river that is very broad and steep – an “alluvial plain”. They cite Miall to indicate that he was the first person to describe the finding that a “spread in paleocurrent directions,” or the fact that the cobbles were oriented in many directions, indicated the presence of a broad alluvial plain. Because he came to a similar conclusion in a different context, they are using the literature to find analogous situations and similar findings, to indicate that their interpretation is reasonable and show how it integrates into the existing research.
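Summarizing a set of paleocurrent measurements takes some care, because compass directions wrap around at 360°: the ordinary average of 350° and 10° is 180°, which points the wrong way. One common approach is to treat each bearing as a unit vector and average the vectors. The sketch below illustrates that approach. It is not the authors' method, the bearings are hypothetical, and it assumes the measurements have already been converted into flow directions.

```python
import math


def direction_stats(bearings_deg):
    """Return (mean bearing in degrees [0, 360), mean resultant length in [0, 1])."""
    # Treat each compass bearing as a unit vector: sin gives the east
    # component and cos gives the north component.
    east = sum(math.sin(math.radians(b)) for b in bearings_deg)
    north = sum(math.cos(math.radians(b)) for b in bearings_deg)
    mean_bearing = math.degrees(math.atan2(east, north)) % 360.0
    # A resultant length near 1 means the bearings agree closely;
    # a value near 0 means they are widely scattered.
    resultant = math.hypot(east, north) / len(bearings_deg)
    return mean_bearing, resultant


# Hypothetical bearings from a single channel, clustered toward the northwest.
channel = [300, 305, 310, 315, 320]
mean_bearing, resultant = direction_stats(channel)
print(round(mean_bearing))  # 310
```

Running the same function separately on each channel and comparing the mean bearings is one way to express the kind of between-channel variation the excerpt describes.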
Throughout this paper and in scientific articles in general, the authors refer to the literature to do at least three key things: to indicate what other work has been done in the region or on the topic, to cite sources of data that they use, and to support their interpretation of the data (or show how their interpretation differs from previous interpretations). Citing these sources is an integral part of communicating research (see our Scientific Communication: Understanding Scientific Journals and Articles module for more information). Peer reviewers are usually familiar with the literature that authors are using, so one of their duties is to closely examine these references to see if the authors accurately describe their sources or if they missed any important sources (see our Scientific Communication: Peer Review module for more information about the peer review process).
In some cases, the literature itself can serve as a source for data collection. This has been the case in paleontology, for example, where many investigations over the past several hundred years have involved publishing descriptions of fossil localities, including which species and genera are present in different rock layers. In 1982, John Sepkoski Jr. published a compilation of data on when individual species of marine fossils first appear in the rock record, and when they are no longer seen in rocks. These data came from thousands of published reports (Sepkoski, 1982). In several earlier papers, Sepkoski had analyzed these compiled data and, based on that analysis, developed new ideas about taxonomic diversity through time (for example Sepkoski, 1979). In 1984, Sepkoski and his colleague David Raup published a controversial paper on the apparent regular occurrence of mass extinction events through time (Raup & Sepkoski, 1984), based entirely on the collection of data from the published literature. This type of analysis – often called meta-analysis – could not be done without the reliable archive of research provided by the scientific literature. Meta-analysis is especially useful in fields like medicine and climate science, where the results of studies with disparate methods can be combined to yield more robust results.
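To see how a compilation of first and last appearances can be turned into a picture of diversity through time, consider the minimal sketch below. It is not Sepkoski's actual procedure or data: the taxon names and ages are hypothetical, and the counting rule (a taxon is present at every age between its first and last recorded appearance) is a simplifying assumption.

```python
# Hypothetical compilation: taxon -> (first appearance, last appearance),
# in millions of years ago, so the larger number is the older one.
ranges = {
    "taxon_A": (500, 440),
    "taxon_B": (480, 250),
    "taxon_C": (450, 445),
    "taxon_D": (300, 66),
}


def diversity_at(ranges, age_ma):
    """Count taxa whose recorded range spans the given age."""
    return sum(1 for first, last in ranges.values() if first >= age_ma >= last)


def extinctions_between(ranges, older_ma, younger_ma):
    """Count taxa whose last appearance falls between the two ages."""
    return sum(1 for _, last in ranges.values() if older_ma >= last > younger_ma)


for age in (490, 447, 260, 100):
    print(age, diversity_at(ranges, age))
# 490 1
# 447 3
# 260 2
# 100 1
```

Because every entry in `ranges` would come from a published report, correcting a single entry only shifts the counts slightly when many taxa are included.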
Of course, our knowledge and understanding of the natural world continue to evolve, inevitably revealing some mistakes in interpretation in the existing literature, as well as causing some material and ideas to become out of date. Sepkoski recognized this likelihood, and in 1993, he published a paper entitled “Ten Years in the Library: New Data Confirm Paleontological Patterns” (Sepkoski, 1993). In that article, he notes, “As soon as the manuscript for the 1982 Compendium went to press, I began discovering new and old paleontological literature that changed times of origination and extinction … After publication…, the original data received special scrutiny from taxonomic experts, and embarrassing errors and promulgations of antiquated data were revealed.” Sepkoski collected the changes and reanalyzed the data. Interestingly, he found little difference in the conclusions about evolutionary patterns that he had published earlier (Sepkoski, 1993). For paleontology, this result has important implications – as Sepkoski states, “…the major patterns of … evolution are rather insensitive to new fossil discoveries and changes in taxonomic interpretation, indicating that analyses of transitory data can be robust, so long as a large component of the biosphere is being considered.” A similar conclusion can be drawn for the scientific literature as a whole, as well – though some mistakes get published, and our interpretations change, as a whole, the literature is robust and a reliable source of scientific data.
Staying current with the literature in one’s field is a challenge – far more research is being published every day than is possible to read. Many journals now send out email notices to subscribers when a new issue comes out, including the table of contents and links to each of the articles. This allows scientists to quickly browse a new issue and see if there is an article of relevance to their work. Very often, however, scientists have seen or heard preliminary versions of published articles through presentations at meetings or other interactions with colleagues at different institutions (see our module on Scientific Institutions and Societies).
Having access to the scientific literature is critical to doing science. Today, digital and online databases make it easier for people to search the literature and sometimes to access scientific journal articles. Access to the vast majority of journals, however, even digital journals, is limited by subscription, which may run into the thousands of dollars. As a result, scientists at institutions without the resources to pay for these subscriptions are at a disadvantage (Evans & Reimer, 2009). More recently, many journals have begun providing open access to their content after a set time period, often a year, as in the case of Science magazine, and some provide open access from the very beginning, such as the Public Library of Science. This change reflects awareness that a diversity of viewpoints improves our scientific understanding, and that everyone should have access to the scientific literature.
Access to the literature is so important because it is a reliable archive of scientific research. The fact that it is reliable does not mean that every published paper is correct, but it means that progress in our understanding can be tracked through time. When mistakes or even fraud are discovered, a paper can be retracted, which removes it from the literature and ensures that the record continues to be reliable (see our module on Scientific Ethics). In this way, earlier ideas can be built upon or refuted, and multiple lines of evidence can accumulate that help scientists establish the “big ideas” of science – robust theories like plate tectonics, atomic theory, and evolution.
Duffield, W. A., & McKee, E. H. (1986). Geochronology, structure, and basin-range tectonism of the Warner Range, northeastern California. Geological Society of America Bulletin, 97(2), 142-146.
Egger, A. E., Colgan, J. P., & York, C. (2009). Provenance and paleogeographic implications of Eocene-Oligocene sedimentary rocks in the northwestern Basin and Range. International Geology Review, in press.
Evans, J. A., & Reimer, J. (2009). Open Access and Global Participation in Science. Science, 323(5917), 1025-.
Harmon, J.E., Gross, A. G. The Scientific Article: From Galileo's New Science to the Human Genome. Fathom, retrieved January 23, 2009, http://www.fathom.com/course/21722wd01730/index.html
Holmes, F. L. (1987). Scientific writing and scientific discovery. Isis, 220-235.
Raup, D. M., & Sepkoski, J. J. (1984). Periodicity of extinctions in the geologic past. Proceedings of the National Academy of Sciences of the United States of America, 81(3), 801-805.
Russell, R. J. (1928). Basin Range structure and stratigraphy of the Warner Range, northeastern California. University of California Publications in Geological Sciences, 17(11), 387-496.
Sepkoski, J. J. (1979). A kinetic model of Phanerozoic taxonomic diversity; II, Early Phanerozoic families and multiple equilibria. Paleobiology, 5(3), 222-251.
Sepkoski, J. J. (1982). A compendium of fossil marine families. Contributions in Biology and Geology, 51.
Sepkoski, J. J. (1993). Ten years in the library; new data confirm paleontological patterns. Paleobiology, 19(1), 43-51.
Anne E. Egger, Ph.D., Anthony Carpi, Ph.D. "Scientific Communication: Utilizing the Scientific Literature," Visionlearning Vol. POS-2 (7), 2009.