Analyzing any data about open access uptake – and by implication what’s not OA – can fall foul of different definitions and interpretations. In common with major indexes, we use data from Unpaywall to help us understand the access models in use across the literature. Here we unpick the definitions and color-coding in use (such as Gold and Bronze) so we can clarify the parameters of our market analysis. This discussion was originally posted as a companion to our preliminary 2020 OA uptake analysis, but supports terms used throughout our Open Access Data and Analytics Tool (OADAT).
What's in a name
The formal definition of Open Access was set out by the Budapest Open Access Initiative in 2002. It defines open access to [scholarly] research literature as its free availability on the public internet “without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.” This is roughly equivalent to the Creative Commons Attribution (CC BY) license.
The combination of being free of charge to read AND carrying permissive usage rights forms the essence of the formal definition of Open Access. However, contemporary use[1] of “open access” can also refer to just the “free of charge to read” subset of the formal definition, without permissive usage rights beyond fair use or similar.
More than mere pedantry
As we have discussed previously, we use data from Unpaywall[2] to analyze per-article information about Open Access. Other major indexes (such as Web of Science, Scopus, and Dimensions) use this data too.
In their paper[3] examining OA uptake in an early data set, the Unpaywall team explore various definitions of open access. They use open access to mean “free to read online, either on the publisher website or in an OA repository.” Given their aim of directing researchers to legitimately “open to read” articles, this definition is understandable. However, because it underpins their algorithms, and because a full market analysis requires a different approach, we need to unpick it. Table 1 shows how Unpaywall classifies OA.
Table 1: OA status - Unpaywall detection model
The table below summarizes how Unpaywall sets its article OA status indicator.
The table separates the article types (rows) from the journal types (columns). The algorithm checks articles in a series of steps to determine their OA status. Note the use of colors to identify status and – as explained below – how licenses may not always be analyzed.
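As a rough illustration of those steps, the Python sketch below applies the color-coded categories described by Piwowar et al. in priority order. It is a simplified sketch under stated assumptions, not Unpaywall's actual code. The `Article` fields are hypothetical stand-ins for the evidence Unpaywall gathers, and the real algorithm handles many more cases, such as multiple copies of one article and how fully OA journals are identified.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical, simplified evidence flags; not Unpaywall's real schema.
    journal_fully_oa: bool        # journal classed as "fully OA"
    free_on_publisher_site: bool  # free to read on the publisher website
    has_open_license: bool        # an open license was detected
    free_in_repository: bool      # a free copy exists in an OA repository


def unpaywall_style_status(a: Article) -> str:
    """Return a color-coded OA status, checking each step in priority order."""
    if a.journal_fully_oa:
        return "gold"    # article in a journal classed as fully OA
    if a.free_on_publisher_site and a.has_open_license:
        return "hybrid"  # openly licensed in a journal that is not fully OA
    if a.free_on_publisher_site:
        return "bronze"  # free to read at the publisher, but no license found
    if a.free_in_repository:
        return "green"   # free only via a repository copy
    return "closed"      # no free copy found
```

Because the checks run in order in this sketch, an article that is free on the publisher site without a license and also deposited in a repository comes out as bronze rather than green: `unpaywall_style_status(Article(False, True, False, True))` returns `"bronze"`.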
This gives a reasonable view of “open to read” articles. By focusing on the “gold” and “hybrid” subset, we can isolate formally “Open Access” articles, which include permissive reuse rights. However, because this model classes journals as either “fully OA” or “not”, hybrid journals are not separated from journals that offer no OA options. So understanding hybrid uptake is not possible without further analysis.
Formally "Open Access" and Hybrid
To measure OA uptake in hybrid journals, we need to separate out the journals that offer no OA options. We identify these from Delta Think’s own lists, which cover 70% of the articles indexed by Unpaywall since 2015. We can also look for patterns in Unpaywall data and make informed inferences about the unknowns; for example, we can infer that a journal has no OA options if it contains no OA output at all. We also have to correct errors in the fully OA journals. Unpaywall does a good job with a complex data set, but we still found cases where articles in supposedly fully OA journals were classified as “hybrid”, and vice versa.
The 12 combinations of article type and journal type mean we have to make decisions about how we categorize articles. Table 2, below, shows how our analysis adds formal Open Access classifications to the raw Unpaywall data.
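One way to combine a curated list with pattern-based inference is sketched below in Python. Several parts are hypothetical assumptions for illustration: the ISSN keys, the `known_types` lookup, counting only openly licensed articles as OA output, and the rule that a journal whose every article is openly licensed counts as fully OA. Only the rule that a journal with no OA output has no OA options comes from the paragraph above.

```python
from typing import Dict, List


def infer_journal_type(
    issn: str,
    known_types: Dict[str, str],
    article_has_open_license: List[bool],
) -> str:
    """Return "fully_oa", "hybrid" or "not_oa" for a journal (hypothetical sketch)."""
    # 1. Prefer an authoritative entry from a curated journal list.
    if issn in known_types:
        return known_types[issn]
    # 2. Otherwise infer from the journal's output, counting only openly
    #    licensed articles as OA output (an assumption of this sketch).
    licensed = sum(article_has_open_license)
    if licensed == 0:
        return "not_oa"    # no OA output at all: assume no OA options
    if licensed == len(article_has_open_license):
        return "fully_oa"  # hypothetical rule: every article openly licensed
    return "hybrid"        # a mix of licensed and unlicensed articles
```

A list entry always wins over inference here. This mirrors the idea of trusting curated data first and falling back to patterns only for journals the list does not cover.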
Table 2: Open Access = free to read and free to reuse
The table separates the article types (rows) from the journal types (columns). It further splits “not fully OA” journals into “Not OA” journals (those with no OA options) and Hybrid journals.
Note that our method is not foolproof. Where we cannot identify a journal from a publisher’s price list, we infer its type from its articles. Where Unpaywall cannot identify, or has not identified, open licenses, we assume a journal is not OA. We have seen examples of journals that promote themselves as “open access” but do not explicitly specify licenses for their content. Such a journal would be classified as Not OA.
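To show why splitting the columns matters, here is a minimal Python sketch that lays out the 12 article-type/journal-type cells and then measures formal OA uptake within each journal type. The four article-type labels are assumptions about the grid's rows. Treating only openly licensed, publisher-hosted articles as formally OA is also a simplification, since repository copies can carry open licenses too.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

# Hypothetical grid axes: article types (rows) x journal types (columns).
ARTICLE_TYPES = ("licensed", "bronze", "green", "closed")
JOURNAL_TYPES = ("fully_oa", "hybrid", "not_oa")
CELLS = list(product(ARTICLE_TYPES, JOURNAL_TYPES))  # 4 x 3 = 12 combinations


def is_formally_oa(article_type: str) -> bool:
    # Assumption: only articles free on the publisher site under an open
    # license count as free to read AND free to reuse.
    return article_type == "licensed"


def oa_share_by_journal_type(
    articles: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Share of formally OA articles within each journal type present."""
    totals = Counter()
    oa_counts = Counter()
    for article_type, journal_type in articles:
        totals[journal_type] += 1
        if is_formally_oa(article_type):
            oa_counts[journal_type] += 1
    return {j: oa_counts[j] / totals[j] for j in JOURNAL_TYPES if totals[j]}
```

For example, four articles in hybrid journals (two licensed, one bronze, one closed) plus one licensed article in a fully OA journal give `{"fully_oa": 1.0, "hybrid": 0.5}`. With only two journal columns, the hybrid figure would be diluted by journals that offer no OA options at all.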
"Public Access" = free to read
Real-world data contains some anomalies. For example, technically there should be no “Bronze” articles (articles free to read without an OA license) in fully OA journals; in practice, we have found some in the data. Formal OA definitions apply only to peer-reviewed content; however, the Unpaywall/Crossref data also includes non-peer-reviewed content, and separating the two requires further analysis. When we apply our corrections, we need to decide where to classify the free-to-read articles. This is shown in Table 3.
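Before deciding where such articles belong, it can help to flag them. The Python sketch below is a hypothetical illustration. The `Record` fields are assumptions rather than fields in the Unpaywall or Crossref data, and that includes the `peer_reviewed` flag, which in practice would itself need the further analysis described above.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    # Hypothetical, simplified record; field names are illustrative only.
    article_id: str
    article_type: str    # "licensed", "bronze", "green" or "closed"
    journal_type: str    # "fully_oa", "hybrid" or "not_oa"
    peer_reviewed: bool  # assumed known; needs further analysis in practice


def find_anomalies(records: List[Record]) -> List[str]:
    """Return IDs of records that need a classification decision."""
    flagged = []
    for r in records:
        bronze_in_fully_oa = (
            r.article_type == "bronze" and r.journal_type == "fully_oa"
        )
        # Formal OA definitions cover peer-reviewed content only.
        if bronze_in_fully_oa or not r.peer_reviewed:
            flagged.append(r.article_id)
    return flagged
```

The sketch only surfaces candidates. Where each flagged article is finally classified remains a policy decision of the kind summarized in Table 3.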
Table 3: Public Access Status
Again, the table splits article types and journal types, but this time with a focus on articles that are “open to read” without regard to their licenses.
Conclusion
For our analysis, we take “Open Access” to mean that articles can be both read free of charge and reused free of major restrictions. We use “Public Access” to refer to articles that are available to read free of charge but have restrictions on reuse.
Other sources may use “open access” to span both definitions, as suits their purpose. We advise readers to establish which definitions are in use when interpreting data.
The role of licenses is crucial in a market context: article processing charges (APCs) may vary depending on license type, and funder OA mandates may insist on permissive reuse rights. Alongside Open Access and Public Access, we further distinguish “Repository only” for articles not available from publishers’ websites, and “Paid access” for paywalled ones. Together, these four definitions span what we might term an article’s “Access Type.”
Finally, journals are core to publishing activities and economics, and initiatives such as Plan S also take positions on parent journals. Our “Journal Types”, orthogonal to Access Types, reflect the major journal business models, so we can develop a full analysis.
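The four Access Types can be expressed as a simple decision rule. This Python sketch makes two assumptions for illustration: a free copy on the publisher's site takes precedence over a repository copy, and license detection is reduced to a single boolean. The function and enum names are hypothetical.

```python
from enum import Enum


class AccessType(Enum):
    OPEN_ACCESS = "Open Access"          # free to read and free to reuse
    PUBLIC_ACCESS = "Public Access"      # free to read, reuse restricted
    REPOSITORY_ONLY = "Repository only"  # free only outside publisher sites
    PAID_ACCESS = "Paid access"          # paywalled


def access_type(free_on_publisher_site: bool, has_open_license: bool,
                free_in_repository: bool) -> AccessType:
    """Map simplified article flags to one of four Access Types (sketch)."""
    if free_on_publisher_site:
        # Assumption: the publisher-hosted copy decides OA vs Public Access.
        if has_open_license:
            return AccessType.OPEN_ACCESS
        return AccessType.PUBLIC_ACCESS
    if free_in_repository:
        return AccessType.REPOSITORY_ONLY
    return AccessType.PAID_ACCESS
```

Because this rule uses only article-level evidence, its output can be cross-tabulated against an independently derived Journal Type.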
[1] Suber P. (Aug 2008) Gratis and libre open access. SPARC Open Access Newsletter, 124. http://nrs.harvard.edu/urn-3:HUL.InstRepos:4322580
[2] “Unpaywall is an open source application that links every research article that has been assigned a Crossref DOI (more than 100 million in total) to the OA URLs where the paper can be read for free. It is built and maintained by Our Research (formerly Impactstory), a US-based nonprofit organization.” – Piwowar H., Priem J., Orr R. (Oct 2019) The Future of OA: A large-scale analysis projecting Open Access publication and readership. bioRxiv. https://doi.org/10.1101/795310
[3] Piwowar H., Priem J., et al. (Feb 2018) The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ. https://doi.org/10.7717/peerj.4375