Analyzing any data about open access uptake – and by implication what’s not OA – can fall foul of different definitions and interpretations. In common with major indexes, we use data from Unpaywall to help us understand the access models in use across the literature. Here we unpick the definitions and color-coding in use (such as Gold and Bronze) so we can clarify the parameters of our market analysis. This discussion was originally posted as a companion to our preliminary 2020 OA uptake analysis, but supports terms used throughout our Open Access Data and Analytics Tool (OADAT).

What’s in a name

The formal definition of Open Access comes from the Budapest Open Access Initiative (2002), which defines it as the free availability of [scholarly] research literature on the public internet “without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.” This is roughly equivalent to the Creative Commons Attribution (CC BY) license.

The combination of being free of charge to read AND carrying permissive usage rights forms the essence of the formal definition of Open Access. However, in contemporary usage[1] “open access” can also cover just the “free of charge to read” subset of the formal definition, without any reuse rights beyond fair use or similar exceptions (what Suber calls “gratis” as opposed to “libre” open access).

More than mere pedantry

As we have discussed previously, we use data from Unpaywall[2] to analyze per-article information about Open Access. The other major indexes (such as Web of Science, Scopus, and Dimensions) also use this data.

In their paper[3] examining OA uptake in an early data set, the Unpaywall team explore various definitions of open access. They use open access to mean “free to read online, either on the publisher website or in an OA repository.” Given their aim of directing researchers to legitimately “open to read” articles, this definition is understandable. However, because it underpins their algorithms, and a full market analysis requires a different approach, we need to unpick it. Table 1 shows how Unpaywall classifies OA.

Table 1: OA status – Unpaywall detection model

The table below summarizes how Unpaywall sets its article OA status indicator.

The table separates the article types (rows) from the journal types (columns). The algorithm checks articles in a series of steps to determine their OA status. Note the use of colors to identify status and, as explained below, the fact that licenses are not always checked.

  • Unpaywall’s definition of “open access” covers anything that is “open to read”, as highlighted in the heavy outline.
  • Green: “Toll-access on the publisher page, but there is a free copy in an OA repository.” This therefore excludes papers that are embargoed and not deposited in repositories (“Delayed OA”). This is an important subset, because such papers would be affected by zero-embargo rules such as those put forward by Plan S. The license is not checked, so “green” papers may not have permissive reuse terms.
  • Gold: “Published in an open-access journal that is indexed by the DOAJ [or in its own list of journals and fully OA publishers].” If the article is in such a journal, the license is not checked. For our purposes we assume a permissive license.
  • Hybrid: “Free under an open license in a toll-access journal”. Technically, a toll-access journal here is any journal that is NOT identified as being in the fully OA list used for the Gold status. This includes hybrid journals, which offer authors a choice of open access options. However, it also includes fully closed journals, which do not offer authors a choice, but in which the publisher may occasionally and unilaterally choose to make content open access. The licenses include various CC licenses and others; we assume they all offer fully permissive reuse rights.
  • Bronze: “Free to read on the publisher page, but without a clearly identifiable [open] license.” “Bronze” is a term that appears to have been coined by Unpaywall – it most closely corresponds to public access, such as that currently required by the US OSTP. As with hybrid, this will mix genuinely hybrid journals with fully closed ones which make selected content free to read.
  • Closed: “All other articles, including those shared only on an ASN [Academic Sharing Networks, such as ResearchGate and Academia.edu] or in Sci-Hub.” Technically, the Unpaywall algorithm looks for legitimate access to an article; if none is found, the article is considered closed. This means that articles made freely available in breach of copyright (“Black OA”) are counted as closed.
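The sequence of checks above can be sketched as a small Python function. This is a simplified illustration under our own assumptions, not Unpaywall's implementation (Unpaywall's own records carry the result in an `oa_status` field). The `Article` fields are hypothetical stand-ins for what Unpaywall detects, and the order of checks follows the definitions quoted above: publisher-hosted copies are considered before repository copies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    # Hypothetical, simplified fields; Unpaywall's real records are richer.
    in_fully_oa_journal: bool         # journal in DOAJ or Unpaywall's own fully OA list
    free_on_publisher_page: bool      # legitimate free copy on the publisher site
    publisher_license: Optional[str]  # e.g. "cc-by"; None if no open license found
    free_in_repository: bool          # legitimate free copy in an OA repository


def unpaywall_style_status(article: Article) -> str:
    """Assign a color following the Table 1 definitions (simplified sketch)."""
    if article.free_on_publisher_page:
        if article.in_fully_oa_journal:
            # Gold: the license is NOT checked for articles in fully OA journals.
            return "gold"
        if article.publisher_license is not None:
            # Hybrid: any journal not on the fully OA list, with an open license.
            return "hybrid"
        # Bronze: free to read on the publisher page, no identifiable license.
        return "bronze"
    if article.free_in_repository:
        # Green: toll access at the publisher, free copy in a repository.
        # The repository copy's license is not checked either.
        return "green"
    # Closed: no legitimate free copy (ASN and Sci-Hub copies do not count).
    return "closed"


paywalled_with_repository_copy = Article(False, False, None, True)
print(unpaywall_style_status(paywalled_with_repository_copy))
```

The example prints `green`. Note that the only journal-level input is a yes/no flag for “fully OA”, which is why this model on its own cannot tell hybrid journals apart from fully closed ones.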

This gives a reasonable view of “open to read” articles. By focusing on the “gold” and “hybrid” subset, we can identify formally “Open Access” articles, i.e. those that include permissive reuse rights. However, because this model classes journals as either “fully OA” or “not”, it does not separate hybrid journals from fully closed ones. So understanding hybrid uptake is not possible without further analysis.

Formally “Open Access” and Hybrid

To measure OA uptake in hybrid journals, we need to separate out the journals that offer no OA options. We determine this from Delta Think’s own lists, which cover 70% of the articles indexed by Unpaywall since 2015. We can also look for patterns in Unpaywall data and make some intelligent guesses about the unknowns; for example, we can infer that a journal has no OA options if it has no OA output at all. We also have to correct errors in the fully OA journals. Unpaywall does a good job against a complex data set, but we still found cases where articles in supposedly fully OA journals were classified as “hybrid”, and vice versa.
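A minimal sketch of this journal-typing step, assuming a lookup table of known journal types that stands in for curated lists, keyed by a placeholder ISSN. `KNOWN_JOURNAL_TYPES`, `OA_OUTPUT` and `infer_journal_type` are hypothetical names. The only inference rule coded here is the one described above (no OA output at all implies no OA options); anything else is left as “Unknown” rather than guessed.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for curated journal lists (e.g. publisher price lists).
# The ISSN-style keys are placeholders, not real journals.
KNOWN_JOURNAL_TYPES: Dict[str, str] = {
    "0000-0001": "Fully OA",
    "0000-0002": "Hybrid",
    "0000-0003": "Not OA",
}

# Assumption: "OA output" means articles Unpaywall found under an open license
# on the publisher site, i.e. its gold or hybrid colors (not bronze or green).
OA_OUTPUT = {"gold", "hybrid"}


def infer_journal_type(issn: str, article_statuses: Iterable[str]) -> str:
    """Look a journal up in curated lists, else infer its type from its articles."""
    if issn in KNOWN_JOURNAL_TYPES:
        return KNOWN_JOURNAL_TYPES[issn]
    if not any(status in OA_OUTPUT for status in article_statuses):
        # No OA output at all: infer that the journal offers no OA options.
        return "Not OA"
    # Some OA output but no list entry: it could be hybrid, fully OA, or a closed
    # journal with occasional OA articles, so leave it for further analysis.
    return "Unknown"


print(infer_journal_type("9999-9999", ["closed", "green", "bronze"]))
```

Here the unlisted journal has only closed, green and bronze articles, so the sketch prints `Not OA`.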

The 12 combinations of article type and journal type mean we have to decide how to categorize each article. Table 2, below, shows how our analysis adds formal Open Access classifications to the raw Unpaywall data.

Table 2: Open Access = free to read and free to reuse

The table separates the article types (rows) from the journal types (columns). It further splits “not fully OA” journals into “Not OA” journals (those with no OA options) and Hybrid journals.

  • The bolded box shows “formal open access”, which covers only openly readable articles with OA licenses.
  • We can now add descriptive names to our Article Types (2nd column).
  • The same caveats for fully OA journals apply as before. If a journal is fully OA in our lists, we assume its contents to be permissively licensed. We have not checked license information within the Unpaywall data.
  • For Hybrid journals, any articles not under an OA license (as checked by Unpaywall) are deemed to be part of the journals’ subscription content.
  • Finally, there is a third bucket of OA content: OA articles in journals without an advertised OA option. These form a small but notable subset of output, e.g. 1 or 2% of output in some big-name journals.
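The Table 2 logic can be sketched as a lookup from an article's Unpaywall color and its corrected journal type to a formal OA bucket. This is an illustration only: the bucket labels are our own shorthand, and the choice to count only Unpaywall “hybrid” as a verified open license outside fully OA journals is an assumption (a “gold” label there has an unchecked license, so it is conservatively not counted).

```python
def formal_oa_category(unpaywall_status: str, journal_type: str) -> str:
    """Map an Unpaywall color plus a corrected journal type to a formal OA bucket.

    Labels are illustrative shorthand, not official category names.
    """
    # Only Unpaywall "hybrid" means an open license was actually detected.
    # "gold" outside the fully OA lists has an unchecked license, so this
    # sketch conservatively treats it as unlicensed (an assumption).
    has_verified_license = unpaywall_status == "hybrid"

    if journal_type == "Fully OA":
        # Contents of fully OA journals are assumed to be permissively licensed
        # without checking; this also absorbs articles mislabeled as "hybrid".
        return "OA in fully OA journal"
    if journal_type == "Hybrid":
        # Anything not under a verified OA license is subscription content.
        return "OA in hybrid journal" if has_verified_license else "Subscription content"
    if journal_type == "Not OA":
        # The third bucket: OA articles in journals with no advertised OA option.
        return "OA in journal without OA option" if has_verified_license else "Not formal OA"
    raise ValueError(f"unknown journal type: {journal_type!r}")


print(formal_oa_category("bronze", "Hybrid"))
```

The example prints `Subscription content`: a free-to-read article without a verified license in a hybrid journal is not formally Open Access.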

Note that our method is not foolproof. Where we cannot identify a journal from a publisher’s price list, we infer its type from its articles. Where Unpaywall cannot identify (or has not identified) open licenses, we assume a journal is not OA. We have seen examples of journals which promote themselves as “open access” but do not explicitly specify licenses for their content. Such a journal would be classified as Not OA.

“Public Access” = free to read

Real-world data contain some anomalies. For example, there should technically be no “Bronze” articles (articles without an OA license) in fully OA journals, yet in practice we have found some in the data. Also, formal OA definitions apply only to peer-reviewed content, but the Unpaywall/Crossref data also include non-peer-reviewed content, and separating the two requires further analysis. When we apply our corrections, we need to decide where to classify the free-to-read articles. This is shown in Table 3.

Table 3: Public Access Status

Again, the table splits article types and journal types, but this time with a focus on articles that are “open to read” without regard to their licenses.

  • Unpaywall coined the term “Bronze” to refer to articles not in fully OA journals that are free to read but without a verifiable OA license. We extend Bronze here to cover articles in any type of journal that are free to read but without a verifiable OA license.
  • This is important for understanding possible impacts on subscriptions: the more content is free to read, the greater the downward pressure on subscription pricing. Our underlying data allow Bronze to be separated by journal type, so users of the OADAT can further analyze free-to-read content in journals with subscription components.
  • The “in Fully OA” and “in Hybrid” numbers look at output “that might reasonably have an APC.”
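The extended Bronze definition amounts to a simple filter-and-count, split by journal type. A minimal sketch, assuming each article is reduced to a hypothetical `(journal_type, free_to_read, verified_open_license)` tuple; the sample records are invented for illustration and are not real data.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record shape: (journal_type, free_to_read, verified_open_license).
Record = Tuple[str, bool, bool]


def bronze_by_journal_type(records: Iterable[Record]) -> Counter:
    """Count Bronze articles (free to read, no verifiable OA license) per journal type.

    Unlike Unpaywall's Bronze, this extended definition also covers fully OA journals.
    """
    counts: Counter = Counter()
    for journal_type, free_to_read, verified_open_license in records:
        if free_to_read and not verified_open_license:
            counts[journal_type] += 1
    return counts


sample = [
    ("Fully OA", True, False),  # anomaly: unlicensed free article in a fully OA journal
    ("Hybrid", True, False),
    ("Hybrid", True, True),     # licensed, so formal OA rather than Bronze
    ("Not OA", True, False),
    ("Not OA", False, False),   # paywalled, so not free to read
]
print(bronze_by_journal_type(sample))
```

Keeping the journal type on each count is what lets free-to-read content in journals with subscription components (Hybrid and Not OA) be analyzed separately from the anomalies in fully OA journals.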

Conclusion

For our analysis, we take “Open Access” to mean that articles can be read both free of charge and free of major restrictions on reuse. We use “Public Access” to refer to articles which are available free of charge, but which have restrictions on reuse.

Other sources may use “open access” to span both definitions, as suits their purpose. We advise readers to check which definitions are being used when interpreting data.

The role of licenses is crucial in a market context: APCs may vary depending on license type, and funder OA mandates may insist on permissive reuse rights. Alongside Open Access and Public Access, we further distinguish “Repository only” for articles not available from publishers’ websites, and “Paid access” for paywalled ones. Together, these four definitions span what we might term an article’s “Access Type.”
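The four Access Types can be expressed as a short decision function. This sketch assumes three hypothetical yes/no inputs per article (a legitimate free copy on the publisher site, an open license, a legitimate free copy in a repository) and gives the publisher site precedence, as the “Repository only” definition implies.

```python
def access_type(free_on_publisher_page: bool,
                open_license: bool,
                free_in_repository: bool) -> str:
    """Assign one of the four Access Types (simplified sketch).

    `open_license` should be True for articles in fully OA journals,
    where a permissive license is assumed rather than checked.
    """
    if free_on_publisher_page:
        # Free to read at the publisher: the license decides OA vs Public Access.
        return "Open Access" if open_license else "Public Access"
    if free_in_repository:
        # Readable only via a repository copy, not the publisher's website.
        return "Repository only"
    # No legitimate free copy anywhere: paywalled.
    return "Paid access"


print(access_type(True, False, True))
```

The example prints `Public Access`: the article is free on the publisher's site without an open license, and the repository copy does not change that. Because Journal Type is a separate dimension, each article would carry both labels, for example (“Public Access”, “Hybrid”).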

Finally, journals are core to publishing activities and economics, and initiatives such as Plan S take a position on parent journals as well. Our “Journal Types”, which are orthogonal to Access Types, reflect the major journal business models, so that we can develop a full analysis.


[1] Suber P. (Aug 2008) Gratis and libre open access. SPARC Open Access Newsletter, 124. http://nrs.harvard.edu/urn-3:HUL.InstRepos:4322580

[2] “Unpaywall is an open source application that links every research article that has been assigned a Crossref DOI (more than 100 million in total) to the OA URLs where the paper can be read for free. It is built and maintained by Our Research (formerly Impactstory), a US-based nonprofit organization.” Piwowar, Priem, Orr. (Oct 2019) The Future of OA: A large-scale analysis projecting Open Access publication and readership. https://doi.org/10.1101/795310.

[3] Piwowar, Priem, et al. (Feb 2018) The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ. https://doi.org/10.7717/peerj.4375.