EVERYTHING IS BROKEN

but by how much exactly?

code4lib 2020 pittsburgh | eric phetteplace | @phette23 | california college of the arts

photo: Antoine GIRET / Unsplash

Broken Full Text Links

Man looks at butterfly that says '404 page not found' and asks 'is this full text?'

report a broken link

Diagram of a broken link report: from Summon form to Wagtail app to Google Spreadsheet

Shout out to Robert Hoyt & Fairfield University, who provided code and the general architecture for this.
Code: Summon JS | Wagtail "broken links" app
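The client side of the report flow in the diagram can be sketched like this. It is a minimal illustration, not the Fairfield or CCA code: the endpoint (`REPORT_ENDPOINT`) and the payload field names are hypothetical, and writing the row to the Google Spreadsheet is left to the server.

```javascript
// Hypothetical sketch of the Summon-side half of a broken link report.
// REPORT_ENDPOINT and the field names below are assumptions, not the real app's API.
const REPORT_ENDPOINT = "https://library.example.edu/brokenlinks/"; // hypothetical

// Gather what staff need to reproduce the failure.
function buildReport({ pageUrl, linkUrl, title, comments = "" }) {
  if (!linkUrl) throw new Error("a report needs the failing link");
  return {
    page_url: pageUrl,      // the Summon search page the user was on
    link_url: linkUrl,      // the full-text link that failed
    title: title ?? "",     // item title, helps staff rerun the search
    comments: comments.trim(),
    reported_at: new Date().toISOString(),
  };
}

// POST the report as JSON; the server (a Wagtail app in the talk) would
// validate it and append a row to a spreadsheet for review.
async function submitReport(report) {
  const res = await fetch(REPORT_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
  if (!res.ok) throw new Error(`report failed: HTTP ${res.status}`);
  return res;
}
```

Keeping the payload builder separate from the network call makes it easy to test the form logic without a live backend.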

why do links break?

  • inaccurate metadata in the DL index
  • inaccurate metadata in the content provider database, causing a mismatch with the DL index
  • a disagreement between DL index and content provider over genuinely debatable metadata values, such as the title of a book review
  • a granularity mismatch: DL index and provider disagree about whether a section should be one or many articles
  • the content provider has a poor OpenURL implementation, causing links to fail because metadata fields are ignored or missing
  • title-level links that do not go to the full text
  • missing content: the content provider doesn't have an item even though it should
  • an item is under embargo and the DL's naïve knowledge base doesn't account for that
  • we misconfigured EZproxy
  • we misconfigured our knowledge base: our rights statements are wrong
  • we deleted a catalog record but our local holdings haven't synced with the DL index yet

stop being weird

is this real life...



or just reporting bias?

photo: Unleashed Agency / Unsplash

broken links experiment

a series of Node scripts to

  1. randomly select queries from real user data
  2. obtain search results for those queries
  3. test result links for resolution
  4. compile summary statistics
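Steps 1 and 3 might look roughly like this in Node. This is a sketch under assumptions, not the actual scripts: queries are assumed to be an array of logged search strings, step 2 (calling the Summon API, which needs its own authentication) is omitted, and "resolves" only means an HTTP 2xx after redirects. A 200 can still land on a search or landing page, which is why results were also reviewed by hand.

```javascript
// Step 1: uniform random sample without replacement (partial Fisher-Yates shuffle).
// `random` is injectable so the sampling can be made deterministic in tests.
function sampleQueries(queries, n, random = Math.random) {
  const pool = [...queries];
  const k = Math.min(n, pool.length);
  for (let i = 0; i < k; i++) {
    // pick an index from the not-yet-chosen tail [i, pool.length)
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// Step 3: a first automated pass over one result link.
// Links behind EZproxy would also need an authenticated session (not shown).
async function checkLink(url) {
  try {
    const res = await fetch(url, { redirect: "follow" });
    return { url, status: res.status, finalUrl: res.url, ok: res.ok };
  } catch (err) {
    // network errors, DNS failures, etc. count as not resolving
    return { url, status: null, finalUrl: null, ok: false, error: String(err) };
  }
}
```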

example: reviewing links

Summon broken link review in action

{
    "ContentType": [ "Journal Article" ],
    "hasFullText": true,
    "inHoldings": true,
    "isFullTextHit": false,
    "IsPeerReviewed": [ "true" ],
    "isPrint": false,
    "IsScholarly": [ "true" ],
    "LinkModel": [ "DirectLink" ],
    "PublicationCentury": [ "2000" ],
    "PublicationDecade": [ "2010" ],
    "SourceID": [ "proquest", "crossref" ],
    "SourceType": [ "Aggregation Database" ],
    "link_check": {
        "destination": "example.com",
        "resolves_to_full_text": false,
        "full_text": true,
        "notes": "can find article using a query of only its title"
    }
}
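Once records shaped like the one above are reviewed, step 4 can be a simple tally. This is a hypothetical sketch that assumes every reviewed record has a `link_check` object with boolean `resolves_to_full_text` and `full_text` fields; breaking failures down by a facet such as `ContentType` or `SourceID` is one way to spot problematic content types and platforms.

```javascript
// Overall rates: how many links worked, and how many broken ones still had full text somewhere.
function summarize(records) {
  const checked = records.filter((r) => r.link_check);
  const broken = checked.filter((r) => !r.link_check.resolves_to_full_text);
  const recovered = broken.filter((r) => r.link_check.full_text);
  return {
    n: checked.length,
    workingRate: checked.length ? (checked.length - broken.length) / checked.length : NaN,
    recoveredRate: broken.length ? recovered.length / broken.length : NaN,
  };
}

// Failure rate per facet value. Summon facet values are arrays,
// so one record can count toward several keys.
function failureRateBy(records, field) {
  const tally = new Map();
  for (const r of records) {
    if (!r.link_check) continue; // skip unreviewed records
    for (const key of r[field] ?? ["(none)"]) {
      const t = tally.get(key) ?? { total: 0, broken: 0 };
      t.total += 1;
      if (!r.link_check.resolves_to_full_text) t.broken += 1;
      tally.set(key, t);
    }
  }
  return [...tally].map(([key, t]) => ({ key, ...t, rate: t.broken / t.total }));
}
```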
                    

results

Only 78.5% worked 😡
We eventually located full text for 54.5% of the broken links.

margin of error ±3.72% at a 95% confidence level, N = 469
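The ±3.72% figure matches the normal-approximation (Wald) margin of error for a proportion, z·√(p(1−p)/N), with z ≈ 1.96 for 95% confidence, evaluated at the observed p = 0.785 and N = 469. The slides don't say which formula was used, so treat this as a consistency check rather than the author's method.

```javascript
// Normal-approximation (Wald) margin of error for a sample proportion.
// z = 1.96 corresponds to a two-sided 95% confidence level.
function marginOfError(p, n, z = 1.96) {
  return z * Math.sqrt((p * (1 - p)) / n);
}

const moe = marginOfError(0.785, 469);
console.log(`±${(moe * 100).toFixed(2)}%`); // ±3.72%
```

Using the conservative p = 0.5 instead would give about ±4.5%. The 54.5% recovery figure rests on only the broken subset of the sample, so its interval is considerably wider.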

what we're doing

  • when we're to blame, it's an easy, one-time fix
  • work with Summon support, try to be systematic
    • new linking strategy: pass only numeric metadata (volume, issue, ISSN) via OpenURL & omit title
  • cut toxic 🤢 links out of your life, identify & avoid problematic:
    • content types (Book Reviews)
    • platforms (Nexis Uni)
  • re-run broken link study under CDI
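The "numeric metadata only" strategy amounts to building an OpenURL that keeps identifiers and numbers and drops free-text titles. In Summon this is configured rather than coded, so the following is only a sketch of what such a request could look like: it uses standard OpenURL 1.0 (Z39.88-2004) KEV journal keys, while `BASE_URL` and the `citation` object shape are hypothetical.

```javascript
// Build an OpenURL 1.0 KEV journal request from numeric metadata only.
const BASE_URL = "https://resolver.example.edu/openurl"; // hypothetical resolver address

// Keys kept; titles (atitle, jtitle) are deliberately never copied over.
const NUMERIC_KEYS = ["issn", "eissn", "volume", "issue", "spage", "date"];

function numericOpenURL(citation, base = BASE_URL) {
  const params = new URLSearchParams({
    url_ver: "Z39.88-2004",
    rft_val_fmt: "info:ofi/fmt:kev:mtx:journal",
  });
  for (const key of NUMERIC_KEYS) {
    const value = citation[key];
    if (value !== undefined && value !== null && value !== "") {
      params.set(`rft.${key}`, String(value));
    }
  }
  return `${base}?${params}`; // URLSearchParams handles the percent-encoding
}

// Example: the article title is ignored even though it is present.
console.log(numericOpenURL({ issn: "1234-5678", volume: 34, issue: 1, atitle: "Some title" }));
```

The trade-off: a provider that can resolve volume/issue/ISSN will usually reach at least the right issue, whereas a slightly different title can make its search return nothing.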

Problem Origins

Scene from The Office where Dwight, Andy, & Michael are in a finger guns standoff

the library, the discovery layer, content providers, metadata providers

why it's hopeless

  • the finite nature of human life and patience
  • no one can guarantee DL indexes/linking function
    • size is itself an obstacle to integrity
    • the breadth of potential errors is such that human intervention is necessary
  • vendors won't vet each other's linking & blame each other rather than work toward mutual solutions (see NISO's Open Discovery Initiative, ODI)

OpenURL is broken

broken link icon by Drishya, CC-BY

  • assumes universal access to accurate metadata
  • there are platform-specific limitations even if an index has perfect metadata
    • example: it's impossible for metadata to uniquely identify certain articles in Nexis Uni, which uses only publication and title (no date)
  • platforms let OpenURLs fail gracelessly
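The Nexis Uni limitation can be made concrete: if a platform can only be addressed by publication + title, any two articles sharing both are indistinguishable, however good the index's other metadata is. The records below are hypothetical (a recurring column in an invented newspaper), and `findAmbiguous` is an illustrative helper, not part of any real API.

```javascript
// Group articles by the fields a platform accepts; groups of 2+ cannot be linked uniquely.
function findAmbiguous(articles, keyFields) {
  const groups = new Map();
  for (const a of articles) {
    const key = JSON.stringify(keyFields.map((f) => a[f] ?? null));
    groups.set(key, [...(groups.get(key) ?? []), a]);
  }
  return [...groups.values()].filter((g) => g.length > 1);
}

// Hypothetical data: a recurring column title in the same publication.
const articles = [
  { publication: "Daily Example", title: "Letters to the Editor", date: "2019-03-01" },
  { publication: "Daily Example", title: "Letters to the Editor", date: "2019-03-02" },
  { publication: "Daily Example", title: "City Council Votes", date: "2019-03-02" },
];

console.log(findAmbiguous(articles, ["publication", "title"]).length);         // 1
console.log(findAmbiguous(articles, ["publication", "title", "date"]).length); // 0
```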

little has changed

The number of problems discovered in full-text items that are linked via an OpenURL is discouraging; however, the ability of the Summon Discovery Service to provide accurate access to full text is an overall positive because of its direct link functionality. More than 95% of direct-linked articles in our research led to the correct resource. One-click (OpenURL) resolution was noticeably poorer, with about 60% of requests leading directly to the correct full-text item. More alarming, we found that, of full-text requests linked through an OpenURL, a large portion—20%—fail.

"Measuring Journal Linking Success from a Discovery Service" Stuart, K., Varnum, K., & Ahronheim, J. (2015)

Links & References