
Preserving the Integrity of the Scholarly Record

Presentation delivered by Peter Burnhill at the Edinburgh Digital Preservation meeting at the National Library of Scotland on 16 February 2015.
Published on: Mar 4, 2016
Published in: Education      
Source: www.slideshare.net


Transcripts - Preserving the Integrity of the Scholarly Record

  • 1. Preserving The Integrity of The Scholarly Record http://www.flickr.com/photos/shinez/5000985919/ Peter Burnhill, EDINA @ University of Edinburgh; National Library of Scotland, George IV Bridge; 5.30pm 16th February
  • 2. Preserving The Integrity of The Scholarly Record http://www.flickr.com/photos/shinez/5000985919/ Peter Burnhill, EDINA @ University of Edinburgh; National Library of Scotland, George IV Bridge; 5.30pm 16th February. Take Home Message: 1) Archive Streams of Issued Content 2) Avoid Reference Rot
  • 3. The Scholarly Record & Serials … [a focus on the digital] ‘The Scholarly Record’ has a fuzzy edge; ‘e-journals’; Websites, Databases, Repositories; ‘Book-length work’
  • 4. The Scholarly Record & Serials … [a focus on the digital] Continuing Resources, inc. Serials; ‘The Scholarly Record’ has a fuzzy edge; ‘e-journals’; Websites, Databases, Repositories; ‘Book-length work’
  • 5. The Scholarly Record & Serials … [a focus on the digital] Continuing Resources, inc. Serials; ‘The Scholarly Record’ has a fuzzy edge; Issued in Parts (Serials); Content changes over time (Integrating); ‘e-journals’; Websites, Databases, Repositories; ‘Book-length work’
  • 6. The Scholarly Record & Serials … [a focus on the digital] Continuing Resources, inc. Serials; ‘The Scholarly Record’ has a fuzzy edge; Other ‘resources needed for scholarship’; Issued in Parts (Serials); Content changes over time (Integrating); ‘e-journals’; Websites, Databases, Repositories; ‘Book-length work’; ‘Gov Docs’
  • 7. The following questions are implicit: 1. What exactly is the scholarly record? • What of that now ‘issued on the Web’? • And what if we limit focus to what could get an ISSN? 2. Whose responsibility is it to act as steward? Each research library; library consortia; national/state libraries/archives? & is this a national, or a trans-national challenge?
  • 8. An Article, once available in print on-shelf locally … is now online & accessed remotely, ‘anytime/anywhere’ => Improved Ease of Access :) But what of Continuity of Access? Will it still be there tomorrow?
  • 9. Libraries boast of ‘e-collections’, but maybe now they only have ‘e-connections’ Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/ => real & present danger for the integrity of what is published as scholarly record
  • 10. This is a global challenge: trans-national action. %age of 132,806 ISSN issued for e-serials (December 2013): US: 20%; UK: 8.6%; Rest of World: 71%. Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own
  • 11. So, who is offering digital shelving? ① Web-scale not-for-profit archiving agencies: ② National libraries … ③ Research libraries: consortia & specialist centres … Ingesting content with archival intent … National Science Library, Chinese Academy of Sciences
  • 12. Many archiving organisations a Good Thing “Digital information is best preserved by replicating it at multiple archives run by autonomous organizations” B. Cooper and H. Garcia-Molina (2002) Some bad stuff will happen!
  • 13. A Project to Pilot an E-journal Preservation Registry Service. Need to know who is looking after what & how?
  • 14. ISSN Register; E-J Preservation Registry Service; E-Journal Preservation Registry; user requirements; (a) METADATA on extant e-serials; (b) METADATA on preservation action; ISSN-L as kernel field; Digital Preservation Agencies. Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance. A Project to Pilot an E-journal Preservation Registry Service. Need to know who is looking after what & how?
  • 15. ISSN Register; E-J Preservation Registry Service; E-Journal Preservation Registry; user requirements; (a) METADATA on extant e-serials; (b) METADATA on preservation action; ISSN-L as kernel field; Digital Preservation Agencies. Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance. A Project to Pilot an E-journal Preservation Registry Service. Need to know who is looking after what & how? The Keepers Registry: "Tales from the Keepers Registry", Serials Review 39.1 (2013)
  • 16. … to discover who is looking after what. thekeepers.org as Global Monitor. *New in 2014* Library of Congress and Scholars Portal now reporting in
  • 17. e-journals should be easy – right? Some signs of Progress: the Keepers Registry recorded in 2011, 16,558 titles ‘ingested & archived’ by at least 1 ‘keeper’; in 2013, 21,557; in 2014, 26,195; now 26,712. 9,731 'ingested & archived' by 3+ … more archiving & as more archives report into Registry! Written & produced by Julie Brown, 1989
  • 18. “Are we there yet?” … “Don’t think so” ‘Ingest Ratio’= titles being ingested by one or more Keeper / ‘online serials’ in ISSN Register = 26,195 / 136,965 [in March 2014] => 19% (We do not know about 80% of all resources having ISSN) ‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register = 9,656 / 136,965 => 7%
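The two ratios on slide 18 are plain quotients of title counts. A minimal Python sketch of the arithmetic, using only the figures quoted on the slide (the function and variable names are illustrative, not taken from the Keepers Registry):

```python
def ratio_percent(titles_kept: int, online_serials: int) -> float:
    """Share of 'online serials' in the ISSN Register that are being kept, in percent."""
    return 100.0 * titles_kept / online_serials


# Figures as quoted on the slide (March 2014).
ONLINE_SERIALS_IN_ISSN_REGISTER = 136_965
INGESTED_BY_ONE_OR_MORE = 26_195   # numerator of the 'Ingest Ratio'
INGESTED_BY_THREE_OR_MORE = 9_656  # numerator of the 'KeepSafe Ratio'

ingest_ratio = ratio_percent(INGESTED_BY_ONE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)
keepsafe_ratio = ratio_percent(INGESTED_BY_THREE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)

print(f"Ingest Ratio:   {ingest_ratio:.0f}%")
print(f"KeepSafe Ratio: {keepsafe_ratio:.0f}%")
```

The same function applies to any title list checked against the Registry, provided the numerator counts only titles that also appear in the denominator.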
  • 19. Evidence on what libraries care about. Using Title List Comparison tool in Members Area of Keepers Registry. In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked archival status of serial titles regarded as important: ‘Ingest Ratio’ = 22% to 28%, ie about a quarter => fate of c.75% is unknown. As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178, & https://www.era.lib.ed.ac.uk/handle/1842/6682
  • 20. very many ‘at risk’ e-journals from many small publishers. BIG publishers act early but incompletely. Priority: find economic way to archive content from …
  • 21. … logs for the UK OpenURL Router* • 8.5m full text requests in UK during 2012 => 53,311 online titles requested. Analysis in 2013: ‘Ingest Ratio’ = 32% (16,985/53,311) => over two thirds, 68% (36,326 titles), held by none! Evidence based on what Researchers Use. * As reported in Keepers Registry Blog, OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
  • 22. … logs for the UK OpenURL Router* • 8.5m full text requests in UK during 2012 => 53,311 online titles requested. Analysis in 2013: ‘Ingest Ratio’ = 32% (16,985/53,311) => over two thirds, 68% (36,326 titles), held by none! Evidence based on what Researchers Use. * As reported in Keepers Registry Blog, OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges. “I believe we've … a problem here.” [John Swigert, Jr.]
  • 23. Another threat to the integrity of the record: ‘Reference Rot’. When what was referenced & cited ceases to say the same thing, or ‘has ceased to be’ http://www.snorgtees.com/this-parrot-has-ceased-to-be Reference Rot = Link Rot + Content Drift “when links to web resources no longer point to what they once did” Language Technology Group. Funded by the Andrew W. Mellon Foundation
  • 24. Link Rot ‘Link Rot’
  • 25. + Content Drift: What is at end of URI has changed, or gone! http://dl00.org 2000; http://dl00.org 2004; http://dl00.org 2005; http://dl00.org 2008. (a) Dynamic content as values on webpage changes over time (b) Static content but very different (often unrelated) web pages
  • 26. Hiberlink: Time Travel for The Scholarly Web 1. Threat: Creating evidence on extent of ‘Reference Rot’ – Main focus: references (& URIs) made in Journal Articles • "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot" – PLOS ONE paper published on 26 December 2014. • Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments • http://www.newyorker.com/magazine/2015/01/26/cobweb – Also looked at Reference Rot & the e-Thesis, ETD2014 2. Remedy: Opportunities for productive intervention – Identify workflows: preparation, publication, ingest – Prototype tools to avoid or limit reference rot – Pro-active or ‘transactional’ archiving as remedy • Embedding such ‘solutions’ in existing tools & infrastructure • Propose/test new infrastructure for temporal referencing – supporting & using the Memento protocol
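The Memento protocol named on this slide (RFC 7089) lets a client ask a "TimeGate" for the archived version of a URI closest to a chosen date by sending an `Accept-Datetime` request header. A minimal Python sketch of how such a request could be prepared follows. The TimeGate address `http://timegate.example.org/timegate/` is a placeholder invented for illustration, and the request is only built, not sent:

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate endpoint: an assumption for this sketch, not a real service.
TIMEGATE_PREFIX = "http://timegate.example.org/timegate/"


def timegate_request(original_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request."""
    # Accept-Datetime takes an HTTP-date in GMT; `when` must be timezone-aware UTC.
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return Request(TIMEGATE_PREFIX + original_uri, headers=headers)


req = timegate_request("http://www.bnf.fr", datetime(2014, 5, 19, tzinfo=timezone.utc))
# urllib stores header names capitalised as "Accept-datetime".
print(req.full_url)
print(req.get_header("Accept-datetime"))
```

Sending the request (for example with `urllib.request.urlopen`) against a real TimeGate would normally yield a redirect to the chosen archived copy; that step is left out here because it depends on which archive or aggregator is used.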
  • 27. Peter Burnhill, EDINA http://www.restfulliving.com/wp-content/uploads/2013/12/Time-1024x861.jpg Preserving the integrity of the scholarly record
  • 28. ‘Infrastructure’ to Enable Remedy • Robust Link - re-factor the HTML link that is returned a) Take simple URI - to French National Library (say) <a href="http://www.bnf.fr"> Link to the BNF </a> b) Augment Link with a set of Datetime & location pairs <a href="http://www.bnf.fr" mset="2014-05-19, http://archive.today/zdpAn 2014-05-15 memento"> Link to the BNF </a> http://robustlinks.mementoweb.org/
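The augmentation step on slide 28 can be sketched in a few lines of Python. This is an illustration only: the `mset` attribute name comes from the slide, but the way the datetime and location pairs are serialised here (date, space, archive URI, pairs separated by commas) is an assumption, as is the helper name `augment_link`. It is not the Robust Links specification.

```python
from html import escape


def augment_link(href: str, text: str, snapshots: list[tuple[str, str]]) -> str:
    """Return an <a> element carrying (date, archive URI) pairs in an `mset` attribute.

    The pair serialisation is a guess made for this sketch.
    """
    pairs = ", ".join(f"{date} {uri}" for date, uri in snapshots)
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'mset="{escape(pairs, quote=True)}">{escape(text)}</a>'
    )


link = augment_link(
    "http://www.bnf.fr",
    "Link to the BNF",
    [("2014-05-19", "http://archive.today/zdpAn")],  # snapshot taken from the slide
)
print(link)
```

Keeping the original `href` untouched means the link still works as before, while tools that understand the extra attribute can fall back to an archived copy if the live page has rotted or drifted.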
  • 29. Remedy for The Integrity of The Scholarly Record. Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’. 3 basic workflows: ① Study: Preparation -> (Review) -> Submission ② Publication: Editorial -> (Revision) -> Acceptance -> Issue ③ Post-Publication: Deposit/Ingest -> Provide/Access -> Use. Identify the Actors involved in: ① Composition: author/creator ② Public Release: editor/referee/copy ③ Curation: librarian / repository manager / archivist
  • 30. Hiberlink Plug-in: help authors & middle-folk do the right thing: ①  Triggers archiving of referenced web content when it is noted in: –  Zotero - used by authors to manage references https://www.zotero.org/ –  Open Journal System (OJS) - used by OA publishers https://pkp.sfu.ca/ojs/ ②  Returns Datetime URI for archived content that can be used in the citation Two-step Remedy To Avoid Reference Rot
  • 31. Time’s Up! thekeepers.org hiberlink.org •  See also •  thekeepers.blogs.edina.ac.uk •  safenet.blogs.edina.ac.uk/ HelpDesk: edina@ed.ac.uk