Preserving the Smithsonian Institution’s Web Presence

Presentation delivered by Lynda Schmitz Fuhrig, Electronic Archivist, and Jennifer Wright, Archivist, for the Smithsonian Institution Archives, at the Smithsonian Archives Fair on October 14, 2011, in Washington, DC. The Smithsonian Institution Archives began capturing institutional websites in the late 1990s. In 2009 it initiated a project to use the Heritrix crawler to capture the growing number of public websites and social media accounts maintained by the Smithsonian's many museums, research centers, and programs. This presentation reviews appraisal, accessioning, and capture issues in documenting the Smithsonian’s web presence in the early 21st century.
Published on: Mar 4, 2016
Published in: Education, Technology, Business
Source: www.slideshare.net


Transcripts - Preserving the Smithsonian Institution’s Web Presence

  • 1. Preserving the Smithsonian Institution’s Web Presence: Lynda Schmitz Fuhrig and Jennifer Wright, Smithsonian Institution Archives; Archives Fair, Oct. 14, 2011
  • 2. The Mission of SI Archives: Appraise, acquire, and preserve the records of the Institution; Offer a range of research and reference services; Establish policy and provide expert guidance on record keeping practices; Create and promote products and services that broaden understanding of the Smithsonian; Provide professional archival and conservation expertise
  • 3. Smithsonian’s First Home Page, 1995
  • 4. The Smithsonian Today
  • 5. Website and Social Media Registry: A “record” is any official recorded information, regardless of medium or characteristics, created, received, and maintained by a Smithsonian museum, office, or employee; Websites and social media accounts must be managed as records; Registry allows staff from across the Smithsonian to add and update information about all of their websites and social media accounts
  • 6. Appraising Records: All records must be appraised to determine their ultimate disposition; Records appraised based on administrative, legal, historical, and research value; Records with long-term value are transferred to Archives
  • 7. Appraising Traditional Websites: Websites are public face of Smithsonian; Significant historical and research value; Constantly changing; Crawl annually and before and after major redesigns; Work with webmasters to determine if crawls should be more or less frequent
  • 8. Appraising Social Media Accounts: All social media accounts are used differently; Each account appraised individually based on content; Accounts containing significant original content will be fully captured each year; Accounts consisting mostly of links to other resources will be captured occasionally to document existence; Method and frequency of capture may depend on terms of service and ability to avoid capturing non-Smithsonian content
  • 9. Past Web Archiving Procedures: Files transferred from the Smithsonian’s IT office; HTTrack web crawler; Scripts used to create XHTML preservation files but very manual and time-consuming
  • 10. Heritrix: Archival web crawler; Open source; Java; Developed by Internet Archive, National Library of Norway, and National and University Library of Iceland
  • 11. WARC: Web ARChive file format; International standard (ISO 28500:2009); Extension of the ARC format in use since 1996; Container format
  • 12. Crawling in Heritrix
  • 13. STRI website in 1995 (SIA Accession 05-032)
  • 14. Viewing a Crawl
  • 15. More To Do
  • 16. Social Media: Third-party issues; Privacy concerns; Different tools
  • 17. Lessons Learned: In-house archiving takes time; No one-size-fits-all solution; Master site registry requires regular updating
  • 18. Contacts and Resources: Lynda Schmitz Fuhrig, Digital Services Division, schmitzfuhrigl@si.edu; Jennifer Wright, Archives and Information Management Team, wrightjm@si.edu; Smithsonian Institution Archives website: http://siarchives.si.edu
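
Slide 11 describes WARC as a container format. As a rough illustration of what that means, the sketch below serializes and reads back uncompressed WARC/1.0 records using only the Python standard library. It is not the presenters' workflow and not how Heritrix itself writes files: the function names `build_warc_record` and `read_warc_records` and the example.org URLs are hypothetical, and the sketch uses simple `resource` records, whereas a crawler normally stores full HTTP responses in `response` records, usually gzip-compressed.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper).

    Layout: version line, named header fields, blank line,
    content block, then two CRLFs that close the record.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    return head + CRLF + payload + CRLF + CRLF


def read_warc_records(data: bytes):
    """Yield (headers, block) pairs from concatenated uncompressed records."""
    pos = 0
    while pos < len(data):
        # The header section ends at the first blank line after pos.
        head_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        if lines[0] != "WARC/1.0":
            raise ValueError(f"unexpected version line: {lines[0]!r}")
        headers = dict(line.split(": ", 1) for line in lines[1:])
        # Content-Length, not a delimiter search, bounds the block,
        # so the block may itself contain blank lines.
        start = head_end + len(CRLF + CRLF)
        length = int(headers["Content-Length"])
        yield headers, data[start:start + length]
        # Skip the block and the two CRLFs that terminate the record.
        pos = start + length + len(CRLF + CRLF)
```

Because each record carries its own headers and length, many captured resources can be appended into a single file and read back one at a time, for example with `list(read_warc_records(record_a + record_b))`.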