
Preserving Social Science Data Through Archival Collaboration

Published on: Mar 4, 2016
Source: www.slideshare.net


Transcripts - Preserving Social Science Data Through Archival Collaboration

  • 1. Preserving Social Science Data Through Archival Collaboration Micah Altman, Senior Research Scientist
  • 2. This Talk… <ul><li>The roadmap: </li></ul><ul><ul><li>Past </li></ul></ul><ul><ul><li>Present </li></ul></ul><ul><ul><li>Future </li></ul></ul><ul><li>Collaborators & Co-conspirators: </li></ul><ul><ul><li>Ken Bollen, Jonathan Crabtree, Darrell Donakowski, Myron Gutmann, Gary King, Lois Timms-Ferrara, Amy Pienta, Marc Maynard </li></ul></ul>
  • 3. What? -- Digital Social-Science Data <ul><li>DIGITAL </li></ul><ul><li>Optical: DVD, CD </li></ul><ul><li>Magnetic: Tapes, ‘Floppies’ </li></ul><ul><li>Paper: cards, tapes </li></ul><ul><li>SOCIAL SCIENCE </li></ul><ul><li>Social: class, crime, social movements, culture, folklore, family </li></ul><ul><li>Economic: wealth, prosperity, labor, business, equity </li></ul><ul><li>Psychology: cognition, attitudes, stereotypes </li></ul><ul><li>Politics: justice, democracy, public policy, public administration, international conflict </li></ul><ul><li>DATA </li></ul><ul><li>Raw measurements </li></ul><ul><li>Numeric tables </li></ul><ul><li>Administrative records (& email) </li></ul><ul><li>Video and audio interviews, transcripts (& blogs) </li></ul>
  • 4. Data Access is the Key To Science <ul><li>Science is not (only) about being scientific </li></ul><ul><li>Scientific progress requires community: Competition and cooperation </li></ul><ul><li>In the pursuit of common goals </li></ul><ul><li>Without access to the same materials: no community exists </li></ul><ul><li>The value of an article that can’t be replicated: ? </li></ul><ul><li>Scholarly articles are summaries, not the actual research results </li></ul><ul><li>But: Data access is spotty by field </li></ul><ul><li>Movement to require data access with publication </li></ul><ul><li>Finding the data is still hard </li></ul><ul><li>Hard for journal editors to verify </li></ul><ul><li>If you find it, how do you know it’s the same? </li></ul><ul><li>Replication projects: most published articles in social science cannot be replicated </li></ul>
  • 5. Data Access is the Key To Democracy <ul><li>Statistics = state-istics </li></ul><ul><li>The state tax authority: counting people, estimating wealth </li></ul><ul><li>Reformers use data to assess the performance of the state </li></ul><ul><li>Science informs public policy continually </li></ul><ul><li>In modern democracy: the public needs a direct source of information </li></ul>
  • 6. How Data Is Lost <ul><li>Data Intentionally Discarded </li></ul><ul><li>“ It was just too long ago, I generally keep data for something like 10 years beyond the last time I do something with them.” </li></ul><ul><li>“ Destroyed, in accord with APA 5-year post-publication rule.” </li></ul><ul><li>Unintentional Hardware Problems </li></ul><ul><li>“ Some data were collected, but the data file was lost in a technical malfunction.” </li></ul><ul><li>Destroyed for Confidentiality Reasons </li></ul><ul><li>“ The material…was considered sensitive data. Institutional review boards.. required us to promise to destroy the data after a certain period of time...” </li></ul><ul><li>Acts of Nature </li></ul><ul><li>“ The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.” </li></ul><ul><li>Discarded or Lost in a Move </li></ul><ul><li>“ As I retired …. Unfortunately, I simply didn’t have the room to store these data sets at my house.” </li></ul><ul><li>Obsolescence </li></ul><ul><li>“ Speech recordings stored on a LISP Machine…, an experimental computer which is long obsolete.” </li></ul><ul><li>Simply Lost </li></ul><ul><li>“ For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.” </li></ul>Research by:
  • 7. Identifying Data at Risk <ul><li>Past grants and awards </li></ul><ul><li>Private research organizations </li></ul><ul><li>Polling organizations </li></ul><ul><li>Journals and researcher associations </li></ul>
  • 8. Collaboration for Preservation <ul><li>Partnership Agreements </li></ul><ul><ul><li>Agreement to establish good practice </li></ul></ul><ul><ul><li>Preservation copies of data collected </li></ul></ul><ul><ul><li>Transfer Protocol: in case of archival failure </li></ul></ul><ul><li>Cooperating Operations </li></ul><ul><ul><li>Central database of leads for acquisition </li></ul></ul><ul><ul><li>Development of shared procedures </li></ul></ul><ul><ul><li>Review of acquisitions </li></ul></ul><ul><li>Joint “Not-bad” practices </li></ul><ul><ul><li>Identification & selection </li></ul></ul><ul><ul><li>Metadata </li></ul></ul><ul><ul><li>Security </li></ul></ul><ul><ul><li>Confidentiality </li></ul></ul><ul><li>Shared Catalog </li></ul><ul><ul><li>Unified Discovery </li></ul></ul><ul><ul><li>Content exchange </li></ul></ul><ul><ul><li>Layered Services </li></ul></ul>
  • 9. Data Rescued Examples <ul><li>U.S. Information Agency Surveys </li></ul><ul><ul><li>Directly informed U.S. foreign policy through surveys of foreign public opinion </li></ul></ul><ul><ul><li>Previously, only surveys from 1970-1990 were held in the National Archives </li></ul></ul><ul><ul><li>Collaboration between NARA and Roper to create a much more complete series spanning 1950-1990 </li></ul></ul><ul><ul><li>Surveys conducted in Europe, Latin America, and Asian countries; subjects include nuclear arms control </li></ul></ul><ul><ul><li>Recent subjects include US-Soviet relations, the US strike on Libya, the Soviet invasion of Afghanistan, economic matters, terrorism, economic summits, arms control, drug trafficking, democratization, and conflicts in El Salvador and Nicaragua. </li></ul></ul><ul><li>Longitudinal Study of Personality Development </li></ul><ul><ul><li>By Jack and Jeanne Humphrey Block </li></ul></ul><ul><ul><li>The most intensive study of human personality development in existence. </li></ul></ul><ul><ul><li>Thirty-year longitudinal study. </li></ul></ul><ul><ul><li>Mixed methods – quantitative, audio, video. </li></ul></ul><ul><ul><li>More than 100 instruments and thousands of measures (variables) </li></ul></ul><ul><ul><li>Resulted in more than 100 publications. </li></ul></ul><ul><ul><li>(Also shows how whiny kids are more likely to grow up to be conservatives.) </li></ul></ul><ul><li>National Network of State Polls </li></ul><ul><ul><li>Diverse membership of 50 members in 38 states </li></ul></ul><ul><ul><li>Covers a tremendous range of local and national issues </li></ul></ul><ul><ul><li>Data imminently at risk </li></ul></ul>
  • 10. Selected Topics & Sponsors <ul><li>Political activity, political activism, voting behavior, protest activity, voter registration, fundraising, political alienation, relationship to the Black community, feminism, racial identity, attitudes toward abortion, attitudes toward federal programs; television viewing habits, effects of having children on the marriage, giving too much/little independence, discipline, overscheduling, overprotecting, measuring levels of success in teaching values, self-control, good citizenship, good money habits, religion, worries that parents have of the future facing their children; problems facing parents and children from drugs, sex, violence to the lack of various family and religious values; daycare, mothers working, childrearing, taxes, government spending, morals, children’s issues, economy, jobs, education, crime, health care, social security, local school administration, standardized testing, impact of poor scores on teachers, higher academic standards needed, too much/little homework, summer school, teachers, administrators, quality of academics, discipline matters, class size, level of science and math skills taught, Shakespeare, life skills, athletics, citizenship; role of the US in the world and assessing US performance, terrorism, war in Iraq, respondent-identified level of understanding of foreign affairs, US and foreign aid, assisting emerging democracies, enhancing national security, image of the US abroad; seriousness of welfare problems--abuse, fraud, generational, etc.; assessing list of remedies--limit duration, require job training, provide day care, unannounced visits, business tax breaks for hiring recipients, penalize recipients who have more children, etc.; profiling welfare recipients (e.g. more likely to be better/worse parents, lazy or hardworking, from troubled families); defining the American Ideal, how to teach kids what it means to be American, national identity, appreciation of freedoms in the US, importance of voting, ashamed of nation's history of racism, job US does in teaching immigrant children, bilingualism, fly an American flag; most about the meaning of the rights the Constitution guarantees, assessing the level of appreciation of those rights in the US and how it is perceived to the international community; aging. Money Managers; on union organizations, employers, and labor market institutions; tort law reforms; crime and urbanization; law and social control; natural disasters; awareness of self </li></ul><ul><li>NSF, NIH, The Danforth Foundation, The Ford Foundation, The David and Lucile Packard Foundation, Ewing Marion Kauffman Foundation, State Farm Insurance, Ronald McDonald House Charities, Advertising Council, American Federation of Teachers, the Annenberg Institute, the George Gund Foundation, the National School Boards Association, U.S. Department of Education, GE Foundation, Nellie Mae Education Foundation, Wallace Foundation, Bill & Melinda Gates Foundation, Pew Charitable Trusts, National Constitution Center, Alliance for Aging Research, American Federation for Aging Research, the MacArthur Foundation, NIMH </li></ul>
  • 11. Data-PASS Shared Catalog <ul><li>A unified catalog of the partners’ entire holdings </li></ul><ul><li>Completes the unification of social science data that was the dream of the first Council of Social Science Data Archives in 1969 </li></ul><ul><li>Discovery Services </li></ul><ul><ul><li>Simple & fielded search </li></ul></ul><ul><ul><li>Virtual collection browsing </li></ul></ul><ul><li>Metadata delivery </li></ul><ul><ul><li>Descriptive study, file, & variable information </li></ul></ul><ul><ul><li>Provenance metadata </li></ul></ul><ul><ul><li>Human and OAI interfaces </li></ul></ul><ul><li>Enhanced Delivery </li></ul><ul><ul><li>Proxy delivery </li></ul></ul><ul><ul><li>Replication </li></ul></ul><ul><ul><li>Layered analysis services </li></ul></ul>Hosted by:
  • 12. Catalog Distributed Architecture <ul><li>View Information on Data </li></ul><ul><li>Through Catalog </li></ul><ul><li>Link to Data at Partner Site </li></ul><ul><li>Access Data </li></ul><ul><li>With Extraction and Analysis, Through Catalog </li></ul><ul><li>Direct to Partner Sites </li></ul>[Diagram labels: Search Shared Catalog OAI Data Mirror Metadata Catalog Harvester Online Catalog Online Analysis; XSL Crosswalk; XSL Crosswalk; proxy; proxy]
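The harvesting step on slide 12 can be illustrated with a minimal sketch. It assumes each partner exposes a standard OAI-PMH endpoint serving Dublin Core (`oai_dc`) records; the base URL, the partner name, the sample record, and the helper names `list_records_url` and `harvest` are all hypothetical. The slide's XSL crosswalk is replaced here by direct XML parsing, purely to keep the example short.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one partner archive."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )


def harvest(xml_text, partner):
    """Turn one ListRecords response into shared-catalog entries."""
    root = ET.fromstring(xml_text)
    entries = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        entries.append({"partner": partner, "id": identifier, "title": title})
    return entries


# Hypothetical response, inlined so the sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:study-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example State Poll</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

catalog = harvest(SAMPLE, partner="example-archive")
url = list_records_url("https://archive.example.org/oai")
```

In a real deployment the response would be fetched from each partner's endpoint and the resulting entries merged into one searchable index, with each entry linking back to the data at the partner site.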
  • 14. The Dataverse Network* <ul><li>Includes integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on firmer ground. It facilitates the public preservation and distribution of persistent, authorized, and verifiable research data. </li></ul><ul><li>In production </li></ul><ul><li>Will migrate Shared Catalog this summer </li></ul><ul><li>Virtually-Hosted Archiving </li></ul><ul><li>The importance of being virtual … </li></ul><ul><ul><li>Nothing to install </li></ul></ul><ul><ul><li>Dynamic collections: local and federated </li></ul></ul><ul><li>Institutionally supported </li></ul><ul><ul><li>Persistent identifiers and citations </li></ul></ul><ul><ul><li>No worries about file formats changing, backups, etc. </li></ul></ul><ul><ul><li>All the initial setup work is done for the depositor </li></ul></ul><ul><li>Depositors retain total control over </li></ul><ul><ul><li>Content </li></ul></ul><ul><ul><li>Access </li></ul></ul><ul><ul><li>Presentation </li></ul></ul>*Successor to “VDC” Developed by:
  • 15. http://dvn.iq.harvard.edu/dvn
  • 16. Better Data Citations: Persistent IDs and Universal Numeric Fingerprints <ul><ul><li>Persistent IDs get you from a journal article to the data </li></ul></ul><ul><ul><li>UNFs verify that it’s the same data as cited </li></ul></ul><ul><ul><li>Same UNF regardless of hardware, operating system, statistical software, database, or spreadsheet software. </li></ul></ul><ul><ul><li>UNFs combine: generalized rounding (desiccation), normalization (canonicalization), fingerprinting (cryptographic hash, e.g. SHA256) </li></ul></ul><ul><ul><li>Available as: C++, R-stats language, Stata, SAS, S-Plus </li></ul></ul>
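The three ingredients named on slide 16 (rounding, normalization, fingerprinting) can be illustrated with a simplified sketch. This is not the official UNF algorithm: the helper names, the 7-significant-digit default, the canonical string layout, the separator bytes, and the truncation to 16 bytes are all assumptions made for illustration.

```python
import base64
import hashlib


def canonical(value, digits=7):
    """Round to `digits` significant digits and render in a fixed
    exponential notation, so 1, 1.0 and 1.00000000001 all agree."""
    if value is None:
        return "missing"
    return format(float(value), f"+.{digits - 1}e")


def fingerprint(column, digits=7):
    """Hash the canonical form of every value, in order (hypothetical helper)."""
    h = hashlib.sha256()
    for v in column:
        h.update(canonical(v, digits).encode("utf-8"))
        h.update(b"\n\0")  # unambiguous separator between values
    # Truncate and base64-encode so the fingerprint is short enough to cite.
    return base64.b64encode(h.digest()[:16]).decode("ascii")


# The same values stored as integers, floats, or with sub-precision noise
# produce one fingerprint; a substantive change produces another.
same_a = fingerprint([1, 2.5, 1e-3])
same_b = fingerprint([1.0, 2.50000000001, 0.001])
different = fingerprint([1, 2.5, 0.002])
```

Because the hash is computed over the canonical text rather than the stored bytes, the fingerprint does not depend on the file format or software that holds the data, which is the property the slide claims for UNFs.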
  • 17. Future: Replication as Institutional Insurance <ul><li>Schema driven: capture inter-archival preservation commitments </li></ul><ul><li>Asymmetric: resource commitments proportional to holdings </li></ul><ul><li>Versioned: versioned data and citations </li></ul><ul><li>Integration: LOCKSS + DVN technology, archival workflows </li></ul>Data-PASS Syndicated Storage Project <ul><li>External Causes of Preservation Failure </li></ul><ul><ul><li>Third party attacks </li></ul></ul><ul><ul><li>Institutional funding </li></ul></ul><ul><ul><li>Change in legal regimes </li></ul></ul><ul><li>Quis custodiet ipsos custodes? </li></ul><ul><ul><li>Unintentional curatorial modification </li></ul></ul><ul><ul><li>Loss of institutional knowledge & skills </li></ul></ul><ul><ul><li>Intentional curatorial deaccessioning </li></ul></ul><ul><ul><li>Change in institutional mission </li></ul></ul>
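One way replication across archives can guard against the "unintentional curatorial modification" listed on slide 17 is a periodic comparison of the copies each partner holds. The sketch below is a simplified majority vote over SHA-256 digests, not the LOCKSS polling protocol; the archive names, the sample data, and the `audit` helper are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit(replicas):
    """Compare the copies held by each archive; return (consensus, dissenters).

    `replicas` maps a (hypothetical) archive name to the bytes it holds.
    Consensus requires a strict majority; otherwise consensus is None and
    every archive is listed for manual review.
    """
    digests = {name: digest(data) for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, sorted(digests)
    return winner, sorted(n for n, d in digests.items() if d != winner)


copies = {
    "archive_a": b"id,score\n1,0.5\n",
    "archive_b": b"id,score\n1,0.5\n",
    "archive_c": b"id,score\n1,0.6\n",  # silently modified copy
}
consensus, suspect = audit(copies)
```

With three or more independent holders, a single altered copy is outvoted and can be repaired from the others; with only two, a disagreement can be detected but not resolved automatically.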
  • 18. <ul><li>Preservation now follows the research life cycle </li></ul><ul><li>Future preservation should be planned at the beginning of the cycle </li></ul>
  • 19. For More Information Data-PASS Project: http://www.icpsr.umich.edu/DATAPASS/ Shared Catalog: http://vdc.hmdc.harvard.edu/dataverse/DATAPASS/ Dataverse Network Software: http://TheData.Org Get a dataverse hosted by IQSS: http://dvn.iq.harvard.edu/dvn
