PRIDE quality control, Attila Csordas, Biocuration 2012

The PowerPoint version of a talk I gave at the Biocuration 2012 Conference at Georgetown University in Washington DC, in front of ~300 people.
Published on: Mar 4, 2016
Published in: Technology      
Source: www.slideshare.net


Transcript - PRIDE quality control, Attila Csordas, Biocuration 2012

  • 1. PRIDE: Quality control in a proteomics data repository. Attila Csordas, Proteomics Services Team. Biocuration Conference, April 2nd, 2012
  • 2. Overview: who are we?; what are we dealing with?; manual curation and submission; quick detour: ProteomeXchange; automated curation & submission pipeline; conclusion
  • 3. PRIDE: http://www.ebi.ac.uk/pride The PRoteomics IDEntifications database is a centralised, primary, archival, public data repository for MS/MS proteomics data containing peptide IDs, protein IDs, mass spectra, protein expression values, and metadata.
  • 4. Acknowledgements: colleagues at the PRIDE team; @pride_ebi; pride-ebi@ebi.ac.uk; pride-support@ebi.ac.uk; http://code.google.com/p/pride-toolsuite/ ; http://code.google.com/p/pride-converter-2/
  • 5. Mass spectrometry: an analytical technique measuring the mass-to-charge (m/z) ratio of charged particles to determine masses of particles, composition of samples/molecules and chemical structures of molecules
  • 6. Shotgun/bottom-up proteomics protocol (diagram labels: proteins, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database)
  • 7. What is a PRIDE submission?
  • 8. Growth of core data types (chart values: 130 million, 23 million, 4.6 million)
  • 9. Manual curation and submission process: search engine output + spectra, e.g. Mascot (.dat), X!Tandem (.xml) + mgf; PRIDE Converter; PRIDE XML
  • 10. PRIDE Inspector: initial assessment of data quality; visualise/check data; summary charts; support for submitters & reviewers/editors; more flexible than the web interface
  • 11. Frequent Data Quality Issues (XML shown on slide: <SearchEngine>PeptideShaker</SearchEngine>, <PeptideItem>): 1. syntactic problems; 2a. core data missing: no protein/peptide identifications; 2b. or metadata missing: no species; 3. inconsistent/incorrect data: protein modifications
  • 12. Delta m/z of detected peptide precursors: experimental precursor ion m/z - theoretical precursor ion m/z. Source of delta m/z outliers: incorrect or missing protein modifications and charge state misassignments
  • 13. Fixing modifications based on delta m/z outliers
  • 14. Fixing modifications based on delta m/z outliers
  • 15. but the manual approach does not scale!
  • 16. 10 times as many & big submissions/day?
  • 17. Single point of submission of data to the main repositories, to encourage data exchange (diagram labels: Published, Raw, Reprocessed, Individual submissions, PeptideAtlas, EBI PRIDE, Raw files archive, Users, Large-scale submissions, UniProt, Other DBs (GPMDB, …))
  • 18. PX submission pipeline (diagram labels: PX Tool, Validation, Submission, Publication, ProteomeCentral, Raw Files, PRIDE XML Files, Summary)
  • 19. Automated regular submission pipeline: curation-submission time is ~1/6th of manual time; actionable curation summary. Number of files: 3. Project: Combined personal saliva proteome and microbioproteome. XML generator software: PRIDE Converter Toolsuite 2.0-SNAPSHOT. Summary row: Filename: 22143.xml; size: 3.3 GB; Species: Homo sapiens; #Proteins: 4128; #Peptides: 60544; #Spectra: 184209 spectra; #Unid-d spectra: 123665 spectra; PTMs: 3; % delta m/z outlier: 0.0
  • 20. Conclusion: growing amount of data; increasingly complex data; scalability issues; overcoming them by automation and new, smarter curation strategies
  • 22. Thanks for the attention!
  • 23. Q&A: acsordas@ebi.ac.uk, @attilacsordas
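
Slide 11 lists three kinds of frequent data quality issues: syntactic problems, missing core data or metadata, and inconsistent data. The first two could be screened automatically along the lines of the sketch below. This is only an illustration, not the PRIDE pipeline: the `PeptideItem` element name comes from the slide, while the `Species` element, the `Experiment` wrapper and the function name `check_submission_xml` are assumptions made up for this example and do not reflect the real PRIDE XML schema.

```python
import xml.etree.ElementTree as ET


def check_submission_xml(xml_text):
    """Return a list of issue strings for one submission document.

    Hypothetical layout: <PeptideItem> elements hold identifications and
    a <Species> element holds the species name (an assumption).
    """
    # 1. Syntactic problems: the document must at least be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"syntactic problem: {err}"]

    issues = []

    # 2a. Core data missing: no peptide identifications at all.
    if not list(root.iter("PeptideItem")):
        issues.append("core data missing: no peptide identifications")

    # 2b. Metadata missing: no (non-empty) species annotation.
    species = root.findtext(".//Species")
    if not (species and species.strip()):
        issues.append("metadata missing: no species")

    return issues
```

An empty list means none of these checks fired; it says nothing about the third category (incorrect protein modifications), which needs the delta m/z analysis from slide 12.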
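
Slide 12 defines delta m/z as the experimental precursor ion m/z minus the theoretical precursor ion m/z, and slide 19 reports a "% delta m/z outlier" figure per file. A minimal sketch of how such a figure could be computed is given below. The 0.5 m/z tolerance, the function names and the three example identifications are assumptions chosen for illustration; the talk does not state which threshold or implementation PRIDE uses.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def theoretical_mz(neutral_mass, charge):
    """m/z of a peptide ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def delta_mz(experimental_mz, neutral_mass, charge):
    """Experimental precursor m/z minus theoretical precursor m/z."""
    return experimental_mz - theoretical_mz(neutral_mass, charge)


def percent_outliers(deltas, tolerance=0.5):
    """Percentage of delta m/z values whose magnitude exceeds `tolerance`.

    The default tolerance is an assumed value, not one from the talk.
    """
    if not deltas:
        return 0.0
    n_outliers = sum(1 for d in deltas if abs(d) > tolerance)
    return 100.0 * n_outliers / len(deltas)


# Hypothetical identifications: (experimental m/z, reported neutral mass, charge)
identifications = [
    (501.0100, 1000.0, 2),  # consistent with the reported peptide mass
    (509.0047, 1000.0, 2),  # ~ +16 Da unreported modification, so delta ~ 8
    (334.3400, 1000.0, 3),  # consistent at charge 3
]

deltas = [delta_mz(mz, mass, z) for mz, mass, z in identifications]
outlier_pct = percent_outliers(deltas)
```

In this made-up data the second entry behaves like a missing oxidation: the peptide mass used for the theoretical value lacks about 16 Da, so at charge 2 the delta is about 8 m/z units. A wrong charge state shifts the theoretical m/z far more, which is why both causes named on slide 12 show up as outliers.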
