National Center for Biotechnology Information

Published on: Mar 3, 2016
Published in: Technology      Health & Medicine      
Source: www.slideshare.net


Transcripts - National Center for Biotechnology Information

  • 1. NCBI: National Center for Biotechnology Information
      • Created by Public Law 100-607 in 1988 as part of the National Library of Medicine at NIH to:
          • Create automated systems for knowledge about molecular biology, biochemistry, and genetics.
          • Perform research into advanced methods of analyzing and interpreting molecular biology data.
          • Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
      • Builders and providers of GenBank, Entrez, BLAST, and PubMed. Online systems host about 1.8 million users per day at peak rates of 3,200 web hits a second.
      • Center for basic research and training in computational biology.
  • 2. NCBI is the most heavily used site in biomedicine. Why?
      [Chart: NCBI Web Traffic, 1997-2006; vertical axis from 100,000 to 700,000, horizontal axis from January 1998 to January 2006]
      • 722,000 unique IPs a day
      • 91 million web hits a day
      • 3,200 peak web hits a second
      • 1.5 terabytes of FTP a day
      • 1.8 million unique users a day
  • 3. Data, the Next Intel Inside
      [Chart: Growth of Searches and GenBank, 1991-2003; series: GenBank (Megabases) and Searches/Day (BLAST & Text)]
  • 4. Comparative Analysis of Genes Enables “Innovation in Assembly”
      Colon cancer gene sequence:
          Human   638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
          Yeast   657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
          E.coli  584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642
      [Figure: evolutionary timeline marked 3000 Myr, 1000 Myr, 500 Myr; organisms shown: Human, Fly, Worm, Yeast, Bacteria, Mouse]
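The conservation visible in the slide 4 alignment can be quantified with a short script. This is only a sketch: the three strings are the aligned segments as transcribed above, while the function names `percent_identity` and `conserved_columns` are hypothetical helpers introduced for illustration, and treating a gap (`-`) as a mismatch is an assumption rather than anything the slides specify.

```python
# Aligned 60-column segments copied from the slide transcript.
HUMAN = "RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC"
YEAST = "RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC"
ECOLI = "RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA"


def percent_identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences carry the same residue.

    Assumption: a gap character ('-') never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def conserved_columns(*seqs: str) -> str:
    """Return the residue where every sequence agrees, '.' elsewhere."""
    return "".join(
        col[0] if len(set(col)) == 1 and col[0] != "-" else "."
        for col in zip(*seqs)
    )


if __name__ == "__main__":
    print(f"Human vs yeast:   {percent_identity(HUMAN, YEAST):.0%}")
    print(f"Human vs E. coli: {percent_identity(HUMAN, ECOLI):.0%}")
    print(conserved_columns(HUMAN, YEAST, ECOLI))
```

The last line prints a mask in which the stretch shared by all three organisms stands out as an unbroken run of letters.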
  • 5. Ignoring the Central Dogma in Bioinformatics is Evidence of “Stupid Design”
      [Diagram labels: Gene (repeated four times), Structure, Mature Peptide, Pro Peptide, mRNA, Transcript, Chromosome, Genetics, Genomes, Organisms, Function, Disease]
  • 6. It Guides “Innovative Assembly” of Separate Resources
      • Resources: GenBank, RefSeq, Human Genome, Bacterial Genome, Virus Genome, MMDB, PubMed, UniGene(s), LocusLink, OMIM, Taxonomy, GEO, PopSet, BLAST, Entrez, ePCR, Sequin
      [Diagram labels: Gene (repeated four times), Structure, Mature Peptide, Pro Peptide, mRNA, Transcript, Chromosome, Genetics, Genomes, Organisms, Function, Disease]
  • 7. Entrez: Pathway to Discovery
      [Diagram nodes: MEDLINE abstracts, Nucleotide sequences, Protein sequences]
      [Diagram links: Amino acid sequence similarity, Coding region features, Nucleotide sequence similarity, Term frequency statistics, Literature citations in sequence databases (shown twice)]
  • 8. Entrez Increases Discovery Space
      [Diagram labels: Nucleotide sequences, Protein sequences, Taxon, Phylogeny, 3-D Structure, MMDB, PubMed abstracts, Complete Genomes, PubMed, Entrez Genomes, Publishers, Genome Centers]
  • 9. Entrez is Intrinsically Components
      • The NCBI C++ Toolkit enforces common modules in internal pipelines, external applications, and web components.
      • Entrez has a common model for Booleans and Summaries, and unique models for deep data.
      • New projects can be easily added or extended.
      • Long-standing use of the “productotype” keeps NCBI agile, but (fairly) robust.
  • 10. Web Services Provide Access to Entrez
      • Eutils supports about 5 million service requests a day.
      • SOAP versions support about 38,000 service requests a day (0.8%), similar to the Amazon experience with REST and SOAP.
      • Eutils allows outside sites to recreate Entrez, and NCBI does not know who or why.
      • The current NCBI Sequence Viewer uses Eutils itself.
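As a rough illustration of the URL-style access slide 10 describes, the sketch below builds an ESearch request and checks the slide's SOAP share. The base URL and the `db`, `term`, `retmax`, and `retmode` parameters follow NCBI's currently published E-utilities documentation rather than anything stated in the slides; the helper names `esearch_url` and `soap_share` are hypothetical. No request is sent, so the snippet runs offline.

```python
from urllib.parse import urlencode

# Assumption: the base URL from NCBI's current E-utilities documentation,
# not a value taken from the slides.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch query URL for an Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "xml"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def soap_share(soap_per_day: int, eutils_per_day: int) -> float:
    """SOAP requests as a fraction of Eutils requests (the slide's rounded figures)."""
    return soap_per_day / eutils_per_day


if __name__ == "__main__":
    print(esearch_url("pubmed", "influenza neuraminidase"))
    # 38,000 / 5,000,000 is 0.76%, which the slide rounds to 0.8%.
    print(f"{soap_share(38_000, 5_000_000):.2%}")
```

Because the whole query is an ordinary URL, any outside site can issue the same searches Entrez does, which is the point the slide makes about sites recreating Entrez.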
  • 11. Harnessing Collective Intelligence in BioMedicine
  • 12. Bibliographic Resources
      • PubMed: citations and abstracts from publishers; MEDLINE indexing
      • PMC: PubMed Central, full-text journal articles from publishers (and NIHMS)
      • pPMC: portable mirror of PMC content
      • NIHMS: NIH Manuscript Submission System for Public Access policy
      • NLM DTD: modular DTD for bibliographic material
      • pNIHMS: portable NIHMS
      • XML Authoring System: MS Word/XML authoring
      • Bookshelf: books and monographs in XML from publishers and authors
  • 13. PubMed Central XML
      Why XML?
      • Preserves structure of an article
      • Lends itself to intelligent processing
      • Human readable, not dependent on technology
      • Is based on SGML, a publishing industry standard
      • Portable and migratable
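To make the "preserves structure" and "intelligent processing" points of slide 13 concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document is invented for illustration; its element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `body`, `sec`, `title`, `p`) are modeled on the NLM journal article tag set, but the fragment is not a complete or validated instance, and `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article; not validated against any DTD.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Example article</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p><p>Third paragraph.</p></sec>
  </body>
</article>"""


def outline(xml_text: str):
    """Return the article title and a (section title, paragraph count) list."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    sections = [
        (sec.findtext("title"), len(sec.findall("p")))
        for sec in root.findall("body/sec")
    ]
    return title, sections


if __name__ == "__main__":
    print(outline(SAMPLE))
```

Because sections and paragraphs are explicit elements rather than visual formatting, the same source can be queried, indexed, or rendered to another format without guessing at layout.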
  • 14. PMC2
      Content is converted to a standard XML format on ingest and then stored and rendered from that one format. But what format?
  • 15. Harvard E-journal Archiving Project
      • The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles.
      • Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study.
          • Conclusion: yes, it is feasible, but the right DTD does not exist.
      • Recommendations from the study were used in a modified PMC DTD.
      • NCBI collaborated with Harvard to broaden the scope of the new PMC DTD to accommodate journals from all disciplines (not just life sciences).
  • 16. NLM Journal Article DTDs: Establishing Standards from Practice
      • Archiving and Interchange DTD
          • Purpose is to preserve a journal’s intellectual content
          • Written for ease of conversion (from other DTDs) and completeness (union of current journal DTDs)
      • Journal Publishing DTD
          • A subset of the Archiving DTD
          • Written for authoring article content, initial tagging of non-XML content, and creating consistent structures
  • 17. Adoption
      • HighWire Press
      • JSTOR’s Electronic Archiving Initiative
      • Australia’s Commonwealth Scientific and Industrial Research Organisation
      • PLoS and other PMC contributors
      • Atypon Systems (over 150 titles) and other conversion vendors and journal service providers
      • Wiley, Nature, Blackwell common format (PXI)
  • 18. Support
      • Complete documentation for both DTDs available online
      • Established public discussion lists for user questions
      • Generic transformations to HTML and PDF forms of articles
      • Public XML validation tool
      • Working group of leaders in the printing and markup industries provides advice on changes to the Tagset
  • 19. Portable PubMed Central (pPMC)
      • Provides a local mirror of PMC content
      • Updated daily from NCBI
      • Multiple site archiving
      • Provides rendering of PMC XML into HTML
      • Provides searching through NCBI EUtils
      • Provides for controlled local content in presentation
      • Provides first step toward collaborative archiving
      • Collaboration with Microsoft on support
  • 20. What’s on the Bookshelf?
      • Previously published books
      • New collections
      • New content
  • 21. Diabetes
      • Health information with links to molecular data
      • NIDDK advisors on content
      • ~10,000 users per month
      • “…a truly valuable resource…” Gene Barrett, President, American Diabetes Association
      Obesity
  • 22. Books
      • Authoring in MS Word
      • Simple mark-up based on Word styles
      • WordML to XML conversion
  • 23. [Slide contains no transcribed text]
  • 24. BioMedicine Moves to the Web
      • Electronic Authoring and Distribution of Articles
          • Linking and annotating factual data as a side effect
          • Ability to mine data and text together
          • Richer data “between” supported databases
      • High Throughput Biology generates large datasets stored in public repositories
          • Common factual data roadmap
          • Greater transparency
          • Greater incidental collaboration for discovery
      • New “private” sites for discussion on this armature
      • New products arise from a public infrastructure
  • 25. Influenza Anti-viral Compounds
  • 26. Influenza Anti-viral Compounds
  • 27. Influenza Anti-viral/Protein Binding
  • 28. Influenza Neuraminidase Gene
  • 29. Influenza Genome Project
  • 30. Influenza Assembly Archive
