Knowing what we’re talking about
Robert Stevens
Bio-health Informatics Group
School of Computer Science
University of Manchester


Invited talk at CSIR, Pretoria, 2013
Published on: Mar 4, 2016
Published in: Science, Education, Technology
Source: www.slideshare.net


Transcripts - Knowing what we’re talking about

  • 1. Knowing what we’re talking about. Robert Stevens, Bio-health Informatics Group, School of Computer Science, University of Manchester, Oxford Road, Manchester, United Kingdom M13 9PL. Robert.Stevens@manchester.ac.uk
  • 2. We have an item of data • 27 • 27 what? • What units? With what is 27 associated? • Even if I told you, would we interpret what I said in the same way?
  • 3. 27mm
  • 4. tail of 27mm
  • 5. Mouse tail of 27 mm • … and we can carry on: mouse strain, where it was raised, what it was fed on, times, dates, etc. • All this data is necessary to interpret my original number • Even if that metadata exists, we have to agree on the things the numbers describe
  • 6. What is knowledge?
  • 7. Heterogeneity is rife • We agree on units (more or less)… • We don’t agree on much else when it comes to labels for the entities in our domain • If we don’t know what we’re talking about… • It’s difficult to interpret and exchange data, and the results derived from data
  • 8. Categories and Category Labels: one category, GO:0000368, with many labels: “U2-type nuclear mRNA 5' splice site recognition”, “spliceosomal E complex formation”, “spliceosomal E complex biosynthesis”, “spliceosomal CC complex formation”, “U2-type nuclear mRNA 5'-splice site recognition”
  • 9. The Ogden Triangle [Ogden and Richards, 1923] (figure: the symbol “Roast Beef” and the concept it evokes) • Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is only indirectly possible: we do it by creating concepts that refer to things. • The relation between symbols and things has been described in the form of this meaning triangle.
  • 10. We need to know what we’re talking about… • … if we don’t, our data are useless • If we are to interpret our data then we need to know what entities it describes • We need to share data and re-use it • We need to find data; compare data; analyse data • We need to know what we know…
  • 11. Manchester Mercury, January 1st 1754: list of diseases & casualties this year. Executed 18; Found dead 34; Frighted 2; Kill'd by falls and other accidents 55; Kill'd themselves 36; Murdered 3; Overlaid 40; Poisoned 1; Scalded 5; Smothered 1; Stabbed 1; Starved 7; Suffocated 5; Aged 1456; Consumption 3915; Convulsion 5977; Dropsy 794; Fevers 2292; Smallpox 774; Teeth 961; Bit by mad dogs 3; Broken limbs 5; Bruised 5; Burnt 9; Drowned 86; Excessive drinking 15. Totals: 19276 burials; 15444 christenings. (Figure: deaths by centile.)
  • 12. A World of Instances • The world (of information) is made up of things, and lots of them • Instances, individuals, objects, tokens, particulars • The Earth is a Planet • Robert Stevens (NE 67 41 58 A) is a Person • All the individual Alpha Haemoglobins in my many Instances of Red Blood Cell • Each cell instance in my Body has copies of some 30,000 Genes • A Word, language, idea, etc. • This Table, those Chairs • Any Thing with “A”, “The”, “That”, etc. before it…
  • 13. We Put things into Categories • All these instances hang about making our world • Putting these things into categories is a fundamental part of human cognition • Psychologists study this as concept formation • Similar instances are put into the same category
  • 14. We have Labels for the Categories and their Instances • We label categories with symbols: Words • “Lion” is a category of big cat with big teeth • Gene, Protein, Cell, Person, Hydrolase Activity, etc. • …and, as we’ve already seen, each category can have many labels and any particular label can refer to more than one category • Semantic Heterogeneity • “A lion” is an instance in that category • Does the category “Lion” exist? • Lions exist, but the category could just be a human way of talking about lions • … we like putting things into categories
  • 15. A Controlled Vocabulary • A specified set of words and phrases for the categories in which we place instances • Natural language definitions for those words and phrases • A glossary defines, but doesn’t control • The UniProt keywords define and control • Control is placed upon which labels are used to represent the categories (concepts) we’ve used to describe the instances in the world • … but there is nothing about how things in these categories are related (example vocabulary: Biopolymer, DNA, Enzyme, Nucleic acid, mRNA, Polypeptide, snRNA, tRNA)
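A controlled vocabulary of the kind on slide 15 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how UniProt or any other resource implements its keywords: the `Term` record, the definitions and the synonyms are all hypothetical. Each category gets one preferred label and a definition, and a free-text label is accepted only if it maps onto one of those entries. As the slide says, nothing here records how the categories relate to one another.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """One controlled entry: a preferred label, a definition and synonyms."""
    preferred_label: str
    definition: str
    synonyms: frozenset = field(default_factory=frozenset)


# Hypothetical entries; definitions and synonyms are illustrative only.
VOCABULARY = [
    Term("DNA", "Deoxyribonucleic acid, a nucleic acid.",
         frozenset({"deoxyribonucleic acid"})),
    Term("mRNA", "Messenger RNA.", frozenset({"messenger RNA"})),
    Term("tRNA", "Transfer RNA.", frozenset({"transfer RNA"})),
]


def build_index(terms):
    """Map every lower-cased label (preferred or synonym) to its Term."""
    index = {}
    for term in terms:
        for label in (term.preferred_label, *term.synonyms):
            index[label.lower()] = term
    return index


def control(label, index):
    """Return the preferred label for a free-text label; reject anything
    that is not in the controlled vocabulary."""
    try:
        return index[label.strip().lower()].preferred_label
    except KeyError:
        raise ValueError(f"uncontrolled label: {label!r}") from None


INDEX = build_index(VOCABULARY)
print(control("Deoxyribonucleic acid", INDEX))  # DNA
```

Raising on unknown labels is what separates control from a glossary: a glossary would simply have no entry, whereas here an uncontrolled label cannot enter the data at all.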
  • 16. We also like to Relate Things Together • Categories have subcategories • Instances in one category can be related in some way to instances in another • Can relate instances to each other in many different ways • Is-a, part-of, develops-from, etc. • We can use these relationships to classify categories • Things in category A can be part of things in category B • If all instances in category A are also in category B then As are kinds of Bs (figure: hierarchy of Biopolymer, Nucleic acid, Polypeptide, Enzyme, DNA, RNA, tRNA, mRNA, snRNA)
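The is-a rule on slide 16 (if all As are Bs, then As are kinds of Bs) amounts to following subclass links transitively. The sketch below is a minimal illustration; the `IS_A` table is our own reading of the slide's figure (which category sits under which is an assumption), and the function names are hypothetical.

```python
# Asserted is-a links: child category -> set of parent categories.
# This table is our reading of the slide's figure, not an authoritative one.
IS_A = {
    "Nucleic acid": {"Biopolymer"},
    "Polypeptide": {"Biopolymer"},
    "Enzyme": {"Polypeptide"},
    "DNA": {"Nucleic acid"},
    "RNA": {"Nucleic acid"},
    "tRNA": {"RNA"},
    "mRNA": {"RNA"},
    "snRNA": {"RNA"},
}


def ancestors(category):
    """All categories that subsume `category`, following is-a transitively."""
    seen = set()
    stack = list(IS_A.get(category, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen


def is_kind_of(a, b):
    """True if, under the asserted hierarchy, every instance of a is an instance of b."""
    return a == b or b in ancestors(a)


print(sorted(ancestors("mRNA")))  # ['Biopolymer', 'Nucleic acid', 'RNA']
```

Only is-a is transitive in this way here; part-of and develops-from would each need their own rules.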
  • 17. Categories and subcategories (figure: biopolymer, polypeptide, nucleic acid, enzyme, DNA, RNA)
  • 18. Describing Category Membership • We can make conditions that any instance must fulfil in order to be a member of a particular category • A Phosphatase must have a phosphatase catalytic domain • A Receptor must have a transmembrane domain • A codon has three nucleotide residues • A limb has a part that is a joint • A man has a Y chromosome and an X chromosome • A woman has X chromosomes and no Y chromosome
  • 19. Relationships • These conditions are made from a property (a relationship) and the category at its other end • isPartOf, hasPart • isDerivedFrom • developsFrom • isHomologousTo • … and many, many more
  • 20. A Structured Controlled Vocabulary • Not only can we agree on the labels we give categories • We can also agree on how the instances of categories are related • And agree on the labels we give the relations • Structure aids querying and captures knowledge with greater fidelity (figure: the Biopolymer hierarchy of Nucleic acid, Polypeptide, Enzyme, DNA, RNA, tRNA, mRNA and snRNA, plus Gene and a transcribedFrom relation)
  • 21. A Stronger Definition • “a set of logical axioms designed to account for the intended meaning of a formal vocabulary used to describe a certain (conceptualisation of) reality [described in an information system]” [Guarino 1998] • “conceptualisation of” inserted by me • “Logical axioms” means a formal definition of the meaning of terms in a formal language • Formal language: something a computer can reason with • Use symbols to make inferences • Symbols represent things and their relationships • Making inferences about things computationally
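To make “use symbols to make inferences” concrete, here is a naive forward-chaining sketch in Python. The facts are symbolic triples, and the three rules (isPartOf and isA are transitive; an instance of a category is also an instance of its parent categories) are assumptions chosen for illustration. OWL reasoners do far more than this, but the principle of mechanically deriving new statements from stated ones is the same. Identifiers such as `trna_42` and `limb_1` are hypothetical.

```python
def infer(facts):
    """Apply the rules below until no new triple can be derived."""
    derived = set(facts)
    while True:
        new = set()
        for (a, r1, b) in derived:
            for (c, r2, d) in derived:
                if b != c:
                    continue  # rules only chain triples that share a middle term
                # Rule 1: isPartOf is transitive.
                if r1 == r2 == "isPartOf":
                    new.add((a, "isPartOf", d))
                # Rule 2: isA is transitive.
                if r1 == r2 == "isA":
                    new.add((a, "isA", d))
                # Rule 3: an instance of a category is an instance of its parents.
                if r1 == "instanceOf" and r2 == "isA":
                    new.add((a, "instanceOf", d))
        if new <= derived:  # fixpoint reached: nothing new was derived
            return derived
        derived |= new


facts = {
    ("trna_42", "instanceOf", "tRNA"),
    ("tRNA", "isA", "RNA"),
    ("RNA", "isA", "Nucleic acid"),
    ("joint_1", "isPartOf", "limb_1"),
    ("limb_1", "isPartOf", "body_1"),
}
for triple in sorted(infer(facts) - facts):
    print(triple)
```

None of the printed triples was stated; each follows from the symbols and the rules alone, which is what “making inferences about things computationally” means here.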
  • 22. So what is an ontology? (Figure, after Chris Welty et al.: a spectrum running from informal to formal: Catalog/ID; Terms/glossary; Thesauri; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; General logical constraints; Disjointness, inverse, part-of. Example ontologies placed along it: Gene Ontology, Mouse Anatomy, EcoCyc, PharmGKB, TAMBIS, Arom.)
  • 23. What does it all mean anyway • To interpret our data we need to know what it is we’re talking about • We need to decide the things that we’re talking about and agree upon them • We need to agree on how to recognise those entities • We need to know how they are related to one another • Ontologies are a mechanism for describing those entities and their definitions • There’s more to knowledge representation than ontologies…
  • 24. All this knowledge needs representing • We want this knowledge in a computational form • To make the knowledge available for software (and humans) • To help us develop and manage the (often) complex artefacts • Building ontologies is hard (getting all those relationships in the right place) • The Web Ontology Language (OWL) is a W3C recommendation for ontologies on the Semantic Web and in semantically enabled applications • A knowledge representation language with a strict semantics that is amenable to automated reasoning
  • 25. Web Ontology Language (OWL) • W3C recommendation for ontologies for the Semantic Web • OWL DL maps to a decidable fragment of first-order logic • Classes, properties and instances • Boolean operators, plus existential and universal quantification • Rich class expressions used in restrictions on properties, e.g. hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
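The restriction `hasDomain some (ImmunoglobulinDomain or FibronectinDomain)` can be read as a test on an individual: it must have at least one `hasDomain` filler that is an immunoglobulin domain or a fibronectin domain. The Python sketch below evaluates such expressions over a small, hypothetical set of facts (all individual names are made up). It is a closed-world toy: an OWL reasoner works under the open-world assumption and also handles negation, which this sketch deliberately omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:          # a named class, e.g. FibronectinDomain
    name: str


@dataclass(frozen=True)
class Or:             # union of two class expressions
    left: object
    right: object


@dataclass(frozen=True)
class And:            # intersection of two class expressions
    left: object
    right: object


@dataclass(frozen=True)
class Some:           # existential restriction: prop some filler
    prop: str
    filler: object


@dataclass(frozen=True)
class Only:           # universal restriction: prop only filler
    prop: str
    filler: object


# Hypothetical facts: asserted named classes and property assertions.
TYPES = {
    "protein_a": {"Protein"},
    "protein_b": {"Protein"},
    "d1": {"ImmunoglobulinDomain"},
    "d2": {"FibronectinDomain"},
    "d3": {"KinaseDomain"},
}
PROPS = {
    ("protein_a", "hasDomain"): {"d1", "d2"},
    ("protein_b", "hasDomain"): {"d3"},
}


def holds(x, expr):
    """Does individual x satisfy expr, given only the facts above?"""
    if isinstance(expr, Named):
        return expr.name in TYPES.get(x, set())
    if isinstance(expr, Or):
        return holds(x, expr.left) or holds(x, expr.right)
    if isinstance(expr, And):
        return holds(x, expr.left) and holds(x, expr.right)
    if isinstance(expr, (Some, Only)):
        fillers = PROPS.get((x, expr.prop), set())
        if isinstance(expr, Some):  # at least one filler satisfies the filler class
            return any(holds(y, expr.filler) for y in fillers)
        return all(holds(y, expr.filler) for y in fillers)  # vacuously true if none
    raise TypeError(f"unknown expression: {expr!r}")


# hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
expr = Some("hasDomain", Or(Named("ImmunoglobulinDomain"), Named("FibronectinDomain")))
print([x for x in TYPES if holds(x, expr)])  # ['protein_a']
```

Note the asymmetry between the quantifiers: `d1` has no domains at all, so it satisfies any `only` restriction vacuously but no `some` restriction.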
  • 26. What are we saying? (figure: Man is-a Person; Woman is-a Person) • Are all instances of Man instances of Person? • Can an instance of Person be both an instance of Man and an instance of Woman? • Can there be any more kinds of Person?
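The questions on slide 26 turn on which axioms are stated. The is-a links alone answer the first one (every Man is a Person) but do nothing to stop an individual being both a Man and a Woman; that needs an explicit disjointness axiom. The minimal Python sketch below, with hypothetical function names, shows the same type assertion passing without that axiom and clashing with it. Whether there can be further kinds of Person likewise stays open unless a covering axiom (every Person is a Man or a Woman) is added.

```python
# Axioms from the slide's figure: Man is-a Person, Woman is-a Person.
SUBCLASS_OF = {"Man": "Person", "Woman": "Person"}


def with_superclasses(types):
    """Close a set of asserted classes under the is-a axioms."""
    result = set(types)
    for t in types:
        parent = SUBCLASS_OF.get(t)
        while parent is not None:
            result.add(parent)
            parent = SUBCLASS_OF.get(parent)
    return result


def clashes(types, disjoint_pairs):
    """Return every disjointness axiom that the asserted types violate."""
    full = with_superclasses(types)
    return [tuple(sorted(pair)) for pair in disjoint_pairs if pair <= full]


both = {"Man", "Woman"}
print(sorted(with_superclasses({"Man"})))            # ['Man', 'Person']
print(clashes(both, disjoint_pairs=[]))              # []
print(clashes(both, [frozenset({"Man", "Woman"})]))  # [('Man', 'Woman')]
```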
  • 27. What are we saying? • What kinds of class can fill “has chromosome”? • How many “Y chromosome” are present? • Does there have to be a “Y chromosome”? • What properties are sufficient to be a Man and which are simply necessary? (figures: Man has-chromosome Y chromosome; Man has-chromosome 1 Y chromosome, 1 X chromosome and 44 autosomes)
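Reading the numbers on slide 27's second figure (1 Y chromosome, 1 X chromosome, 44 autosomes) as qualified cardinality restrictions, the sketch below contrasts “exactly n” with “at least n” (“some” is at least 1). The list-based encoding of a karyotype and the function names are hypothetical. Counting a complete list like this is a closed-world simplification; an OWL reasoner reasons about cardinalities without needing every chromosome listed.

```python
from collections import Counter

# Our reading of the figure: Man has-chromosome exactly n of each class.
MAN_EXACTLY = {"Y chromosome": 1, "X chromosome": 1, "autosome": 44}


def satisfies_exactly(chromosomes, restrictions):
    """Exactly n fillers of each named class (other classes are not checked)."""
    counts = Counter(chromosomes)
    return all(counts[cls] == n for cls, n in restrictions.items())


def satisfies_at_least(chromosomes, restrictions):
    """At least n fillers of each named class."""
    counts = Counter(chromosomes)
    return all(counts[cls] >= n for cls, n in restrictions.items())


karyotype = ["autosome"] * 44 + ["X chromosome", "Y chromosome"]
extra_y = karyotype + ["Y chromosome"]
print(satisfies_exactly(karyotype, MAN_EXACTLY))         # True
print(satisfies_exactly(extra_y, MAN_EXACTLY))           # False
print(satisfies_at_least(extra_y, {"Y chromosome": 1}))  # True
```

The second result shows why the slide asks “How many Y chromosome are present?”: choosing between an exact count and “some” changes which individuals the class admits.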
  • 28. OWL represents classes of instances (figure: classes A, B and C drawn as sets of instances)
  • 29. Necessity and Sufficiency • An R2A phosphatase must have a fibronectin domain • Having a fibronectin domain does not a phosphatase make • Necessity: what must a class instance have? • Any protein that has a phosphatase catalytic domain is a phosphatase enzyme • All phosphatase enzymes have a catalytic domain • Sufficiency: how is an instance recognised to be a member of a class?
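The phosphatase example on slide 29 separates two uses of a condition. A sufficient condition lets us recognise a member of a class; a necessary condition tells us what every member has, so on its own it can only rule candidates out. The sketch below encodes the slide's statements with hypothetical class and domain names. We assume that having a phosphatase catalytic domain is both necessary and sufficient for Phosphatase, and that R2APhosphatase has only necessary conditions (including the catalytic domain, since an R2A phosphatase is a phosphatase).

```python
# Hypothetical encoding of the slide; conditions are sets of required domains.
NECESSARY = {
    "Phosphatase": {"PhosphataseCatalyticDomain"},
    # Assumption: an R2A phosphatase is a phosphatase, so it inherits that domain.
    "R2APhosphatase": {"PhosphataseCatalyticDomain", "FibronectinDomain"},
}
SUFFICIENT = {
    # A defined class: this condition is both necessary and sufficient.
    "Phosphatase": {"PhosphataseCatalyticDomain"},
    # R2APhosphatase has no sufficient condition here, only necessary ones.
}


def recognise(domains):
    """Classes a protein can be recognised as (uses sufficient conditions)."""
    return {cls for cls, cond in SUFFICIENT.items() if cond <= domains}


def ruled_out(domains):
    """Classes a protein cannot belong to because it lacks a necessary domain."""
    return {cls for cls, cond in NECESSARY.items() if not cond <= domains}


def implied_domains(cls):
    """What we may conclude about any member of cls (uses necessity)."""
    return set(NECESSARY.get(cls, set()))


fn_only = {"FibronectinDomain"}
both = {"PhosphataseCatalyticDomain", "FibronectinDomain"}
print(recognise(fn_only))  # set()
print(recognise(both))     # {'Phosphatase'}
```

The first result is the slide's “does not a phosphatase make”. The second shows that even a protein meeting every necessary condition of R2APhosphatase is not recognised as one, because no sufficient condition has been given for that class.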
  • 30. Uses of ontologies
  • 31. Ontologies in software
  • 32. Problems Ontologies in Biology Try To Solve • Provenance – where did it come from, who did it? • Reproducibility – can I repeat and find results reported? • Sharing – can others understand your data? • Integration – can I readily take multiple (thousands of) data sets and use them without preparation? • New knowledge – can we infer new knowledge as a sum of current knowledge (computationally)?
  • 33. The rise and rise of ontologies
  • 34. What are the prospects for ontologies