Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach

The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the way users express their information needs and the representation of the data. This work aims to provide a natural language interface and an associated semantic index to support an increased level of vocabulary independence for queries over Linked Data/Semantic Web datasets, using a distributional-compositional semantics approach. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-occurring words in large-scale texts. The proposed query model targets the following features: (i) a principled semantic approximation approach with low adaptation effort (independent of manually created resources such as ontologies, thesauri or dictionaries), (ii) comprehensive semantic matching supported by the inclusion of large volumes of distributional (unstructured) commonsense knowledge in the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieves an average recall of 0.81, a mean average precision of 0.62 and a mean reciprocal rank of 0.49.
Published on: Mar 3, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach

  • 1. Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach André Freitas and Edward Curry Insight Centre for Data Analytics International Conference on Intelligent User Interfaces Haifa, 2014
  • 2. Talking to your (Big) Data
  • 3. Motivation
  • 4. Shift in the Database Landscape
      - Heterogeneous, complex and large-scale databases.
      - Very large and dynamic “schemas”.
      - circa 2000: 10s-100s attributes; circa 2014: 1,000s-1,000,000s attributes.
  • 5. Databases for a Complex World How do you query data on this scenario?
  • 6. Vocabulary Problem for Databases Query: Who is the daughter of Bill Clinton married to? Semantic Gap Possible representations Semantic approximation = Commonsense Knowledge
  • 7. Semantics for a Complex World Formal World Real World Distributional Semantics Query Approach
  • 8. Does it work?
  • 9. Addressing the Vocabulary Problem for Databases (with Distributional Semantics) Gaelic: direction
  • 10. Solution (Video)
  • 11. More Complex Queries (Video)
  • 12. Treo Answers Jeopardy Queries (Video) http://bit.ly/1hWcch9
  • 13. Evaluation
      - 102 natural language queries (Test Collection: QALD 2011).
      - Avg. query execution time: 1.52 s (simple queries) – 8.53 s (all queries).
      - Dataset (DBpedia 3.7 + YAGO): 45,767 predicates, 5,556,492 classes and 9,434,677 instances.
  • 14. Comparative Evaluation
  • 15. Query Approach
  • 16. Distributional Semantics: “Words occurring in similar (linguistic) contexts are semantically related.”
      - If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).
      - This can then be used as a surrogate of its semantic representation.
  • 17. Distributional Semantic Model [Figure: the words husband, spouse and child as vectors in a space with context dimensions c1, c2, ..., cn; each coordinate is a function of the number of times that the word occurs in that context, e.g. 0.7 and 0.5 on c1.] Commonsense is here
  • 18. Semantic Relatedness [Figure: the words husband, spouse and child as vectors over the contexts c1, c2, ..., cn, with the angle θ between word vectors.] Works as a semantic ranking function
  • 19. Approach Overview: Query → Query Analysis → Query Features → Query Planner → Query Plan → Core semantic approximation & composition operations over the Ƭ-Space. Ƭ-Space inputs: Database; Distributional semantics (large-scale unstructured data, commonsense knowledge)
  • 20. Approach Overview: Query → Query Analysis → Query Features → Query Planner → Query Plan → Core semantic approximation & composition operations over the Ƭ-Space. Ƭ-Space inputs: RDF Data; Explicit Semantic Analysis (ESA) (Wikipedia, commonsense knowledge)
  • 21. Ƭ-Space r p e
  • 22. Core Operations Query
  • 23. Core Operations Query Search & Composition Operations
  • 24. Search and Composition Operations
      - Instance search: proper nouns; string similarity + node cardinality
      - Class (unary predicate) search: nouns, adjectives and adverbs; string similarity + distributional semantic relatedness
      - Property (binary predicate) search: nouns, adjectives, verbs and adverbs; distributional semantic relatedness
      - Navigation
      - Extensional expansion: expands the instances associated with a class
      - Operator application: aggregations, conditionals, ordering, position
      - Disjunction & Conjunction
      - Disambiguation dialog (instance, predicate)
  • 25. Core Principles
      - Minimize the impact of Ambiguity, Vagueness, Synonymy.
      - Address the simplest matchings first (heuristics).
      - Semantic Relatedness as a primitive operation.
      - Distributional semantics as commonsense knowledge.
  • 26. Question Analysis: Transform natural language queries into triple patterns. “Who is the daughter of Bill Clinton married to?” PODS: Bill Clinton (INSTANCE), daughter (PREDICATE), married to (PREDICATE). Query Features
  • 27. Query Plan: Map query features into a query plan. A query plan contains a sequence of core operations.
      - Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
      - Query Plan:
      - (1) INSTANCE SEARCH (Bill Clinton)
      - (2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)
      - (3) e1 <- NAVIGATE (Bill Clinton, p1)
      - (4) p2 <- SEARCH PREDICATE (e1, married to)
      - (5) e2 <- NAVIGATE (e1, p2)
  • 28. Instance Search. Query: Bill Clinton, daughter, married to. Linked Data: :Bill_Clinton
  • 29. Predicate Search. Query: Bill Clinton, daughter, married to. Linked Data: :Bill_Clinton (PIVOT ENTITY) with (ASSOCIATED TRIPLES) :child :Chelsea_Clinton; :religion :Baptists; :almaMater :Yale_Law_School; ...
  • 30. Predicate Search. Query: Bill Clinton, daughter, married to. Which properties are semantically related to “daughter”? Linked Data:
      - :Bill_Clinton :child :Chelsea_Clinton, sem_rel(daughter, child)=0.054
      - :Bill_Clinton :religion :Baptists, sem_rel(daughter, religion)=0.004
      - :Bill_Clinton :almaMater :Yale_Law_School, sem_rel(daughter, alma mater)=0.001
      - ...
  • 31. Navigate. Query: Bill Clinton, daughter, married to. Linked Data: :Bill_Clinton :child :Chelsea_Clinton
  • 32. Navigate. Query: Bill Clinton, daughter, married to. Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
  • 33. Predicate Search. Query: Bill Clinton, daughter, married to. Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY) :spouse :Mark_Mezvinsky
  • 34. Results
  • 35. Conclusions
      - The compositional-distributional model supports a schema-agnostic natural language query mechanism over a large-schema (open domain) database.
      - Comprehensive and accurate semantic matching: avg. recall=0.81, MAP=0.62, MRR=0.49.
      - Medium-high expressivity: 80% of queries answered.
      - Interactive query execution time: avg. 1.52 s (simple queries) – 8.53 s (all queries) per query.
      - Better recall and query coverage compared to baselines with equivalent precision.
      - Low adaptation effort for new datasets.
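
Slides 16 to 18 describe words as vectors of context counts and semantic relatedness as a function of the angle θ between them. The following is a minimal sketch of that idea, not the talk's implementation: the toy corpus is invented, sentence-level co-occurrence stands in for the ESA/Wikipedia model named on slide 20, and the helper names `context_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "her husband is her spouse by marriage",
    "the spouse or husband signs the marriage form",
    "the child plays in the school yard",
    "each child goes to school",
]

def context_vector(word, sentences):
    """Count the words that co-occur with `word` in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

husband = context_vector("husband", corpus)
spouse = context_vector("spouse", corpus)
child = context_vector("child", corpus)

# A smaller angle (larger cosine) means the words share more contexts.
print(cosine(husband, spouse) > cosine(husband, child))
```

On this toy corpus "husband" shares far more contexts with "spouse" than with "child", so the cosine ranks "spouse" first. Real models weight the counts and use a much larger corpus.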
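
The instance search on slide 24 combines string similarity with node cardinality. One way this could look is sketched below, under loud assumptions: the candidate nodes other than `:Bill_Clinton` and all cardinality values are invented, and the scoring formula (similarity times the log of cardinality) is a guess, since the slides do not say how the two signals are combined.

```python
import math
from difflib import SequenceMatcher

# Hypothetical candidates mapped to an invented node cardinality
# (number of triples attached to the node).
CANDIDATES = {
    ":Bill_Clinton": 500,
    ":Bill_Clinton_Boulevard": 3,
    ":Clinton_County": 40,
}

def label_of(node):
    """Turn a node identifier such as ':Bill_Clinton' into 'bill clinton'."""
    return node.lstrip(":").replace("_", " ").lower()

def instance_score(query, node, cardinality):
    """Assumed scoring: string similarity weighted by log cardinality."""
    similarity = SequenceMatcher(None, query.lower(), label_of(node)).ratio()
    return similarity * math.log1p(cardinality)

def instance_search(query, candidates):
    """Return the candidate node with the highest combined score."""
    return max(candidates, key=lambda n: instance_score(query, n, candidates[n]))

best = instance_search("Bill Clinton", CANDIDATES)
print(best)
```

The cardinality term favours well-connected nodes when several labels match the query almost equally well.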
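
The query plan on slide 27 and the walkthrough on slides 28 to 33 can be traced on a toy graph. This is a sketch under stated assumptions: the triples are the ones shown on the slides, the relatedness table is filled by hand (the 0.004 value is assumed to belong to `:religion`, and the score for "married to" is invented because the slides give none), and the function names are hypothetical stand-ins for the core operations.

```python
# Triples from the slide example, as (subject, predicate, object).
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]

# Hand-filled relatedness scores standing in for the distributional model.
SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,   # assumed pairing of the slide value
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,    # invented value, not from the slides
}

def instance_search(label):
    """Resolve a proper noun to a node by naive label matching."""
    target = ":" + label.replace(" ", "_")
    nodes = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
    return target if target in nodes else None

def search_predicate(entity, term):
    """Return the predicate of `entity` most related to `term`, or None."""
    candidates = {p for s, p, _ in TRIPLES if s == entity}
    scored = [(SEM_REL.get((term, p), 0.0), p) for p in candidates]
    score, best = max(scored, default=(0.0, None))
    return best if score > 0 else None

def navigate(entity, predicate):
    """Follow `predicate` from `entity` and return the first object reached."""
    for s, p, o in TRIPLES:
        if s == entity and p == predicate:
            return o
    return None

# The five plan steps from slide 27.
e0 = instance_search("Bill Clinton")
p1 = search_predicate(e0, "daughter")
e1 = navigate(e0, p1)
p2 = search_predicate(e1, "married to")
e2 = navigate(e1, p2)
print(e0, p1, e1, p2, e2)
```

Each step only needs the pivot entity produced by the previous one, which is why the plan can be expressed as a flat sequence of core operations.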
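
The conclusions report mean average precision (MAP) and mean reciprocal rank (MRR). As a reminder of what these figures measure, here is a short sketch using the standard textbook definitions; the evaluation protocol actually used in the talk may differ in detail, and the ranked lists below are invented.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranked results, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Invented results for three queries.
runs = [
    (["a", "b"], {"a"}),   # relevant answer at rank 1
    (["x", "y"], {"y"}),   # relevant answer at rank 2
    (["m"], {"z"}),        # relevant answer never returned
]

mrr = mean_over_queries(reciprocal_rank, runs)
map_score = mean_over_queries(average_precision, runs)
print(mrr, map_score)
```

With one relevant answer per query the two metrics coincide, as they do here; they diverge once a query has several relevant answers spread across the ranking.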
