Natural Language Processing
With Graph Databases
DataDay Texas
January 2016
William Lyon
@lyonwj
About
Software Developer @Neo4j
william.lyon@neo4j.com
@lyonwj
lyonwj.com
William Lyon
Agenda
• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
  • Mining word associations
  • Graph based summarization and keyword extraction
  • Content recommendation
Survey of NLP methods with graphs
Intro to Graph Databases / Neo4j
Charts
Charts Graphs
Neo4j
Graph Database
• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language
The Whiteboard Model Is the Physical Model
Relational Versus Graph Models
[Figure: Relational Model (Person and Person-Friend tables) vs. Graph Model (ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships)]
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
Cypher: Graph Query Language
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the statement annotated with each NODE's LABEL (Person) and PROPERTY (name: Dan, Ann), joined by the LOVES relationship]
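The property graph model can be illustrated outside the database with a minimal Python sketch of the same structure. Nodes carry labels and name-value properties. Relationships carry a type, a direction (start to end) and their own properties. The class and field names here are illustrative assumptions and are not part of Neo4j's driver API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # name-value pairs

@dataclass
class Relationship:
    start: Node                                     # relationships are directed:
    rel_type: str                                   # (start)-[:TYPE]->(end)
    end: Node
    properties: dict = field(default_factory=dict)

# In-memory mirror of:
# CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
loves = Relationship(dan, "LOVES", ann)

print(f"({loves.start.properties['name']})-[:{loves.rel_type}]->({loves.end.properties['name']})")
# prints (Dan)-[:LOVES]->(Ann)
```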
“So what does this have to do with NLP?”
“Am I in the wrong talk?”
“I thought this was going to be about text processing….”
Natural Language Processing With Graphs
Uncovering meaning from text using a graph data model.
Representing Text As A Graph
“Nearly all text processing starts
by transforming text into vectors.”
- Matt Biddulph
www.hackdiary.com
Representing text as a graph
Text Adjacency Graph
My cat eats fish on Saturday.
1. Convert to array of words
2. Iterate with counter variable i, from 0 to number of words - 2 (inclusive)
3. Get or create node for words at index i and i+1
4. Create :NEXT relationship
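The steps above can be sketched in plain Python. This is a minimal in-memory sketch and not the talk's code. A Python set and list stand in for Neo4j word nodes and :NEXT relationships. The tokenizer, which lowercases words and strips surrounding punctuation, is an assumption.

```python
def build_adjacency_graph(sentence):
    """Build a text adjacency graph for one sentence.

    Returns (nodes, next_edges): the distinct words and the ordered
    (word_i, word_i+1) pairs that would become :NEXT relationships.
    """
    # Convert to an array of words (naive tokenizer: lowercase, strip punctuation)
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    words = [w for w in words if w]

    nodes = set()      # one node per distinct word, so re-adding acts as "get or create"
    next_edges = []    # one entry per :NEXT relationship
    # i runs from 0 to len(words) - 2 inclusive
    for i in range(len(words) - 1):
        a, b = words[i], words[i + 1]
        nodes.update((a, b))        # get or create nodes for words i and i+1
        next_edges.append((a, b))   # create the :NEXT relationship
    return nodes, next_edges

nodes, edges = build_adjacency_graph("My cat eats fish on Saturday.")
print(edges)
# [('my', 'cat'), ('cat', 'eats'), ('eats', 'fish'), ('fish', 'on'), ('on', 'saturday')]
```

In Cypher, the "get or create" step maps naturally onto MERGE and the relationship onto CREATE. The node label and property name used for words are a design choice.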
Representing A Text Corpus As A Graph
Add followship frequency
Add word counts
Query word frequency
Query word pair frequencies (collocation)
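The same idea extends to a whole corpus. Each word node can carry a count, and each :NEXT relationship can carry a followship frequency. This sketch keeps both in `collections.Counter` objects as stand-ins for node and relationship properties. The three sentences are made-up sample input and do not come from the talk.

```python
from collections import Counter

def tokenize(sentence):
    # Naive tokenizer (assumption): lowercase and strip surrounding punctuation
    words = (w.strip(".,!?;:").lower() for w in sentence.split())
    return [w for w in words if w]

def build_corpus_graph(sentences):
    """Return (word_count, next_count) for a list of sentences."""
    word_count = Counter()   # stand-in for a count property on each word node
    next_count = Counter()   # stand-in for a count property on each :NEXT relationship
    for sentence in sentences:
        words = tokenize(sentence)
        word_count.update(words)
        next_count.update(zip(words, words[1:]))   # followship frequency
    return word_count, next_count

corpus = [
    "My cat eats fish on Saturday.",
    "His cat eats turkey on Tuesday.",
    "My dog eats meat on Sunday.",
]
word_count, next_count = build_corpus_graph(corpus)

print(word_count.most_common(3))   # query: word frequency
print(next_count.most_common(3))   # query: word pair frequencies (collocations)
```

In Neo4j, these counts would typically be maintained by incrementing a property in the ON MATCH branch of a MERGE.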
NLP Tasks
Mining Word Associations
Word Associations
• Paradigmatic
  • words that can be substituted
  • “Monday” <-> “Thursday”
  • “cat” <-> “dog”
• Syntagmatic
  • words that can be combined with each other
  • “cold”, “weather”
  • collocations
Computing Paradigmatic Similarity
1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
Paradigmatic Similarity
1. Represent each word by its context
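[Figure: each word's context, split into Left1 (the word immediately to its left) and Right1 (the word immediately to its right)]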
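In graph terms, a word's Left1 context is the set of words that have a :NEXT relationship pointing into it. Its Right1 context is the set of words it points to. The following is a minimal Python sketch. It assumes a naive tokenizer and a made-up three-sentence corpus, and it counts neighbours with `Counter`.

```python
from collections import Counter, defaultdict

def tokenize(sentence):
    # Naive tokenizer (assumption): lowercase and strip surrounding punctuation
    words = (w.strip(".,!?;:").lower() for w in sentence.split())
    return [w for w in words if w]

def left_right_contexts(sentences):
    """Return (left1, right1): word -> Counter of immediate left / right neighbours."""
    left1 = defaultdict(Counter)
    right1 = defaultdict(Counter)
    for sentence in sentences:
        words = tokenize(sentence)
        for a, b in zip(words, words[1:]):   # each pair is one :NEXT relationship
            right1[a][b] += 1                # b is in the Right1 context of a
            left1[b][a] += 1                 # a is in the Left1 context of b
    return left1, right1

corpus = [
    "My cat eats fish on Saturday.",
    "His cat eats turkey on Tuesday.",
    "My dog eats meat on Sunday.",
]
left1, right1 = left_right_contexts(corpus)
print(dict(left1["cat"]), dict(right1["cat"]))
```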
Paradigmatic Similarity
2. Compute context similarity
www.lyonwj.com/2015/06/16/nlp-with-neo4j/

Paradigmatic Similarity
3. Find words with high context similarity
http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
CEEAUS corpus
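Steps 2 and 3 can be sketched by comparing context sets and ranking word pairs. As one simple measure, this sketch uses Jaccard similarity over position-tagged neighbours, written as ("L", word) and ("R", word). The linked blog post and course may weight contexts differently. The corpus is made-up sample input.

```python
from collections import defaultdict
from itertools import combinations

def tokenize(sentence):
    # Naive tokenizer (assumption): lowercase and strip surrounding punctuation
    words = (w.strip(".,!?;:").lower() for w in sentence.split())
    return [w for w in words if w]

def context_sets(sentences):
    """word -> set of ("L", left neighbour) and ("R", right neighbour) features."""
    ctx = defaultdict(set)
    for sentence in sentences:
        words = tokenize(sentence)
        for a, b in zip(words, words[1:]):
            ctx[a].add(("R", b))
            ctx[b].add(("L", a))
    return ctx

def jaccard(x, y):
    """|x & y| / |x | y|, defined as 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def most_similar_pairs(sentences, top_n=5):
    """Step 3: rank all word pairs by context similarity, highest first."""
    ctx = context_sets(sentences)
    scored = [(jaccard(ctx[a], ctx[b]), a, b) for a, b in combinations(sorted(ctx), 2)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_n]

corpus = [
    "My cat eats fish on Saturday.",
    "His cat eats turkey on Tuesday.",
    "My dog eats meat on Sunday.",
]
ctx = context_sets(corpus)
print(round(jaccard(ctx["cat"], ctx["dog"]), 2))   # 0.67
for score, a, b in most_similar_pairs(corpus):
    print(f"{a:>10} {b:<10} {score:.2f}")
```

On this toy corpus, the day names and the foods share identical contexts, and "cat" and "dog" overlap in two of three features. This is the kind of substitutable pair the paradigmatic relation describes.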
Paradigmatic Similarity
Example
http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/
https://github.com/johnymontana/nlp-graph-notebooks
https://class.coursera.org/textanalytics-001
Graph Based Summarization
and Keyword Extraction
image credit: https://en.wikipedia.org/wiki/PageRank
https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
https://github.com/summanlp/textrank
Keyword Extraction
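The PageRank image and the Mihalcea paper point to TextRank. TextRank builds a graph of words that co-occur within a small window, runs PageRank on it, and keeps the top-ranked words as keywords. The following is a minimal plain-Python sketch and not the summanlp implementation. It assumes a hard-coded stopword list in place of the part-of-speech filter the paper describes.

```python
from collections import defaultdict

# Assumption: a tiny stopword list stands in for TextRank's part-of-speech filter
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "it", "of", "on", "that", "the", "this", "to", "with"}

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    words = (w.strip(".,!?;:()").lower() for w in text.split())
    words = [w for w in words if w and w not in STOPWORDS]

    # Undirected co-occurrence graph: connect words within `window` tokens of each other
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbours[w].add(words[j])
                neighbours[words[j]].add(w)

    # Unweighted PageRank by power iteration: each word shares its score
    # equally among its neighbours, plus the (1 - damping) term
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[u] / len(neighbours[u]) for u in neighbours[w])
            for w in neighbours
        }
    return sorted(score, key=score.get, reverse=True)[:top_n]

text = ("Graph databases store nodes and relationships. "
        "Graph algorithms such as PageRank rank nodes in a graph.")
print(textrank_keywords(text, top_n=3))
```

Words that link many other words, such as "graph" and "nodes" in this toy text, collect the most score.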
Summarization
Opinion mining
• Opinion mining
• Summarize major opinions
• Concise and readable
• Major complaints / compliments
http://kavita-ganesan.com/opinosis
1. Graph based representation of review corpus
2. Find and score candidate summaries
3. Select top scoring candidates as summary
Opinion Mining - Example
• Best Buy API
• Product reviews by SKU
Opinion Mining - Example
1. Graph based representation of review corpus
2. Find and score candidate summaries
3. Select top scoring candidates as summary
Opinion Mining - Example
Find highest ranked paths of 2-5 words
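The three Opinosis steps can be sketched in simplified form. This sketch assumes a path's score is the smallest followship count along it, on the reasoning that a phrase is only as well supported as its weakest link. Longer phrases win ties. Opinosis itself uses richer redundancy scoring and rules for valid start and end words, so this is only an approximation. The reviews are made-up sample input.

```python
from collections import Counter

def tokenize(sentence):
    # Naive tokenizer (assumption): lowercase and strip surrounding punctuation
    words = (w.strip(".,!?;:").lower() for w in sentence.split())
    return [w for w in words if w]

def top_phrases(reviews, min_len=2, max_len=5, top_n=3):
    # 1. Graph based representation: :NEXT edges weighted by followship count
    next_count = Counter()
    for review in reviews:
        words = tokenize(review)
        next_count.update(zip(words, words[1:]))
    successors = {}
    for a, b in next_count:
        successors.setdefault(a, []).append(b)

    # 2. Enumerate simple paths of min_len..max_len words; score = weakest edge
    candidates = {}
    def extend(path, support):
        if len(path) >= min_len:
            candidates[tuple(path)] = support
        if len(path) == max_len:
            return
        for nxt in successors.get(path[-1], []):
            if nxt not in path:   # simple paths only, no repeated words
                extend(path + [nxt], min(support, next_count[(path[-1], nxt)]))

    for start in successors:
        extend([start], float("inf"))

    # 3. Select top scoring candidates: highest support, longer phrases first on ties
    ranked = sorted(candidates.items(), key=lambda kv: (kv[1], len(kv[0])), reverse=True)
    return [(" ".join(path), support) for path, support in ranked[:top_n]]

reviews = [
    "Easy to read in sunlight.",
    "The screen is easy to read in sunlight.",
    "Very easy to read, great battery.",
]
print(top_phrases(reviews))
```

Without Opinosis's validity rules, fragments such as "easy to" can also rank highly. This is why the real method constrains where a summary may start and end.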
Opinion Mining - Demo
“Easy to read in sunlight”
“Comfortable great sound quality”
“I love this washer”
Opinion Mining - Demo
“Bought this smart TV for the price”
“Easy to use this vacuum”
Opinion Mining - Demo
• iPython notebook
https://github.com/johnymontana/nlp-graph-notebooks
Content Recommendation
Content recommendation
“Networks give structure to the conversation
while content mining gives meaning.”
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/
- Preriit Souda
Using Data Relationships for
Recommendations
Content-based filtering
Recommend items based on what
users have liked in the past
Collaborative filtering
Predict what users like based on the similarity of their
behaviors, activities and preferences to others
[Figure: a Person RATED a Movie (rating: 7); two Person nodes linked by SIMILARITY (value: .92)]
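Collaborative filtering can be sketched in two steps. First, compute a similarity value between people from their ratings. Then score each unseen movie by the ratings that similar people gave it. This sketch assumes cosine similarity and uses made-up ratings. In the graph, the computed value could be stored on a SIMILARITY relationship like the one in the figure.

```python
from math import sqrt

# Hypothetical (:Person)-[:RATED {rating}]->(:Movie) data, as plain dicts
ratings = {
    "alice": {"Matrix": 9, "Alien": 7, "Heat": 3},
    "bob":   {"Matrix": 8, "Alien": 6, "Up": 9},
    "carol": {"Heat": 9, "Up": 2},
}

def cosine(u, v):
    """Cosine similarity of two rating vectors (unrated items count as 0)."""
    dot = sum(u[m] * v[m] for m in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user):
    """Score movies the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)   # the SIMILARITY value
        for movie, rating in their_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("carol"))
```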
The article graph - data model
Building the article graph
• Articles users have shared
• Extract keywords using newspaper3k python library
• Insert in the graph
• Scrape additional articles
https://github.com/johnymontana/nlp-graph-notebooks
The article graph - example
What are the keywords of the articles I liked?
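The question "what are the keywords of the articles I liked?" can be answered and turned into content-based recommendations with plain dicts. The `LIKED` and `HAS_KEYWORD` relationship names in the comments and the sample data are hypothetical. In the talk, keywords are extracted with newspaper3k.

```python
from collections import Counter

# Hypothetical article graph as plain dicts:
# (:Article)-[:HAS_KEYWORD]->(:Keyword) and (:User)-[:LIKED]->(:Article)
article_keywords = {
    "a1": {"neo4j", "graph", "cypher"},
    "a2": {"nlp", "python", "graph"},
    "a3": {"neo4j", "nlp", "textrank"},
    "a4": {"football", "texas"},
}
liked = {"will": {"a1"}}

def liked_keywords(user):
    """What are the keywords of the articles I liked? (keyword -> count)"""
    return Counter(k for article in liked[user] for k in article_keywords[article])

def recommend(user, top_n=2):
    """Rank unliked articles by how many of the user's keywords they share."""
    profile = liked_keywords(user)
    scores = {}
    for article, keywords in article_keywords.items():
        if article in liked[user]:
            continue
        overlap = sum(profile[k] for k in keywords)   # Counter returns 0 for missing keys
        if overlap:
            scores[article] = overlap
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]

print(liked_keywords("will"))
print(recommend("will"))
```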
Summary
• Property graph model
• Represent text as a graph
• Word associations
• Opinion mining
• Content recommendation
Resources
graphdatabases.com
Resources
• http://kavita-ganesan.com/opinosis
• http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
• https://github.com/johnymontana/nlp-graph-notebooks
Opinion Mining
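• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
  - Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign
• “Multi-sentence compression: Finding shortest paths in word graphs”
  - Katja Filippova, Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), Beijing, China, Aug 23-27, 2010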

Natural Language Processing with Graph Databases and Neo4j

Originally presented at DataDay Texas in Austin, this presentation shows how a graph database such as Neo4j can be used for common natural language processing tasks, such as building a word adjacency graph, mining word associations, summarization and keyword extraction, and content recommendation.
Published on: Mar 3, 2016
Published in: Data & Analytics
Source: www.slideshare.net


Transcripts - Natural Language Processing with Graph Databases and Neo4j

  • 1. Natural Language Processing With Graph Databases DataDay Texas January 2016 William Lyon @lyonwj
  • 2. About Software Developer @Neo4j william.lyon@neo4j.com @lyonwj lyonwj.com William Lyon
  • 3. Agenda • Brief intro to graph databases / Neo4j • Representing text as a graph • NLP tasks • Mining word associations • Graph based summarization and keyword extraction • Content recommendation
  • 4. Agenda • Brief intro to graph databases / Neo4j • Representing text as a graph • NLP tasks • Mining word associations • Graph based summarization and keyword extraction • Content recommendation Survey of NLP methods with graphs
  • 5. Intro to Graph Databases / Neo4j
  • 6. Charts
  • 7. Charts Graphs
  • 8. Neo4j Graph Database • Property graph data model • Nodes and relationships • Native graph processing • Cypher query language
  • 9. The Whiteboard Model Is the Physical Model
  • 10. Relational Versus Graph Models [Figure: Relational Model (Person and Person-Friend tables) vs. Graph Model (ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships)]
  • 11. Property Graph Model Components Nodes • The objects in the graph • Can have name-value properties • Can be labeled Relationships • Relate nodes by type and direction • Can have name-value properties [Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LIVES WITH, DRIVES and OWNS relationships, one of which carries the property since: Jan 10, 2011]
  • 12. Cypher: Graph Query Language CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"}) [Figure: the statement annotated with each NODE's LABEL (Person) and PROPERTY (name: Dan, Ann), joined by the LOVES relationship]
  • 13. “So what does this have to do with NLP?” “Am I in the wrong talk?” “I thought this was going to be about text processing….”
  • 14. Natural Language Processing With Graphs
  • 15. Natural Language Processing With Graphs Uncovering meaning from text using a graph data model.
  • 16. Representing Text As A Graph “Nearly all text processing starts by transforming text into vectors.” - Matt Biddulph www.hackdiary.com
  • 17. Representing text as a graph Text Adjacency Graph
  • 18. Representing text as a graph Text Adjacency Graph
  • 19. My cat eats fish on Saturday.
  • 20. Convert to array of words
  • 21. Iterate with counter variable i, from 0 to number of words - 2
  • 22. Get or create node for words at index i and i+1
  • 23. Create :NEXT relationship
  • 24. Representing A Text Corpus As A Graph
  • 25. Add followship frequency
  • 26. Add word counts
  • 27. Query Word frequency
  • 28. Query Word pair frequencies (collocation)
  • 29. NLP Tasks
  • 30. Mining Word Associations
  • 31. Word Associations • Paradigmatic • words that can be substituted • “Monday” <-> “Thursday” • “cat” <-> “dog” • Syntagmatic • words that can be combined with each other • “cold”, “weather” • collocations
  • 32. Computing Paradigmatic Similarity 1. Represent each word by its context 2. Compute context similarity 3. Words with high context similarity likely have a paradigmatic relation
  • 33. Paradigmatic Similarity 1. Represent each word by its context
  • 34. Paradigmatic Similarity 1. Represent each word by its context
  • 35. Paradigmatic Similarity 1. Represent each word by its context Left1 Right1
  • 36. Paradigmatic Similarity 2. Compute context similarity
  • 37. Paradigmatic Similarity 2. Compute context similarity
  • 38. Paradigmatic Similarity 2. Compute context similarity www.lyonwj.com/2015/06/16/nlp-with-neo4j/
  • 39. Paradigmatic Similarity 3. Find words with high context similarity http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37 CEEAUS corpus
  • 40. Paradigmatic Similarity Example http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/ https://github.com/johnymontana/nlp-graph-notebooks https://class.coursera.org/textanalytics-001
  • 41. Graph Based Summarization and Keyword Extraction
  • 42. image credit: https://en.wikipedia.org/wiki/PageRank https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf https://github.com/summanlp/textrank Keyword Extraction
  • 43. Summarization Opinion mining
  • 44. • Opinion mining • Summarize major opinions • Concise and readable • Major complaints / compliments
  • 45. http://kavita-ganesan.com/opinosis 1. Graph based representation of review corpus 2. Find and score candidate summaries 3. Select top scoring candidates as summary
  • 46. Opinion Mining - Example • Best Buy API • Product reviews by SKU
  • 47. Opinion Mining - Example
  • 48. Opinion Mining - Example
  • 49. Opinion Mining - Example 1. Graph based representation of review corpus 2. Find and score candidate summaries 3. Select top scoring candidates as summary
  • 50. Opinion Mining - Example Find highest ranked paths of 2-5 words
  • 51. Opinion Mining - Demo “Easy to read in sunlight” “Comfortable great sound quality” “I love this washer”
  • 52. Opinion Mining - Demo “Bought this smart TV for the price” “Easy to use this vacuum”
  • 53. Opinion Mining - Demo • iPython notebook https://github.com/johnymontana/nlp-graph-notebooks
  • 54. Content Recommendation
  • 55. Content recommendation “Networks give structure to the conversation while content mining gives meaning.” http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/ - Preriit Souda
  • 56. Using Data Relationships for Recommendations Content-based filtering Recommend items based on what users have liked in the past Collaborative filtering Predict what users like based on the similarity of their behaviors, activities and preferences to others [Figure: a Person RATED a Movie (rating: 7); two Person nodes linked by SIMILARITY (value: .92)]
  • 57. Using Data Relationships for Recommendations Content-based filtering Recommend items based on what users have liked in the past [Figure: a Person RATED a Movie (rating: 7); two Person nodes linked by SIMILARITY (value: .92)]
  • 58. The article graph - data model
  • 59. Building the article graph • Articles users have shared • Extract keywords using newspaper3k python library • Insert in the graph • Scrape additional articles https://github.com/johnymontana/nlp-graph-notebooks
  • 60. The article graph - example
  • 61. What are the keywords of the articles I liked?
  • 62. Summary • Property graph model • Represent text as a graph • Word associations • Opinion mining • Content recommendation
  • 63. Resources
  • 64. graphdatabases.com
  • 65. Resources • http://kavita-ganesan.com/opinosis • http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/ • https://github.com/johnymontana/nlp-graph-notebooks
  • 66. Opinion Mining • “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions” - Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign • “Multi-sentence compression: Finding shortest paths in word graphs” - Katja Filippova, Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), Beijing, China, Aug 23-27, 2010

Related Documents