Named Entity Recognition - ACL 2011 Presentation

Transcripts - Named Entity Recognition - ACL 2011 Presentation

  • 1. The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and African-Americans are not LOCATIONS: An Analysis of the Performance of Named-Entity Recognition. Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin Madnani (ETS). A Review by Richard Littauer (UdS)
  • 2. The Background Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE)
  • 3. The Background Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE) Various competitions
  • 4. The Background Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE) Various competitions Recently: ◦ non-English languages ◦ improving unsupervised learning methods
  • 5. The Background “There are no well-established standards for evaluation of NER.”
  • 6. The Background “There are no well-established standards for evaluation of NER.” ◦ Criteria for NER system changes for competitions ◦ Proprietary software
  • 7. The Background KDM wanted to identify MWEs…
  • 8. The Background KDM wanted to identify MWEs… … but false positives, tagging inconsistencies stopped this.
  • 9. The Background KDM wanted to identify MWEs… … but false positives, tagging inconsistencies stopped this. IE derives Recall and Precision from Information Retrieval NER is just a small part of this, so is rarely evaluated independently
  • 10. The Background So, they want to test NER systems, and provide a unit test based on the problems encountered
  • 11. Evaluation Compared three NER taggers: Stanford: ◦ CRF, 100m training corpus; University of Illinois (LBJ): ◦ Regularized average perceptron, Reuters 1996 News Corpus; BBN IdentiFinder (IdentiFinder): ◦ HMMs, commercial
  • 12. Evaluation Agreement on Classification
  • 13. Evaluation Agreement on Classification Ambiguity in Discourse
  • 14. Evaluation Agreement on Classification Ambiguity in Discourse Stanford vs. LBJ on internal ETS 425m corpus All three on American National Corpus
  • 15. Stanford vs. LBJ NER reported as 85-95% accurate.
  • 16. Stanford vs. LBJ NER reported as 85-95% accurate. Same number for both: 1.95m for Stanford, 1.8m for LBJ (7.6% difference) However, errors:
  • 17. Stanford vs. LBJ Agreement:
  • 18. Stanford vs. LBJ Ambiguity:
  • 19. Stanford vs. LBJ vs. IdentiFinder Agreement:
  • 20. Stanford vs. LBJ vs. IdentiFinder Agreement:
  • 21. Stanford vs. LBJ vs. IdentiFinder Differences: ◦ How they are tokenized ◦ Number of entities recognized overall
  • 22. Stanford vs. LBJ vs. IdentiFinder Ambiguity:
  • 23. Unit Test Created two documents that can be used as texts ◦ Different cases for true positives of PERSON, LOCATION, ORGANIZATION ◦ Entirely upper case not NE (Ex. AAARGH) ◦ Punctuated terms not NE ◦ Terms with Initials ◦ Acronyms (some expanded, some not) ◦ Last names in close proximity to first names
  • 24. Unit Test Created two documents that can be used as texts ◦ Terms with prepositions (Mass. Inst. Of Tech.) ◦ Terms with location and organization (Amherst College) Provided freely online.
  • 25. One NE Tag per Discourse It is unusual for multiple occurrences of a token in a document to refer to different entities. This is true even for homonyms. An exception: location + sports team.
  • 26. One NE Tag per Discourse Stanford, LBJ have features for non-local dependencies to help with this. KDM: Two other uses for NLD: ◦ Source of error in evaluation ◦ A way to identify semantically related entities. These should be treated as exceptions.
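    As a rough sketch of how the one-tag-per-discourse observation can be turned into a check (not the authors' code), the following flags surface strings that receive more than one label within a single tagged document; flagged cases are either tagger errors or legitimate exceptions such as location + sports team. The tagged entities below are invented:

        # Sketch: find entities that get more than one label inside ONE document,
        # i.e. apparent violations of "one NE tag per discourse". Data is made up.
        from collections import defaultdict

        def discourse_ambiguities(tagged_entities):
            labels = defaultdict(set)
            for surface, label in tagged_entities:
                labels[surface].add(label)
            return {s: sorted(ls) for s, ls in labels.items() if len(ls) > 1}

        doc = [("Amherst", "LOCATION"), ("Amherst", "ORGANIZATION"),
               ("Berners-Lee", "PERSON"), ("Berners-Lee", "PERSON")]
        print(discourse_ambiguities(doc))   # {'Amherst': ['LOCATION', 'ORGANIZATION']}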
  • 27. Discussion There are guidelines for NER – but we need standards. The community should focus on PERSON, ORGANISATION, LOCATION, and MISC. ◦ Harder to deal with than Dates, Times. ◦ Disagreement between taggers. ◦ MISC is necessary. ◦ These have important value elsewhere.
  • 28. Discussion To improve intrinsic evaluation for NER: 1. Create test sets for divers domains. 2. Use standardized sets for different phenomena. 3. Report accuracy for POL separately. 4. Establish uncertainty in the tagging system.
  • 29. Conclusion 90% accuracy not real. We need to use only entities that are agreed on by multiple taggers. Even in cases where they both disagree (Hint: Future work.) Unit test downloadable.
  • 30. Cheers/PERSON Richard/ORGANISATION thanks the MWord Class/LOCATION for listening to his talk about Berners-Lee/MISC