Aisha Kalsoom

Named Entity Recognition problems in biomedical literature

Named Entity Recognition (NER) is a widely used technique in Natural Language Processing. NER can be applied to biomedical data, but doing so raises several problems, which this presentation outlines.
Published on: Mar 3, 2016
Published in: Education      
Source: www.slideshare.net


Transcript - Named Entity Recognition problems in biomedical literature

  • 1. Aisha Kalsoom
  • 2. The Medline database contained approx. 15 million scientific abstracts, with a growth rate of about 400,000 articles per year. Tools and techniques to help researchers cope with the information overload are therefore needed. NER tools can be applied to find all kinds of entities, such as gene or protein names, diseases and drugs, mutations, or properties of protein structures. Identifying proteins or genes is important for finding protein interaction networks.
  • 3. Concepts, meaning and representation: Names in text represent real-life concepts in our minds. The concept denoted by a gene name is usually not clearly defined, and there is no community-wide agreement on how to name a particular gene. Slide labels: Supermarket, Sonic Hedgehog gene in human, p53, 2WRU.
  • 4. Many genes and proteins have more than one name: a clone in the mapping phase of the Human Genome Project had up to 15 different names; FLT4 has four names (PCL, FLT41, LMPH1A, VEGFR3). Inconsistent use of variations of names: Cbp/p300-interactive transactivator; CCAAT/enhancer binding protein, C/EBP alpha. Multi-word names: in the BioCreative corpus of expert-tagged gene names, 53% of all names consist of more than one token, e.g. Human T-cell leukaemia lymphotropic virus type 1 Tax protein. Acronyms are homonyms: SEC stands for surface epithelial cell, size exclusion chromatography, or selenocysteine.
  • 5. Leser, U. and Hakenberg, J. (2005), ‘What makes a gene name? Named entity recognition in the biomedical literature’, Briefings in Bioinformatics, Vol. 6(4), pp. 357-369; http://www.bioinformatics.org/textknowledge/acronym.php?textfield=SEC&sub=search; http://www.rcsb.org/pdb/explore/explore.do?structureId=2WRU
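
The naming problems listed on slide 4 can be made concrete with a small dictionary-based sketch. It is not a method from the presentation or the cited paper, only an illustration under stated assumptions: the synonym and acronym tables contain just the names quoted on the slide, and the function names (`build_lookup`, `tag_genes`, `is_ambiguous`, `multi_token_fraction`) and the case-insensitive, single-token matching are hypothetical simplifications.

```python
import re

# Synonym table built only from the names quoted on slide 4 (illustrative).
SYNONYMS = {
    "FLT4": ["PCL", "FLT41", "LMPH1A", "VEGFR3"],
}

# Expansions of the acronym SEC as listed on slide 4.
ACRONYMS = {
    "SEC": [
        "surface epithelial cell",
        "size exclusion chromatography",
        "selenocysteine",
    ],
}


def build_lookup(synonyms):
    """Map every known name (lower-cased) to its canonical gene symbol."""
    lookup = {}
    for canonical, names in synonyms.items():
        lookup[canonical.lower()] = canonical
        for name in names:
            lookup[name.lower()] = canonical
    return lookup


def tag_genes(text, lookup):
    """Return (surface form, canonical symbol) for each matching token.

    Assumption: names are single alphanumeric tokens and case is ignored.
    Real gene names are often multi-word, so this misses many of them.
    """
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        canonical = lookup.get(token.lower())
        if canonical is not None:
            hits.append((token, canonical))
    return hits


def is_ambiguous(acronym, acronyms):
    """An acronym is ambiguous if it has more than one known expansion."""
    return len(acronyms.get(acronym.upper(), [])) > 1


def multi_token_fraction(names):
    """Fraction of names that consist of more than one whitespace token."""
    if not names:
        return 0.0
    multi = sum(1 for name in names if len(name.split()) > 1)
    return multi / len(names)


LOOKUP = build_lookup(SYNONYMS)
print(tag_genes("VEGFR3 and Flt4 were measured", LOOKUP))
print(is_ambiguous("SEC", ACRONYMS))
```

Even this toy version shows the limits the slides describe: the lookup resolves synonyms only if they are already in the table, it cannot tell which sense of SEC a sentence intends, and token-level matching fails on multi-word names such as "CCAAT/enhancer binding protein".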