Extracting Biomedical Relations from Text

  • Goal: Develop algorithms using natural language processing techniques to extract biomedical relationships from text.
  • Funding: R01 LM005652, Grant title: Text mining for high-fidelity curation and discovery of gene-drug-phenotype relations.
  • Links: EBC code is available on GitHub; the gene-gene application for DeepDive is also available on GitHub.

Publication of results is the primary goal of all scientific research. With this focus on publication, the biomedical literature has become the ultimate source of all known information about drugs, genes, and other biomedical entities. While high-quality curated databases provide structured relationships for browsing and download, the majority of these relationships remain buried in the biomedical literature. Our lab develops text mining approaches for extracting these biomedical relationships from text, using both unsupervised and supervised methods.

Recent algorithms and text mining tools include:

  • The Ensemble Biclustering for Classification (EBC) algorithm, which automatically clusters biomedical relationships extracted from text. Code available on GitHub.
  • A gene-gene relation extractor for the DeepDive system (a collaboration with Chris Ré at Stanford University). Code available on GitHub.

More information about DeepDive can be found at

Want to talk to us about our projects? Let us know!