
High-Performance Biomedical Association Mining with MapReduce

  • Yanqing Ji (Author)
  • Yun Tian (Author)
  • Fangyang Shen (Author)
  • John Tran (Author)
Research Output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

MapReduce has been applied to data-intensive applications in different domains because of its simplicity, scalability, and fault tolerance. However, its use in biomedical association mining remains very limited. In this paper, we investigate using MapReduce to efficiently mine associations between biomedical terms extracted from a set of biomedical articles. First, biomedical terms were obtained by matching text against the Unified Medical Language System (UMLS) Metathesaurus, a biomedical vocabulary and standard database. We then developed a MapReduce algorithm that can calculate a category of interestingness measures defined on the basis of a 2×2 contingency table. The algorithm consists of two MapReduce jobs and takes a stripes approach to reduce the number of intermediate results. Experiments were conducted on Amazon Elastic MapReduce (EMR) with an input of 3610 articles retrieved from two biomedical journals. Test results indicate that our algorithm scales linearly.
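
The abstract does not give the algorithm itself, so the following is only a minimal in-memory sketch of how a two-job stripes computation over a 2×2 contingency table might look. It is not the authors' implementation. The function names, the `MARGINAL` sentinel key, the use of document-level co-occurrence, and the choice of lift as the example interestingness measure are all assumptions made for illustration. Job 1 emits one stripe per term per document (an associative array of co-occurring terms) and sums the stripes per term; job 2 fills in each pair's contingency table from the summed stripes and scores it.

```python
from collections import Counter, defaultdict

# Hypothetical sentinel key holding the number of documents that contain a term.
# Assumes no real term is spelled "*".
MARGINAL = "*"


def map_stripes(doc_terms):
    """Job 1 map (sketch): emit one stripe per distinct term in a document."""
    terms = sorted(set(doc_terms))
    for term in terms:
        stripe = Counter(other for other in terms if other != term)
        stripe[MARGINAL] = 1  # this document contains `term`
        yield term, stripe


def run_job1(docs):
    """Job 1 (sketch): group stripes by term and sum them element-wise."""
    grouped = defaultdict(Counter)
    for doc in docs:
        for term, stripe in map_stripes(doc):
            grouped[term].update(stripe)  # the reduce step: add counts
    return dict(grouped)


def contingency_table(n11, n_a, n_b, n_docs):
    """Build the 2x2 table (n11, n10, n01, n00) from joint and marginal counts."""
    n10 = n_a - n11                   # a present, b absent
    n01 = n_b - n11                   # a absent, b present
    n00 = n_docs - n11 - n10 - n01    # neither present
    return n11, n10, n01, n00


def lift(n11, n10, n01, n00):
    """One example measure defined on the table: P(a, b) / (P(a) * P(b))."""
    n = n11 + n10 + n01 + n00
    return n11 * n / ((n11 + n10) * (n11 + n01))


def run_job2(stripes, n_docs):
    """Job 2 (sketch): score each unordered term pair from the summed stripes.

    The marginal counts are looked up from a dictionary here; in a real
    cluster they would have to be shipped to the workers as side data
    (an assumption, not something the abstract states).
    """
    marginals = {term: stripe[MARGINAL] for term, stripe in stripes.items()}
    scores = {}
    for a, stripe in stripes.items():
        for b, n11 in stripe.items():
            if b == MARGINAL or a >= b:  # skip the sentinel and duplicate pairs
                continue
            table = contingency_table(n11, marginals[a], marginals[b], n_docs)
            scores[(a, b)] = lift(*table)
    return scores


if __name__ == "__main__":
    # Toy input: each inner list stands for the terms matched in one article.
    docs = [
        ["aspirin", "headache"],
        ["aspirin", "headache", "fever"],
        ["fever"],
        ["aspirin"],
    ]
    stripes = run_job1(docs)
    for pair, score in sorted(run_job2(stripes, len(docs)).items()):
        print(pair, round(score, 3))
```

Because a stripe carries all of a term's co-occurrence counts under a single key, the mapper emits one record per term rather than one per term pair, which is the usual reason a stripes design produces fewer intermediate results than a pairs design. Any other measure defined on the same four cell counts could be swapped in for `lift` without changing job 1.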