Experimental evaluations of MapReduce in biomedical text mining
- Yun Tian (b) (Author),
- Fangyang Shen (d) (Author),
- John Tran (c) (Author)
- (b) Eastern Washington University,
- (c) Frontier Behavioral Health,
- (d) NY City College of Tech
Abstract
In this paper, we present two biomedical text mining applications that we developed: biomedical literature search (BLS) and biomedical association mining (BAM). The former requires relatively little computation, while the latter is more computationally intensive. Experimental studies were conducted using Amazon Elastic MapReduce (EMR) with an input of 33,960 biomedical articles from the TREC (Text REtrieval Conference) 2006 Genomics Track. Our experimental results indicate that neither application scaled linearly with the number of computing nodes. BAM nevertheless achieved better scalability than BLS, because BLS performed less computation and its running time was dominated by overheads such as JVM startup, scheduling, and disk I/O. These observations imply that the existing MapReduce framework may not be suitable for online systems, such as literature search, that need quick responses.
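The scalability argument above can be illustrated with a minimal sketch. It assumes a simple model in which each job pays a fixed overhead (JVM startup, scheduling, disk I/O) that does not shrink as nodes are added, while the remaining work divides evenly across nodes. The function name `speedup` and both job profiles are hypothetical and chosen only for illustration; they are not measurements from the experiments.

```python
def speedup(work_s, overhead_s, nodes):
    """Speedup over a single node when only `work_s` is divided across nodes.

    Assumed model: the fixed overhead is paid in full regardless of node count.
    """
    single_node = overhead_s + work_s
    multi_node = overhead_s + work_s / nodes
    return single_node / multi_node


# Hypothetical job profiles in seconds (illustrative, not from the paper).
light_job = {"work_s": 60.0, "overhead_s": 60.0}    # overhead-dominated, BLS-like
heavy_job = {"work_s": 3600.0, "overhead_s": 60.0}  # compute-dominated, BAM-like

for nodes in (1, 2, 4, 8, 16):
    light = speedup(nodes=nodes, **light_job)
    heavy = speedup(nodes=nodes, **heavy_job)
    print(f"{nodes:>2} nodes: light job {light:.2f}x, heavy job {heavy:.2f}x")
```

Under these assumptions, neither job reaches linear speedup, and the overhead-dominated job flattens out much sooner than the compute-dominated one, which matches the qualitative pattern reported for BLS and BAM.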
