Josef Küng, Trong Nhan Phan, Khanh Tran Dang,
"eHSim: An Efficient Hybrid Similarity Search with MapReduce"
In: Advanced Information Networking and Applications (AINA), 2016 IEEE 30th International Conference on Advanced Information Networking and Applications, IEEE, pp. 422-429, 2016
In this paper, we study the problems of scalability and performance for similarity search by proposing eHSim, an efficient hybrid similarity search with MapReduce. Specifically, we introduce clustering schemes that partition objects into groups by their length, and we equip these schemes with pruning strategies that quickly discard irrelevant objects before their similarity is actually computed. We also design a hybrid MapReduce architecture that addresses the challenges of big data, and we implement the proposed methods with MapReduce so that they are compatible with this architecture. Finally, we evaluate the proposed methods on real datasets. Empirical experiments show that our approach is considerably more efficient than state-of-the-art methods in terms of query processing, batch processing, and data storage.
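The abstract does not name the similarity measure or spell out the pruning rules, so the following is only an illustrative sketch of the general idea of grouping by length and pruning before computing similarity. It assumes Jaccard similarity over token sets and uses the well-known length bound for Jaccard (a candidate y can reach threshold t against a query x only if t·|x| ≤ |y| ≤ |x|/t). The function names (`group_by_length`, `search`) are hypothetical, and the code runs in a single process rather than on MapReduce.

```python
from collections import defaultdict
from math import ceil, floor


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not stated in the abstract)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_by_length(objects):
    """Partition object names into groups keyed by the size of their token set."""
    groups = defaultdict(list)
    for name, tokens in objects.items():
        groups[len(tokens)].append(name)
    return groups


def search(query, objects, threshold):
    """Return (name, similarity) pairs with similarity >= threshold, best first.

    Only length groups that can possibly reach the threshold are visited;
    all other objects are discarded without computing their similarity.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    groups = group_by_length(objects)
    # Length bound for Jaccard; the small epsilon keeps the filter permissive
    # under floating-point rounding (the exact check below still applies).
    lo = ceil(threshold * len(query) - 1e-9)
    hi = floor(len(query) / threshold + 1e-9)
    results = []
    for length in range(lo, hi + 1):
        for name in groups.get(length, []):
            sim = jaccard(query, objects[name])
            if sim >= threshold:
                results.append((name, sim))
    return sorted(results, key=lambda r: -r[1])


if __name__ == "__main__":
    docs = {
        "a": {"x", "y", "z"},
        "b": {"x", "y", "z", "w"},
        "c": {"x"},            # too short: pruned by the length bound
        "d": {"p", "q", "r"},  # right length, but fails the exact check
    }
    print(search({"x", "y", "z"}, docs, 0.5))
```

In this sketch the length filter never changes the result set; it only avoids similarity computations for objects whose size alone rules them out.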