Josef Küng, Tran Khanh Dang, Trong Nhan Phan,
"An Elastic Approximate Similarity Search in Very Large Datasets with MapReduce"
in: Data Management in Cloud, Grid and P2P Systems, Series Lecture Notes in Computer Science, Vol. 8648, Springer, Berlin, Heidelberg, pp. 44-57, 11-2014, ISBN: 978-3-319-10066-1
The explosion of data has ushered in the era of big data and poses greater challenges than ever to traditional similarity search, which is used in a wide range of applications. Moreover, processing data at an unprecedented scale may be infeasible, or may paralyse systems because of slow performance and high overheads. Coping with this unstoppable data growth paves the way not only for consolidating similarity search but also for new trends in data-intensive applications. Aiming at scalability, we propose an elastic approximate similarity search that works efficiently on very large datasets. Our scheme also adapts to the well-known similarity search cases: pairwise documents, pivot document, range query, and k-nearest neighbour query. Finally, these methods, together with our filtering strategies, are implemented and verified by experiments on large real-world data collections in Hadoop, which show promising effectiveness and efficiency.