MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce
Yaobin He; Haoyu Tan; Wuman Luo; Huajian Mao; Di Ma; Shengzhong Feng; Jianping Fan
2011
Conference name: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Conference location: Tainan, Taiwan
English abstract: Data clustering is an important data mining technique that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it with a 4-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric for merging bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. Results reveal that our work achieves very efficient speedup and scale-up.
Indexed by: EI
Language: English
Content type: Conference paper
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/3588]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute (数字所)
Author affiliation: 2011
Recommended citation
GB/T 7714
Yaobin He, Haoyu Tan, Wuman Luo, et al. MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce[C]. In: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. Tainan, Taiwan.