MIA: Metric Importance Analysis for Big Data Workload Characterization
Zhibin Yu; Wen Xiong; Lieven Eeckhout; Zhengdong Bei; Avi Mendelson; Chengzhong Xu
Journal: IEEE Transactions on Parallel and Distributed Systems (TPDS)
Year: 2017
Document subtype: Journal article
Abstract: Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers in the system stack, hence it is challenging to identify the key metrics when understanding big data workload performance. In this paper, we propose a novel workload characterization methodology using ensemble learning, called Metric Importance Analysis (MIA), to quantify the respective importance of workload metrics. By focusing on the most important metrics, MIA reduces the complexity of the analysis without losing information. Moreover, we develop the MIA-based Kiviat Plot (MKP) and Benchmark Similarity Matrix (BSM) which provide more insightful information than the traditional linkage clustering based dendrogram to visualize program behavior (dis)similarity. To demonstrate the applicability of MIA, we use it to characterize three big data benchmark suites: HiBench, CloudRank-D and SZTS. The results show that MIA is able to characterize complex big data workloads in a simple, intuitive manner, and reveal interesting insights. Moreover, through a case study, we demonstrate that tuning the configuration parameters related to the important metrics found by MIA results in higher performance improvements than through tuning the parameters related to the less important ones.
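The idea of ranking workload metrics by importance with an ensemble learner can be illustrated with a minimal sketch. The abstract does not say which ensemble method MIA uses, so the code below is an assumption: it stands in a small bag of one-split regression stumps, each fitted on a bootstrap sample and assigned one metric in round-robin order, and scores a metric by how much it reduces the squared error of the performance target. The data, the three metrics, and every function name are hypothetical and are not taken from the paper.

```python
import random


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(rows, targets, feature):
    """Largest SSE reduction from a single threshold split on one metric."""
    base = sse(targets)
    best = 0.0
    for threshold in sorted({r[feature] for r in rows}):
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        if left and right:
            best = max(best, base - sse(left) - sse(right))
    return best


def metric_importance(rows, targets, n_stumps=60, seed=0):
    """Accumulate split gains over bagged stumps and normalize them to sum to 1."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    gains = [0.0] * n_features
    for k in range(n_stumps):
        # Bootstrap sample: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        # Round-robin assignment so every metric gets the same number of stumps.
        feature = k % n_features
        gains[feature] += best_split_gain(sample_rows, sample_targets, feature)
    total = sum(gains)
    return [g / total for g in gains] if total > 0 else gains


# Synthetic workload: metric 0 matters most, metric 1 less, metric 2 not at all.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(120)]
targets = [3.0 * r[0] + 2.0 * r[1] for r in rows]

importance = metric_importance(rows, targets)
ranking = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
print("importance:", [round(v, 3) for v in importance])
print("ranking (most important first):", ranking)
```

In a real study the rows would be measured metrics per workload run and the target a performance figure such as execution time; the top-ranked metrics would then be the ones kept for further analysis or tuning.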
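Language: English
Content type: Journal article
Source URL: http://ir.siat.ac.cn:8080/handle/172644/12535
Collection: Shenzhen Institutes of Advanced Technology (深圳先进技术研究院), Digital Institute (数字所)
Author affiliation: IEEE Transactions on Parallel and Distributed Systems (TPDS)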
Recommended citation
GB/T 7714
Zhibin Yu, Wen Xiong, Lieven Eeckhout, et al. MIA: Metric Importance Analysis for Big Data Workload Characterization[J]. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2017.
APA Zhibin Yu, Wen Xiong, Lieven Eeckhout, Zhengdong Bei, Avi Mendelson, & Chengzhong Xu. (2017). MIA: Metric Importance Analysis for Big Data Workload Characterization. IEEE Transactions on Parallel and Distributed Systems (TPDS).
MLA Zhibin Yu, et al. "MIA: Metric Importance Analysis for Big Data Workload Characterization". IEEE Transactions on Parallel and Distributed Systems (TPDS) (2017).