MIA: Metric Importance Analysis for Big Data Workload Characterization | |
Zhibin Yu; Wen Xiong; Lieven Eeckhout; Zhengdong Bei; Avi Mendelson; Chengzhong Xu | |
Journal | IEEE Transactions on Parallel and Distributed Systems (TPDS) |
2017 | |
Document subtype | Journal article |
English abstract | Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers in the system stack, hence it is challenging to identify the key metrics when understanding big data workload performance. In this paper, we propose a novel workload characterization methodology using ensemble learning, called Metric Importance Analysis (MIA), to quantify the respective importance of workload metrics. By focusing on the most important metrics, MIA reduces the complexity of the analysis without losing information. Moreover, we develop the MIA-based Kiviat Plot (MKP) and Benchmark Similarity Matrix (BSM) which provide more insightful information than the traditional linkage clustering based dendrogram to visualize program behavior (dis)similarity. To demonstrate the applicability of MIA, we use it to characterize three big data benchmark suites: HiBench, CloudRank-D and SZTS. The results show that MIA is able to characterize complex big data workloads in a simple, intuitive manner, and reveal interesting insights. Moreover, through a case study, we demonstrate that tuning the configuration parameters related to the important metrics found by MIA results in higher performance improvements than through tuning the parameters related to the less important ones. |
Language | English |
Content type | Journal article |
Source URL | [http://ir.siat.ac.cn:8080/handle/172644/12535] |
Collection | Shenzhen Institutes of Advanced Technology_Digital Institute (深圳先进技术研究院_数字所) |
Author affiliation | IEEE Transactions on Parallel and Distributed Systems (TPDS) |
Recommended citation GB/T 7714 | Zhibin Yu, Wen Xiong, Lieven Eeckhout, et al. MIA: Metric Importance Analysis for Big Data Workload Characterization[J]. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2017. |
APA | Zhibin Yu, Wen Xiong, Lieven Eeckhout, Zhengdong Bei, Avi Mendelson, & Chengzhong Xu. (2017). MIA: Metric Importance Analysis for Big Data Workload Characterization. IEEE Transactions on Parallel and Distributed Systems (TPDS). |
MLA | Zhibin Yu, et al. "MIA: Metric Importance Analysis for Big Data Workload Characterization". IEEE Transactions on Parallel and Distributed Systems (TPDS) (2017). |