Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing
Zhibin Yu; Zhendong Bei; Xuehai Qian
2018
Conference date: 2018
Conference location: Williamsburg, VA, USA
Abstract: In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over the traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC, IMC frameworks typically run the same given programs repeatedly on a given cluster with a similar input dataset size each time. It is challenging to build a performance model for an IMC program because: 1) the performance of IMC programs is more sensitive to the size of the input dataset, which is known to be difficult to incorporate into a performance model due to its complex effects on performance; 2) the number of performance-critical configuration parameters in IMC is much larger than in ODC (more than 40 vs. around 10), and the high dimensionality requires more sophisticated models to achieve high accuracy. To address this challenge, we propose DAC, a datasize-aware auto-tuning approach to efficiently identify the high dimensional configuration for a given IMC program to achieve optimal performance on a given cluster. DAC is a significant advance over the state-of-the-art because it can take the size of the input dataset and 41 configuration parameters as the parameters of the performance model for a given IMC program, which is unprecedented in previous work. It is made possible by two key techniques: 1) Hierarchical Modeling (HM), which combines a number of individual sub-models in a hierarchical manner; 2) a Genetic Algorithm (GA), which is employed to search for the optimal configuration. To evaluate DAC, we use six typical Spark programs, each with five different input dataset sizes. The evaluation results show that DAC improves the performance of these programs over the default configurations by a factor of 30.4× on average and up to 89×. We also report that the geometric mean speedups of DAC over the default, expert, and RFHOC configurations are 15.4×, 2.3×, and 1.5×, respectively.
Language: English
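Content type: Conference paper
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/14158]
Collection: Shenzhen Institutes of Advanced Technology, Institute of Advanced Computing and Digital Engineering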
Recommended citation
GB/T 7714
Zhibin Yu, Zhendong Bei, Xuehai Qian. Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing[C]. In: . Williamsburg, VA, USA. 2018.