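[Keywords]
[Abstract]
Data centers are important information infrastructure and a key support for enterprise Internet applications. However, server resource utilization in today's data centers is low (only 10% to 20%), which wastes large amounts of resources, incurs substantial extra operation and maintenance costs, and has become a key problem limiting enterprises' efforts to improve computing efficiency. Colocation means deploying online jobs and offline jobs together so that idle resources in online clusters meet the computing needs of offline jobs. As an important technique, colocation can effectively improve data center resource utilization and has become a research hotspot in academia and industry. This paper analyzes the characteristics of online and offline jobs, discusses the technical challenges facing colocation such as performance interference between online and offline jobs, and surveys online-offline colocation techniques in terms of performance interference models, job scheduling, resource isolation, and dynamic resource allocation. It then uses typical industrial colocation management systems as examples to discuss how the key colocation techniques are applied in industry and with what effect, and finally looks ahead to future research directions.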
[Keywords]
[Abstract]
Data centers are important information infrastructure and a key support for enterprise Internet applications. However, server resource utilization in data centers is low (only 10% to 20%), which wastes large amounts of resources, incurs substantial extra operation and maintenance costs, and has become a key obstacle to improving enterprises' computing efficiency. Colocation, which deploys online services and offline jobs together on the same cluster, can effectively improve data center resource utilization and has become a research hotspot in academia and industry. This paper analyzes the characteristics of online services and offline jobs and discusses the technical challenges of colocation, such as performance interference between the two. It surveys the key technologies in terms of performance interference models, job scheduling, resource isolation, and dynamic resource allocation, and uses four typical industrial colocation systems to examine how these technologies are applied in practice and with what effect. Finally, future research directions are discussed.
[CLC number]
[Funding]
Key-Area Research and Development Program of Guangdong Province (2020B010164003)