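Key-Area Research Program of Guangdong Province (2020B010165002); National Natural Science Foundation of China, Young Scientists Fund (61802448); Natural Science Foundation of Guangdong Province, General Program (2019A1515012229); Guangzhou Basic and Applied Basic Research Project (202002030328)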
Microservices have become one of the mainstream architectures for cloud-based software systems because they support agile development and rapid deployment. However, a microservice system is structurally complex: it often comprises hundreds of service instances, and the call relationships among services are intricate. When an anomaly occurs, locating its root cause is difficult. End-to-end request tracing has therefore become a standard part of microservice monitoring. Existing distributed request tracing implementations, however, are intrusive to applications, rely heavily on developers' experience with request tracing, and cannot turn tracing on or off at runtime. These shortcomings increase the burden on developers and limit the practical adoption of distributed request tracing. This study designs and implements Trace++, a request tracing system that is transparent to developers. Trace++ generates tracing code automatically and injects it into running applications through dynamic code instrumentation. It is minimally intrusive to programs and can flexibly enable or disable tracing. In addition, its adaptive sampling method effectively reduces the overhead of request tracing. Experiments on the microservice system TrainTicket show that Trace++ accurately discovers the dependencies between services. With tracing enabled, its performance overhead is close to that of source code instrumentation; with tracing disabled, it incurs no performance overhead. Moreover, the adaptive sampling method reduces trace data by 89.4% while still capturing representative samples.