Abstract: In the big data era, massive volumes of user data power numerous data-driven industry applications, such as smart grids, intelligent transportation, and product recommendation. In scenarios where real-time data is crucial, the business value of data diminishes rapidly over time, so data analysis systems must deliver high throughput and low latency. Stream-based big data processing systems, exemplified by Apache Flink, have therefore been widely adopted. Flink increases system throughput by parallelizing computing tasks across cluster nodes. However, existing research indicates that Flink suffers from weak single-node performance and poor cluster scalability. To improve the throughput of streaming big data processing systems, researchers have focused on optimizing control plane design, system operator implementation, and inter-task information sharing, while the data flow of streaming analysis applications has received little attention. These applications, which are driven by event streams and employ stateful processing functions, include low voltage detection in smart grids and advertisement campaign analysis in product recommendation. This paper analyzes the data flow characteristics of typical streaming analysis applications, identifies three scalability bottlenecks, and proposes a corresponding optimization strategy for each: the Key-level Watermark Strategy, the Dynamic Load Distribution Strategy, and the Low-Overhead Data Exchange Strategy. Based on these strategies, this paper implements Trilink on top of Flink and applies it to a low voltage detection application, a bridge arch crown monitoring application, and the Yahoo Streaming Benchmark. Experimental results show that, compared to native Flink, Trilink achieves more than a 6-fold increase in throughput in a single-machine environment and over a 1.6-fold improvement in horizontal scaling speedup in an 8-node setup.