Abstract: Graph anomaly detection, a critical task in graph data mining, aims to identify anomalous nodes that differ significantly from the majority of nodes in a network. Existing methods typically adopt dataset-specific training paradigms, i.e., they train a separate model for each dataset. Such approaches generalize poorly across datasets and incur high training costs. To overcome these limitations, recent studies have begun to explore the generalization potential of residual features. These features are computed as the difference between a node's own representation and the representation aggregated from its neighbors, which filters out dataset-specific semantic information while preserving information closely related to anomalous patterns. Despite initial progress in this direction, modeling residual features still faces two key challenges. First, when computing the difference between a node's representations before and after neighborhood propagation, sparse neighborhoods and structural noise can impair the reliability of the result. Second, the representations rely on Graph Neural Networks (GNNs) to learn local structural relationships, which makes it difficult to capture global dependencies that are also beneficial for anomaly detection and thereby limits the expressive power of residual features. To address these issues, this paper proposes GRAD, a novel method that jointly captures Global and local Residual information for generalizable graph Anomaly Detection. Specifically, building on a GNN that models local node relationships, GRAD introduces a linear Transformer module that captures global structural correlations among nodes in the feature space without relying on the original graph topology, thereby producing node representations with global awareness. GRAD then transforms these representations into residuals between each node and its neighbors from both global and local perspectives and integrates them to form dataset-agnostic node representations. Extensive experiments on public graph datasets from diverse domains demonstrate the effectiveness of GRAD.
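
To make the notion of local and global residuals concrete, the following is a minimal illustrative sketch, not the GRAD implementation. It rests on several assumptions that the abstract does not specify: residuals are computed on raw node features rather than learned GNN or Transformer representations; the local aggregate is a plain mean over graph neighbors; the global aggregate is a kernelized linear attention with the feature map elu(x) + 1 and no learned projections, so queries, keys, and values are all the node features; and the two residuals are combined by a simple Euclidean norm. All function names are hypothetical.

```python
from math import exp, sqrt


def local_residual(features, adjacency):
    """Node feature minus the mean feature of its graph neighbors."""
    residuals = []
    for i, x in enumerate(features):
        nbrs = adjacency[i]
        if not nbrs:
            # Assumption: an isolated node gets a zero residual.
            residuals.append([0.0] * len(x))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(x))]
        residuals.append([a - b for a, b in zip(x, mean)])
    return residuals


def feature_map(x):
    """Positive feature map elu(v) + 1 (an assumed choice of kernel)."""
    return [v + 1.0 if v > 0 else exp(v) for v in x]


def global_residual(features):
    """Node feature minus a linear-attention aggregate over all nodes.

    The graph topology is not used. The sums over nodes (kv and z) are
    computed once and reused for every query, so the cost is linear in
    the number of nodes.
    """
    d = len(features[0])
    phi = [feature_map(x) for x in features]
    # kv[a][b] = sum_j phi_j[a] * x_j[b];  z[a] = sum_j phi_j[a]
    kv = [[sum(p[a] * x[b] for p, x in zip(phi, features)) for b in range(d)]
          for a in range(d)]
    z = [sum(p[a] for p in phi) for a in range(d)]
    residuals = []
    for x, q in zip(features, phi):
        norm = sum(q[a] * z[a] for a in range(d))
        agg = [sum(q[a] * kv[a][b] for a in range(d)) / norm
               for b in range(d)]
        residuals.append([xi - ai for xi, ai in zip(x, agg)])
    return residuals


def anomaly_scores(features, adjacency):
    """Assumed combination: norm of the concatenated residuals."""
    loc = local_residual(features, adjacency)
    glo = global_residual(features)
    return [sqrt(sum(v * v for v in l + g)) for l, g in zip(loc, glo)]


# Toy graph: nodes 0-2 share the same features, node 3 deviates.
features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
scores = anomaly_scores(features, adjacency)
```

In this toy graph the local residual of node 0 is zero because its neighbors carry identical features, whereas node 3 differs sharply from its only neighbor. Because the global aggregate never touches `adjacency`, it remains defined even for nodes with few or noisy neighbors, which is the situation the first challenge above describes.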