Abstract: Graph anomaly detection, a critical task in graph data mining, aims to identify anomalous nodes that differ significantly from the majority of nodes in a network. Existing graph anomaly detection methods typically follow a dataset-specific training paradigm, i.e., they train a separate model for each dataset. Such methods generalize poorly across datasets and incur high training costs. To overcome these limitations, recent studies have begun to explore the generalization potential of residual features. A node's residual feature is obtained by subtracting the representation produced by neighborhood propagation from the node's own representation. This difference can largely cancel dataset-specific semantic information and thus retain general information closely related to anomalous patterns. Despite initial progress in this direction, modeling residual features still faces two key challenges. First, when the difference between a node's representations before and after neighborhood propagation is computed, sparse neighborhoods and potential structural noise can reduce the reliability of the result. Second, the representations are computed by graph neural networks (GNNs), which learn local relationships but have difficulty capturing global relationships that are also useful for anomaly detection; this limits the expressive power of residual features. To address these issues, this study proposes GRAD, a generalizable graph anomaly detection method based on the joint perception of global and local residual information. Specifically, on top of a GNN that models local node relationships, GRAD introduces a linear Transformer module that models global structural correlations among nodes in the feature space without relying on the original graph structure, thereby obtaining globally aware node representations.
GRAD then converts these representations into residuals between each node and its neighbors from both global and local perspectives, and integrates the two into general, dataset-independent node representations. Extensive experiments on multiple public graph datasets from different domains verify the effectiveness of GRAD.
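To make the two kinds of residual concrete, here is a minimal NumPy sketch under stated assumptions; it is not GRAD's actual implementation. It assumes the local residual is a node's representation minus the mean of its graph neighbors' representations, and the global residual is a node's representation minus an attention-weighted average over all nodes, computed with kernelized linear attention using the feature map ELU(x) + 1 (a common choice for linear Transformers). The concatenation step, the function names (`local_residual`, `global_residual`, `joint_residual`), and the random matrices standing in for GNN outputs and learned projections are all hypothetical.

```python
import numpy as np


def positive_feature_map(x: np.ndarray) -> np.ndarray:
    """phi(x) = ELU(x) + 1, a strictly positive map used in kernelized linear attention."""
    # For x > 0: x + 1; for x <= 0: exp(x). np.minimum avoids overflow inside exp.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def local_residual(H: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Node representation minus the mean of its graph neighbors' representations."""
    deg = A.sum(axis=1, keepdims=True)  # (n, 1) node degrees
    agg = np.divide(A @ H, deg, out=np.zeros(H.shape), where=deg > 0)
    # Isolated nodes get a zero aggregate, so their residual is just H[i]:
    # one illustration of why sparse neighborhoods make local residuals less reliable.
    return H - agg


def global_residual(H: np.ndarray, Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """Node representation minus an attention-weighted average over ALL nodes.

    Weights are phi(q_i) . phi(k_j), normalized over j. They are computed in
    O(n * d_k * d) without forming the n x n attention matrix and without
    using the graph structure (hypothetical simplification: values are H itself).
    """
    Q = positive_feature_map(H @ Wq)       # (n, d_k)
    K = positive_feature_map(H @ Wk)       # (n, d_k)
    KV = K.T @ H                           # (d_k, d): sum_j phi(k_j) h_j^T
    Z = K.sum(axis=0)                      # (d_k,):  sum_j phi(k_j)
    context = (Q @ KV) / (Q @ Z)[:, None]  # (n, d): convex combination of rows of H
    return H - context


def joint_residual(H: np.ndarray, A: np.ndarray, Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: concatenate local and global residuals per node."""
    return np.concatenate([local_residual(H, A), global_residual(H, Wq, Wk)], axis=1)


# Usage on a toy graph (random H stands in for GNN node representations).
rng = np.random.default_rng(0)
n, d, d_k = 5, 4, 3
H = rng.normal(size=(n, d))
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]], dtype=float)  # node 4 is isolated
Wq = rng.normal(size=(d, d_k))
Wk = rng.normal(size=(d, d_k))
R = joint_residual(H, A, Wq, Wk)
print(R.shape)  # (5, 8)
```

Because the weights phi(q_i) · phi(k_j) are positive and normalized over j, each global context is a convex combination of node representations. The global residual therefore plays the same role over the whole node set, with no edges required, that the local residual plays over a node's graph neighbors.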