Abstract: Brain-inspired computing chips of various architectures are emerging, and inference, training, and learning algorithms for spiking neural networks (SNNs), as well as the efficient simulation of biological neural networks, have become research hotspots. Meanwhile, efficiently executing applications with different computation and memory-access characteristics on these diverse chips remains a significant challenge, and solving it is crucial for establishing a robust brain-inspired computing ecosystem. The success of the general-purpose computing ecosystem indicates that a flexible, scalable, and reusable compiler infrastructure is an effective solution to this problem. This study proposes BIVM, a compilation framework for brain-inspired computing, along with a proof-of-concept implementation. Building on the multi-level intermediate representation (MLIR) framework for domain-specific architectures (DSAs), we design multi-layer IRs customized for SNNs, including an SNN dialect, middle-layer IRs composed mainly of MLIR's built-in dialects, and low-level IRs for the various target chips. To address challenges such as the large architectural differences among brain-inspired chips and the varying granularity of their hardware primitives, BIVM exploits MLIR's support for progressive lowering. This allows different abstraction levels and concepts to be mixed (e.g., combining fine-grained instructions with coarse-grained computation based on the crossbar structure specific to certain back-ends), enabling software-module reuse, reducing compiler development costs, and ultimately yielding high productivity. In addition, the framework offers the flexibility to combine compilation optimizations at multiple levels, including widely used SNN-specific optimizations (e.g., exploiting computational sparsity and improving parallelism) and low-level optimizations tailored to different back-ends, thereby ensuring performance portability. The current BIVM prototype supports back-ends including general-purpose processors (control-flow architecture), FPGA-based SNN accelerators with a hybrid control-/data-flow architecture, and data-flow chip designs based on ReRAM (resistive random-access memory, a widely used neuromorphic device). It can optimize and compile deep SNNs and biological neural network simulation applications into executables tailored for these chips. Comprehensive testing and performance comparisons demonstrate the potential of this compilation framework to achieve high productivity, portability, and performance.
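To make the SNN-specific optimizations mentioned above concrete, the following Python sketch shows a single leaky integrate-and-fire (LIF) timestep in which synaptic work is performed only for presynaptic neurons that actually spiked. This is a minimal, hypothetical illustration of the kind of event-driven sparsity a compiler such as BIVM could exploit; the function and parameter names below are illustrative and do not come from the paper.

```python
import numpy as np

def lif_step(v, spikes_in, weights, decay=0.9, v_th=1.0):
    """One timestep of a leaky integrate-and-fire (LIF) layer.

    v:         (N,) membrane potentials
    spikes_in: (M,) binary input spikes for this timestep
    weights:   (M, N) synaptic weight matrix
    """
    # Sparsity optimization: gather only the rows whose presynaptic
    # neuron actually spiked, skipping work for silent inputs.
    active = np.flatnonzero(spikes_in)
    current = weights[active].sum(axis=0) if active.size else 0.0

    v = decay * v + current           # leaky integration
    spikes_out = v >= v_th            # threshold crossing
    v = np.where(spikes_out, 0.0, v)  # reset neurons that fired
    return v, spikes_out.astype(np.float32)
```

On a crossbar-based ReRAM back-end, the same `weights[active].sum(axis=0)` step would instead map to an analog matrix-vector operation over the whole array, which is one example of the granularity mismatch that progressive lowering is meant to bridge.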