Abstract: With the rapid growth and widening application of deep learning (DL), the scale of DL training keeps increasing, and memory insufficiency has become one of the major bottlenecks threatening DL availability. Memory swapping is the key mechanism for alleviating this memory problem in DL training. It exploits the "time-varying" memory requirement of DL training by moving data between the memory of the computing accelerator and external storage. By replacing the accumulated memory requirement with an "instantaneous" one, memory swapping allows DL training to be sustained on computing accelerators. This paper surveys memory swapping mechanisms for DL training from the perspective of the time-varying memory requirement. We first introduce swapping-out mechanisms that exploit operator characteristics and swapping-in mechanisms that exploit data dependencies. We then present swapping decision mechanisms driven jointly by DL performance. Finally, we outline prospects for future research in this field.