2016 conference paper

Enabling efficient preemption for SIMT architectures with lightweight context switching

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908.

By: Z. Lin, L. Nyland* & H. Zhou

co-author countries: United States of America 🇺🇸
Source: NC State University Libraries
Added: August 6, 2018

Context switching is a key technique enabling preemption and time-multiplexing for CPUs. However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics processing units (GPUs), it is challenging to support context switching due to the massive number of threads, which leads to a huge amount of architectural state to be swapped during context switching. The architectural state of SIMT processors includes registers, shared memory, SIMT stacks and barrier states. Recent works present thread-block-level preemption on SIMT processors to avoid context switching overhead. However, because the execution time of a thread block (TB) is highly dependent on the kernel program, the response time of preemption cannot be guaranteed, and some TB-level preemption techniques cannot be applied to all kernel functions. In this paper, we propose three complementary ways to reduce and compress the architectural state to achieve lightweight context switching on SIMT processors. Experiments show that our approaches can reduce the register context size by 91.5% on average. Based on lightweight context switching, we enable instruction-level preemption on SIMT processors with compiler and hardware co-design. With our proposed schemes, the preemption latency is reduced by 59.7% on average compared to the naive approach.