PERFORMANCE ENHANCEMENT OF CUDA APPLICATIONS BY OVERLAPPING DATA TRANSFER AND KERNEL EXECUTION

ABSTRACT

The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and the GPU have separate address spaces. Since the GPU cannot directly access CPU memory, the input data must be copied to GPU memory before a GPU function (kernel) is invoked, and on completion of the kernel the results of the computation are transferred back to CPU memory. These CPU-GPU transfers take place over the PCI-Express (PCI-E) bus, whose bandwidth is much lower than that of GPU memory. The speed at which data can be transferred is therefore limited by the PCI-E bandwidth, and the bus acts as a performance bottleneck. In this paper, two approaches for minimizing the data-transfer overhead are discussed: performing data transfers while the GPU kernel is executing, and reducing the amount of data to be transferred to the GPU. The effectiveness of these approaches on the execution time of a set of CUDA applications is evaluated using CUDA streams. The results of our experiments show that the execution time of the applications can be reduced with the proposed approaches.
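The first approach the abstract describes, overlapping transfers with kernel execution, can be sketched with CUDA streams. The following is a minimal, hypothetical illustration (the kernel `process`, the array names, and the chunk sizes are assumptions, not taken from the paper): the input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued on a separate stream so that copies in one stream can overlap kernel execution in another.

```cuda
// Hypothetical sketch of transfer/kernel overlap with CUDA streams.
// The kernel and data sizes are illustrative placeholders.
#include <cuda_runtime.h>

#define N (1 << 20)          // total number of elements (assumed)
#define NSTREAMS 4           // number of CUDA streams (assumed)
#define CHUNK (N / NSTREAMS) // elements handled per stream

__global__ void process(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];  // placeholder computation
}

int main(void) {
    float *h_in, *h_out, *d_in, *d_out;
    // Pinned (page-locked) host memory is required for truly
    // asynchronous cudaMemcpyAsync transfers.
    cudaMallocHost(&h_in,  N * sizeof(float));
    cudaMallocHost(&h_out, N * sizeof(float));
    cudaMalloc(&d_in,  N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s)
        cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, processes it, and copies it back;
    // operations in different streams may overlap on the hardware.
    for (int s = 0; s < NSTREAMS; ++s) {
        int off = s * CHUNK;
        cudaMemcpyAsync(d_in + off, h_in + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(
            d_in + off, d_out + off, CHUNK);
        cudaMemcpyAsync(h_out + off, d_out + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for all streams to finish

    for (int s = 0; s < NSTREAMS; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    cudaFree(d_in);     cudaFree(d_out);
    return 0;
}
```

With a single stream, the copy-in, kernel, and copy-out phases would run strictly one after another; splitting the work across streams lets the copy engine and the compute engine work concurrently, which is the source of the speedup the paper investigates.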

HOW TO CITE THIS PAPER

Raju, K., & Chiplunkar, N. N. (2021). Performance enhancement of CUDA applications by overlapping data transfer and kernel execution. Applied Computer Science, 17(3), 5-18. https://doi.org/10.23743/acs-2021-17
Raju, K., and Niranjan N. Chiplunkar. "Performance Enhancement of CUDA Applications by Overlapping Data Transfer and Kernel Execution." Applied Computer Science 17, no. 3 (2021): 5-18.
K. Raju and N. N. Chiplunkar, "Performance enhancement of CUDA applications by overlapping data transfer and kernel execution," Applied Computer Science, vol. 17, no. 3, pp. 5-18, 2021, doi: 10.23743/acs-2021-17.
Raju K, Chiplunkar NN. Performance enhancement of CUDA applications by overlapping data transfer and kernel execution. Applied Computer Science. 2021;17(3):5-18.