1. Ouyang M, Zhang F. CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements. Journal of Theory and Practice in Engineering and Technology [Internet]. 2025 Sep 4 [cited 2025 Sep 12];2(5):1-9. Available from: https://woodyinternational.com/index.php/jtpet/article/view/291