Ouyang, M. and Zhang, F. (2025) “CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements”, Journal of Theory and Practice in Engineering and Technology, 2(5), pp. 1–9. Available at: https://woodyinternational.com/index.php/jtpet/article/view/291 (Accessed: 12 September 2025).