[1]
M. Ouyang and F. Zhang, "CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements," Journal of Theory and Practice in Engineering and Technology, vol. 2, no. 5, pp. 1–9, Sep. 2025.