OUYANG, Mark; ZHANG, Fengrui. CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements. Journal of Theory and Practice in Engineering and Technology, [S. l.], v. 2, n. 5, p. 1–9, 2025. Available at: https://woodyinternational.com/index.php/jtpet/article/view/291. Accessed: 12 Jul. 2026.