CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements