Multi-Token Prediction: Bridging Training-Inference Mismatch in LLMs
Table of Links
Abstract and 1. Introduction
2. Method
3. Experiments on real data
3.1. Benefits scale with model size and 3.2. Faster inference
3.3. Learning global patterns with multi-byte prediction...