Boosting LLM Decode Throughput: vAttention vs. PagedAttention
Table of Links
Abstract and 1 Introduction
2 Background
2.1 Large Language Models
2.2 Fragmentation and PagedAttention
3 Issues with the PagedAttention Model and 3.1 Requires re-writing the attention...