Key-Value Cache Compression for Memory-Efficient Large Language Model Inference

Research Area
RPI Principal Investigators
Mohammad Mohammadi Amiri
IBM Principal Investigators
Pin-Yu Chen, Tejaswini Pedapati, Subhajit Chaudhury
Project Year