Efficient Deployment of Large Language Model over Heterogeneous Computing Systems
Efficient Hardware Acceleration of CoFrNets
Efficient Chiplet-based Memory Architecture for AI Hardware Accelerator
Hardware–Software Co-Design for Unified Pruning and Mixed-Precision Compression of Vision–Language Model
Enabling Efficient Inference and High Accuracy by Exploring Novel Linear-type Attention and KV Cache Optimization
AutoComp: Automated Compression & Deployment for Foundation Models
Rethinking Retrieval Signals via Hybrid Retrieval Heads
Automated Design and Optimization of Enterprise-Scale AI Agent Systems
Holistic Alignment of Agentic LLM Systems via Lightweight System-Level Objectives