AI Hardware-Software Co-design

Deep neural networks (DNNs) have driven significant breakthroughs, but their growing computational and energy demands raise serious concerns. With Moore’s Law and Dennard scaling reaching their limits, energy-efficient solutions built on emerging hardware, approximate computing, and in-memory techniques are crucial for AI systems. This research focuses on hardware-software co-design strategies to improve AI efficiency across platforms ranging from data centers to edge and embedded devices. Key areas include optimizing training and inference algorithms for large models, model quantization, compression techniques, distributed training, and hardware-aware neural architecture search. In addition, the exploration of analog AI accelerators aims to improve energy efficiency while preserving model accuracy, addressing challenges such as limited precision and system integration.
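As a concrete illustration of one of these themes, the sketch below shows minimal post-training quantization: float weights are mapped to 8-bit integers with a single per-tensor scale and later dequantized for computation. This is a generic textbook example, not code from any project listed below; the function names and the symmetric per-tensor scheme are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map float weights to [-127, 127]."""
    # One scale shared by the whole tensor; the epsilon guards an all-zero tensor.
    scale = max(float(np.max(np.abs(w))), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the INT8 values."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of a much larger model.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
print("max absolute quantization error:", np.abs(w - dequantize_int8(q, scale)).max())
```

The reconstruction error is bounded by half the quantization step (scale / 2) per element, which is the basic accuracy-versus-precision trade-off that quantization and low-precision training projects navigate.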

Projects

2025

Project | RPI Principal Investigators | IBM Principal Investigators
Bringing AI Intelligence to 5G/6G Edge Platform | Ish Jain, Ali Tajer | Alberto Gracia, Kaoutar El Maghraoui
Co-Designing Analog AI System and Accelerator for Large Foundation Models | Liu Liu, Meng Wang | Sidney Tsai, Kaoutar El Maghraoui
Holistic Algorithm-Architecture Co-Design of Approximate Computing for Scalable Foundation Models | Tong Zhang, Liu Liu | Swagath Venkataramani, Sanchari Sen
Low-precision Distributed Accelerated Methods and Library Development for Training and Fine-tuning Foundation Models | Yangyang Xu, George Slota | Jie Chen, Naigang Wang
Closing the Accuracy Gap in Analog In-memory Training: Device-dependent Algorithms and Hyperparameter Search | Tianyi Chen, Liu Liu | Tayfun Gokmen, Omobayode Fagbohungbe
Optimization of Hardware-based Neural Network Accelerators for Fluorescence Lifetime in Biomedical Applications | Xavier Intes, Vikas Pandey | Karthik Swaminathan
Model Optimization and Hardware-aware Neural Architecture Search for Spatiotemporal Data Mining | Yinan Wang, Liu Liu | Kaoutar El Maghraoui
Efficient Deployment of Large Language Model over Heterogeneous Computing Systems | Meng Wang, Tong Zhang | Kaoutar El Maghraoui, Naigang Wang

2026

Project | RPI Principal Investigators | IBM Principal Investigators
Rethinking Retrieval Signals via Hybrid Retrieval Heads | Stacy Patterson | Wei Sun, Radu Florian, Yulong Li
Integrated Sensing and Communication with AI-RAN Platform | Ish Jain, Ali Tajer | Alberto Gracia, Kaoutar El Maghraoui, Arun Paidimarri
AutoComp: Automated Compression & Deployment for Foundation Models | Ruimin Ke | Kaoutar El Maghraoui, Naigang Wang
Enabling Efficient Inference and High Accuracy by Exploring Novel Linear-type Attention and KV Cache Optimization | Yangyang Xu, George Slota | Jie Chen, Naigang Wang
Hardware–Software Co-Design for Unified Pruning and Mixed-Precision Compression of Vision–Language Models | Meng Wang, Liu Liu | Kaoutar El Maghraoui, Pin-Yu Chen
Efficient Chiplet-based Memory Architecture for AI Hardware Accelerator | Kanad Basu, Liu Liu | Pradip Bose, Karthik Swaminathan, Nandhini Chandramoorthy, Gracen Wallace, Xin Zhang
Efficient Hardware Acceleration of CoFrNets | Kanad Basu | Amit Dhurandhar, Ruchir Puri, Pradip Bose, Karthikeyan Natesan Ramamurthy, Karthik Swaminathan
Efficient Deployment of Large Language Model over Heterogeneous Computing Systems | Tong Zhang, Meng Wang | Kaoutar El Maghraoui, Naigang Wang
Exploring Analog-Aware Learning and Architectures with Hardware Support for Next-Generation Foundation Models | Liu Liu | Sidney Tsai, Kaoutar El Maghraoui
Hardware–Software Co-Design of Efficient Spatiotemporal Transformers and Mixture-of-Experts on IBM Hardware | Yinan Wang, Liu Liu | Kaoutar El Maghraoui, Pin-Yu Chen
KV-cache Management for Improving Run-time Efficiency of Large Reasoning Models | Mohammad Mohammadi Amiri | Pin-Yu Chen, Tejaswini Pedapati, Subhajit Chaudhury, Keerthiram Murugesan, Kaoutar El Maghraoui, Naigang Wang, Charlie Liu

2024

Project | RPI Principal Investigators | IBM Principal Investigators
Algorithmic Innovations and Architectural Support towards In-Memory Training on Analog AI Accelerators | Tianyi Chen, Liu Liu | Tayfun Gokmen, Malte J. Rasch
Low-precision Second-order-type Distributed Methods for Training and Fine-tuning Foundation Models | Yangyang Xu, George Slota | Jie Chen, Mayank Agarwal, Yikang Shen, Naigang Wang
Optimization of Hardware-based Neural Network Accelerators for Fluorescence Lifetime Biomedical Applications | Xavier Intes | Karthik Swaminathan
Structured & Robust Neural Network Pruning on Low-Precision Hardware for Guaranteed Learning Performance for Complex Time-Series Datasets | Christopher Carothers, Meng Wang | Kaoutar El Maghraoui, Pin-Yu Chen, Naigang Wang
Co-Designing Analog AI System and Accelerator for Large Foundation Models | Liu Liu, Meng Wang | Sidney Tsai, Kaoutar El Maghraoui
Holistic Algorithm-Architecture Co-Design of Approximate Computing for Scalable Foundation Models | Tong Zhang, Liu Liu | Swagath Venkataramani, Sanchari Sen