MPI4AI: Enhancing Performance and Productivity of AI Science through Next-generation High Performance Communication Abstractions is funded by the NSF’s Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program. MPI4AI is a multi-university collaboration among Tennessee Technological University; the University of Tennessee, Knoxville; Stony Brook University; and the Illinois Institute of Technology. This project improves how massively parallel computers run large-scale artificial intelligence (AI) applications by enhancing the Message Passing Interface (MPI), a widely used standard for coordinating work across many computers in parallel programs. Advancements from this project will target performance bottlenecks in AI workload patterns such as neural architecture search, large language model inference, and large-scale data-parallel training. By contributing these enhancements to the upcoming MPI-5 and MPI-6 standards, the project ensures long-term impact across academic research and industrial AI workflows, lowering the cost of running large AI workloads and broadening access to scalable AI infrastructure.
This project is supported by NSF awards 2514054, 2514056, 2514055, and 2514057.