MSCCLang: Microsoft Collective Communication Language

ABSTRACT

Machine learning models with millions or billions of parameters are increasingly trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, collective communication becomes a bottleneck. Custom collective algorithms optimized for both particular network topologies and application-specific communication patterns can alleviate this bottleneck and help these applications scale. However, implementing correct and efficient custom algorithms is challenging.

This paper introduces MSCCLang, a system for programmable GPU communication. MSCCLang provides a domain-specific language for writing collective communication algorithms and an optimizing compiler that lowers them to an executable form, which runs efficiently and flexibly in an interpreter-based runtime. We used MSCCLang to write novel collective algorithms for AllReduce and AllToAll that are up to 1.9× and 1.3× faster than hand-optimized implementations, respectively.
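
To make the programming model concrete, here is a minimal sketch of a ring AllGather in the style of MSCCLang's chunk-oriented Python DSL. The import paths and the MSCCLProgram, chunk, copy, XML, and Check names follow the open-source msccl-tools examples and should be read as assumptions rather than the paper's exact listing. Each rank's chunk is forwarded one hop around the ring per step until every rank holds every chunk.

    # Sketch only: names/signatures assumed from the msccl-tools examples.
    from msccl.language import *
    from msccl.topologies import fully_connected
    from msccl.language.collectives import AllGather

    def allgather_ring(size):
        # Assumed topology: every GPU can reach every other GPU directly.
        topology = fully_connected(size)
        # AllGather over `size` ranks, one chunk per rank, out-of-place.
        collective = AllGather(size, 1, False)
        with MSCCLProgram("allgather_ring", topology, collective, 1):
            for r in range(size):
                # Rank r contributes the single chunk in its input buffer.
                c = chunk(r, Buffer.input, 0)
                # Forward the chunk around the ring, one hop per step,
                # depositing it at output index r on each destination.
                for step in range(size - 1):
                    dst = (r + step + 1) % size
                    c = c.copy(dst, Buffer.output, index=r)
            XML()    # emit the lowered, executable form for the runtime
            Check()  # verify the program implements AllGather

    allgather_ring(8)

In this DSL style, the same chunk primitives (copy plus reduce) suffice to express reduction collectives such as AllReduce; a topology-aware variant would only change how each chunk's route is chosen.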

CCS CONCEPTS

  • Software and its engineering → Domain specific languages; Compilers; Communications management;
  • Theory of computation → Concurrency.

KEYWORDS

GPU, Collective Communication, Compilers