
Beijing – In a significant stride towards democratizing access to powerful AI, Tsinghua University’s High-Performance Computing Institute, in collaboration with Qingcheng Zhiji, has announced the open-source release of Chitu (赤兔), a high-performance inference engine designed for large-scale AI models. This development directly addresses the escalating costs and inefficiencies that often plague the deployment of these models, particularly during the critical inference phase.

The Chitu engine, named after the legendary Red Hare steed, boasts robust hardware adaptability. Unlike many existing solutions heavily reliant on specific hardware architectures like NVIDIA’s Hopper, Chitu supports a wide range of NVIDIA GPUs, from the latest flagship models to older generations. Crucially, it also provides optimized support for domestically produced Chinese chips, signaling a commitment to fostering a more independent and diversified AI ecosystem.

The performance gains offered by Chitu are substantial. In tests deploying the DeepSeek-R1-671B model on an A800 cluster, Chitu cut GPU usage by 50% while delivering a 3.15x increase in inference speed compared with some foreign open-source frameworks. This translates into significant cost savings and improved efficiency for organizations deploying large language models.
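Taken together, the two figures imply a large jump in per-GPU efficiency: if the same workload runs 3.15x faster on half the GPUs, each GPU is delivering roughly 6.3x the throughput. A quick back-of-the-envelope check (interpreting the reported 50% reduction as halving the GPU count, which is our reading, not a figure stated in the report):

```python
# Back-of-the-envelope check of the reported Chitu figures.
# Interpretation (ours, not the report's): the "50% reduction in
# GPU usage" is read as running on half as many GPUs.
speedup = 3.15          # reported inference speedup vs. baseline
gpu_fraction = 0.5      # fraction of baseline GPUs used

per_gpu_throughput_gain = speedup / gpu_fraction
print(per_gpu_throughput_gain)  # prints 6.3
```

The point of the exercise is that speedup and hardware reduction compound: the per-GPU gain is the ratio of the two, not their sum.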

Chitu’s versatility extends beyond hardware compatibility. It supports a full spectrum of deployment scenarios, ranging from pure CPU-based inference to single-GPU setups and large-scale cluster deployments. This scalability makes it suitable for a diverse range of applications and organizational sizes, from small startups to large enterprises.

Key Features of the Chitu Inference Engine:

  • Diverse Computing Power Adaptation: Supports a wide range of NVIDIA GPUs and provides optimized support for domestic chips, breaking the dependency on specific architectures.
  • Scalability for All Scenarios: Offers scalable solutions from CPU-only deployments to single-GPU and large-scale cluster deployments, meeting the needs of different scales and scenarios.
  • Low-Latency Optimization: Optimizes model inference speed for latency-sensitive applications, such as financial risk control, reducing response times.
  • High-Throughput Optimization: Increases the number of requests processed per unit time in high-concurrency scenarios, such as intelligent customer service.
  • Small Memory Optimization: Reduces the memory footprint per card, allowing enterprises to achieve higher inference performance with fewer hardware resources.
  • Long-Term Stable Operation: Designed for reliable performance in real-world production environments.

These optimizations map directly onto deployment profiles: low-latency tuning serves applications that demand rapid responses, such as financial risk management; high-throughput tuning suits high-volume workloads like intelligent customer service; and the reduced per-card memory footprint lets companies reach a given level of inference performance with less hardware.
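The latency/throughput distinction above can be illustrated with a toy batching model (a generic sketch of batched inference, not Chitu's API; the cost numbers are illustrative assumptions): if each batch incurs a fixed overhead plus a per-request cost, larger batches raise throughput but also raise per-request latency, which is why the two scenarios call for different tuning.

```python
# Toy model of the latency/throughput trade-off in batched inference.
# All numbers are illustrative assumptions, not Chitu measurements.

def batch_latency_ms(batch_size, fixed_ms=20.0, per_req_ms=5.0):
    """Time to process one batch: fixed overhead + per-request cost."""
    return fixed_ms + per_req_ms * batch_size

def throughput_rps(batch_size):
    """Requests completed per second at a given batch size."""
    return batch_size / (batch_latency_ms(batch_size) / 1000.0)

for b in (1, 8, 32):
    print(f"batch={b:2d}  latency={batch_latency_ms(b):6.1f} ms  "
          f"throughput={throughput_rps(b):6.1f} req/s")
```

At batch size 1 every request sees only its own cost plus the overhead (low latency, low throughput); at batch size 32 throughput is several times higher, but each request waits on the whole batch, which is the regime a risk-control system cannot afford.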

The open-source release of Chitu represents a significant contribution to the global AI community. By providing a high-performance, adaptable, and scalable inference engine, Tsinghua University and Qingcheng Zhiji are empowering researchers and developers to more easily and affordably deploy large-scale AI models, accelerating innovation and driving broader adoption of AI technologies. The engine is designed for long-term stable operation, making it a reliable choice for real-world production environments. This initiative underscores the growing importance of open-source collaboration in advancing the field of artificial intelligence and fostering a more inclusive and accessible AI landscape.

