NVIDIA has recently introduced Eagle 2.5, a groundbreaking vision-language model (VLM) designed for long-context multimodal learning. Despite its relatively small size of 8 billion parameters, Eagle 2.5 demonstrates exceptional capabilities in processing high-resolution images and extended video sequences, rivaling the performance of significantly larger models like Qwen 2.5-VL-72B and InternVL2.5-78B.

This achievement is attributed to NVIDIA's innovative training strategies: Information-Prioritized Sampling and Progressive Post-Training. Information-Prioritized Sampling employs techniques such as image area preservation and automatic degradation sampling to keep images intact and retain fine visual detail. Progressive Post-Training gradually expands the model's context window during training, enabling it to maintain stable performance across varying input lengths.
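To make the idea of progressively expanding the context window concrete, here is a minimal sketch of a stage-wise schedule. The doubling schedule, stage sizes, and helper names below are illustrative assumptions, not Eagle 2.5's actual training hyperparameters:

```python
# Sketch of a progressive context-window schedule (illustrative only).
# The start size, max size, and doubling rule are assumptions, not
# Eagle 2.5's published training configuration.

def context_schedule(start_tokens: int, max_tokens: int) -> list[int]:
    """Return the per-stage context-window sizes, doubling each stage."""
    sizes = []
    size = start_tokens
    while size < max_tokens:
        sizes.append(size)
        size *= 2
    sizes.append(max_tokens)
    return sizes

def truncate_to_window(token_ids: list[int], window: int) -> list[int]:
    """Clip a training sample to the current stage's context window."""
    return token_ids[:window]

if __name__ == "__main__":
    # Hypothetical run: grow the window from 4K to 32K tokens.
    for stage, window in enumerate(context_schedule(4096, 32768), start=1):
        print(f"stage {stage}: context window = {window} tokens")
```

Training on short contexts first and lengthening them in stages is a common way to keep optimization stable while extending a model's usable input length.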

Key Features and Capabilities of Eagle 2.5:

  • Long Video and High-Resolution Image Understanding: Eagle 2.5 excels at processing large-scale videos and high-resolution images. It can handle long video sequences (up to 512 frames) and achieves a remarkable score of 72.4% on the Video-MME benchmark, comparable to models with significantly larger parameter counts.
  • Diverse Task Support: The model demonstrates outstanding performance in various video and image understanding tasks. It achieves scores of 74.8%, 77.6%, and 66.4% on video benchmarks such as MVBench, MLVU, and LongVideoBench, respectively. Furthermore, it excels in image understanding tasks like DocVQA, ChartQA, and InfoVQA, achieving scores of 94.1%, 87.5%, and 80.4%.
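Handling "up to 512 frames" implies some policy for choosing which frames of a longer video the model actually sees. The article does not describe Eagle 2.5's selection method, so the uniform-sampling sketch below is an assumption for illustration only:

```python
# Illustrative frame-selection sketch for long-video input.
# Eagle 2.5 is reported to process up to 512 frames; uniform sampling
# is an assumed policy here, not NVIDIA's documented method.

def sample_frame_indices(num_frames: int, max_frames: int = 512) -> list[int]:
    """Pick at most max_frames indices, spread evenly across the video."""
    if num_frames <= max_frames:
        return list(range(num_frames))
    # Evenly spaced, strictly increasing indices covering the whole clip.
    step = num_frames / max_frames
    return [int(i * step) for i in range(max_frames)]
```

For a 10,000-frame video this yields 512 evenly spaced indices, so the model's frame budget covers the full clip rather than only its beginning.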

Eagle 2.5 represents a significant advancement in the field of vision-language models, offering a powerful and efficient solution for processing complex visual data. Its compact size and impressive performance make it a promising tool for a wide range of applications, from video analysis and image recognition to document understanding and data visualization.
