1 post tagged "MLLM"

Rex-Omni

April 20, 2026 · 1 min read

GoCoding

Rex-Omni is a 3B-parameter multimodal model that unifies visual perception tasks, including object detection, OCR, pointing, keypoint localization, and visual prompting, into a single next-point prediction framework.
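To give a rough feel for what "next-point prediction" can mean in practice, here is a minimal sketch of how a box or a point might be turned into discrete coordinate tokens that a language model could emit one after another. The bin count (`NUM_BINS`), the `<n>` token format, and the helper names `quantize`, `box_to_tokens`, and `point_to_tokens` are all assumptions made for illustration; they are not taken from the Rex-Omni code.

```python
# Illustrative sketch only: the bin count and token format are assumptions,
# not the actual Rex-Omni implementation.
NUM_BINS = 1000  # assumed number of discrete coordinate bins


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)  # clamp to the image extent
    return min(int(ratio * num_bins), num_bins - 1)


def point_to_tokens(point, width, height):
    """Encode an (x, y) point as two hypothetical coordinate tokens."""
    x, y = point
    return [f"<{quantize(x, width)}>", f"<{quantize(y, height)}>"]


def box_to_tokens(box, width, height):
    """Encode an (x0, y0, x1, y1) box as two corner points, i.e. four tokens."""
    x0, y0, x1, y1 = box
    return point_to_tokens((x0, y0), width, height) + point_to_tokens((x1, y1), width, height)


if __name__ == "__main__":
    print(point_to_tokens((320, 120), 640, 480))
    print(box_to_tokens((0, 0, 640, 480), 640, 480))
```

Under this sketch, a detection box, a pointing target, and a keypoint all reduce to short sequences of the same kind of token, which is one plausible way a single model could cover all of these tasks with one output format.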

Homepage: https://rex-omni.github.io/
Code: https://github.com/IDEA-Research/Rex-Omni