Multimodal LLMs
Multimodal Large Language Models (MLLMs) are machine learning models that can understand and generate content across multiple modalities, including text, images, video, and audio. By integrating data from different modalities, they support cross-modal understanding and generation, and are widely used in areas such as virtual assistants and content creation.
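To make "cross-modal" concrete, here is a minimal sketch that sends a text prompt and an image in the same request to GPT-4o (one of the models listed below) via the OpenAI Python SDK; it assumes an `OPENAI_API_KEY` in the environment, and the image URL is a hypothetical placeholder.

```python
from openai import OpenAI

# A minimal sketch, assuming the OpenAI Python SDK (v1+) and an
# OPENAI_API_KEY set in the environment.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # one of the MLLMs listed in the table below
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts travel in the same message,
                # which is what makes the request cross-modal.
                {"type": "text", "text": "Describe what is in this image."},
                {"type": "image_url",
                 # Placeholder URL for illustration only.
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Most of the other models in the table expose comparable image-plus-text chat endpoints through their own vendor SDKs.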
Multimodal Large Language Models (MLLMs)
| Model | Version | Institution |
|---|---|---|
| Doubao | Doubao | ByteDance |
| ERNIE Bot | ERNIE Bot V3.2.0 | Baidu |
| Qwen 2.5 | Qwen V2.5.0 | Alibaba |
| SenseChat 5 | SenseChat-5 | SenseTime |
| Spark | Spark | iFlytek |
| Gemini 1.5 Pro | Gemini 1.5 Pro | Google |
| GPT-4o | GPT-4o | OpenAI |
Leaderboards
  • Image Generation
  • Image Understanding