Simple-MLLM

Simple-MLLM is a simple, locally deployed Multimodal Large Language Model (MLLM) practice project. Get hands-on with an MLLM right on your own machine!

😎 This is a simple example of deploying a Multimodal Large Language Model (MLLM) locally. It accepts several input modalities, including images, text, and voice (voice support is still being updated).

🤩 If you are interested in MLLMs but don't know how to get started, take a look at this project: a simple project that gives you a taste of what an MLLM can do 👍

Overview

The input is an image, as shown below:

I then send the model the prompt "描述" ("Describe"), after which the checkpoints are loaded. Finally:

The output is a description of the image, as shown below:

Built on:

Socket
Qwen2-VL-3B
Whisper (being updated)
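To illustrate how a socket layer can carry a prompt and an image between a client and a local model server, here is a minimal sketch. It is not this project's actual protocol: the length-prefixed framing, the JSON header, and the names `send_msg`, `recv_msg`, `fake_model`, `serve_once`, and `ask` are all assumptions made for illustration, and `fake_model` is a stand-in that only reports what it received instead of running a vision-language model.

```python
import json
import socket
import struct
import threading


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message: a 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fake_model(prompt: str, image: bytes) -> str:
    """Hypothetical stand-in for the model call (assumption, not the real model)."""
    return f"received prompt {prompt} with {len(image)} image bytes"


def serve_once(server: socket.socket) -> None:
    """Accept one client, read a JSON header and image bytes, send back a reply."""
    conn, _ = server.accept()
    with conn:
        header = json.loads(recv_msg(conn).decode("utf-8"))
        image = recv_msg(conn)
        send_msg(conn, fake_model(header["prompt"], image).encode("utf-8"))


def ask(host: str, port: int, prompt: str, image: bytes) -> str:
    """Client side: send the prompt and image, then wait for the text reply."""
    with socket.create_connection((host, port)) as sock:
        send_msg(sock, json.dumps({"prompt": prompt}).encode("utf-8"))
        send_msg(sock, image)
        return recv_msg(sock).decode("utf-8")


def demo() -> str:
    """Run the server and client in one process over the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
    worker.start()
    try:
        # 16 placeholder bytes stand in for a real image file's contents.
        return ask("127.0.0.1", port, "描述", b"\x00" * 16)
    finally:
        worker.join(timeout=5)
        server.close()


if __name__ == "__main__":
    print(demo())
```

The length prefix matters because TCP is a byte stream: without it the receiver cannot tell where the prompt ends and the image begins. In a real deployment, `fake_model` would be replaced by the actual model inference call.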


Gnonymous/Simple-MLLM
