Show Lab

76 repositories

VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python
•
Apache License 2.0
•2•84•7•0•Updated Dec 26, 2024
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
Python
•
Other
•0•38•0•0•Updated Dec 26, 2024
Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python
•
Apache License 2.0
•46•1.1k•33•1•Updated Dec 26, 2024
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
21•364•0•0•Updated Dec 25, 2024
Awesome-Robotics-Diffusion
Public
1•6•0•0•Updated Dec 25, 2024
ShowUI
Public
Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent
vision-language-action gui-agents computer-use
Jupyter Notebook
•
Apache License 2.0
•39•707•4•0•Updated Dec 25, 2024
MovieBench
Public
Python
•1•32•0•0•Updated Dec 24, 2024
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
209•3.7k•1•0•Updated Dec 23, 2024
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
12•294•0•1•Updated Dec 23, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
19•527•1•0•Updated Dec 23, 2024
DiffSim
Public
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python
•0•7•0•0•Updated Dec 20, 2024
computer_use_ootb
Public
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python
•
Apache License 2.0
•96•1.1k•23•4•Updated Dec 17, 2024
IDProtector
Public
The code implementation of IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.
0•3•0•0•Updated Dec 16, 2024
ROICtrl
Public
Code for ROICtrl: Boosting Instance Control for Visual Generation
Python
•0•100•0•0•Updated Dec 10, 2024
videogui
Public
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript
•1•24•0•0•Updated Dec 10, 2024
VideoSwap
Public
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Python
•14•362•1•0•Updated Dec 6, 2024
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python
•
Other
•62•1.1k•8•7•Updated Nov 15, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python
•18•255•7•0•Updated Nov 12, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python
•
MIT License
•2•65•1•0•Updated Nov 10, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python
•2•78•0•0•Updated Nov 7, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python
•2•14•1•0•Updated Oct 30, 2024
Exo2Ego-V
Public
0•9•1•0•Updated Oct 29, 2024
watermark-steganalysis
Public
Python
•0•3•0•0•Updated Oct 24, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python
•0•44•0•0•Updated Oct 14, 2024
MovieSeq
Public
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook
•1•32•1•0•Updated Oct 1, 2024
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript
•0•5•0•0•Updated Sep 22, 2024
RingID
Public
Python
•0•22•1•0•Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python
•
Apache License 2.0
•54•861•23•0•Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python
•
Apache License 2.0
•31•269•19•0•Updated Aug 15, 2024
X-Adapter
Public
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python
•
Apache License 2.0
•42•747•17•3•Updated Aug 14, 2024