Official Implementation of paper:
[ICCV2025]Advancing Visual Large Language Model for Multi-granular Versatile Perception
Checkpoints can be downloaded at MVP-LM-checkpoint
pip install -r requirements.txt
pip install -e /PATH-to-mvp-lm/mvp-lm/psalm/model/mask_decoder/Openseed_Simplify/modeling/pixel_decoder/ops
pip install torch==1.13.1+cu116
pip install torchaudio==0.13.1+cu116
pip install torchvision==0.14.1+cu116
The source code of the shell script required to run the code contains many personalized logic elements, aiming to improve efficiency, establish a standard operating procedure (SOP) for each execution, and automate the model training and inference process. You may extract the most essential command for launching the Python process and incorporate your own automation logic as needed.
This project is based on the following great works: Mask2Former, PSALM and OpenSeeD.