Vicuna chatbot


Vicuna is an open-source chatbot that developers can build on and deploy with ease. The sections below cover Vicuna itself and two research projects that use it as their language backbone: MiniGPT-4 and SmartEdit.

Vicuna: An Open-Source Chatbot

LMSYS developed the popular Vicuna chatbot models and made them publicly available. Since then, Vicuna has been served in Chatbot Arena for millions of users. FastChat (the lm-sys/FastChat repository on GitHub) is the release repo for Vicuna, "an open chatbot impressing GPT-4", and an open platform for training, serving, and evaluating large language model based chatbots; its chat demo initially used a Hugging Face Transformers based serving backend.

Further reading:
- Building chatbots with Vicuna-13B - an article on using Vicuna to create chatbots
- Comparing LLMs for chat: LLaMA v2 vs Vicuna - a comparison between LLaMA's version 2 and Vicuna
- LangChain chat models: an overview - LangChain is a popular framework for building chat applications
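
A common way to build a chatbot on Vicuna is to serve the weights with FastChat and query them over an OpenAI-compatible endpoint. The sketch below is illustrative only: it assumes a server is already running locally on port 8000 with a model registered as "vicuna-7b-v1.5"; both the URL and the model name are placeholders for whatever your setup exposes.

    # Minimal sketch: chat with a locally served Vicuna model through an
    # OpenAI-compatible API (such as the one FastChat can expose).
    # The base_url and model name below are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="vicuna-7b-v1.5",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain in one sentence what Vicuna is."},
        ],
        temperature=0.7,
    )
    print(response.choices[0].message.content)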

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

Deyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny (*equal contribution), King Abdullah University of Science and Technology.

News: a pretrained MiniGPT-4 aligned with Vicuna-7B is now provided, and the demo's GPU memory consumption can be as low as 12 GB. An online demo lets you chat with MiniGPT-4 about your own images, and more examples can be found on the project page.

Introduction

MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer. The model is trained in two stages. The first, traditional pretraining stage uses roughly 5 million aligned image-text pairs and takes about 10 hours on 4 A100s. After this stage, Vicuna is able to understand the image, but its generation ability is heavily impacted. To address this issue and improve usability, the authors have the model and ChatGPT jointly create high-quality image-text pairs, yielding a small (3,500 pairs in total) yet high-quality dataset. The second finetuning stage trains on this dataset with a conversation template and significantly improves generation reliability and overall usability. This stage is computationally efficient, taking only around 7 minutes on a single A100. MiniGPT-4 exhibits many emerging vision-language capabilities similar to those demonstrated in GPT-4.
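
The single projection layer is the only component trained in the first stage: it maps the frozen visual encoder's output tokens into Vicuna's embedding space so the frozen LLM can attend to them. The sketch below only illustrates this idea; the dimensions, names, and the plain nn.Linear are assumptions for illustration, not the repository's actual code.

    # Illustrative sketch of the alignment idea: one trainable linear layer
    # projects frozen visual features into the frozen LLM's embedding space.
    # Dimensions and names are assumptions, not MiniGPT-4's real code.
    import torch
    import torch.nn as nn

    class VisualProjection(nn.Module):
        def __init__(self, vis_dim=768, llm_dim=5120):
            # 768 ~ a typical Q-Former output width; 5120 = hidden size of a
            # 13B LLaMA/Vicuna model. Both are assumptions here.
            super().__init__()
            self.proj = nn.Linear(vis_dim, llm_dim)

        def forward(self, vis_tokens):
            # vis_tokens: (batch, num_query_tokens, vis_dim) from the frozen
            # visual side; the output lives in the LLM embedding space.
            return self.proj(vis_tokens)

    proj = VisualProjection()
    dummy_vis = torch.randn(1, 32, 768)   # e.g. 32 query tokens per image
    img_embeds = proj(dummy_vis)          # shape (1, 32, 5120), fed to Vicuna
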
Getting Started

Installation

1. Prepare the code and the environment. Clone the repository, create a Python environment, and activate it:

    git clone <MiniGPT-4 repository URL>
    cd MiniGPT-4
    conda env create -f environment.yml
    conda activate minigpt4

2. Prepare the pretrained Vicuna weights. The current version of MiniGPT-4 is built on the v0 version of Vicuna-13B. Follow the repository's instructions to prepare the Vicuna weights. The final weights should sit in a single folder with a structure similar to:

    vicuna_weights
    ├── config.json
    ├── generation_config.json
    ├── pytorch_model.bin.index.json
    ├── pytorch_model-00001-of-00003.bin
    ...

Then set the path to the Vicuna weights in the model config file, at Line 16 of that file.

3. Prepare the pretrained MiniGPT-4 checkpoint. Download the pretrained checkpoint that matches the Vicuna model you prepared (one checkpoint is aligned with Vicuna 13B, another with Vicuna 7B), then set the path to it in the evaluation config file eval_configs/minigpt4_eval.yaml at Line 11.

Launching Demo Locally

Try out the demo demo.py on your local machine by running:

    python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0

To save GPU memory, Vicuna loads in 8-bit by default with a beam search width of 1. This configuration requires about 23 GB of GPU memory for Vicuna 13B and 11.5 GB for Vicuna 7B. On more powerful GPUs you can run the model in 16-bit by setting low_resource to False in minigpt4_eval.yaml and using a larger beam search width. Thanks to @WangRongsheng, the code can also be run on Colab.
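
Loading the language model in 8-bit is what keeps the demo's memory footprint low. The snippet below is a generic illustration of that technique using Hugging Face Transformers and bitsandbytes, not MiniGPT-4's own loading code; the checkpoint path is a placeholder.

    # Generic 8-bit loading with Transformers + bitsandbytes, the kind of
    # quantized loading the demo relies on to fit Vicuna into less memory.
    # The path is a placeholder; this is not MiniGPT-4's actual loader.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_path = "/path/to/vicuna_weights"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # place layers on the available GPU(s)
    )

    prompt = "Describe what you see in the image."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, num_beams=1, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
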
Training

The training of MiniGPT-4 contains two alignment stages.

1. First pretraining stage. The model is trained on image-text pairs from the LAION and CC datasets to align the vision and language models. To download and prepare the datasets, check the first-stage dataset preparation instructions. After this stage, the visual features are mapped into a space the language model can understand. To launch the first-stage training, run the following command (the experiments use 4 A100s; the save path can be changed in the config file train_configs/minigpt4_stage1_pretrain.yaml):

    torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml

A MiniGPT-4 checkpoint with only stage-one training can also be downloaded; compared with the model after stage two, it frequently generates incomplete and repeated sentences.

2. Second finetuning stage. A small, high-quality image-text dataset created by the authors is converted to a conversation format to further align MiniGPT-4. To download and prepare it, check the second-stage dataset preparation instructions. To launch the second-stage alignment, first specify the path to the stage-1 checkpoint in the stage-2 config file train_configs/minigpt4_stage2_finetune.yaml (the output path can also be set there), then run the following command (the experiments use 1 A100):

    torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml

After the second-stage alignment, MiniGPT-4 is able to talk about an image coherently and in a user-friendly way.

Acknowledgement

The model architecture of MiniGPT-4 follows BLIP-2, and the repository is built upon Lavis; both are great open-source works. The language ability of Vicuna with only 13B parameters is remarkable, and it is open source. If you use MiniGPT-4 in your research or applications, please cite:

    @misc{zhu2022minigpt4,
      title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models},
      author={Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
      year={2023},
    }

The repository is under the BSD 3-Clause License; much of the code is based on Lavis, which is also under the BSD 3-Clause License.

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models (CVPR-2024 Highlight)

SmartEdit was accepted by CVPR-2024 in February 2024, selected as a highlight, and released in April 2024; the paper, project page, and demo are available from the repository. The framework targets both understanding scenarios and reasoning scenarios.

Dependencies and Installation

    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url <PyTorch wheel index URL>
    pip install -r requirements.txt
    git clone <flash-attention repository URL>
    cd flash-attention
    pip install . --no-build-isolation
    cd ..

Training model preparation

Put the prepared checkpoints in the folder checkpoints.
- Vicuna-1.1-7B/13B: download the Vicuna-1.1-7B and Vicuna-1.1-13B checkpoints from the linked source.
- LLaVA-1.1-7B/13B: follow the LLaVA instructions to prepare the LLaVA-1.1-7B/13B weights.
- InstructDiffusion: download InstructDiffusion (v1-5-pruned-emaonly-adaption-task.ckpt) together with its repository from the linked source, then convert it with:

    python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path "./checkpoints/InstructDiffusion/v1-5-pruned-emaonly-adaption-task.ckpt" --original_config_file "./checkpoints/InstructDiffusion/configs/instruct_diffusion.yaml" --dump_path "./checkpoints/InstructDiffusion_diffusers"

Training dataset preparation

Put the prepared datasets in the folder dataset.
- CC12M, InstructPix2Pix, and MagicBrush: InstructPix2Pix and MagicBrush are available through the diffusers (Hugging Face) ecosystem. Download them first, then use python process_HF.py to convert them from "parquet" files to "arrow" files (see the sketch after this list).
- RefCOCO, GRefCOCO, and COCOStuff: follow InstructDiffusion to prepare them.
- LISA ReasonSeg: follow LISA to prepare it.
- The synthetic editing dataset: download it from the linked source.
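
process_HF.py itself is not shown on this page, but the parquet-to-arrow step it performs can be approximated with the Hugging Face datasets library, which stores datasets on disk in Arrow format. The snippet below is an assumed, simplified equivalent with placeholder paths, not the project's actual script.

    # Rough stand-in for the parquet -> arrow conversion step, using the
    # Hugging Face `datasets` library (save_to_disk writes Arrow files).
    # File paths and dataset layout are placeholders, not SmartEdit's.
    from datasets import load_dataset

    parquet_files = ["./dataset/InstructPix2Pix/part-000.parquet"]  # placeholder
    ds = load_dataset("parquet", data_files={"train": parquet_files})

    # Writes the dataset as Arrow files plus metadata under the given folder.
    ds.save_to_disk("./dataset/InstructPix2Pix_arrow")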

Stage-1: textual alignment with CC12M

Train with:

    bash scripts/TrainStage1_7b.sh
    bash scripts/TrainStage1_13b.sh

Then run inference with:

    python test/TrainStage1_inference.py --model_name_or_path "./checkpoints/vicuna-7b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-7B-v1" --save_dir './checkpoints/stage1_CC12M_alignment_7b/Results-100000' --pretrain_model "./checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-150000.bin" --get_orig_out --LLaVA_version "v1.1-7b"
    python test/TrainStage1_inference.py --model_name_or_path "./checkpoints/vicuna-13b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-13B-v1" --save_dir './checkpoints/stage1_CC12M_alignment_13b/Results-100000' --pretrain_model "./checkpoints/stage1_CC12M_alignment_13b/embeddings_qformer/checkpoint-150000.bin" --get_orig_out --LLaVA_version "v1.1-13b"

Stage-2: SmartEdit training

Train the MLLM-SD stage first:

    bash scripts/MLLMSD_7b.sh
    bash scripts/MLLMSD_13b.sh

Then train SmartEdit:

    bash scripts/SmartEdit_7b.sh
    bash scripts/SmartEdit_13b.sh

Inference

Download the SmartEdit-7B and SmartEdit-13B checkpoints and put them in the folder checkpoints, and download the Reason-Edit evaluation benchmark and put it in the folder dataset. Run inference on understanding and reasoning scenes with:

    python test/DS_SmartEdit_test.py --is_understanding_scenes True --model_name_or_path "./checkpoints/vicuna-7b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-7B-v1" --save_dir './checkpoints/SmartEdit-7B/Understand-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-7B" --sd_qformer_version "v1.1-7b" --resize_resolution 256
    python test/DS_SmartEdit_test.py --is_reasoning_scenes True --model_name_or_path "./checkpoints/vicuna-7b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-7B-v1" --save_dir './checkpoints/SmartEdit-7B/Reason-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-7B" --sd_qformer_version "v1.1-7b" --resize_resolution 256
    python test/DS_SmartEdit_test.py --is_understanding_scenes True --model_name_or_path "./checkpoints/vicuna-13b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-13B-v1" --save_dir './checkpoints/SmartEdit-13B/Understand-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-13B" --sd_qformer_version "v1.1-13b" --resize_resolution 256
    python test/DS_SmartEdit_test.py --is_reasoning_scenes True --model_name_or_path "./checkpoints/vicuna-13b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-13B-v1" --save_dir './checkpoints/SmartEdit-13B/Reason-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-13B" --sd_qformer_version "v1.1-13b" --resize_resolution 256

You can also run inference on reasoning scenes at a different resolution:

    python test/DS_SmartEdit_test.py --is_reasoning_scenes True --model_name_or_path "./checkpoints/vicuna-7b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-7B-v1" --save_dir './checkpoints/SmartEdit-7B/Reason-384-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-7B" --sd_qformer_version "v1.1-7b" --resize_resolution 384
    python test/DS_SmartEdit_test.py --is_reasoning_scenes True --model_name_or_path "./checkpoints/vicuna-13b-v1-1" --LLaVA_model_path "./checkpoints/LLaVA-13B-v1" --save_dir './checkpoints/SmartEdit-13B/Reason-384-15000' --steps 15000 --total_dir "./checkpoints/SmartEdit-13B" --sd_qformer_version "v1.1-13b" --resize_resolution 384

Explanation of new tokens

The original vocabulary size of LLaMA-1.1 (both 7B and 13B) is 32000, while LLaVA-1.1 (both 7B and 13B) has 32003 tokens because it adds three special tokens at ids 32000, 32001, and 32002. SmartEdit keeps two of these LLaVA tokens and removes the third. In addition, it adds one special token, "img", used in the system message to trigger image generation, plus 32 tokens that summarize image and text information for the conversation system. The SmartEdit vocabulary size is therefore 32035, with "img" = 32000, the two kept LLaVA tokens at 32001 and 32002, and the 32 new tokens at ids 32003 through 32034. Only the 32 new tokens are effective embeddings for the Q-Former.

These new embeddings are spelled out here to avoid misunderstanding: there is no need to merge LoRA weights after downloading the SmartEdit checkpoints. If you downloaded SmartEdit checkpoints before 2024-04-28, please re-download only the checkpoints in the LLM-15000 folder. Also, when preparing the LLaVA checkpoints you must first convert the LLaMA delta weights, since the base weights are under policy protection and LLaVA fine-tunes the whole LLaMA weights.
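
The bookkeeping above amounts to adding new entries to the tokenizer and resizing the language model's embedding matrix. The sketch below shows that general pattern with the Hugging Face API; the checkpoint path and the token strings for the two kept LLaVA tokens and the 32 summary tokens are hypothetical placeholders, not SmartEdit's actual names or code.

    # General pattern for extending a LLaMA/Vicuna vocabulary as described
    # above: 1 "img" token + 2 kept LLaVA tokens + 32 summary tokens on top
    # of the 32000 base entries = 32035. Names and paths are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "./checkpoints/vicuna-7b-v1-1"            # placeholder path
    tokenizer = AutoTokenizer.from_pretrained(model_path)  # 32000 base tokens
    model = AutoModelForCausalLM.from_pretrained(model_path)

    new_tokens = ["img"]                                   # system-message token
    new_tokens += ["<llava_tok_1>", "<llava_tok_2>"]       # hypothetical names for the kept LLaVA tokens
    new_tokens += [f"<summary_{i}>" for i in range(32)]    # 32 summary tokens (hypothetical names)

    tokenizer.add_tokens(new_tokens, special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))          # embedding matrix now has 32035 rows

    print(len(tokenizer))                                  # 32035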

2025-04-19
User3716

Training, run the following command. In our experiments, we use 4 A100.You can change the save path in the config filetrain_configs/minigpt4_stage1_pretrain.yamltorchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yamlA MiniGPT-4 checkpoint with only stage one training can be downloadedhere.Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.2. Second finetuning stageIn the second stage, we use a small high quality image-text pair dataset created by ourselvesand convert it to a conversation format to further align MiniGPT-4.To download and prepare our second stage dataset, please check oursecond stage dataset preparation instruction.To launch the second stage alignment,first specify the path to the checkpoint file trained in stage 1 intrain_configs/minigpt4_stage1_pretrain.yaml.You can also specify the output path there.Then, run the following command. In our experiments, we use 1 A100.torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yamlAfter the second stage alignment, MiniGPT-4 is able to talk about the image coherently and user-friendly.AcknowledgementBLIP2 The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check this great open-source work if you don't know it before!Lavis This repository is built upon Lavis!Vicuna The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:@misc{zhu2022minigpt4, title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models}, author={Deyao Zhu and Jun Chen and Xiaoqian Shen and xiang Li and Mohamed Elhoseiny}, year={2023},}LicenseThis repository is under BSD 3-Clause License.Many codes are based on Lavis withBSD 3-Clause License here.
