support LLaVa by JINGZIjingzi · Pull Request #119 · Tencent/TencentPretrain

JINGZIjingzi · 2024-01-09T09:10:03Z

change file tencentpretrain/utils/constants.py, modify models/special_tokens_map.json to models/llama_special_tokens_map.json.
data preprocess

python3 preprocess.py \
                --corpus_path corpora/llava.json \
                --dataset_path datasets/llava.pt \
                --spm_model_path tokenizer.model \
                --processes_num 4 --data_processor llava \
                --seq_length 1024

The tokenizer.model is the same as the LLM pretrained model used for training LLaVa.
corpora/llava.json is the same format as official LLaVa datasets.
2. feature align:
To use pretrained models, we need convert the models first

python3 scripts/convert_llm_in_llava.py --input_model_path $origin_pretrained_model_path \
               --output_model_path $pretrained_model_path

python3 scripts/convert_model_add_prefix.py --input_model_path $origin_vision_model_path \
              --output_model_path $vision_model_path --prefix embedding.image_text.vision_

deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                      --pretrained_model_path $pretrained_model_path \
                      --vision_model_in_VL_emb_path $vision_model_path \
                      --dataset_path datasets/llava.pt \
                      --spm_model_path tokenizer.model  \
                      --config_path models/llava/7b_config.json \
                      --output_model_path models/llava_stage1 \
                      --world_size 8 --accumulation_steps 16 --batch_size 2 \
                      --learning_rate 1e-3 --report_steps 100 \
                      --total_steps 40000 --save_checkpoint_steps 10000 \
                      --freeze_exclude_by_name vision_language.projection \
                      --freeze_parameters embedding encoder target tgt_embedding \
                      --patch_size 14 --image_height 336 --image_width 336 \
                      --image_preprocess pad normalize

$pretrained_model_path is the path of the pretrained LLM model. $vision_model_path is the path of the pretrained vision model. world_size * accumulation_steps * batch_size is the actual total batch size.
After training, convert the model into a .bin file

python3 models/llava_stage1/zero_to_fp32.py models/llava_stage1/ models/llava_stage1.bin

instruction tuning

deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                      --pretrained_model_path models/llava_stage1.bin \
                      --dataset_path datasets/llava.pt \
                      --spm_model_path tokenizer.model \
                      --config_path models/llava/7b_config.json \
                      --output_model_path models/llava_stage2 \
                      --world_size 8 --accumulation_steps 16 --batch_size 1 \
                      --learning_rate 2e-5 --report_steps 100 \
                      --total_steps 60000 --save_checkpoint_steps 10000 \
                      --patch_size 14 --image_height 336 --image_width 336 \
                      --image_preprocess pad normalize

To save GPU graphics memory, you can use ZeRO3 by changing --deepspeed_config models/deepspeed_config.json to --deepspeed_config models/deepspeed_zero3_config.json. Note: it would be slower evidently.

After training, convert the model into a .bin file

python3 models/llava_stage2/zero_to_fp32.py models/llava_stage2/ models/llava_stage2.bin

infer

deepspeed scripts/generate_lm_llava_deepspeed.py \
    --deepspeed --deepspeed_config models/deepspeed_config.json \
    --load_model_path models/llava_stage2.bin \
    --spm_model_path tokenizer.model \
    --config_path models/llava/7b_config.json \
    --test_path test.json \
    --prediction_path output.txt \
    --seq_length 1024

janinezhao added 18 commits December 14, 2023 16:14

add support for llava

b76ea79

fix dataset & name & details in llava

1c83b30

dataset support jsonl -> json

d1ca0ac

fix model loader

b0ab452

fix model laoder

2f8fcb4

fix image read

9a3bf31

fix vision preprocess

c0ea936

fix

996296e

fix dataset

959dc55

fix

b255e5b

fix dataset and dataloader

845fef8

add pad in image preprocess

c33ffb0

fix seq_length;expand2square

9738751

fix infer

d21baeb

fix print

004f109

fix data form and vision features

b4419c6

update convert script and transformer_encoder

018318f

fix infer script

e99c9d3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support LLaVa#119

support LLaVa#119
JINGZIjingzi wants to merge 18 commits intoTencent:mainfrom
JINGZIjingzi:llava_1214

JINGZIjingzi commented Jan 9, 2024 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

JINGZIjingzi commented Jan 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JINGZIjingzi commented Jan 9, 2024 •

edited

Loading