
How to use the quantized model for inference #4


@yachty66

I have successfully quantized the facebook/opt-125m model using the opt.py script with the following command:

CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --wbits 4 --quant ldlq --incoh_processing --save quantized_model

This command saves the quantized model to a file named quantized_model. My question is: to run the 4-bit model for inference, should I replace the original weights from https://huggingface.co/facebook/opt-125m/tree/main with the weights from quantized_model?
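One possible approach, sketched under assumptions the repository does not confirm here: if `--save` writes a plain PyTorch state dict (via `torch.save(model.state_dict(), ...)`), nothing on the Hub needs to be replaced. The original architecture can be instantiated locally and the saved weights loaded on top of it. The helper `apply_saved_state` below is hypothetical; `strict=True` makes loading fail loudly if the file turns out not to be a matching state dict.

```python
import torch
from torch import nn


def apply_saved_state(model: nn.Module, path: str) -> nn.Module:
    """Hypothetical helper: load a torch.save()'d state dict into `model`.

    Assumes the file at `path` is a plain state dict whose keys and shapes
    match `model`. strict=True raises RuntimeError on missing, unexpected,
    or mis-shaped entries, so a non-matching file is caught immediately.
    """
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict, strict=True)
    return model.eval()  # eval() returns the module itself


# Hypothetical usage with the file written by `--save quantized_model`
# (downloads the base OPT architecture from the Hub; not run here):
#
#   from transformers import AutoTokenizer, OPTForCausalLM
#   tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
#   model = apply_saved_state(
#       OPTForCausalLM.from_pretrained("facebook/opt-125m"), "quantized_model")
#   ids = tok("Hello, my name is", return_tensors="pt").input_ids
#   print(tok.decode(model.generate(ids, max_new_tokens=20)[0]))
```

If loading raises a key or size mismatch, the saved file probably uses a different layout (for example packed integer weights), and it would need the repository's own loading code instead.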
