Low Ppl benchmark results #9

@waters222

Hi.
I am in the process of adding QuIP inference support to ExLlamaV2,
and this is the PR

The problem I am having right now is that my Ppl test results are somewhat worse than the results in your blog post.

So I am wondering whether there is something wrong with my implementation, or whether there is some other reason for the gap.

Ppl Benchmarks

Using dataset: [wikitext-2-v1_validation_0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation)

| Model | Performance (Ppl) |
| --- | --- |
| 2Bit | |
| Llama-2-7b-E8P-2Bit | 8.7339 |
| Llama2-7b-exl2-2.5bpw | 8.0745 |
| Llama-2-13b-E8P-2Bit | 7.1207 |
| Llama2-13b-exl2-2.5bpw | 7.2741 |
| Llama-2-70b-E8P-2Bit | 6.2192 |
| Llama2-70b-exl2-2.5bpw | 5.8270 |
| 4Bit | |
| Llama-2-7b-HI-4Bit-Packed | 6.0748 |
| Llama2-7b-exl2-4.0bpw | 6.0300 |
| Llama-2-13b-HI-4Bit-Packed | 7.4169 |
| Llama2-13b-exl2-4.0bpw | 5.4905 |
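
For reference, here is a minimal sketch of the perplexity definition I am assuming for the numbers above: the exponential of the mean negative log-likelihood per token. It is not the actual ExLlamaV2 test code; the `perplexity` function and the toy inputs are made up for illustration. Settings such as context length, stride, and tokenization also change the result, so numbers from two different harnesses may not be directly comparable.

```python
import math

def perplexity(token_log_probs):
    """Return exp(-mean(log p)) over per-token natural-log probabilities.

    Hypothetical helper for illustration only; not part of ExLlamaV2.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Toy input: a model that assigns probability 0.25 to each of 4 tokens.
log_probs = [math.log(0.25)] * 4
ppl = perplexity(log_probs)
print(f"perplexity = {ppl:.4f}")
```

A uniform guess over 4 choices gives a perplexity of 4, which is a quick sanity check for any implementation of the metric.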
