Low Ppl benchmark results

Hi,

I am adding QuIP inference support to [ExllamaV2](https://github.com/turboderp/exllamav2) in this [PR](https://github.com/turboderp/exllamav2/pull/217).

The problem I am having right now is that my perplexity (Ppl) test results are somewhat worse than the results in your [blog](https://cornell-relaxml.github.io/quip-sharp/) post.

So I am wondering whether something is wrong with my implementation, or whether there is another reason for the gap.


### Ppl Benchmarks
Dataset: [wikitext-2-v1_validation_0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation)

| Model                       | Perplexity  |
|-----------------------------|-------------|
| **2Bit**                    |             |
| Llama-2-7b-E8P-2Bit         | 8.7339      |
| Llama2-7b-exl2-2.5bpw       | 8.0745      |
| Llama-2-13b-E8P-2Bit        | 7.1207      |
| Llama2-13b-exl2-2.5bpw      | 7.2741      |
| Llama-2-70b-E8P-2Bit        | 6.2192      |
| Llama2-70b-exl2-2.5bpw      | 5.8270      |
| **4Bit**                    |             |
| Llama-2-7b-HI-4Bit-Packed   | 6.0748      |
| Llama2-7b-exl2-4.0bpw       | 6.0300      |
| Llama-2-13b-HI-4Bit-Packed  | 7.4169      |
| Llama2-13b-exl2-4.0bpw      | 5.4905      |
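
For reference, perplexity here means the exponential of the mean per-token negative log-likelihood. The following is a minimal sketch of that definition only, assuming natural-log token probabilities are already available; the function name `perplexity` and the sample values are illustrative and are not taken from either code base.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the mean negative log-likelihood over all tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Illustrative values only: natural-log probabilities of four tokens.
example_logprobs = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
print(perplexity(example_logprobs))
```

Because the result depends on the context length and on how the text is split into evaluation windows, two sets of numbers are only directly comparable when those settings match.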

