python torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --quantize config/data/aarch64_cpu_channelwise.json --device cpu --max-seq-length 1024
Converting meta-llama/Meta-Llama-3.1-8B-Instruct to torchchat format...
known configs: ['stories15M', '13B', 'CodeLlama-7b-Python-hf', 'stories42M', 'Mistral-7B', '34B', 'Meta-Llama-3-8B', 'Meta-Llama-3.1-8B', '30B', '7B', 'stories110M', 'Meta-Llama-3-70B', 'Meta-Llama-3.1-70B', '70B']
Model config {'block_size': 2048, 'vocab_size': 128256, 'n_layers': 32, 'n_heads': 32, 'dim': 4096, 'hidden_dim': 14336, 'n_local_heads': 8, 'head_dim': 128, 'rope_base': 500000.0, 'norm_eps': 1e-05, 'multiple_of': 1024, 'ffn_dim_multiplier': 1.3, 'use_tiktoken': True, 'max_seq_length': 8192, 'use_scaled_rope': True}
Moving checkpoint to /home/torch/.torchchat/model-cache/downloads/meta-llama/Meta-Llama-3.1-8B-Instruct/model.pth.
Done.
Moving model to /home/torch/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct.
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240814 available.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
lm_eval is not installed, GPTQ may not be usable
Using device=cpu
Loading model...
Time to load model: 1.88 seconds
Quantizing the model with: {'executor': {'accelerator': 'cpu'}, 'precision': {'dtype': 'fp32'}, 'linear:int4': {'groupsize': 0, 'scheme': 'symmetric_channelwise'}}
linear: layers.0.attention.wq, in=4096, out=4096
Time to quantize model: 0.05 seconds
Traceback (most recent call last):
  File "/home/torch/pytorch/torchchat/torchchat.py", line 97, in <module>
    export_main(args)
  File "/home/torch/pytorch/torchchat/export.py", line 124, in main
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/home/torch/pytorch/torchchat/build/builder.py", line 514, in _initialize_model
    quantize_model(
  File "/home/torch/pytorch/torchchat/quantization/quantize.py", line 109, in quantize_model
    model = quant_handler.quantize(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torchao-0.4.0+git174e630a-py3.11-linux-aarch64.egg/torchao/quantization/GPTQ.py", line 809, in quantize
    state_dict = self._create_quantized_state_dict(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torchao-0.4.0+git174e630a-py3.11-linux-aarch64.egg/torchao/quantization/GPTQ.py", line 774, in _create_quantized_state_dict
    weight_int4pack = torch.ops.aten._kai_weight_pack_int4(w_int4x8.to(self.device),scales_and_zeros,mod.out_features,mod.in_features,0)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torch/_ops.py", line 1222, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'aten' object has no attribute '_kai_weight_pack_int4'
conda list | grep -i torch (torch_env)
# packages in environment at /home/torch/miniforge3/envs/torch_env:
torch 2.5.0.dev20240814 pypi_0 pypi
torchao 0.4.0+git174e630a pypi_0 pypi
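Hi,
I am following the article at https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/
but the export step (the `torchchat.py export` command at the top of this post) fails during quantization with the AttributeError shown in the traceback above.
The installed torch and torchao versions are listed in the `conda list` output after the traceback.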
I see this attribute comes from https://github.com/ArmDeveloperEcosystem/PyTorch-arm-patches/blob/main/0001-Feat-Add-support-for-kleidiai-quantization-schemes.patch
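To narrow this down, here is a minimal sketch of a check for whether the torch build in the active environment actually registers that op. It assumes the op only exists when the patch linked above has been applied to the PyTorch build. The helper names `namespace_has_op` and `check_kai_op` are made up for this post and are not part of torch or torchchat.

```python
import importlib.util


def namespace_has_op(namespace, op_name):
    """Return True if `namespace` exposes an attribute named `op_name`.

    torch.ops.aten raises AttributeError for an unregistered op (as in the
    traceback), and hasattr() turns that into a plain False.
    """
    return hasattr(namespace, op_name)


def check_kai_op(op_name="_kai_weight_pack_int4"):
    """Return True/False for the op, or None if torch cannot be imported."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # Show which torch build the interpreter is really picking up.
    print("torch", torch.__version__, "from", torch.__file__)
    return namespace_has_op(torch.ops.aten, op_name)


if __name__ == "__main__":
    print("op registered:", check_kai_op())
```

If this prints `op registered: False`, the interpreter is probably loading a torch build that does not include the patch, which would be consistent with the AttributeError above.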
Any ideas?
Thanks!