Welcome to the LLaMA-BitNet repository. This project lets you train your own BitNet model, as described in the paper *The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits*. Built on the LLaMA 2 architecture, it trains a model of roughly 78 million parameters on a corpus of about 1.5 billion tokens.
Note: you need access to the LLaMA family of models to run the code without modifications. Request access at https://llama.meta.com/llama-downloads/ using the credentials associated with your Hugging Face account. You will then receive an email letting you either download the weights directly or use LLaMA through the API.
Getting started with LLaMA-BitNet is straightforward. Install the required modules with:

```shell
pip install -r requirements.txt
```

The repository has a simple file structure designed for easy navigation and customization:
```
LLaMA-BitNet (root folder)
├── inference.py       (run inference with the trained BitNet model)
├── LICENSE            (MIT License)
├── README.md
├── requirements.txt   (required modules for installation)
├── train.py           (run the training process)
└── utils.py           (utility functions)
```
Training uses a 15% subset of the OpenWebText2 dataset, pre-tokenized with a context length of 256 for quick testing. The code also supports manual tokenization, so you can train on a dataset of your choice.
The required dependencies are listed in requirements.txt:

```
transformers
datasets
torch
wandb
huggingface_hub
```
The conversion applied by convert_to_bitnet() makes exactly two changes to a standard LLaMA 2 decoder layer, as specified in the paper:
- Replace every `nn.Linear` inside `self_attn` and `mlp` with `BitLinear`.
- Remove `input_layernorm` and `post_attention_layernorm` from each decoder layer, because `BitLinear` now owns a parameter-free RMSNorm internally.
embed_tokens and lm_head stay full-precision. The rest of the LLaMA components (RoPE, SwiGLU, causal mask) are unchanged.
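A minimal sketch of what such a conversion could look like, assuming the model is a HuggingFace `LlamaForCausalLM` (decoder layers under `model.model.layers`) and `BitLinear` is the quantised layer from `utils.py`; the repository's actual `convert_to_bitnet()` may differ in detail:

```python
import torch.nn as nn


def convert_to_bitnet(model, bitlinear_cls):
    """Illustrative sketch: swap nn.Linear for BitLinear and drop the norms."""
    for layer in model.model.layers:
        # 1) Replace every nn.Linear inside attention and MLP with BitLinear,
        #    reusing the pretrained weights.
        for block in (layer.self_attn, layer.mlp):
            for name, child in list(block.named_children()):
                if isinstance(child, nn.Linear):
                    bit = bitlinear_cls(child.in_features, child.out_features,
                                        bias=child.bias is not None)
                    bit.weight = child.weight
                    if child.bias is not None:
                        bit.bias = child.bias
                    setattr(block, name, bit)
        # 2) Remove the decoder-layer norms; BitLinear normalises its own input.
        layer.input_layernorm = nn.Identity()
        layer.post_attention_layernorm = nn.Identity()
    return model
```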
```
x → RMSNorm (parameter-free, inside BitLinear)
  → per-token absmax quantise to int8 range [-127, 127] (STE)
  → F.linear(x_q, w_q) where w_q = absmean-quantised to {-1, 0, +1} (STE)
  → rescale by (w_scale × act_scale)
```
The straight-through estimator (STE) lets gradients flow through the non-differentiable round() calls during backprop. High-precision shadow weights live in self.weight the whole time — only the quantised copies are used for the forward pass.
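The pipeline above can be sketched in a few lines of PyTorch. This is an illustrative implementation of the described forward pass, not necessarily identical to the `BitLinear` in `utils.py`; note that the rescale step is folded into the dequantised values here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rms_norm(x, eps=1e-6):
    # Parameter-free RMSNorm: scale each token by its root-mean-square.
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


class BitLinear(nn.Linear):
    """Sketch of the forward pass described above (details may differ
    from the repository's utils.py)."""

    def forward(self, x):
        x = rms_norm(x)
        # Per-token absmax quantisation of activations to [-127, 127].
        # The x + (q - x).detach() trick is the straight-through estimator:
        # the forward pass uses the quantised value, the backward pass
        # treats quantisation as the identity.
        act_scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
        x_q = x + ((x * act_scale).round().clamp(-127, 127) / act_scale - x).detach()
        # Absmean quantisation of weights to {-1, 0, +1}, also via STE.
        # Dividing by w_scale (and act_scale above) folds in the rescale step.
        w = self.weight
        w_scale = 1.0 / w.abs().mean().clamp(min=1e-5)
        w_q = w + ((w * w_scale).round().clamp(-1, 1) / w_scale - w).detach()
        return F.linear(x_q, w_q, self.bias)
```

Because the quantised tensors are expressed as `x + (…).detach()`, gradients reach both the activations and the shadow weights in `self.weight` even though `round()` itself has zero gradient almost everywhere.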
```shell
python train.py
```

Options:
--steps N Total training steps (default: 10000)
--batch-size N Per-device batch size (default: 32)
--lr F Learning rate (default: 3e-4)
--checkpoint-dir PATH Where to save models (default: ./checkpoints)
--wandb Enable W&B logging
Checkpoints are saved to ./checkpoints/step_NNNNNNN/ every 1000 steps, and a final model is saved to ./checkpoints/final/.
```shell
# Interactive REPL
python inference.py --model ./checkpoints/final

# Single prompt
python inference.py --model ./checkpoints/final \
    --prompt "The future of computing is" \
    --max-new-tokens 200 \
    --temperature 0.8
```

The BitNet architecture follows the design laid out in the training-details document, The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf. By integrating BitLinear into HuggingFace's LlamaForCausalLM, it lets you tap the full potential of BitNet.
Explore, train, and revolutionize with LLaMA-BitNet!
