- Fork the repository and clone your fork
- Install dependencies: `pip install -r requirements.txt`
- Tests run in Docker only (see below)
- Python: PEP 8, type hints in function signatures
- No unnecessary comments: Only where logic is not self-evident
- Hard asserts: No silent recovery or defensive error handling
- No `.get()`, `getattr()`, `hasattr()`: Let missing keys and attributes raise `KeyError`/`AttributeError`
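A minimal sketch of the last two rules, assuming a hypothetical helper and hypothetical config keys (neither comes from the repository). Direct indexing and a hard assert surface a bad config at the point of failure, where `.get()` with a default would hide it:

```python
def hidden_size_per_head(config: dict[str, int]) -> int:
    # Direct indexing: a missing key raises KeyError right here.
    hidden_size = config["hidden_size"]
    num_heads = config["num_attention_heads"]
    # Hard assert: fail loudly instead of silently rounding.
    assert hidden_size % num_heads == 0, (
        f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}"
    )
    return hidden_size // num_heads


print(hidden_size_per_head({"hidden_size": 640, "num_attention_heads": 20}))  # prints 32
```

Calling the function with a config that lacks `num_attention_heads` raises `KeyError` rather than falling back to a default.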
- Standard library: `import xyz`
- Third-party: `from xyz import q`
- Local/repo: `from models.foo import Bar`
Within each group, `import x` lines come before `from x import y` lines.
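The within-group ordering rule can be checked mechanically. The sketch below uses the standard-library `ast` module on the source text of one import group. The helper name is hypothetical and is not part of the repository's tooling:

```python
import ast


def plain_imports_precede_from_imports(group_source: str) -> bool:
    """Return True if every `import x` comes before every `from x import y`."""
    seen_from_import = False
    for node in ast.parse(group_source).body:
        if isinstance(node, ast.ImportFrom):
            seen_from_import = True
        elif isinstance(node, ast.Import) and seen_from_import:
            return False
    return True


good_group = "import os\nimport sys\nfrom pathlib import Path\n"
bad_group = "from pathlib import Path\nimport os\n"

print(plain_imports_precede_from_imports(good_group))  # True
print(plain_imports_precede_from_imports(bad_group))   # False
```

Because `ast.parse` only parses the text, the modules named in the group do not need to be installed.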
Never import from the same package on separate lines:
# Bad
from torch import nn
from torch import optim
# Good
from torch import nn, optim
# For many names, use parenthesized form
from transformers import (
    AutoConfig,
    AutoModel,
    AutoModelForMaskedLM,
    PreTrainedModel,
)

fastplms/new_model/
    __init__.py
    modeling_new_model.py       # PreTrainedModel + PretrainedConfig
    get_new_model_weights.py    # Weight conversion from official checkpoint
    README.md                   # HuggingFace model card README
    LICENSE                     # Model license
testing/official/new_model.py # Load official model for compliance testing
Your `modeling_*.py` should:
- Subclass `PreTrainedModel` and `EmbeddingMixin`
- Define a `PretrainedConfig` subclass with an `attn_backend` attribute
- Implement the `AttentionBackend` enum and backend resolution
- Implement `_embed(input_ids, attention_mask)` returning last hidden states
- Register in `config.json` via `auto_map`:
{
  "auto_map": {
    "AutoConfig": "modeling_new_model.NewModelConfig",
    "AutoModelForMaskedLM": "modeling_new_model.NewModelForMaskedLM"
  }
}

`get_*_weights.py` should:
- Load the official checkpoint
- Remap parameter names to match your architecture
- Export `config.json`, `pytorch_model.bin`, and modeling source files
- The output directory can be pushed to HuggingFace
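The name-remapping step might look roughly like the sketch below. It assumes a simple prefix-replacement scheme. The function name, the prefix map, and the parameter names are all hypothetical, and plain strings stand in for tensors so the example runs without PyTorch:

```python
def remap_parameter_names(
    state_dict: dict[str, object], prefix_map: dict[str, str]
) -> dict[str, object]:
    """Rename keys by replacing the first matching prefix from prefix_map."""
    remapped: dict[str, object] = {}
    for old_name, tensor in state_dict.items():
        new_name = old_name
        for old_prefix, new_prefix in prefix_map.items():
            if old_name.startswith(old_prefix):
                new_name = new_prefix + old_name[len(old_prefix):]
                break
        assert new_name not in remapped, f"duplicate target name: {new_name}"
        remapped[new_name] = tensor
    return remapped


# Hypothetical official names; the values are placeholders for real tensors.
official = {
    "encoder.layer.0.attn.q.weight": "W_q",
    "encoder.layer.0.attn.k.weight": "W_k",
    "lm_head.bias": "b",
}
prefix_map = {"encoder.layer.": "transformer.blocks."}

print(remap_parameter_names(official, prefix_map))
```

Keys that match no prefix pass through unchanged, and the assert stops the conversion if two official names would collapse onto the same target name.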
`testing/official/new_model.py` should expose:
import torch

def load_official_model(reference_repo_id: str, device: torch.device, dtype: torch.dtype):
    # Load and wrap the official model
    # Return (wrapped_model, tokenizer) where wrapped_model has .logits and .hidden_states outputs
    ...

Add your model to `testing/conftest.py`:
# In MODEL_REGISTRY (for fast CI, pick the smallest checkpoint)
"new_model": {
    "fast_path": "Synthyra/NewModel-150M",
    "official_path": "org/official-model",
    "load_official": "testing.official.new_model",
    "model_type": "NewModel",
    "uses_tokenizer": True,
},

# In FULL_MODEL_REGISTRY (all checkpoints with size_category)
"new_model_150m": {
    "fast_path": "Synthyra/NewModel-150M",
    "official_path": "org/official-model-150m",
    "load_official": "testing.official.new_model",
    "model_type": "NewModel",
    "uses_tokenizer": True,
    "size_category": "small",
},

Create `fastplms/new_model/README.md` with the HuggingFace model card content and `fastplms/new_model/LICENSE` with the model license.
Create `Dockerfile.<family>` that layers on top of `fastplms-base` and installs your model's native reference dependencies. Add the family to `build_images.sh` so that `./build_images.sh` picks it up.
If your model's native package conflicts with another (e.g. a `transformers` version pin or a `torchtext` pin), prefer either:
- Loading the native package from a `sys.path`-injected submodule (see `testing/official/__init__.py` for the ESM++ pattern), or
- Using a HuggingFace `transformers` reference class instead (DPLM uses `EsmForMaskedLM` for this reason).
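The first option can be sketched as a small context manager that temporarily puts a directory at the front of `sys.path`. This is only an illustration of the general idea. The helper name and the path are hypothetical, and the actual ESM++ pattern in `testing/official/__init__.py` may be structured differently:

```python
import sys
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def prepended_sys_path(path: str) -> Iterator[None]:
    """Temporarily put `path` first on sys.path so its packages win import resolution."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        sys.path.remove(path)


# Hypothetical location of a vendored native package.
with prepended_sys_path("/workspace/third_party/native_model"):
    print(sys.path[0])
```

Python caches imported modules in `sys.modules`, so this only affects a package the first time it is imported in the process.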
Add a `ParityTolerances(...)` entry in `FAMILY_TOLERANCES` at the top of `testing/test_parity.py`. Start with the default, then tighten as you investigate failures.
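To illustrate what tightening a tolerance means in practice, here is a sketch with a stand-in class. `ExampleTolerances` and its `atol`/`rtol` fields are assumptions made for this example, and the real `ParityTolerances` fields may differ. The check uses the common `|a - e| <= atol + rtol * |e|` criterion:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleTolerances:
    # Stand-in for illustration; the real ParityTolerances fields may differ.
    atol: float
    rtol: float


def within_tolerance(
    actual: list[float], expected: list[float], tol: ExampleTolerances
) -> bool:
    assert len(actual) == len(expected)
    return all(
        abs(a - e) <= tol.atol + tol.rtol * abs(e)
        for a, e in zip(actual, expected)
    )


loose = ExampleTolerances(atol=1e-2, rtol=1e-2)
tight = ExampleTolerances(atol=1e-5, rtol=1e-5)
reference = [0.5, -1.25, 2.0]
candidate = [0.5004, -1.2497, 2.0009]

print(within_tolerance(candidate, reference, loose))  # True
print(within_tolerance(candidate, reference, tight))  # False
```

The same pair of outputs passes under the loose setting and fails under the tight one, which is why it helps to start loose and narrow the bounds once the remaining differences are understood.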
Add entries for pushing your model's files to the Hub.
All tests must run in Docker. Never run tests natively on Windows, where Triton, flash-attention, and the CUDA kernels are unavailable. Always pass `--ipc=host` to `docker run`.
# Build base + your family image
./build_images.sh new_model
# Run your model's parity suite (its own image)
docker run --rm --gpus all --ipc=host -v "$(pwd)":/workspace fastplms-new_model \
    python -m pytest /workspace/testing/test_parity.py -k new_model -v
# Broader smoke tests in the monolithic image
docker build -t fastplms .
docker run --gpus all --ipc=host fastplms python -m pytest /app/testing/ -k new_model -v

Before submitting a PR for a new model, make sure the following tests pass inside the family's Docker image:
- `test_parity.py::test_tokenizer_parity[<family>]`
- `test_parity.py::test_weight_parity_fp32[<family>]`
- `test_parity.py::test_forward_parity_fp32[<family>-{single,uniform,skewed}]` (all three padding scenarios)
- `test_parity.py::test_forward_parity_bf16[<family>-{single,uniform,skewed}]`
- `test_parity.py::test_padding_does_not_pollute_valid_positions_fp32[<family>]` (tokenizer-mode families)
- `test_parity.py::test_backend_consistency_fp32[<family>]`
And in the monolithic image:
- `test_automodel_loads` and `test_automodel_forward_pass`
- `test_nan_stability`
- `test_batch_single_match` (tokenizer-mode models)
Found a bug or have a feature request? Open a GitHub Issue.