CogNet is a ~1.06B-parameter non-transformer language model built on a cognitive architecture with working, episodic, and semantic memory systems. It uses cognitive routing with vectorized channel processing and hierarchical memory tiers, giving O(n) per-layer complexity in sequence length n, compared with O(n^2) for transformer self-attention.
Architecture
| Parameter | Value |
| --- | --- |
| Hidden dim | 2048 |
| Blocks | 16 (8 channels each) |
| Channel dim | 384 |
| FF dim | 8192 (Fused SwiGLU) |
| Working memory slots | 128 |
| Episodic memory slots | 256 |
| Semantic memory slots | 512 |
| Tokenizer | CharTokenizer (136 vocab) |
| Normalization | RMSNorm |
| Positional encoding | RoPE |
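As a rough sanity check on the figures above, the feed-forward share of the parameter budget can be estimated with a few lines of arithmetic. This is a sketch under assumptions the source does not confirm: one SwiGLU feed-forward per block, no bias terms, and gate and up projections stored as a single fused weight. The helper name `swiglu_ff_params` is hypothetical.

```python
# Rough feed-forward parameter estimate from the architecture table.
# Assumptions (not confirmed by the source): one SwiGLU feed-forward
# per block, no bias terms, gate and up fused into one weight matrix.
HIDDEN_DIM = 2048
FF_DIM = 8192
NUM_BLOCKS = 16


def swiglu_ff_params(hidden_dim: int, ff_dim: int) -> int:
    """Count weights in a bias-free SwiGLU feed-forward layer."""
    fused_gate_up = hidden_dim * (2 * ff_dim)  # one matmul yields gate and up
    down = ff_dim * hidden_dim                 # projection back to hidden dim
    return fused_gate_up + down


per_block = swiglu_ff_params(HIDDEN_DIM, FF_DIM)
total_ff = per_block * NUM_BLOCKS
print(f"{per_block:,} per block, {total_ff:,} across {NUM_BLOCKS} blocks")
```

Under these assumptions the feed-forward layers alone come to roughly 0.81B weights, about three quarters of the stated ~1.06B total. The remainder would be split among channels, memory tiers, embeddings, and norms, which this estimate does not attempt to break down.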
Key Differences from Transformers
- Cognitive routing: Input is routed through parallel channels instead of attention heads
- Hierarchical memory: 3-tier memory system (working/episodic/semantic), read via scaled dot-product attention (SDPA)
- O(n) per-layer complexity: Channel processing is linear in sequence length (vs O(n^2) for attention)
- Vectorized channels: All 8 channels are processed in a single batched operation (no for-loops)
- Fused SwiGLU: Gate and up projections are combined into a single matmul
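The last two points can be illustrated with a small NumPy sketch. It is not the repository's code: NumPy stands in for whatever framework the project uses, the dimensions are tiny for readability, and treating each channel as a plain linear map is an assumption made only to show the batching idea. All function names here are hypothetical.

```python
import numpy as np


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_unfused(x, w_gate, w_up, w_down):
    # Two separate matmuls for the gate and up projections.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def swiglu_fused(x, w_gate_up, w_down):
    # One matmul against the concatenated weight, then split the result.
    gate, up = np.split(x @ w_gate_up, 2, axis=-1)
    return (silu(gate) * up) @ w_down


def channels_loop(x, w):
    # x: (channels, seq, dim), w: (channels, dim, dim); one matmul per channel.
    return np.stack([x[c] @ w[c] for c in range(x.shape[0])])


def channels_batched(x, w):
    # The same computation for all channels in a single batched call.
    return np.einsum("csd,cde->cse", x, w)


rng = np.random.default_rng(0)
seq, hidden, ff, channels, channel_dim = 5, 8, 16, 8, 4

x = rng.standard_normal((seq, hidden))
w_gate = rng.standard_normal((hidden, ff))
w_up = rng.standard_normal((hidden, ff))
w_down = rng.standard_normal((ff, hidden))
w_gate_up = np.concatenate([w_gate, w_up], axis=1)  # fused weight

fused_matches = np.allclose(
    swiglu_unfused(x, w_gate, w_up, w_down),
    swiglu_fused(x, w_gate_up, w_down),
)

xc = rng.standard_normal((channels, seq, channel_dim))
wc = rng.standard_normal((channels, channel_dim, channel_dim))
batched_matches = np.allclose(channels_loop(xc, wc), channels_batched(xc, wc))

print(fused_matches, batched_matches)
```

Both pairs compute the same values; the fused and batched forms simply issue fewer, larger operations, which is usually where the speedup comes from on accelerators.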
Optimized Training Pipeline
The train_ultra.py script includes the complete training pipeline with all optimizations:
python infer_optimized.py generate --prompt "The future of AI is" --max-tokens 100
python infer_optimized.py benchmark
Benchmark Your Hardware
# Full benchmark: original vs optimized + scalability test
python benchmark.py
# Quick benchmark during training (automatic)
python train_ultra.py --max-steps 20
# The first 13 steps are: 3 warmup + 10 benchmark = real speed measurement
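The "3 warmup + 10 benchmark" split described above follows a common timing pattern: run a few untimed steps so one-time costs do not skew the result, then time a fixed number of steps. A minimal sketch of that pattern is below; it is not taken from train_ultra.py, and `measure_steps_per_sec` and `dummy_step` are hypothetical names used only for illustration.

```python
import time


def measure_steps_per_sec(step_fn, warmup_steps=3, benchmark_steps=10):
    """Run untimed warmup steps, then time a fixed number of steps."""
    for _ in range(warmup_steps):
        step_fn()  # excluded from timing (caches, lazy initialization, etc.)

    start = time.perf_counter()
    for _ in range(benchmark_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return benchmark_steps / elapsed


calls = []


def dummy_step():
    # Stand-in for one training step; records that it ran.
    calls.append(sum(i * i for i in range(10_000)))


rate = measure_steps_per_sec(dummy_step)
print(f"{rate:.1f} steps/s measured over 10 timed steps ({len(calls)} steps run)")
```

When the step runs on a GPU, the device generally needs to be synchronized before each clock read, since kernel launches return before the work finishes; that detail is omitted here to keep the sketch dependency-free.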
Config Files
YAML configs are available in configs/:
| Config | Description |
| --- | --- |
| 1b_single_gpu.yaml | 1B model, single GPU |
| 1b_fsdp.yaml | 1B model, multi-GPU FSDP |
| 350m_fast.yaml | 350M model, fast iteration |