
v2.0.0b30 (2026-04-16)

  • Merge pull request #215 from OpenMOSS/dev
  • Release: TP support in QK tracing & perf attribution
  • fix(circuits): basedpyright type error in NodeInfo.is_identity
  • fix(circuits): use NodeInfo.is_identity for source split correctness
  • perf(circuits): batch identity-indexed sources in attribution_scores
  • perf(circuits): memoize NodeRefs.values and cache multi_batch_index shape meta (memoization pattern sketched below)
  • fix(circuits): TP-aware QK tracing and DTensor invariants
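
The memoization item above is the standard cache-on-first-access pattern. A minimal sketch, assuming a `NodeRefs`-like container; the class and field names are illustrative, not the library's actual API:

```python
# Cache a derived tensor on first access so repeated attribution passes
# reuse it instead of recomputing. `NodeRefs` and `tensors` here are
# hypothetical stand-ins for the real classes.
from dataclasses import dataclass
from functools import cached_property

import torch


@dataclass
class NodeRefs:
    tensors: dict[str, torch.Tensor]

    @cached_property
    def values(self) -> torch.Tensor:
        # Computed once per instance, then served from the cache.
        return torch.cat([t.flatten() for t in self.tensors.values()])
```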

v2.0.0b29 (2026-04-15)

  • Merge pull request #213 from OpenMOSS/dev
  • Release: fix docs deploy
  • fix(docs): bump pymdown-extensions to 10.21.2 for pygments 2.20 compat
  • refactor(circuits): minor attribution cleanup

v2.0.0b28 (2026-04-14)

  • Merge pull request #212 from OpenMOSS/dev
  • Release: fix and refactor QK tracing
  • fix(serialize): migrate decorator must target staticmethod's underlying func
  • fix(circuits): drop k_side transpose in slot_pair_attribution, guard empty side
  • refactor(circuits): extract attribution_scores helper, add flat_topk, rewrite qk pair dedup via zero-out
  • chore(server): bf16 dtype and explicit device_mesh in model/SAE loaders
  • fix(circuits): correct QK tracing scale, target post-LN/rotary Q/K, dedup pairs
  • feat(circuits): QKTracingResult bundle with Q/K marginals and merged top-k picks
  • feat(circuits): role-labeled QK pair attribution via VJP with K as cotangent (see the sketch after this list)
  • chore: add .playwright-mcp to .gitignore
  • fix(circuits): split bias leaves around apply_saes so SAE sees correct tensor
  • refactor(circuits): reduce intermediate dimension variables in attribution
  • fix(backend): align interfaces of model.qk_trace and qk_trace
  • refactor(circuits): replace NodeInfoRef with collection type NodeRefs
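
For the "K as cotangent" item above: a minimal sketch of the pattern with `torch.func.vjp`. Pulling the keys back through the function that produces the queries yields the gradient of the summed per-position QK scores with respect to the upstream input; the Q-pathway stand-in below is hypothetical, not the repo's model code.

```python
import torch
from torch.func import vjp

d_model, d_head, seq = 32, 16, 8
w_q = torch.randn(d_model, d_head)


def q_from_upstream(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for the model pathway mapping upstream activations
    # to (post-LN/rotary) queries.
    return x @ w_q


x = torch.randn(seq, d_model)  # upstream activations
k = torch.randn(seq, d_head)   # keys, reused as the output cotangent

q, pullback = vjp(q_from_upstream, x)
(x_attrib,) = pullback(k)      # = d/dx of sum_i <q_i, k_i>
print(x_attrib.shape)          # (seq, d_model)
```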

v2.0.0b27 (2026-04-12)

  • Merge pull request #209 from OpenMOSS/dev
  • Release: revert the renaming of NodeDimension
  • refactor(circuits): rename NodeAxis to NodeDimension

v2.0.0b26 (2026-04-12)

  • Merge pull request #208 from OpenMOSS/dev
  • Release: refactor circuits (naming & move to PyTree base class) and fix attribution
  • fix(circuits): keep order in retrieval_from_intermediates
  • fix(circuits): correct NodeIndexed.iter return type
  • refactor(circuits): rename to NodeAxis, NodeIndexed, NodeIndexedTensor/Vector/Matrix
  • refactor: use PyTree base class to support move to device and full tensor (see the sketch after this list)
  • refactor: directly import from lm_saes in server
  • refactor(circuits): move circuit tracing parts to separate module
  • fix(circuits): structure Dimension
  • refactor: use cattrs for (de)serialization
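
A rough sketch of the "PyTree base class" idea above: a base class whose `to` maps the device transfer over every tensor leaf of a dataclass, recursing into nested PyTree fields. Names and field handling are assumptions:

```python
import dataclasses

import torch


@dataclasses.dataclass
class PyTreeBase:
    def map_tensors(self, fn):
        # Apply fn to every tensor field, recurse into nested PyTrees,
        # and rebuild the dataclass with the transformed leaves.
        updates = {}
        for f in dataclasses.fields(self):
            v = getattr(self, f.name)
            if isinstance(v, torch.Tensor):
                updates[f.name] = fn(v)
            elif isinstance(v, PyTreeBase):
                updates[f.name] = v.map_tensors(fn)
        return dataclasses.replace(self, **updates)

    def to(self, device):
        return self.map_tensors(lambda t: t.to(device))


@dataclasses.dataclass
class AttributionResult(PyTreeBase):  # hypothetical subclass
    scores: torch.Tensor


res = AttributionResult(scores=torch.randn(3)).to("cpu")
```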

v2.0.0b25 (2026-04-10)

  • Merge pull request #206 from OpenMOSS/dev
  • Release: QK Tracing
  • feat(server): support qk tracing
  • refactor(circuits): move Dimensioned to indexed_tensor
  • feat(server): support host mode
  • feat(circuits): support nodes_to_offsets with unregistered nodes (returns -1)
  • refactor(attribution): replace generic D container with Dimensioned for improved type safety and clarity
  • refactor(attribution): unify QK tracing via generic D container and Hessian matrix
  • chore: enforce commitizen pre-commit rules
  • chore: ignore claude artifacts and drop BaseSAEConfig alias
  • refactor(circuits): serialize AttributionResult using torch.save
  • fix(server): preload models on workers & fix distributed function registration
  • fix(circuits): fix full_tensor and to device to transfer all fields properly
  • feat(server): support distributed circuit tracing
  • fix(backend): relax input constraint to allow non-tensor input in distributed settings
  • feat(circuits): support full_tensor and to device in AttributionResult
  • feat(examples): add script for QK tracing in attribute() with model loading and replacement modules
  • feat(circuits): integrate QK tracing into attribute() pipeline
  • chore(deps): bump pygments from 2.19.2 to 2.20.0
  • chore(deps): bump requests to 2.33.0 (indirect)
  • chore(deps): bump nltk from 3.9.3 to 3.9.4
  • chore(deps): bump aiohttp from 3.13.3 to 3.13.4
  • chore(deps): bump tornado from 6.5.4 to 6.5.5
  • fix(circuits): detach ref tensor
  • chore: remove some timers in lorsa and sae
  • chore(dependencies): add numba to dev dependencies
  • fix(circuits): only encode once in apply_saes
  • perf(circuits): use multi_batch_index
  • perf(circuits): use batch index & move source values out of loop
  • feat: implement batch_index to replace the torch DTensor indexing (sketched after this list)
  • perf: improve tp attribution
  • use local Dimension; merge NodeInfoRef before computing values
  • fix: abstopk
  • fix: use abstopk in feature activation function
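
On the `batch_index` item in this list: a plausible reading is that advanced indexing like `values[batch_idx, feat_idx]` is poorly supported on DTensor, whereas an equivalent `torch.gather` works on plain and sharded tensors alike. A minimal sketch under that assumption:

```python
import torch


def batch_index(values: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    # values: (batch, n), idx: (batch, k) -> (batch, k); selects idx[b]
    # within each row b, replacing values[arange(batch)[:, None], idx].
    return torch.gather(values, dim=1, index=idx)


values = torch.arange(12.0).reshape(3, 4)
idx = torch.tensor([[0, 3], [1, 2], [2, 0]])
print(batch_index(values, idx))  # per-row selection
```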

v2.0.0b24 (2026-04-04)

Fix

  • circuit: detach attributions & activations and fix some other dtensor issues; result consistency is not fixed

v2.0.0b23 (2026-04-03)

Fix

  • circuit: fix edge score computation in pruning
  • circuit: some other dtensor issues (WIP)
  • circuit: dtensor issues
  • attribution: use consistent dtype for attribution (#194)

v2.0.0b22 (2026-04-02)

Feat

  • sparse-dictionary: add fold_activation_scale parameter to save_pretrained method and update training settings (see the sketch below)
  • language-model: add device transfer methods to Dimension and NodeIndexedTensor classes
  • language-model: add support for bias leaves in qk trace
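
If `fold_activation_scale` does what the name suggests (the SAE was trained on activations multiplied by a constant scale s, and saving folds s into the weights so the checkpoint runs on raw activations), the algebra is a one-liner per parameter. A hedged sketch; parameter names follow common SAE conventions, not necessarily this repo's:

```python
import torch


@torch.no_grad()
def fold_activation_scale(sae, s: float) -> None:
    # Trained: f = enc(s * x), x_hat = dec(f) / s. Folding s into the
    # weights lets both maps act on raw activations directly:
    # W_enc(s*x - b_dec) == (s*W_enc)(x - b_dec/s).
    sae.W_enc.mul_(s)  # encoder now accepts unscaled x
    sae.b_dec.div_(s)  # decoder bias in raw-activation units
    sae.W_dec.div_(s)  # decoder output in raw-activation units
```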

Fix

  • loading lorsa; distributed attribution (WIP)
  • circuits: type error and tests
  • circuits: resolve SAE config types from metadata
  • sparse-dictionary: change default value of fold_activation_scale to False
  • circuits: transfer attribution data to CUDA and reduce batch size in run_circuit_attribution
  • circuits: correct source and target node assignments in load_circuit_graph function
  • language-model: add attention detach hooks
  • circuits: improve SAE name handling and update logit probability mapping
  • attribution
  • language-model: fix bias hooks handling and improve module assertions

Refactor

  • circuits: modularize circuit tracing and attribution logic
  • hooks: clean up replace_biases_with_leaves
  • server: use new attribution logic

Perf

  • circuits: optimize attribution graph pruning
  • optimize circuit graph construction

v2.0.0b21 (2026-03-26)

Feat

  • attribution: implement qk_trace on TransformerLensLanguageModel to attribute attention scores to upstream features via second-order gradients; bias trace is not yet included (the second-order idea is sketched after this list)
  • backend: add tl addons
  • language-model: WIP refactor attribute with Matrix pipeline
  • language-model: introduce Matrix and Node classes for enhanced matrix operations
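
Why qk_trace needs second-order gradients: an attention score s = q·k is bilinear in the upstream activations, so attributing it to a pair of upstream features requires the mixed second derivative rather than a single gradient. A toy sketch with illustrative shapes and weights:

```python
import torch

a_q = torch.randn(4, requires_grad=True)  # upstream feature acts (Q side)
a_k = torch.randn(4, requires_grad=True)  # upstream feature acts (K side)
W_q, W_k = torch.randn(4, 3), torch.randn(4, 3)

s = (a_q @ W_q) @ (a_k @ W_k)  # scalar attention score

(g_q,) = torch.autograd.grad(s, a_q, create_graph=True)
H = torch.stack([
    torch.autograd.grad(g_q[i], a_k, retain_graph=True)[0]
    for i in range(g_q.numel())
])                                       # mixed Hessian d2s/(da_q da_k)
pair = a_q[:, None] * H * a_k[None, :]   # feature-feature pairwise scores
print(torch.allclose(pair.sum(), s))     # exact for a bilinear score
```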

Fix

  • metrics: use cast(T, ...) to resolve Pyright type error
  • metrics: improve the precision of accumulator to fp32
  • sparse-dictionary: move file existence check after snapshot download to support HF_HUB_OFFLINE case
  • sparse-dictionary: add file existence check for hf loading
  • language-model: fix bug in attribution
  • language-model: ruff & type errors
  • language-model: NodeIndexedTensor index error
  • language-model: remove unnecessary for-loop
  • language-model: fix bug in WIP hook-based replacement model
  • remove abstract_sae file
  • util: move is_primary_rank to distributed utils to avoid circular import

Refactor

  • backend: update apply_saes function and add ln_detach_hooks utility
  • language-model: refactor sae/detach hooks
  • language-model: implement attribution with NodeIndexedTensor
  • language-model: better organize most parts (WIP)
  • language-model: refactor Matrix to NodeIndexedTensor
  • language-model: rename hook functions and enhance AdjacencyMatrix class
  • language-model: enhance hook functions and introduce detach_hook_fn_builder
  • move all sparse dictionary models into subfolder; split base class methods into protocols
  • language-model: enhance hook functions to utilize HookPoint and improve type hints
  • language-model: update hook functions to use SparseDictionary instead of AbstractSparseAutoEncoder
  • sparse-dictionary: add hooks_in and hooks_out properties to SparseDictionaryConfig
  • language-model: introduce EdgeMatrix class for cleaner attribute implementation (untested)
  • language-model: complete attribute method implementation (untested)
  • language-model: WIP hook-based replacement model refactor in TransformerLensLanguageModel
  • rename abstract_sae to sparse_dictionary

v2.0.0b20 (2026-03-02)

Refactor

  • rename AbstractSparseAutoEncoder and BaseSAEConfig

v2.0.0b19 (2026-03-01)

Fix

  • add VS Code configuration to .gitignore
  • circuit: ensure feature activations are correctly compared in attribution
  • circuit: add list_of_features parameter to relevant functions and classes

Refactor

  • circuit: optimize linear layer probing with vectorized Jacobian computation (sketched below)
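
A sketch of what "vectorized Jacobian computation" typically looks like in PyTorch: `jacrev` composed with `vmap` computes all per-sample Jacobians of the probed layer in one batched pass instead of one backward call per output. Shapes here are illustrative:

```python
import torch
from torch.func import jacrev, vmap

layer = torch.nn.Linear(8, 4)
xs = torch.randn(16, 8)        # a batch of probe inputs

jac = vmap(jacrev(layer))(xs)  # (16, 4, 8): one Jacobian per sample
print(jac.shape)
```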

v2.0.0b18 (2026-02-27)

v2.0.0b17 (2026-02-27)

Fix

  • enhance error handling for SAELens format in from_pretrained

v2.0.0b16 (2026-02-27)

Fix

  • wrongly placed type ignore

Refactor

  • remove TransformerLens submodule and fix lorsa type errors
  • drop forked TransformerLens; reimplement run_with_cache_until in lm-saes
  • ui: remove legacy ui codebase

v2.0.0b15 (2026-02-14)

v2.0.0b14 (2026-02-12)

Feat

  • ui/feature: color based on sae type

v2.0.0b13 (2026-02-11)

Feat

  • ui/feature: add hints

v2.0.0b12 (2026-02-11)

Feat

  • ui/server: add configurable LRU cache sizes for models, datasets, saes, and circuits

v2.0.0b11 (2026-02-11)

Fix

  • ci: correct docs group

v2.0.0b10 (2026-02-11)

Perf

  • ui/server: enhance sampling caching

v2.0.0b9 (2026-02-09)

Feat

  • circuits: implement loading and querying of circuit QK node data
  • ui: target blank for embedded pages
  • ui: add new route for embedded circuit QK tracing and update related components
  • ui/embed: add circuit page for embedded iframe
  • ui/embed: enhance FeatureCardCompactForEmbed with plain mode
  • database: support removing SAE sets
  • ui/circuits: implement click outside to close functionality in threshold controls

Fix

  • ui: update defaultVisibleRange handling and adjust node colors
  • ui/embed: remove click effect
  • ui: swap Q and K labels in QKTracing components for consistency
  • ui: adjust layout
  • graph: handle bfloat16 tensor conversion to numpy
  • auxk: fix log of l_auxk
  • auxk: fix train and log of auxk
  • math: topk grad backprop (see the sketch after this list)
  • circuit: adjust default thresholds for graph pruning and circuit retrieval
  • ui/circuits: remove redundant feature ID display for better narrow screen experience
  • circuit: minor improvements in attribution
  • circuit: support batched tracing.
  • feature_interpreter: correct tensor dimensions in generate_activating_examples
  • ui/circuits: clarify message regarding circuit generation time
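
On the "topk grad backprop" fix: the correct backward for a topk activation routes gradient only to the surviving positions. A minimal sketch of that behavior, illustrative rather than the repo's actual kernel:

```python
import torch


class TopK(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, k):
        vals, idx = torch.topk(x, k, dim=-1)
        ctx.save_for_backward(idx)
        ctx.in_shape = x.shape
        # Keep only the top-k entries; everything else becomes zero.
        return torch.zeros_like(x).scatter(-1, idx, vals)

    @staticmethod
    def backward(ctx, grad_out):
        (idx,) = ctx.saved_tensors
        grad_in = torch.zeros(ctx.in_shape, dtype=grad_out.dtype,
                              device=grad_out.device)
        # Gradient flows only through the kept positions.
        grad_in.scatter_(-1, idx, grad_out.gather(-1, idx))
        return grad_in, None


x = torch.randn(2, 6, requires_grad=True)
TopK.apply(x, 3).sum().backward()
print((x.grad != 0).sum(dim=-1))  # exactly k nonzero grads per row
```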

v2.0.0b8 (2026-01-18)

v2.0.0b7 (2026-01-18)

Feat

  • ui/circuits: add remix button to fill NewGraphDialog with initial configs
  • circuit: implement circuit generation status tracking and dynamic pruning
  • ui/circuits: enhance SAE set creation dialog with filtering and search functionality
  • ui/dictionary: add feature index input and update dictionary select styling
  • ui/circuits: add BiasNodeSchema and update NodeSchema
  • ui/circuits: add isFromQkTracing field for filtering (untested)
  • ui: support dictionary selection filtering
  • ui/circuits: support logical subgraph
  • ui/circuits: trace features
  • ui/circuits: add QK tracing section to node connections
  • ui/circuit: show new graph dialog when no circuit available
  • ui: support grouping circuits
  • train: add auxiliary loss for topk (#164); see the sketch after this list
  • lorsa: add auto expand when loading from pretrained
  • add capability of tracing from features to attribution graphs (#163)
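
The auxiliary loss for topk (#164) is presumably in the spirit of the AuxK loss from the TopK-SAE literature: reconstruct the current reconstruction error with the top k_aux dead features so that dead features keep receiving gradient. A hedged sketch; all names are illustrative:

```python
import torch


def auxk_loss(residual: torch.Tensor,    # (batch, d_model): x - x_hat
              hidden_pre: torch.Tensor,  # (batch, d_sae): pre-activations
              dead_mask: torch.Tensor,   # (d_sae,) bool: dead features
              W_dec: torch.Tensor,       # (d_sae, d_model)
              k_aux: int = 32) -> torch.Tensor:
    # Pick the k_aux strongest *dead* features and decode only them.
    dead_pre = hidden_pre.masked_fill(~dead_mask, float("-inf"))
    vals, idx = torch.topk(dead_pre, k_aux, dim=-1)
    acts = torch.zeros_like(hidden_pre).scatter(-1, idx, torch.relu(vals))
    return (residual - acts @ W_dec).pow(2).mean()
```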

Fix

  • feature_interpreter: handle None interpretation in feature analysis
  • ui/circuits: specify timeout for circuit generation in undici
  • attn_scores_attribution: correct component expansion for QK calculations
  • ui/circuits: use ctrl/cmd to multiselect features
  • topk: improve topk in single card situation
  • graph: assign sae_name based on feature type in append_qk_tracing_results function
  • circuit: support tracing in bf16 precision
  • autointerp: comment out assertion for interpretation text in interpret_feature function
  • handle 0-d tensors in feature tracing detection (#170)
  • auxk: fix function signature for compute_loss
  • auxk: add valid token mask for dead statistics
  • lorsa: remove useless override method for lorsa
  • circuit: fix probe equiv for ln for qk norms, fix a gradient control bug
  • ui: fix preview logic. Now consistent with tracing results
  • examples: adapt to new from_pretrained method
  • runners: support setting from_pretrained args
  • pass device in from_pretrained

Refactor

  • admin: use transaction to update sae/sae set name
  • ui/circuits: improve circuit query options and navigation handling
  • replacement_model: enhance tokenization by passing tokenizer to ensure_tokenized function
  • replacement_model: simplify input tokenization by utilizing ensure_tokenized function
  • ui: remove debug console logs
  • circuit: enhance node mapping and tracing result handling
  • graph: remove argument_graph_file.py
  • graph: remove debug print statements in append_qk_tracing_results function
  • graph: reorganize graph-related classes and functions
  • attribution: improve readability and structure in _run_attribution function
  • lorsa: replace qk_exp_factor with ov_group_size for attention calculations
  • integrate local/HF/SAELens loading into from_pretrained (#159)
  • relocate configs and enforce import rules (#158)
  • move distributed testing utils into the main library
  • remove distributed state dict loading; directly use nn.Module.load_state_dict instead

v2.0.0b6 (2025-12-30)

Fix

  • examples: add amp_dtype configuration
  • examples: remove use of Path

v2.0.0b5 (2025-12-30)

Feat

  • ui: support iframe
  • ui: dictionary inference (WIP)
  • ui: display activation value
  • ui: support modify explanations
  • ui: add bookmark page
  • ui: add admin dashboard
  • support uploading to hf and downloading from hf (#155)
  • ui: display logits/embed tokens
  • ui: preview next token
  • ui: adjust new graph dialog layout
  • ui: support applying chat template
  • ui: display time in graph selector
  • ui: support adding sae set
  • ui: improve circuit visualization

Fix

  • ui: avoid infinite useEffect in LinkGraph
  • feature_analyzer: convert DTensor to local before calling item() in FeatureAnalyzer
  • lorsa: fix lorsa distribute load
  • ui: feature schema
  • ui: bookmark count
  • server: update circuit creation logic to handle cases without Lorsas and improve error handling

Refactor

  • cli: use typer to reconstruct the CLI
  • app: split app into smaller modules
  • ui: replace navigation with Link component for improved routing
  • ui: manage svg rendering with react directly, instead of through d3.js; make d3.js only calculate the positions

Perf

  • ui: improve data persistency
  • ui: topologically sort nodes
  • ui: speed up (and track) graph feature retrieval
  • ui: use rbush to speed up index; better cache results
  • ui: add index to node/edge to improve rendering/filtering

v2.0.0b4 (2025-12-18)

Feat

  • analyze: make FeatureAnalyzer aware of mask
  • ui: hover & click nearest node
  • transcoder: init transcoder with MLP.
  • circuit tracing with backend interaction
  • optim: add custom gradient norm computation and clipping for distributed training
  • training: support training lorsa with varying lengths of training sequences; this makes the total number of training tokens inaccurate (#150)
  • metrics: add GradientNormMetric and extend Record with reduction modes
  • lorsa: Init lorsa with the active subspace of V.
  • ui: simply move the original circuit page to ui-ssr
  • backend: add DTensor support to TransformerLensLanguageModel
  • circuit: Major revision. 1. Support circuit tracing with plt+lorsa and plt only; wrap lists of plts into a Transcoder Set, following circuit-tracer. 2. Update QK tracing: we can now see feature-feature pairwise attribution; efficiency might require revisiting. 3. Refactor attribution structure, breaking down several heavy files; ready to be further improved, mainly by reducing the numerous if use_lorsa branches
  • ui: adjust accent color
  • ui: remove in card progress bar
  • server: support preloading models/saes
  • ui: feature list in feature page (WIP)
  • ui: dictionary page
  • ui: dictionary page (WIP)
  • ui: interpretation with real data
  • ui: set scrollbar-gutter to ensure space reserved for scrollbar to prevent layout shifting
  • ui: support paged queries of samples
  • autointerp: refactor to async & support lorsa
  • conversion methods between lm-saes and saelens
  • train: add checkpoint resume support for crosscoder, clt, lorsa and molt runners
  • support resuming wandb run from training checkpoint
  • distributed: add get_process_group utility, trying to fix checkpoint saving during sweeps.
  • generate: add override_dtype setting to control activation dtype in GenerateActivationsSettings
  • autointerp: refactor to async & support lorsa
  • analyze: add functionality to save analysis results to file and update analysis with logits; enhance settings for output directory and feature analysis name

Fix

  • runner: type mismatch
  • tc: fix type problem
  • trainer: remove assertion for clip_grad_norm in distributed training
  • attribution: add details to some comments
  • ui: minor layout issues
  • ui: correctly display truncated z pattern
  • ui: better display dead feature
  • backend: use TokenizerFast for trace token origins
  • server: synchronized decorator type issue
  • misc: do not filter out eos; it might mark the end of chats and include useful information
  • compute_loss DTensor loss shape
  • training: also transform batch['mask'] to Tensor from DTensor in… (#152)
  • lorsa: avoid triggering DTensor bug in torch==2.9.0
  • lorsa: fix set decoder norm for lorsa
  • lorsa: fix lorsa init
  • server: expose lru_cache ability from synchronized decorator
  • attribution: fix missing gradient flow configuration for lorsa QKnorm (#149)
  • optim: add DTensor support for SparseAdam; redistribute grad to match the parameter's placements when the grad is a DTensor (see the sketch after this list)
  • ui: fix visible range comparison
  • metric: support inconsistent batch size
  • ui: fix feature list height
  • ui: reinitialize useFeatures hook when concerned feature index out of range
  • ui: feature list loading previous page causes wrong scroll position
  • pin torch==2.8.0 for dtensor compatibility: avoid dtensor-related errors in torch 2.9.0; remove unused d_model field from LanguageModelConfig; add GPU memory usage display in training progress bar; move batch refresh to end of training loop iteration
  • database: deal with none value
  • TL: add support for whole qwen3 family & fix inconsistency in tie-word-embed
  • type errors due to torch updates on local_map
  • trainer: correct token count calculation for 2D activation in LORSA training
  • trainer: use ctx.get() for optional coefficients to prevent KeyError
  • activation: use local_map for mask computation on DTensor to ensure correct device placement
  • activation: make mask/attention_mask on the correct device
  • replace .item() with item(...) to ensure distributed consistency.
  • basedpyright: exclude scripts folder
  • basedpyright & ruff issues
  • fix sweep SAE distributed training and convert Path to str in training configs
  • log: use torch.any to detect inf value in total_variance_mean, to fix crosscoder case.
  • crosscoder: make tokens DTensor in analysis
  • log: fix incorrect log_info update
  • ensure WORLD_SIZE is correctly compared as an integer in training scripts
  • prepend_bos: fix shape unmatch
  • distributed: update broadcast_object_list calls to use group_src parameter
  • batchtopk: fix batchtopk in dp mode
  • distributed: fix the sort of import
  • analyze: improve implements for list operations
  • analyze: define functions common to Tensor and DTensor, improve implements for list operations
  • basedpyright issues (WIP)
  • analysis: fix tp for analyze_chunk
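
For the SparseAdam DTensor fix in this list: the gist is that a gradient produced under data parallelism may arrive with different placements (e.g. Partial) than the parameter it updates, so it is redistributed first. A hedged sketch using the public DTensor API (torch ≥ 2.4):

```python
import torch
from torch.distributed.tensor import DTensor


def align_grad(param: torch.nn.Parameter) -> None:
    # Redistribute the grad so its placements match the parameter's
    # before the optimizer consumes it.
    g, p = param.grad, param.data
    if isinstance(g, DTensor) and g.placements != p.placements:
        param.grad = g.redistribute(device_mesh=p.device_mesh,
                                    placements=p.placements)
```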

Refactor

  • ui: remove standalone CircuitVisualization component
  • ui: split data and visual states; move up feature fetching logic
  • use tanstack start for frontend; make a more neuronpedia-like ui (#146)
  • use Metric classes to run evaluation (#133)
  • use Metric classes for disentangled metric computation
  • use TensorSpecs rather than logging method dispatch for logging with different SAE variants (#130)

Perf

  • ui: better visual display for circuit (WIP)
  • ui: fetch sample range on demand

v2.0.0b2 (2025-11-04)

v2.0.0b1 (2025-11-04)

Feat

  • crosscoder: add log for (non) activated decoder norms in activated feature
  • add tanh-quad frequency_scale to TrainerConfig
  • circuit tracing + z_pattern
  • general loading for HuggingFace models
  • add z_pattern
  • dla: add DLA test for clt and lorsa
  • circuit: add intervention for replacement model; add tests for attribution and intervention.
  • update ev compute
  • analyze: support molt analysis
  • trainer: add per rank_group logging logic to molt
  • molt: implement distributed molt
  • molt: most of molt should be done right
  • molt: a staged version of molt
  • activation: chunk assignment in tp+dp setting
  • analysis: reimplement DirectLogitAttributor and related configurations
  • feature_analyzer: implement DDP support for the feature analyzer. Note: TP compatibility is not guaranteed and may require additional work.
  • feature_analyzer: add mask ratio statistics update and logging
  • api: add metric filtering and retrieval for features
  • language_model: enhance activation processing and configuration
  • language_model: enhance LLaDA model integration and preprocessing
  • activation: support checking activation consistency
  • activation: implement distributed loading across tp dimension
  • TransformerLens: fix precision mismatch for ln hook_normalized for rms norm
  • trainer: support param groups
  • l_p: add pre-act loss for jumprelu and fix some bugs in sae.py and lorsa.py
  • qk trace: implement compute_attention_scores_attribution in attribution
  • autointerp of neuronpedia
  • add autointerp and logits to graph json
  • DLA for lorsa and clt; autointerp for lorsa and clt.
  • improve lorsa and clt training
  • topk: add conversion to jrelu
  • circuit; frontend: add support to show feature card in linkgraph
  • ui: update circuit tracing (not done but quite close)
  • kernels: support encoder backward acceleration kernels
  • clt: add binary search for single GPU and fix some bugs regarding kernel opening logic
  • kernels: add a shitty implementation of encode with triton kernel
  • clt: add a binary_search method to accelerate batchtopk (see the sketch after this list)
  • clt: improve batchtopk by divide and conquer; precision and efficiency will be done later
  • clt: major change. clt done right. will fix dirty parts later
  • config: add prepend_bos option to LanguageModelConfig
  • timer: integrate timer to activation writer
  • evaluator: add evaluation functionality for CrossCoders
  • trainer: log current l1 coefficient
  • trainer: remove l0_based decoder weight learning rate
  • config: add LLaDA model configuration and implementation
  • clt: starting to look good in training; fix all bugs in fwd and bwd (temp)
  • clt: passed all fwd test in distributed settings. requires training on real data
  • trainer: add update_decoder_lr_with_l0 flag and enhance learning rate adjustment
  • trainer: add expected_l0 parameter and update decoder learning rate dynamically
  • abstract_sae: enhance JumpReLU with precision promotion
  • crosscoder: support head parallelism with world size < head number
  • logging: implement centralized logging system across modules
  • vis: enhance feature retrieval and analysis handling
  • bookmarks: implement bookmarking functionality for features
  • abstract_sae: enhance state dict handling in AbstractSparseAutoEncoder
  • feature-interpretation: enhance feature interpretation component and update API calls
  • abstract_sae: add support for DCP checkpoint format in save/load methods
  • trainer: add per-head logging (ev & l0) for crosscoder
  • interpreter: improve automatic feature interpretation with multiprocessing
  • timer: integrate timing functionality into key methods for performance monitoring
  • interpreter: support for auto interp without loading LLM ckpt
  • interpreter: enhance feature interpretation with optional interpretation field and parallel processing
  • feature: add analysis retrieval and enhance feature details in UI
  • sae: enhance parameter loading and device management in JumpReLU and CrossCoder
  • crosscoder: distribute tensor while loading
  • config: add non-activating subsample parameters and enhance FeatureAnalyzer with non-activating example sampling
  • crosscoder: support inputs of different shapes on each head
  • sae: support tanh-quad loss
  • decouple batching and activation 1d
  • analysis: support converting from crosscoder head parallel to analyzer feature dimension parallel
  • implement autointerp
  • crosscoder: add distributed support
  • ui: visible segments
  • feature analysis for crosscoder
  • share activation factory in sweep experiment
  • crosscoder: reimplement crosscoder to internalize n_heads
  • mixcoder: fix bugs when using apply_decoder_bias_to_pre_encoder
  • visualization: support act frequency
  • analysis: record analyzed token count
  • backend: re-support TransformerLens models
  • visualization: add modality specific info
  • analysis: add modality-specific metrics
  • language_model: set padding method to max_length
  • analysis: support mixcoder
  • support multi-lingual mixcoder
  • trainer: add more loginfo for mixcoder training
  • backend: support qwen2.5 base
  • language_model: support qwen2.5 vl with hf_backend
  • backend: add language model base class
  • kernels: support spmm triton kernel for topk greater
  • kernels: support spmm triton kernel for topk greater
  • kernels: support spmm triton kernel for topk saes. Topk SAEs do not require precise gradients passed to feature acts so acceleration can be greater
  • mixcoder: changed loss calculation method and added more log in trainer
  • spmm_decode: implement sparse mm decode few & bwd with triton
  • anthropic jumprelu: ready for training
  • anthropic jumprelu: training is ready
  • anthropic jumprelu: initial implementation
  • runner: support train with non-pre-generated activations
  • trainer: add some new extra log info for mixcoder
  • trainer: add extra log info for mixcoder
  • runner: support mixcoder training (#78)
  • activation: add tqdm in loading cached activation
  • mixcoder: implemented mixcoder
  • sae: support saving/loading dataset_average_activation_norm to/from SAE state dict
  • sae: change input format of forward method
  • entrypoint: support train/analyze runner and create/remove dataset record
  • dataset: support removing analysis & sae
  • analysis: remove internal batching
  • activation: support writing activations without batching
  • runner: add AnalyzeSAERunner
  • config: automatically load model & dataset config from database
  • activation: batched model generation
  • activation: add some tqdms to monitor activation generation
  • activation: add model_name to activation meta
  • runner: add train sae runner with TrainSAESettings; update SAE initialization and training logic
  • activation: support load from cached activations in ActivationFactory
  • config: remove FlattenableModel inheritance
  • entrypoint: support generate activations
  • runner: generate activations
  • activation: add start shard
  • activation: support ddp activation generation
  • add ActivationWriter
  • activation: rename "info" to "meta"
  • activation: add CachedActivationLoader
  • complete ActivationFactory
  • activation: implement basic activation pipeline
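
For the batchtopk binary-search items in this list: rather than sorting every activation in the batch to keep the global top k·batch values, one can binary-search a scalar threshold whose survivor count matches the budget. An illustrative sketch:

```python
import torch


def batchtopk_threshold(acts: torch.Tensor, k: int, iters: int = 30) -> float:
    # Find t such that roughly k activations per sample exceed t.
    budget = k * acts.shape[0]
    lo, hi = 0.0, float(acts.max())
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (acts > mid).sum() > budget:
            lo = mid  # too many survivors: raise the bar
        else:
            hi = mid
    return hi


acts = torch.relu(torch.randn(64, 4096))
t = batchtopk_threshold(acts, k=32)
print((acts > t).float().sum(dim=-1).mean())  # ~32 features per sample
```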

Fix

  • basedpyright issue (WIP)
  • backend: unify to tokens function
  • basedpyright issue (WIP)
  • TL
  • crosscoder training
  • lorsa post qk init
  • crosscoder: dp gradient should be partial
  • import
  • crosscoder: eliminate the impact of activation function from f(x)/x to f(x)
  • abstract_sae: use local_map to conduct mean reduction in tanh-quad computation
  • import
  • attribution graph and intervention only for CLT; ruff fixes
  • attribution
  • analyze for SAE
  • ruff
  • analyze for clt
  • trainer: correctly apply jumprelu lr; make new ev applicable to crosscoder
  • circuit: change absolute path
  • train: support sae for data parallel
  • molt: fix a major bug in _decode_distributed and now dist molt is done right
  • molt: fix a init bug in dist molt
  • trainer: adapt log_info for situations where l_s doesn't exist
  • runner: dp runner
  • activation: tensor parallel
  • cyclic import and jumprelu pre-act loss
  • activation: adjust total count for activation processing in CachedActivationLoader
  • activation: fix batch size validation for data parallelism
  • distributed: fix several dp placement errors; revert the CachedActivationLoader to load the entire chunk
  • distributed: enhance data parallelism support
  • writer: avoid including activations from other hook points during chunk data handling in ActivationWriter and improve assertion clarity in CachedActivationLoader.
  • activation: correct chunk buffer condition in CachedActivationLoader.
  • ci: fix dependency installation
  • ci: ignore all import errors; temporarily remove unit tests checking; fix ci dependency installation
  • activation: update device_mesh type hint and improve DistributedSampler usage
  • trainer: remove logic about n_forward_passes_since_fired
  • code comments; delete useless file
  • autointerp of neuronpedia about 'say ...' feature
  • autointerp of neuronpedia
  • circuit trace clt error node idx
  • clt; lorsa: minor fixes on scalable training
  • ui: fix bugs in lorsa zpattern
  • circuit: minor improvements in attribution
  • kernels: return a minimal zero tensor as grad in case no feature is activated for some rank
  • clt: replace isinstance(…, torch.sparse.Tensor) with torch.is_sparse
  • math: batch_kthvalue_clt_binary_search does not need grad
  • misc: fix a misalignment bug introduced by 7fd72fc
  • misc: remove redundant mixcoder logic in config; bring back timer in abstract_sae
  • type errors
  • language_model: enable trust_remote_code in model loading
  • sae: remove sae data parallel runner
  • sae: placements of norms
  • sae: add device_mesh parameter to init
  • trainer: ensure metrics logging only occurs on primary rank
  • bookmarks: correct query parameter in bookmarks feature link
  • abstract_sae: correct tensor shape initialization in state dict
  • trainer: update crosscoder logging keys to use hook points instead of batch keys
  • trainer: logs timer stats when enabled
  • abstract_sae: correct indentation for label tensor conversion
  • pyproject.toml: fix torch version
  • crosscoder: correct slice calculation for local rank in tensor distribution
  • trainer: move compile to training step
  • analysis: insert analysis only once
  • backend: minor type errors
  • activation: generate activation w.r.t. specific context size
  • crosscoder: various issues in crosscoder head parallel
  • crosscoder: fix distributed training
  • analysis: add start_idx in tp analysis
  • minor compatibility issues with crosscoder
  • replace SparseAutoEncoder with AbstractSparseAutoEncoder
  • initializer: normalize activations in initialization search
  • visualization: typos
  • language_model: add support for multiple images in single input
  • server: batchify raw data for tracing
  • fix some bugs in training and analysis
  • backend: padding & truncation
  • mixcoder: pass in modalities to encode & decode
  • activation: transform & batchify all tensor field
  • activation: intercept activation source
  • activation: generate activation in no_grad mode
  • image format from HuggingFace dataset
  • server: correctly retrieve image
  • kernel: fix speed degradation of TopK kernel
  • cached_acts: re-implement changes mistakenly removed in 71ff9f9
  • kernels: fix bugs in kernel tests
  • server: correctly retrieve non-sharded dataset
  • activation: preserve tokens type during dtype conversion
  • analyze: misc things stopping analysis working
  • trainer: convert modality_ev from bf16 to fp32
  • kernel: fix type error
  • sae: fix type error
  • cc: fix cc inheriting logic of activation func
  • tests;ruff: remove unused variables and functions. fix errors in tests
  • cached_activation: fix errors in implementation of overriding cached activations
  • triton: add triton in pyproject
  • fp32 threshold: enforce jumprelu threshold uses fp32
  • mixcoder: fix topk activation func
  • config: fix lr warmup & cooldown step default value
  • config: set prefetch in ActivationFactoryActivationsSource to be optional
  • training: fully utilize GPU cores with torch dataloaders for CachedActivations
  • test_sae: fix ground truth for sae fwd when sae does not apply dec bias to pre dec by default
  • crosscoder: fix minor bugs and logs in cc training
  • ui: minor type error
  • topk activation: add keepdim=True to enable broadcasting; make d… (#73)
  • runner: fix wandb logger not being properly initialized during SAE training
  • misc: fix calculate_activation_norm method
  • entrypoint: load dataset & model config
  • server: database interaction
  • analysis: early skipping condition
  • activation: inconsistent runner & component config
  • master node condition
  • module export
  • activation: remove meta requirements for cached activation
  • activation: mask tokens
  • trainer: fix warmup process of lr & topk
  • sae: fix the key error of sae.dataset_average_norm
  • config: set default ActivationFactoryDatasetSource.prepend_bos to False
  • sae: fix some bugs, log device_mesh in SAE now, and add trainer tests
  • database: add sae path
  • trainer: fix type error
  • activation: record shard info in metadata
  • resource_loader: num of shards when specifying start_shard
  • activation: pipeline order
  • activation: misc issues in generating activations
  • SparseAutoencoder: overlapping method signature in overloaded compute_loss

Refactor

  • move model-specific logs to model class
  • molt: remove sparsity score; activation func now acts directly on hidden_pre
  • utils: remove unused util modules; reorganize distributed module
  • abstract_sae: prepare_input should now also return decoder_kwargs
  • molt: refactor rank assignments logic for disentangling dist inference and dist training
  • molt: refactor decode into einsum operations
  • molt: refactor decode and achieve a comparable efficiency to transcoder
  • molt: refactor decode for better efficiency and VRAM usage
  • molt: refactor rank distribution logic and fix a bug in decode einops
  • activation: improve distributed loading logic and tensor gathering
  • writer: streamline chunk data structure in activation writer to include extra information.
  • activation: simplify activation output structure and type hints
  • clt: disentangle distributed and single-gpu logic of encode and decode
  • kernels: move sparsity check into kernels for decoder regarding the unequal division of activations across devices
  • kernels: move sparsity check into kernels regarding the unequal division of activations across devices
  • kernels: remove vmap implementation for encoder kernels
  • distributed: remove heuristic topk implementation which is no longer needed
  • timer: enhance TimerNode and Timer structure for hierarchical path management
  • resource_loaders: update imports for LLaDALanguageModel
  • language_model: rename LLaDA class to LLaDALanguageModel
  • timer: enhance hierarchical timer functionality
  • clt: remove debug print statements from CrossLayerTranscoder
  • initializer: streamline initialization logic for encoder norms
  • sae: directly manipulate DTensor parameters
  • initializer, runners: remove MixCoder references and clean up code
  • entrypoint: update import paths for runner modules
  • initializer: remove constant fire times bias initialization and rename method
  • analysis: update import for feature interpretation module
  • autointerp: remove auto_interp module and integrate functionality into runner
  • distributed: replace dictionary-based dimension maps with DimMap class
  • runners: reorganize runner module and update imports
  • decouple hook_point_in/hook_point_out with BaseSAEConfig
  • explicitly normalizing activations
  • move public interface of SparseAutoEncoder into AbstractSparseAutoEncoder
  • activation: directly get activation from raw data
  • activation: directly get activation from raw data
  • activation: replace HookedTransformer with LanguageModel
  • activation: reorganize Dataloader-based cached activation reading
  • sae: decouple encoder & decoder from sae methods
  • server: comply to new api
  • analysis: replace sample_feature_activation with FeatureAnalyzer
  • config: compute BaseSAEConfig.d_sae on the fly; set default for BaseSAEConfig.hook_point_out based on BaseSAEConfig.hook_point_in
  • database: drop GridFS; use a 3-level architecture (sae, analysis, feature); use pydantic to parse database retrieval result
  • config: use pydantic models
  • trainer: fix bugs in sae and initializer and add code for trainer
  • activation: add low level processors of ActivationPipeline

Perf

  • abstract_sae: reduce tanh-quad memory cost in dp by manual execution of DTensor mean
  • crosscoder: put data in the first mesh dim
  • crosscoder: add encoding and decoding methods with optional einsum support
  • vis: refactor feature components for improved performance and usability
  • crosscoder, analysis: improve tensor redistribution strategy to reduce GPU memory usage
  • abstract_sae, crosscoder: fully distributed initialization & redistribute accumulated_hidden_pre in crosscoder encode
  • trainer: remove unnecessary tensor gathering
  • crosscoder: directly initialize parameters and parallelism in target devices
  • trainer: compile the model
  • analysis: change distribute_tensor to DTensor.from_local for non-leaf tensor
  • runner: remove useless rebatching
  • runner: logging load/save config from/to database
  • activation: support parallel writing activation
  • activation: support parallel & background loading cached activation

v1.1.0 (2024-12-25)

Feat

  • ui: add showImageHighlights and showImageGrid buttons
  • ui: show sample with origins
  • ActivationSource: filter pad token from activation dataset
  • server: set field feature_acts_all in feature as optional
  • analysis: refactor token source & analysis
  • ActivationSource: support cached multi-dataset activation
  • support tracing token origins
  • act gen: add option to center a batch of activation generated

Fix

  • ActivationSource: remove act from the buffer after taking it out of the buffer
  • server: load datasets & model
  • ActivationSource: set correct device in loading cached activation
  • generate activations
  • types and format
  • server: steering
  • minor type issues

Refactor

  • make all types work with basedpyright

v1.0.0 (2024-11-01)

Fix

  • example: remove training examples from an older version
  • example: fix error in loading example
  • server: specify env file for starting server
  • tensor_parallel: avoid passing during_init to indicate pre-tp condition
  • example: remove device-specific path
  • mypy: fix mypy issues
  • example: fix wrong default params for llamascope

v0.1.0 (2024-09-12)

Feat

  • ui/circuit: show attention pattern
  • server: catch oom error
  • server: trace attention score
  • ui/circuit: trace attention score; remove existed tracing
  • ui/circuit: show information of the selected node/edge
  • server: update tracing return schema
  • ui/circuit: trace intermediate nodes
  • server: add tracing api
  • ui/model-page: add basic circuit
  • sae_training: change save ckpt interval to log scale
  • ui/model-page: steering
  • ui/server: model generate with sae and steering
  • ui/model-page: add detail of selected tokens
  • ft4supp: support ft4supp adjusted for AprilTrick update SAEs
  • sae: add a utils func to merge pre-enc bias into enc bias (see the sketch after this list)
  • ft4supp: support ft4supp adjusted for AprilTrick update SAEs
  • ui/model-page: generation section
  • ui/model-page: create model generation interface
  • ui: create model page
  • analysis: support tensor parallel
  • sae: add a utils func to merge pre-enc bias into enc bias
  • ui: update style of section navigator
  • model: support llama3_1
  • config: add decay ratio
  • config: support warmup step set to a proportion of overall steps
  • fix type error
  • offload LLM parameters after last hook, and support warm up/cool down ratio in config
  • sae: Implement ckpt saving in tensor parallel environment.
  • Implement tensor parallelism in SAE using device mesh
  • circuit: add specific functionality for attributing transformers
  • circuit: add basic attributors
  • HookedRootModule: add mount_hooked_modules
  • runner: load tokenizer manually
  • entrypoint: add entry point for lm_saes
  • HookedRootModule: implement run_with_cache_until & remove fake tensors
  • HookedRootModule: fix & add test case for fake tensors
  • training: support buffer filling from multiple data sources and configure pack and sample probability for each dataset
  • sae: implement from_pretrained from huggingface hub
  • change ckpts to safetensors; decouple RunnerConfig; add from_pretrained for local parameters
  • server: support all TL models in server app
  • runner: replace hardcoded 'gpt2' with cfg.model_name variable
  • analysis: accelerate analysis with chunked d_sae and stop_at_layer and a pre-check before sorting
  • hook_points: Enable early stopping by converting parameters to fake tensors
  • runner: replace hardcoded 'gpt2' with cfg.model_name variable
  • transformer_lens: add ref cache
  • add support for bf16
  • circuit: minor changes
  • runner: add model_from_pretrained_path param
  • circuit: contribution graph
  • analysis: support analysis on self-trained models
  • visualizer: display token position
  • db: remove dictionary
  • sae: unbind sae input and label
  • core: support analysis and visualization for self-trained models
  • sae: support training saes in local models
  • visualizer: auto fetch dictionary
  • visualizer: add navbar
  • visualizer: multiple dictionary samples
  • visualizer: dictionary sample display
  • visualizer: dictionary custom input
  • visualizer: add feature logits
  • visualizer: attn score
  • stats: compute attn score
  • visualizer: auto interp
  • auto_interp: optimize logic and improve code readability
  • autointerp: implement automatic interpretation
  • visualizer: search params for dictionary and feature index
  • visualizer: feature activation histogram
  • visualizer: pagination
  • visualizer: custom input
  • visualizer: subsample
  • visualizer: fetch random living feature
  • FeatureActivations: subsample
  • visualizer: add result dir
  • enable non-strict loading of model state dicts
  • sae pruning
  • record wandb id
  • adjust result saving structure
  • visualization: dictionary selection
  • FastAPI server for feature visualization
  • sample_feature_activations: save feature index
  • save feature activations as huggingface dataset
  • sample_feature_activations: compute feature act hists
  • save feature acts with datasets & compute feature act bins
  • TokenSource: directly remove bos token
  • TokenSource: disable concatenating tokens
  • add glu encoder bias
  • add gau encoder
  • batch-wise act norm
  • scheduler: add cool down
  • add eval runner
  • evals: explained variance and l0
  • SAE: config to remove decoder bias
  • count useful feature
  • support non-exactly decoder norm
  • remove thomson potential
  • exponential warmup scheduler
  • sample feature activations
  • add config for lp and Adam betas
  • load checkpoint
  • load hf model from local files
  • ActivationSource: load cached activations
  • remove unused field when generating activations
  • generate activations on disk
  • add cache_dir config
  • add l2 norm error metric
  • update activation source
  • activation source
  • switch to their activation store
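
For the "merge pre-enc bias into enc bias" util in this list: with f = ReLU((x − b_dec) @ W_enc + b_enc), the subtraction folds away as b_enc′ = b_enc − b_dec @ W_enc, turning the encoder into a plain affine map of x. A hedged sketch; parameter names follow common SAE conventions, not necessarily this repo's:

```python
import torch


@torch.no_grad()
def merge_pre_enc_bias(sae) -> None:
    # (x - b_dec) @ W_enc + b_enc == x @ W_enc + (b_enc - b_dec @ W_enc)
    sae.b_enc -= sae.b_dec @ sae.W_enc
    sae.apply_decoder_bias_to_pre_encoder = False  # hypothetical flag
```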

Fix

  • server: trace sae feature
  • eval: not offload params on eval
  • config: fix linear saving mode bugs and mypy issues
  • runner, server: use HookedTransformer.from_pretrained_no_processing under all circumstances
  • TransformerLens: fix mlp dtype and missing attn code
  • sae: support grid searching for best init
  • ui/sample: fix folded start
  • runner: add accidentally missing from_init_searching
  • runner: set offload after the last hook (previously it was the first)
  • textdataset: set default prepend_bos to True
  • ft4supp: supp final ver.
  • misc: remove unnecessary changes
  • sae: do not init device mesh in single device mode
  • ui/model-page: model generation
  • sae: move transform decoder_norm to save_pretrained
  • sae: transform decoder_norm and encoder_norm to dtensor under tensor parallel settings
  • sae: do not init device mesh in single device mode
  • training: stable GPU usage
  • runner: remove redundant code
  • training: stable training
  • training: change back to clip grad norm
  • ui: import error
  • ui/preview: add color to "..." to avoid confusion
  • ui/preview: disable flex to avoid showing long token
  • ui/preview: assign unique value to accordion item; fix(ui/preview): enable showing 10 samples within one page
  • runner: remove redundant code
  • training: use clip grad value instead of norm
  • sae: fix post process in transform_to_unit_decoder_norm
  • typing: fix mypy issues
  • config: ignore mypy checking for norm init.
  • sae: merge a standalone SAE init func with static method
  • typo error
  • typo
  • typo
  • sae: fix transform_to_unit_decoder_norm in tensor parallel
  • prune_sae: support bfloat16
  • convert decoder bias to local tensor while using tensor parallel
  • fix some bugs encountered during the initialization of SAE and the retrieval of next token in a tensor parallel environment.
  • sae: fix merge bugs
  • fix bugs of prepend bos during eval and sampling (#30)
  • evals: perform logits mask when computing ce score, ignoring pad tokens
  • activation gen: remove bos. ce score is greatly improved
  • resolve DDP-related synchronization bug
  • HookPoint: use register_full_backward_hook in bwd hooks
  • sae_training: support bf16
  • frontend: implement a byte-to-unicode function instead of using the hf implementation in tokenizer
  • examples: fix programmatic runners
  • type checking workflow
  • replace deprecated import from core package with lm_saes package
  • runner: add dtype parameter during model initialization
  • analysis: optimize GPU memory usage after finishing each chunk
  • analysis: fix numpy to list in runner.py
  • analysis: optimize GPU memory usage after finishing each chunk
  • activation: remove context info in activation_source
  • runner: add dtype parameter during model initialization
  • sae: set default use_decoder_bias to False
  • sae: bring back decoder bias
  • SAE: output non-normalized aux data in compute_loss
  • remove decoder bias
  • visualizer: load model
  • visualizer: load sae
  • visualizer: preserving leading & trailing space of tokens
  • visualizer: update token group indices on samples changed
  • visualizer: minor style fixes
  • minor bugs in database io
  • runner: minor bugs
  • database: minor bugs
  • visualizer: prevent fetchFeature with empty searchParams
  • analysis: disable feature_acts gathering when subsampling
  • SAE: apply decoder bias to x_hat at correct place
  • FeatureActivations: minor bugs
  • notebooks: use correctly pruned sae
  • sae: apply feature activation mask & scale
  • prune_sae: assign mask
  • SAE: parameter groups
  • pack tokens which cannot be directly decoded
  • server: minor bugs
  • TokenSource: skip tokens with inadequate seq_len
  • feature activation notebook
  • sample_feature_activations: analyzing steps
  • TokenSource: disable concatenating tokens
  • ActivationStore: shuffle activations
  • SAE: batch wise act norm
  • eval: ev computation
  • SAE: reset decoder norm
  • minor bugs in eval
  • sample_feature_activations: rand distribution
  • exponential warmup
  • sample_feature_activations: nan elt value
  • sample feature activations
  • load hf model
  • eval: remove attention mask in computing ce loss
  • activation_dataset: chunk size
  • minor bugs in generate_activations
  • use default_factory to create default list
  • run name

Refactor

  • ui/circuit: use tracings as circuit props
  • ui: integrate functionalities into Sample component
  • config: merge exp_name into exp_result_dir as exp_result_path and added a path field to the database
  • circuit: decouple attributor with saes
  • config: change to compositional config
  • analysis: move feature direct logits contribution to core.analysis
  • visualizer: move database module to core
  • analysis: analysis to database
  • migrate result to MongoDB
  • visualizer: FeatureCard
  • sample feature activations

Perf

  • ui/circuit: select edges
  • comply with strict mypy typing
  • early stop activation caching with run_with_cache_until (see the sketch after this list)
  • activation_source: disable gradients during inference
  • SAE: decouple encoding and decoding procedure of SAE
  • training: merge finetuning into training process
  • visualizer: change plotly style
  • remove unused config
  • ft4supp: remove unused code & save hyperparams
  • visualizer: hint for loading dictionary first time
  • visualizer: minor style changes
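
For the run_with_cache_until entry above: the idea is to stop the forward pass as soon as the last needed activation has been cached, so later layers never run. A minimal sketch of that mechanism with plain PyTorch hooks (illustrative, not the library's implementation; assumes tensor-valued module outputs):

```python
import torch


class _StopForward(Exception):
    pass


def run_with_cache_until(model: torch.nn.Module, x, until: str) -> dict:
    cache = {}

    def make_hook(name):
        def hook(module, args, output):
            cache[name] = output.detach()
            if name == until:
                raise _StopForward  # skip all later layers
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    try:
        model(x)
    except _StopForward:
        pass
    finally:
        for h in handles:
            h.remove()
    return cache


mlp = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU(),
                          torch.nn.Linear(4, 4))
cache = run_with_cache_until(mlp, torch.randn(2, 4), until="1")
print(sorted(cache))  # ['0', '1'] -- layer '2' never ran
```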