Commit 0b24fe8

LZPgen stability improved, fixed bug in stop condition probability and fragility, also made simulating sequences more efficient via caching
1 parent a33df0a commit 0b24fe8

20 files changed

Lines changed: 3495 additions & 190 deletions

Examples/LZPgen Example.ipynb

Lines changed: 1100 additions & 0 deletions
Large diffs are not rendered by default.

docs/api/metrics.md

Lines changed: 152 additions & 1 deletion

````diff
@@ -22,7 +22,15 @@ from LZGraphs import (
     jensen_shannon_divergence,
     cross_entropy,
     kl_divergence,
-    mutual_information_genes
+    mutual_information_genes,
+    transition_predictability,
+    graph_compression_ratio,
+    repertoire_compressibility_index,
+    transition_kl_divergence,
+    transition_jsd,
+    transition_mutual_information_profile,
+    path_entropy_rate,
+    compare_repertoires,
 )
 ```
 
@@ -223,8 +231,151 @@ print(f"MI (V): {mi_v:.4f}, MI (J): {mi_j:.4f}")
 
 ---
 
+## Information-Theoretic Metrics
+
+### transition_predictability
+
+Measures how deterministic the graph transitions are relative to the maximum possible branching.
+
+```python
+from LZGraphs import transition_predictability
+
+tp = transition_predictability(graph)
+print(f"Transition predictability: {tp:.3f}") # 0 to 1
+```
+
+!!! note "Function Signature"
+    `transition_predictability(lzgraph, base=2) -> float`
+
+Returns a value in [0, 1]. Higher values indicate more deterministic transitions (restricted repertoire). Empirically stable at ~0.60 for AAPLZGraph across sample sizes.
+
+### graph_compression_ratio
+
+Measures how much the graph compresses repeated transitions into shared edges.
+
+```python
+from LZGraphs import graph_compression_ratio
+
+gcr = graph_compression_ratio(graph)
+print(f"Compression ratio: {gcr:.3f}") # 0 to 1
+```
+
+!!! note "Function Signature"
+    `graph_compression_ratio(lzgraph) -> float`
+
+Returns `n_edges / n_transitions`. Lower values indicate more path sharing. AAPLZGraph ~0.18, NaiveLZGraph ~0.05.
+
````
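The `transition_predictability` and `graph_compression_ratio` definitions above can be made concrete on a toy edge-count table, without using LZGraphs. The compression ratio follows the documented `n_edges / n_transitions`, assuming `n_transitions` is the total number of observed transitions (the sum of edge counts).

The predictability formula is an assumption. It is one plausible reading of "relative to the maximum possible branching": one minus the count-weighted conditional entropy of the next node, normalized by the log of the largest out-degree. The function names, node labels, and counts are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy graph: (source, target) -> number of times that transition was observed.
EDGE_COUNTS = {
    ("C_1", "A_2"): 8,
    ("C_1", "S_2"): 2,
    ("A_2", "S_3"): 10,
    ("S_2", "S_3"): 2,
    ("S_3", "G_4"): 6,
    ("S_3", "Q_4"): 6,
}


def compression_ratio_sketch(edge_counts):
    """n_edges / n_transitions, assuming n_transitions is the sum of all edge counts."""
    return len(edge_counts) / sum(edge_counts.values())


def predictability_sketch(edge_counts, base=2):
    """Assumed formulation: 1 - H(next | current) / log(max out-degree).

    H(next | current) averages each source node's successor entropy, weighted by
    how many transitions leave that node. Not taken from the LZGraphs source.
    """
    outgoing = defaultdict(dict)
    for (src, dst), count in edge_counts.items():
        outgoing[src][dst] = count

    total = sum(edge_counts.values())
    conditional_entropy = 0.0
    for successors in outgoing.values():
        node_total = sum(successors.values())
        # Entropy of the next-node distribution leaving this node.
        h = -sum((c / node_total) * math.log(c / node_total, base)
                 for c in successors.values())
        conditional_entropy += (node_total / total) * h

    max_branching = max(len(s) for s in outgoing.values())
    if max_branching == 1:
        return 1.0  # every node has a single successor: fully deterministic
    return 1.0 - conditional_entropy / math.log(max_branching, base)


print(f"Compression ratio: {compression_ratio_sketch(EDGE_COUNTS):.3f}")        # 0.176
print(f"Transition predictability: {predictability_sketch(EDGE_COUNTS):.3f}")  # 0.435
```

Under this formulation, a graph in which every node has exactly one successor scores 1.0. A graph in which every node splits its transitions uniformly over the maximum out-degree scores 0. Both cases match the [0, 1] range described above.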
````diff
+### repertoire_compressibility_index
+
+Alias for `transition_predictability`, framed from a data compression perspective.
+
+```python
+from LZGraphs import repertoire_compressibility_index
+
+rci = repertoire_compressibility_index(graph)
+print(f"Compressibility: {rci:.3f}") # 0 to 1
+```
+
+!!! note "Function Signature"
+    `repertoire_compressibility_index(lzgraph, base=2) -> float`
+
+RCI = 1 means fully deterministic (compressible), RCI = 0 means maximally uncertain (incompressible).
+
+### path_entropy_rate
+
+Estimates the average information content per subpattern step across actual sequences.
+
+```python
+from LZGraphs import path_entropy_rate
+
+sequences = data['cdr3_amino_acid'].tolist()
+h = path_entropy_rate(graph, sequences)
+print(f"Entropy rate: {h:.3f} bits/step")
+```
+
+!!! note "Function Signature"
+    `path_entropy_rate(lzgraph, sequences, base=2) -> float`
+
+Uses `walk_log_probability()` internally. AAPLZGraph ~2.5 bits/step, NaiveLZGraph ~3.5 bits/step.
+
````
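A per-step entropy rate like the one `path_entropy_rate` reports can be sketched as total negative log-probability divided by the total number of steps. The sketch below makes three simplifications, all of them assumptions:

- It works on hypothetical per-step transition probabilities rather than on an LZGraph.
- It pools steps across all sequences.
- It takes each walk's log-probability as the plain sum of its step log-probabilities, with no initial-state term.

The LZGraphs implementation built on `walk_log_probability()` may aggregate differently.

```python
import math

# Hypothetical per-step transition probabilities for three decomposed sequences.
# In LZGraphs these would come from the graph's edge weights; here they are made up.
WALKS = [
    [0.5, 0.25, 1.0],        # 3 steps
    [0.5, 0.5],              # 2 steps
    [0.125, 0.5, 0.5, 1.0],  # 4 steps
]


def path_entropy_rate_sketch(walks, base=2):
    """Pooled entropy rate: total -log(p) over all steps / total number of steps."""
    total_information = 0.0
    total_steps = 0
    for step_probs in walks:
        # Assumption: log P(walk) is the sum of per-step log-probabilities
        # (no separate initial-state probability is included).
        log_p_walk = sum(math.log(p, base) for p in step_probs)
        total_information += -log_p_walk
        total_steps += len(step_probs)
    return total_information / total_steps


print(f"Entropy rate: {path_entropy_rate_sketch(WALKS):.3f} bits/step")  # 1.111
```

Deterministic steps (probability 1.0) contribute 0 bits, so a more restricted repertoire yields a lower rate.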
````diff
+---
+
+## Transition-Level Divergence
+
+### transition_kl_divergence
+
+Transition-level KL divergence — compares the transition structure, not just node distributions.
+
+```python
+from LZGraphs import transition_kl_divergence
+
+kl = transition_kl_divergence(graph1, graph2)
+print(f"Transition KL: {kl:.4f}")
+```
+
+!!! note "Function Signature"
+    `transition_kl_divergence(lzgraph_p, lzgraph_q) -> float`
+
+Asymmetric, can be infinite. Use `transition_jsd` for a bounded alternative.
+
+### transition_jsd
+
+Transition-level Jensen-Shannon divergence — always finite, symmetric.
+
+```python
+from LZGraphs import transition_jsd
+
+jsd_t = transition_jsd(graph1, graph2)
+print(f"Transition JSD: {jsd_t:.4f}") # 0 to 1
+```
+
+!!! note "Function Signature"
+    `transition_jsd(lzgraph1, lzgraph2) -> float`
+
+Symmetric and bounded [0, 1]. Recommended for comparing repertoire transition structures.
+
+### transition_mutual_information_profile
+
+Position-specific mutual information along the CDR3 sequence.
+
+```python
+from LZGraphs import transition_mutual_information_profile
+
+tmip = transition_mutual_information_profile(graph)
+for pos in sorted(tmip):
+    print(f"Position {pos}: MI = {tmip[pos]:.3f} bits")
+```
+
+!!! note "Function Signature"
+    `transition_mutual_information_profile(lzgraph) -> dict`
+
+Returns `{position: mutual_information}`. Only works with positional graphs (AAPLZGraph, NDPLZGraph). Raises `MetricsError` for NaiveLZGraph.
+
````
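Two toy edge distributions show the properties claimed for `transition_kl_divergence` and `transition_jsd`. The KL variant is asymmetric and can be infinite, while the JSD variant is bounded. This sketch assumes each graph is summarized as a probability distribution over its edges (normalized edge counts). Whether LZGraphs weights transitions this way or per source node is not stated here. With base-2 logarithms the JSD lies in [0, 1] and reaches 1 only when the two graphs share no edges. The helper names and edges are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw edge counts into a probability distribution over edges."""
    total = sum(counts.values())
    return {edge: c / total for edge, c in counts.items()}


def edge_kl(p, q, base=2):
    """KL(P || Q); infinite when P puts mass on an edge that Q never uses."""
    total = 0.0
    for edge, p_e in p.items():
        if p_e == 0:
            continue
        q_e = q.get(edge, 0.0)
        if q_e == 0:
            return math.inf
        total += p_e * math.log(p_e / q_e, base)
    return total


def edge_jsd(p, q, base=2):
    """JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M averages P and Q."""
    support = set(p) | set(q)
    m = {e: 0.5 * (p.get(e, 0.0) + q.get(e, 0.0)) for e in support}
    # M is positive wherever P or Q is, so neither KL term can be infinite.
    return 0.5 * edge_kl(p, m, base) + 0.5 * edge_kl(q, m, base)


# Hypothetical edge counts; the ("S", "G") transition never occurs in the second graph.
p = normalize({("C", "A"): 5, ("A", "S"): 3, ("S", "G"): 2})
q = normalize({("C", "A"): 5, ("A", "S"): 5})

print("Transition KL(P||Q):", edge_kl(p, q))  # inf
print("Transition KL(Q||P):", edge_kl(q, p))
print(f"Transition JSD: {edge_jsd(p, q):.4f}")
```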
````diff
+---
+
+## Convenience
+
+### compare_repertoires
+
+All-in-one repertoire comparison returning a pandas Series of metrics.
+
+```python
+from LZGraphs import compare_repertoires
+
+result = compare_repertoires(graph1, graph2)
+print(result)
+```
+
+!!! note "Function Signature"
+    `compare_repertoires(graph1, graph2) -> pd.Series`
+
+Returns: `js_divergence`, `transition_jsd`, `cross_entropy_1_2`, `cross_entropy_2_1`, `kl_divergence_1_2`, `kl_divergence_2_1`, `node_entropy_1`, `node_entropy_2`, `edge_entropy_1`, `edge_entropy_2`, `transition_predictability_1`, `transition_predictability_2`, `shared_nodes`, `shared_edges`, `jaccard_nodes`, `jaccard_edges`.
+
+---
+
 ## See Also
 
 - [Tutorials: Diversity Metrics](../tutorials/diversity-metrics.md)
 - [How-To: Compare Repertoires](../how-to/repertoire-comparison.md)
 - [Concepts: Probability Model](../concepts/probability-model.md)
+- [Example: Information-Theoretic Analysis](https://github.com/MuteJester/LZGraphs/blob/master/Examples/Information-Theoretic%20Analysis.ipynb)
````
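Plain Python sets can illustrate the overlap fields in the `compare_repertoires` output: `shared_nodes`, `shared_edges`, `jaccard_nodes` and `jaccard_edges`. This sketch assumes the usual definitions, where "shared" is the size of the intersection and Jaccard is |A ∩ B| / |A ∪ B|. `overlap_metrics` and the node labels are hypothetical, not part of LZGraphs.

```python
def overlap_metrics(nodes_1, edges_1, nodes_2, edges_2):
    """Shared counts and Jaccard similarity for node and edge sets (assumed definitions)."""

    def jaccard(a, b):
        union = a | b
        # Two empty sets are treated as identical.
        return len(a & b) / len(union) if union else 1.0

    return {
        "shared_nodes": len(nodes_1 & nodes_2),
        "shared_edges": len(edges_1 & edges_2),
        "jaccard_nodes": jaccard(nodes_1, nodes_2),
        "jaccard_edges": jaccard(edges_1, edges_2),
    }


# Hypothetical positional subpattern nodes and edges for two small repertoires.
nodes_a = {"C_1", "A_2", "S_3", "G_4"}
edges_a = {("C_1", "A_2"), ("A_2", "S_3"), ("S_3", "G_4")}
nodes_b = {"C_1", "A_2", "S_3", "Q_4"}
edges_b = {("C_1", "A_2"), ("A_2", "S_3"), ("S_3", "Q_4")}

# shared_nodes=3, shared_edges=2, jaccard_nodes=0.6, jaccard_edges=0.5
print(overlap_metrics(nodes_a, edges_a, nodes_b, edges_b))
```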

docs/examples/index.md

Lines changed: 12 additions & 0 deletions

````diff
@@ -54,6 +54,17 @@ Use NaiveLZGraph for consistent feature vectors and cross-repertoire analysis.
 [:material-notebook: View Notebook](https://github.com/MuteJester/LZGraphs/blob/master/Examples/NaiveLZGraph%20Example.ipynb){ .md-button }
 </div>
 
+<div class="example-card" markdown>
+
+### Information-Theoretic Analysis
+
+**Advanced repertoire characterization**
+
+Transition predictability, compression ratio, path entropy rate, transition JSD, mutual information profiles, and repertoire fingerprinting.
+
+[:material-notebook: View Notebook](https://github.com/MuteJester/LZGraphs/blob/master/Examples/Information-Theoretic%20Analysis.ipynb){ .md-button }
+</div>
+
 </div>
 
 ## Running Notebooks Locally
@@ -148,6 +159,7 @@ print(f"V: {v_gene}, J: {j_gene}")
 | NDPLZGraph | Nucleotide encoding, double positions, gene analysis |
 | Metrics | K-diversity, entropy, perplexity, JS divergence |
 | NaiveLZGraph | Fixed dictionaries, eigenvector centrality, ML features |
+| Information-Theoretic Analysis | Transition predictability, compression ratio, path entropy, TMIP, transition JSD |
 
 ## Next Steps
 
````

docs/how-to/repertoire-comparison.md

Lines changed: 54 additions & 11 deletions

````diff
@@ -271,9 +271,49 @@ print("Top V gene differences:")
 print(v_comp.sort_values('diff', key=abs, ascending=False).head())
 ```
 
+## Transition-Level Comparison
+
+Standard JSD compares which subpatterns are used. **Transition JSD** compares how they connect, detecting structural differences even when subpattern frequencies are similar.
+
+```python
+from LZGraphs import transition_jsd, transition_kl_divergence
+
+# Symmetric, bounded [0, 1] — recommended for most use cases
+jsd_t = transition_jsd(graph1, graph2)
+print(f"Transition JSD: {jsd_t:.4f}")
+
+# Asymmetric — use when you have a reference model
+kl_t = transition_kl_divergence(graph1, graph2) # Can be infinite
+print(f"Transition KL(1||2): {kl_t}")
+```
+
+!!! tip "When to use transition-level metrics"
+    Use `transition_jsd` instead of `jensen_shannon_divergence` when you suspect two repertoires use the same subpatterns but connect them differently, e.g. after clonal expansion creates dominant transition paths without changing overall subpattern frequencies.
+
+## Quick Comparison with compare_repertoires
+
+The `compare_repertoires` function computes all relevant metrics in one call:
+
+```python
+from LZGraphs import compare_repertoires
+
+result = compare_repertoires(graph1, graph2)
+print(result)
+# Returns a pandas Series with 16 metrics including:
+# js_divergence, transition_jsd, cross_entropy, kl_divergence,
+# node/edge entropy, transition_predictability, Jaccard similarity
+```
+
 ## Complete Comparison Pipeline
 
 ```python
+from LZGraphs import (
+    AAPLZGraph, K1000_Diversity,
+    node_entropy, jensen_shannon_divergence,
+    transition_jsd, transition_predictability,
+    compare_repertoires,
+)
+
 def full_repertoire_comparison(data1, data2, name1="Rep1", name2="Rep2"):
     """Complete comparison of two repertoires."""
 
@@ -282,6 +322,9 @@ def full_repertoire_comparison(data1, data2, name1="Rep1", name2="Rep2"):
     graph1 = AAPLZGraph(data1, verbose=False)
     graph2 = AAPLZGraph(data2, verbose=False)
 
+    # Quick comparison (all metrics at once)
+    result = compare_repertoires(graph1, graph2)
+
     # Basic stats
     print(f"\n{'='*50}")
     print("BASIC STATISTICS")
@@ -293,8 +336,15 @@ def full_repertoire_comparison(data1, data2, name1="Rep1", name2="Rep2"):
     print(f"\n{'='*50}")
     print("DIVERGENCE")
     print(f"{'='*50}")
-    jsd = jensen_shannon_divergence(graph1, graph2)
-    print(f"Jensen-Shannon Divergence: {jsd:.4f}")
+    print(f"Node-level JSD: {result['js_divergence']:.4f}")
+    print(f"Transition-level JSD: {result['transition_jsd']:.4f}")
+
+    # Predictability
+    print(f"\n{'='*50}")
+    print("TRANSITION PREDICTABILITY")
+    print(f"{'='*50}")
+    print(f"{name1}: {result['transition_predictability_1']:.3f}")
+    print(f"{name2}: {result['transition_predictability_2']:.3f}")
 
     # Diversity
     print(f"\n{'='*50}")
@@ -307,17 +357,10 @@ def full_repertoire_comparison(data1, data2, name1="Rep1", name2="Rep2"):
     print(f"{name1} K1000: {k1:.1f}")
     print(f"{name2} K1000: {k2:.1f}")
 
-    # Entropy
-    print(f"\n{'='*50}")
-    print("ENTROPY")
-    print(f"{'='*50}")
-    print(f"{name1} node entropy: {node_entropy(graph1):.2f}")
-    print(f"{name2} node entropy: {node_entropy(graph2):.2f}")
-
-    return graph1, graph2, jsd
+    return graph1, graph2, result
 
 # Run comparison
-g1, g2, jsd = full_repertoire_comparison(data1, data2, "Healthy", "Disease")
+g1, g2, metrics = full_repertoire_comparison(data1, data2, "Healthy", "Disease")
 ```
 
 ## Next Steps
````
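The scenario in the "When to use transition-level metrics" tip can be reproduced with toy data: two repertoires that use the same subpatterns but connect them differently. This self-contained sketch does not call LZGraphs. It assumes that node-level JSD compares subpattern frequency distributions and transition-level JSD compares normalized edge-count distributions, both in base 2. The two hypothetical repertoires use every subpattern equally often, so only the transition-level value is non-zero.

```python
import math
from collections import Counter


def count_jsd(counts_p, counts_q, base=2):
    """Jensen-Shannon divergence between two count tables (base 2 gives a value in [0, 1])."""
    total_p, total_q = sum(counts_p.values()), sum(counts_q.values())
    result = 0.0
    for key in set(counts_p) | set(counts_q):
        p = counts_p.get(key, 0) / total_p
        q = counts_q.get(key, 0) / total_q
        m = 0.5 * (p + q)
        # 0 * log(0 / m) is taken as 0, so only non-zero terms contribute.
        if p > 0:
            result += 0.5 * p * math.log(p / m, base)
        if q > 0:
            result += 0.5 * q * math.log(q / m, base)
    return result


def node_and_edge_counts(walks):
    """Count subpattern occurrences (nodes) and consecutive pairs (edges) over all walks."""
    nodes = Counter(sub for walk in walks for sub in walk)
    edges = Counter(pair for walk in walks for pair in zip(walk, walk[1:]))
    return nodes, edges


# Hypothetical subpattern walks: both repertoires use each subpattern equally often,
# but the second one connects them in a different order.
rep_1 = [["CAS", "SL", "GE"], ["CAS", "SL", "GE"]]
rep_2 = [["CAS", "SL", "GE"], ["SL", "CAS", "GE"]]

nodes_1, edges_1 = node_and_edge_counts(rep_1)
nodes_2, edges_2 = node_and_edge_counts(rep_2)

print(f"Node-level JSD:       {count_jsd(nodes_1, nodes_2):.4f}")  # 0.0000
print(f"Transition-level JSD: {count_jsd(edges_1, edges_2):.4f}")
```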

docs/resources/changelog.md

Lines changed: 37 additions & 6 deletions

````diff
@@ -6,13 +6,38 @@ This project follows [Semantic Versioning](https://semver.org/).
 
 ---
 
-## [Unreleased]
+## [2.1.0] - 2026
 
 ### Added
+- **Information-theoretic metrics** for advanced repertoire characterization:
+- `transition_predictability` — measures transition determinism, stable across sample sizes (~0.60 for AAPLZGraph)
+- `graph_compression_ratio` — quantifies path sharing efficiency (edge reuse)
+- `repertoire_compressibility_index` — compression-framed alias for transition predictability
+- `path_entropy_rate` — average bits per subpattern step via Monte Carlo
+- `transition_kl_divergence` — transition-level KL divergence between two graphs
+- `transition_jsd` — symmetric, bounded transition-level Jensen-Shannon divergence
+- `transition_mutual_information_profile` — position-specific MI along the CDR3 sequence
+- `compare_repertoires` now includes `transition_jsd`, `transition_predictability_1`, and `transition_predictability_2`
+- New example notebook: **Information-Theoretic Analysis** with full walkthrough and visualizations
+
+---
+
+## [2.0.0] - 2026
+
+### Changed
+- **Breaking**: All internal modules renamed to snake_case (graphs/, metrics/, utilities/, mixins/, etc.)
+- Complete `EdgeData` refactor — raw counts as source of truth
+- `graph_union` rewritten to merge via `EdgeData.merge()` + `recalculate()`
+- Walk probability model consolidated into LZGraphBase
+- Laplace smoothing via `smoothing_alpha` parameter
+
+### Added
+- `remove_sequence()` method on LZGraphBase
+- `recalculate()` method to recompute all derived state from raw counts
+- `to_networkx()` for external tool compatibility
+- `walk_log_probability` on all graph types
 - Professional documentation with MkDocs Material theme
-- Comprehensive tutorials and how-to guides
-- API reference documentation
-- FAQ and troubleshooting guide
+- Comprehensive tutorials, how-to guides, and API reference
 
 ---
 
@@ -67,9 +92,15 @@ For the complete version history, see the [GitHub Releases](https://github.com/M
 
 ## Migration Guides
 
-### Upgrading to 1.1.x
+### Upgrading to 2.1.x
+
+No breaking changes from 2.0. New information-theoretic metrics are additive.
+
+### Upgrading from 1.x to 2.0
 
-No breaking changes. New features are additive.
+- All internal module paths changed to snake_case (e.g., `LZGraphs.graphs.amino_acid_positional`)
+- Edge data now uses `EdgeData` objects: access via `graph[a][b]['data'].weight`
+- Public class/function names unchanged — imports like `from LZGraphs import AAPLZGraph` still work
 
 ### Upgrading from Pre-1.0
 
````
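The 2.0.0 entry mentions Laplace smoothing via `smoothing_alpha`. Standard additive smoothing estimates a transition probability as `(c(a, b) + α) / (Σ_b' c(a, b') + α·K)`, where `K` is the number of candidate successors of node `a`. The sketch below applies that formula to one node's outgoing counts. The changelog does not say which candidate set LZGraphs smooths over, so the `vocabulary` argument and the function name are hypothetical.

```python
def smoothed_transition_probs(successor_counts, alpha=1.0, vocabulary=None):
    """Additive (Laplace) smoothing of one node's outgoing transition counts.

    successor_counts: mapping successor -> observed count.
    vocabulary: candidate successors to smooth over (hypothetical; defaults to
    the observed successors only).
    """
    candidates = set(successor_counts)
    if vocabulary is not None:
        candidates |= set(vocabulary)
    denominator = sum(successor_counts.values()) + alpha * len(candidates)
    return {s: (successor_counts.get(s, 0) + alpha) / denominator for s in candidates}


counts = {"A_2": 8, "S_2": 2}
# alpha=0 recovers the raw maximum-likelihood estimates (0.8 and 0.2).
print(smoothed_transition_probs(counts, alpha=0.0))
# alpha=1 over a three-successor vocabulary gives the unseen "T_2" a non-zero probability.
print(smoothed_transition_probs(counts, alpha=1.0, vocabulary={"A_2", "S_2", "T_2"}))
```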
