Releases: ksanyok/TextHumanize
TextHumanize v0.28.4
Added
- Explainable AI audit — added detect_ai_explain() with metric contributions, highlighted spans, sentence-level reports, mixed-content shares, calibration, confidence intervals, and suggested actions.
- Unified watermark forensics — added watermark_report(), watermark_report_batch(), clean_safe(), and neutralise_aggressive() for Unicode, homoglyph, invisible-character, and statistical watermark risk reporting.
- Promopilot-ready audit API — added audit_report() plus CLI/reporting paths for AI and watermark audit flows.
- Strict and minimal humanization controls — added quality_gate="strict", minimal=True, --minimal, --only-flagged, and intent aliases for seo_article, landing_page, product_description, support_reply, academic, legal, and social_post.
- Humanize explain metadata — humanize() now returns lightweight metrics_after["humanize_explain"] with top change reasons, remaining risks, sentence report, score delta, and quality summary.
- Short commercial copy golden set — added regression coverage for landing, product, and support-copy flows used by Promopilot-style integrations.
Changed
- GitHub CI stability — Python 3.12 now uses one parallel test run and keeps coverage as a local release check, avoiding hosted-runner coverage hangs while preserving full matrix validation.
- Release verification baseline — local release checks now include full pytest (2105 passed), mypy, ruff, version sync, and coverage (80.09%).
Fixed
- NumPy dtype stability in training v2 — NumpyMLP.forward() preserves float32 for sigmoid/tanh activations, fixing Python 3.12 mypy failures in CI.
- Neural inference warnings — stabilized NumPy matmul paths in neural engine/LM code and covered the cleanup with regression tests.
v0.28.1
Full Changelog: v0.28.0...v0.28.1
v0.28.0
Full Changelog: v0.27.1...v0.28.0
v0.25.0
What's Changed
Bug Fixes
- CRITICAL: Fixed naturalizer.py regex crash for RU/UK text (~50 patterns with non-capturing groups + backreferences). The entire naturalization stage was silently skipped.
- Added thread-safety locks to _ai_cache and _AI_WORDS for multi-threaded usage.
- Added division-by-zero guards in detector metric calculations.
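The regex crash described above can be reproduced in isolation. The following is a minimal sketch, not code from naturalizer.py: the patterns and the `try_compile` helper are hypothetical, chosen only to show why a backreference to a non-capturing group fails to compile and how a stage that swallows the error ends up doing nothing.

```python
import re

# Hypothetical pattern (not from naturalizer.py): the group is non-capturing,
# so there is no group 1 for the backreference \1 to point at.
BROKEN = r"(?:very|really)\s+\1"
# Same pattern with a capturing group, which makes \1 valid.
FIXED = r"(very|really)\s+\1"


def try_compile(pattern: str):
    """Return a compiled pattern, or None if the regex is invalid."""
    try:
        return re.compile(pattern, re.IGNORECASE)
    except re.error:
        # Swallowing the error like this is how a stage can be skipped silently.
        return None


broken = try_compile(BROKEN)   # None: compilation raises re.error
fixed = try_compile(FIXED)     # a usable compiled pattern
collapsed = fixed.sub(r"\1", "It was very very good.")
```

Compiling every pattern in a unit test, rather than at first use, is one way to surface this class of error before release.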
Cleanup
- Removed dead module tokenizer.py (replaced by sentence_split.py).
- Removed 14 one-off diagnostic scripts, 4 outdated competitive analysis docs, debug artifacts.
- Synced PHP and JS package versions to 0.25.0.
Documentation
- Corrected the pipeline stage count from 17 to 20 across 15+ documentation files.
- Corrected test counts, LOC claims, speed benchmarks for consistency.
- Fixed CHANGELOG date chronology.
CI
- Raised per-test timeout from 120s to 300s to prevent false failures on slow CI runners.
v0.24.0 — Deep Humanization for EN/RU/UK
v0.24.0: deep humanization improvements for EN/RU/UK
Neural detector:
- Per-language feature normalization (RU/UK char_entropy baseline 4.8-4.9 vs 4.3 EN)
- Expanded RU/UK conjunctions, transitions, AI word sets for MLP features 27/33/34
Naturalizer:
- Transition-phrase deletion (22 EN / 23 RU / 23 UK patterns)
- Em-dash injection (_comma_to_dash + _insert_dash_aside)
- Aggressive burstiness (threshold 25→16-20, fragment insertion strategy)
- Light perplexity boost (rhetorical questions for formal profiles)
- Paragraph splitting (5+ sentence paragraphs)
- +30 EN word simplification entries
Pipeline:
- Intensity cap raised 70→85, multipliers 1.15→1.20/1.1→1.15
- Stage 13a: final entropy re-injection post-grammar/coherence
Results (local backend, 3-sentence AI text, intensity=60):
- EN: 0.920 → 0.372 (human)
- RU: 0.880 → 0.390 (human)
- UK: 0.840 → 0.351 (human)
All 1984 tests pass.
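To illustrate why per-language normalization of char_entropy matters, here is a rough sketch. It assumes the feature is Shannon entropy over characters and that normalization is a simple division by a language baseline; both are assumptions, as are the function names and the exact RU/UK values picked from the quoted 4.8-4.9 range.

```python
import math
from collections import Counter

# Baselines quoted in the release notes. Assigning 4.8 to RU and 4.9 to UK
# is an assumption; the notes only give the range.
BASELINE_ENTROPY = {"en": 4.3, "ru": 4.8, "uk": 4.9}


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_entropy(text: str, lang: str) -> float:
    """Entropy relative to the language baseline (about 1.0 means typical)."""
    return char_entropy(text) / BASELINE_ENTROPY[lang]
```

Without such a step, Cyrillic text would score systematically higher than English on the raw feature simply because its typical character entropy is higher.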
v0.23.0 - OSS LLM Backend, PyPI Publication
What's New
Backend Parameter
- New backend parameter: local (default), oss, openai, auto
- OSS backend: Free AI humanization via amd/gpt-oss-120b-chatbot on HuggingFace Spaces
- OpenAI backend: Optional paid backend using GPT-4o-mini
- Auto mode: Tries OSS then OpenAI then local fallback
Install
pip install texthumanize==0.23.0
Usage
from texthumanize import humanize
result = humanize('AI text', backend='oss')
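The auto mode described above is a fallback chain. The sketch below shows the general pattern only; `run_with_fallback` and the three stand-in backends are hypothetical and do not reflect how the library wires its backends internally.

```python
from typing import Callable, Optional, Sequence, Tuple

Backend = Tuple[str, Callable[[str], Optional[str]]]


def run_with_fallback(text: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (name, output) of the first success."""
    for name, call in backends:
        try:
            result = call(text)
        except Exception:
            continue  # e.g. network error or rate limit: move to the next backend
        if result:
            return name, result
    raise RuntimeError("no backend produced output")


# Stand-in backends for illustration only.
def fake_oss(text: str) -> Optional[str]:
    raise ConnectionError("remote model is rate-limited")


def fake_openai(text: str) -> Optional[str]:
    return None  # e.g. no API key configured


def fake_local(text: str) -> Optional[str]:
    return text.replace("utilize", "use")


chain = [("oss", fake_oss), ("openai", fake_openai), ("local", fake_local)]
used, output = run_with_fallback("We utilize rules.", chain)
```

Placing the local backend last means the chain always has an offline answer available, provided the local rules return non-empty text.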
Full Changelog: v0.15.0...v0.23.0
v0.15.0 — Full Audit Closure: 9 New Modules
What's New
9 New Core Modules
- ai_backend — Three-tier AI backend: OpenAI API → OSS Gradio model (rate-limited) → built-in rules. New humanize_ai() function.
- pos_tagger — Rule-based POS tagger for EN (500+ exceptions), RU/UK (200+), DE (300+). Universal tagset.
- cjk_segmenter — Chinese BiMM (2504 entries), Japanese character-type, Korean space+particle segmentation.
- syntax_rewriter — 8 sentence-level transforms (active↔passive, clause inversion, enumeration reorder, adverb migration). 150+ irregular verbs.
- statistical_detector — 35-feature ML classifier for AI text detection. Integrated into detect_ai() with 60/40 weighted merge.
- word_lm — Word-level unigram/bigram language model for 14 languages. Perplexity, burstiness, naturalness scoring.
- collocation_engine — PMI-based collocation scoring for context-aware synonym selection. EN ~130, RU ~30, DE ~20 collocations.
- fingerprint_randomizer — Anti-fingerprint diversification for output variety.
- benchmark_suite — 6-dimension automated quality benchmarking.
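For readers unfamiliar with PMI-based collocation scoring, the idea can be sketched as follows. This is a generic textbook formulation over adjacent word pairs, not the collocation_engine implementation; `make_pmi` and `pick_synonym` are hypothetical names, and the toy corpus is invented for the example.

```python
import math
from collections import Counter
from typing import Callable, List


def make_pmi(tokens: List[str]) -> Callable[[str, str], float]:
    """Build a PMI scorer for adjacent word pairs from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    def pmi(a: str, b: str) -> float:
        joint = bigrams[(a, b)]
        if joint == 0:
            return float("-inf")  # pair never observed together
        p_ab = joint / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log2(p_ab / (p_a * p_b))

    return pmi


def pick_synonym(prev_word: str, candidates: List[str],
                 pmi: Callable[[str, str], float]) -> str:
    """Choose the candidate that collocates best with the preceding word."""
    return max(candidates, key=lambda c: pmi(prev_word, c))


corpus = ("strong tea strong tea powerful engine "
          "strong coffee powerful engine").split()
pmi = make_pmi(corpus)
best = pick_synonym("strong", ["tea", "engine"], pmi)
```

In this toy corpus "strong tea" occurs and "strong engine" does not, so the scorer prefers "tea" after "strong".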
Pipeline & Detection
- Pipeline expanded to 17 stages (added syntax rewriting + anti-fingerprint diversification)
- detect_ai() now returns combined_score (statistical + heuristic)
- Fixed NO-OP _reduce_adjacent_repeats() — now actually removes repetitions
Tests
- 1,696 tests — 92 new, all passing (100% pass rate)
v0.14.0
v0.14.0 -- Reliability, Analysis Tools & New APIs
New API Functions
- humanize_sentences() -- per-sentence AI scoring with graduated intensity; only rewrites sentences above a configurable AI probability threshold
- humanize_variants() -- generates 1-10 humanization variants with different random seeds, sorted by quality
- humanize_stream() -- generator that yields humanized text chunk-by-chunk with progress tracking
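The threshold and streaming behaviour described above can be pictured with a small sketch. Nothing here calls the library: `rewrite_flagged`, `stream_chunks`, the toy scorer, and the toy rewriter are all hypothetical stand-ins, and the real functions may have different signatures and return types.

```python
from typing import Callable, Iterator, List, Tuple


def rewrite_flagged(sentences: List[str],
                    score: Callable[[str], float],
                    rewrite: Callable[[str], str],
                    threshold: float = 0.5) -> List[str]:
    """Rewrite only sentences whose AI score exceeds the threshold."""
    return [rewrite(s) if score(s) > threshold else s for s in sentences]


def stream_chunks(sentences: List[str],
                  rewrite: Callable[[str], str]) -> Iterator[Tuple[str, float]]:
    """Yield (rewritten_chunk, progress) pairs, progress running up to 1.0."""
    total = len(sentences)
    for i, sentence in enumerate(sentences, start=1):
        yield rewrite(sentence), i / total


# Toy stand-ins for a detector and a rewriter.
def toy_score(sentence: str) -> float:
    return 0.9 if "furthermore" in sentence.lower() else 0.1


def toy_rewrite(sentence: str) -> str:
    return sentence.replace("Furthermore, ", "Also, ")


text = ["Furthermore, it is important.", "I like tea."]
selective = rewrite_flagged(text, toy_score, toy_rewrite)
streamed = list(stream_chunks(text, toy_rewrite))
```

Leaving low-scoring sentences untouched keeps edits minimal, which is the main point of a per-sentence threshold.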
New Analysis Modules (zero-dependency, offline)
- perplexity_v2 -- character-level trigram cross-entropy model with cross_entropy() and perplexity_score() returning naturalness score (0-100) and verdict
- dict_trainer -- corpus analysis for custom dictionary building with train_from_corpus() and export_custom_dict()
- plagiarism -- offline originality detection via n-gram fingerprinting with check_originality() and compare_originality()
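As an illustration of offline originality detection via n-gram fingerprinting, here is a compact sketch. It assumes word trigrams hashed with SHA-256 and a simple overlap ratio; the function names and the scoring formula are assumptions and are not taken from the plagiarism module.

```python
import hashlib
import re
from typing import Set


def fingerprints(text: str, n: int = 3) -> Set[str]:
    """Hash every word n-gram of the text into a short fingerprint."""
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha256(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def originality(candidate: str, reference: str, n: int = 3) -> float:
    """Share of the candidate's n-grams that do not occur in the reference."""
    cand = fingerprints(candidate, n)
    if not cand:
        return 1.0  # too short to fingerprint; treat as original
    ref = fingerprints(reference, n)
    return 1.0 - len(cand & ref) / len(cand)
```

For example, `originality("the quick brown fox", "the quick brown dog")` shares one of two trigrams with the reference and therefore scores 0.5.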
Pipeline Improvements
- Error isolation -- each processing stage wrapped in _safe_stage() with try/except; failing stages are skipped gracefully instead of crashing the pipeline
- Partial rollback -- pipeline records checkpoints after each stage; on validation failure, rolls back stage-by-stage to find the last valid state
- Pipeline profiling -- stage_timings dict and total_time included in metrics_after for performance analysis
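Error isolation and profiling of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the library's _safe_stage(): `run_pipeline`, the `skipped` key, and the toy stages are hypothetical, while `stage_timings` and `total_time` mirror the names mentioned above.

```python
import time
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(text: str, stages: List[Stage]) -> Tuple[str, Dict]:
    """Run stages in order; a failing stage is skipped and the text is kept."""
    timings: Dict[str, float] = {}
    skipped: List[str] = []
    started = time.perf_counter()
    for name, stage in stages:
        stage_start = time.perf_counter()
        try:
            text = stage(text)
        except Exception:
            skipped.append(name)  # keep the previous text and carry on
        timings[name] = time.perf_counter() - stage_start
    metrics = {
        "stage_timings": timings,
        "total_time": time.perf_counter() - started,
        "skipped": skipped,  # hypothetical key, added for the example
    }
    return text, metrics


def failing_stage(text: str) -> str:
    raise ValueError("simulated stage failure")


result, metrics = run_pipeline(
    "  hello  ",
    [("strip", str.strip), ("broken", failing_stage), ("upper", str.upper)],
)
```

Because the text is only reassigned when a stage succeeds, each stage boundary also acts as a natural checkpoint.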
Bug Fixes & Code Quality
- Fixed adversarial_calibrate intensity parameter (float 0-1 changed to int 0-100 to match API)
- Added input sanitization: TypeError for non-str, ValueError for >500K chars, early return for empty text
- Thread-safe lazy loading with double-checked locking on all module loaders
- Instance-level plugins preventing cross-instance interference
- Fixed humanize_sentences crash (detect_ai_sentences returns list, not dict)
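Two of the fixes above, input sanitization and double-checked locking, follow well-known patterns. The sketch below shows them under stated assumptions: `MAX_CHARS` takes ">500K chars" to mean 500,000 characters, and `sanitize_input`, `get_dictionary`, and `load_log` are hypothetical names, not the library's.

```python
import threading
from typing import Dict, List, Optional

MAX_CHARS = 500_000  # assumption: "500K" read as 500,000 characters


def sanitize_input(text: object) -> str:
    """Reject non-strings and oversized input before any processing."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text longer than {MAX_CHARS} characters")
    return text


def process(text: object) -> str:
    """Validate, return early on empty text, otherwise do the (stand-in) work."""
    checked = sanitize_input(text)
    if not checked.strip():
        return checked  # early return for empty text
    return checked.strip()


_lock = threading.Lock()
_dictionary: Optional[Dict[str, str]] = None
load_log: List[str] = []  # records how many times the loader actually ran


def get_dictionary() -> Dict[str, str]:
    """Lazy loader using double-checked locking."""
    global _dictionary
    if _dictionary is None:          # first check, without the lock
        with _lock:
            if _dictionary is None:  # second check, under the lock
                load_log.append("loaded")
                _dictionary = {"utilize": "use"}
    return _dictionary
```

The unlocked first check keeps the common path cheap once the resource is loaded, and the second check prevents two threads from loading it twice.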
Tests
- 1,604 tests -- up from 1,560 (44 new tests for all v0.14.0 features)
- 100% pass rate
v0.13.0 — 16-Stage Pipeline, Grammar & Tone & Readability & Coherence
TextHumanize v0.13.0
4 new pipeline stages (12 to 16):
- Tone harmonization — match text tone to profile (academic/blog/seo/casual)
- Readability optimization — split complex sentences, join short ones
- Grammar correction — fix doubled words, spacing, typos (9 languages)
- Coherence repair — transitions between paragraphs, diversify openings
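The grammar stage's doubled-word and spacing fixes lend themselves to a short regex sketch. This is an illustration under assumptions, not the shipped rules: the patterns and function names are hypothetical, and a real implementation would need exceptions for legitimate repeats such as "had had".

```python
import re

# A word followed by one or more repeats of itself, case-insensitively.
_DOUBLED = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)
_MULTI_SPACE = re.compile(r"[ \t]{2,}")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")


def fix_doubled_words(text: str) -> str:
    """Collapse accidental repeats such as 'is is' to a single word."""
    return _DOUBLED.sub(r"\1", text)


def fix_spacing(text: str) -> str:
    """Collapse runs of spaces and remove spaces before punctuation."""
    text = _MULTI_SPACE.sub(" ", text)
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


def tidy(text: str) -> str:
    return fix_spacing(fix_doubled_words(text))
```

Since `\w` is Unicode-aware for Python `str` patterns, the same rule applies to Cyrillic and other alphabetic scripts without changes.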
Dictionary expansion (~3,600 new entries):
- EN: +475 | RU: +430 | UK: +337
- DE/ES/FR/IT/PL/PT: ~235 each
- AR/ZH/JA/KO/TR: ~205 each
- Total: ~13,800 entries across 14 languages
Tests: 1,560 (all passing)
Full changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
v0.12.0 — 14 Languages, Placeholder Safety, Watermark Pipeline
What's New
5 New Languages (14 total)
- Arabic (ar) — 81 bureaucratic, 80 synonyms, 49 AI connectors, 47 abbreviations
- Chinese Simplified (zh) — 80 bureaucratic, 80 synonyms, 36 AI connectors
- Japanese (ja) — 60+ per category, keigo to casual register replacements
- Korean (ko) — 60+ per category, honorific to casual register
- Turkish (tr) — 60+ per category, Ottoman to modern Turkish
Critical Bug Fixes
- Placeholder safety — all 6 processing modules now skip placeholder tokens; no more leaked placeholders in output
- 3-pass restore() — exact match, case-insensitive, orphan cleanup
- HTML block protection — ul, ol, table, pre, blockquote preserved as single segments
- Bare domain protection — site.com.ua, portal.kh.ua, example.co.uk etc.
- Homoglyph fix — removed Cyrillic characters from special homoglyphs table (was corrupting all Cyrillic text)
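The 3-pass restore() listed above can be sketched as follows. The placeholder format `__TH_<n>__`, the `restore` signature, and the mapping are all assumptions made for this example; the library's actual token format is not given in these notes.

```python
import re
from typing import Dict

# Hypothetical placeholder format used only in this sketch.
TOKEN_RE = re.compile(r"__TH_\d+__", re.IGNORECASE)


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put protected fragments back using three passes."""
    # Pass 1: exact match.
    for token, original in mapping.items():
        text = text.replace(token, original)
    # Pass 2: case-insensitive match, for tokens whose case a stage changed.
    lowered = {token.lower(): original for token, original in mapping.items()}
    text = TOKEN_RE.sub(
        lambda m: lowered.get(m.group(0).lower(), m.group(0)), text
    )
    # Pass 3: orphan cleanup, dropping tokens that have no known original.
    text = TOKEN_RE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()


mapping = {"__TH_0__": "https://example.co.uk", "__TH_1__": "site.com.ua"}
restored = restore("Visit __TH_0__ or __th_1__ today. __TH_7__", mapping)
```

Using a function as the replacement in pass 2 avoids backslash or group-reference surprises if an original fragment happens to contain regex escape sequences.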
Pipeline Improvements
- Watermark cleaning — automatic first stage (12 stages total), removes zero-width chars, homoglyphs, invisible Unicode
- Language detection — Arabic/CJK/Turkish script detection added
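A watermark-cleaning stage of this kind might look roughly like the sketch below. The character set, the tiny homoglyph table, and the rule of only touching words that already contain Latin letters are assumptions for illustration; the last one is included to show how a cleaner can avoid rewriting genuine Cyrillic text.

```python
import unicodedata

# Common zero-width / invisible code points. Note that U+200C and U+200D are
# legitimate in some scripts and emoji, so real cleaners need more care.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Tiny illustrative table: Cyrillic look-alikes mapped to Latin letters.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
}


def _is_latin(ch: str) -> bool:
    return "LATIN" in unicodedata.name(ch, "")


def fix_word(word: str) -> str:
    """Replace look-alikes only inside words that already contain Latin letters."""
    if not any(_is_latin(ch) for ch in word):
        return word  # e.g. a genuinely Cyrillic word stays untouched
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word)


def clean_watermarks(text: str) -> str:
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return " ".join(fix_word(word) for word in text.split(" "))
```

Restricting replacement to mixed-script words is a heuristic: it leaves pure Cyrillic text alone but can still misfire on words that legitimately mix scripts.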
Tests
- 1,509 tests passed (54 new)