Releases: ksanyok/TextHumanize

TextHumanize v0.28.4

31 May 23:17

Added

  • Explainable AI audit — added detect_ai_explain() with metric contributions, highlighted spans, sentence-level reports, mixed-content shares, calibration, confidence intervals, and suggested actions.
  • Unified watermark forensics — added watermark_report(), watermark_report_batch(), clean_safe(), and neutralise_aggressive() for Unicode, homoglyph, invisible-character, and statistical watermark risk reporting.
  • Promopilot-ready audit API — added audit_report() plus CLI/reporting paths for AI and watermark audit flows.
  • Strict and minimal humanization controls — added quality_gate="strict", minimal=True, --minimal, --only-flagged, and intent aliases for seo_article, landing_page, product_description, support_reply, academic, legal, and social_post.
  • Humanize explain metadata: humanize() now returns lightweight metrics_after["humanize_explain"] with top change reasons, remaining risks, sentence report, score delta, and quality summary.
  • Short commercial copy golden set — added regression coverage for landing, product, and support-copy flows used by Promopilot-style integrations.

Changed

  • GitHub CI stability — Python 3.12 now uses one parallel test run and keeps coverage as a local release check, avoiding hosted-runner coverage hangs while preserving full matrix validation.
  • Release verification baseline — local release checks now include full pytest (2105 passed), mypy, ruff, version sync, and coverage (80.09%).

Fixed

  • NumPy dtype stability in training v2: NumpyMLP.forward() preserves float32 for sigmoid/tanh activations, fixing Python 3.12 mypy failures in CI.
  • Neural inference warnings — stabilized NumPy matmul paths in neural engine/LM code and covered the cleanup with regression tests.

v0.28.1

05 Apr 12:08

Full Changelog: v0.28.0...v0.28.1

v0.28.0

04 Apr 22:02

Full Changelog: v0.27.1...v0.28.0

v0.25.0

02 Mar 00:14

What's Changed

Bug Fixes

  • CRITICAL: Fixed a regex crash in naturalizer.py for RU/UK text. About 50 patterns combined non-capturing groups with backreferences, and because of the crash the entire naturalization stage was silently skipped.
  • Added thread-safety locks to _ai_cache and _AI_WORDS for multi-threaded usage.
  • Added division-by-zero guards in detector metric calculations.
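
The regex failure described above is easy to reproduce with the standard `re` module: a backreference such as `\1` needs a capturing group, so a pattern that contains only non-capturing groups fails at compile time. The sketch below is not the project's code. `compile_patterns` is a hypothetical helper, and the two patterns are made-up examples. It shows one way to collect and report invalid patterns so that a broad `except` cannot hide them.

```python
import re

def compile_patterns(patterns):
    """Compile each pattern and return (compiled, errors) instead of hiding failures."""
    compiled, errors = [], []
    for raw in patterns:
        try:
            compiled.append(re.compile(raw, re.IGNORECASE))
        except re.error as exc:
            # Keep the offending pattern and the reason so the caller can log or fail loudly.
            errors.append((raw, str(exc)))
    return compiled, errors

# Hypothetical patterns, for illustration only.
patterns = [
    r"(?:очень|дуже)\s+\1",  # broken: \1 has no capturing group to refer to
    r"(очень|дуже)\s+\1",    # fixed: the group is capturing
]
compiled, errors = compile_patterns(patterns)

for raw, reason in errors:
    print(f"invalid pattern {raw!r}: {reason}")
```

Compiling every pattern once at import time, or in a unit test, surfaces this class of bug before it reaches a processing stage.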

Cleanup

  • Removed dead module tokenizer.py (replaced by sentence_split.py).
  • Removed 14 one-off diagnostic scripts, 4 outdated competitive analysis docs, and debug artifacts.
  • Synced PHP and JS package versions to 0.25.0.

Documentation

  • Corrected the pipeline stage count from 17 to 20 across 15+ documentation files.
  • Corrected test counts, LOC claims, and speed benchmarks for consistency.
  • Fixed CHANGELOG date chronology.

CI

  • Raised per-test timeout from 120s to 300s to prevent false failures on slow CI runners.

v0.24.0 — Deep Humanization for EN/RU/UK

01 Mar 00:32

v0.24.0: deep humanization improvements for EN/RU/UK

Neural detector:
- Per-language feature normalization (RU/UK char_entropy baseline 4.8-4.9 vs 4.3 EN)
- Expanded RU/UK conjunctions, transitions, AI word sets for MLP features 27/33/34

Naturalizer:
- Transition-phrase deletion (22 EN / 23 RU / 23 UK patterns)
- Em-dash injection (_comma_to_dash + _insert_dash_aside)
- Aggressive burstiness (threshold 25→16-20, fragment insertion strategy)
- Light perplexity boost (rhetorical questions for formal profiles)
- Paragraph splitting (5+ sentence paragraphs)
- +30 EN word simplification entries

Pipeline:
- Intensity cap raised 70→85, multipliers 1.15→1.20/1.1→1.15
- Stage 13a: final entropy re-injection post-grammar/coherence

Results (local backend, 3-sentence AI text, intensity=60):
  EN: 0.920 → 0.372 (human)
  RU: 0.880 → 0.390 (human)
  UK: 0.840 → 0.351 (human)

All 1984 tests pass.

v0.23.0 - OSS LLM Backend, PyPI Publication

28 Feb 22:04

What's New

Backend Parameter

  • New backend parameter: local (default), oss, openai, auto
  • OSS backend: Free AI humanization via amd/gpt-oss-120b-chatbot on Hugging Face Spaces
  • OpenAI backend: Optional paid backend using GPT-4o-mini
  • Auto mode: Tries OSS first, then OpenAI, then falls back to local
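
The auto mode above is an ordered fallback chain. A minimal sketch of that idea follows. It does not call the library: `run_with_fallback` and the three stand-in backend functions are hypothetical, and the simulated failures only illustrate the ordering.

```python
from typing import Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]

def run_with_fallback(text: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (name, output) from the first that succeeds."""
    last_error = None
    for name, call in backends:
        try:
            return name, call(text)
        except Exception as exc:  # e.g. network errors, rate limits, missing API keys
            last_error = exc
    raise RuntimeError("all backends failed") from last_error

# Stand-in backends (assumptions): the two remote ones fail, the local one succeeds.
def oss_backend(text: str) -> str:
    raise ConnectionError("remote model is rate-limited")

def openai_backend(text: str) -> str:
    raise PermissionError("no API key configured")

def local_backend(text: str) -> str:
    return text.replace("utilize", "use")

name, output = run_with_fallback(
    "We utilize rules.",
    [("oss", oss_backend), ("openai", openai_backend), ("local", local_backend)],
)
print(name, output)
```

Placing the offline backend last means the chain can only fail if the local rules themselves raise.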

Install

pip install texthumanize==0.23.0

Usage

from texthumanize import humanize
result = humanize('AI text', backend='oss')

Full Changelog: v0.15.0...v0.23.0

v0.15.0 — Full Audit Closure: 9 New Modules

26 Feb 20:31

What's New

9 New Core Modules

  • ai_backend — Three-tier AI backend: OpenAI API → OSS Gradio model (rate-limited) → built-in rules. New humanize_ai() function.
  • pos_tagger — Rule-based POS tagger for EN (500+ exceptions), RU/UK (200+), DE (300+). Universal tagset.
  • cjk_segmenter — Chinese BiMM (2504 entries), Japanese character-type, Korean space+particle segmentation.
  • syntax_rewriter — 8 sentence-level transforms (active↔passive, clause inversion, enumeration reorder, adverb migration). 150+ irregular verbs.
  • statistical_detector — 35-feature ML classifier for AI text detection. Integrated into detect_ai() with 60/40 weighted merge.
  • word_lm — Word-level unigram/bigram language model for 14 languages. Perplexity, burstiness, naturalness scoring.
  • collocation_engine — PMI-based collocation scoring for context-aware synonym selection. EN ~130, RU ~30, DE ~20 collocations.
  • fingerprint_randomizer — Anti-fingerprint diversification for output variety.
  • benchmark_suite — 6-dimension automated quality benchmarking.
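
For readers unfamiliar with BiMM (bidirectional maximum matching), mentioned for the Chinese segmenter above, here is a generic textbook-style sketch. It is not the cjk_segmenter implementation. The four-word lexicon is a toy example and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) is one common heuristic, assumed here.

```python
from typing import List, Set

def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest-match scan from left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always match
                tokens.append(piece)
                i += size
                break
    return tokens

def backward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest-match scan from right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens

def count_singles(tokens: List[str]) -> int:
    return sum(1 for token in tokens if len(token) == 1)

def bimm_segment(text: str, lexicon: Set[str]) -> List[str]:
    """Run both scans and keep the segmentation that looks more word-like."""
    max_len = max(map(len, lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    return fwd if count_singles(fwd) < count_singles(bwd) else bwd

toy_lexicon = {"研究", "研究生", "生命", "起源"}
print(bimm_segment("研究生命起源", toy_lexicon))
```

In this example the forward scan greedily takes 研究生 and strands 命 as a single character, while the backward scan yields 研究 / 生命 / 起源, which the tie-break prefers.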

Pipeline & Detection

  • Pipeline expanded to 17 stages (added syntax rewriting + anti-fingerprint diversification)
  • detect_ai() now returns combined_score (statistical + heuristic)
  • Fixed NO-OP _reduce_adjacent_repeats() — now actually removes repetitions

Tests

  • 1,696 tests — 92 new, all passing (100% pass rate)

v0.14.0

26 Feb 17:22

v0.14.0 -- Reliability, Analysis Tools & New APIs

New API Functions

  • humanize_sentences() -- per-sentence AI scoring with graduated intensity; only rewrites sentences above a configurable AI probability threshold
  • humanize_variants() -- generates 1-10 humanization variants with different random seeds, sorted by quality
  • humanize_stream() -- generator that yields humanized text chunk-by-chunk with progress tracking
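
The threshold gating described for humanize_sentences() can be sketched without the library. Everything below is an assumption made for illustration: `rewrite_flagged`, the toy scorer and rewriter, the 0.6 default threshold, and the linear mapping from AI probability to intensity. The real function's parameters and scoring may differ.

```python
import re
from typing import Callable, List

def rewrite_flagged(
    text: str,
    score: Callable[[str], float],
    rewrite: Callable[[str, int], str],
    threshold: float = 0.6,
) -> str:
    """Rewrite only sentences whose AI probability reaches the threshold."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    result: List[str] = []
    for sentence in sentences:
        probability = score(sentence)
        if probability < threshold:
            result.append(sentence)  # below threshold: keep the sentence untouched
            continue
        # Graduated intensity (assumed linear): 0 at the threshold, 100 at probability 1.0.
        span = max(1.0 - threshold, 1e-9)
        intensity = min(100, round(100 * (probability - threshold) / span))
        result.append(rewrite(sentence, intensity))
    return " ".join(result)

# Toy scorer and rewriter standing in for a real detector and humanizer.
def toy_score(sentence: str) -> float:
    return 0.9 if "Furthermore" in sentence else 0.1

seen_intensities: List[int] = []

def toy_rewrite(sentence: str, intensity: int) -> str:
    seen_intensities.append(intensity)
    return sentence.replace("Furthermore, it", "It")

output = rewrite_flagged("I like tea. Furthermore, it is important.", toy_score, toy_rewrite)
print(output)
```

Leaving low-scoring sentences untouched keeps the edit footprint small, which is the point of per-sentence gating.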

New Analysis Modules (zero-dependency, offline)

  • perplexity_v2 -- character-level trigram cross-entropy model with cross_entropy() and perplexity_score() returning naturalness score (0-100) and verdict
  • dict_trainer -- corpus analysis for custom dictionary building with train_from_corpus() and export_custom_dict()
  • plagiarism -- offline originality detection via n-gram fingerprinting with check_originality() and compare_originality()
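
As background for the character-level trigram cross-entropy mentioned above, here is a minimal, dependency-free sketch. It is not the perplexity_v2 module. The class name, the add-one smoothing, and the tiny training corpus are assumptions, and the mapping to a 0-100 naturalness score is not reproduced because the release notes do not describe it.

```python
import math
from collections import Counter

class CharTrigramModel:
    """Character trigram model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus: str):
        padded = "  " + corpus  # two pad characters so the first character has a context
        positions = range(len(padded) - 2)
        self.trigrams = Counter(padded[i:i + 3] for i in positions)
        self.contexts = Counter(padded[i:i + 2] for i in positions)
        # +1 reserves a slot for characters never seen in training.
        self.vocab_size = len(set(padded)) + 1

    def cross_entropy(self, text: str) -> float:
        """Average bits per character of `text` under the model."""
        if not text:
            return 0.0
        padded = "  " + text
        total_bits = 0.0
        for i in range(len(text)):
            context, gram = padded[i:i + 2], padded[i:i + 3]
            prob = (self.trigrams[gram] + 1) / (self.contexts[context] + self.vocab_size)
            total_bits -= math.log2(prob)
        return total_bits / len(text)

    def perplexity(self, text: str) -> float:
        return 2 ** self.cross_entropy(text)

model = CharTrigramModel("the quick brown fox jumps over the lazy dog. " * 20)
familiar = model.cross_entropy("the lazy dog jumps over the quick fox")
unfamiliar = model.cross_entropy("zxqj vkkw qqzx jjxv")
print(f"familiar={familiar:.2f} bits/char, unfamiliar={unfamiliar:.2f} bits/char")
```

Text that resembles the training data needs fewer bits per character, so lower cross-entropy (and lower perplexity) means the model finds the text more predictable.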

Pipeline Improvements

  • Error isolation -- each processing stage wrapped in _safe_stage() with try/except; failing stages are skipped gracefully instead of crashing the pipeline
  • Partial rollback -- pipeline records checkpoints after each stage; on validation failure, rolls back stage-by-stage to find the last valid state
  • Pipeline profiling -- stage_timings dict and total_time included in metrics_after for performance analysis
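
The three pipeline behaviours above (error isolation, checkpoint rollback, per-stage timing) can be combined in a small sketch. This is an illustration under assumptions and not the library's pipeline. `run_pipeline`, `is_valid`, `skipped_stages`, and `rolled_back_to` are hypothetical names, and only `stage_timings` and `total_time` come from the release notes.

```python
import time
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[str], str]]

def run_pipeline(
    text: str, stages: List[Stage], is_valid: Callable[[str], bool]
) -> Tuple[str, Dict[str, object]]:
    stage_timings: Dict[str, float] = {}
    skipped: List[str] = []
    checkpoints: List[Tuple[str, str]] = [("input", text)]
    started = time.perf_counter()

    for name, stage in stages:
        stage_started = time.perf_counter()
        try:
            text = stage(text)
            checkpoints.append((name, text))  # record state after each successful stage
        except Exception:
            skipped.append(name)  # isolate the failure and carry on with the previous text
        stage_timings[name] = time.perf_counter() - stage_started

    # Partial rollback: walk back stage by stage until a checkpoint validates.
    while len(checkpoints) > 1 and not is_valid(checkpoints[-1][1]):
        checkpoints.pop()
    final_stage, final_text = checkpoints[-1]

    metrics = {
        "stage_timings": stage_timings,
        "total_time": time.perf_counter() - started,
        "skipped_stages": skipped,
        "rolled_back_to": final_stage,
    }
    return final_text, metrics

def broken_stage(text: str) -> str:
    raise ValueError("simulated stage failure")

stages: List[Stage] = [
    ("strip", str.strip),
    ("broken", broken_stage),
    ("spaces", lambda t: " ".join(t.split())),
    ("truncate", lambda t: ""),  # produces invalid (empty) output
]
result, metrics = run_pipeline("  Hello   world  ", stages, is_valid=bool)
print(result, metrics["skipped_stages"], metrics["rolled_back_to"])
```

Here the failing stage is skipped, the stage that empties the text is rolled back, and the output is the last checkpoint that passes validation.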

Bug Fixes & Code Quality

  • Fixed adversarial_calibrate intensity parameter (float 0-1 changed to int 0-100 to match API)
  • Added input sanitization: TypeError for non-str, ValueError for >500K chars, early return for empty text
  • Thread-safe lazy loading with double-checked locking on all module loaders
  • Instance-level plugins preventing cross-instance interference
  • Fixed humanize_sentences crash (detect_ai_sentences returns list, not dict)
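
The input sanitization rules listed above translate into a few guard clauses. The sketch below is a stand-alone illustration. `validate_input` and `humanize_guarded` are hypothetical names, and it assumes that "500K" means 500,000 characters and that "empty" means the empty string.

```python
from typing import Callable

MAX_CHARS = 500_000  # assumption: "500K chars" read as 500,000 characters

def validate_input(text: object) -> str:
    """Raise TypeError for non-str input and ValueError for oversized input."""
    if not isinstance(text, str):
        raise TypeError(f"text must be str, got {type(text).__name__}")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} chars; the limit is {MAX_CHARS}")
    return text

def humanize_guarded(text: object, transform: Callable[[str], str]) -> str:
    """Validate first, return early on empty input, then run the transform."""
    checked = validate_input(text)
    if not checked:
        return checked  # early return: nothing to process
    return transform(checked)

print(humanize_guarded("abc", str.upper))
```

Validating at the API boundary gives callers a clear exception type instead of a failure deep inside a processing stage.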

Tests

  • 1,604 tests -- up from 1,560 (44 new tests for all v0.14.0 features)
  • 100% pass rate

v0.13.0 — 16-Stage Pipeline, Grammar & Tone & Readability & Coherence

26 Feb 16:31

TextHumanize v0.13.0

4 new pipeline stages (12 to 16):

  • Tone harmonization — match text tone to profile (academic/blog/seo/casual)
  • Readability optimization — split complex sentences, join short ones
  • Grammar correction — fix doubled words, spacing, typos (9 languages)
  • Coherence repair — transitions between paragraphs, diversify openings
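
Two of the grammar fixes named above, doubled words and spacing, can be approximated with regular expressions. This is a simplified sketch and not the library's grammar stage. `light_grammar_fix` is a hypothetical name, typo correction is out of scope, and the doubled-word rule would also collapse legitimate repeats such as "had had".

```python
import re

# A word followed by one or more case-insensitive repeats of itself.
_DOUBLED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)
_MULTI_SPACE = re.compile(r"[ \t]{2,}")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")

def light_grammar_fix(text: str) -> str:
    text = _DOUBLED_WORD.sub(r"\1", text)         # "is is" -> "is"
    text = _MULTI_SPACE.sub(" ", text)            # collapse runs of spaces and tabs
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)   # "test ," -> "test,"

print(light_grammar_fix("This is is a  test , really ."))
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, the same rule applies to Cyrillic text.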

Dictionary expansion (~3,600 new entries):

  • EN: +475 | RU: +430 | UK: +337
  • DE/ES/FR/IT/PL/PT: ~235 each
  • AR/ZH/JA/KO/TR: ~205 each
  • Total: ~13,800 entries across 14 languages

Tests: 1,560 (all passing)

Full changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

v0.12.0 — 14 Languages, Placeholder Safety, Watermark Pipeline

26 Feb 14:54

What's New

5 New Languages (14 total)

  • Arabic (ar) — 81 bureaucratic, 80 synonyms, 49 AI connectors, 47 abbreviations
  • Chinese Simplified (zh) — 80 bureaucratic, 80 synonyms, 36 AI connectors
  • Japanese (ja) — 60+ per category, keigo to casual register replacements
  • Korean (ko) — 60+ per category, honorific to casual register
  • Turkish (tr) — 60+ per category, Ottoman to modern Turkish

Critical Bug Fixes

  • Placeholder safety — all 6 processing modules now skip placeholder tokens; no more leaked placeholders in output
  • 3-pass restore() — exact match, case-insensitive, orphan cleanup
  • HTML block protection — ul, ol, table, pre, blockquote preserved as single segments
  • Bare domain protection — site.com.ua, portal.kh.ua, example.co.uk etc.
  • Homoglyph fix — removed Cyrillic characters from special homoglyphs table (was corrupting all Cyrillic text)
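
The placeholder protection and three-pass restore described above follow a common mask-and-restore pattern. The sketch below illustrates that pattern only. The `__PH_n__` token format, the simplified domain regex, and the function names are assumptions and are not taken from the library.

```python
import re
from typing import Dict, Pattern, Tuple

TOKEN_RE = re.compile(r"__PH_(\d+)__", re.IGNORECASE)
# Simplified bare-domain pattern, e.g. site.com.ua or example.co.uk (assumption).
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

def protect(text: str, pattern: Pattern[str]) -> Tuple[str, Dict[str, str]]:
    """Replace every match with a numbered token and remember the original."""
    mapping: Dict[str, str] = {}

    def stash(match: "re.Match[str]") -> str:
        token = f"__PH_{len(mapping)}__"
        mapping[token] = match.group(0)
        return token

    return pattern.sub(stash, text), mapping

def restore(text: str, mapping: Dict[str, str]) -> str:
    # Pass 1: exact token match.
    for token, original in mapping.items():
        text = text.replace(token, original)
    # Pass 2: case-insensitive match, for tokens a stage lowercased or uppercased.
    by_lower = {token.lower(): original for token, original in mapping.items()}
    text = TOKEN_RE.sub(lambda m: by_lower.get(m.group(0).lower(), m.group(0)), text)
    # Pass 3: orphan cleanup, dropping tokens that map to nothing.
    text = TOKEN_RE.sub("", text)
    return re.sub(r" {2,}", " ", text).strip()

masked, mapping = protect("Visit site.com.ua or example.co.uk today.", DOMAIN_RE)
damaged = masked.lower() + " __PH_9__"  # simulate a careless stage plus a stray token
print(restore(damaged, mapping))
```

The processing stages still have to skip the tokens themselves. The restore passes are a safety net for tokens that were altered anyway.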

Pipeline Improvements

  • Watermark cleaning — automatic first stage (12 stages total), removes zero-width chars, homoglyphs, invisible Unicode
  • Language detection — Arabic/CJK/Turkish script detection added
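
Removing zero-width characters, the simplest part of the watermark cleaning stage above, needs only the standard library. The sketch below is a generic illustration and not the library's cleaner. The code-point list is a small assumed subset, and homoglyph handling is not covered.

```python
from typing import Tuple

# A few zero-width / invisible code points (assumed subset, not exhaustive).
INVISIBLE_CODEPOINTS = {
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}
_DELETE_TABLE = {codepoint: None for codepoint in INVISIBLE_CODEPOINTS}

def strip_invisible(text: str) -> Tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    removed = sum(1 for ch in text if ord(ch) in INVISIBLE_CODEPOINTS)
    return text.translate(_DELETE_TABLE), removed

cleaned, removed = strip_invisible("he\u200bllo\ufeff wor\u200dld")
print(repr(cleaned), removed)
```

Note that the joiner characters are meaningful in some scripts and in emoji sequences, so a production cleaner would likely need language-aware exceptions.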

Tests

  • 1,509 tests passed (54 new)