Releases · ksanyok/TextHumanize

31 May 23:17

ksanyok

v0.28.4

f57e6db

TextHumanize v0.28.4 Latest

Added

Explainable AI audit — added detect_ai_explain() with metric contributions, highlighted spans, sentence-level reports, mixed-content shares, calibration, confidence intervals, and suggested actions.
Unified watermark forensics — added watermark_report(), watermark_report_batch(), clean_safe(), and neutralise_aggressive() for Unicode, homoglyph, invisible-character, and statistical watermark risk reporting.
Promopilot-ready audit API — added audit_report() plus CLI/reporting paths for AI and watermark audit flows.
Strict and minimal humanization controls — added quality_gate="strict", minimal=True, --minimal, --only-flagged, and intent aliases for seo_article, landing_page, product_description, support_reply, academic, legal, and social_post.
Humanize explain metadata — humanize() now returns lightweight metrics_after["humanize_explain"] with top change reasons, remaining risks, sentence report, score delta, and quality summary.
Short commercial copy golden set — added regression coverage for landing, product, and support-copy flows used by Promopilot-style integrations.

Changed

GitHub CI stability — Python 3.12 now uses one parallel test run and keeps coverage as a local release check, avoiding hosted-runner coverage hangs while preserving full matrix validation.
Release verification baseline — local release checks now include full pytest (2105 passed), mypy, ruff, version sync, and coverage (80.09%).

Fixed

NumPy dtype stability in training v2 — NumpyMLP.forward() preserves float32 for sigmoid/tanh activations, fixing Python 3.12 mypy failures in CI.
Neural inference warnings — stabilized NumPy matmul paths in neural engine/LM code and covered the cleanup with regression tests.

05 Apr 12:08

ksanyok

v0.28.1

64b1e63

v0.28.1

Full Changelog: v0.28.0...v0.28.1

04 Apr 22:02

ksanyok

v0.28.0

7a524d0

v0.28.0

Full Changelog: v0.27.1...v0.28.0

02 Mar 00:14

ksanyok

v0.25.0

aa95633

v0.25.0

What's Changed

Bug Fixes

CRITICAL: Fixed a regex crash in naturalizer.py for RU/UK text (about 50 patterns combined non-capturing groups with backreferences). Because of the crash, the entire naturalization stage was silently skipped.
Added thread-safety locks to _ai_cache and _AI_WORDS for multi-threaded usage.
Added division-by-zero guards in detector metric calculations.
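
To illustrate the naturalizer.py failure mode described above, here is a minimal sketch of how Python's re module rejects a backreference when the pattern contains only non-capturing groups, and how a broad try/except can turn that error into a silently skipped stage. The rule strings and the helpers apply_rule and naturalize_silently are hypothetical; they are not taken from the TextHumanize source.

```python
import re

# Hypothetical (pattern, replacement) rules; not the library's real rules.
BROKEN_RULE = (r"(?:Кроме того|Более того), (?:это)", r"\1")  # \1 has no group to refer to
FIXED_RULE = (r"(?:Кроме того|Более того), (это)", r"\1")     # "это" is now captured


def apply_rule(rule: tuple[str, str], text: str) -> str:
    """Apply one substitution rule; raises re.error for an invalid backreference."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)


def naturalize_silently(rule: tuple[str, str], text: str) -> str:
    """Mimic a stage that swallows errors: on failure the input comes back unchanged."""
    try:
        return apply_rule(rule, text)
    except re.error:
        return text


text = "Кроме того, это работает."

try:
    apply_rule(BROKEN_RULE, text)
    broken_error = None
except re.error as exc:
    broken_error = str(exc)

skipped_output = naturalize_silently(BROKEN_RULE, text)  # unchanged input, no visible failure
fixed_output = apply_rule(FIXED_RULE, text)              # transition phrase removed
```

Compiling every rule once at import time, or in a unit test, would surface this kind of error before it reaches the pipeline.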

Cleanup

Removed dead module tokenizer.py (replaced by sentence_split.py).
Removed 14 one-off diagnostic scripts, 4 outdated competitive analysis docs, and debug artifacts.
Synced PHP and JS package versions to 0.25.0.

Documentation

Corrected the pipeline stage count from 17 to 20 across 15+ documentation files.
Corrected test counts, LOC claims, and speed benchmarks for consistency.
Fixed CHANGELOG date chronology.

CI

Raised per-test timeout from 120s to 300s to prevent false failures on slow CI runners.

01 Mar 00:32

ksanyok

v0.24.0

df23965

v0.24.0 — Deep Humanization for EN/RU/UK

v0.24.0: deep humanization improvements for EN/RU/UK

Neural detector:
- Per-language feature normalization (RU/UK char_entropy baseline 4.8-4.9 vs 4.3 EN)
- Expanded RU/UK conjunctions, transitions, AI word sets for MLP features 27/33/34

Naturalizer:
- Transition-phrase deletion (22 EN / 23 RU / 23 UK patterns)
- Em-dash injection (_comma_to_dash + _insert_dash_aside)
- Aggressive burstiness (threshold 25→16-20, fragment insertion strategy)
- Light perplexity boost (rhetorical questions for formal profiles)
- Paragraph splitting (5+ sentence paragraphs)
- +30 EN word simplification entries

Pipeline:
- Intensity cap raised 70→85, multipliers 1.15→1.20/1.1→1.15
- Stage 13a: final entropy re-injection post-grammar/coherence

Results (local backend, 3-sentence AI text, intensity=60):
  EN: 0.920 → 0.372 (human)
  RU: 0.880 → 0.390 (human)
  UK: 0.840 → 0.351 (human)

All 1984 tests pass.

28 Feb 22:04

ksanyok

v0.23.0

af319f8

v0.23.0 - OSS LLM Backend, PyPI Publication

What's New

Backend Parameter

New backend parameter: local (default), oss, openai, auto
OSS backend: Free AI humanization via amd/gpt-oss-120b-chatbot on HuggingFace Spaces
OpenAI backend: Optional paid backend using GPT-4o-mini
Auto mode: tries OSS first, then OpenAI, then falls back to local
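
The auto ordering can be pictured as a simple fallback chain. The sketch below is only an illustration of that idea: humanize_with_fallback and the three stand-in backend functions are hypothetical and do not call TextHumanize, HuggingFace, or OpenAI.

```python
from typing import Callable, Sequence

Backend = Callable[[str], str]


def humanize_with_fallback(
    text: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each backend in order and return (backend_name, output) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(text)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends (assumptions for the demo, not real integrations).
def oss_backend(text: str) -> str:
    raise ConnectionError("rate limited")


def openai_backend(text: str) -> str:
    raise PermissionError("no API key configured")


def local_backend(text: str) -> str:
    return text.replace("utilize", "use")


used, output = humanize_with_fallback(
    "We utilize tools.",
    [("oss", oss_backend), ("openai", openai_backend), ("local", local_backend)],
)
```

Keeping the local backend last means the chain always has an offline option to end on.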

Install

pip install texthumanize==0.23.0

Usage

from texthumanize import humanize
result = humanize('AI text', backend='oss')

Full Changelog: v0.15.0...v0.23.0

26 Feb 20:31

ksanyok

v0.15.0

cfc7136

v0.15.0 — Full Audit Closure: 9 New Modules

What's New

9 New Core Modules

ai_backend — Three-tier AI backend: OpenAI API → OSS Gradio model (rate-limited) → built-in rules. New humanize_ai() function.
pos_tagger — Rule-based POS tagger for EN (500+ exceptions), RU/UK (200+), DE (300+). Universal tagset.
cjk_segmenter — Chinese BiMM (2504 entries), Japanese character-type, Korean space+particle segmentation.
syntax_rewriter — 8 sentence-level transforms (active↔passive, clause inversion, enumeration reorder, adverb migration). 150+ irregular verbs.
statistical_detector — 35-feature ML classifier for AI text detection. Integrated into detect_ai() with 60/40 weighted merge.
word_lm — Word-level unigram/bigram language model for 14 languages. Perplexity, burstiness, naturalness scoring.
collocation_engine — PMI-based collocation scoring for context-aware synonym selection. EN ~130, RU ~30, DE ~20 collocations.
fingerprint_randomizer — Anti-fingerprint diversification for output variety.
benchmark_suite — 6-dimension automated quality benchmarking.
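
For readers unfamiliar with PMI-based collocation scoring, the following is a minimal sketch of the general technique, not the collocation_engine implementation. The toy pair counts and the names pmi and best_synonym are assumptions made for the example.

```python
import math
from collections import Counter

# Toy (modifier, noun) observations; a real table would be built from a corpus.
PAIRS = [
    ("strong", "tea"), ("strong", "tea"), ("strong", "coffee"),
    ("powerful", "computer"), ("powerful", "computer"),
    ("powerful", "engine"), ("strong", "engine"),
]

pair_counts = Counter(PAIRS)
left_counts = Counter(left for left, _ in PAIRS)
right_counts = Counter(right for _, right in PAIRS)
TOTAL = len(PAIRS)


def pmi(left: str, right: str) -> float:
    """Pointwise mutual information: log2(p(x, y) / (p(x) * p(y)))."""
    joint = pair_counts[(left, right)]
    if joint == 0:
        return float("-inf")  # never observed together
    return math.log2(joint * TOTAL / (left_counts[left] * right_counts[right]))


def best_synonym(candidates: list[str], noun: str) -> str:
    """Pick the candidate that collocates most strongly with the noun."""
    return max(candidates, key=lambda candidate: pmi(candidate, noun))


choice_for_tea = best_synonym(["strong", "powerful"], "tea")
choice_for_engine = best_synonym(["strong", "powerful"], "engine")
```

A positive PMI means the pair co-occurs more often than chance would predict, which is why it can steer synonym choice toward natural-sounding combinations.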

Pipeline & Detection

Pipeline expanded to 17 stages (added syntax rewriting + anti-fingerprint diversification)
detect_ai() now returns combined_score (statistical + heuristic)
Fixed NO-OP _reduce_adjacent_repeats() — now actually removes repetitions
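
A weighted merge of two detector scores might look like the sketch below. The notes give the 60/40 split but do not say which side receives the 0.6, so the default here (statistical = 0.6) is an assumption, and combine_scores is a hypothetical helper rather than the library's code.

```python
def combine_scores(
    statistical: float, heuristic: float, statistical_weight: float = 0.6
) -> float:
    """Blend two scores in [0, 1]; the 0.6 default for the statistical side is an assumption."""
    for score in (statistical, heuristic):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return statistical_weight * statistical + (1.0 - statistical_weight) * heuristic


combined_score = combine_scores(statistical=0.9, heuristic=0.4)
```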

Tests

1,696 tests — 92 new, all passing (100% pass rate)

26 Feb 17:22

ksanyok

v0.14.0

7314c37

v0.14.0

v0.14.0 -- Reliability, Analysis Tools & New APIs

New API Functions

humanize_sentences() -- per-sentence AI scoring with graduated intensity; only rewrites sentences above a configurable AI probability threshold
humanize_variants() -- generates 1-10 humanization variants with different random seeds, sorted by quality
humanize_stream() -- generator that yields humanized text chunk-by-chunk with progress tracking

New Analysis Modules (zero-dependency, offline)

perplexity_v2 -- character-level trigram cross-entropy model with cross_entropy() and perplexity_score() returning naturalness score (0-100) and verdict
dict_trainer -- corpus analysis for custom dictionary building with train_from_corpus() and export_custom_dict()
plagiarism -- offline originality detection via n-gram fingerprinting with check_originality() and compare_originality()
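
As background for the character-level trigram approach, here is a minimal sketch of cross-entropy under an add-one smoothed trigram model. It illustrates the general technique only: train_trigram_model and trigram_cross_entropy are hypothetical names, and the smoothing scheme and tiny training string are assumptions, not the perplexity_v2 internals.

```python
import math
from collections import Counter


def train_trigram_model(corpus: str):
    """Count character trigrams and their two-character contexts."""
    positions = range(len(corpus) - 2)
    trigrams = Counter(corpus[i:i + 3] for i in positions)
    contexts = Counter(corpus[i:i + 2] for i in positions)
    return trigrams, contexts, len(set(corpus))


def trigram_cross_entropy(text: str, model) -> float:
    """Average bits per character trigram, using add-one smoothing."""
    trigrams, contexts, vocab_size = model
    count = len(text) - 2
    if count <= 0:
        return 0.0
    bits = 0.0
    for i in range(count):
        gram = text[i:i + 3]
        probability = (trigrams[gram] + 1) / (contexts[gram[:2]] + vocab_size)
        bits -= math.log2(probability)
    return bits / count


model = train_trigram_model("the cat sat on the mat. the dog sat on the log.")
natural = trigram_cross_entropy("the cat sat on the log", model)
gibberish = trigram_cross_entropy("xqzvkj wpfh", model)
perplexity = 2 ** natural  # perplexity is 2 raised to the cross-entropy in bits
```

Text that resembles the training data scores fewer bits per trigram than unfamiliar character sequences, which is the signal a naturalness score can be built on.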

Pipeline Improvements

Error isolation -- each processing stage wrapped in _safe_stage() with try/except; failing stages are skipped gracefully instead of crashing the pipeline
Partial rollback -- pipeline records checkpoints after each stage; on validation failure, rolls back stage-by-stage to find the last valid state
Pipeline profiling -- stage_timings dict and total_time included in metrics_after for performance analysis
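
The three behaviors above (error isolation, checkpoint rollback, and per-stage timings) can be sketched together in a few lines. This is an illustration under assumptions, not the library's pipeline: run_pipeline, the stage names, and the validator are all hypothetical.

```python
import time
from typing import Callable

Stage = Callable[[str], str]


def run_pipeline(text: str, stages: list[tuple[str, Stage]], is_valid: Callable[[str], bool]):
    """Run stages in order; skip failing stages, time each one, roll back invalid results."""
    checkpoints = [text]          # checkpoints[0] is the untouched input
    timings: dict[str, float] = {}
    skipped: list[str] = []
    current = text
    for name, stage in stages:
        start = time.perf_counter()
        try:
            current = stage(current)
        except Exception:
            skipped.append(name)  # error isolation: keep the previous text
        timings[name] = time.perf_counter() - start
        checkpoints.append(current)
    # Partial rollback: walk back stage by stage to the last valid state.
    while len(checkpoints) > 1 and not is_valid(checkpoints[-1]):
        checkpoints.pop()
    return checkpoints[-1], skipped, timings


def exploding_stage(text: str) -> str:
    raise ValueError("stage failed")


result, skipped, timings = run_pipeline(
    "  hello world  ",
    [("strip", str.strip), ("boom", exploding_stage), ("shout", str.upper)],
    is_valid=lambda candidate: not candidate.isupper(),  # toy validator
)
```

Here the failing stage is skipped, the all-caps output fails validation, and the pipeline settles on the last checkpoint that passes.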

Bug Fixes & Code Quality

Fixed adversarial_calibrate intensity parameter (float 0-1 changed to int 0-100 to match API)
Added input sanitization: TypeError for non-str, ValueError for >500K chars, early return for empty text
Thread-safe lazy loading with double-checked locking on all module loaders
Plugins are now registered per instance, preventing cross-instance interference
Fixed humanize_sentences crash (detect_ai_sentences returns list, not dict)
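
Double-checked locking for lazy loading generally follows the pattern sketched below. The loader, the dictionary contents, and the names get_dictionary and _load_dictionary are hypothetical; the sketch shows the pattern, not the library's module loaders.

```python
import threading

_lock = threading.Lock()
_dictionary = None
load_count = 0  # only here to show how many times the loader ran


def _load_dictionary() -> dict[str, str]:
    """Stand-in for an expensive load (assumption for the demo)."""
    global load_count
    load_count += 1
    return {"utilize": "use"}


def get_dictionary() -> dict[str, str]:
    global _dictionary
    if _dictionary is None:          # first check: cheap, no lock taken
        with _lock:
            if _dictionary is None:  # second check: another thread may have loaded it
                _dictionary = _load_dictionary()
    return _dictionary


threads = [threading.Thread(target=get_dictionary) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The first check keeps the common path lock-free once the data is loaded; the second check under the lock ensures the loader runs only once.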

Tests

1,604 tests -- up from 1,560 (44 new tests for all v0.14.0 features)
100% pass rate

26 Feb 16:31

ksanyok

v0.13.0

e2b2988

v0.13.0 — 16-Stage Pipeline, Grammar & Tone & Readability & Coherence

TextHumanize v0.13.0

4 new pipeline stages (12 to 16):

Tone harmonization — match text tone to profile (academic/blog/seo/casual)
Readability optimization — split complex sentences, join short ones
Grammar correction — fix doubled words, spacing, typos (9 languages)
Coherence repair — transitions between paragraphs, diversify openings

Dictionary expansion (~3,600 new entries):

EN: +475 | RU: +430 | UK: +337
DE/ES/FR/IT/PL/PT: ~235 each
AR/ZH/JA/KO/TR: ~205 each
Total: ~13,800 entries across 14 languages

Tests: 1,560 (all passing)

Full changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

26 Feb 14:54

ksanyok

v0.12.0

121d079

v0.12.0 — 14 Languages, Placeholder Safety, Watermark Pipeline

What's New

5 New Languages (14 total)

Arabic (ar) — 81 bureaucratic, 80 synonyms, 49 AI connectors, 47 abbreviations
Chinese Simplified (zh) — 80 bureaucratic, 80 synonyms, 36 AI connectors
Japanese (ja) — 60+ per category, keigo to casual register replacements
Korean (ko) — 60+ per category, honorific to casual register
Turkish (tr) — 60+ per category, Ottoman to modern Turkish

Critical Bug Fixes

Placeholder safety — all 6 processing modules now skip placeholder tokens; no more leaked placeholders in output
3-pass restore() — exact match, case-insensitive, orphan cleanup
HTML block protection — ul, ol, table, pre, blockquote preserved as single segments
Bare domain protection — site.com.ua, portal.kh.ua, example.co.uk etc.
Homoglyph fix — removed Cyrillic characters from special homoglyphs table (was corrupting all Cyrillic text)
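
A three-pass restore (exact match, case-insensitive match, orphan cleanup) could be sketched as follows. The placeholder format __PH_n__ and the function signature are assumptions made for the example; the notes do not specify the library's actual token format.

```python
import re

# Assumed placeholder shape for this sketch only.
ORPHAN_PATTERN = re.compile(r"\s*__ph_\d+__", flags=re.IGNORECASE)


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put protected segments back in three passes."""
    # Pass 1: exact token match.
    for token, original in mapping.items():
        text = text.replace(token, original)
    # Pass 2: case-insensitive match, in case a stage changed the token's case.
    for token, original in mapping.items():
        text = re.sub(
            re.escape(token),
            lambda match, value=original: value,  # callable avoids backslash escapes
            text,
            flags=re.IGNORECASE,
        )
    # Pass 3: remove orphan tokens that have no entry in the mapping.
    return ORPHAN_PATTERN.sub("", text)


mapping = {"__PH_0__": "https://example.co.uk", "__PH_1__": "<b>x</b>"}
restored = restore("See __PH_0__ and __ph_1__ now __PH_7__.", mapping)
```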

Pipeline Improvements

Watermark cleaning — automatic first stage (12 stages total), removes zero-width chars, homoglyphs, invisible Unicode
Language detection — Arabic/CJK/Turkish script detection added
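
Removing zero-width characters is the simplest part of watermark cleaning and can be sketched with str.translate. The character set below is an assumption chosen for illustration, and strip_zero_width is a hypothetical helper; homoglyph handling is deliberately left out.

```python
# Code points removed in this sketch (an assumed set, not the library's table).
ZERO_WIDTH = {
    0x200B: None,  # ZERO WIDTH SPACE
    0x200C: None,  # ZERO WIDTH NON-JOINER
    0x200D: None,  # ZERO WIDTH JOINER
    0x2060: None,  # WORD JOINER
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def strip_zero_width(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    cleaned = text.translate(ZERO_WIDTH)
    return cleaned, len(text) - len(cleaned)


cleaned, removed = strip_zero_width("hu\u200bman\u2060ize\ufeff")
```

Note that U+200C and U+200D are legitimate in some scripts and in emoji sequences, so a production cleaner would likely need to be more selective than this sketch.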

Tests

1,509 tests passed (54 new)
