AI Image Detector

A small, friendly open-source detector for AI-generated images. It is designed in the yt-dlp / rembg spirit: install it, run one command, get a probability and a reproducible report.

AI image detection is probabilistic. Treat the output as one signal, not as proof.

Model Choice

The default backend is UnivFD / UniversalFakeDetect: CLIP ViT-L/14 image features plus a tiny linear fake/real head. This is a strong practical default because the task-specific weight is tiny, the code path is understandable, and the CVPR 2023 paper showed good cross-generator generalization compared with older GAN-trained detectors.

This repo also ships hybrid, nonescape-mini, sentry-convnext-small, hybrid-plus, and ultra backends. hybrid blends UnivFD with a lightweight Hugging Face classifier, nonescape-mini and sentry-convnext-small adapt external open-source detectors, hybrid-plus ensembles our internal hybrid with Nonescape, and ultra adds Sentry on top. This has become the strongest practical route so far without training a new model from scratch.

The benchmark commands also support post-hoc threshold calibration objectives such as balanced_accuracy and f1. In practice, this has been one of the most effective low-risk levers for improving held-out performance.

Recent research has moved further. AIDE combines CLIP semantics with low-level frequency/noise features and reports gains on GenImage and AIGCDetectBenchmark. That is a good research target for a future backend, but UnivFD is currently the simplest robust default for an installable open-source tool.

Useful references:

UniversalFakeDetect paper: https://openaccess.thecvf.com/content/CVPR2023/html/Ojha_Towards_Universal_Fake_Image_Detectors_That_Generalize_Across_Generative_Models_CVPR_2023_paper.html
UniversalFakeDetect code: https://github.com/WisconsinAIVision/UniversalFakeDetect
AIDE paper: https://arxiv.org/abs/2406.19435
GenImage benchmark: https://github.com/GenImage-Dataset/GenImage
Tiny-GenImage runnable subset: https://huggingface.co/datasets/TheKernel01/Tiny-GenImage
CIFAKE benchmark paper: https://huggingface.co/papers/2303.14126

Install

Use Python 3.10+.

python -m venv .venv
source .venv/bin/activate
pip install -e .

Optional extras:

pip install -e '.[eval]'      # Hugging Face dataset benchmarks
pip install -e '.[hf]'        # generic Hugging Face image-classification backend
pip install -e '.[api]'       # FastAPI server
pip install -e '.[web]'       # Gradio UI
pip install -e '.[dev]'       # tests and linting

CLI Usage

Detect one image:

aidetect detect image.jpg

Detect a folder recursively:

aidetect detect ./images --csv report.csv

JSON lines output:

aidetect detect ./images --json

Use a Hugging Face image-classification model instead of UnivFD:

aidetect detect image.jpg --backend hf --hf-model capcheck/ai-image-detection

The generic hf backend expects a standard Transformers image-classification checkpoint. Some open-source detectors publish custom repos that need a dedicated adapter instead of --backend hf.

Use the hybrid backend:

aidetect detect image.jpg --backend hybrid --hybrid-univfd-weight 0.8

Use the external Nonescape Mini adapter directly:

aidetect detect image.jpg --backend nonescape-mini

Use the strongest current ensemble:

aidetect detect image.jpg --backend ultra

Python API

from aidetector import create_detector

detector = create_detector("univfd", device="auto")
result = detector.predict_path("image.jpg")
print(result.as_dict())

Web UI

pip install -e '.[web]'
aidetect serve

FastAPI

pip install -e '.[api]'
aidetect api --host 127.0.0.1 --port 8000

Then call:

curl -F "file=@image.jpg" http://127.0.0.1:8000/detect

Benchmarks

Evaluate a GenImage-style folder where nature/ contains real images and ai/ contains generated images:

aidetect benchmark-folder /path/to/GenImage/Midjourney/val \
  --real-dir nature \
  --fake-dir ai \
  --output benchmarks/midjourney-val.json

Evaluate a Hugging Face dataset such as Tiny-GenImage:

pip install -e '.[eval]'
aidetect benchmark-hf TheKernel01/Tiny-GenImage \
  --split validation \
  --image-field image \
  --label-field label \
  --fake-label 1 \
  --max-samples 200 \
  --output benchmarks/tiny-genimage-univfd-200.json

The JSON report includes accuracy, balanced accuracy, precision, recall, F1, ROC AUC, confusion counts, a diagnostic threshold sweep, model metadata, dataset metadata, and per-image predictions.

For more defensible evaluation, calibrate a threshold on one split and evaluate on another:

aidetect benchmark-calibrated-folder /path/to/exported-folder \
  --backend univfd \
  --output benchmarks/univfd-calibrated.json

For multi-shard Tiny-GenImage evaluation with per-generator slices:

aidetect benchmark-tiny-genimage-local \
  /path/to/validation-00000-of-00004.parquet \
  /path/to/validation-00001-of-00004.parquet \
  /path/to/validation-00002-of-00004.parquet \
  /path/to/validation-00003-of-00004.parquet \
  --backend ultra \
  --optimize-metric f1 \
  --max-per-class-per-shard 100 \
  --output benchmarks/tiny-genimage-ultra-800-f1.json

If Hugging Face dataset metadata requests are flaky, you can work from a local Tiny-GenImage parquet shard:

aidetect prepare-tiny-genimage .cache/tiny-genimage-validation-200 \
  --local-parquet /path/to/validation-00000-of-00004.parquet \
  --max-per-class 100

aidetect benchmark-calibrated-folder .cache/tiny-genimage-validation-200 \
  --backend univfd \
  --real-dir real \
  --fake-dir ai \
  --output benchmarks/tiny-genimage-univfd-calibrated-200.json

Current local benchmark evidence is split into two levels.

Smoke benchmark on Tiny-GenImage validation shard data/validation-00000-of-00004.parquet, 20 real + 20 fake images:

Backend	Threshold	Accuracy	Balanced Acc	F1	ROC AUC	Images/s
UnivFD / CLIP ViT-L/14	0.5	0.500	0.500	0.000	0.715	2.31
capcheck/ai-image-detection	0.5	0.600	0.600	0.692	0.743	32.03

Calibrated hold-out benchmark on the same shard family, exported as 100 real + 100 fake images and split deterministically into calibration/test sets:

Backend	Calibration	Test Accuracy	Test Balanced Acc	Test F1	Test ROC AUC
UnivFD / CLIP ViT-L/14	threshold-only	0.760	0.760	0.721	0.811
Hybrid (UnivFD 0.8 + HF 0.2)	threshold + blend weight	0.670	0.670	0.629	0.752
capcheck/ai-image-detection	threshold-only	0.580	0.580	0.580	0.610

Interpretation:

The 40-image run is only a smoke test.
The 200-image calibrated split is a stronger local benchmark because threshold selection happens on a separate calibration split before the test split is scored.
It is still not a publication-grade claim. It is one shard, one deterministic split, and one local environment.
These calibrated runs were executed on CPU in this workspace.

Current strongest local benchmark, calibrated on 4 Tiny-GenImage validation shards with up to 100 real + 100 fake images sampled per shard:

Backend	Test N	Test Accuracy	Test Balanced Acc	Precision	Recall	Test F1	Test ROC AUC
Ultra (`hybrid-plus` + `sentry-convnext-small`), `optimize=f1`	400	0.858	0.858	0.878	0.830	0.853	0.916
Sentry ConvNeXt Small, `optimize=f1`	400	0.835	0.835	0.842	0.825	0.833	0.911
Hybrid-plus (`hybrid` + `nonescape-mini`), `optimize=f1`	400	0.825	0.825	0.828	0.820	0.824	0.891
Hybrid (UnivFD 0.85 + HF 0.15), `optimize=f1`	400	0.773	0.773	0.779	0.760	0.770	0.843
Hybrid (UnivFD 0.85 + HF 0.15), `optimize=balanced_accuracy`	400	0.745	0.745	0.802	0.650	0.718	0.843
Nonescape Mini, `optimize=f1`	400	0.772	0.772	0.772	0.775	0.773	0.810
UnivFD / CLIP ViT-L/14	300	0.690	0.690	0.806	0.500	0.617	0.784

The important takeaway is that external detector ensembling helped more than any single internal threshold tweak. optimize=f1 still mattered, but the biggest jump came from combining our internal hybrid path with two external open-source detectors, first Nonescape and then Sentry.

Selected generator-vs-real slices from that same held-out split:

Generator	N	Accuracy	Balanced Acc	F1	ROC AUC
BigGAN vs Real	231	0.831	0.834	0.571	0.940
ADM vs Real	232	0.815	0.774	0.517	0.847
GLIDE vs Real	227	0.850	0.915	0.614	0.973
Midjourney vs Real	230	0.843	0.882	0.609	0.960
SD15 vs Real	228	0.842	0.879	0.591	0.938
Wukong vs Real	228	0.838	0.861	0.575	0.924
VQDM vs Real	224	0.781	0.603	0.269	0.618

This is the honest picture: the strongest gains came from combining fast external detectors with our internal stack, then calibrating the final decision for f1. That materially improves overall balance and lifts weak generators, though performance is still generator-dependent and far from a universal guarantee.

Model Weights

On first use, the UnivFD backend downloads:

CLIP ViT-L/14 OpenAI weights through open_clip_torch
UniversalFakeDetect linear head from siddharthksah/deepsafe-weights/universalfakedetect/fc_weights.pth

You can also pass a local head checkpoint:

aidetect detect image.jpg --weight-path ./fc_weights.pth

Development

pip install -e '.[dev,eval,hf,api]'
pytest
ruff check .

Limitations

No detector is universal. New generators, heavy recompression, screenshots, crops, edits, upscaling, and adversarial post-processing can change results.
Benchmarks can overstate real-world reliability if the deployment data differs from the benchmark distribution.
The tool currently detects whole-image synthetic likelihood. It does not localize edited regions.

Citation

If this helps your work, cite the original UnivFD paper:

@InProceedings{Ojha_2023_CVPR,
  author = {Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
  title = {Towards Universal Fake Image Detectors That Generalize Across Generative Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2023},
  pages = {24480-24489}
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.omx		.omx
aidetector		aidetector
benchmarks		benchmarks
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
ai-image-detector-mvp.zip		ai-image-detector-mvp.zip
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Image Detector

Model Choice

Install

CLI Usage

Python API

Web UI

FastAPI

Benchmarks

Model Weights

Development

Limitations

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Image Detector

Model Choice

Install

CLI Usage

Python API

Web UI

FastAPI

Benchmarks

Model Weights

Development

Limitations

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages