64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,70 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.1.0 (2026-05-26)


### Features

* :sparkles: trigger redeploy upon successful python publish ([2c30ddd](https://github.com/fairdataihub/poster2json/commit/2c30ddde2fc74f47b7bbec49976d91c51526f384))
* add json-repair as last-resort JSON parse fallback (0.5.8) ([54b542f](https://github.com/fairdataihub/poster2json/commit/54b542fe59a6423a0a680f330cfffd2cc6ac665f))
* add sync_schema.py for canonical schema fetching ([8933b23](https://github.com/fairdataihub/poster2json/commit/8933b238e644731d445954c02cdde5932e27679d))
* add tests ([1e04dc0](https://github.com/fairdataihub/poster2json/commit/1e04dc0706b0efafbc24d3a6898b19232a05ad9b))
* authenticated ORCID API via OAuth client_credentials (0.5.10) ([d61a268](https://github.com/fairdataihub/poster2json/commit/d61a26811d17960e1e5907c898690b91fb4b5475))
* default JSON model to 4bit and document custom --model flag ([f730bfc](https://github.com/fairdataihub/poster2json/commit/f730bfc3e798e8d809b5c88591175b09b506d257))
* enhanced license normalization + junk filtering ([c169cad](https://github.com/fairdataihub/poster2json/commit/c169cad86aba91bdcfc3809cbac4461ae5181dcf))
* extract PDF link annotations into identifiers and relatedIdentifiers ([06863d0](https://github.com/fairdataihub/poster2json/commit/06863d0012ff37f2a1f2db01f292ef6158b3640c))
* ground researchField on the OpenAlex 4 domains ([bee8a41](https://github.com/fairdataihub/poster2json/commit/bee8a4126836cffa1491dc24710aa51b350ef7fe))
* heuristic language detection on raw poster text ([6ba04cb](https://github.com/fairdataihub/poster2json/commit/6ba04cb3dd42ae6217168f6e6f21251595bebbe3))
* Phase 1 — DOI / funder / award normalization (0.5.0) ([e4c67bc](https://github.com/fairdataihub/poster2json/commit/e4c67bc0ab2fb7c1957eea5433b047b75899e1e8))
* Phase 2 — ORCID enrichment via public API (0.5.1) ([04c9350](https://github.com/fairdataihub/poster2json/commit/04c9350a40181f487cc2c8a5eebf0ef1242bd18e))
* Phase 3 — publisher-suspect _validation warning (0.5.2) ([34382a9](https://github.com/fairdataihub/poster2json/commit/34382a9935f7e958565caa67d2872b00f2ecbfab))
* regex identifier extraction and caption ID auto-generation ([a5a3518](https://github.com/fairdataihub/poster2json/commit/a5a3518852b52038c7bcbcc8ad2c811c43f5c741))
* reject oversized inputs before GPU inference (MAX_INPUT_TOKENS) ([157bcc9](https://github.com/fairdataihub/poster2json/commit/157bcc91a56c4b08da67ac98232df81e9b49867c))
* SPDX license, subject, and ROR normalization on output ([096d281](https://github.com/fairdataihub/poster2json/commit/096d2811fb8c266792cf0e755b6c2787c707935e))
* update poster_schema.json to v0.2 (DataCite 4.7) ([11af839](https://github.com/fairdataihub/poster2json/commit/11af839544166ff3d815606bc74e3352f8fbe8a4))
* vision OCR fallback for image-only PDFs ([1e3e3f6](https://github.com/fairdataihub/poster2json/commit/1e3e3f67e2920ca56139362827ba9ce611154bce))


### Bug Fixes

* anti-hallucination safety nets + prompt grounding rule (0.5.4) ([5dbca4d](https://github.com/fairdataihub/poster2json/commit/5dbca4dfc0f1e61a81246c9080550601f4e7d98f))
* **deps:** bump nltk 3.9.2 -> 3.9.4 to clear CVE-2025-14009 (critical) ([610f73c](https://github.com/fairdataihub/poster2json/commit/610f73c87a33a614846017ba420acb409e807bbf))
* **deps:** bump pillow 12.1.0 -> 12.2.0 and black 22.12.0 -> 26.3.1 ([943d07e](https://github.com/fairdataihub/poster2json/commit/943d07e1895fe22c178e72602811a629b09f4bd7))
* derive formats from file extension instead of LLM extraction ([ce3d707](https://github.com/fairdataihub/poster2json/commit/ce3d70781776e5b32954b931eca2130d98070803))
* fix dependencies ([6c6fc67](https://github.com/fairdataihub/poster2json/commit/6c6fc67e1b9c55332fe48321388fcaf4b9643424))
* funder identifiers use URL format everywhere, update tests ([b4cf8ca](https://github.com/fairdataihub/poster2json/commit/b4cf8ca9b7b6b6c2a7275a50a3cf4809f8a0fdaf))
* license canonical display name + version field extraction (0.5.9) ([84d4b63](https://github.com/fairdataihub/poster2json/commit/84d4b639c137b02d1547d36efd9446ac0bd15ce0))
* normalize identifiers to URL format for Zenodo validation ([bd34144](https://github.com/fairdataihub/poster2json/commit/bd34144822116572915fb9c9e8a71c4f701e1a16))
* Prompt updates for title casing and placeholder hallucinations (v0.1.9) ([53b39e6](https://github.com/fairdataihub/poster2json/commit/53b39e64e504a259d671ed02ac4827524fcdcc0f))
* PyMuPDF fallback when pdfalto text causes JSON parse failure (0.5.5) ([3100b63](https://github.com/fairdataihub/poster2json/commit/3100b63a133bfbed6eeeffc26121035fd061fc61))
* remove concrete example values from prompts to prevent echoing (0.5.6) ([0254d9c](https://github.com/fairdataihub/poster2json/commit/0254d9c73de8521e93a76f49784cb506711fd69b))
* Revert ALTO XML column reordering that caused validation regression (v0.1.12) ([5795520](https://github.com/fairdataihub/poster2json/commit/5795520c83ec9e411812b004343f9f61aa91bdfb))
* rewrite _repair_unescaped_quotes as character-walking JSON repair ([aff3980](https://github.com/fairdataihub/poster2json/commit/aff398058c158612a1ae481bdfad88d5ccc1b59d))
* ROR rate limit 6 req/s, retry with backoff, 25-failure circuit breaker ([005776a](https://github.com/fairdataihub/poster2json/commit/005776a9c0f9c3d64601345d83c9b4dd76406d32))
* Smart title-case for ALL-CAPS poster titles (v0.1.9) ([97f3c4d](https://github.com/fairdataihub/poster2json/commit/97f3c4d8e3c38a02bf30afd70b402e506191ab42))
* stop conference field hallucination at the prompt level ([33289f0](https://github.com/fairdataihub/poster2json/commit/33289f0a0b4a40d73022b4a0d811081a6d591ebb))
* stop hallucinating publicationYear in extraction prompts ([6ab31af](https://github.com/fairdataihub/poster2json/commit/6ab31af33f5d9444bc2b528e8eead4872435d5a1))
* stop hardcoding descriptionType to Other, default to Abstract ([74536e8](https://github.com/fairdataihub/poster2json/commit/74536e8d6a67fdd98a3830901c9b649e285cfb75))
* Strip empty-string conference metadata values (v0.1.11) ([22357dc](https://github.com/fairdataihub/poster2json/commit/22357dc57dc7ac082234959ad8540e633411b4f8))
* strip mailto: links from PDF link annotations ([56b9020](https://github.com/fairdataihub/poster2json/commit/56b9020e82c1bb33280859f591ba36f8fb75c128))
* Strip prompt-placeholder hallucinations from conference metadata ([1a41d29](https://github.com/fairdataihub/poster2json/commit/1a41d2920bbc507cf366cb2f16830d48d533998a))
* suppress invalid escape sequence warning in repair function ([a48c493](https://github.com/fairdataihub/poster2json/commit/a48c493a9790b02d74bed27cf18493529872849c))
* tag auto-generated descriptions as Other instead of Abstract ([3945d81](https://github.com/fairdataihub/poster2json/commit/3945d810acea66a64f693d0ba7e446b35c845391))
* unload JSON model before vision load on image posters ([b85c08c](https://github.com/fairdataihub/poster2json/commit/b85c08c9e4ae0def7122f5c39e9762013917b9d5))
* update code and docs for poster_schema v0.2 (DataCite 4.7) ([01cc762](https://github.com/fairdataihub/poster2json/commit/01cc762474551be8c0fe487b8955b668cf6c2397))
* update markdownlint configuration and improve README.md content ([f06e74c](https://github.com/fairdataihub/poster2json/commit/f06e74cc83db1c92cae75561c2a5f25f7c5b5f05))
* update README.md by removing unnecessary brackets from URLs ([45e12d0](https://github.com/fairdataihub/poster2json/commit/45e12d0e54d2e046da18c85d115aa4aeea9d74a8))
* use MIME types for formats per DataCite schema 4.7 ([e3f82ae](https://github.com/fairdataihub/poster2json/commit/e3f82aefa149955838bac370b36523d846af2fc0))


### Documentation

* add AI-generated image attribution hover to logo ([077617b](https://github.com/fairdataihub/poster2json/commit/077617b8b7dad682e2a9c708e553e1d1be7ecb83))
* add funding section, version to citation, remove acknowledgements ([8223b35](https://github.com/fairdataihub/poster2json/commit/8223b354532662dbaf02d1f5d4ddc3570a29471b))
* add normalization and enrichment pipeline to architecture ([4e305d4](https://github.com/fairdataihub/poster2json/commit/4e305d49f017d344ad245bdbb305964bba383040))
* correct model description and surface 0.4.x features in README ([80827e6](https://github.com/fairdataihub/poster2json/commit/80827e6daea664f2a58a35451cdf361d92f35f9a))

## [0.5.9] - 2026-05-08

License display name normalization + version field extraction.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[tool.poetry]

name = "poster2json"
-version = "0.6.2"
+version = "0.1.0"
description = "Convert scientific posters (PDF/images) to structured JSON metadata using Large Language Models"

packages = [{ include = "poster2json" }]