A tool for converting PDF files into Accessible Digital Textbooks (ADTs).
The sample report illustrates the process and its outputs; you can also view the final ADT.
Demos of ADTs created from the outputs of ADT Press:
- Momo Multilingual - Momo and the Leopards, a multilingual reader from Bhutan (no edits, pure AI output).
- Queremos - An informative reader from Uruguay (lightly edited).
- Cuaderno5 Chapter 1 - A Uruguay Grade 5 textbook with activities (more extensively edited).

Features
- PDF document processing and image extraction
- Image analysis using LLMs:
  - Image captioning
  - Intelligent image cropping
  - Image meaningfulness assessment
- HTML report generation
- Visualization of the processing pipeline

Requirements
- Python 3.13 or higher
- UV package manager (recommended)
API keys are required depending on your configuration. By default, ADT Press uses OpenAI for LLM processing, which requires setting the OPENAI_API_KEY environment variable. If you configure Azure for the main processing or for speech generation (speech.provider=azure), you'll also need to set AZURE_API_KEY and AZURE_API_BASE.
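The rule above can be checked before starting a run. The following is a minimal sketch, not part of ADT Press: the function name `missing_api_keys` and the `uses_azure` flag are hypothetical, and it follows the wording above, which describes the Azure variables as needed in addition to the OpenAI key.

```python
import os
from typing import Mapping, Optional


def missing_api_keys(uses_azure: bool, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of expected API variables that are unset or empty.

    Hypothetical helper for illustration; ADT Press may validate its
    environment differently.
    """
    env = os.environ if env is None else env
    # OpenAI is the default LLM provider, so its key is expected by default.
    required = ["OPENAI_API_KEY"]
    # Azure for main processing or speech (speech.provider=azure) adds two variables.
    if uses_azure:
        required += ["AZURE_API_KEY", "AZURE_API_BASE"]
    return [name for name in required if not env.get(name)]
```

Calling `missing_api_keys(uses_azure=False)` with no `env` argument inspects the current process environment and returns an empty list when everything needed is set.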
Example of .env file:
# Used by default
OPENAI_API_KEY=your-openai-api-key
# If using Azure
AZURE_API_KEY=your-azure-api-key
AZURE_API_BASE=your-azure-endpoint

This project uses uv for dependency management. If you don't have uv installed, you can install it by following the instructions in the uv documentation.
Clone the repository and install dependencies:
git clone git@github.com:unicef/adt-press.git
cd adt-press
uv sync

Run the main script with the default configuration:
uv run adt-press.py pdf_path=assets/raven.pdf

The application uses OmegaConf for configuration management. The default configuration file is located at config/config.yaml.
To override configuration values from the command line:
uv run adt-press.py label=mydocument pdf_path=/path/to/your/document.pdf page_range.start=0 page_range.end=5

- pdf_path: Path to the PDF file to process
- label: The label for this PDF file, will be used as the subdirectory name under output_dir. If not provided, label will be generated based on the filename portion of pdf_path
- page_range: Range of pages to process (start and end)
- output_dir: Base directory to store outputs
- template_dir: Directory containing HTML templates
- clear_cache: Whether to clear the processing cache before the run
- render_strategy: Controls which strategy to use for layout generation
  - dynamic (by default) - detects layout_types and routes them to render strategies
  - two_column works best for novels and storybooks
  - html works best for textbooks
  - overlay works best for comic books
- speech_strategy: Enable text-to-speech generation (default: none, set to tts to enable)
- speech.provider: TTS provider selection (default: auto uses OpenAI, also supports azure)
  - auto or openai: Uses OpenAI's TTS models
  - azure: Uses Azure Speech Services
  - Configure voice settings in config/config.yaml under speech
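To make the dotted override syntax concrete (for example `page_range.start=0`), here is a simplified, standard-library-only sketch of how `key=value` arguments can be layered over nested defaults. It is a stand-in for illustration, not OmegaConf itself and not ADT Press code; the function `apply_overrides` and the default values shown are hypothetical.

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'dotted.key=value' overrides to a nested dict, in place."""
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        # Walk (or create) the parent dictionaries named by the dotted path.
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        # Simplification: only non-negative integers are converted from strings.
        node[leaf] = int(value) if value.isdigit() else value
    return config


# Hypothetical defaults; the real ones live in config/config.yaml.
defaults = {"label": None, "page_range": {"start": 0, "end": 0}}
merged = apply_overrides(defaults, ["label=mydocument", "page_range.end=5"])
```

Each dot in a key descends one level into the configuration, so `page_range.end=5` changes only the `end` field and leaves `page_range.start` at its default.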
Generate speech using OpenAI (default):
uv run adt-press.py label=mydoc pdf_path=assets/book.pdf speech_strategy=tts

The application generates the following outputs in the output/[your label] directory:
- Extracted images from the PDF
- Cropped images
- HTML reports with analysis results
- Visualization of the processing pipeline
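The location of these outputs follows from the label and output_dir options. The sketch below shows one way the directory could be resolved; the helper `resolve_run_dir` is hypothetical, and it assumes that a generated label is the PDF file name without its extension, which may not match the tool's exact behavior.

```python
from pathlib import Path
from typing import Optional


def resolve_run_dir(pdf_path: str, label: Optional[str] = None, output_dir: str = "output") -> Path:
    """Return the directory where a run's outputs would be written."""
    # Assumption: with no explicit label, use the file name without its extension.
    effective_label = label or Path(pdf_path).stem
    return Path(output_dir) / effective_label
```

Under that assumption, `resolve_run_dir("assets/raven.pdf")` points at `output/raven`, while passing `label="mydocument"` points at `output/mydocument`.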
adt-press includes an evaluation tool for measuring the performance of the various LLM tasks against a gold standard. To run the tool, make sure the following environment variables are set:
LABEL_STUDIO_HOST=[Your LabelStudio Hostname]
LABEL_STUDIO_TOKEN=[Your LabelStudio API Token]
AZURE_STORAGE_ACCOUNT_NAME=[Azure storage account name]
AZURE_STORAGE_ACCOUNT_KEY=[Azure storage account key]
MLFLOW_TRACKING_URI=https://[MLFlow endpoint URL] (optional)

Once the environment is set, you can run adt-eval.py the same way as adt-press.py. By default, output is written to output/eval:
uv run adt-eval.py

This will create new reports with results against the gold standard in the output directory. Start at index.html.
Alternatively, you can configure various options from the command line. See config/eval_config.yml for a full list, and config/config.yaml for global options.
# Limit the run to the first 10 test cases and only the text_type task
uv run adt-eval.py label=evals eval.limit=10 eval.tasks=text_type

This project uses Ruff for code formatting and linting. The configuration is specified in ruff.toml.
To check code style:
uv run ruff check --fix

To format code:
uv run ruff format

Run tests with pytest:
uv run pytest

- adt_press/: Main package
  - llm/: LLM integration modules
  - nodes/: Hamilton nodes for the processing pipeline
  - utils/: Utility functions
  - models/: Data models used in adt-press
- assets/: Example files
- config/: Configuration files
- prompts/: LLM prompt templates
- templates/: HTML templates
- tests/: Test files

Build the image:
docker build -t adt-press .

Run the container:
docker run --rm adt-press

To run a specific command inside the container (for example, to execute uv run adt-press.py with a PDF file):
docker run --rm adt-press uv run adt-press.py label=raven pdf_path=assets/raven.pdf

Replace assets/raven.pdf with the path to your PDF file inside the container.
If you use Visual Studio Code, you can take advantage of the "Reopen in Container" feature for a full-featured development environment inside Docker.
This allows you to edit, run, and debug your code directly within the container.
To use this, add a .devcontainer configuration to your project and select "Reopen in Container" from the VS Code command palette.
You will need to have the Dev Containers extension installed in VS Code to use this feature.
Note:
The folder .devcontainer needs to be in the root of your project, containing a devcontainer.json file with the following content:
{
  "name": "ADT Press",
  "build": {
    // Sets the run context to one level up instead of the .devcontainer folder.
    "context": "..",
    // Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
    "dockerfile": "../Dockerfile"
  }
}

Environment Variables
Note:
When running the Dockerized version, pass the required environment variables to the container.
For example:
docker run --rm -e OPENAI_API_KEY=your-key-here adt-press

When using VS Code "Reopen in Container", you can add the variables to your .env file or set them in the container terminal before running your scripts.

License
This project is licensed under the Apache License 2.0.