@stefanoric

Summary

New translator for Portale Antenati (antenati.cultura.gov.it), the Italian State Archives portal for genealogical research. This site provides access to digitized civil registry records (births, marriages, deaths) from Italian archives.

Features

  • Extracts metadata from IIIF manifests including:
    - Title (constructed as "Year - Type - Archive path", e.g., "1850 - Nati - Archivio di Stato di Lecce > Stato civile della restaurazione > Casamassella")
    - Date range
    - Archive location
    - Document type (Nati, Morti, Matrimoni, etc.)
    - Total page count
    - Current page number (from Mirador viewer)

  • Embeds the currently viewed archival image as a base64-encoded note (a workaround: the IIIF server requires custom headers, which prevents linking the image directly as an attachment)

  • Includes fallback scraping when IIIF manifest is unavailable
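
The base64 note workaround described above can be illustrated with a minimal sketch. It is written in Python for brevity (the translator itself is JavaScript), and the function name, the default MIME type, and the HTML layout of the note are assumptions made for this example, not the translator's actual code.

```python
import base64
import html


def image_note_html(image_bytes: bytes, mime_type: str = "image/jpeg", caption: str = "") -> str:
    """Return an HTML note body that embeds the image as a data URI.

    Hypothetical helper: the real note markup may differ.
    """
    # Base64 output is pure ASCII, so it can be placed directly in the src attribute.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    caption_html = f"<p>{html.escape(caption)}</p>" if caption else ""
    alt_text = html.escape(caption, quote=True)
    return (
        f"{caption_html}"
        f'<p><img src="data:{mime_type};base64,{encoded}" alt="{alt_text}"/></p>'
    )


# Example with the first three bytes of a JPEG file (the start-of-image marker).
note = image_note_html(b"\xff\xd8\xff", caption="8 of 126")
```

Because the image data lives inside the note, no later request to the image server (and therefore no custom header) is needed to display it.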

Technical notes

  • Uses custom HTTP headers to access the IIIF image server (similar approach to existing Python tools for this site)
  • Creates manuscript items, appropriate for archival documents
  • Supports both domain variants: antenati.cultura.gov.it and antenati.san.beniculturali.it
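
As a rough illustration of supporting both domain variants, the check below matches a URL's host against the two names listed above. This is a Python sketch under assumptions: the function name is hypothetical, and the translator itself would express this differently in JavaScript.

```python
from urllib.parse import urlparse

# The two host names named in the technical notes.
SUPPORTED_HOSTS = {
    "antenati.cultura.gov.it",
    "antenati.san.beniculturali.it",
}


def is_antenati_url(url: str) -> bool:
    """Return True if the URL's host is one of the supported Antenati domains."""
    host = (urlparse(url).hostname or "").lower()
    return host in SUPPORTED_HOSTS
```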

Commits

Extracts metadata from IIIF manifests and embeds archival images
as base64 in notes (workaround for IIIF server requiring custom headers).
- Extract current page number from Mirador viewer navigation element
- Populate numPages field from IIIF canvas count
- Use DOM page number for canvas selection with URL fallback
- Add debug logging for manifest metadata fields
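
The canvas selection described in this commit (DOM page number first, URL as fallback) could look roughly like the Python sketch below. It assumes the viewer exposes the page as text containing a 1-based number, and that the URL fallback yields a canvas ID that can be compared with the manifest's canvas IDs; both are assumptions, and all names are hypothetical.

```python
import re
from typing import List, Optional


def parse_page_number(text: Optional[str]) -> Optional[int]:
    """Pull the first integer out of a string such as '8 of 126'."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def select_canvas_index(
    canvas_ids: List[str],
    dom_page_text: Optional[str] = None,
    url_canvas_id: Optional[str] = None,
) -> int:
    """Pick a 0-based canvas index: DOM page number, then URL, then first page."""
    page = parse_page_number(dom_page_text)
    if page is not None and 1 <= page <= len(canvas_ids):
        return page - 1  # the viewer counts pages from 1
    if url_canvas_id in canvas_ids:
        return canvas_ids.index(url_canvas_id)
    return 0


def format_pages(index: int, total: int) -> str:
    """Render the page position in the 'X of Y' form."""
    return f"{index + 1} of {total}"
```

Here `len(canvas_ids)` plays the role of the canvas count that populates numPages.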
Title format: "1850 - Nati - Archivio di Stato di Lecce > ..."
- Use "Titolo" field for year (not "Datazione", which holds a date range)
- Add "Datazione" to date field handling
- Add debug logging for title construction
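
A sketch of the title construction, in Python for illustration only. It assumes the manifest metadata is a list of label/value pairs with plain string values. "Titolo" and "Datazione" come from the commit message; the labels "Tipologia" and "Contesto archivistico" for the document type and archive path are assumptions made for this example, and the "Datazione" value is made up.

```python
import re
from typing import Dict, List


def build_title(metadata: List[Dict[str, str]]) -> str:
    """Build 'Year - Type - Archive path' from manifest metadata entries."""
    fields = {entry["label"]: entry["value"] for entry in metadata}
    # The year is taken from "Titolo"; "Datazione" holds a range and is ignored here.
    year_match = re.search(r"\b\d{4}\b", fields.get("Titolo", ""))
    parts = [
        year_match.group() if year_match else "",
        fields.get("Tipologia", ""),              # assumed label for the document type
        fields.get("Contesto archivistico", ""),  # assumed label for the archive path
    ]
    # Skip missing parts so the separators stay clean.
    return " - ".join(part for part in parts if part)


# Illustrative input only, shaped after the example title in the description.
sample_metadata = [
    {"label": "Titolo", "value": "1850"},
    {"label": "Datazione", "value": "1850 - 1851"},
    {"label": "Tipologia", "value": "Nati"},
    {
        "label": "Contesto archivistico",
        "value": "Archivio di Stato di Lecce > Stato civile della restaurazione > Casamassella",
    },
]
title = build_title(sample_metadata)
```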
- Remove excessive debug logging, keep only key error cases
- Show pages as "X of Y" format (e.g., "8 of 126")
- Minor code style cleanups
- Place || operators at beginning of lines
- Remove redundant radix parameter from parseInt
- Remove unnecessary quotes from object property names
- Remove unnecessary escape characters in regex
- Add parentheses around arrow function argument
Raise the maximum dimension of the IIIF image request from 800px to
2000px so archival documents are easier to read. The request uses the
!2000,2000 size syntax, which the server accepts (unlike /full/full/ or /max/).
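
For reference, the !2000,2000 size parameter asks an IIIF Image API server for the largest image that fits inside a 2000x2000 box while keeping the aspect ratio. The Python sketch below builds such a URL, assuming the IIIF Image API 2.x path layout (region/size/rotation/quality.format); the function name and the example service URL are hypothetical.

```python
def iiif_image_url(service_id: str, max_dim: int = 2000) -> str:
    """Build an IIIF Image API URL for a best-fit image within max_dim x max_dim."""
    base = service_id.rstrip("/")
    # full = whole region, !w,h = best fit within the box, 0 = no rotation.
    return f"{base}/full/!{max_dim},{max_dim}/0/default.jpg"


# Placeholder service URL, not the real Antenati image server.
example_url = iiif_image_url("https://iiif.example.org/iiif/2/abc123")
```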
If the 2000px image request fails with HTTP 403, automatically retry
with 1200px. This helps handle AWS WAF restrictions on some images.
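
The retry behavior can be sketched as follows, again in Python for illustration. The `fetch` argument is a hypothetical callable supplied by the caller; it is assumed to send the request (with whatever custom headers the server expects) and return a `(status, body)` pair. The URL layout assumes the IIIF Image API 2.x path form.

```python
from typing import Callable, Optional, Sequence, Tuple

FetchResult = Tuple[int, bytes]


def fetch_with_size_fallback(
    fetch: Callable[[str], FetchResult],
    service_id: str,
    sizes: Sequence[int] = (2000, 1200),
) -> Optional[Tuple[int, bytes]]:
    """Try each size in order, moving to the next one only after a 403."""
    base = service_id.rstrip("/")
    for size in sizes:
        status, body = fetch(f"{base}/full/!{size},{size}/0/default.jpg")
        if status == 200:
            return size, body
        if status != 403:
            break  # other failures are not retried at a smaller size
    return None


# Usage with a stand-in fetcher that rejects the 2000px request.
requested = []


def fake_fetch(url: str) -> FetchResult:
    requested.append(url)
    return (403, b"") if "!2000,2000" in url else (200, b"jpeg-bytes")


result = fetch_with_size_fallback(fake_fetch, "https://iiif.example.org/iiif/2/abc123")
```

Passing the fetcher in keeps the size-fallback logic separate from the details of how the request is sent.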