Skip to content

PT-2358: Generate Markers Map#1

Merged
tjcouch-sil merged 58 commits into
mainfrom
2358-generate-markers-map
Nov 19, 2025
Merged

PT-2358: Generate Markers Map#1
tjcouch-sil merged 58 commits into
mainfrom
2358-generate-markers-map

Conversation

@tjcouch-sil
Copy link
Copy Markdown
Member

@tjcouch-sil tjcouch-sil commented Sep 9, 2025

Goes with paranext/paranext-core#1887

For https://paratextstudio.atlassian.net/browse/PT-2358, I need to know what types each marker is and which attribute is default. So here's some stuff to make that happen.

The code is rather messy because I started with vibe coding and then had to hand-edit a bunch of stuff to make it work. :/


This change is Reviewable

@tjcouch-sil tjcouch-sil enabled auto-merge (squash) September 9, 2025 18:54
@tjcouch-sil tjcouch-sil disabled auto-merge September 9, 2025 20:01
Copy link
Copy Markdown
Member

@lyonsil lyonsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing through this! The data seems pretty dense.

Some questions/thoughts at a high level:

  • Why don't verse and chapter markers have default attributes in the output for "number"? I see in the readme that they were skipped on those two (and some other markers) on purpose, but it's not clear why. If we don't put it in this data file, then we'll have to bake special rules into the offset handler. It seems like those special rules could be avoided by just processing all the attributes (except maybe "style") on all the marker types that are present in the USFM.
  • It might be nice to include some description property since there are no comments. For those of us without extensive history reading USFM, it's nice to look at a tag and know "oh, 'x' came from 'CrossReference' in the XML".
  • Do we need anything in this data to indicate which markers are self-closing and which require an end tag? That will be necessary information for calculating offsets. Maybe that could go into the currently empty markerTypes data.
  • As with probably all deep XML parsing code that tries to make sense of someone else's data model, this is tough code to skim and understand. Having said that, it's not obvious to me why some decisions were made (e.g., the large block of "skip X on Y"). The comments there explain what the code is doing, but not why. It seems like it would be nice to have something somewhere (could be in the readme) to give a little understanding on why.

@lyonsil reviewed 12 of 12 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @tjcouch-sil)


.github/workflows/generate-markers-map.yml line 0 at r1 (raw file):
I did not test this workflow and am assuming that people who run this will generally clone and run it locally.


.github/workflows/test.yml line 28 at r1 (raw file):

      - name: npm unit tests
        run: npm test

There are no test files, so this step isn't really doing anything. However, I could see a couple of unit tests being really valuable that process existing usx.rng files (maybe for 3.0.x and 3.1.x) to make sure the output is what we expect it to be. Basically, these would be regression tests so that if we update something in package.json or make a bug fix we don't break something else unexpectedly.

I don't think we need exhaustive unit tests, but without some kind of regression testing I would be afraid to make a change without essentially generating a regression test on my own to run.

@tjcouch-sil tjcouch-sil marked this pull request as ready for review October 31, 2025 17:04
Copy link
Copy Markdown
Member Author

@tjcouch-sil tjcouch-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of improvements! Ready for another review.

Reviewable status: 6 of 24 files reviewed, 1 unresolved discussion (waiting on @lyonsil)


.github/workflows/test.yml line 28 at r1 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

There are no test files, so this step isn't really doing anything. However, I could see a couple of unit tests being really valuable that process existing usx.rng files (maybe for 3.0.x and 3.1.x) to make sure the output is what we expect it to be. Basically, these would be regression tests so that if we update something in package.json or make a bug fix we don't break something else unexpectedly.

I don't think we need exhaustive unit tests, but without some kind of regression testing I would be afraid to make a change without essentially generating a regression test on my own to run.

Good idea :) done

Copy link
Copy Markdown
Member

@lyonsil lyonsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the improvements.

Some thoughts:

  • There is no tsconfig.json file in the project. I didn't run into any big problems without it, but it seemed unusual. I'm not sure if type checking could be missing anything without it.
  • I flagged a few things to fix, but nothing major.
  • Overall I did not go through this with the same depth that I would for code in the product. Skimming through the code, this looks like a good baseline to work from for generating/updating future maps.
  • It might be worth presenting the map files and tooling needed to produce them to the USFM committee.

@lyonsil reviewed 18 of 18 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @tjcouch-sil)


src/generate-markers-map.ts line 128 at r4 (raw file):

    console.log(Array.from(skippedDefinitions).sort().join('\n'));
  } catch (e) {
    console.error(`Uncaught error in generate-markers-map async IIFE: ${e}`);

You might want process.exit(1) here, too, so you are consistently returning exiting with a non-zero code when you hit errors.


.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):

      - name: Generate markers map
        run: npm run generate-markers-map -- --schema usx.rng --version "${{ steps.inputs.outputs.schemaVersion }}" --commit "${{ env.USX_COMMIT }}"

Feel free to ignore, but VSCode gets mad because it thinks USX_COMMIT might not be set. That makes the file show up yellow all the time. This might fix it, though I didn't test it out:

--commit "${USX_COMMIT}"

src/markers-map.util.ts line 67 at r4 (raw file):

    // Child is not an element node, so skip
    if (sibling.nodeType !== 1 /* Node.ELEMENT_NODE - not defined in Node.js */) continue;

It would be a little cleaner to define a const at the top and replace some of the magic values.

const NODE_TYPE = {
  ELEMENT: 1,
  COMMENT: 8,
} as const;

src/markers-map.util.ts line 1811 at r4 (raw file):

    version,
    commit,
    markersMapVersion: '1.0.0',

This is probably a good candidate for a magic, named constant defined at the top of the file.


package-lock.json line 0 at r4 (raw file):
Please run npm audit fix and check in the updates to package-lock.json. Just hoping to have this repo start out with no warnings.


.eslintrc.js line 20 at r4 (raw file):

    {
      // Specific configuration for the convert-marble-lexicon.ts file
      files: ['src/convert-marble-lexicon.ts'],

I see where you got your inspiration. 😆

You can probably just remove this overrides section.


package.json line 5 at r4 (raw file):

  "version": "1.0.0",
  "description": "Scripts and utilities for transforming and preparing US* schemas for use in Platform.Bible and Paratext",
  "main": "src/transform.ts",

This file doesn't exist. You can probably just erase this line.


src/markers-map.model.template.ts line 0 at r4 (raw file):
It seems like this is something that ideally would be pulled from PBU if PBU was a standalone package you could have a dependency on. Do you agree?

Copy link
Copy Markdown
Member Author

@tjcouch-sil tjcouch-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work on this review and the other review. Tons of stuff, and you have already been doing a lot of reviewing 😵‍💫 I appreciate your perseverance. Hopefully I'll be more available to take on more of our usually shared tasks after all this.

Regarding tsconfig.json, I modeled this repo off of https://github.com/paranext/marble-tools, which also has no tsconfig.json. Think I should add one here? I don't even know what I would put in it to be honest, but I could probably dig around and think of some stuff.

Indeed, it seems reasonable not to go as deep on this.

Indeed, I also hope to present this stuff to them.

Reviewable status: 10 of 24 files reviewed, 8 unresolved discussions (waiting on @lyonsil)


.eslintrc.js line 20 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

I see where you got your inspiration. 😆

You can probably just remove this overrides section.

Whoops! 😂 Done


package.json line 5 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

This file doesn't exist. You can probably just erase this line.

Good idea.


package-lock.json line at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

Please run npm audit fix and check in the updates to package-lock.json. Just hoping to have this repo start out with no warnings.

Good idea!


.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

Feel free to ignore, but VSCode gets mad because it thinks USX_COMMIT might not be set. That makes the file show up yellow all the time. This might fix it, though I didn't test it out:

--commit "${USX_COMMIT}"

If I remove env., it shows this error (not just a warning): Unrecognized named-value: 'USX\_COMMIT'. I'm guessing that won't work, unfortunately. Looks like this bug has been open since 2023 with no workarounds :/


src/generate-markers-map.ts line 128 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

You might want process.exit(1) here, too, so you are consistently returning exiting with a non-zero code when you hit errors.

Ah good call


src/markers-map.model.template.ts line at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

It seems like this is something that ideally would be pulled from PBU if PBU was a standalone package you could have a dependency on. Do you agree?

Actually, currently, I think of these as having the opposite relationship; multiple generations of the markers map model file are put into PBU so UsjReaderWriter can use them. I'm open to discussion.


src/markers-map.util.ts line 67 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

It would be a little cleaner to define a const at the top and replace some of the magic values.

const NODE_TYPE = {
  ELEMENT: 1,
  COMMENT: 8,
} as const;

Good idea!


src/markers-map.util.ts line 1811 at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

This is probably a good candidate for a magic, named constant defined at the top of the file.

Good idea!

@tjcouch-sil tjcouch-sil force-pushed the 2358-generate-markers-map branch from a623a11 to 8193f1e Compare November 4, 2025 16:56
@tjcouch-sil tjcouch-sil force-pushed the 2358-generate-markers-map branch from 8193f1e to 7f7d781 Compare November 4, 2025 16:56
lyonsil
lyonsil previously approved these changes Nov 4, 2025
Copy link
Copy Markdown
Member

@lyonsil lyonsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I didn't end up with a problem missing tsconfig.json in the other repo. If you didn't notice anything in this repo, either, then I guess it's OK at least for now.

:lgtm: Thanks for the deep dive on getting the markers maps built!

@lyonsil reviewed 14 of 14 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)


.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):

Previously, tjcouch-sil (TJ Couch) wrote…

If I remove env., it shows this error (not just a warning): Unrecognized named-value: 'USX\_COMMIT'. I'm guessing that won't work, unfortunately. Looks like this bug has been open since 2023 with no workarounds :/

Ok, thanks for trying. Not a blocker.


src/markers-map.model.template.ts line at r4 (raw file):

Previously, tjcouch-sil (TJ Couch) wrote…

Actually, currently, I think of these as having the opposite relationship; multiple generations of the markers map model file are put into PBU so UsjReaderWriter can use them. I'm open to discussion.

Hmm. That's fair. One day it would be nice to get all of these dependencies setup "properly."

Copy link
Copy Markdown

@irahopkinson irahopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irahopkinson reviewed 4 of 12 files at r1, 7 of 18 files at r4, 6 of 14 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @tjcouch-sil)


src/markers-3.1.json line 6 at r5 (raw file):

  "schemaCommit": "50f2a6ac3fc1d867d906df28bc00fcff729a7b76",
  "markersMapVersion": "1.0.0",
  "usfmToolsCommit": "39dc10f26a33da0de4eb5b269f5bba1f2fd35c48+",

BTW the plus at the end doesn't look like part of a commit hash.

Code quote:

+

package.json line 5 at r4 (raw file):

  "version": "1.0.0",
  "description": "Scripts and utilities for transforming and preparing US* schemas for use in Platform.Bible and Paratext",
  "main": "src/transform.ts",

BTW would be a good idea to add "private": true, if you don't intend to publish this repo.


package.json line 49 at r4 (raw file):

  },
  "homepage": "https://github.com/paranext/usfm-tools#readme"
}

BTW those 2 red symbols at the bottom of the file is because GH expects a blank line at the end of the file.

Copy link
Copy Markdown

@irahopkinson irahopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irahopkinson reviewed 1 of 11 files at r3, 2 of 18 files at r4, 8 of 14 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @tjcouch-sil)


src/markers-map.util.ts line 61 at r5 (raw file):

/**
 * Helper function to get next child element of this element's parent. Almost the exact same as
 * `element.nextElementSibling`, but this returns `undefined` because `null` is dumb

Err... what about just doing element.nextElementSibling ?? undefined? If it's not that simple then perhaps update the comment.

Code quote:

`element.nextElementSibling`, but this returns `undefined` because `null` is dumb

src/markers-map.util.ts line 142 at r5 (raw file):

  if (nameElements.length > 1) {
    console.log(

BTW use console.warn instead? Here and elsewhere.

Code quote:

console.log

src/markers-map.util.ts line 333 at r5 (raw file):

  if (existingString.split('\n').includes(newString)) return existingString;

  // Both arrays are defined but don't match, so concat them

typo change 'arrays' to 'strings'

Code quote:

arrays

README.md line 180 at r5 (raw file):

    - The `style` attribute may contain a `ref` pointing to a `choice` of all the marker names associated with that marker type
    - If there is not a `style` attribute, the element's `name` is the marker type and the marker name
  - (`marker.isIndependentClosingMarkerFor`; `marker.independentClosingMarkers`) additional independent closing marker that goes with another other marker

typo just 'another'?

Code quote:

another other

README.md line 193 at r5 (raw file):

    - There are many kinds of special attribute types in USFM representation. One attribute cannot be multiple types of special attribute. Check if an attribute is a special type in this listed order:
      - Attributes should not be considered for being a special attribute type in any of the following circumstances:
        - the `attribute` tag has are multiple `usfm:match` tags

typo just 'has'?

Code quote:

has are

README.md line 313 at r5 (raw file):

  - [`usfm`](https://docs.usfm.bible/usfm/3.1/doc/usfm.html) with marker type `para` and no default attribute. This marker is present in USFM but most of the time is translated into the `usx` marker in USX and the `USJ` marker in USJ
    - Note that `usfm` is a special `para` in that its text content is considered to be `version`, which gets translated to `usx` and `USJ` as an attribute.
  - [`USJ`](https://docs.usfm.bible/usfm/3.1/doc/usfm.html) with marker type `USJ` and no default attribute. This marker is present in USJ but is translated into the `usx` marker in `USX` and the `usfm` marker in USFM.

typo just "USX", i.e. without the backticks?

Code quote:

`USX`

src/markers-map.model.template.ts line 31 at r5 (raw file):

   * List of normal marker names for which this marker is an attribute marker.
   *
   * For example, `ca` and `cp` are attribute markers for `c`. `isAttributeMarkerFor` would be

BTW what about using @example in all your TSDocs?

Code quote:

For example,

Copy link
Copy Markdown
Member Author

@tjcouch-sil tjcouch-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 19 of 24 files reviewed, 10 unresolved discussions (waiting on @irahopkinson and @lyonsil)


package.json line 5 at r4 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

BTW would be a good idea to add "private": true, if you don't intend to publish this repo.

Coincidentally, I have been thinking there is a significant change we will want to publish this repo, whereas we don't have "private": true on many other repos that we definitely don't intend to publish. 😂


package.json line 49 at r4 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

BTW those 2 red symbols at the bottom of the file is because GH expects a blank line at the end of the file.

Indeed! I'm not sure how it ended up without a newline at some point, but it has one now :)


README.md line 180 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

typo just 'another'?

Haha whoops! Thanks :)


README.md line 193 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

typo just 'has'?

Indeed - thanks!


README.md line 313 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

typo just "USX", i.e. without the backticks?

Fixed - thanks!


src/markers-3.1.json line 6 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

BTW the plus at the end doesn't look like part of a commit hash.

Indeed; it's explained in the TSDocs to be an indication that there were working changes when this was generated.

That being said, I don't think this should have been generated when there were working changes. I'll regenerate it


src/markers-map.model.template.ts line at r4 (raw file):

Previously, lyonsil (Matt Lyons) wrote…

Hmm. That's fair. One day it would be nice to get all of these dependencies setup "properly."

😮‍💨 indeed


src/markers-map.model.template.ts line 31 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

BTW what about using @example in all your TSDocs?

Good idea! Helped with reorganizing some things too


src/markers-map.util.ts line 61 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

Err... what about just doing element.nextElementSibling ?? undefined? If it's not that simple then perhaps update the comment.

Ah gotcha; makes sense that would be confusing. nextElementSibling is not available in Node; there are a number of utilities that are in browser environments that aren't in Node, sadly


src/markers-map.util.ts line 142 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

BTW use console.warn instead? Here and elsewhere.

I believe warn goes on stderr, but I wanted these things to go on stdout for unit testing purposes


src/markers-map.util.ts line 333 at r5 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

typo change 'arrays' to 'strings'

Oops! Thanks :)

Copy link
Copy Markdown

@irahopkinson irahopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: great job Mr USFM

@irahopkinson reviewed 5 of 5 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)


src/markers-map.util.ts line 142 at r5 (raw file):

Previously, tjcouch-sil (TJ Couch) wrote…

I believe warn goes on stderr, but I wanted these things to go on stdout for unit testing purposes

Gotcha


src/markers-map.util.ts line 62 at r6 (raw file):

 * Helper function to get next child element of this element's parent. Almost the exact same as
 * `element.nextElementSibling` (which is not available in Node), but this returns `undefined`
 * because `null` is dumb

FYI every time I read this it makes be laugh out loud - even after reading it so many times already. 😆

Code quote:

because `null` is dumb

Copy link
Copy Markdown
Member Author

@tjcouch-sil tjcouch-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)


src/markers-map.util.ts line 62 at r6 (raw file):

Previously, irahopkinson (Ira Hopkinson) wrote…

FYI every time I read this it makes be laugh out loud - even after reading it so many times already. 😆

😂

@tjcouch-sil tjcouch-sil force-pushed the 2358-generate-markers-map branch from a3f77b4 to b347585 Compare November 12, 2025 17:08
Copy link
Copy Markdown
Member

@lyonsil lyonsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for the additional documentation to help clarify this messy topic.

@lyonsil reviewed 5 of 5 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)

Copy link
Copy Markdown

@irahopkinson irahopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irahopkinson reviewed 5 of 5 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)

@tjcouch-sil tjcouch-sil merged commit 443d9e6 into main Nov 19, 2025
3 checks passed
@tjcouch-sil tjcouch-sil deleted the 2358-generate-markers-map branch November 19, 2025 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants