PT-2358: Generate Markers Map#1
Conversation
…eleases are distributed
lyonsil
left a comment
There was a problem hiding this comment.
Thanks for pushing through this! The data seems pretty dense.
Some questions/thoughts at a high level:
- Why don't verse and chapter markers have default attributes in the output for "number"? I see in the readme that they were skipped on those two (and some other markers) on purpose, but it's not clear why. If we don't put it in this data file, then we'll have to bake special rules into the offset handler. It seems like those special rules could be avoided by just processing all the attributes (except maybe "style") on all the marker types that are present in the USFM.
- It might be nice to include some description property since there are no comments. For those of us without extensive history reading USFM, it's nice to look at a tag and know "oh, 'x' came from 'CrossReference' in the XML".
- Do we need anything in this data to indicate which markers are self-closing and which require an end tag? That will be necessary information for calculating offsets. Maybe that could go into the currently empty
markerTypesdata. - As with probably all deep XML parsing code that tries to make sense of someone else's data model, this is tough code to skim and understand. Having said that, it's not obvious to me why some decisions were made (e.g., the large block of "skip X on Y"). The comments there explain what the code is doing, but not why. It seems like it would be nice to have something somewhere (could be in the readme) to give a little understanding on why.
@lyonsil reviewed 12 of 12 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @tjcouch-sil)
.github/workflows/generate-markers-map.yml line 0 at r1 (raw file):
I did not test this workflow and am assuming that people who run this will generally clone and run it locally.
.github/workflows/test.yml line 28 at r1 (raw file):
- name: npm unit tests run: npm test
There are no test files, so this step isn't really doing anything. However, I could see a couple of unit tests being really valuable that process existing usx.rng files (maybe for 3.0.x and 3.1.x) to make sure the output is what we expect it to be. Basically, these would be regression tests so that if we update something in package.json or make a bug fix we don't break something else unexpectedly.
I don't think we need exhaustive unit tests, but without some kind of regression testing I would be afraid to make a change without essentially generating a regression test on my own to run.
…mprovement to documentation
tjcouch-sil
left a comment
There was a problem hiding this comment.
Lots of improvements! Ready for another review.
Reviewable status: 6 of 24 files reviewed, 1 unresolved discussion (waiting on @lyonsil)
.github/workflows/test.yml line 28 at r1 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
There are no test files, so this step isn't really doing anything. However, I could see a couple of unit tests being really valuable that process existing usx.rng files (maybe for 3.0.x and 3.1.x) to make sure the output is what we expect it to be. Basically, these would be regression tests so that if we update something in package.json or make a bug fix we don't break something else unexpectedly.
I don't think we need exhaustive unit tests, but without some kind of regression testing I would be afraid to make a change without essentially generating a regression test on my own to run.
Good idea :) done
lyonsil
left a comment
There was a problem hiding this comment.
Appreciate the improvements.
Some thoughts:
- There is no tsconfig.json file in the project. I didn't run into any big problems without it, but it seemed unusual. I'm not sure if type checking could be missing anything without it.
- I flagged a few things to fix, but nothing major.
- Overall I did not go through this with the same depth that I would for code in the product. Skimming through the code, this looks like a good baseline to work from for generating/updating future maps.
- It might be worth presenting the map files and tooling needed to produce them to the USFM committee.
@lyonsil reviewed 18 of 18 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @tjcouch-sil)
src/generate-markers-map.ts line 128 at r4 (raw file):
console.log(Array.from(skippedDefinitions).sort().join('\n')); } catch (e) { console.error(`Uncaught error in generate-markers-map async IIFE: ${e}`);
You might want process.exit(1) here, too, so you are consistently returning exiting with a non-zero code when you hit errors.
.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):
- name: Generate markers map run: npm run generate-markers-map -- --schema usx.rng --version "${{ steps.inputs.outputs.schemaVersion }}" --commit "${{ env.USX_COMMIT }}"
Feel free to ignore, but VSCode gets mad because it thinks USX_COMMIT might not be set. That makes the file show up yellow all the time. This might fix it, though I didn't test it out:
--commit "${USX_COMMIT}"
src/markers-map.util.ts line 67 at r4 (raw file):
// Child is not an element node, so skip if (sibling.nodeType !== 1 /* Node.ELEMENT_NODE - not defined in Node.js */) continue;
It would be a little cleaner to define a const at the top and replace some of the magic values.
const NODE_TYPE = {
ELEMENT: 1,
COMMENT: 8,
} as const;src/markers-map.util.ts line 1811 at r4 (raw file):
version, commit, markersMapVersion: '1.0.0',
This is probably a good candidate for a magic, named constant defined at the top of the file.
package-lock.json line 0 at r4 (raw file):
Please run npm audit fix and check in the updates to package-lock.json. Just hoping to have this repo start out with no warnings.
.eslintrc.js line 20 at r4 (raw file):
{ // Specific configuration for the convert-marble-lexicon.ts file files: ['src/convert-marble-lexicon.ts'],
I see where you got your inspiration. 😆
You can probably just remove this overrides section.
package.json line 5 at r4 (raw file):
"version": "1.0.0", "description": "Scripts and utilities for transforming and preparing US* schemas for use in Platform.Bible and Paratext", "main": "src/transform.ts",
This file doesn't exist. You can probably just erase this line.
src/markers-map.model.template.ts line 0 at r4 (raw file):
It seems like this is something that ideally would be pulled from PBU if PBU was a standalone package you could have a dependency on. Do you agree?
tjcouch-sil
left a comment
There was a problem hiding this comment.
Thanks for all the hard work on this review and the other review. Tons of stuff, and you have already been doing a lot of reviewing 😵💫 I appreciate your perseverance. Hopefully I'll be more available to take on more of our usually shared tasks after all this.
Regarding tsconfig.json, I modeled this repo off of https://github.com/paranext/marble-tools, which also has no tsconfig.json. Think I should add one here? I don't even know what I would put in it to be honest, but I could probably dig around and think of some stuff.
Indeed, it seems reasonable not to go as deep on this.
Indeed, I also hope to present this stuff to them.
Reviewable status: 10 of 24 files reviewed, 8 unresolved discussions (waiting on @lyonsil)
.eslintrc.js line 20 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
I see where you got your inspiration. 😆
You can probably just remove this
overridessection.
Whoops! 😂 Done
package.json line 5 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
This file doesn't exist. You can probably just erase this line.
Good idea.
package-lock.json line at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
Please run
npm audit fixand check in the updates topackage-lock.json. Just hoping to have this repo start out with no warnings.
Good idea!
.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
Feel free to ignore, but VSCode gets mad because it thinks USX_COMMIT might not be set. That makes the file show up yellow all the time. This might fix it, though I didn't test it out:
--commit "${USX_COMMIT}"
If I remove env., it shows this error (not just a warning): Unrecognized named-value: 'USX\_COMMIT'. I'm guessing that won't work, unfortunately. Looks like this bug has been open since 2023 with no workarounds :/
src/generate-markers-map.ts line 128 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
You might want
process.exit(1)here, too, so you are consistently returning exiting with a non-zero code when you hit errors.
Ah good call
src/markers-map.model.template.ts line at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
It seems like this is something that ideally would be pulled from PBU if PBU was a standalone package you could have a dependency on. Do you agree?
Actually, currently, I think of these as having the opposite relationship; multiple generations of the markers map model file are put into PBU so UsjReaderWriter can use them. I'm open to discussion.
src/markers-map.util.ts line 67 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
It would be a little cleaner to define a
constat the top and replace some of the magic values.const NODE_TYPE = { ELEMENT: 1, COMMENT: 8, } as const;
Good idea!
src/markers-map.util.ts line 1811 at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
This is probably a good candidate for a magic, named constant defined at the top of the file.
Good idea!
a623a11 to
8193f1e
Compare
8193f1e to
7f7d781
Compare
lyonsil
left a comment
There was a problem hiding this comment.
Yeah, I guess I didn't end up with a problem missing tsconfig.json in the other repo. If you didn't notice anything in this repo, either, then I guess it's OK at least for now.
Thanks for the deep dive on getting the markers maps built!
@lyonsil reviewed 14 of 14 files at r5, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)
.github/workflows/generate-markers-map.yml line 61 at r4 (raw file):
Previously, tjcouch-sil (TJ Couch) wrote…
If I remove
env., it shows this error (not just a warning):Unrecognized named-value: 'USX\_COMMIT'. I'm guessing that won't work, unfortunately. Looks like this bug has been open since 2023 with no workarounds :/
Ok, thanks for trying. Not a blocker.
src/markers-map.model.template.ts line at r4 (raw file):
Previously, tjcouch-sil (TJ Couch) wrote…
Actually, currently, I think of these as having the opposite relationship; multiple generations of the markers map model file are put into PBU so
UsjReaderWritercan use them. I'm open to discussion.
Hmm. That's fair. One day it would be nice to get all of these dependencies setup "properly."
irahopkinson
left a comment
There was a problem hiding this comment.
@irahopkinson reviewed 4 of 12 files at r1, 7 of 18 files at r4, 6 of 14 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @tjcouch-sil)
src/markers-3.1.json line 6 at r5 (raw file):
"schemaCommit": "50f2a6ac3fc1d867d906df28bc00fcff729a7b76", "markersMapVersion": "1.0.0", "usfmToolsCommit": "39dc10f26a33da0de4eb5b269f5bba1f2fd35c48+",
BTW the plus at the end doesn't look like part of a commit hash.
Code quote:
+package.json line 5 at r4 (raw file):
"version": "1.0.0", "description": "Scripts and utilities for transforming and preparing US* schemas for use in Platform.Bible and Paratext", "main": "src/transform.ts",
BTW would be a good idea to add "private": true, if you don't intend to publish this repo.
package.json line 49 at r4 (raw file):
}, "homepage": "https://github.com/paranext/usfm-tools#readme" }
BTW those 2 red symbols at the bottom of the file is because GH expects a blank line at the end of the file.
irahopkinson
left a comment
There was a problem hiding this comment.
@irahopkinson reviewed 1 of 11 files at r3, 2 of 18 files at r4, 8 of 14 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @tjcouch-sil)
src/markers-map.util.ts line 61 at r5 (raw file):
/** * Helper function to get next child element of this element's parent. Almost the exact same as * `element.nextElementSibling`, but this returns `undefined` because `null` is dumb
Err... what about just doing element.nextElementSibling ?? undefined? If it's not that simple then perhaps update the comment.
Code quote:
`element.nextElementSibling`, but this returns `undefined` because `null` is dumbsrc/markers-map.util.ts line 142 at r5 (raw file):
if (nameElements.length > 1) { console.log(
BTW use console.warn instead? Here and elsewhere.
Code quote:
console.logsrc/markers-map.util.ts line 333 at r5 (raw file):
if (existingString.split('\n').includes(newString)) return existingString; // Both arrays are defined but don't match, so concat them
typo change 'arrays' to 'strings'
Code quote:
arraysREADME.md line 180 at r5 (raw file):
- The `style` attribute may contain a `ref` pointing to a `choice` of all the marker names associated with that marker type - If there is not a `style` attribute, the element's `name` is the marker type and the marker name - (`marker.isIndependentClosingMarkerFor`; `marker.independentClosingMarkers`) additional independent closing marker that goes with another other marker
typo just 'another'?
Code quote:
another otherREADME.md line 193 at r5 (raw file):
- There are many kinds of special attribute types in USFM representation. One attribute cannot be multiple types of special attribute. Check if an attribute is a special type in this listed order: - Attributes should not be considered for being a special attribute type in any of the following circumstances: - the `attribute` tag has are multiple `usfm:match` tags
typo just 'has'?
Code quote:
has areREADME.md line 313 at r5 (raw file):
- [`usfm`](https://docs.usfm.bible/usfm/3.1/doc/usfm.html) with marker type `para` and no default attribute. This marker is present in USFM but most of the time is translated into the `usx` marker in USX and the `USJ` marker in USJ - Note that `usfm` is a special `para` in that its text content is considered to be `version`, which gets translated to `usx` and `USJ` as an attribute. - [`USJ`](https://docs.usfm.bible/usfm/3.1/doc/usfm.html) with marker type `USJ` and no default attribute. This marker is present in USJ but is translated into the `usx` marker in `USX` and the `usfm` marker in USFM.
typo just "USX", i.e. without the backticks?
Code quote:
`USX`src/markers-map.model.template.ts line 31 at r5 (raw file):
* List of normal marker names for which this marker is an attribute marker. * * For example, `ca` and `cp` are attribute markers for `c`. `isAttributeMarkerFor` would be
BTW what about using @example in all your TSDocs?
Code quote:
For example,
tjcouch-sil
left a comment
There was a problem hiding this comment.
Reviewable status: 19 of 24 files reviewed, 10 unresolved discussions (waiting on @irahopkinson and @lyonsil)
package.json line 5 at r4 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
BTW would be a good idea to add
"private": true,if you don't intend to publish this repo.
Coincidentally, I have been thinking there is a significant change we will want to publish this repo, whereas we don't have "private": true on many other repos that we definitely don't intend to publish. 😂
package.json line 49 at r4 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
BTW those 2 red symbols at the bottom of the file is because GH expects a blank line at the end of the file.
Indeed! I'm not sure how it ended up without a newline at some point, but it has one now :)
README.md line 180 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
typo just 'another'?
Haha whoops! Thanks :)
README.md line 193 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
typo just 'has'?
Indeed - thanks!
README.md line 313 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
typo just "USX", i.e. without the backticks?
Fixed - thanks!
src/markers-3.1.json line 6 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
BTW the plus at the end doesn't look like part of a commit hash.
Indeed; it's explained in the TSDocs to be an indication that there were working changes when this was generated.
That being said, I don't think this should have been generated when there were working changes. I'll regenerate it
src/markers-map.model.template.ts line at r4 (raw file):
Previously, lyonsil (Matt Lyons) wrote…
Hmm. That's fair. One day it would be nice to get all of these dependencies setup "properly."
😮💨 indeed
src/markers-map.model.template.ts line 31 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
BTW what about using
@examplein all your TSDocs?
Good idea! Helped with reorganizing some things too
src/markers-map.util.ts line 61 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
Err... what about just doing
element.nextElementSibling ?? undefined? If it's not that simple then perhaps update the comment.
Ah gotcha; makes sense that would be confusing. nextElementSibling is not available in Node; there are a number of utilities that are in browser environments that aren't in Node, sadly
src/markers-map.util.ts line 142 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
BTW use
console.warninstead? Here and elsewhere.
I believe warn goes on stderr, but I wanted these things to go on stdout for unit testing purposes
src/markers-map.util.ts line 333 at r5 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
typo change 'arrays' to 'strings'
Oops! Thanks :)
irahopkinson
left a comment
There was a problem hiding this comment.
@irahopkinson reviewed 5 of 5 files at r6, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)
src/markers-map.util.ts line 142 at r5 (raw file):
Previously, tjcouch-sil (TJ Couch) wrote…
I believe
warngoes onstderr, but I wanted these things to go onstdoutfor unit testing purposes
Gotcha
src/markers-map.util.ts line 62 at r6 (raw file):
* Helper function to get next child element of this element's parent. Almost the exact same as * `element.nextElementSibling` (which is not available in Node), but this returns `undefined` * because `null` is dumb
FYI every time I read this it makes be laugh out loud - even after reading it so many times already. 😆
Code quote:
because `null` is dumb
tjcouch-sil
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)
src/markers-map.util.ts line 62 at r6 (raw file):
Previously, irahopkinson (Ira Hopkinson) wrote…
FYI every time I read this it makes be laugh out loud - even after reading it so many times already. 😆
😂
a3f77b4 to
b347585
Compare
lyonsil
left a comment
There was a problem hiding this comment.
Thanks for the additional documentation to help clarify this messy topic.
@lyonsil reviewed 5 of 5 files at r7, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)
irahopkinson
left a comment
There was a problem hiding this comment.
@irahopkinson reviewed 5 of 5 files at r7, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @tjcouch-sil)
Goes with paranext/paranext-core#1887
For https://paratextstudio.atlassian.net/browse/PT-2358, I need to know what types each marker is and which attribute is default. So here's some stuff to make that happen.
The code is rather messy because I started with vibe coding and then had to hand-edit a bunch of stuff to make it work. :/
This change is