hwajongpark/mdx-validate

License: MIT · Node 18+

mdx-validate

Catches broken content before your readers do: links that go nowhere, pages that will not load, and frontmatter your code can no longer read.

The Problem

Some of your content is wired to other things. A link is wired to another page. The body of a page is wired to whatever renders it. A frontmatter field is wired to the code that reads it by name.

Any of those wires can break. The painful part is that they break quietly. The page does not crash and nothing turns red. It just ships wrong, and you find out from a reader, or from Google Search Console, weeks later.

  • A page gets renamed, and every old link to it now leads to a dead end.
  • Someone leaves a stray character in a page, and that one page stops loading entirely.
  • Someone renames a frontmatter field, and the code that builds that part of the page can no longer find it, so the page ships with a blank section.

A spell-checker or a style linter cannot catch any of this, because the words still look fine. The only way to catch a broken wire is to check both ends of it. That is the whole job of this tool: it checks both ends of every wire in your content, before you deploy.

What it checks

| Check | In plain words | The silent failure it prevents |
|---|---|---|
| Links resolve | Every internal link points to a page you actually have | A renamed or deleted page leaves dead links that 404 |
| Pages render | Every page actually compiles | One stray character takes a page down at render time |
| Frontmatter parses | The settings block at the top of each file is valid | A typo up top breaks the build |
| Fields match | Each field your code reads is named and shaped the way the code expects | You rename a field, the code cannot find it, and that part of the page ships blank |

The first three are the obvious ones. The last one is the one nothing else catches, and it is the reason this tool exists.

Here is the bug that made me build this. A site I run keeps each guide's FAQs in its frontmatter, and the code turns that list into two things: the FAQ section readers see, and the hidden markup that makes those questions show up directly in Google results. The code expected each entry to be labeled q and a. One day a batch of translated guides arrived labeled question and answer instead. Nothing crashed. The guides looked perfect in the editor and in preview. But every FAQ section shipped empty, and the questions quietly fell out of Google. I did not notice for weeks, until Google Search Console reported the markup had broken across a dozen pages.

A spell-checker would not catch it. A link-checker would not catch it. The labels were simply wrong, and only the code knew which labels were right. So this tool lets you write the right labels down once, "the faq field must have q and a," and it catches the mismatch the moment it appears, before anything ships. You decide which fields matter and what shape they take, and the tool holds your content to it.
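To make "the faq field must have q and a" concrete, here is a minimal sketch of an item-shape check in TypeScript. It is not the package's code: the ShapeContract type and the checkItemShape function are hypothetical, and the message wording is only modeled on the [SHAPE] line in the demo output. It assumes the frontmatter has already been parsed into a plain object.

```typescript
// Hypothetical sketch, not mdx-validate's implementation. Names are illustrative;
// the message format imitates the demo's [SHAPE] output.
type ShapeContract = { field: string; itemShape: string[] };

function checkItemShape(
  frontmatter: Record<string, unknown>,
  contract: ShapeContract,
): string[] {
  const value = frontmatter[contract.field];
  // An absent field is not a shape error in this sketch; presence is a separate rule.
  if (value === undefined) return [];
  if (!Array.isArray(value)) {
    return [`${contract.field} should be a list, got ${typeof value}`];
  }
  const errors: string[] = [];
  value.forEach((item: unknown, i: number) => {
    // Non-object items (strings, numbers, null) count as having no keys at all.
    const keys =
      item !== null && typeof item === "object" ? Object.keys(item as object) : [];
    const missing = contract.itemShape.filter((k) => !keys.includes(k));
    if (missing.length > 0) {
      errors.push(
        `${contract.field}[${i}] is missing ${missing.join(", ")} (it has: ${keys.join(", ")})`,
      );
    }
  });
  return errors;
}
```

Called on the translated FAQ from the story, checkItemShape({ faq: [{ question: "…", answer: "…" }] }, { field: "faq", itemShape: ["q", "a"] }) returns ["faq[0] is missing q, a (it has: question, answer)"], while an entry labeled q and a returns an empty list. Listing the keys the item does have is what turns a vague failure into an obvious rename.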

Why these checks exist

That FAQ bug was not a one-off. Each check here is in the tool because something like it broke in production on a real multi-language content site:

  • A leftover review note took whole pages down. A translator left an HTML comment (<!-- check this -->) in a guide. MDX does not allow HTML comments, so the entire guides listing for two languages returned a 500. (Pages render.)
  • Renamed guides left dead links across five languages. Internal links kept pointing at slugs that no longer existed, and they 404'd for weeks, burning crawl budget. (Links resolve.)
  • A "HowTo" with no steps quietly became a plain article. A page declared schema: HowTo but the steps were dropped in translation, so the Google rich result silently downgraded. (Fields match.)
  • A stray quote in the frontmatter broke the build. An unescaped " inside a double-quoted YAML string crashed the parser. (Frontmatter parses.)
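The HTML-comment failure is cheap to flag before the compiler even runs, with a line number a translator can act on. The sketch below is a simplified illustration, not how mdx-validate does it: the findHtmlComments name is hypothetical, it only toggles on fence lines (``` or ~~~) without matching fence lengths, and it would also flag a comment written inside inline code.

```typescript
// Simplified sketch (hypothetical helper): report 1-based line numbers where an
// HTML comment opens in an MDX body, skipping fenced code blocks.
function findHtmlComments(body: string): number[] {
  const hits: number[] = [];
  let inFence = false;
  body.split("\n").forEach((line, index) => {
    // A line starting with three or more backticks or tildes opens or closes a fence.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      return;
    }
    // Inside a fence, "<!--" is just example text, so only flag it outside one.
    if (!inFence && line.includes("<!--")) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

For a guide whose second line is <!-- check this -->, the function reports line 2; the same comment inside a fenced HTML example is ignored. The real compiler remains the final word, which is why the demo shows both an [HTML-COMMENT] and an [MDX-COMPILE] finding for the same file.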

Demo

The demo runs against examples/content/, which deliberately contains broken files so you can see what a catch looks like. It exits non-zero on purpose:

mdx-validate: FAIL. 5 error(s) across 4 of 5 file(s).

examples/content/broken-faq.mdx
  [SHAPE] faq[0] is missing q, a (it has: question, answer); your code reads { q, a } and will silently drop this entry

examples/content/dead-link.mdx
  [BROKEN-LINK] links to "/guides/this-page-was-renamed" but no content resolves to slug "this-page-was-renamed"

examples/content/howto-no-steps.mdx
  [SHAPE-MISSING] "steps" is required here but missing (expected an array of { name, text })

examples/content/html-comment.mdx
  [HTML-COMMENT] body contains an HTML comment <!-- ... -->; MDX rejects it (use {/* ... */} or remove)
  [MDX-COMPILE] Unexpected character `!` (U+0021) before name, expected a character that can start a name

The one file that passes, good-guide.mdx, is not listed. Clean files stay quiet.
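The report layout above (a summary line, then findings grouped under each file, with clean files omitted) can be sketched as follows. This is an illustration only: the Finding type and formatReport function are hypothetical. Because the README does not show what a clean run prints, the sketch returns null in that case instead of guessing at the wording.

```typescript
// Hypothetical sketch of the demo's report layout; not the package's code.
type Finding = { file: string; code: string; message: string };

function formatReport(findings: Finding[], totalFiles: number): string | null {
  // Group findings by file, preserving the order in which files first appear.
  const byFile = new Map<string, Finding[]>();
  for (const f of findings) {
    const list = byFile.get(f.file) ?? [];
    list.push(f);
    byFile.set(f.file, list);
  }
  // Nothing to report: clean output wording is not documented, so return null.
  if (byFile.size === 0) return null;

  const lines = [
    `mdx-validate: FAIL. ${findings.length} error(s) across ${byFile.size} of ${totalFiles} file(s).`,
  ];
  byFile.forEach((list, file) => {
    lines.push("", file); // blank line, then the file path as a header
    list.forEach((f) => {
      lines.push(`  [${f.code}] ${f.message}`);
    });
  });
  return lines.join("\n");
}
```

Grouping by file means a file with two problems, like html-comment.mdx, shows both under one heading, and a file with none never appears, which keeps long runs readable.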

Quick Start

See it work:

git clone https://github.com/hwajongpark/mdx-validate
cd mdx-validate
npm install
npm run demo

Use it on your own content:

npm install --save-dev mdx-validate

# copy the example config and point it at your content
cp node_modules/mdx-validate/examples/mdx-validate.config.example.json ./mdx-validate.config.json

# edit mdx-validate.config.json, then:
npx mdx-validate

Wire it into your build so a broken page fails the deploy, not the reader. npm runs a prebuild script automatically before npm run build, so this is enough:

{
  "scripts": {
    "prebuild": "mdx-validate"
  }
}

Configuration

One mdx-validate.config.json at your project root. The included examples/mdx-validate.config.example.json is the demo config.

{
  "contentDir": "content",
  "extensions": [".mdx"],
  "requiredFrontmatter": ["title", "description"],
  "shapeContracts": [
    { "field": "faq", "itemShape": ["q", "a"] },
    { "field": "sources", "itemShape": ["label", "url"] },
    { "field": "steps", "itemShape": ["name", "text"], "when": { "field": "schema", "equals": "HowTo" } }
  ],
  "internalLink": {
    "pattern": "/guides/([a-z0-9-]+)",
    "resolveTargetsFrom": "content",
    "targetExtensions": [".mdx"],
    "extraValidSlugs": []
  }
}
  • contentDir: the folder to check. No glob needed; it walks the tree.
  • requiredFrontmatter: fields that must be present and not empty.
  • shapeContracts: the "fields match" check. Each entry says "this field must be a list of items with these keys." Add "when" to make it conditional, so steps is only required when schema is HowTo.
  • internalLink: a pattern with one capture group for the slug, and where to find valid slugs (the filenames of your content). extraValidSlugs covers pages served by a redirect instead of a file.
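Here is one way the requiredFrontmatter rule and a conditional "when" contract could be evaluated, sketched in TypeScript. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: that "not empty" also rejects blank strings and empty lists, and that a contract whose "when" matches makes its field required. The missing-field message copies the demo's [SHAPE-MISSING] line.

```typescript
// Illustrative sketch; names and the exact notion of "empty" are assumptions.
type When = { field: string; equals: unknown };
type Contract = { field: string; itemShape: string[]; when?: When };

// Assumed meaning of "empty": undefined, null, a blank string, or an empty list.
function isEmpty(value: unknown): boolean {
  if (value === undefined || value === null) return true;
  if (typeof value === "string") return value.trim() === "";
  if (Array.isArray(value)) return value.length === 0;
  return false;
}

function checkRequired(fm: Record<string, unknown>, required: string[]): string[] {
  return required
    .filter((name) => isEmpty(fm[name]))
    .map((name) => `"${name}" is required but missing or empty`);
}

// A contract with "when" applies only if the condition matches; once it applies,
// the field itself must be present. Unconditional contracts are not enforced here.
function checkConditional(fm: Record<string, unknown>, contract: Contract): string[] {
  if (!contract.when) return [];
  if (fm[contract.when.field] !== contract.when.equals) return [];
  if (isEmpty(fm[contract.field])) {
    return [
      `"${contract.field}" is required here but missing (expected an array of { ${contract.itemShape.join(", ")} })`,
    ];
  }
  return [];
}
```

With the steps contract from the config above, a page declaring schema: HowTo and no steps yields exactly the demo's message, while a page with schema: Article is not checked at all. Keeping the condition on a separate field lets one config cover several page types without false alarms.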

How It Works

It checks both ends of every wire. A link is only good if the page it points to exists. A field is only good if it is shaped the way the code that reads it expects. The tool knows both ends and compares them.
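For links, "both ends" means the slug captured from each link on one side and the set of content filenames on the other. The sketch below assumes, as the config section describes, that valid slugs are content filenames minus their extension plus any extraValidSlugs, and that the pattern has exactly one capture group. The findBrokenLinks name and its parameter order are hypothetical, and it takes a filename list instead of walking the directory so it stays self-contained.

```typescript
// Sketch under assumptions: slugs come from filenames, and the pattern has one
// capture group. Not the package's implementation.
function findBrokenLinks(
  body: string,
  pattern: string,
  contentFiles: string[],
  targetExtensions: string[],
  extraValidSlugs: string[] = [],
): string[] {
  // One end of the wire: every slug that has a page (or a redirect) behind it.
  const valid = new Set<string>(extraValidSlugs);
  for (const file of contentFiles) {
    const base = file.split("/").pop() ?? file;
    const ext = targetExtensions.find((e) => base.endsWith(e));
    if (ext) valid.add(base.slice(0, -ext.length));
  }
  // The other end: every slug the body links to. The "g" flag lets exec() iterate.
  const errors: string[] = [];
  const re = new RegExp(pattern, "g");
  let match: RegExpExecArray | null;
  while ((match = re.exec(body)) !== null) {
    const slug = match[1];
    if (!valid.has(slug)) {
      errors.push(`links to "${match[0]}" but no content resolves to slug "${slug}"`);
    }
  }
  return errors;
}
```

Given content/good-guide.mdx and a body linking to /guides/good-guide and /guides/this-page-was-renamed, only the second link is reported, in the same form as the demo's [BROKEN-LINK] line. Building the slug set once per run and checking every file against it keeps the comparison cheap even on large sites.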

You define what matters, not the tool. Your fields and shapes are yours to declare. The tool has no opinion about your content, only that your content matches the rules you wrote down.

It compiles the real MDX, it does not guess. The render check runs the actual MDX compiler, so it catches anything that would throw on a real page, not just patterns a regex anticipated.

It reports, you fix. It finds the break and tells you exactly where. Fixing is your call, because sometimes a change is intentional. It never rewrites your content.

Almost no moving parts. It walks folders with the standard library, reads frontmatter with gray-matter, and compiles with @mdx-js/mdx. That is the entire toolchain. Exit codes are built for CI: 0 clean, 1 something broke, 2 a config or internal error.
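The exit-code contract is small enough to state as code. In this sketch the Outcome type and exitCodeFor function are hypothetical stand-ins for however a run's result is represented; only the 0, 1, 2 mapping comes from the text above.

```typescript
// Minimal sketch of the documented exit codes: 0 clean, 1 something broke,
// 2 a config or internal error. The Outcome shape is a hypothetical assumption.
type Outcome =
  | { kind: "ok"; errorCount: number }
  | { kind: "fatal"; reason: string };

function exitCodeFor(outcome: Outcome): 0 | 1 | 2 {
  if (outcome.kind === "fatal") return 2; // bad config or a crash inside the tool
  return outcome.errorCount === 0 ? 0 : 1; // any content finding fails the run
}
```

In a Node entry point the result would typically be assigned to process.exitCode. Keeping 1 and 2 distinct matters in CI: a 1 means your content needs fixing, while a 2 means the check itself never ran properly, so a green-looking pipeline is not hiding a misconfiguration.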

What It Does Not Do

  • It is not a style or spelling checker. For formatting and prose, use a style linter or Prettier. This checks that things are wired correctly, not that they read well.
  • It does not check that external websites are up. It only checks that your own internal links point to content you have.
  • It does not auto-fix. It shows you every break; you decide the fix.

Contributing

Contributions are welcome, and the most useful kind is a class of silent break this tool does not yet catch. If you have hit one, open an issue describing it, or a pull request. The fastest way to land a fix is a reproduction in examples/content/: a small broken file plus the catch you expected. Bug reports and false positives are welcome too.

License

MIT
