feat: markdown content negotiation via Accept header #16
- Add `lexicalToMarkdown` utility that converts Payload's Lexical rich-text node tree to a Markdown string (headings, paragraphs, lists, blockquotes, links, images, code blocks, inline formatting)
- Add `[lang]/[...path].md.ts` Astro static endpoint that pre-generates a `.md` file with YAML front-matter for every article in every locale at build time
- Update `web/vercel.json` to prepend a `has`-based rewrite rule that transparently serves the pre-generated `.md` file to any request whose `Accept` header contains `text/markdown`
Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
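To illustrate the kind of tree walk this commit describes, here is a minimal sketch of a Lexical-to-Markdown converter. It is not the PR's `lexicalToMarkdown` implementation: the `LexicalNodeSketch` interface and the function name `lexicalToMarkdownSketch` are hypothetical, the node shape is a simplified assumption about serialized Lexical JSON, and only headings, paragraphs, quotes, lists, and bold/italic/code text are handled.

```typescript
// Hypothetical, simplified shape of a serialized Lexical node (an assumption,
// not Payload's full type).
interface LexicalNodeSketch {
  type: string;
  text?: string;
  format?: number | string;
  tag?: string;
  children?: LexicalNodeSketch[];
}

// Lexical stores inline text formatting as a bitmask on text nodes.
const IS_BOLD = 1;
const IS_ITALIC = 2;
const IS_CODE = 16;

// Render inline content: text nodes get their marks, anything else recurses.
function renderInline(node: LexicalNodeSketch): string {
  if (node.type === "text") {
    let out = node.text ?? "";
    const format = typeof node.format === "number" ? node.format : 0;
    if (format & IS_CODE) out = "`" + out + "`";
    if (format & IS_BOLD) out = `**${out}**`;
    if (format & IS_ITALIC) out = `*${out}*`;
    return out;
  }
  return (node.children ?? []).map(renderInline).join("");
}

// Render one block-level node to a Markdown block.
function renderBlock(node: LexicalNodeSketch): string {
  switch (node.type) {
    case "heading": {
      // tag is expected to look like "h1".."h6"; fall back to level 1.
      const level = Number((node.tag ?? "h1").slice(1)) || 1;
      return `${"#".repeat(level)} ${renderInline(node)}`;
    }
    case "quote":
      return `> ${renderInline(node)}`;
    case "list":
      return (node.children ?? [])
        .map((item, i) => `${node.tag === "ol" ? `${i + 1}.` : "-"} ${renderInline(item)}`)
        .join("\n");
    default:
      // Paragraphs and unknown block types: emit their inline text only.
      return renderInline(node);
  }
}

// Convert a root node to Markdown, separating blocks with blank lines.
function lexicalToMarkdownSketch(root: LexicalNodeSketch): string {
  return (root.children ?? [])
    .map(renderBlock)
    .filter((blockText) => blockText.length > 0)
    .join("\n\n");
}
```

Block types the sketch does not know about fall through to plain inline text, which shows why every new CMS block would need its own case in a converter like this.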
jhb-dev
left a comment
Add support for every page (not only articles). Also, is there an option to convert the produced HTML into Markdown instead of only certain CMS fields?
- Remove articles-only filter: getStaticPaths now generates .md files
  for every page collection (articles, projects, authors, pages)
- Add per-collection frontmatter and richtext extraction:
  - projects: body richtext + customer/tags metadata
  - authors: description richtext + name/profession metadata
  - pages: extract all RichTextBlocks from sections[] with section
    titles (##) and subtitles (###) as Markdown headings
- Add richTextToMarkdown() helper to reduce duplication
- Guard project.tags with null-safety and use empty-string join for
  contentParts to avoid collapsing existing double newlines
Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
Done in ed8f6d1. The endpoint now covers all four page collections: articles, projects, authors, and pages.
On the HTML-to-Markdown alternative: the content is stored as a Lexical node tree in the CMS (not HTML), so converting Lexical nodes directly to Markdown is both more accurate and simpler than rendering to HTML first and then back-converting. Non-richtext blocks (ServicesBlock, TestimonialsBlock, etc.) don't carry standalone text content, so they produce no Markdown output — which is the right behaviour for a machine-readable representation.
Done in ed8f6d1: all four page collections are now supported (see reply above). On the HTML question: the pages are built from Payload's Lexical rich-text node tree, not from pre-rendered HTML, so converting Lexical nodes directly to Markdown is the most precise and dependency-free approach. Non-richtext blocks don't have plain-text content to meaningfully expose in Markdown.
jhb-dev
left a comment
This approach is too limited. We need to find a more generic approach that will support all future CMS blocks.
What if we generated the Markdown files as the last step of the build, when the HTML files for all pages are already available? This way we would only need to convert HTML to Markdown, and we could store the Markdown files at the same path with a .md suffix, so that we have /path and /path.md.
The only disadvantage of this approach is that we cannot use a vercel.json redirect, since these don't work for static files. We would need to use an edge middleware to perform the rewrite when the Accept header for Markdown is present.
… conversion
- Remove `[lang]/[...path].md.ts` Astro endpoint and `lexicalToMarkdown.ts` utility (Lexical-specific approach, limited to certain CMS fields)
- Remove has-based markdown rewrite from vercel.json
- Add `scripts/html-to-markdown.mjs`: post-build script that scans `.vercel/output/static/` for all HTML pages, extracts `<main>` content, converts it to Markdown via turndown (removing nav/aside/header/footer), prepends YAML frontmatter (title + description from <head>) and saves as .md alongside the .html; generic, works for all current and future pages
- Add `src/middleware.ts`: Astro Edge Middleware (edgeMiddleware: true was already set) that rewrites requests carrying `Accept: text/markdown` to `/path.md`; skips known static asset extensions to avoid false rewrites
- Add turndown@7.2.2 and node-html-parser@7.1.0 to devDependencies
- Update build script: `astro build && node scripts/html-to-markdown.mjs`
Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
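The rewrite decision described for the middleware can be sketched as a pure function, independent of any framework. This is an illustration under assumptions, not the contents of `src/middleware.ts`: the names `acceptsMarkdown`, `markdownRewriteTarget`, and `STATIC_ASSET_EXTENSIONS` are hypothetical, the extension list is illustrative, and mapping the root path to `/index.md` is an assumption the PR does not confirm.

```typescript
// Illustrative list of extensions treated as static assets (an assumption).
const STATIC_ASSET_EXTENSIONS = new Set([
  "css", "js", "mjs", "json", "xml", "txt", "md",
  "png", "jpg", "jpeg", "gif", "svg", "webp", "avif", "ico",
  "woff", "woff2", "ttf", "pdf",
]);

// True when any media range in the Accept header is exactly text/markdown.
// Parameters such as ";q=0.9" are ignored.
function acceptsMarkdown(acceptHeader: string | null): boolean {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .some((part) => (part.split(";")[0] ?? "").trim().toLowerCase() === "text/markdown");
}

// Returns the path to rewrite to, or null when the request should pass through.
function markdownRewriteTarget(pathname: string, acceptHeader: string | null): string | null {
  if (!acceptsMarkdown(acceptHeader)) return null;

  // Skip requests whose last path segment has a known static-asset extension.
  const lastSegment = pathname.split("/").pop() ?? "";
  const dotIndex = lastSegment.lastIndexOf(".");
  if (dotIndex > 0) {
    const extension = lastSegment.slice(dotIndex + 1).toLowerCase();
    if (STATIC_ASSET_EXTENSIONS.has(extension)) return null;
  }

  // Drop trailing slashes so "/path/" and "/path" both map to "/path.md".
  const trimmed = pathname.replace(/\/+$/, "");
  // Assumption: the site root is served from "/index.md".
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}
```

In an actual middleware, a non-null result would be passed to whatever rewrite mechanism the platform provides, and a null result would fall through to normal handling. Keeping the decision pure makes it easy to unit-test without an edge runtime.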
Done in 7b362c9. Switched to the HTML→Markdown post-build approach you described.
This approach is fully generic: every current and future page is covered regardless of which CMS blocks or fields it uses.
jhb-dev
left a comment
Please do the Markdown conversion in an Astro build hook (e.g. via a custom Astro plugin), and use TS instead of JS.
…ation
- Add `web/src/integrations/htmlToMarkdown.ts`: TypeScript Astro integration using the `astro:build:done` hook; uses async fs/promises APIs throughout; uses AstroIntegrationLogger for diagnostics
- Register `htmlToMarkdown()` in `web/astro.config.mjs` under `integrations`
- Remove `web/scripts/html-to-markdown.mjs` (replaced by integration)
- Revert `web/package.json` build script to just `astro build`
- Add `@types/turndown@^5.0.6` as devDependency; restore removed `dotenv`
Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
Done in f60a680. The conversion is now a TypeScript Astro integration at |
All pages can now be fetched as Markdown by sending `Accept: text/markdown`. Implemented using a post-build HTML→Markdown conversion via a custom Astro integration and Vercel Edge Middleware.
How it works
- `web/src/integrations/htmlToMarkdown.ts`: TypeScript Astro integration that hooks into `astro:build:done`. After all static HTML files are emitted, it scans the build output directory for every rendered `.html` file, extracts the `<main>` content (stripping nav, aside, header, footer, scripts), converts it to Markdown via `turndown`, prepends a YAML frontmatter block (`title` + `description` from `<head>`), and saves a `.md` file at the same path. Fully generic: works for all current and future pages regardless of which CMS blocks they use.
- `web/src/middleware.ts`: Astro Edge Middleware (`edgeMiddleware: true` was already configured) that intercepts every request at the Vercel edge. When `Accept: text/markdown` is present and the path has no known static-asset extension, the request is rewritten to `/path.md`, which Vercel serves from the pre-generated static file.
Usage
The generated Markdown includes a YAML front-matter block:
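A minimal sketch of how such a block could be assembled from the `title` and `description` fields mentioned above. The helper name `withFrontMatter` and the `PageMetaSketch` type are hypothetical, and quoting values with `JSON.stringify` is one possible escaping choice (JSON strings are valid YAML double-quoted scalars), not necessarily what the integration does.

```typescript
// Hypothetical metadata shape: the two fields the PR says come from <head>.
interface PageMetaSketch {
  title: string;
  description?: string;
}

// Prepend a YAML front-matter block to an already-converted Markdown body.
function withFrontMatter(meta: PageMetaSketch, markdownBody: string): string {
  const lines = ["---", `title: ${JSON.stringify(meta.title)}`];
  // Only emit description when the page actually has one.
  if (meta.description) {
    lines.push(`description: ${JSON.stringify(meta.description)}`);
  }
  lines.push("---");
  // One blank line between front matter and body, single trailing newline.
  return `${lines.join("\n")}\n\n${markdownBody.trim()}\n`;
}
```

Quoting every value avoids having to special-case titles that contain colons, quotes, or leading `#` characters, at the cost of slightly noisier output.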
Changes
- `web/src/integrations/htmlToMarkdown.ts`: new TypeScript Astro integration using the `astro:build:done` hook; depends on `turndown@^7.2.2` and `node-html-parser@^7.1.0` (added as devDependencies)
- `web/src/middleware.ts`: Vercel Edge Middleware for `Accept: text/markdown` rewrites
- `web/astro.config.mjs`: registers the `htmlToMarkdown()` integration
- `web/package.json`: build script is plain `astro build`; `@types/turndown@^5.0.6` added as devDependency
- `web/vercel.json`: removed the previous `has`-based markdown rewrite rule (handled by edge middleware)