feat: markdown content negotiation via Accept header#16

Draft
Copilot wants to merge 5 commits into main from copilot/add-markdown-negotiation

Copilot AI commented Mar 15, 2026

All pages can now be fetched as Markdown by sending Accept: text/markdown. Implemented using a post-build HTML→Markdown conversion via a custom Astro integration and Vercel Edge Middleware.

How it works

  1. web/src/integrations/htmlToMarkdown.ts — TypeScript Astro integration that hooks into astro:build:done. After all static HTML files are emitted, it scans the build output directory for every rendered .html file, extracts the <main> content (stripping nav, aside, header, footer, scripts), converts it to Markdown via turndown, prepends a YAML frontmatter block (title + description from <head>), and saves a .md file at the same path. Fully generic — works for all current and future pages regardless of which CMS blocks they use.

  2. web/src/middleware.ts — Astro Edge Middleware (edgeMiddleware: true was already configured) that intercepts every request at the Vercel edge. When Accept: text/markdown is present and the path has no known static-asset extension, the request is rewritten to /path.md, which Vercel serves from the pre-generated static file.
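The rewrite decision in step 2 can be sketched as a pure function that the middleware's request handler calls before rewriting. This is a minimal sketch under stated assumptions, not the code in web/src/middleware.ts: the names `acceptsMarkdown` and `markdownRewriteTarget`, the `ASSET_EXTENSIONS` list, and the handling of trailing slashes and the root path (`/` to `/index.md`) are hypothetical, and the Astro/Vercel wiring is omitted.

```typescript
// Hypothetical list of static-asset extensions that must never be rewritten to .md.
const ASSET_EXTENSIONS = new Set([
  ".css", ".js", ".mjs", ".json", ".xml", ".txt", ".ico", ".png", ".jpg",
  ".jpeg", ".gif", ".svg", ".webp", ".avif", ".woff", ".woff2", ".pdf", ".md",
]);

// True if some media range in the Accept header is exactly text/markdown.
// Parameters are ignored except q; an explicit q=0 counts as a refusal.
function acceptsMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  return accept.split(",").some((range) => {
    const [type, ...params] = range.split(";").map((s) => s.trim().toLowerCase());
    if (type !== "text/markdown") return false;
    const q = params.find((p) => p.startsWith("q="));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}

// Returns the .md path to rewrite to, or null to serve the request unchanged.
function markdownRewriteTarget(pathname: string, accept: string | null): string | null {
  if (!acceptsMarkdown(accept)) return null;
  const trimmed = pathname.replace(/\/+$/, ""); // "/de/x/" and "/de/x" are treated alike
  if (trimmed === "") return "/index.md"; // site root (assumed convention)
  const lastSegment = trimmed.slice(trimmed.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  // Only known asset extensions are skipped, so slugs containing dots still get rewritten.
  if (dot !== -1 && ASSET_EXTENSIONS.has(lastSegment.slice(dot).toLowerCase())) return null;
  return `${trimmed}.md`;
}
```

Listing `.md` among the skipped extensions keeps an explicit request for `/path.md` from being rewritten a second time, and honouring `q=0` means a client that explicitly refuses Markdown still gets HTML.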

Usage

# HTML (default)
curl https://jhb.software/de/articles/my-article

# Markdown
curl -H "Accept: text/markdown" https://jhb.software/de/articles/my-article

The generated Markdown includes a YAML front-matter block:

---
title: "My Article"
description: "..."
---

## Introduction

Article body…
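Emitting that front matter safely means escaping quotes and newlines that may appear in the page's `<title>` or meta description. A minimal sketch, assuming both values have already been read from `<head>`; `buildFrontmatter` is a hypothetical helper name, not taken from the PR. It relies on the fact that a JSON string literal is also a valid YAML double-quoted scalar for ordinary title and description text.

```typescript
// Hypothetical helper: builds the YAML front-matter block prepended to each .md file.
function buildFrontmatter(meta: { title?: string; description?: string }): string {
  const lines = ["---"];
  // JSON.stringify yields a double-quoted string with \" and \n escapes,
  // which YAML also parses as a double-quoted scalar.
  if (meta.title !== undefined) lines.push(`title: ${JSON.stringify(meta.title)}`);
  if (meta.description !== undefined) lines.push(`description: ${JSON.stringify(meta.description)}`);
  lines.push("---", "");
  // Trailing "\n" leaves one blank line between the front matter and the body.
  return lines.join("\n") + "\n";
}
```

For `{ title: "My Article", description: "..." }` this produces exactly the block shown above, followed by a blank line.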

Changes

  • web/src/integrations/htmlToMarkdown.ts — new TypeScript Astro integration using the astro:build:done hook; depends on turndown@^7.2.2 and node-html-parser@^7.1.0 (added as devDependencies)
  • web/src/middleware.ts — Vercel Edge Middleware for Accept: text/markdown rewrites
  • web/astro.config.mjs — registers the htmlToMarkdown() integration
  • web/package.json — build script is plain astro build; @types/turndown@^5.0.6 added as devDependency
  • web/vercel.json — removed the previous has-based markdown rewrite rule (handled by edge middleware)
Original prompt

This section details the original issue you should resolve

<issue_title>markdown negotiation</issue_title>
<issue_description>Add markdown negotiation, via request headers.

Is this possible at the Vercel CDN layer? Since the routes are static, I guess it is not possible at the application layer, right?

Ensure that you research it first. </issue_description>

Comments on the Issue (you are @copilot in this section)


vercel bot commented Mar 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| website | Ignored | Ignored | Mar 15, 2026 1:18pm |
| website-cms | Ignored | Ignored | Mar 15, 2026 1:18pm |

- Add `lexicalToMarkdown` utility that converts Payload's Lexical
  rich-text node tree to a Markdown string (headings, paragraphs,
  lists, blockquotes, links, images, code blocks, inline formatting)
- Add `[lang]/[...path].md.ts` Astro static endpoint that
  pre-generates a `.md` file with YAML front-matter for every article
  in every locale at build time
- Update `web/vercel.json` to prepend a `has`-based rewrite rule that
  transparently serves the pre-generated `.md` file to any request
  whose `Accept` header contains `text/markdown`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
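The `lexicalToMarkdown` utility from this first commit (removed again in a later commit) is not shown on this page. For orientation only, here is a minimal recursive converter over Lexical's serialized JSON, assuming Lexical's default node shapes: `type` values `heading`, `paragraph`, `quote`, `list`, `listitem`, `text` and `linebreak`, a `tag` such as `"h2"` on headings, `listType` on lists, and the text `format` bit flags 1 = bold, 2 = italic, 16 = code. The names are hypothetical, and links, images, code blocks and nested lists are deliberately left out.

```typescript
// Minimal, hypothetical model of Lexical's serialized nodes (only the fields used here).
interface LexicalNode {
  type: string;
  text?: string;
  format?: number | string; // bit flags on text nodes; alignment string on element nodes
  tag?: string;             // e.g. "h2" on heading nodes
  listType?: string;        // "bullet" or "number" on list nodes
  children?: LexicalNode[];
}

const IS_BOLD = 1;
const IS_ITALIC = 2;
const IS_CODE = 16;

// Renders text-level nodes; unknown inline elements contribute their children's text.
function inlineToMarkdown(nodes: LexicalNode[] = []): string {
  return nodes
    .map((node) => {
      if (node.type === "linebreak") return "  \n";
      if (node.type !== "text") return inlineToMarkdown(node.children);
      let text = node.text ?? "";
      const format = typeof node.format === "number" ? node.format : 0;
      if (format & IS_CODE) text = `\`${text}\``;
      if (format & IS_ITALIC) text = `*${text}*`;
      if (format & IS_BOLD) text = `**${text}**`;
      return text;
    })
    .join("");
}

function blockToMarkdown(node: LexicalNode): string {
  switch (node.type) {
    case "heading": {
      const level = Number(node.tag?.slice(1)) || 2; // "h3" -> 3, default to 2
      return `${"#".repeat(level)} ${inlineToMarkdown(node.children)}`;
    }
    case "quote":
      return `> ${inlineToMarkdown(node.children)}`;
    case "list":
      return (node.children ?? [])
        .map((item, i) => `${node.listType === "number" ? `${i + 1}.` : "-"} ${inlineToMarkdown(item.children)}`)
        .join("\n");
    default: // paragraphs and unrecognised blocks fall back to their inline text
      return inlineToMarkdown(node.children);
  }
}

function lexicalToMarkdownSketch(root: LexicalNode): string {
  return (root.children ?? []).map(blockToMarkdown).filter((b) => b !== "").join("\n\n");
}
```

This field-by-field approach is exactly what the review below found too narrow: any block type the converter does not know about is silently dropped.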
Copilot AI changed the title from "[WIP] Add markdown negotiation via request headers" to "feat: markdown content negotiation via Accept header" on Mar 15, 2026
Copilot AI requested a review from jhb-dev March 15, 2026 11:29
jhb-dev

This comment was marked as outdated.

@jhb-dev jhb-dev self-requested a review March 15, 2026 11:54

@jhb-dev jhb-dev left a comment

Add support for every page (not only articles). Also, is there an option to convert the produced HTML into Markdown instead of only certain CMS fields?

- Remove articles-only filter: getStaticPaths now generates .md files
  for every page collection (articles, projects, authors, pages)
- Add per-collection frontmatter and richtext extraction:
  - projects: body richtext + customer/tags metadata
  - authors: description richtext + name/profession metadata
  - pages: extract all RichTextBlocks from sections[] with section
    titles (##) and subtitles (###) as Markdown headings
- Add richTextToMarkdown() helper to reduce duplication
- Guard project.tags with null-safety and use empty-string join for
  contentParts to avoid collapsing existing double newlines

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>

Copilot AI commented Mar 15, 2026

Add support for every page (not only articles). Also, is there an option to convert the produced HTML into Markdown instead of only certain CMS fields?

Done in ed8f6d1. The endpoint now covers all four page collections:

  • articles → content richtext field
  • projects → body richtext field + customer/tags frontmatter
  • authors → description richtext field + name/profession frontmatter
  • pages → all RichTextBlock entries inside sections[].blocks[], with section title rendered as ## and subTitle as ###
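The `pages` rule amounts to emitting each section's headings before that section's rich-text output. A minimal sketch, assuming each section's RichTextBlocks have already been converted to Markdown strings; `SectionSketch` and `pageSectionsToMarkdown` are hypothetical names, and the real Payload section shape is not shown in this PR.

```typescript
// Hypothetical, simplified shape of a page section whose RichTextBlocks
// have already been converted to Markdown strings.
interface SectionSketch {
  title?: string;
  subTitle?: string;
  richTextMarkdown: string[]; // one entry per RichTextBlock in blocks[], in order
}

function pageSectionsToMarkdown(sections: SectionSketch[]): string {
  const parts: string[] = [];
  for (const section of sections) {
    if (section.title) parts.push(`## ${section.title}`);
    if (section.subTitle) parts.push(`### ${section.subTitle}`);
    // Only rich-text output is included; empty conversions are skipped.
    parts.push(...section.richTextMarkdown.filter((md) => md.trim() !== ""));
  }
  return parts.join("\n\n");
}
```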

On the HTML-to-Markdown alternative: the content is stored as a Lexical node tree in the CMS (not HTML), so converting Lexical nodes directly to Markdown is both more accurate and simpler than rendering to HTML first and then back-converting. Non-richtext blocks (ServicesBlock, TestimonialsBlock, etc.) don't carry standalone text content, so they produce no Markdown output — which is the right behaviour for a machine-readable representation.

Add support for every page (not only articles). Also, is there an option to convert the produced HTML into Markdown instead of only certain CMS fields?

Done in ed8f6d1 — all four page collections are now supported (see reply above). On the HTML question: the pages are built from Payload's Lexical rich-text node tree, not from pre-rendered HTML, so converting Lexical nodes directly to Markdown is the most precise and dependency-free approach. Non-richtext blocks don't have plain-text content to meaningfully expose in Markdown.

Copilot AI requested a review from jhb-dev March 15, 2026 12:00

@jhb-dev jhb-dev left a comment

This approach is too limited. We need to find a more generic approach that will support all future CMS blocks.

What if we generated the markdown files in the last step of the build, when the HTML files for all pages are already available? This way we would only need to convert HTML to Markdown, and we could store each Markdown file at the same path with .md appended, so that we have both /path and /path.md.

The only disadvantage of this approach is that we cannot use a vercel.json rewrite, since these don't work for static files. We would need to use an edge middleware to do the rewrite when the Accept: text/markdown header is present.
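The /path and /path.md layout proposed here means every HTML file in the build output needs a sibling `.md` file whose URL is the page URL plus `.md`. A minimal sketch of that mapping, assuming Astro's default `build.format: "directory"` (which emits `path/index.html`) and also accepting `path.html` as produced by the `"file"` format; `markdownOutputPath` is a hypothetical name, and treating the site root as `index.md` is an assumed convention.

```typescript
// Maps an HTML file path (relative to the static output directory, "/"-separated)
// to the .md path that should be served at "<page URL>.md".
function markdownOutputPath(htmlRelPath: string): string | null {
  if (!htmlRelPath.endsWith(".html")) return null; // not a rendered page
  if (htmlRelPath === "index.html") return "index.md"; // site root "/" (assumed convention)
  if (htmlRelPath.endsWith("/index.html")) {
    // directory format: "de/articles/x/index.html" -> "de/articles/x.md"
    return `${htmlRelPath.slice(0, -"/index.html".length)}.md`;
  }
  // file format: "de/articles/x.html" -> "de/articles/x.md"
  return `${htmlRelPath.slice(0, -".html".length)}.md`;
}
```

Whatever convention is chosen, the edge middleware's rewrite has to produce the same paths, so both sides should share it.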

… conversion

- Remove `[lang]/[...path].md.ts` Astro endpoint and `lexicalToMarkdown.ts`
  utility (Lexical-specific approach, limited to certain CMS fields)
- Remove has-based markdown rewrite from vercel.json

- Add `scripts/html-to-markdown.mjs`: post-build script that scans
  `.vercel/output/static/` for all HTML pages, extracts `<main>` content,
  converts it to Markdown via turndown (removing nav/aside/header/footer),
  prepends YAML frontmatter (title + description from <head>) and saves as
  .md alongside the .html — generic, works for all current and future pages

- Add `src/middleware.ts`: Astro Edge Middleware (edgeMiddleware: true was
  already set) that rewrites requests carrying `Accept: text/markdown` to
  `/path.md`; skips known static asset extensions to avoid false rewrites

- Add turndown@7.2.2 and node-html-parser@7.1.0 to devDependencies
- Update build script: `astro build && node scripts/html-to-markdown.mjs`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>

Copilot AI commented Mar 15, 2026

This approach is too limited. We need to find a more generic approach that will support all future CMS blocks.

What if we generated the markdown files in the last step of the build, when ...

Done in 7b362c9. Switched to the HTML→Markdown post-build approach you described:

How it works:

  1. astro build runs as usual, placing fully-rendered HTML in .vercel/output/static/
  2. node scripts/html-to-markdown.mjs then scans every .html file, extracts the <main> content (navigation, header, footer, scripts are stripped), converts it to Markdown with turndown, prepends a YAML frontmatter block (title + description from <head>), and saves a .md file at the same path
  3. src/middleware.ts (Vercel Edge Middleware — edgeMiddleware: true was already set) rewrites any request with Accept: text/markdown to /path.md; static asset paths are excluded by extension allowlist

This approach is fully generic: every current and future page is covered regardless of which CMS blocks or fields it uses.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • docs.astro.build
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
  • telemetry.astro.build
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/website/website/web/node_modules/.bin/../astro/astro.js check (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from jhb-dev March 15, 2026 13:04

@jhb-dev jhb-dev left a comment

Please do the Markdown conversion in an Astro build hook (e.g. via a custom Astro plugin), and use TS instead of JS.

…ation

- Add `web/src/integrations/htmlToMarkdown.ts`: TypeScript Astro integration
  using the `astro:build:done` hook; uses async fs/promises APIs throughout;
  uses AstroIntegrationLogger for diagnostics
- Register `htmlToMarkdown()` in `web/astro.config.mjs` under `integrations`
- Remove `web/scripts/html-to-markdown.mjs` (replaced by integration)
- Revert `web/package.json` build script to just `astro build`
- Add `@types/turndown@^5.0.6` as devDependency; restore removed `dotenv`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>

Copilot AI commented Mar 15, 2026

Please do the Markdown conversion in an Astro build hook (e.g. via a custom Astro plugin), and use TS instead of JS.

Done in f60a680. The conversion is now a TypeScript Astro integration at src/integrations/htmlToMarkdown.ts, registered in astro.config.mjs under integrations. It runs via the astro:build:done hook (after all HTML has been emitted), uses the hook's dir URL directly (no hard-coded paths), async fs/promises APIs throughout, and the AstroIntegrationLogger for diagnostics. The build script is back to plain astro build.

Copilot AI requested a review from jhb-dev March 15, 2026 13:19

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

markdown negotiation

2 participants