
fix: replace Markdown image links and raw <img> tags with alt text #187

Open
dsanders11 wants to merge 1 commit into main from fix/images-in-descriptions


@dsanders11
Member

Fixes #186.

Alt text seems like the easiest solution to this issue for the moment, and it fixes a real example from our existing docs, which I've included as a test case.
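For illustration only, here is a minimal string-level sketch of the behavior described above. The PR itself works on markdown-it tokens, whereas this operates on raw text. It assumes that Markdown images (`![alt](src)`) and `<img>` tags both become the `[Image: <alt text>]` placeholder mentioned in the diff, and that an image with no alt text is simply dropped. `replaceImagesWithAltText`, `toPlaceholder`, and the regex constants are hypothetical names, not the PR's code.

```typescript
// Hypothetical string-level helper; the PR itself operates on markdown-it tokens.
const MARKDOWN_IMAGE = /!\[([^\]]*)\]\([^)]*\)/g;
const HTML_IMG = /<img\b[^>]*>/gi;
// alt="...", alt='...', or an unquoted alt=value
const IMG_ALT = /\balt\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;

// Assumption: images without alt text are dropped rather than replaced.
function toPlaceholder(alt: string): string {
  const trimmed = alt.trim();
  return trimmed === '' ? '' : `[Image: ${trimmed}]`;
}

function replaceImagesWithAltText(text: string): string {
  return text
    // ![alt](src) -> [Image: alt]
    .replace(MARKDOWN_IMAGE, (_match: string, alt: string) => toPlaceholder(alt))
    // <img ... alt="..."> -> [Image: ...]
    .replace(HTML_IMG, (tag: string) => {
      const m = tag.match(IMG_ALT);
      return toPlaceholder(m ? (m[1] ?? m[2] ?? m[3] ?? '') : '');
    });
}

console.log(replaceImagesWithAltText('Flow: ![Startup sequence](./startup.png)'));
console.log(replaceImagesWithAltText('<img src="logo.png" alt="Project logo" width="64">'));
```

The two calls print `Flow: [Image: Startup sequence]` and `[Image: Project logo]`.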

@dsanders11 requested a review from a team as a code owner on April 12, 2026 07:37
// Replace <br> elements with a newline
if (tokenToCheck.content.match(/<br\s*\/?>/)) {
  joinedContent += '\n';
}

Member

Lost a `break`?

Member Author

Intentional fallthrough: <img> can appear in either an html_inline or an html_block token. I'll add a comment marking the fallthrough as intentional.
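The intentional fallthrough can be sketched as follows, assuming the case layout implied by the two diff excerpts (a `<br>` check under `html_inline`, then `<img>` handling under `html_block`). `HtmlToken` is a hypothetical stand-in for a markdown-it token carrying only `type` and `content`, and `renderHtmlToken` is likewise illustrative. ESLint's `no-fallthrough` rule accepts a `// falls through` comment placed directly before the next `case`, which is one conventional way to mark this.

```typescript
// Hypothetical stand-in for a markdown-it token: only the fields used here.
interface HtmlToken {
  type: string;
  content: string;
}

// Quoted alt values only, to keep the sketch short.
const ALT_IN_TAG = /\balt\s*=\s*(?:"([^"]*)"|'([^']*)')/i;

function renderHtmlToken(token: HtmlToken): string {
  let out = '';
  switch (token.type) {
    case 'html_inline':
      // Replace <br> elements with a newline
      if (/<br\s*\/?>/.test(token.content)) {
        out += '\n';
      }
      // falls through: <img> can be either html_inline or html_block
    case 'html_block': {
      // Replace <img> tags with [Image: <alt text>]
      const img = token.content.match(/<img\b[^>]*>/i);
      const alt = img ? img[0].match(ALT_IN_TAG) : null;
      if (alt) {
        out += `[Image: ${alt[1] ?? alt[2]}]`;
      }
      break;
    }
    default:
      break;
  }
  return out;
}

console.log(JSON.stringify(renderHtmlToken({ type: 'html_inline', content: '<img alt="Logo">' })));
```

Without the fallthrough (or with a stray `break` after the `<br>` check), inline `<img>` tokens would never reach the image handling.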

}
case 'html_block':
  // Replace <img> and <image> tags with [Image: <alt text>]
  const imgMatch = tokenToCheck.content.match(/<(?:img|image)\b[^>]*>/i);

Member

Is <image> really valid syntax? Surely not.

Regardless, can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closing tags like <hr />).
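One concrete way the `[^>]*` pattern goes wrong is that HTML allows a literal `>` inside a quoted attribute value, so the naive match stops early and the alt text is lost. The quote-aware pattern below consumes quoted values as single units. It is still a regex heuristic that ignores comments, `<script>` contents, and other edge cases; it is not a real parser. Parsing as XML is not a drop-in fix either, since HTML void elements such as `<img src="x">` are not well-formed XML, so a dedicated HTML parser would be the robust route. The constant and function names here are hypothetical.

```typescript
// Naive tag pattern, as in the diff (minus the <image> alternative).
const NAIVE_IMG_TAG = /<img\b[^>]*>/i;
// Quote-aware variant: "..." and '...' attribute values are consumed whole,
// so a '>' inside a quoted value no longer ends the match early.
const QUOTE_AWARE_IMG_TAG = /<img\b(?:[^>"']|"[^"]*"|'[^']*')*>/i;

// Hypothetical helper: reads a quoted alt attribute from a single tag.
function extractAlt(tag: string): string | undefined {
  const m = tag.match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
  return m ? (m[1] ?? m[2]) : undefined;
}

const sample = '<p><img alt="a > b comparison" src="cmp.png"></p>';

const naiveTag = sample.match(NAIVE_IMG_TAG)?.[0] ?? '';
const awareTag = sample.match(QUOTE_AWARE_IMG_TAG)?.[0] ?? '';

console.log(naiveTag);             // <img alt="a >
console.log(extractAlt(naiveTag)); // undefined (closing quote was cut off)
console.log(awareTag);             // <img alt="a > b comparison" src="cmp.png">
console.log(extractAlt(awareTag)); // a > b comparison
```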


Member Author

> Is <image> really valid syntax? Surely not.

A Claude special; not sure why it decided <image> was syntax it needed to handle. 😅 Will clean up.

> Regardless, can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closing tags like <hr />).

We can make this more robust, but I think "good enough" might suffice here? If the regex approach misses some cases, they'll just be excluded from the output altogether, which seems like an acceptable failure mode. Is there a main issue you want to avoid with more robust parsing?
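The failure mode described above can be sketched as a renderer that returns `null` for any image it cannot parse and filters those results out, so a missed case produces omission rather than raw HTML in the output. This is a minimal sketch under that assumption; `renderImageTag` and `renderImages` are hypothetical names. The usage shows one such miss: `alt=Logo` is valid unquoted HTML, but the quoted-only alt pattern does not recognize it.

```typescript
// Hypothetical renderer: returns null for anything it cannot confidently parse.
function renderImageTag(html: string): string | null {
  const tag = html.match(/<img\b[^>]*>/i);
  if (!tag) return null;
  // Quoted alt values only; an unquoted alt=Logo is not handled.
  const alt = tag[0].match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
  if (!alt) return null;
  return `[Image: ${alt[1] ?? alt[2]}]`;
}

// Unparsed images are filtered out, so the worst case is omission,
// never raw HTML leaking into the generated description.
function renderImages(htmlTokens: string[]): string {
  return htmlTokens
    .map((html) => renderImageTag(html))
    .filter((s): s is string => s !== null)
    .join(' ');
}

const rendered = renderImages([
  '<img src="logo.png" alt="Logo">',
  '<img src="logo.png" alt=Logo>', // valid HTML, but missed by the pattern
  '<img src="diagram.png">',       // no alt text at all
]);
console.log(rendered); // [Image: Logo]
```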



Development

Successfully merging this pull request may close these issues.

Make it possible to embed images

2 participants