fix: replace Markdown image links and raw <img> tags with alt text #187
dsanders11 wants to merge 1 commit into main
// Replace <br> elements with a newline
if (tokenToCheck.content.match(/<br\s*\/?>/)) {
  joinedContent += '\n';
}
Intentional fallthrough as <img> can be both html_inline and html_block. Will add a comment about the intentional fallthrough.
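For readers following the thread, here is a minimal sketch of the fallthrough being described. It is not the PR's actual code: the `HtmlToken` interface and the `renderHtmlToken` function are hypothetical names, and the alt-text extraction is simplified to double-quoted attributes only.

```typescript
// Hypothetical token shape, assumed for illustration only.
interface HtmlToken {
  type: 'html_inline' | 'html_block';
  content: string;
}

// Hypothetical helper: returns the plain-text replacement for one HTML token.
function renderHtmlToken(token: HtmlToken): string {
  let out = '';
  switch (token.type) {
    case 'html_inline':
      // Replace <br> elements with a newline
      if (/<br\s*\/?>/.test(token.content)) {
        out += '\n';
      }
    // Intentional fallthrough: <img> can appear as html_inline or html_block
    case 'html_block': {
      const imgMatch = token.content.match(/<img\b[^>]*>/i);
      if (imgMatch) {
        // Simplification: only double-quoted alt attributes are recognized.
        const alt = imgMatch[0].match(/\salt="([^"]*)"/i);
        if (alt) {
          out += `[Image: ${alt[1]}]`;
        }
      }
      break;
    }
  }
  return out;
}
```

The comment placed between the two `case` labels is the kind of marker the reply promises, so a later reader does not mistake the missing `break` for a bug.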
}
case 'html_block':
  // Replace <img> and <image> tags with [Image: <alt text>]
  const imgMatch = tokenToCheck.content.match(/<(?:img|image)\b[^>]*>/i);
Is <image> really valid syntax? Surely not.
Regardless, can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like <hr />).
> Is <image> really valid syntax? Surely not.
Claude special, not sure why it decided <image> was syntax it needed to handle. 😅 Will clean up.
> Regardless, can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like <hr />).
We can make this more robust, but I think "good enough" might suffice here? If the regex approach misses some cases, they'll just be excluded from the output altogether, which seems like an acceptable failure mode. Is there a main issue you want to avoid with more robust parsing?
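To make the trade-off in this thread concrete, here is a rough sketch of the regex approach and its failure mode. It is an illustration under stated assumptions, not the code in this PR: `imgTagsToAltText` is a hypothetical name, and it works on a raw HTML string rather than on parser tokens.

```typescript
// Hypothetical helper: replaces each <img> tag in an HTML string with
// "[Image: <alt text>]". Tags whose alt text cannot be found are dropped
// from the output, which is the failure mode discussed above.
function imgTagsToAltText(html: string): string {
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    // Accept double- or single-quoted alt values; unquoted values are ignored.
    const alt = tag.match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    const text = alt ? (alt[1] ?? alt[2] ?? '') : '';
    return text ? `[Image: ${text}]` : '';
  });
}

// A tag with alt text is replaced; a tag without one disappears.
console.log(imgTagsToAltText('See <img src="a.png" alt="Diagram"> here'));
console.log(imgTagsToAltText('<img src="a.png">'));
```

One case where this sketch does worse than simply dropping the tag: a `>` inside an attribute value ends the `[^>]*` match early, so the rest of the tag is left behind in the output. A real HTML parser would handle that case correctly.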
Fixes #186.
Alt text seems like the easiest solution to this issue for the moment, and it fixes a real example from our existing docs which I've included as a test case.
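As a rough illustration of the alt-text idea for Markdown image links, the sketch below rewrites `![alt](url)` on a raw string. This is an assumption-heavy simplification: the function name `markdownImagesToAltText` is hypothetical, the PR itself appears to work on parsed tokens rather than raw text, and reusing the `[Image: ...]` format from the `<img>` handling is a guess.

```typescript
// Hypothetical helper: replaces Markdown image links with their alt text.
// Assumes alt text contains no "]" and the link target contains no ")".
function markdownImagesToAltText(markdown: string): string {
  return markdown.replace(/!\[([^\]]*)\]\([^)]*\)/g, (_match, alt: string) =>
    alt ? `[Image: ${alt}]` : ''
  );
}

console.log(
  markdownImagesToAltText('Before ![Architecture diagram](images/arch.png) after')
);
```

Images with empty alt text are removed entirely, which matches the "excluded from the output" behaviour described in the review thread.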