Expand cleanup to handle more Pandoc/Calibre conversion artifacts:
- Attribute blocks with #id and .class (e.g. {#calibre_link-0 .calibre3})
- Escaped bracket wrapping \[...\], removed only at line boundaries to preserve mid-line LaTeX display math
- Empty headings (# with no text)
- Heading [*Title*] wrapping, matched with a specific regex
Each rule includes before/after samples in the docstring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
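The commit message names the rules but not their patterns. Here is a minimal sketch of the attribute-block and empty-heading rules. It assumes hypothetical regexes (`ATTR_BLOCK`, `EMPTY_HEADING`) and a hypothetical helper `strip_attrs_and_empty_headings()`, not the PR's actual code inside `clean_calibre_markers()`:

```python
import re

# Hypothetical patterns for two of the rules listed above; the PR's actual
# regexes for these rules are not shown in its description.

# An attribute block: "{", one or more "#id" / ".class" tokens, then "}".
# Leading spaces/tabs are consumed so "## Title {#x}" becomes "## Title".
ATTR_BLOCK = re.compile(r"[ \t]*\{(?:[#.][\w-]+[ \t]*)+\}")

# A line made only of 1-6 "#" characters (plus optional trailing spaces),
# together with its newline.
EMPTY_HEADING = re.compile(r"^#{1,6}[ \t]*$\n?", re.MULTILINE)


def strip_attrs_and_empty_headings(text: str) -> str:
    # Remove attribute blocks first, so a heading that held only an
    # attribute block is already empty when the second rule runs.
    text = ATTR_BLOCK.sub("", text)
    return EMPTY_HEADING.sub("", text)


sample = "## Title {#calibre_link-0 .calibre3}\n##\ntext{.calibre5}\n## {#calibre_link-7}\nBody\n"
print(repr(strip_attrs_and_empty_headings(sample)))  # '## Title\ntext\nBody\n'
```

Running attribute removal first means a heading such as `## {#calibre_link-7}` becomes `##` and is then dropped by the empty-heading rule. Braces that do not start with `#` or `.`, such as those in `\frac{a}{b}`, are not touched by `ATTR_BLOCK`.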
Summary
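Extends `clean_calibre_markers()` to handle more Pandoc/Calibre conversion leftovers, with before/after samples for every rule. This is the second change split out of PR #2 (text cleanup). Following review suggestions, it also corrects the overly aggressive global replacements.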
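The summary's point about replacing aggressive global substitutions can be illustrated with the two line-anchored patterns quoted in the review fixes. In this sketch, `strip_line_boundary_brackets()` and `naive_global_strip()` are hypothetical helpers. The second one only stands in for a global replacement and is not the code that was actually removed:

```python
import re

# The two line-anchored patterns quoted in the PR description.
OPEN_AT_LINE_START = re.compile(r"(?m)^\\?\\\[")
CLOSE_AT_LINE_END = re.compile(r"(?m)\\?\\\]$")


def strip_line_boundary_brackets(text: str) -> str:
    # Drop an escaped "[" only at the start of a line and an escaped "]"
    # only at the end of a line; occurrences in between are left alone.
    text = OPEN_AT_LINE_START.sub("", text)
    return CLOSE_AT_LINE_END.sub("", text)


def naive_global_strip(text: str) -> str:
    # Hypothetical global replacement, shown only for contrast:
    # it removes every escaped bracket, including LaTeX display math.
    return text.replace("\\[", "").replace("\\]", "")


doc = "\\[Some paragraph.\\]\nThe value \\[x+1\\] is shown inline.\n"
print(strip_line_boundary_brackets(doc))
print(naive_global_strip(doc))
```

Both calls unwrap the first line to `Some paragraph.`. Only the line-anchored version keeps `\[x+1\]` intact in the second line, while the global version turns it into `x+1`.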
Changes
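
| Rule | Before | After |
|---|---|---|
| Attribute block | `## Title {#calibre_link-0 .calibre3}` | `## Title` |
| Escaped bracket wrapping | `\[Some paragraph.\]` | `Some paragraph.` |
| Empty heading | `##` | (removed) |
| `[*text*]` wrapping in headings | `## [*Some Title*]` | `## Some Title` |

Fixes in response to review feedback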
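- `\[` / `\]` are no longer replaced globally. The rule now uses `(?m)^\\?\\\[` and `(?m)\\?\\\]$`, which match only at the start or end of a line, so LaTeX display math `\[...\]` in the middle of a line is left intact.
- `strip('[]* ')` has been removed. Heading cleanup now uses `re.match(r'^\[\s*\*(.+?)\*\s*\]$', text)` to match exactly the `[*text*]` pattern, so legitimate characters are no longer stripped by mistake.

Test plan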
- `text{.calibre5}` → `text`
- `\[...\]` is removed; `\[x+1\]` is unaffected
- `## [*Some Title*]` → `## Some Title`
- `[**Chapter One**]` → `**Chapter One**`

🤖 Generated with Claude Code
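The heading case in the test plan above can be checked in isolation. This sketch uses the `re.match` pattern quoted in the review fixes and compares it with `strip('[]* ')`. The function names and the extra headings `[1] Getting started` and `References [1]` are illustrative assumptions:

```python
import re


def unwrap_heading_text(text: str) -> str:
    # Pattern quoted in the PR: only text that is exactly "[*...*]"
    # (optionally with whitespace inside the brackets) is unwrapped.
    m = re.match(r"^\[\s*\*(.+?)\*\s*\]$", text)
    return m.group(1) if m else text


def strip_based(text: str) -> str:
    # The removed approach: strip any "[", "]", "*" or " " from both ends.
    return text.strip("[]* ")


for heading in ["[*Some Title*]", "[1] Getting started", "References [1]"]:
    print(f"{heading!r}: regex={unwrap_heading_text(heading)!r}, strip={strip_based(heading)!r}")

# '[*Some Title*]': regex='Some Title', strip='Some Title'
# '[1] Getting started': regex='[1] Getting started', strip='1] Getting started'
# 'References [1]': regex='References [1]', strip='References [1'
```

Only the exact `[*...*]` shape is unwrapped. The strip-based version also removes brackets that belong to the heading text itself.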