fix: improve Calibre artifact cleanup by fredchu · Pull Request #5 · deusyu/translate-book

fredchu · 2026-03-22T15:49:34Z

Summary

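Extends clean_calibre_markers() to handle more Pandoc/Calibre conversion residue. Each rule comes with a before/after sample.

This is the second change split out of PR #2 (text cleanup). The aggressive global replacement has been fixed as the review suggested.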

Changes

Rule	Before	After
Attribute block (including #id)	`## Title {#calibre_link-0 .calibre3}`	`## Title`
Escaped bracket wrapping	`\[Some paragraph.\]`	`Some paragraph.`
Empty heading	`##`	(removed)
Heading `[text]` wrapping	`## [Some Title]`	`## Some Title`
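
The four rules in the table can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name `clean_calibre_markers_sketch` and every regex in it are assumptions chosen to reproduce the before/after samples above.

```python
import re


def clean_calibre_markers_sketch(text: str) -> str:
    """Hypothetical sketch of the four table rules (not the PR's actual code)."""
    # Rule 1: drop Pandoc attribute blocks such as {#calibre_link-0 .calibre3}
    # or {.calibre5}, together with any spaces directly in front of them.
    text = re.sub(r'[ \t]*\{[#.][^{}\n]*\}', '', text)

    # Rule 2: strip an escaped bracket only when it starts or ends a line,
    # so a mid-line \[ ... \] pair is left untouched.
    text = re.sub(r'(?m)^\\\[', '', text)
    text = re.sub(r'(?m)\\\]$', '', text)

    # Rule 4: unwrap a heading whose whole text is wrapped in [ ... ].
    text = re.sub(r'(?m)^(#{1,6})[ \t]+\[([^\[\]\n]+)\][ \t]*$', r'\1 \2', text)

    # Rule 3: remove headings that have no text left (run last, because the
    # earlier rules may be what emptied the heading).
    text = re.sub(r'(?m)^#{1,6}[ \t]*$\n?', '', text)
    return text


if __name__ == "__main__":
    print(clean_calibre_markers_sketch("## Title {#calibre_link-0 .calibre3}"))
    print(clean_calibre_markers_sketch("## [Some Title]"))
```

The empty-heading rule runs last on purpose in this sketch: a heading that consisted only of an attribute block becomes empty after rule 1 and should then be dropped as well.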

Fixes in response to review feedback

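\[ / \] are no longer replaced globally: the patterns (?m)^\\?\\\[ and (?m)\\?\\\]$ match only at the start and end of a line, so LaTeX display math \[...\] in the middle of a line is left intact
strip('[]* ') has been removed: heading cleanup now uses re.match(r'^\[\s*\*(.+?)\*\s*\]$', text) to match exactly the [*text*] pattern, so legitimate characters are no longer stripped by mistake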
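
To make the difference concrete, here is a small comparison sketch. The three regexes are copied from the description above. The helper names (`strip_boundary_brackets`, `strip_all_brackets`, `unwrap_heading`) and the sample strings are hypothetical, and the "global" variant is only a reconstruction of the earlier approach for contrast.

```python
import re

# Patterns as quoted in the PR description.
OPEN_AT_LINE_START = re.compile(r'(?m)^\\?\\\[')
CLOSE_AT_LINE_END = re.compile(r'(?m)\\?\\\]$')
HEADING_WRAP = re.compile(r'^\[\s*\*(.+?)\*\s*\]$')


def strip_boundary_brackets(text: str) -> str:
    """Remove escaped brackets only at the start or end of a line."""
    text = OPEN_AT_LINE_START.sub('', text)
    return CLOSE_AT_LINE_END.sub('', text)


def strip_all_brackets(text: str) -> str:
    """Global replacement, reconstructed here only to show what it breaks."""
    return text.replace('\\[', '').replace('\\]', '')


def unwrap_heading(text: str) -> str:
    """Unwrap [*text*]; anything that does not match is returned unchanged.

    Returning group(1) is an assumption about how the match is used.
    """
    m = HEADING_WRAP.match(text)
    return m.group(1) if m else text


sample = "\\[Some paragraph.\\]\nInline math \\[x+1\\] stays."

if __name__ == "__main__":
    print(strip_boundary_brackets(sample))  # math on line 2 keeps its delimiters
    print(strip_all_brackets(sample))       # math on line 2 loses its delimiters
    print(unwrap_heading("[*Some Title*]"))
    print("[Note] on C*".strip('[]* '))     # strip() eats the legitimate [ and *
```

Note that under these patterns a `\[` or `\]` that happens to sit at a line boundary is still removed, even if it belongs to a math expression. The protection applies to delimiters in the middle of a line.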

Test plan

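Attribute block removal: text{.calibre5} → text
Escaped brackets: \[...\] at the start and end of a line is removed
LaTeX protection: \[x+1\] in the middle of a line is unaffected
Heading cleanup: ## [*Some Title*] → ## Some Title
Bold brackets: [**Chapter One**] → **Chapter One**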

🤖 Generated with Claude Code

Expand cleanup to handle more Pandoc/Calibre conversion artifacts:
- Attribute blocks with #id and .class (e.g. {#calibre_link-0 .calibre3})
- Escaped bracket wrapping \[...\], only at line boundaries to preserve LaTeX display math mid-line
- Empty headings (# with no text)
- Heading [*Title*] wrapping via specific regex match

Each rule includes before/after samples in the docstring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fredchu mentioned this pull request Mar 22, 2026

fix: improve Calibre artifact cleanup and output file naming #2

Closed

3 tasks

fredchu wants to merge 1 commit into deusyu:main from fredchu:fix/calibre-text-cleanup
