Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements version 0.2 with significant infrastructure and feature enhancements, focusing on deployment automation, file upload conflict resolution, and improved dark mode support.
Key Changes
- Deployment infrastructure: Replaces development scripts with production-ready deploy.sh featuring automated dependency installation, firewall configuration, and public network support
- File conflict handling: Implements comprehensive upload conflict detection and resolution (overwrite/rename) across frontend and backend with user-friendly dialog interfaces
- Database improvements: Adds paper-conversation binding and per-user filename uniqueness constraints to prevent duplicates
Reviewed changes
Copilot reviewed 31 out of 32 changed files in this pull request and generated 26 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/deploy.sh | New production deployment script with UFW firewall setup, parallel dependency installation, and public IP configuration |
| scripts/docker-infra.sh | Removed infrastructure-only startup script (consolidated into deploy.sh) |
| scripts/dev1.sh | Removed development environment script (consolidated into deploy.sh) |
| redis.conf | Changed bind address from 127.0.0.1 to 0.0.0.0 for broader network access |
| docker-compose.yml | Updated Postgres password from default to project-specific value |
| backend/config/magic-pdf.config.json | Switched device-mode from CPU to CUDA for GPU acceleration |
| backend/migrations/versions/20251223_upload_conflict_and_conversation_binding.py | Adds paper_id to conversations table and unique constraint on user_id+original_filename for uploads |
| backend/app/models/uploaded_paper.py | Adds conversations relationship to support paper-conversation binding |
| backend/app/models/conversation.py | Adds paper_id foreign key and relationship for document binding |
| backend/app/utils/file_naming.py | Implements build_storage_name_with_email for cross-user unique storage names |
| backend/app/services/mineru_service.py | Changes file writing from md_writer to direct Path.write_text with error handling |
| backend/app/workers/mineru_runner.py | Uses ensure_ascii=True to prevent UnicodeEncodeError with surrogate characters |
| backend/app/schemas/conversation.py | Adds optional paper_id field to conversation schema |
| backend/app/db/repository.py | Adds get_by_original_name and purge_cached_artifacts methods for conflict detection and cleanup |
| backend/app/db/note_repository.py | Adds detach_uploaded_paper for unlinking notes when papers are deleted |
| backend/app/db/conversation_repository.py | Adds paper-aware conversation retrieval and bulk deletion for paper cleanup |
| backend/app/api/v1/endpoints/users.py | Changes account deletion from soft-delete to hard-delete with full data cleanup |
| backend/app/api/v1/endpoints/papers.py | Implements upload conflict detection, resolution logic (overwrite/rename), and conversation cleanup |
| backend/app/api/v1/endpoints/library.py | Updates save_pdf_bytes to use email-based storage naming |
| frontend/src/lib/api-client.ts | Adds UploadConflictError class and conflict handling in uploadPaper function |
| frontend/src/components/note-editor/NoteEditorPanel.tsx | Adds resetEditor method, improves dark mode support with data-color-mode attributes |
| frontend/src/components/layout/dashboard-shell.tsx | Sets data-color-mode attributes for GitHub/Primer-based component theming |
| frontend/src/components/ai-agent/AgentChatPanel.tsx | Improves conversation history loading with better error handling and cache cleanup |
| frontend/src/app/upload/page.tsx | Adds conflict resolution dialog with rename/overwrite options |
| frontend/src/app/smart-reading/page.tsx | Implements conflict handling UI and conversation cache cleanup on overwrite |
| frontend/src/app/profile/page.tsx | Updates back button href from "/" to "/home" and adds data-color-mode support |
| frontend/src/app/notes/page.tsx | Fixes SSR hydration mismatch by deferring token read to client-side |
| frontend/src/app/notes/new/page.tsx | Adds hydration guard to prevent SSR/client state mismatches |
| frontend/src/app/notes/[id]/page.tsx | Enhances dark mode styling and adds token check in bootstrap |
| frontend/src/app/library/page.tsx | Changes anchor tag to Link component for client-side navigation |
| frontend/src/app/academic/page.tsx | Uses fallback chain for userName display |
Comments suppressed due to low confidence (1)
redis.conf:9
- Changing Redis to
bind 0.0.0.0whileprotected-mode nois set, combined with the Docker mapping6379:6379indocker-compose.yml, exposes an unauthenticated Redis instance on all network interfaces. An external attacker who can reach this host can connect to Redis, read/modify all cached data, and potentially achieve remote code execution via configuration changes or module loading. Restrict Redis to listen only on localhost or an internal network interface, re-enableprotected-mode, and avoid publishing port 6379 directly to untrusted networks.
bind 0.0.0.0
# Disable protected mode because access is restricted by Docker network
protected-mode no
| # 尝试自动获取公网 IP,如果失败则使用你提供的默认 IP | ||
| info "正在获取公网 IP 地址..." | ||
| PUBLIC_IP=$(curl -s --max-time 3 ifconfig.me || echo "36.103.203.223") |
There was a problem hiding this comment.
The public IP address is hardcoded as a fallback value, which appears to be a specific personal or organizational IP. This should not be committed to version control as it exposes infrastructure details. Consider using a placeholder like "YOUR_PUBLIC_IP" or removing the fallback entirely to force explicit configuration.
| # 尝试自动获取公网 IP,如果失败则使用你提供的默认 IP | |
| info "正在获取公网 IP 地址..." | |
| PUBLIC_IP=$(curl -s --max-time 3 ifconfig.me || echo "36.103.203.223") | |
| # 尝试自动获取公网 IP;如果失败,则要求通过环境变量 PUBLIC_IP 显式提供 | |
| info "正在获取公网 IP 地址..." | |
| if [ -z "${PUBLIC_IP-}" ]; then | |
| PUBLIC_IP="$(curl -s --max-time 3 ifconfig.me || true)" | |
| fi | |
| if [ -z "$PUBLIC_IP" ]; then | |
| error "无法自动获取公网 IP,请通过环境变量 PUBLIC_IP 指定,例如:PUBLIC_IP=\"YOUR_PUBLIC_IP\" ./scripts/deploy.sh" | |
| exit 1 | |
| fi |
| "uq_uploaded_papers_user_original", | ||
| "uploaded_papers", | ||
| ["user_id", "original_filename"], | ||
| ) | ||
|
|
||
|
|
||
| def downgrade() -> None: | ||
| op.drop_constraint("uq_uploaded_papers_user_original", "uploaded_papers", type_="unique") |
There was a problem hiding this comment.
The unique constraint name "uq_uploaded_papers_user_original" could be more descriptive. Consider using a name that clearly indicates what is being constrained, such as "uq_uploaded_papers_user_id_original_filename" to make it easier to understand the constraint purpose when debugging database issues.
| "uq_uploaded_papers_user_original", | |
| "uploaded_papers", | |
| ["user_id", "original_filename"], | |
| ) | |
| def downgrade() -> None: | |
| op.drop_constraint("uq_uploaded_papers_user_original", "uploaded_papers", type_="unique") | |
| "uq_uploaded_papers_user_id_original_filename", | |
| "uploaded_papers", | |
| ["user_id", "original_filename"], | |
| ) | |
| def downgrade() -> None: | |
| op.drop_constraint( | |
| "uq_uploaded_papers_user_id_original_filename", | |
| "uploaded_papers", | |
| type_="unique", | |
| ) |
| # md_writer.write_string(f"{stem}.md", md_content) | ||
| (target_dir / f"{stem}.md").write_text(md_content, encoding="utf-8", errors="replace") | ||
|
|
||
| # 2. Content List (JSON) | ||
| content_list = union_make(pdf_info, MakeMode.CONTENT_LIST, "images") | ||
| md_writer.write_string(f"{stem}_content_list.json", json.dumps(content_list, ensure_ascii=False, indent=4)) | ||
| # md_writer.write_string(f"{stem}_content_list.json", json.dumps(content_list, ensure_ascii=False, indent=4)) | ||
| (target_dir / f"{stem}_content_list.json").write_text( | ||
| json.dumps(content_list, ensure_ascii=False, indent=4), | ||
| encoding="utf-8", | ||
| errors="replace" | ||
| ) | ||
|
|
||
| # 3. Middle JSON | ||
| md_writer.write_string(f"{stem}_middle.json", json.dumps(middle_json, ensure_ascii=False, indent=4)) | ||
| # md_writer.write_string(f"{stem}_middle.json", json.dumps(middle_json, ensure_ascii=False, indent=4)) | ||
| (target_dir / f"{stem}_middle.json").write_text( | ||
| json.dumps(middle_json, ensure_ascii=False, indent=4), | ||
| encoding="utf-8", | ||
| errors="replace" | ||
| ) |
There was a problem hiding this comment.
The commented-out code using md_writer should be removed rather than left as comments. If this is a temporary change during debugging or testing, add a TODO comment explaining when and why it should be reverted. Leaving commented code reduces maintainability and creates confusion about the intended behavior.
| "models-dir": "/tmp/magic-pdf/models", | ||
| "layoutreader-model-dir": "/tmp/magic-pdf/models/Layout", | ||
| "device-mode": "cpu", | ||
| "device-mode": "cuda", |
There was a problem hiding this comment.
Changing device-mode from "cpu" to "cuda" assumes CUDA is available on all deployment environments. This will cause runtime errors on systems without GPU support. Consider making this configurable via environment variable or add a fallback mechanism to detect CUDA availability at runtime.
| "device-mode": "cuda", | |
| "device-mode": "cpu", |
| <div className="mx-auto w-full px-4 pt-6 md:px-6 md:pt-8"> | ||
| <Link | ||
| href="/" | ||
| href="/home" |
There was a problem hiding this comment.
The link now navigates to "/home" instead of "/". Ensure that the "/home" route exists and is properly configured in your routing setup. If "/" was the intended home route, this change might cause a 404 error. Verify this is intentional and that the route is defined.
| href="/home" | |
| href="/" |
| const message = error instanceof Error ? error.message : ""; | ||
| if (message.includes("不存在") || message.toLowerCase().includes("not found")) { |
There was a problem hiding this comment.
The error message check uses message.includes("不存在") || message.toLowerCase().includes("not found"), but this is fragile as it depends on specific error message text. If the backend error messages change, this logic will break. Consider using HTTP status codes or custom error codes instead of parsing error message strings for business logic decisions.
| const message = error instanceof Error ? error.message : ""; | |
| if (message.includes("不存在") || message.toLowerCase().includes("not found")) { | |
| const anyError = error as any; | |
| const status = anyError?.status ?? anyError?.response?.status; | |
| if (status === 404) { |
| existing = await repo.get_by_original_name(current_user.id, target_display_name) | ||
|
|
||
| # 解析冲突处理策略 | ||
| resolution = (conflict_resolution or "").strip().lower() or None |
There was a problem hiding this comment.
The variable resolution is normalized to lowercase and then compared against None and string literals. However, if conflict_resolution is an empty string after stripping, the expression resolves to None due to "or None". This logic could be clearer. Consider: resolution = conflict_resolution.strip().lower() if conflict_resolution and conflict_resolution.strip() else None
| resolution = (conflict_resolution or "").strip().lower() or None | |
| resolution = ( | |
| conflict_resolution.strip().lower() | |
| if conflict_resolution and conflict_resolution.strip() | |
| else None | |
| ) |
| # 2. 核心安全防护:允许 SSH (端口22) 从任何地方访问,或者从你当前的 IP 访问 | ||
| # 建议先开放全网 SSH 端口,防止失联。如果追求极高安全,可以只写你现在的 IP。 | ||
| sudo ufw allow 22/tcp comment 'Allow SSH' |
There was a problem hiding this comment.
The firewall configuration allows SSH from anywhere (port 22), which could be a security risk. While the comment acknowledges this, it's recommended to restrict SSH access to known IP ranges or use additional security measures like fail2ban, key-only authentication, or a VPN. Consider adding a warning or requiring explicit confirmation before running this setup.
| # 2. 核心安全防护:允许 SSH (端口22) 从任何地方访问,或者从你当前的 IP 访问 | |
| # 建议先开放全网 SSH 端口,防止失联。如果追求极高安全,可以只写你现在的 IP。 | |
| sudo ufw allow 22/tcp comment 'Allow SSH' | |
| # 2. 核心安全防护:SSH(22) 访问控制 | |
| warn "即将配置 SSH(22) 访问规则:从任意 IP 开放 SSH 存在安全风险(暴力破解、扫描等)。" | |
| read -r -p "是否允许从任意 IP 访问 SSH(22)?输入 'yes' 确认,默认仅允许当前公网 IP: " ssh_anywhere | |
| if [[ "${ssh_anywhere:-}" == "yes" ]]; then | |
| info "允许从任意 IP 访问 SSH(22)。请确保已配置强密码/密钥登录等安全措施。" | |
| sudo ufw allow 22/tcp comment 'Allow SSH from anywhere' | |
| else | |
| CURRENT_MY_IP=$(curl -s https://ifconfig.me || true) | |
| if [[ -n "${CURRENT_MY_IP:-}" ]]; then | |
| info "仅允许当前公网 IP ${CURRENT_MY_IP} 访问 SSH(22)。" | |
| sudo ufw allow from "$CURRENT_MY_IP" to any port 22 proto tcp comment 'Allow SSH from current IP' | |
| else | |
| warn "无法自动获取当前公网 IP,请手动配置 SSH 访问规则(例如:sudo ufw allow from <YOUR_IP>/32 to any port 22 proto tcp)。" | |
| fi | |
| fi |
| if paper_id is not None: | ||
| stmt = stmt.where(Conversation.paper_id == paper_id) |
There was a problem hiding this comment.
The condition paper_id is not None uses explicit None comparison, but on line 48 the comparison uses == paper_id. For consistency with Python best practices and the existing code style, use explicit None comparison throughout: if paper_id is not None:
| NEXT_PUBLIC_BACKEND_URL="http://$PUBLIC_IP:8000" npm run dev -- -H 0.0.0.0 | ||
| ) & | ||
| FRONTEND_PID=$! | ||
|
|
||
| # --- 4) 最终信息输出 --- | ||
| # 清屏并显示最终访问地址 | ||
| sleep 2 | ||
| success "================================================" | ||
| success " InsightReading 服务已全部在公网启动!" | ||
| success "================================================" | ||
| info "▶ 前端访问地址: http://$PUBLIC_IP:3000" | ||
| info "▶ 后端接口地址: http://$PUBLIC_IP:8000/docs" |
There was a problem hiding this comment.
The deployment script exposes the backend API over plain HTTP (http://$PUBLIC_IP:8000) and instructs users to access it directly, even though the backend implements authentication and password handling. This means login credentials, session tokens, and other sensitive API traffic can be intercepted or modified by any attacker on the network path between clients and the server. Terminate TLS in front of the backend (e.g., via a reverse proxy or load balancer) and ensure both the public URLs and NEXT_PUBLIC_BACKEND_URL use HTTPS for all external access.
No description provided.