Add hosted anti-abuse guards #1

Open
Restuta wants to merge 2 commits into main from feat/anti-abuse-guards

@Restuta Restuta commented Mar 20, 2026

Why

The hosted pubmd instance is live and usable, but it still lacks the basic protections needed for a free public write surface. This change adds a first-pass set of controls so namespace claiming and publishing are harder to abuse without adding a heavyweight auth system.

What

  • add reserved namespace validation
  • add markdown size caps
  • add claim and publish rate limits in the service layer
  • persist rate-limit buckets in both file-backed and Blob-backed storage
  • add reclaim-on-claim behavior for stale unpublished namespaces
  • extend integration coverage for reserved names, rate limits, oversize payloads, reclaim behavior, wrong-token publish, and slug conflicts
  • record the controls in the project plan and progress docs

Test plan

  • npm run verify
  • Blob-backed repository unit tests pass
  • HTTP integration tests cover claim/publish abuse cases
  • existing CLI integration flow still passes


vercel bot commented Mar 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
publish-it | Ready | Preview, Comment | Mar 22, 2026 3:51am


Restuta commented Mar 20, 2026

Self-review

  • Scope stayed focused on first-pass hosted anti-abuse controls.
  • The implementation lives in the service layer, with persistence added to both file-backed and Blob-backed stores.
  • I kept the controls lightweight: reserved namespaces, markdown size caps, claim/publish rate limits, and lazy reclaim for stale unpublished namespaces.
  • I did not add token rotation or heavier auth flows in this PR.

Risks / notes

  • File-store and Blob-store rate-limit buckets are intentionally simple fixed-window counters. Good enough for first-pass control, but not yet a full abuse analytics system.
  • The hosted instance still relies on the existing Vercel edge-cached read path; this PR does not revisit routing architecture.
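
To make the fixed-window behavior concrete, here is a minimal sketch of such a counter written as a pure step function. The record shape (`windowStart`, `count`), the name `nextFixedWindowRecord`, and its parameters are assumptions for illustration; the PR's persisted bucket format and function signatures may differ.

```typescript
// Hypothetical record shape; the PR's stored bucket may look different.
interface RateLimitRecord {
  windowStart: number; // epoch milliseconds at which the current window began
  count: number; // requests counted so far in the current window
}

interface RateLimitDecision {
  allowed: boolean;
  record: RateLimitRecord; // the record to persist after this request
}

// One fixed-window step: given the stored record (or null), the current time,
// and the policy, decide whether to allow the request and what to store next.
function nextFixedWindowRecord(
  previous: RateLimitRecord | null,
  now: number,
  limit: number,
  windowMs: number,
): RateLimitDecision {
  // Reuse the stored window while it is still open, otherwise start a new one.
  const current: RateLimitRecord =
    previous !== null && now - previous.windowStart < windowMs
      ? previous
      : { windowStart: now, count: 0 };

  // Reject once the limit is reached; the record is left unchanged.
  if (current.count >= limit) {
    return { allowed: false, record: current };
  }
  return {
    allowed: true,
    record: { windowStart: current.windowStart, count: current.count + 1 },
  };
}
```

A fixed window is cheap to persist (one small record per bucket), at the cost of allowing a burst of up to twice the limit across a window boundary.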

Verification

  • npm run verify
  • 5 test files / 30 tests passing
  • Integration coverage added for reserved namespaces, rate limits, oversize markdown, reclaim behavior, wrong-token publish, and slug conflicts


Restuta commented Mar 20, 2026

Code Review — Anti-Abuse Guards

Overall

Solid work. The scope is right: reserved namespaces, rate limits, size caps, and stale reclaim, all the things the discussion called out as missing from the MVP. Test coverage is thorough (7 new integration tests).

Approved with minor notes

Good decisions:

  • Rate limiting in the service layer, not at the edge — portable across hosting targets
  • IP hashing with sha256().slice(0, 16) — no PII stored, just enough for bucketing
  • isReclaimableNamespace with configurable timeout — clean separation of policy
  • Replaced NamespacePageIndexSchema with blob.list() prefix scan — eliminates the race-prone read-modify-write on the namespace index. This is a meaningful architectural improvement, not just an abuse feature
  • computeNextRateLimitRecord is a pure function — easy to test independently
  • writeJson with flag: "wx" for atomic claim — prevents TOCTOU on namespace creation in FileStore
  • All limits are configurable via PublishServiceOptions with sane defaults
  • New error types map cleanly to HTTP status codes (409, 429, 413)
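
The IP bucketing noted above can be sketched as follows. Only the `sha256().slice(0, 16)` shape comes from the review; the helper names `hashIp` and `rateLimitKey`, the hex encoding, and the `<action>/<hash>` key layout are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hash a client IP and keep the first 16 hex characters (64 bits),
// so the raw address is never written to storage.
function hashIp(ip: string): string {
  return createHash("sha256").update(ip).digest("hex").slice(0, 16);
}

// Hypothetical bucket key layout: one bucket per action and hashed IP.
function rateLimitKey(action: "claim" | "publish", ip: string): string {
  return `${action}/${hashIp(ip)}`;
}
```

The same IP always maps to the same bucket, which is all a rate limiter needs from the key.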

Minor observations (not blocking):

  1. Rate limit race condition: incrementRateLimit reads then writes without locking. Two concurrent requests could both read count=29 and both write count=30, so neither is rejected at limit=30 and one increment is lost. Same latent issue as the old namespace index race. Acceptable for a single-user tool, but worth noting if traffic grows.

  2. requestIp doesn't check x-forwarded-for — Vercel sets x-forwarded-for as the primary client IP header. x-real-ip and cf-connecting-ip are alternatives. Consider adding x-forwarded-for (first value only) as a fallback.

  3. ReservedNamespaceError returns 409 (Conflict) — 403 (Forbidden) might be more semantically correct since the namespace isn't "in conflict" — it's just not allowed. Minor.

  4. Publish rate limit test does 30 sequential HTTP requests — this test will be slow (~1-2s). Fine for now, but if the test suite grows, consider lowering the limit in test config.

  5. claimNamespace signature changed — went from claimNamespace(namespace: string) to claimNamespace(input: ClaimNamespaceInput). This is a breaking change to the PublishService interface. The app.ts was updated correctly, but the /publish skill and any external callers would need updating too.
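
For observation 2, a minimal sketch of the suggested fallback might look like this. The helper names `firstForwardedFor` and `resolveClientIp` and the `getHeader` callback are hypothetical; the header order simply appends x-forwarded-for after the two headers the PR already reads.

```typescript
// Return the first (left-most) address from an X-Forwarded-For value,
// or undefined when the header is missing or empty.
function firstForwardedFor(headerValue: string | undefined): string | undefined {
  if (headerValue === undefined) return undefined;
  const first = headerValue.split(",")[0]?.trim();
  return first || undefined;
}

// Hypothetical resolver: existing headers first, x-forwarded-for as a fallback.
function resolveClientIp(
  getHeader: (name: string) => string | undefined,
): string | undefined {
  return (
    getHeader("x-real-ip") ??
    getHeader("cf-connecting-ip") ??
    firstForwardedFor(getHeader("x-forwarded-for"))
  );
}
```

Taking only the first value matters because each proxy in the chain appends its own peer address to the list.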

Summary

Ship it. The implementation matches the project plan's "Phase 1 Controls" section exactly. The namespace index removal is a bonus win.

Copilot AI left a comment

Pull request overview

Adds first-pass anti-abuse protections to the hosted pubmd write surface by enforcing reserved namespaces, markdown size caps, claim/publish rate limits (persisted in storage), and reclaiming stale unpublished namespace claims.

Changes:

  • Enforce reserved namespaces, markdown max size, and service-layer claim/publish rate limits (with persisted rate-limit buckets).
  • Add reclaim-on-claim behavior for stale unpublished namespaces.
  • Expand HTTP integration coverage for abuse/edge cases and update project planning docs.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

File | Description
tests/unit/blob-store.test.ts | Extends Blob mock to support listing blobs for new list-based page indexing.
tests/integration/test-server.ts | Exposes dataDir from the test server for reclaim behavior tests that edit on-disk records.
tests/integration/server.test.ts | Adds integration tests for reserved namespaces, rate limits, oversize markdown, reclaim behavior, wrong-token publish, and slug conflicts.
src/server/app.ts | Plumbs request IP into claim/publish flows and maps new abuse-control errors to HTTP status codes.
src/core/repository.ts | Introduces abuse-control error types and repository methods for persisting rate-limit records and overwriting namespace records.
src/core/publish-service.ts | Implements reserved namespace checks, markdown size caps, claim/publish rate limiting, and reclaim-on-claim logic.
src/core/markdown.ts | Small CSS tweak for blockquote strong styling (inherit).
src/core/file-store.ts | Persists rate-limit buckets on disk and supports namespace record overwrites for reclaim behavior.
src/core/blob-store.ts | Persists rate-limit buckets in Blob storage and changes page listing to be derived from lookup blobs.
docs/project-plan.md | Documents cost model and the anti-abuse control plan; marks milestones complete.
docs/progress.md | Records completion of the anti-abuse plan, implementation, and integration coverage.

Comment on lines +68 to +72
      tokenHash,
      createdAt: new Date().toISOString(),
    },
    "wx",
  );

Copilot AI Mar 22, 2026

claimNamespace relies on writeFile(..., { flag: "wx" }) throwing on existing files. If two claim requests race, this will surface as a raw EEXIST filesystem error (currently mapped to a generic 400), instead of a consistent 409 NamespaceExistsError. Consider catching EEXIST here (or in the service) and throwing NamespaceExistsError so concurrent claims behave predictably.
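
One way to do the translation described above is a small error-mapping helper, sketched here. The `NamespaceExistsError` class is a stand-in for the PR's own error type, and `isEexist` / `toClaimError` are hypothetical names; the only fact relied on is that Node.js filesystem errors carry a `code` property such as "EEXIST".

```typescript
// Stand-in for the PR's NamespaceExistsError; the real class may differ.
class NamespaceExistsError extends Error {
  constructor(namespace: string) {
    super(`Namespace "${namespace}" already exists`);
    this.name = "NamespaceExistsError";
  }
}

// True when the value looks like a filesystem error with code EEXIST.
function isEexist(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "EEXIST"
  );
}

// Map a lost exclusive-create race to the domain error; pass anything else through.
function toClaimError(namespace: string, error: unknown): unknown {
  return isEexist(error) ? new NamespaceExistsError(namespace) : error;
}
```

The store would then wrap the exclusive write in a try/catch and rethrow `toClaimError(namespace, error)`, so a concurrent claim surfaces as the same 409 as a sequential one.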

Comment on lines 90 to +95
  async function listPages(namespace: string): Promise<StoredPage[]> {
-   const index = await readJsonBlob(
-     namespaceIndexPath(namespace),
-     NamespacePageIndexSchema,
-   );
-   const pages = index?.pages ?? [];
+   const lookupResults = await list({
+     limit: 1000,
+     prefix: `${lookupPrefix(namespace)}/`,
+     token: metadataToken,
+   });

Copilot AI Mar 22, 2026

listPages hard-codes limit: 1000 and ignores pagination (hasMore/cursor). Namespaces with >1000 pages will be silently truncated. Consider iterating through all pages using the list API’s pagination fields so listing remains correct for larger namespaces.
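
A cursor-following loop along those lines could be sketched as below. It is written against a generic page-fetching callback rather than the Blob SDK itself, so the `ListPage` shape (`blobs`, `hasMore`, `cursor`) and the names `FetchPage` and `listAll` are assumptions modeled on the pagination fields mentioned above.

```typescript
// One page of results; field names are assumed, modeled on hasMore/cursor.
interface ListPage<T> {
  blobs: T[];
  hasMore: boolean;
  cursor?: string;
}

// Fetches a single page, starting from the given cursor (undefined = first page).
type FetchPage<T> = (cursor: string | undefined) => Promise<ListPage<T>>;

// Follow the cursor until the API reports no more pages.
async function listAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: ListPage<T> = await fetchPage(cursor);
    all.push(...page.blobs);
    // Stop when there are no more pages, or when no cursor was returned.
    cursor = page.hasMore ? page.cursor : undefined;
  } while (cursor !== undefined);
  return all;
}
```

In listPages, the callback would wrap the existing list call and pass the cursor through alongside the prefix and token.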

Comment on lines +112 to +114
  return pages
    .filter((page): page is StoredPage => page !== null)
    .sort((left, right) => right.updatedAt.localeCompare(left.updatedAt));

Copilot AI Mar 22, 2026

listPages can return duplicates if multiple lookup blobs point at the same pageId (e.g., if cleanup of an old slug lookup fails after a rename). Since the method maps lookups -> findPageById without deduping, the same page can appear multiple times. Consider deduping by pageId (or slug) after resolving pages.

Suggested change
-   return pages
-     .filter((page): page is StoredPage => page !== null)
-     .sort((left, right) => right.updatedAt.localeCompare(left.updatedAt));
+   const nonNullPages = pages.filter(
+     (page): page is StoredPage => page !== null,
+   );
+   const uniquePagesById = new Map<string, StoredPage>();
+   for (const page of nonNullPages) {
+     // Deduplicate by page identifier to avoid returning the same page multiple times
+     if (!uniquePagesById.has(page.id)) {
+       uniquePagesById.set(page.id, page);
+     }
+   }
+   return Array.from(uniquePagesById.values()).sort((left, right) =>
+     right.updatedAt.localeCompare(left.updatedAt),
+   );

Comment on lines +37 to +41
      tokenHash,
      createdAt: new Date().toISOString(),
    },
    false,
  );

Copilot AI Mar 22, 2026

When allowOverwrite is false, put() will fail if the namespace blob already exists (including race conditions between getNamespace and claimNamespace). That failure currently propagates as a storage error and will be mapped to a generic HTTP 400/500. Consider translating overwrite-conflict errors into NamespaceExistsError to keep claim behavior stable under concurrency.

Comment on lines +217 to +221
function requestIp(context: Context): string | undefined {
  return (
    context.req.header("x-real-ip") ??
    context.req.header("cf-connecting-ip") ??
    undefined

Copilot AI Mar 22, 2026

requestIp trusts x-real-ip / cf-connecting-ip directly. In deployments not sitting behind a proxy that strips/overwrites these headers, clients can spoof them and bypass IP-based rate limits. Consider either (a) only trusting these headers when trustProxy is enabled / the immediate peer is a known proxy, or (b) using the runtime’s remote address as the primary source and falling back to headers only in trusted environments.
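
A minimal sketch combining both options, assuming a `trustProxy` deployment flag and access to the runtime's peer address. The `IpSources` shape and the `requestIpFrom` name are hypothetical and are not part of the PR.

```typescript
// Hypothetical inputs: how the peer address and flag are obtained depends on the runtime.
interface IpSources {
  remoteAddress: string | undefined; // socket peer address reported by the runtime
  headers: Record<string, string | undefined>; // header names in lower case
  trustProxy: boolean; // true only when a trusted proxy sets or overwrites the headers
}

// Use client-supplied headers only when the deployment says they can be trusted.
function requestIpFrom(sources: IpSources): string | undefined {
  if (!sources.trustProxy) {
    return sources.remoteAddress;
  }
  return (
    sources.headers["x-real-ip"] ??
    sources.headers["cf-connecting-ip"] ??
    sources.remoteAddress
  );
}
```

With the flag off, a spoofed header has no effect on which rate-limit bucket a request lands in.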
