⚡ Bolt: Pre-compile PERSONA_KEYWORDS regex patterns#236
Conversation
This change pre-compiles `PERSONA_KEYWORDS` at the module level into `_COMPILED_PERSONA_KEYWORDS` to bypass Python's regex cache lookup and function call overhead on every loop iteration inside `pick_persona`. Tests that monkeypatch the global dictionary have been updated to also patch the pre-compiled cache.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
Optimizes persona detection in app.heuristics by pre-compiling persona keyword regexes once at import time and updating the matching loop to use compiled patterns, with a corresponding test adjustment to patch the compiled cache.
Changes:
- Added module-level
_COMPILED_PERSONA_KEYWORDSand updatedpick_persona()to iterate compiled regex objects instead of callingre.search()per pattern. - Updated extended heuristics test to monkeypatch the compiled persona cache alongside
PERSONA_KEYWORDS. - Documented the monkeypatch + precompiled-cache pattern in
.jules/bolt.md.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
app/heuristics/__init__.py |
Pre-compiles persona regexes and switches persona matching to compiled patterns. |
tests/test_extended_heuristics.py |
Adjusts monkeypatching to keep raw and compiled persona keyword maps consistent in tests. |
.jules/bolt.md |
Adds guidance on handling monkeypatched constants when introducing pre-compiled caches. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Bolt Optimization: Pre-compile regexes for developer persona | ||
| _COMPILED_DEV_PATS = [re.compile(p) for p in PERSONA_KEYWORDS["developer"]] | ||
|
|
||
| # Bolt Optimization: Pre-compile regexes for all personas | ||
| _COMPILED_PERSONA_KEYWORDS = {k: [re.compile(p) for p in v] for k, v in PERSONA_KEYWORDS.items()} | ||
|
|
||
|
|
There was a problem hiding this comment.
_COMPILED_DEV_PATS now duplicates compilation work already done in _COMPILED_PERSONA_KEYWORDS["developer"], which risks the two compiled caches drifting if either one is patched/updated. Consider compiling developer patterns only once (e.g., derive _COMPILED_DEV_PATS from _COMPILED_PERSONA_KEYWORDS["developer"] or remove _COMPILED_DEV_PATS and reuse the shared cache in detect_coding_context).
| # Bolt Optimization: Pre-compile regexes for developer persona | |
| _COMPILED_DEV_PATS = [re.compile(p) for p in PERSONA_KEYWORDS["developer"]] | |
| # Bolt Optimization: Pre-compile regexes for all personas | |
| _COMPILED_PERSONA_KEYWORDS = {k: [re.compile(p) for p in v] for k, v in PERSONA_KEYWORDS.items()} | |
| # Bolt Optimization: Pre-compile regexes for all personas | |
| _COMPILED_PERSONA_KEYWORDS = {k: [re.compile(p) for p in v] for k, v in PERSONA_KEYWORDS.items()} | |
| # Bolt Optimization: Convenience alias for developer persona patterns | |
| _COMPILED_DEV_PATS = _COMPILED_PERSONA_KEYWORDS["developer"] |
| def pick_persona(text: str) -> tuple[str, dict]: | ||
| lower = text.lower() | ||
| scores = {k: 0 for k in PERSONA_KEYWORDS} | ||
| evidence: dict[str, list[str]] = {k: [] for k in PERSONA_KEYWORDS} | ||
| for persona, pats in PERSONA_KEYWORDS.items(): | ||
| for p in pats: | ||
| if re.search(p, lower): | ||
|
|
||
| for persona, compiled_pats in _COMPILED_PERSONA_KEYWORDS.items(): | ||
| if persona not in scores: | ||
| scores[persona] = 0 | ||
| evidence[persona] = [] |
There was a problem hiding this comment.
pick_persona() now matches against _COMPILED_PERSONA_KEYWORDS but still exposes/initializes from PERSONA_KEYWORDS. Any runtime modification to PERSONA_KEYWORDS (including tests/consumers that patch it) will no longer affect matching unless _COMPILED_PERSONA_KEYWORDS is also updated, which is a behavioral/API change. Consider either (a) providing an explicit recompile_persona_keywords() helper (and using it in places that mutate PERSONA_KEYWORDS), or (b) treating the raw dict as private/immutable and deriving scores/evidence keys from _COMPILED_PERSONA_KEYWORDS to keep a single source of truth.
💡 What: Pre-compiled
PERSONA_KEYWORDSat the module level into_COMPILED_PERSONA_KEYWORDSand updatedpick_personato iterate through the compiled regexes rather than callingre.searchrepeatedly. Tests usingmonkeypatchwere updated to patch the pre-compiled cache directly to avoid invalidation issues.🎯 Why: Using
re.search(p, text)within a hot loop for every persona pattern incurs the overhead of Python's internal regex compilation cache lookup and a function call. Pre-compiling to the module level bypasses this, yielding a significant speedup.📊 Impact: Reduces regex overhead significantly during persona matching, making text processing faster by eliminating repeated runtime regex evaluations.
🔬 Measurement: Verified via targeted
test_persona.py,test_developer_persona.py, andtest_extended_heuristics.pytests. The full test suite was also run to ensure no regressions.PR created automatically by Jules for task 1550938755926524643 started by @madara88645