fix: Complete Ultimate + Simple mode integration with Z.AI API and mmid-based routing#56
Draft
codegen-sh[bot] wants to merge 17 commits into main
- Add pyproject.toml with all dependencies from requirements.txt
- Add eversale_cli.py as pure Python CLI entry point (replaces Node.js wrapper)
- Add engine/__init__.py for proper package discovery
- Remove all /mnt/c/ WSL hardcoded paths from workspace_paths.py
- Update config.yaml: Z.AI endpoints (api/coding/paas/v4), glm-5/glm-4.7v models
- Fix gpu_llm_client.py: correct fallback URL and vision model (glm-4.7v)
- Fix llm_client.py: default models to glm-5/glm-4.7v, env var support
- Fix config_loader.py: correct default URL for CLI mode
- Fix run_simple.py: UnboundLocalError for steps/history, logging import, SyntaxWarning
- Fix fast_track_safety.py: invalid escape sequence in docstring
- Update README.md: remove /mnt/c/ path references
CLI usage: eversale \
Co-authored-by: Zeeeepa <zeeeepa@gmail.com>
…ion bugs
Changes:
- Remove all /mnt/c/ WSL path references from 10 files for native Windows support
- output_path.py: Remove WSL-specific Desktop/Downloads path detection, replace /mnt/c/ display conversion with cross-platform ~/home shortening
- action_templates.py: Fix regex patterns in google_search, search_youtube, search_github, search_twitter, search_linkedin, search_reddit templates to correctly skip platform names in variable extraction
- Update doc references in selector_fallbacks.py, self_healing_selectors.py, workflow_dsl.py, example_recovery_usage.py, verify_uitars_*.py, apply_incremental_*.py to use relative paths
Testing:
- 43/43 template matching tests pass (100%)
- 14/16 variable extraction tests pass (87.5%, up from 56%)
- Full E2E browser automation: SUCCESS (headless google.com navigation)
- Z.AI API connectivity verified with glm-5 model
- Zero /mnt/c/ references remaining in codebase (verified via grep)
…ased locators
- Replace deprecated page.accessibility.snapshot() with modern
page.locator('body').aria_snapshot() API (Playwright 1.49+)
- Add _ref_map to track ref_id -> (role, name) mapping for element resolution
- Add _resolve_element() with 8-level fallback chain:
1. Role + Name locator (get_by_role)
2. Text-based search (get_by_text)
3. Placeholder matching (get_by_placeholder)
4. Label matching (get_by_label)
5. Name-to-ref fuzzy search
6. CSS selector fallback
7. Text last resort
8. Placeholder last resort
- Optimize LLM prompt for efficiency and clarity
- Add new action types: press, scroll, screenshot
- Add consecutive passive action guard (prevents extract/wait loops)
- Use networkidle + SPA hydration wait for navigation
- Focus-before-fill for input reliability
Tested on chat.z.ai login form:
- Email textbox: get_by_role('textbox', name='Enter Your Email') -> 1 match ✅
- Password textbox: get_by_role('textbox', name='Enter Your Password') -> 1 match ✅
- Form fill: both fields filled successfully end-to-end ✅
Co-authored-by: Zeeeepa <zeeeepa@gmail.com>
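The first four levels of the fallback chain described in this commit could be sketched roughly as below. This is an illustration under assumptions, not the actual _resolve_element() from the PR: the function name resolve_element, its return shape, and the "exactly one match" acceptance rule are hypothetical. It only relies on the Playwright sync-API locator methods get_by_role, get_by_text, get_by_placeholder, get_by_label and Locator.count().

```python
from typing import Any, Callable, List, Optional, Tuple


def resolve_element(page: Any, role: str, name: str) -> Optional[Tuple[str, Any]]:
    """Return (strategy_label, locator) for the first strategy with exactly one match.

    `page` is expected to behave like a Playwright sync-API Page.
    The acceptance rule (count == 1) is an assumption for this sketch.
    """
    strategies: List[Tuple[str, Callable[[], Any]]] = [
        ("role+name", lambda: page.get_by_role(role, name=name)),
        ("text", lambda: page.get_by_text(name)),
        ("placeholder", lambda: page.get_by_placeholder(name)),
        ("label", lambda: page.get_by_label(name)),
    ]
    for label, make_locator in strategies:
        try:
            locator = make_locator()
            if locator.count() == 1:  # unambiguous match, stop here
                return label, locator
        except Exception:
            # A failing strategy should not abort the chain; try the next one.
            continue
    return None
```

Building each locator lazily inside a lambda keeps a failure in one strategy from preventing the later ones from running.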
- run_mcp.py: fix indentation (lines 220-242 were over-indented inside f-string scope)
- fast_extract.py: remove UTF-8 BOM character (EF BB BF)
- apply_incremental_changes.py: fix escaped quotes in line 156 (use double quotes instead of backslash-escaped singles)
All 451 Python files now pass ast.parse() validation.
Co-authored-by: Zeeeepa <zeeeepa@gmail.com>
Critical bug fixes:
- playwright_direct.py: Fix 5 variable reference errors (error->e, url->current_url)
- autonomous_web_worker.py: Fix parameter reference (brain->brain_instance)
- theory_of_mind.py: Fix attribute access (our_response->interaction.our_response)
- run_simple.py: Propagate navigation errors instead of silently swallowing them
Missing import fixes:
- agentic_browser.py: Add missing 'import random'
- smart_selector.py: Add missing 'import asyncio'
- strategic_planner.py: Add missing 'import hashlib'
- agent_agentic_browser.py: Add missing 'from loguru import logger'
- redis_memory_adapter.py: Add MemoryArchitecture to imports
- structured_logger.py: Add Set to typing imports
Cross-platform path fixes:
- apply_incremental_changes.py: Resolve paths relative to __file__
- apply_incremental_snapshot_fix.py: Resolve paths relative to __file__
- run_simple.py: Replace hardcoded /tmp/ with tempfile.gettempdir()
Code cleanup:
- a11y_browser.py: Remove unreachable dead code after return
- token_optimizer_integration_example.py: Add stub functions for example code
Co-authored-by: Zeeeepa <zeeeepa@gmail.com>
… fix
Major changes:
- Fix Playwright 1.58+ compatibility: page.accessibility.snapshot() removed
  - New a11y_compat.py: 3-tier fallback (legacy → aria_snapshot → CDP)
  - All 6 affected files patched to use get_accessibility_snapshot()
- Fix non-blocking first-run setup (bootstrap.py) for pip-installed CLI
- Fix CWD resolution using Path(__file__).resolve() in run_ultimate.py
- Fix alternating loop detection (snapshot↔navigate pattern)
  - Tracks tool history, detects oscillation after 6 steps
  - Auto-completes with collected data instead of looping forever
- Remove /mnt/c/ WSL paths (none found, already clean)
Tested with: OPENAI_API_KEY/BASE_URL/MODEL via Z.AI (glm-5)
eversale --ultimate --headless 'Navigate to example.com and tell me the page title'
→ Returns 'Example Domain' correctly in 32s
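One way the alternating loop detection mentioned above might work is sketched here. It is a guess at the idea, not the code from the PR: the function name is_oscillating and the exact rule (the last six tool calls strictly alternate between two different tools) are assumptions made for illustration.

```python
from typing import Sequence


def is_oscillating(tool_history: Sequence[str], window: int = 6) -> bool:
    """Return True if the last `window` tool calls strictly alternate A, B, A, B, ...

    The window of 6 mirrors the "after 6 steps" note in the commit message;
    the strict-alternation rule itself is an assumption of this sketch.
    """
    if window < 4 or len(tool_history) < window:
        return False
    recent = list(tool_history[-window:])
    first, second = recent[0], recent[1]
    if first == second:
        return False  # repeating one tool is a different kind of loop
    return all(
        tool == (first if index % 2 == 0 else second)
        for index, tool in enumerate(recent)
    )
```

A caller could check this after every step and, when it returns True, finish the task with whatever data has been collected so far instead of issuing another snapshot or navigate call.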
…uto-detect
- Navigate action: resolve URL from 'value' field when 'target' is a non-URL literal (e.g. 'URL'). Handles bare domains like 'chat.z.ai'.
- CLI: enable stdout line-buffering so progress appears in real-time.
- Agent loop: print step-by-step progress (e.g. [1/20] navigate: ...).
- Browser: auto-detect headless environments (no DISPLAY on Linux) to prevent silent hangs in WSL/SSH/Docker/CI.
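The navigate-action URL resolution described in the first bullet could look something like the following. This is a minimal sketch under assumptions: the helper name resolve_navigate_url, the bare-domain regex, and the choice to default to https:// are all hypothetical and not taken from the PR.

```python
import re
from typing import Optional

# Hypothetical pattern: dot-separated labels, optionally followed by a path.
_BARE_DOMAIN = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/\S*)?$")


def resolve_navigate_url(target: str, value: str = "") -> Optional[str]:
    """Pick a usable URL from the action's 'target', falling back to 'value'."""
    for candidate in (target, value):
        candidate = (candidate or "").strip()
        if candidate.startswith(("http://", "https://")):
            return candidate
        if _BARE_DOMAIN.match(candidate):
            # Assumption: bare domains are upgraded to https.
            return "https://" + candidate
    return None  # neither field looks like a URL


print(resolve_navigate_url("URL", "chat.z.ai"))
```

A placeholder such as 'URL' contains no dot, so it fails both checks and the function moves on to the 'value' field.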
Replace /mnt/c/ev29/cli/engine/ with engine/, /mnt/c/ev29/cli/ with ./, /mnt/c/ev29/agent/ with engine/agent/, etc. across 143 .md/.txt/.patch files. Documentation is now platform-agnostic.
…error handling
- Add retry loop (2 attempts) for LLM response parsing failures
- Strip markdown code fences (```json...```) before regex matching
- Check for LLM errors and empty responses before parsing
- Dual regex strategy: simple JSON first, then aggressive match
- Validate parsed JSON contains 'action' field
- On retry, append stricter formatting instruction to prompt
- Natural language fallback for unparseable responses
- Log raw LLM responses for debugging
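The parsing steps listed above (fence stripping, two regex passes, 'action' validation) might be combined along these lines. This is an illustrative sketch, not the PR's implementation: the function name parse_action and both regex patterns are assumptions, and the retry loop and natural-language fallback are left out.

```python
import json
import re
from typing import Any, Dict, Optional

_TICKS = "`" * 3  # markdown fence marker, built at runtime to keep this listing fence-safe
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL | re.IGNORECASE)
_SIMPLE = re.compile(r"\{[^{}]*\}")      # flat object, no nested braces
_GREEDY = re.compile(r"\{.*\}", re.DOTALL)  # first '{' through last '}'


def parse_action(raw: str) -> Optional[Dict[str, Any]]:
    """Extract a JSON object containing an 'action' key from an LLM reply."""
    if not raw or not raw.strip():
        return None  # empty response: the caller may retry
    fenced = _FENCE.search(raw)
    text = fenced.group(1) if fenced else raw
    for pattern in (_SIMPLE, _GREEDY):
        match = pattern.search(text)
        if not match:
            continue
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict) and "action" in parsed:
            return parsed
    return None
```

When this returns None, a caller could re-prompt once with a stricter formatting instruction, which corresponds to the two-attempt retry loop in the commit message.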
- Add 3-tier captcha detection: DOM-based, slider-specific, and vision-based
- Integrate PageCaptchaHandler, ScrappyCaptchaBypasser from captcha_solver.py
- Add slider captcha solving with vision model (glm-4.7v) for drag distance analysis
- Implement human-like drag kinematics (cubic bezier, wobble, overshoot)
- Auto-detect captchas after click/type/press/navigate actions
- Graceful degradation if captcha_solver not available
- All syntax validated, pip install -e . works, CLI tested end-to-end
…gration-a7f9c2
…id-based routing
9 critical fixes applied:
1. ToolResult unwrapping utility for MCP results
2. Snapshot handling with proper ToolResult unwrapping
3. Debug prints replaced with proper logging
4. Action result unwrapping in execution loop
5. LLMClient integration with Z.AI API (4-tier fallback)
6. playwright_click routes mmid refs to click_by_mmid (a11y)
7. playwright_fill routes mmid refs to type_by_mmid (a11y)
8. _execute_mcp_call preserves browser_* tools (no re-mapping)
9. Tool mapping: click->browser_click, type->browser_type
Both modes tested against https://chat.z.ai:
- Navigate, click Sign in, fill email/password, attempt captcha
- All using accessibility-based (mmid) element interaction
- No more deprecated SelfHealingSelector errors
Co-authored-by: Zeeeepa <zeeeepa@gmail.com>
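Fixes 8 and 9 together describe a small name-mapping rule, which could be sketched as follows. The helper name map_tool_name and the TOOL_ALIASES table are hypothetical; only the two mappings and the "preserve browser_*" rule come from the commit message, and the real _execute_mcp_call may handle more cases.

```python
# Short tool names the agent may emit, mapped to their browser_* equivalents.
TOOL_ALIASES = {
    "click": "browser_click",
    "type": "browser_type",
}


def map_tool_name(name: str) -> str:
    """Translate a short tool name, leaving browser_* names untouched."""
    if name.startswith("browser_"):
        return name  # already a browser tool: no re-mapping
    return TOOL_ALIASES.get(name, name)  # unknown names pass through unchanged


for tool in ("click", "type", "browser_snapshot", "navigate"):
    print(tool, "->", map_tool_name(tool))
```

Checking the browser_ prefix first keeps an already-mapped name from being looked up again.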
🎯 Summary
Fixes 9 critical issues that prevented both Ultimate and Simple modes from functioning with the Z.AI API. Both modes now successfully navigate to https://chat.z.ai, click Sign in, fill credentials, and attempt captcha resolution using modern accessibility-based (mmid-powered) element interaction.
🔧 Changes
Core Fixes in orchestration.py (+203 lines)
- _unwrap_tool_result() extracts dict from MCP ToolResult objects
- LLM client fallback: llm_client → ollama_client → gpu_llm_client → ad-hoc from env vars
- _execute_mcp_call now preserves browser_* tools and maps click→browser_click, type→browser_type
Routing Fixes in playwright_direct.py (+31 lines)
- playwright_click: Detects mmid refs (e.g. "mm5") and routes to click_by_mmid() instead of deprecated SelfHealingSelector
- playwright_fill: Detects mmid refs and routes to type_by_mmid() for reliable type operations
Supporting Changes
- gpu_llm_client.py: Improved error handling
- prompt_processor.py: Cleaner prompt construction
- run_simple.py: Enhanced simple mode execution
✅ Test Results
Ultimate Mode
Simple Mode
Both modes reach the captcha challenge, which is the expected stopping point for slider captchas in headless automation.
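The mmid routing that lets both modes reach this point could be sketched as below. This is an assumption-laden illustration, not the code in playwright_direct.py: the ref pattern is inferred from the single example "mm5", the helper names is_mmid_ref and route_click are hypothetical, the signature of click_by_mmid() is assumed, and browser.click() stands in for whatever selector-based path the project actually uses.

```python
import re
from typing import Any

# Assumption: an mmid ref is "mm" followed by digits, as in the example "mm5".
_MMID_REF = re.compile(r"^mm\d+$")


def is_mmid_ref(selector: str) -> bool:
    """Return True if the selector looks like an accessibility (mmid) ref."""
    return bool(_MMID_REF.match(selector.strip()))


def route_click(browser: Any, selector: str) -> Any:
    """Send mmid refs to the a11y click path, everything else to the selector path."""
    if is_mmid_ref(selector):
        return browser.click_by_mmid(selector.strip())
    return browser.click(selector)  # hypothetical selector-based fallback
```

The same check would apply to fill operations, with type_by_mmid() taking the place of click_by_mmid().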
👤 Initiated by @Zeeeepa
Summary by cubic
Completes Ultimate and Simple mode integration with the Z.AI API and mmid-based routing. Both modes now navigate to https://chat.z.ai, sign in via mmid interactions, and reliably stop at the captcha.
Bug Fixes
- Unwrap ToolResult for snapshots and action results to return dicts.
- Preserve browser_* tools; map click→browser_click, type→browser_type; use browser_snapshot.
- Detect mmid refs in playwright_click/playwright_fill and route to click_by_mmid/type_by_mmid.
- 4-tier LLM client fallback (llm_client→ollama_client→gpu_llm_client→ env-based LLMClient); support reasoning_content; raise max_tokens to 4096.
New Features
- Improvements to gpu_llm_client and prompt construction.
Written for commit d360e89. Summary will update on new commits.