AgenticX-GUIAgent is a multi-agent system built on agenticx (v0.2.1). It automates complex Android GUI operations from natural language instructions, integrating multimodal reasoning, knowledge management, and learning for continuous improvement.
- Multi-agent collaboration: Manager, Executor, Reflector, Notetaker work together.
- Knowledge-driven: A shared knowledge pool improves success rates over time.
- Learning loop: A data flywheel enables continuous optimization.
- Multimodal understanding: Reasoning over screenshots and UI context.
- Extensible: Easy to add agents, tools, and capabilities.
The system is layered and modular. Core components are:
graph TD
subgraph "User/Developer"
A[User/Developer]
end
subgraph "AgenticX-GUIAgent System (core.system.AgenticXGUIAgentSystem)"
B(AgenticXGUIAgentSystem)
C(AgenticX Agent)
D(Workflow Engine)
E(Event Bus)
F(InfoPool)
B -- "manage" --> C
B -- "orchestrate" --> D
B -- "use" --> E
B -- "use" --> F
end
subgraph "Agents (agents)"
G(ManagerAgent)
H(ExecutorAgent)
I(ActionReflectorAgent)
J(NotetakerAgent)
C -- "contains" --> G
C -- "contains" --> H
C -- "contains" --> I
C -- "contains" --> J
end
subgraph "Core Coordination (core)"
K(AgentCoordinator)
L(BaseAgenticXGUIAgentAgent)
D -- "execute" --> K
K -- "coordinate" --> G
K -- "coordinate" --> H
K -- "coordinate" --> I
K -- "coordinate" --> J
C -- "inherit" --> L
end
subgraph "Knowledge (knowledge)"
M(KnowledgeManager)
N(KnowledgePool)
O(HybridEmbeddingManager)
F -- "contain" --> M
F -- "contain" --> N
M -- "use" --> O
J -- "write" --> N
G -- "read" --> N
H -- "read" --> N
I -- "read" --> N
end
subgraph "Learning (learning)"
P(RLEnhancedLearningEngine)
Q(LearningCoordinator)
R(RL Core)
S(Data Flywheel)
P -- "include" --> Q
P -- "include" --> R
R -- "drive" --> S
subgraph "RL Core Components"
R1(MobileGUIEnvironment)
R2(MultimodalStateEncoder)
R3(Policy Networks)
R4(ExperienceReplayBuffer)
R5(RewardCalculator)
R6(PolicyUpdater)
end
R -- "contain" --> R1
R -- "contain" --> R2
R -- "contain" --> R3
R -- "contain" --> R4
R -- "contain" --> R5
R -- "contain" --> R6
H -- "interact" --> R1
I -- "feedback" --> R5
J -- "knowledge" --> R2
S -- "optimize" --> R3
end
A -- "request" --> B
G -- "decompose" --> H
H -- "execute" --> I
I -- "reflect" --> G
I -- "generate experience" --> R4
J -- "record" --> M
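The collaboration edges at the end of the diagram (decompose, execute, reflect, record) can be read as a simple control loop. The following is a minimal, self-contained sketch of that loop, not the project's actual implementation: `KnowledgePool` here is a toy stand-in, and `decompose`, `execute`, `reflect`, and `run_task` are hypothetical names that only mirror the roles shown in the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePool:
    """Toy stand-in for the shared knowledge pool (hypothetical, for illustration)."""
    notes: list = field(default_factory=list)

    def write(self, note: str) -> None:
        self.notes.append(note)


def decompose(task: str) -> list:
    """Manager role: split a task into ordered subtasks (naive split on ' and ')."""
    return [part.strip() for part in task.split(" and ") if part.strip()]


def execute(subtask: str) -> dict:
    """Executor role: pretend to perform a GUI action and report the outcome."""
    return {"subtask": subtask, "ok": True}


def reflect(result: dict) -> bool:
    """Reflector role: decide whether the action achieved its goal."""
    return bool(result["ok"])


def run_task(task: str, pool: KnowledgePool, max_retries: int = 1) -> bool:
    """Run the Manager -> Executor -> Reflector loop; outcomes are recorded as notes."""
    for subtask in decompose(task):
        for attempt in range(max_retries + 1):
            result = execute(subtask)
            if reflect(result):
                pool.write(f"succeeded: {subtask}")  # Notetaker role
                break
            pool.write(f"failed (attempt {attempt + 1}): {subtask}")
        else:
            return False  # every attempt at this subtask failed
    return True
```

In the real system the Manager and Reflector would rely on multimodal model calls and the Executor on ADB-backed tools; the sketch only shows how the four roles hand work to each other.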
This section covers the root files and key subdirectories under AgenticX-GUIAgent/.
Root files:
- .gitignore: Git ignore rules.
- LICENSE: License.
- README.md: Project documentation (this file).
- README_zn.md: Chinese documentation.
- requirements.txt: Python dependencies.
- setup.sh: Automated environment setup script.
- config.yaml: Default runtime configuration (LLM, knowledge, learning, evaluation).
- config.py: Configuration data models and validation.
- main.py: System entry point (initialization, execution, interactive mode).
- utils.py: Common utilities (logging, config loading, retry, JSON).
- check_adb.py: ADB diagnostics for device connectivity.
- cli_knowledge_manager.py: Knowledge base CLI (status/query/export).
Key subdirectories:
- agents/: Core agent implementations (Manager/Executor/Reflector/Notetaker).
- core/: Core components (base agent, InfoPool, context, coordinator).
- tools/: GUI tools and executor (ADB/basic/smart tools).
- knowledge/: Knowledge management (storage, retrieval, embeddings).
- learning/: Learning engine (five-stage learning + RL core).
- evaluation/: Evaluation framework (metrics, benchmarks, reports).
- workflows/: Multi-agent collaboration workflow orchestration.
- docker/: Docker/Compose configs and optional services.
- tests/: Test cases and test resources.
- CPU: 4+ cores
- Memory: 8GB+ (16GB recommended)
- Storage: 10GB free
- Android device: Android 8.0+ with ADB debugging enabled
- Python 3.9+
- Conda (Anaconda/Miniconda)
- ADB (Android Debug Bridge)
- Git
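The software requirements above can be checked before running any setup. The snippet below is a small sketch of such a check, assuming the tools are exposed on PATH under their usual executable names (`conda`, `adb`, `git`); the helper names `missing_tools` and `python_ok` are made up for this example and are not part of the project.

```python
import shutil
import sys

REQUIRED_TOOLS = ("conda", "adb", "git")  # executables expected on PATH
MIN_PYTHON = (3, 9)                       # from the requirement "Python 3.9+"


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the given (or current) interpreter version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    absent = missing_tools()
    print("Missing tools:", ", ".join(absent) if absent else "none")
    print("Python version OK:", python_ok())
```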
We provide both automated and manual setup.
Run setup.sh to prepare the environment and dependencies:
bash setup.sh
This script will:
- Check Conda, ADB, Python.
- Create the agenticx-guiagent conda env.
- Install dependencies and AgenticX (editable).
- Create a run.sh launcher.
- Attempt to create .env from .env.example if it exists (otherwise create .env manually).
If you prefer manual setup, follow the steps below.
conda create -n agenticx-guiagent python=3.9 -y
conda activate agenticx-guiagent
pip install --upgrade pip
pip install -r requirements.txt
# Install AgenticX (editable)
# cd /path/to/AgenticX
# pip install -e .
# Optional mobile control tools
pip install adbutils pure-python-adb
Create a .env file (or use your own environment variable manager). The default config.yaml uses Bailian; adjust as needed.
nano .env
Example .env:
# Bailian (default in config.yaml)
BAILIAN_API_KEY=your_bailian_api_key
BAILIAN_CHAT_MODEL=qwen-vl-max
BAILIAN_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
BAILIAN_EMBEDDING_MODEL=text-embedding-v4
# Optional app settings
DEBUG=true
LOG_LEVEL=INFO
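Before starting the system it can help to confirm that the .env file actually defines the values the configuration expects. The following is a rough sketch of a loader and validator for the KEY=VALUE format shown above. It is not the project's own loading code, and the choice of which keys count as required (`REQUIRED_KEYS`) is an assumption made for illustration.

```python
import os

# Assumption: these three keys are treated as mandatory for the default Bailian setup.
REQUIRED_KEYS = ("BAILIAN_API_KEY", "BAILIAN_CHAT_MODEL", "BAILIAN_API_BASE")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_env(path: str = ".env") -> dict:
    """Read a .env file and export its values without overriding existing variables."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `missing_keys(load_env())` at startup would surface a forgotten API key before any LLM request is attempted.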
- Enable Developer Options:
  - Settings → About phone → tap "Build number" 7 times
- Enable USB debugging:
  - Settings → Developer options → USB debugging
  - Settings → Developer options → USB installation
- Connect device:
  - Connect via USB
  - Authorize USB debugging on device
- Verify ADB:
  adb version
  adb start-server
  adb devices
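The output of `adb devices` lists one device per line as a serial number followed by a state such as `device`, `unauthorized`, or `offline`. As an illustration, that output could be checked from Python roughly as follows. The function names are hypothetical and this is not the project's check_adb.py.

```python
import subprocess


def parse_adb_devices(output: str) -> dict:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and daemon status lines that start with '*'.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices() -> dict:
    """Run `adb devices` and return the parsed result (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

A device that shows up as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.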
Make sure your Android device is connected.
- Interactive mode:
  ./run.sh --interactive
  # or
  python main.py --interactive
- Single-task mode:
  ./run.sh --task "Open WeChat and send a message to Alice"
  # or
  python main.py --task "Open WeChat and send a message to Alice"
# Enable evaluation
python main.py --task "Open Settings" --evaluate
# Use custom config
python main.py --config custom_config.yaml
# Set log level
python main.py --log-level DEBUG
"Send a WeChat message to Jennifer: I will be home for dinner tonight."
Typical flow:
- Manager decomposes the task
- Executor performs GUI actions
- ActionReflector validates results
- Notetaker records knowledge
"Set an alarm for 8:00 AM tomorrow with note: meeting"
Typical flow: Open Clock → Create alarm → Set time → Add note → Save
"Open TikTok, search for food videos, like the top 3."
Typical flow: Launch app → Search → Browse results → Like top 3
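For readers extending the entry point, the command-line flags used in this section (--interactive, --task, --evaluate, --config, --log-level) could be declared with argparse along the following lines. This is only a sketch: the real main.py may define them differently, and the defaults as well as the mutual exclusion of --interactive and --task are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags shown in the usage examples (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    mode = parser.add_mutually_exclusive_group()  # assumption: one mode per run
    mode.add_argument("--interactive", action="store_true",
                      help="start an interactive session")
    mode.add_argument("--task", help="natural language task to execute once")
    parser.add_argument("--evaluate", action="store_true",
                        help="enable evaluation for the run")
    parser.add_argument("--config", default="config.yaml",
                        help="path to a YAML configuration file")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="logging verbosity")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```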
We provide Docker and Docker Compose configs for containerized environments.
- Enter the docker directory:
  cd docker
- Configure environment variables:
  cp env.example .env
  nano .env
- Start services:
  docker-compose up --build
Note: For USB access, you may need --privileged -v /dev/bus/usb:/dev/bus/usb.
See docker/README.md for details.
- ADB connection fails:
  adb kill-server
  adb start-server
  adb devices
- Dependency install fails:
  conda activate agenticx-guiagent
  pip install --upgrade pip
  pip cache purge
  pip install -r requirements.txt --force-reinstall
- LLM API failures:
  - Check API keys in .env.
  - Ensure network access.
  - Verify account quota.
- Device actions fail:
  - Ensure device is unlocked.
  - Ensure target app is installed.
  - Ensure USB debugging is enabled.
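Transient failures, such as a dropped ADB connection or a rate-limited LLM call, are often handled by retrying with exponential backoff. The helper below is a generic sketch of that technique. It is not taken from the project's utils.py, and the name `retry` and its parameters are illustrative.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds, then try again.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing a narrow `exceptions` tuple (for example only connection errors) keeps genuine bugs from being silently retried.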
python main.py --log-level DEBUG
# Tail file logs if configured:
# tail -f logs/agenticx-guiagent.log
- Run tests:
  pytest
- Code style:
  pre-commit install
  pre-commit run --all-files
- Repo: https://github.com/DemonDamon/AgenticX-GUIAgent (replace with actual)
- Issues: open a GitHub issue
Note: Make sure environment variables and device connections are configured before running. Start with a simple task first.