GeneFlow is a bioinformatics analysis platform built on Google ADK (Agent Development Kit) that combines a multi-agent architecture with generative AI. It gives researchers and bioinformaticians conversational tools for DNA sequence analysis, protein property prediction, literature search, and hypothesis generation.
- Intelligent Sequence Analysis: GC content, ORF detection, motif scanning
- Protein Prediction: Physicochemical properties from DNA sequences
- Literature Integration: AI-powered research paper discovery and synthesis
- Hypothesis Generation: AI-driven research direction suggestions
- Advanced Visualizations: Interactive plots and 3D structure modeling
- Multi-Agent Architecture: Specialized agents for different bioinformatics tasks
- Session Management: Persistent conversation history and context
- Performance Monitoring: Real-time metrics and cost tracking
- Python 3.10 or higher
- Google API Key (for generative AI capabilities)
- 4GB RAM minimum
- Clone the repository (git names the checkout directory after the URL, so it is `geneflow`)

      git clone https://github.com/suriyasureshok/geneflow.git
      cd geneflow

- Create and activate a virtual environment

      python -m venv gene
      # On Windows
      gene\Scripts\activate
      # On macOS/Linux
      source gene/bin/activate

- Install dependencies

      pip install -r requirements.txt

- Set up environment variables

      # Create .env file in root directory
      echo GOOGLE_API_KEY=your_api_key_here > .env

- Launch the application

      python main.py

The application will automatically:
- Check all dependencies
- Create necessary directories (sessions/, metrics/, geneflow_plots/)
- Launch the Streamlit UI at http://localhost:8501
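
The startup steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `main.py`: the helper names `missing_modules`, `ensure_directories`, and `streamlit_command` are hypothetical, and the module list is a placeholder for whatever `requirements.txt` actually contains. The directory names, the entry page `src/ui/Home.py`, and port 8501 come from this README.

```python
import importlib.util
import sys
from pathlib import Path

# Directory names are taken from the README; the module list is an assumption.
RUNTIME_DIRS = ("sessions", "metrics", "geneflow_plots")
REQUIRED_MODULES = ("streamlit",)  # hypothetical subset of requirements.txt


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_directories(root, names=RUNTIME_DIRS):
    """Create each runtime directory under root if it does not exist yet."""
    created = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def streamlit_command(entry="src/ui/Home.py", port=8501):
    """Build the command line that would start the Streamlit UI."""
    return [sys.executable, "-m", "streamlit", "run", entry,
            "--server.port", str(port)]


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        sys.exit(f"Missing dependencies: {', '.join(absent)}")
    ensure_directories(".")
    print("Would run:", " ".join(streamlit_command()))
```

Building the command as a list keeps it ready to hand to `subprocess.run` without shell quoting issues.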

    ┌─────────────────────────────────────────────┐
    │              Streamlit Web UI               │
    │   (Home, Dashboard, Chat, Analysis Pages)   │
    └──────────────────────┬──────────────────────┘
                           │
    ┌──────────────────────▼──────────────────────┐
    │         UnifiedCoordinator (Router)         │
    │ - Routes to Chat or Analysis based on input │
    └──────────────────────┬──────────────────────┘
                ┌──────────┴──────────┐
                │                     │
          ┌─────▼─────┐       ┌───────▼────────┐
          │ ChatAgent │       │ ADKCoordinator │
          │  (Fast)   │       │ (Comprehensive)│
          └───────────┘       └───────┬────────┘
                                      │
                           ┌──────────┴──────────┐
                           │                     │
                   ┌───────▼──────┐      ┌───────▼───────┐
                   │   Sequence   │      │    Protein    │
                   │   Analyzer   │      │  Prediction   │
                   └──────────────┘      └───────────────┘

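
The diagram says the router picks the chat path or the analysis path based on the input. One plausible way to make that decision is to check whether the message looks like a raw nucleotide sequence. The sketch below is an assumption for illustration only: `looks_like_dna`, `route`, and the 20-base threshold are hypothetical and are not taken from `UnifiedCoordinator`.

```python
import re

# Characters accepted as nucleotides; N stands for an unknown base.
DNA_PATTERN = re.compile(r"^[ACGTN]+$", re.IGNORECASE)
MIN_SEQUENCE_LENGTH = 20  # hypothetical threshold, not taken from GeneFlow


def looks_like_dna(text):
    """Return True when the text, ignoring whitespace, is a plausible DNA sequence."""
    compact = "".join(text.split())
    return len(compact) >= MIN_SEQUENCE_LENGTH and bool(DNA_PATTERN.match(compact))


def route(message):
    """Pick a handler name: 'analysis' for raw sequences, 'chat' otherwise."""
    return "analysis" if looks_like_dna(message) else "chat"


if __name__ == "__main__":
    print(route("What is GC content and why is it important?"))
    print(route("ATGAAATATAAAGCGTACGTGCTTGAATGCCTTATAAACGTAGCTAG"))
```

A length threshold keeps short words such as "tag" or "cat" from being mistaken for sequences.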

    GeneFlow/
    ├── main.py                         # Application entry point
    ├── requirements.txt                # Python dependencies
    ├── README.md                       # This file
    ├── Architecture.md                 # System design documentation
    ├── Modules.md                      # Module reference guide
    │
    ├── src/
    │   ├── agents/                     # AI agent implementations
    │   │   ├── adk_coordinator.py      # Main ADK-based orchestrator
    │   │   ├── unified_coordinator.py  # Request router
    │   │   ├── chat_agent.py           # Lightweight chat
    │   │   ├── sequence_analyzer.py    # Sequence analysis agent
    │   │   ├── protein_prediction.py   # Protein analysis
    │   │   ├── comparison.py           # Sequence comparison
    │   │   ├── hypothesis.py           # Hypothesis generation
    │   │   ├── literature.py           # Literature search
    │   │   └── coordinator.py          # Legacy coordinator
    │   │
    │   ├── core/                       # Core infrastructure
    │   │   ├── session_manager.py      # User session management
    │   │   ├── monitoring.py           # Performance metrics
    │   │   ├── adk_tools.py            # ADK tool definitions
    │   │   ├── agent_factory.py        # Agent creation
    │   │   ├── context_manager.py      # Execution context
    │   │   └── memory.py               # Memory management
    │   │
    │   ├── utils/                      # Utility modules
    │   │   ├── visualizer.py           # Plot generation
    │   │   ├── reporter.py             # PDF report creation
    │   │   └── structure_generator.py  # 3D structure modeling
    │   │
    │   ├── ui/                         # Streamlit user interface
    │   │   ├── Home.py                 # Landing page
    │   │   └── pages/
    │   │       ├── 1_Dashboard.py      # Analytics dashboard
    │   │       ├── 2_Chat.py           # Chat interface
    │   │       └── 3_Analysis.py       # Full analysis
    │   │
    │   ├── data/                       # Reference data
    │   │   └── known_sequences.fasta   # Sequence database
    │   │
    │   └── tests/                      # Unit tests
    │       ├── test_adk_pipeline.py
    │       ├── test_*.py               # Component tests
    │       └── ...
    │
    ├── sessions/                       # User session storage
    ├── metrics/                        # Performance metrics
    └── geneflow_plots/                 # Generated visualizations

    from src.agents.unified_coordinator import UnifiedCoordinator

    coordinator = UnifiedCoordinator()

    # Simple question - routes to ChatAgent
    result = coordinator.process_message(
        "What is GC content and why is it important?",
        session_id="user_123"
    )
    print(result['response'])

    coordinator = UnifiedCoordinator()

    # DNA sequence - routes to ADKCoordinator with full tools
    sequence = "ATGAAATATAAAGCGTACGTGCTTGAATGCCTTATAAACGTAGCTAG"
    result = coordinator.run_pipeline(
        sequence=sequence,
        session_id="user_123"
    )
    print("Analysis complete!")
    print(f"GC Content: {result['results']['analysis']['gc_percent']}%")
    print(f"ORFs Found: {len(result['results']['analysis']['orfs'])}")
    print(f"Report saved to: {result['results']['report']['report_path']}")

    coordinator = UnifiedCoordinator()
    session_id = "researcher_001"

    # First message
    result1 = coordinator.process_message(
        "I'm studying bacterial resistance genes",
        session_id=session_id
    )

    # Follow-up with context
    result2 = coordinator.process_message(
        "Can you analyze this sequence for me?",
        session_id=session_id
    )

    # The agent remembers previous conversation context
    print(result2['response'])

| Operation | Time | Tokens |
|---|---|---|
| Chat Response | 1-3s | 200-500 |
| Sequence Analysis | 5-15s | 500-1000 |
| Full Pipeline | 60-120s | 2000-5000 |
| PDF Report Gen | 5-15s | - |
| 3D Structure Gen | 10-20s | - |
    # Required
    GOOGLE_API_KEY=your_api_key_here

    # Optional
    LOG_LEVEL=INFO                    # Logging level
    SESSION_MAX_AGE_HOURS=24          # Session expiration
    MAX_SEQUENCE_LENGTH=100000        # Max sequence size
    CACHE_ENABLED=true                # Enable caching
    REDIS_URL=redis://localhost:6379  # Redis cache (optional)

    # In your initialization code
    from src.agents.unified_coordinator import UnifiedCoordinator
    from src.core.session_manager import SessionManager
    from src.core.monitoring import PerformanceMonitor

    # Customize session storage
    session_manager = SessionManager(
        storage_path="custom_sessions",
        max_session_age_hours=48  # Longer session lifetime
    )

    # Customize performance monitoring
    monitor = PerformanceMonitor(
        storage_path="custom_metrics",
        enabled=True  # Set to False to disable, e.g. in production
    )

    # Pass to coordinator
    coordinator = UnifiedCoordinator(
        session_manager=session_manager,
        performance_monitor=monitor
    )

- GC Content: Percentage of guanine and cytosine bases
- ORF Detection: Open Reading Frame identification (ATG to stop codon)
- Motif Scanning: Regulatory element detection (TATA box, Kozak sequence, etc.)
- Translation: DNA to amino acid conversion
- Molecular Weight: Protein mass calculation
- Hydrophobicity: Protein property analysis
- Signal Peptide: N-terminal signal detection
- Homology Search: Find similar sequences
- Alignment: Compare multiple sequences
- Similarity Scoring: Quantify sequence relationships
- PubMed Search: Scientific paper discovery
- Citation Analysis: Find related research
- Trend Analysis: Identify research directions
- Pattern-based: From sequence analysis results
- Literature-informed: Based on research context
- Confidence Scoring: Probability estimation
- GC Content Plots: Sliding window analysis
- ORF Maps: Linear genome representation
- 3D Structure: DNA/Protein visualization
- Property Charts: Physicochemical analysis
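
Several of the sequence-level capabilities listed above (GC content, sliding-window GC, ORF detection from ATG to a stop codon, and DNA-to-protein translation) can be illustrated in plain Python. This is a minimal sketch, not the code in `sequence_analyzer.py`: the function names and the `min_length` default are assumptions, only the forward strand is scanned, and translation uses the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def sliding_gc(seq, window=10, step=1):
    """GC percentage for each full window along the sequence."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def translate(seq):
    """Translate codons from position 0 until the first stop codon."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks unknown codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def find_orfs(seq, min_length=9):
    """Forward-strand ORFs in all three frames as (start, end), end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    sequence = "ATGAAATATAAAGCGTACGTGCTTGAATGCCTTATAAACGTAGCTAG"
    print(f"GC content: {gc_content(sequence):.2f}%")
    for start, end in find_orfs(sequence):
        print(start, end, translate(sequence[start:end]))
```

An ORF that starts with ATG but never reaches a stop codon is not reported here; a production analyzer might also scan the reverse complement.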
    # Run all tests
    pytest src/tests/

    # Run specific test
    pytest src/tests/test_sequence_analyzer.py -v

    # With coverage
    pytest src/tests/ --cov=src --cov-report=html

Solution: Set the GOOGLE_API_KEY environment variable:

    # Windows
    set GOOGLE_API_KEY=your_key
    # Mac/Linux
    export GOOGLE_API_KEY=your_key

Solutions:
- Check network connectivity
- Verify API quota limits
- Reduce sequence length for initial analysis
- Enable local caching
Solution: Sessions expire after 24 hours by default. Create a new session or adjust SESSION_MAX_AGE_HOURS.
Solution: Reduce sequence length or enable Redis caching for session storage.
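
To make the session-expiry behaviour concrete, the following sketch shows one way an age check driven by `SESSION_MAX_AGE_HOURS` could work. It is an assumption for illustration and not the logic in `session_manager.py`: the function names `max_session_age` and `is_expired` are hypothetical, and only the variable name and the 24-hour default come from this README.

```python
import os
from datetime import datetime, timedelta, timezone


def max_session_age(default_hours=24.0):
    """Read SESSION_MAX_AGE_HOURS from the environment, else use the default."""
    raw = os.environ.get("SESSION_MAX_AGE_HOURS", "")
    try:
        hours = float(raw)
    except ValueError:
        hours = default_hours  # unset or not a number
    return timedelta(hours=hours)


def is_expired(last_active, now=None, max_age=None):
    """Return True when a session last used at last_active is older than max_age."""
    now = now or datetime.now(timezone.utc)
    max_age = max_age if max_age is not None else max_session_age()
    return now - last_active > max_age


if __name__ == "__main__":
    last_seen = datetime.now(timezone.utc) - timedelta(hours=30)
    print("expired:", is_expired(last_seen))
```

Raising `SESSION_MAX_AGE_HOURS` in `.env` lengthens the window in this sketch without any code change.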
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License; see the LICENSE file for details.
If you use GeneFlow in your research, please cite:
    @software{geneflow2024,
      author = {Suriya Sureshkumar},
      title = {GeneFlow: ADK-Powered Bioinformatics Copilot},
      year = {2024},
      url = {https://github.com/suriyasureshok/geneflow}
    }

- Email: suriyasureshkumarkannian@gmail.com
- Phone: +91 8072816532
- LinkedIn: Suriya Sureshkumar
- Issues: GitHub Issues
- Documentation: Full Docs
Last Updated: November 2024
Version: 1.0.0