Leveraging Data Engineering + Finance + Biotech expertise for systematic alpha generation
A complete, production-ready platform that detects investment opportunities through automated signal detection across clinical trials, patent expirations, and insider activity.
# 1. Clone or navigate to the project
cd /Users/waiyang/Desktop/repo/dreamers-v2
# 2. Set up databases
createdb clinical_trials patent_intelligence insider_signals
# 3. Load schemas
psql clinical_trials < clinical_trial_signals/database/schema.sql
psql patent_intelligence < patent_intelligence/sql/schema.sql
psql insider_signals < insider_hiring_signals/schema.sql
# 4. Run the dashboard (demo mode works without databases)
cd dashboard
pip install -r requirements.txt
streamlit run app.pyAccess dashboard at: http://localhost:8501
Goal: Detect trial outcomes 2-4 weeks before announcements
Edge: Protocol amendments, enrollment patterns, patent filings signal trial success/failure
ROI Target: 5-10x on options plays
Goal: Identify patent cliff events 6-18 months in advance
Edge: Predict generic competition timing with 90%+ accuracy
ROI Target: 20-40% on pair trades (long generic, short branded)
Goal: Detect company trajectory through behavioral signals
Edge: Combine Form 4, 13F, and hiring data for directional conviction
ROI Target: 15-25% on 3-6 month positions
Streamlit-based dashboard combining all three systems:
- Combined scoring (weighted across all signals)
- Real-time data from all pipelines
- Interactive charts and filterable tables
- Watchlist management
- Configurable alerts
📁 Code | [🚀 Demo Mode Available]
Automated scheduling using Apache Airflow (or Prefect):
| Pipeline | Frequency | Time (ET) |
|---|---|---|
| Clinical Trial Signals | Daily | 6:00 PM |
| Patent/IP Intelligence | Weekly | Monday 8:00 AM |
| Form 4 Insider Trades | Every 30 min | Market hours |
| 13F Holdings | Quarterly | 45 days post-quarter |
Multi-channel alerts: Email, Slack, SMS
📁 Code | [🐳 Docker Deployment Available]
- Total Budget: $5,000
- Infrastructure Setup: ~$1,000
- Remaining for Trading: ~$4,000
- Local Deployment: $670/month
- Cloud Deployment (AWS): $600-800/month
Includes: Cloud infrastructure, AI APIs, LinkedIn access, PACER account, email/SMS
Based on signal accuracy and market volatility:
| Strategy | Allocation | Expected Return | Risk |
|---|---|---|---|
| Clinical trial options | 30% | 50-100% | High |
| Patent cliff pairs | 40% | 20-40% | Medium |
| Insider-driven longs | 30% | 15-25% | Low-Med |
Target Blended Return: 30-50% annually
dreamers-v2/
├── README.md # You are here
├── PROJECT_SUMMARY.md # Complete documentation
│
├── clinical_trial_signals/ # System 1: Trial signals
│ ├── main.py
│ ├── demo.py
│ └── database/, scrapers/, scoring/, alerts/
│
├── patent_intelligence/ # System 2: Patent cliffs
│ ├── src/pipeline.py
│ └── extractors/, transformers/, loaders/
│
├── insider_hiring_signals/ # System 3: Insider/hiring
│ ├── main.py
│ └── scrapers/, models/, reports/
│
├── dashboard/ # Integrated UI
│ ├── app.py
│ └── components/, pages/, utils/
│
└── orchestration/ # Automation
├── dags/ # Airflow DAGs
├── docker/ # Local deployment
└── cloud/ # AWS/GCP deployment
# Clinical Trial Signals
cd clinical_trial_signals
python main.py --full
# Patent Intelligence
cd patent_intelligence
python -m src.pipeline --top-drugs 50
# Insider/Hiring Signals
cd insider_hiring_signals
python main.py scrape --all
python main.py score --ticker MRNAcd orchestration/docker
docker-compose up -d
# Access services
# Airflow: http://localhost:8080 (admin/admin)
# Grafana: http://localhost:3000 (admin/admin)
# Health: http://localhost:9091/health- PROJECT_SUMMARY.md - Complete guide with architecture, strategy, and monetization
- SPECS_1_Clinical_Trial_Signals.md - Clinical trial system specs
- SPECS_2_Patent_IP_Intelligence.md - Patent/IP system specs
- SPECS_3_Insider_Hiring_Signals.md - Insider/hiring system specs
- orchestration/COST_ESTIMATES.md - Detailed cost breakdown
Each system folder also has its own README with specific setup instructions.
- Run all scrapers and verify data quality
- Review signal reports for accuracy
- Backtest scoring models on historical data
- Track 5-10 hypothetical trades
- Monitor signal accuracy
- Refine weights and thresholds
- Start with small positions ($500 each)
- Focus on highest-confidence signals (score >= 7)
- Implement strict stop losses
- Increase position sizes gradually
- Expand watchlist
- Consider automated execution
This platform provides an unfair advantage because:
- Rare Skill Combination: Data engineering + finance + biotech expertise is extremely uncommon
- Information Asymmetry: Automated detection of public signals that others miss
- Systematic Edge: Removes emotion from investment decisions
- Scalable: Can manage larger capital with same infrastructure
- Defensible: Can't be easily replicated by general AI consultants
Additional Systems (when budget allows):
- HEOR model validation
- Clinical trial cost benchmarking
- FDA submission QC service
- Biotech M&A due diligence
Advanced Features:
- Automated trade execution (via Interactive Brokers API)
- Machine learning models for signal weighting
- Real-time news sentiment analysis
- Options strategy optimizer
Monetization (after proven track record):
- Raise a biotech-focused fund
- Sell research reports/newsletter
- License platform to hedge funds
- Offer consulting services
- Max 10% per trade
- Max 40% per system
- 20% cash reserve
- Clinical trials: 50% (options decay)
- Patent cliffs: 15% (thesis break)
- Insider plays: 10% (momentum loss)
- Minimum confidence: 0.6
- Prefer 3+ clustered signals
- Always review contrarian signals
Each system has detailed troubleshooting in its README. Common issues:
Database errors: Check PostgreSQL is running (pg_isready)
API limits: Review rate limiting in configs
Missing data: Check scraper logs in logs/ folders
This is proprietary investment software. Unauthorized distribution is prohibited.
Track these KPIs weekly:
System Performance:
- Data freshness < 24 hours
- Uptime > 99%
- Signal accuracy > 70%
Investment Performance:
- Win rate > 60%
- Average return > 25%
- Sharpe ratio > 1.5
- Max drawdown < 20%
Built with: Python, PostgreSQL, Streamlit, Apache Airflow, Docker
Powered by: AI (Claude Sonnet, GPT-4), SEC EDGAR API, ClinicalTrials.gov API, USPTO API
Let's generate alpha! 📈