A curated list of DataEngine pipelines: tutorial examples, reference pipelines from the VAST GitHub org, and community contributions.
VAST DataEngine is a serverless computing platform built into the VAST AI Operating System:
DataEngine lets you build, deploy, and scale data processing functions without managing infrastructure, running compute directly where data lives to eliminate costly data movement and duplication. The platform handles scheduling, event detection, and resource allocation so you can focus on business logic. At its core, DataEngine gives you three building blocks:
- Functions: Your code built into container images and executed on VAST compute nodes (cnodes)
- Triggers: Event sources like S3 uploads or cron schedules
- Pipelines: Orchestration layer that connects triggers to functions
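The three building blocks compose naturally: a trigger fires, the pipeline routes the event, and the function runs. A minimal Python sketch of that relationship, using hypothetical stand-in names (this is not the actual DataEngine SDK, whose types and signatures will differ):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trigger:
    """An event source, e.g. an S3 upload or a cron schedule."""
    kind: str   # "s3" or "cron"
    spec: str   # bucket/prefix or cron expression

@dataclass
class Pipeline:
    """Orchestration: connects a trigger to a function."""
    name: str
    trigger: Trigger
    function: Callable[[dict], None]

def log_object(event: dict) -> None:
    """Function: business logic that would be packaged into a container image."""
    print(f"received {event['key']} from {event['bucket']}")

pipeline = Pipeline(
    name="s3-logger",
    trigger=Trigger(kind="s3", spec="my-bucket/incoming/"),
    function=log_object,
)

# The platform handles scheduling and event detection; here we
# simulate a single event delivery to show the wiring.
pipeline.function({"bucket": "my-bucket", "key": "incoming/data.csv"})
```

The point is the separation of concerns: the function knows nothing about how it was triggered, so the same code can be reused behind an S3 trigger or a cron trigger.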
For a full overview, check out our recent blog post: VAST DataEngine: Bringing Compute to Your Data
Disclaimer: The pipelines listed here are provided for demonstration and educational purposes only. They are not guaranteed to be production-ready. Review, test, and harden any pipeline to meet your own requirements before deploying it in a production environment.
Small, self-contained pipelines intended for training and workshop use:
| Pipeline | Trigger | Runtime | Link | Description |
|---|---|---|---|---|
| python-cron-hello-world | cron | Python 3.12.12 | link | Pipeline with cron trigger that logs hello world. |
| python-s3-hello-world | s3 | Python 3.12.12 | Coming soon | Pipeline with S3 trigger that retrieves and logs file data. |
| python-s3-llm | s3 | Python 3.12.12 | Coming soon | Pipeline with S3 trigger that integrates an LLM API. |
| python-s3-video-ingestion | s3 | Python 3.12.12 | Coming soon | Pipeline with S3 trigger for video ingestion. |
| python-s3-video-embeddings | s3 | Python 3.12.12 | Coming soon | Pipeline with S3 trigger to generate video embeddings. |
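The python-cron-hello-world tutorial boils down to a function that logs a greeting on each scheduled invocation. A minimal sketch of what such a handler could look like — the handler name and event payload shape are assumptions for illustration, not the pipeline's actual code:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("python-cron-hello-world")

def handler(event: dict) -> str:
    # Invoked by the cron trigger on each tick; the "time" field in
    # the event payload is assumed for illustration.
    logger.info("hello world (tick=%s)", event.get("time", "n/a"))
    return "hello world"

# Simulate one cron invocation:
result = handler({"time": "2025-01-01T00:00:00Z"})
```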
Reference pipelines by VAST:
| Pipeline | Runtime | Repo | Description |
|---|---|---|---|
Pipelines built and maintained by the community:
| Pipeline | Runtime | Repo | Author | Description |
|---|---|---|---|---|
To contribute, add an entry to registry.json and open a PR against main. See CONTRIBUTING.md for the full workflow and PR checklist.
```text
dataengine-pipelines/
├── scripts/
│   └── validate_function.py   # Checks a function folder has all required files
└── registry.json              # Machine-readable index of all pipelines
```
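A minimal sketch of the kind of check scripts/validate_function.py performs. The required-file list below is an assumption for illustration; the actual script defines the authoritative rules:

```python
from pathlib import Path

# Hypothetical required files for a function folder; the real
# validate_function.py defines the authoritative list.
REQUIRED_FILES = ["Dockerfile", "requirements.txt", "handler.py"]

def validate_function(folder: str) -> list[str]:
    """Return the names of required files missing from `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = validate_function("functions/python-cron-hello-world")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("ok")
```

Returning the list of missing files (rather than raising on the first failure) lets the caller report every problem in one pass, which is friendlier in a PR-check context.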

