VAST DataEngine Pipelines

A curated list of DataEngine pipelines: examples, reference pipelines from the VAST GitHub org, and community contributions.

What is VAST DataEngine?

VAST DataEngine is a serverless computing platform built into the VAST AI Operating System.

DataEngine lets you build, deploy, and scale data processing functions without managing infrastructure. Because compute runs directly where the data lives, it avoids costly data movement and duplication. The platform handles scheduling, event detection, and resource allocation so you can focus on business logic. At its core, DataEngine gives you three building blocks:

Functions: Your code, built into container images and executed on VAST compute nodes (cnodes)
Triggers: Event sources, such as S3 uploads or cron schedules
Pipelines: The orchestration layer that connects triggers to functions
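To make the relationship between the three building blocks concrete, here is a toy, in-process Python model. It is not the DataEngine SDK or API: the class names, the dispatch method, and the dict-shaped events are hypothetical. In DataEngine itself, functions run as container images on cnodes and the platform, not your code, performs event detection and scheduling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical event shape: a plain dict such as {"type": "s3", "key": "videos/a.mp4"}.
Event = Dict[str, str]


@dataclass
class Function:
    """Stands in for your container-image code; here it is just a Python callable."""
    name: str
    handler: Callable[[Event], str]


@dataclass
class Trigger:
    """An event source, identified only by the event type it reacts to."""
    name: str
    event_type: str  # e.g. "s3" or "cron"


@dataclass
class Pipeline:
    """Wires triggers to functions: every matching event runs every attached function."""
    name: str
    triggers: List[Trigger]
    functions: List[Function] = field(default_factory=list)

    def dispatch(self, event: Event) -> List[str]:
        # Ignore events that none of this pipeline's triggers listen for.
        if not any(t.event_type == event.get("type") for t in self.triggers):
            return []
        return [fn.handler(event) for fn in self.functions]


# Usage: a cron-triggered hello-world pipeline, mirroring python-cron-hello-world.
hello = Function("hello", lambda event: "hello world")
nightly = Trigger("nightly", "cron")
pipeline = Pipeline("python-cron-hello-world", [nightly], [hello])
print(pipeline.dispatch({"type": "cron"}))                # ['hello world']
print(pipeline.dispatch({"type": "s3", "key": "a.txt"}))  # []
```

The point of the model is the separation of concerns: functions know nothing about what invokes them, and triggers know nothing about what runs, so the pipeline is the only place where the two are bound together.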

For a full overview, check out our recent blog post: VAST DataEngine: Bringing Compute to Your Data

Pipelines

Disclaimer: The pipelines listed here are provided for demonstration and educational purposes only. They are not guaranteed to be production-ready. Review, test, and harden any pipeline to meet your own requirements before deploying it in a production environment.

In this repo

Small, self-contained pipelines intended for training and workshop use:

Pipeline	Trigger	Runtime	Link	Description
python-cron-hello-world	cron	Python 3.12.12	link	Pipeline with cron trigger that logs hello world.
python-s3-hello-world	s3	Python 3.12.12		Coming soon: Pipeline with S3 trigger that retrieves and logs file data.
python-s3-llm	s3	Python 3.12.12		Coming soon: Pipeline with S3 trigger that integrates an LLM API.
python-s3-video-ingestion	s3	Python 3.12.12		Coming soon: Pipeline with S3 trigger for video ingestion.
python-s3-video-embeddings	s3	Python 3.12.12		Coming soon: Pipeline with S3 trigger to generate video embeddings.

Reference Pipelines

Reference pipelines by VAST:

Pipeline	Runtime	Repo	Description

Community

Pipelines built and maintained by the community:

Pipeline	Runtime	Repo	Author	Description

Contributing

To contribute, add an entry to registry.json and open a pull request against main.

See CONTRIBUTING.md for the full workflow and PR checklist.
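A minimal sketch of adding a registry entry programmatically, assuming registry.json is a JSON object with a top-level "pipelines" list and that entries use the field names shown (name, trigger, runtime, description, chosen to mirror the pipeline tables). Both that schema and the helper name add_registry_entry are hypothetical; CONTRIBUTING.md and the existing registry.json are the authority on the real format.

```python
import json
from pathlib import Path
from typing import Dict

# HYPOTHETICAL field names, mirroring the columns of the pipeline tables.
EXAMPLE_ENTRY = {
    "name": "python-cron-hello-world",
    "trigger": "cron",
    "runtime": "Python 3.12.12",
    "description": "Pipeline with cron trigger that logs hello world.",
}


def add_registry_entry(registry_path: Path, entry: Dict[str, str]) -> None:
    """Append one pipeline entry to a registry file.

    ASSUMPTION: the file holds a JSON object with a "pipelines" list.
    Check the real registry.json before relying on this layout.
    """
    data = json.loads(registry_path.read_text(encoding="utf-8"))
    pipelines = data.setdefault("pipelines", [])
    # Refuse duplicate names so the registry stays a clean, unique index.
    if any(p.get("name") == entry["name"] for p in pipelines):
        raise ValueError(f"pipeline {entry['name']!r} is already registered")
    pipelines.append(entry)
    # Rewrite with stable indentation and a trailing newline to keep diffs small.
    registry_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
```

Rejecting duplicate names up front catches the most common registry mistake before review, and pretty-printing keeps the PR diff limited to the new entry.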

Folder layout

dataengine-pipelines/
├── python-cron-hello-world/        # Example pipeline: cron trigger that logs hello world
├── scripts/
│   └── validate_function.py        # Checks a function folder has all required files
└── registry.json                   # Machine-readable index of all pipelines
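The exact rules in scripts/validate_function.py are not described here. The following is a minimal sketch of that kind of check, assuming a hypothetical set of required files (README.md, requirements.txt, main.py). The names REQUIRED_FILES, missing_files, and main are illustrative, not the script's actual interface.

```python
import sys
from pathlib import Path
from typing import List

# HYPOTHETICAL list: the real validate_function.py defines its own
# required files, which this sketch does not reproduce.
REQUIRED_FILES = ("README.md", "requirements.txt", "main.py")


def missing_files(function_dir: Path) -> List[str]:
    """Return the required file names that are absent from function_dir."""
    return [name for name in REQUIRED_FILES if not (function_dir / name).is_file()]


def main(argv: List[str]) -> int:
    """CLI-style entry point: returns 0 if valid, 1 if files are missing, 2 on bad usage."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <function-folder>", file=sys.stderr)
        return 2
    folder = Path(argv[1])
    if not folder.is_dir():
        print(f"error: {folder} is not a directory", file=sys.stderr)
        return 2
    missing = missing_files(folder)
    for name in missing:
        print(f"missing: {folder / name}", file=sys.stderr)
    # A non-zero status lets a shell script or CI job detect the failure.
    return 1 if missing else 0
```

Returning an exit code instead of calling sys.exit inside main keeps the check easy to unit-test; a script would wire it up with sys.exit(main(sys.argv)) under an `if __name__ == "__main__":` guard.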
