vast-data/dataengine-pipelines

VAST DataEngine Pipelines

A curated list of DataEngine pipeline examples, spanning reference pipelines from the VAST GitHub org and community contributions.

What is VAST DataEngine?

VAST DataEngine is a serverless computing platform built into the VAST AI Operating System.

DataEngine lets you build, deploy, and scale data processing functions without managing infrastructure. Compute runs directly where the data lives, which eliminates costly data movement and duplication. The platform handles scheduling, event detection, and resource allocation so you can focus on business logic. At its core, DataEngine gives you three building blocks:

  • Functions: Your code, built into container images and executed on VAST compute nodes (CNodes)
  • Triggers: Event sources like S3 uploads or cron schedules
  • Pipelines: Orchestration layer that connects triggers to functions
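
How the three building blocks relate can be sketched in plain Python. This is a conceptual model only and not the DataEngine API: the `Trigger` and `Pipeline` classes, the `dispatch` method, and the `log_hello` function are all hypothetical names, and a real function runs as a container image on a VAST compute node rather than as an in-process callable.

```python
from dataclasses import dataclass, field
from typing import Callable

# Conceptual model only: every name below is hypothetical, NOT the DataEngine API.

Event = dict  # an event payload, e.g. {"type": "cron"}
Function = Callable[[Event], str]


@dataclass(frozen=True)
class Trigger:
    """An event source, such as an S3 upload or a cron schedule."""
    name: str
    kind: str  # "s3" or "cron"


@dataclass
class Pipeline:
    """Connects one trigger to the functions that run when it fires."""
    trigger: Trigger
    functions: list[Function] = field(default_factory=list)

    def dispatch(self, event: Event) -> list[str]:
        # Ignore events that do not match this pipeline's trigger kind.
        if event.get("type") != self.trigger.kind:
            return []
        # Otherwise run every connected function, in order.
        return [fn(event) for fn in self.functions]


def log_hello(event: Event) -> str:
    """Stand-in for user code that would be packaged into a container image."""
    return f"hello world from {event['type']} event"


pipeline = Pipeline(Trigger(name="every-five-minutes", kind="cron"), [log_hello])
results = pipeline.dispatch({"type": "cron"})
print(results)
```

In this model the pipeline is the only piece that knows about both sides, so a function never needs to know which trigger invoked it.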

For a full overview, check out our recent blog post: VAST DataEngine: Bringing Compute to Your Data


Pipelines

Disclaimer: The pipelines listed here are provided for demonstration and educational purposes only. They are not guaranteed to be production-ready. Review, test, and harden any pipeline to meet your own requirements before deploying it in a production environment.

In this repo

Small, self-contained pipelines intended for training and workshop use:

| Pipeline | Trigger | Runtime | Link | Description |
|---|---|---|---|---|
| python-cron-hello-world | cron | Python 3.12.12 | link | Pipeline with cron trigger that logs hello world. |
| python-s3-hello-world | s3 | Python 3.12.12 | | Coming soon: Pipeline with S3 trigger that retrieves and logs file data. |
| python-s3-llm | s3 | Python 3.12.12 | | Coming soon: Pipeline with S3 trigger that integrates an LLM API. |
| python-s3-video-ingestion | s3 | Python 3.12.12 | | Coming soon: Pipeline with S3 trigger for video ingestion. |
| python-s3-video-embeddings | s3 | Python 3.12.12 | | Coming soon: Pipeline with S3 trigger to generate video embeddings. |

Reference Pipelines

Reference pipelines by VAST:

| Pipeline | Runtime | Repo | Description |
|---|---|---|---|

Community

Pipelines built and maintained by the community:

| Pipeline | Runtime | Repo | Author | Description |
|---|---|---|---|---|

Contributing

See CONTRIBUTING.md for the full workflow and PR checklist.

To contribute, add an entry to registry.json and open a PR against main.
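
The schema of registry.json is not described in this README, so the following is only a sketch under stated assumptions: it treats the file as a top-level JSON array of objects whose keys mirror the Community table columns. The `add_entry` helper, the key names, and all example values are hypothetical. Check CONTRIBUTING.md for the actual format before opening a PR.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: registry.json is a JSON array of objects with these keys.
# The real schema may differ; see CONTRIBUTING.md.
REQUIRED_KEYS = ("pipeline", "runtime", "repo", "author", "description")


def add_entry(registry_path: Path, entry: dict) -> list[dict]:
    """Append `entry` to the registry file and return the updated list."""
    # Reject entries with missing or empty fields before touching the file.
    missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
    if missing:
        raise ValueError(f"entry is missing fields: {', '.join(missing)}")

    registry = json.loads(registry_path.read_text(encoding="utf-8"))
    # Refuse to register the same pipeline name twice.
    if any(item.get("pipeline") == entry["pipeline"] for item in registry):
        raise ValueError(f"pipeline {entry['pipeline']!r} is already registered")

    registry.append(entry)
    registry_path.write_text(json.dumps(registry, indent=2) + "\n", encoding="utf-8")
    return registry


# Demonstration against a throwaway file, not the real registry.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "registry.json"
    demo_path.write_text("[]", encoding="utf-8")
    updated = add_entry(demo_path, {
        "pipeline": "example-pipeline",  # hypothetical values throughout
        "runtime": "Python 3.12",
        "repo": "https://example.com/example-pipeline",
        "author": "example-user",
        "description": "Placeholder entry for illustration.",
    })
    print(len(updated))
```

Validating the entry before reading the file means a malformed contribution fails fast without modifying the registry.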


Folder layout

dataengine-pipelines/
├── scripts/
│   └── validate_function.py        # Checks a function folder has all required files
└── registry.json                   # Machine-readable index of all pipelines
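
The kind of check described for scripts/validate_function.py can be illustrated with a minimal sketch. This is not the contents of that script: the `missing_files` helper is hypothetical, and the file names in `REQUIRED_FILES` are placeholders, since the README does not list which files a function folder requires.

```python
import tempfile
from pathlib import Path

# ASSUMPTION: placeholder file names. The real required set is defined by
# scripts/validate_function.py and CONTRIBUTING.md, not by this sketch.
REQUIRED_FILES = ("README.md", "requirements.txt")


def missing_files(function_dir: Path, required=REQUIRED_FILES) -> list[str]:
    """Return the required file names that are absent from `function_dir`."""
    return [name for name in required if not (function_dir / name).is_file()]


# Demonstration on a temporary folder that contains only one of the two files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "README.md").write_text("# demo\n", encoding="utf-8")
    missing = missing_files(folder)
    print(missing)
```

Returning the list of missing names, rather than a boolean, lets a caller report every problem in one pass.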
