A high-performance, distributed task execution platform designed for compute-intensive workloads in cloud-native environments.
ArmoniK enables organizations to scale computational workloads from single nodes to massive distributed clusters seamlessly. Built with cloud-native principles, it provides fault-tolerant task orchestration, auto-scaling capabilities, and enterprise-grade monitoring for mission-critical applications.
While there are many task orchestrators available, ArmoniK stands out by focusing on modularity and customization. Different use cases have unique requirements around data transfer patterns, task durations, and computational constraints. ArmoniK doesn't try to be everything to everyone; instead, it provides a flexible foundation that you can adapt to your specific use case.
This repository provides a complete reference deployment of ArmoniK for several cloud providers (AWS, GCP), as well as a generic Kubernetes deployment for on-premises installations and local development (K3s).
Key advantages:
- Modular Architecture: Pick and choose components that fit your needs
- Easy to Tailor: Shape ArmoniK to match your specific workload
- Cloud-Native Design: Built from the ground up for modern cloud environments
Moreover, ArmoniK is already being used in production by multiple financial services and insurance companies, allowing them to run distributed workloads made up of millions of tasks without worrying about the low-level orchestration, data shuffling, or failure recovery mechanisms.
ArmoniK is suited for any workload that can benefit from high concurrency and task distribution, including:
- Scientific computation and simulations
- Machine learning pipelines
- Batch analytics pipelines
- Real-time distributed processing
- Financial simulations (risk calculation) or combinatorial workloads
- Scalable algorithmic processing (e.g., bioinformatics, Monte Carlo, rendering)
ArmoniK is structured as multiple interconnected projects:
| Project | Description |
|---|---|
| ArmoniK.Infra | Infrastructure building blocks and deployment components. |
| ArmoniK.Core | Core orchestration logic and essential system components. |
| ArmoniK.Api | gRPC services and low-level APIs for ArmoniK integration (C#, C++, Rust, Python, JavaScript). |
| ArmoniK.Samples | Example implementations and use cases in C++, Python, C#, Java, and Rust. |
| ArmoniK.Admin.GUI | Dashboard for monitoring your ArmoniK cluster. |
| ArmoniK.CLI | Command-line interface for monitoring and managing ArmoniK clusters. |
| ArmoniK.TaskReRunner | Debugging tool for rerunning previously submitted and processed tasks locally. |
| ArmoniK.Spec | Formal specification of ArmoniK's scheduling algorithm using TLA+. |
| ArmoniK.Utils | Common utilities used in the project. |
| ArmoniK.Community | Home to the ArmoniK contribution guidelines and community enhancement proposals. |
| ArmoniK.Infra.Plugins | Terraform modules for the different cloud resources and components of ArmoniK such as the load balancer. |
Higher-level SDKs:
- ArmoniK.Extensions.Csharp - C# high-level abstractions
- ArmoniK.Extensions.Cpp - C++ bindings
- ArmoniK.Extensions.Java - Java bindings
- PymoniK - Python client library
- Dynamic scaling based on workload demands, optimized for preemptible computing resources
- Intelligent resource sharing between applications
- gRPC-based architecture supports multiple programming languages
- Officially supported: C#, C++, Python, Rust, Java, and JavaScript
- High-level and low-level API options available
- Multiple architectures: x86, ARM, GPUs, etc.
- Operating systems: Linux, Windows
- Environments: On-premises, cloud, hybrid deployments
- Continues functioning when nodes fail
- Task-level error management and recovery
- Robust handling of transient failures
- Battle-tested and used in real-world production environments
- Sub-tasking: Scheduling a task graph that evolves during execution. Large tasks can be split into smaller tasks at runtime, enabling DAG reshaping and adaptive workflows.
- Pipelining: Downloading the data for upcoming tasks while the current task executes, so that workers spend less time idle waiting on transfers and throughput improves.
- And more!
- Swap components without modifying core ArmoniK code
- Customize to suit specific user needs and constraints
- Extensible architecture
- Getting Started Guide - Complete documentation
- Community Discussions - Ask questions and share ideas
- Issue Tracker - Report bugs and request features
- Contributing - How to contribute to the project
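To make the sub-tasking feature listed above more concrete, here is a minimal, self-contained sketch in plain Python. It does not use the ArmoniK API: `run_with_subtasking`, `sum_range`, and the `("split", ...)` / `("done", ...)` convention are hypothetical names chosen for illustration only. The idea it shows is a task graph that grows at runtime, where a task either produces a result or asks the scheduler to run smaller tasks in its place.

```python
from collections import deque


def run_with_subtasking(root_task, process):
    """Toy single-process scheduler (illustrative only, not the ArmoniK API).

    `process(task)` returns either ("done", value) or ("split", [subtasks]).
    Subtasks are appended to the queue, so the task graph evolves at runtime.
    """
    queue = deque([root_task])
    results = []
    executed = 0
    while queue:
        task = queue.popleft()
        executed += 1
        kind, payload = process(task)
        if kind == "split":
            queue.extend(payload)  # new tasks discovered during execution
        else:
            results.append(payload)
    return results, executed


def sum_range(task, max_chunk=250):
    """Hypothetical task: sum integers in [lo, hi), splitting large ranges."""
    lo, hi = task
    if hi - lo > max_chunk:
        mid = (lo + hi) // 2
        return "split", [(lo, mid), (mid, hi)]
    return "done", sum(range(lo, hi))


partials, executed = run_with_subtasking((0, 1000), sum_range)
total = sum(partials)
```

In this toy run the root task splits twice, so seven tasks execute in total and four of them produce partial sums. A real deployment would distribute the queued tasks across workers rather than running them in one loop.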
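The pipelining feature listed above can be sketched in a similar way. The following example is an assumption-laden illustration using only the Python standard library, not ArmoniK code: `run_pipelined`, `fetch`, and `execute` are hypothetical names. It overlaps the download of the next task's input with the execution of the current task by running the fetch on a background thread.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelined(task_ids, fetch, execute):
    """Run tasks in order while prefetching the next task's input.

    Illustrative only: `fetch(task_id)` stands in for a data download and
    `execute(data)` for the actual computation.
    """
    task_ids = list(task_ids)
    results = []
    if not task_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        # Start downloading the first input before any execution begins.
        pending = prefetcher.submit(fetch, task_ids[0])
        for next_id in task_ids[1:] + [None]:
            data = pending.result()  # wait for the current input
            if next_id is not None:
                # Kick off the next download while this task executes.
                pending = prefetcher.submit(fetch, next_id)
            results.append(execute(data))
    return results


def fetch(task_id):
    """Hypothetical download: returns the integers below task_id."""
    return list(range(task_id))


def execute(data):
    """Hypothetical computation: sums the downloaded values."""
    return sum(data)


outputs = run_pipelined([1, 2, 3, 4], fetch, execute)
```

The benefit depends on the ratio of transfer time to compute time: when downloads are slow relative to execution, overlapping the two hides most of the transfer latency.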
This project was funded by AWS and built upon the HTCGrid project's foundation. We're grateful for their support in making distributed computing more accessible.
This project is licensed under the Apache License, Version 2.0. However, please note that the ArmoniK.Core component is under the AGPL license. See the LICENSE file for details.
⭐ Star this repository if you find ArmoniK useful!
Made with ❤️ by the ArmoniK team