
Flink Kubernetes Support #2

Open
esevastyanov wants to merge 13 commits into release-1.7 from release-1.7-k8s

Conversation

@esevastyanov

The current implementation of Kubernetes support is for a session cluster only.
For additional information, please see the README file.


## Task Manager
Task manager is a temporary essence and is created (and deleted) by a job manager for a particular slot.
No deployments/jobs/services are created for a task manager only pods.
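As a minimal sketch of such a standalone task manager pod (the names, labels, image, and arguments below are assumptions for illustration, not taken from this PR):

```yaml
# Hypothetical standalone task manager pod: created directly by the
# job manager for one slot, with no owning Deployment/Job/Service.
apiVersion: v1
kind: Pod
metadata:
  name: taskmanager-slot-1        # assumed naming scheme
  labels:
    app: flink
    component: taskmanager
spec:
  restartPolicy: Never            # assumed: the job manager recreates pods itself
  containers:
    - name: taskmanager
      image: flink:1.7            # assumed image
      args: ["taskmanager"]
```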

"for a task manager, only pods" comma missing?

Example:
```
kubectl create -f jobmanager-deployment.yaml
kubectl create -f jobmanager-service.yaml
```

jobmanager-exposer-deployment.yaml ?
Also, a question comes up instantly: how exactly does it expose it?

Author


That creates the deployment with one job manager and a service around it that exposes the job manager (ClusterIP/NodePort/LoadBalancer/ExternalName):
https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
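As a sketch of that exposure (the service name, selector labels, and publishing type below are assumptions; only the web UI port 8081 is a Flink default):

```yaml
# Hypothetical service exposing the job manager; spec.type can be
# ClusterIP, NodePort, LoadBalancer, or ExternalName as needed.
apiVersion: v1
kind: Service
metadata:
  name: jobmanager
spec:
  type: NodePort                  # assumed publishing type
  selector:
    app: flink                    # assumed labels on the job manager pod
    component: jobmanager
  ports:
    - name: ui
      port: 8081                  # Flink web UI default port
```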

TBD

## Kubernetes Resource Management
Resource management uses a default service account every pod contains. It should has admin privileges to be able

"should have"
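A hedged sketch of how such privileges could be granted to the default service account (the binding name and namespace are assumptions; a real deployment should prefer a narrower Role over cluster-admin):

```yaml
# Hypothetical binding giving the default service account in the
# "default" namespace cluster-admin rights, so pods can manage
# Kubernetes resources. Deliberately broad; tighten for production.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flink-default-admin       # assumed name
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default            # assumed namespace
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```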

package org.apache.flink.kubernetes.client;

/**
* represent a endpoint.

I wonder what endpoint?

void terminateClusterPod(ResourceID resourceID) throws KubernetesClientException;

/**
* stop cluster and clean up all resources, include services, auxiliary services and all running pods.

Some comments begin with a capital letter and some don't

public Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile) {
	LOG.info("Starting a new worker.");
	try {
		nodeManagerClient.createClusterPod(resourceProfile);

So at a higher level we provide a worker with only one slot; does that strategy have a downside?

Author


For now, it is our basis; we consciously do the same on Samza.
It's a reasonable solution because, in this case, different slot threads will not compete for CPU and memory (since the task manager doesn't isolate these resources). Recovery is also easier. However, we will use the slot sharing feature and share slots between different Flink operations according to the pipeline logic to reduce the high network usage between task managers.
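For reference, the one-slot-per-task-manager setup corresponds to the standard Flink configuration option (shown here as an assumption about this deployment's config, not quoted from the PR):

```yaml
# flink-conf.yaml fragment: one slot per task manager, so slot
# threads don't compete for CPU/memory within a single process.
taskmanager.numberOfTaskSlots: 1
```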

Author


As for the downside you asked about, I can mention the absence of resource sharing: in the case of low job utilization, a task manager will simply stand idle without much load.
Also, in this case, there will be no slot grouping. That feature tends to reduce network traffic by allocating slots on a single task manager. However, we will use slot sharing instead.

