Kubernetes之Controllers一

时间:2024-10-03 20:34:26

ReplicaSet is the next-generation Replication Controller. The only difference between a ReplicaSet and a Replication Controller right now is the selector support. ReplicaSet supports the new set-based selector requirements as described in the labels user guidewhereas a Replication Controller only supports equality-based selector requirements.

How to use a ReplicaSet

Most kubectl commands that support Replication Controllers also support ReplicaSets.

One exception is the rolling-updatecommand. If you want the rolling update functionality please consider using Deployments instead.Also, the rolling-updatecommand is imperative whereas Deployments are declarative, so we recommend using Deployments through the rolloutcommand.

While ReplicaSets can be used independently, today it’s mainly used by Deployments as a mechanism to orchestrate pod creation, deletion and updates.

When you use Deployments you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets.

When to use a ReplicaSet

A ReplicaSet ensures that a specified number of pod replicas are running at any given time.

However, a Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to pods along with a lot of other useful features.

Therefore, we recommend using Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don’t require updates at all.

This actually means that you may never need to manipulate ReplicaSet objects: use a Deployment instead, and define your application in the spec section.

Example

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: ReplicaSet
metadata:
name: frontend
labels:
app: guestbook
tier: frontend
spec:
# this replicas value is default
# modify it according to your case
replicas: 3
selector:
matchLabels:
tier: frontend
matchExpressions:
- {key: tier, operator: In, values: [frontend]}
template:
metadata:
labels:
app: guestbook
tier: frontend
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
env:
- name: GET_HOSTS_FROM
value: dns
# If your cluster config does not include a dns service, then to
# instead access environment variables to find service host
# info, comment out the 'value: dns' line above, and uncomment the
# line below.
# value: env
ports:
- containerPort: 80

Saving this manifest into frontend.yaml and submitting it to a Kubernetes cluster should create the defined ReplicaSet and the pods that it manages.

$ kubectl create -f frontend.yaml
replicaset "frontend" created
$ kubectl describe rs/frontend
Name: frontend
Namespace: default
Selector: tier=frontend,tier in (frontend)
Labels: app=guestbook
tier=frontend
Annotations: <none>
Replicas: 3 current / 3 desired
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=guestbook
tier=frontend
Containers:
php-redis:
Image: gcr.io/google_samples/gb-frontend:v3
Port: 80/TCP
Requests:
cpu: 100m
memory: 100Mi
Environment:
GET_HOSTS_FROM: dns
Mounts: <none>
Volumes: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: frontend-qhloh
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: frontend-dnjpy
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: frontend-9si5l
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
frontend-9si5l 1/1 Running 0 1m
frontend-dnjpy 1/1 Running 0 1m
frontend-qhloh 1/1 Running 0 1m

  

Writing a ReplicaSet Spec

As with all other Kubernetes API objects, a ReplicaSet needs the apiVersionkind, and metadata fields. For general information about working with manifests, see Object Management.

A ReplicaSet also needs a .spec section.

Pod Template

The .spec.template is the only required field of the .spec. The .spec.template is a pod template. It has exactly the same schema as a pod, except that it is nested and does not have an apiVersion or kind.

In addition to required fields of a pod, a pod template in a ReplicaSet must specify appropriate labels and an appropriate restart policy.

For labels, make sure to not overlap with other controllers. For more information, see pod selector.

For restart policy, the only allowed value for .spec.template.spec.restartPolicy is Always, which is the default.

For local container restarts, ReplicaSet delegates to an agent on the node, for example the Kubelet or Docker.

Pod Selector

The .spec.selector field is a label selector.

A ReplicaSet manages all the pods with labels that match the selector.

It does not distinguish between pods that it created or deleted and pods that another person or process created or deleted.

This allows the ReplicaSet to be replaced without affecting the running pods.

The .spec.template.metadata.labels must match the .spec.selector, or it will be rejected by the API.

In Kubernetes 1.9 the API version apps/v1 on the ReplicaSet kind is the current version and is enabled by default.

The API version apps/v1beta2 is deprecated.

Also you should not normally create any pods whose labels match this selector, either directly, with another ReplicaSet, or with another controller such as a Deployment. If you do so, the ReplicaSet thinks that it created the other pods.

Kubernetes does not stop you from doing this.

If you do end up with multiple controllers that have overlapping selectors, you will have to manage the deletion yourself.

Labels on a ReplicaSet

The ReplicaSet can itself have labels (.metadata.labels).

Typically, you would set these the same as the .spec.template.metadata.labels.

However, they are allowed to be different, and the .metadata.labels do not affect the behavior of the ReplicaSet.

Replicas

You can specify how many pods should run concurrently by setting .spec.replicas.

The number running at any time may be higher or lower, such as if the replicas were just increased or decreased, or if a pod is gracefully shut down, and a replacement starts early.

If you do not specify .spec.replicas, then it defaults to 1.

Working with ReplicaSets

Deleting a ReplicaSet and its Pods

To delete a ReplicaSet and all its pods, use kubectl delete. Kubectl will scale the ReplicaSet to zero and wait for it to delete each pod before deleting the ReplicaSet itself. If this kubectl command is interrupted, it can be restarted.

When using the REST API or go client library, you need to do the steps explicitly (scale replicas to 0, wait for pod deletions, then delete the ReplicaSet).

Deleting just a ReplicaSet

You can delete a ReplicaSet without affecting any of its pods, using kubectl delete with the --cascade=false option.

When using the REST API or go client library, simply delete the ReplicaSet object.

Once the original is deleted, you can create a new ReplicaSet to replace it. As long as the old and new .spec.selector are the same, then the new one will adopt the old pods.

However, it will not make any effort to make existing pods match a new, different pod template.

To update pods to a new spec in a controlled way, use a rolling update.

Isolating pods from a ReplicaSet

Pods may be removed from a ReplicaSet’s target set by changing their labels.

This technique may be used to remove pods from service for debugging, data recovery, etc.

Pods that are removed in this way will be replaced automatically ( assuming that the number of replicas is not also changed).

Scaling a ReplicaSet

A ReplicaSet can be easily scaled up or down by simply updating the .spec.replicas field.

The ReplicaSet controller ensures that that a desired number of pods with a matching label selector are available and operational.

ReplicaSet as an Horizontal Pod Autoscaler Target

A ReplicaSet can also be a target for Horizontal Pod Autoscalers (HPA). That is, a ReplicaSet can be auto-scaled by an HPA. Here is an example HPA targeting the ReplicaSet we created in the previous example.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: frontend-scaler
spec:
scaleTargetRef:
kind: ReplicaSet
name: frontend
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 50

Saving this manifest into hpa-rs.yaml and submitting it to a Kubernetes cluster should create the defined HPA that autoscales the target ReplicaSet depending on the CPU usage of the replicated pods.

kubectl create -f hpa-rs.yaml

Alternatively, you can use the kubectl autoscale command to accomplish the same (and it’s easier!)
kubectl autoscale rs frontend

Alternatives to ReplicaSet

Deployment (Recommended)

Deployment is a higher-level API object that updates its underlying ReplicaSets and their Pods in a similar fashion as kubectl rolling-update.

Deployments are recommended if you want this rolling update functionality, because unlike kubectl rolling-update, they are declarative, server-side, and have additional features.

For more information on running a stateless application using a Deployment, please read Run a Stateless Application Using a Deployment.

Bare Pods

Unlike the case where a user directly created pods, a ReplicaSet replaces pods that are deleted or terminated for any reason, such as in the case of node failure or disruptive node maintenance, such as a kernel upgrade.

For this reason, we recommend that you use a ReplicaSet even if your application requires only a single pod.

Think of it similarly to a process supervisor, only it supervises multiple pods across multiple nodes instead of individual processes on a single node.

A ReplicaSet delegates local container restarts to some agent on the node (for example, Kubelet or Docker).

Job

Use a Job instead of a ReplicaSet for pods that are expected to terminate on their own (that is, batch jobs).

DaemonSet

Use a DaemonSet instead of a ReplicaSet for pods that provide a machine-level function, such as machine monitoring or machine logging.

These pods have a lifetime that is tied to a machine lifetime: the pod needs to be running on the machine before other pods start, and are safe to terminate when the machine is otherwise ready to be rebooted/shutdown.

Replication Controller

NOTE: A Deployment that configures a ReplicaSet is now the recommended way to set up replication.

ReplicationController ensures that a specified number of pod replicas are running at any one time.

In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.

How a ReplicationController Works

If there are too many pods, the ReplicationController terminates the extra pods.

If there are too few, the ReplicationController starts more pods.

Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated. For example, your pods are re-created on a node after disruptive maintenance such as a kernel upgrade.

For this reason, you should use a ReplicationController even if your application requires only a single pod.

A ReplicationController is similar to a process supervisor, but instead of supervising individual processes on a single node, the ReplicationController supervises multiple pods across multiple nodes.

ReplicationController is often abbreviated to “rc” or “rcs” in discussion, and as a shortcut in kubectl commands.

A simple case is to create one ReplicationController object to reliably run one instance of a Pod indefinitely.

A more complex use case is to run several identical replicas of a replicated service, such as web servers.

Running an example ReplicationController

This example ReplicationController config runs three copies of the nginx web server.

apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 3
selector:
app: nginx
template:
metadata:
name: nginx
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80

Run the example job by downloading the example file and then running this command:

$ kubectl create -f ./replication.yaml
replicationcontroller "nginx" created

Check on the status of the ReplicationController using this command:

$ kubectl describe replicationcontrollers/nginx
Name: nginx
Namespace: default
Selector: app=nginx
Labels: app=nginx
Annotations: <none>
Replicas: 3 current / 3 desired
Pods Status: 0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx
Port: 80/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- ---- ------ -------
20s 20s 1 {replication-controller } Normal SuccessfulCreate Created pod: nginx-qrm3m
20s 20s 1 {replication-controller } Normal SuccessfulCreate Created pod: nginx-3ntk0
20s 20s 1 {replication-controller } Normal SuccessfulCreate Created pod: nginx-4ok8v

Here, three pods are created, but none is running yet, perhaps because the image is being pulled. A little later, the same command may show:

Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed

To list all the pods that belong to the ReplicationController in a machine readable form, you can use a command like this:

$ pods=$(kubectl get pods --selector=app=nginx --output=jsonpath={.items..metadata.name})
echo $pods
nginx-3ntk0 nginx-4ok8v nginx-qrm3m

Here, the selector is the same as the selector for the ReplicationController (seen in the kubectl describe output, and in a different form in replication.yaml. The --output=jsonpath option specifies an expression that just gets the name from each pod in the returned list.

Writing a ReplicationController Spec

As with all other Kubernetes config, a ReplicationController needs apiVersionkind, and metadata fields. For general information about working with config files, see Object Management.

A ReplicationController also needs a .spec section.

Pod Template

The .spec.template is the only required field of the .spec.

The .spec.template is a pod template. It has exactly the same schema as a pod, except it is nested and does not have an apiVersion or kind.

In addition to required fields for a Pod, a pod template in a ReplicationController must specify appropriate labels and an appropriate restart policy.

For labels, make sure not to overlap with other controllers. See pod selector.

Only a .spec.template.spec.restartPolicy equal to Always is allowed, which is the default if not specified.

For local container restarts, ReplicationControllers delegate to an agent on the node, for example the Kubelet or Docker.

Labels on the ReplicationController

The ReplicationController can itself have labels (.metadata.labels).

Typically, you would set these the same as the .spec.template.metadata.labels;

if .metadata.labels is not specified then it defaults to.spec.template.metadata.labels.

However, they are allowed to be different, and the .metadata.labels do not affect the behavior of the ReplicationController.

Pod Selector

The .spec.selector field is a label selector.

A ReplicationController manages all the pods with labels that match the selector.

It does not distinguish between pods that it created or deleted and pods that another person or process created or deleted.

This allows the ReplicationController to be replaced without affecting the running pods.

If specified, the .spec.template.metadata.labels must be equal to the .spec.selector, or it will be rejected by the API.

If .spec.selector is unspecified, it will be defaulted to .spec.template.metadata.labels.

Also you should not normally create any pods whose labels match this selector, either directly, with another ReplicationController, or with another controller such as Job.

If you do so, the ReplicationController thinks that it created the other pods. Kubernetes does not stop you from doing this.

If you do end up with multiple controllers that have overlapping selectors, you will have to manage the deletion yourself (see below).

Multiple Replicas

You can specify how many pods should run concurrently by setting .spec.replicas to the number of pods you would like to have running concurrently.

The number running at any time may be higher or lower, such as if the replicas were just increased or decreased, or if a pod is gracefully shutdown, and a replacement starts early.

If you do not specify .spec.replicas, then it defaults to 1.

Working with ReplicationControllers

Deleting a ReplicationController and its Pods

To delete a ReplicationController and all its pods, use kubectl delete.

Kubectl will scale the ReplicationController to zero and wait for it to delete each pod before deleting the ReplicationController itself. If this kubectl command is interrupted, it can be restarted.

When using the REST API or go client library, you need to do the steps explicitly (scale replicas to 0, wait for pod deletions, then delete the ReplicationController).

Deleting just a ReplicationController

You can delete a ReplicationController without affecting any of its pods.

Using kubectl, specify the --cascade=false option to kubectl delete.

When using the REST API or go client library, simply delete the ReplicationController object.

Once the original is deleted, you can create a new ReplicationController to replace it.

As long as the old and new .spec.selector are the same, then the new one will adopt the old pods.

However, it will not make any effort to make existing pods match a new, different pod template. To update pods to a new spec in a controlled way, use a rolling update.

Isolating pods from a ReplicationController

Pods may be removed from a ReplicationController’s target set by changing their labels.

This technique may be used to remove pods from service for debugging, data recovery, etc.

Pods that are removed in this way will be replaced automatically (assuming that the number of replicas is not also changed).

Common usage patterns

Rescheduling

As mentioned above, whether you have 1 pod you want to keep running, or 1000, a ReplicationController will ensure that the specified number of pods exists, even in the event of node failure or pod termination (for example, due to an action by another control agent).

Scaling

The ReplicationController makes it easy to scale the number of replicas up or down, either manually or by an auto-scaling control agent, by simply updating the replicas field.

Rolling updates

The ReplicationController is designed to facilitate rolling updates to a service by replacing pods one-by-one.

As explained in #1353, the recommended approach is to create a new ReplicationController with 1 replica, scale the new (+1) and old (-1) controllers one by one, and then delete the old controller after it reaches 0 replicas.

This predictably updates the set of pods regardless of unexpected failures.

Ideally, the rolling update controller would take application readiness into account, and would ensure that a sufficient number of pods were productively serving at any given time.

The two ReplicationControllers would need to create pods with at least one differentiating label, such as the image tag of the primary container of the pod, since it is typically image updates that motivate rolling updates.

Rolling update is implemented in the client tool kubectl rolling-update. Visit kubectl rolling-update task for more concrete examples.

Multiple release tracks

In addition to running multiple releases of an application while a rolling update is in progress, it’s common to run multiple releases for an extended period of time, or even continuously, using multiple release tracks. The tracks would be differentiated by labels.

For instance, a service might target all pods with tier in (frontend), environment in (prod).

Now say you have 10 replicated pods that make up this tier. But you want to be able to ‘canary’ a new version of this component. You could set up a ReplicationController with replicas set to 9 for the bulk of the replicas, with labels tier=frontend, environment=prod, track=stable, and another ReplicationController with replicas set to 1 for the canary, with labels tier=frontend, environment=prod, track=canary.

Now the service is covering both the canary and non-canary pods.

But you can mess with the ReplicationControllers separately to test things out, monitor the results, etc.

Using ReplicationControllers with Services

Multiple ReplicationControllers can sit behind a single service, so that, for example, some traffic goes to the old version, and some goes to the new version.

A ReplicationController will never terminate on its own, but it isn’t expected to be as long-lived as services.

Services may be composed of pods controlled by multiple ReplicationControllers, and it is expected that many ReplicationControllers may be created and destroyed over the lifetime of a service (for instance, to perform an update of pods that run the service).

Both services themselves and their clients should remain oblivious to the ReplicationControllers that maintain the pods of the services.

Writing programs for Replication

Pods created by a ReplicationController are intended to be fungible and semantically identical, though their configurations may become heterogeneous over time.

This is an obvious fit for replicated stateless servers, but ReplicationControllers can also be used to maintain availability of master-elected, sharded, and worker-pool applications.

Such applications should use dynamic work assignment mechanisms, such as the etcd lock module or RabbitMQ work queues, as opposed to static/one-time customization of the configuration of each pod, which is considered an anti-pattern.

Any pod customization performed, such as vertical auto-sizing of resources (for example, cpu or memory), should be performed by another online controller process, not unlike the ReplicationController itself.

Responsibilities of the ReplicationController

The ReplicationController simply ensures that the desired number of pods matches its label selector and are operational.

Currently, only terminated pods are excluded from its count.

In the future, readiness and other information available from the system may be taken into account, we may add more controls over the replacement policy, and we plan to emit events that could be used by external clients to implement arbitrarily sophisticated replacement and/or scale-down policies.

The ReplicationController is forever constrained to this narrow responsibility.

It itself will not perform readiness nor liveness probes. Rather than performing auto-scaling, it is intended to be controlled by an external auto-scaler (as discussed in #492), which would change its replicas field.

We will not add scheduling policies (for example, spreading) to the ReplicationController. Nor should it verify that the pods controlled match the currently specified template, as that would obstruct auto-sizing and other automated processes.

Similarly, completion deadlines, ordering dependencies, configuration expansion, and other features belong elsewhere. We even plan to factor out the mechanism for bulk pod creation (#170).

The ReplicationController is intended to be a composable building-block primitive.

We expect higher-level APIs and/or tools to be built on top of it and other complementary primitives for user convenience in the future.

The “macro” operations currently supported by kubectl (run, scale, rolling-update) are proof-of-concept examples of this.

For instance, we could imagine something like Asgard managing ReplicationControllers, auto-scalers, services, scheduling policies, canaries, etc.

API Object

Replication controller is a top-level resource in the Kubernetes REST API.

More details about the API object can be found at: ReplicationController API object.

Alternatives to ReplicationController

ReplicaSet

ReplicaSet is the next-generation ReplicationController that supports the new set-based label selector.

It’s mainly used by Deployment as a mechanism to orchestrate pod creation, deletion and updates.

Note that we recommend using Deployments instead of directly using Replica Sets, unless you require custom update orchestration or don’t require updates at all.

Deployment (Recommended)

Deployment is a higher-level API object that updates its underlying Replica Sets and their Pods in a similar fashion as kubectl rolling-update.

Deployments are recommended if you want this rolling update functionality, because unlike kubectl rolling-update, they are declarative, server-side, and have additional features.

Bare Pods

Unlike in the case where a user directly created pods, a ReplicationController replaces pods that are deleted or terminated for any reason, such as in the case of node failure or disruptive node maintenance, such as a kernel upgrade.

For this reason, we recommend that you use a ReplicationController even if your application requires only a single pod.

Think of it similarly to a process supervisor, only it supervises multiple pods across multiple nodes instead of individual processes on a single node.

A ReplicationController delegates local container restarts to some agent on the node (for example, Kubelet or Docker).

Job

Use a Job instead of a ReplicationController for pods that are expected to terminate on their own (that is, batch jobs).

DaemonSet

Use a DaemonSet instead of a ReplicationController for pods that provide a machine-level function, such as machine monitoring or machine logging.

These pods have a lifetime that is tied to a machine lifetime: the pod needs to be running on the machine before other pods start, and are safe to terminate when the machine is otherwise ready to be rebooted/shutdown.