Autoscaling in Kubernetes using HPA and VPA

Autoscaling, a key feature of Kubernetes, lets you improve the resource utilization of your cluster by automatically adjusting the application’s resources or replicas depending on the load at that time.

This blog talks about Pod Autoscaling in Kubernetes and how to set up and configure autoscalers to optimize the resource utilization of your application.

Horizontal Pod Autoscaling

What is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) scales the number of pods of a ReplicaSet, Deployment, or StatefulSet based on per-pod metrics received from the resource metrics API (metrics.k8s.io) provided by metrics-server, the custom metrics API (custom.metrics.k8s.io), or the external metrics API (external.metrics.k8s.io).

Fig:- Horizontal Pod Autoscaling

Prerequisite

Verify that the metrics-server is already deployed and running using the command below, or deploy it using instructions here.

kubectl get deployment metrics-server -n kube-system

HPA using Multiple Resource Metrics

HPA fetches per-pod resource metrics (like CPU, memory) from the resource metrics API and calculates the current metric value based on the mean values of all targeted pods. It compares the current metric value with the target metric value specified in the HPA spec and produces a ratio used to scale the number of desired replicas.

A. Setup: Create a Deployment and HPA resource

In this blog post, I have used the config below to create a deployment of 3 replicas, with some memory load defined by "--vm-bytes", "850M".

apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscale-tester
spec:
  replicas: 3
  selector:
    matchLabels:
      app: autoscale-tester
  template:
    metadata:
      labels:
        app: autoscale-tester
    spec:
      containers:
      - args: ["--vm", "1", "--vm-bytes", "850M", "--vm-hang", "1"]
        command:
        - stress
        image: polinux/stress
        name: autoscale-tester
        resources:
          limits:
            cpu: "1"
            memory: 1000Mi
          requests:
            cpu: "1"
            memory: 1000Mi

NOTE: It’s recommended not to use HPA and VPA on the same pods or deployments.

kubectl top po
NAME                               CPU(cores)   MEMORY(bytes)
autoscale-tester-878b8c6c8-42gmk   326m         853Mi
autoscale-tester-878b8c6c8-gp45f   410m         852Mi
autoscale-tester-878b8c6c8-tz4mg   388m         852Mi

Let's create an HPA resource for this deployment with multiple metric blocks defined. The HPA evaluates each metric separately, calculates a desired replica count for each, and then selects the highest count.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscale-tester
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: autoscale-tester
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi

  • We have defined the minimum number of replicas the HPA can scale down to as 1 and the maximum it can scale up to as 10.
  • The Target Average Utilization and Target Average Value imply that the HPA should scale the replicas up or down to keep the Current Metric Value equal to (or as close as possible to) the Target Metric Value.

B. Understanding the HPA Algorithm

kubectl describe hpa autoscale-tester
Name:          autoscale-tester
Namespace:     autoscale-tester
...
Metrics:       ( current / target )
  resource memory on pods:                             894188202666m / 500Mi
  resource cpu on pods (as a percentage of request):   36% (361m) / 50%
Min replicas:  1
Max replicas:  10
Deployment pods:  3 current / 6 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 6
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  7s   horizontal-pod-autoscaler  New size: 6; reason: memory resource above target

  • HPA calculates pod utilization as the total usage of all containers in the pod divided by the total request. It evaluates each container individually and returns an error if a container doesn't have the relevant resource request set.
  • The calculated Current Metric Value for memory, i.e., 894188202666m, is higher than the Target Average Value of 500Mi, so the replicas need to be scaled up.
  • The calculated Current Metric Value for CPU, i.e., 36%, is lower than the Target Average Utilization of 50%, so this metric on its own does not call for a scale-up.
  • A replica count is calculated for each metric and the highest count is selected, so the replicas are scaled up to 6 in this case (see the sketch below).
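
To make the arithmetic concrete, here is a minimal Python sketch of the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), applied to the numbers above. The helper name is ours, and the sketch ignores the tolerance and stabilization behaviour of the real controller; it is only meant to show where the "6 desired" comes from.

import math

def desired_replicas(current_replicas, current_value, target_value):
    # Core HPA rule: scale the replica count by the ratio of the
    # observed metric to its target, rounded up.
    return math.ceil(current_replicas * current_value / target_value)

current_replicas = 3

# Memory: current average 894188202666m (milli-bytes, ~853Mi) vs. target 500Mi.
memory_replicas = desired_replicas(current_replicas,
                                   894188202666 / 1000,   # strip the milli suffix -> bytes
                                   500 * 1024 * 1024)     # 500Mi in bytes

# CPU: current average utilization 36% vs. target 50%.
cpu_replicas = desired_replicas(current_replicas, 36, 50)

# With multiple metrics, the HPA proposes a count per metric and picks the largest.
print(memory_replicas, cpu_replicas, max(memory_replicas, cpu_replicas))   # 6 3 6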

HPA using Custom Metrics

We will use the prometheus-adapter resource to expose custom application metrics to custom.metrics.k8s.io/v1beta1, which are retrieved by HPA. By defining our own metrics through the adapter’s configuration, we can let HPA perform scaling based on our custom metrics.

A. Setup: Install Prometheus Adapter

Create prometheus-adapter.yaml with the content below:

prometheus:
  url: http://prometheus-server
  port: 0
image:
  tag: latest
rules:
  custom:
  - seriesQuery: 'container_network_receive_packets_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "container_network_receive_packets_total"
      as: "packets_in"
    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'

helm install stable/prometheus -n prometheus --namespace prometheus
helm install stable/prometheus-adapter -n prometheus-adapter --namespace prometheus -f prometheus-adapter.yaml

Once the charts are deployed, verify the metrics are exposed at v1beta1.custom.metrics.k8s.io:

kubectl get apiservice
NAME                            SERVICE                         AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io   prometheus/prometheus-adapter   True        19m
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/*/packets_in | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/%2A/packets_in"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "autoscale-hpa",
        "name": "autoscale-tester-878b8c6c8-42gmk",
        "apiVersion": "/v1"
      },
      "metricName": "packets_in",
      "timestamp": "2020-07-31T05:59:33Z",
      "value": "33",
      "selector": null
    },
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "autoscale-hpa",
        "name": "autoscale-tester-878b8c6c8-hfts8",
        "apiVersion": "/v1"
      },
      "metricName": "packets_in",
      "timestamp": "2020-07-31T05:59:33Z",
      "value": "11",
      "selector": null
    },
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "autoscale-hpa",
        "name": "autoscale-tester-878b8c6c8-rb9v2",
        "apiVersion": "/v1"
      },
      "metricName": "packets_in",
      "timestamp": "2020-07-31T05:59:33Z",
      "value": "10",
      "selector": null
    }
  ]
}

You can see the metrics value of all the replicas in the output.

B. Understanding Prometheus Adapter Configuration

The adapter considers metrics defined with the parameters below:

1. seriesQuery tells the adapter which Prometheus series (metric name) to discover.

2. resources tells the adapter which Kubernetes resources each metric is associated with, i.e., which labels on the metric map to resources such as namespace and pod.

3. metricsQuery is the actual Prometheus query the adapter runs to calculate the metric values.

4. name defines the name under which the metric is exposed to the custom metrics API.

For instance, if we want to calculate the rate of container_network_receive_packets_total, we will need to write this query in Prometheus UI:

sum(rate(container_network_receive_packets_total{namespace="autoscale-tester",pod=~"autoscale-tester.*"}[10m])) by (pod)

This query is represented as below in the adapter configuration:

metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'
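
As a rough illustration of how the adapter expands that template (the real prometheus-adapter evaluates it as a Go template, so this Python string substitution is only a sketch), the placeholders resolve to the series name, the label matchers derived from the API request, and the grouping resource:

# Hypothetical illustration of the placeholder expansion, not the adapter's actual templating code.
template = 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'

expanded = (template
            .replace('<<.Series>>', 'container_network_receive_packets_total')
            .replace('<<.LabelMatchers>>',
                     'namespace="autoscale-tester",pod=~"autoscale-tester.*"')
            .replace('<<.GroupBy>>', 'pod'))

print(expanded)
# sum(rate(container_network_receive_packets_total{namespace="autoscale-tester",pod=~"autoscale-tester.*"}[10m])) by (pod)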

C. Create an HPA resource

Now, let's create an HPA resource with the pod metric packets_in using the config below, and then describe the HPA resource.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscale-tester
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: autoscale-tester
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: packets_in
      target:
        type: AverageValue
        averageValue: 50

kubectl describe hpa autoscale-tester
Name:          autoscale-tester
Namespace:     autoscale-tester
...
Metrics:       ( current / target )
  "packets_in" on pods:  18666m / 50
Min replicas:  1
Max replicas:  10
Deployment pods:  3 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 2
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric packets_in
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age    From                       Message
  ----    ------             ---    ----                       -------
  Normal  SuccessfulRescale  2s     horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  2m51s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Here, the current calculated metric value is 18666m. The m represents milli-units, so 18666m means 18.666, which lines up with the mean of the per-pod values sampled earlier ((33 + 11 + 10) / 3 = 18; the HPA's reading differs slightly because it samples at a different moment). Since this is less than the target average value of 50, the HPA scales down the replicas to bring the Current Metric Value : Target Metric Value ratio as close to 1 as possible. Hence, the replicas are scaled down to 2 and later to 1.
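
As a quick sanity check, the same scaling formula as before (a sketch, not the controller's actual code) reproduces the first step down to 2 replicas from the reported value:

import math

current = 18.666   # HPA-reported average packets_in (shown as 18666m)
target = 50
print(math.ceil(3 * current / target))   # ceil(1.12) = 2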

Fig:- Prometheus: container_network_receive_packets_total{namespace="autoscale-tester"}


Fig:- Prometheus: ratio of avg(container_network_receive_packets_total{namespace="autoscale-tester"}) to the Target Average Value

Vertical Pod Autoscaling

What is the Vertical Pod Autoscaler?

The Vertical Pod Autoscaler (VPA) ensures that a container’s resources are not under- or over-utilized. It recommends optimized CPU and memory request/limit values, and can also apply them automatically for you so that cluster resources are used efficiently.

Fig:- Vertical Pod Autoscaling

Architecture

VPA consists of 3 components:

  • VPA admission controller
    Once you deploy and enable the Vertical Pod Autoscaler in your cluster, every pod submitted to the cluster goes through this webhook, which checks whether a VPA object references it and, if so, rewrites the pod’s resource requests with the recommended values.
  • VPA recommender
    The recommender pulls current and historical resource consumption (CPU and memory) data for each container from the metrics-server running in the cluster and provides optimal resource recommendations based on it, so that a container uses only what it needs.
  • VPA updater
    The updater checks at regular intervals whether each pod is running within the recommended range. If it isn’t, the updater accepts the pod for update and evicts it, so that it is recreated with the resource recommendation applied.

Installation

If you are on Google Cloud Platform, you can simply enable vertical-pod-autoscaling:

gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling

To install it manually, follow the steps below:

  • Verify that the metrics-server deployment is running, or deploy it using instructions here.

kubectl get deployment metrics-server -n kube-system

  • Also, verify the API below is enabled:

kubectl api-versions | grep admissionregistration
admissionregistration.k8s.io/v1beta1

  • Clone the kubernetes/autoscaler GitHub repository, and then deploy the Vertical Pod Autoscaler with the following command.

git clone https://github.com/kubernetes/autoscaler.git
./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh

Verify that the Vertical Pod Autoscaler pods are up and running:

kubectl get po -n kube-system
NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-68c748777d-ppspd   1/1     Running   0          7s
vpa-recommender-6fc8c67d85-gljpl            1/1     Running   0          8s
vpa-updater-786b96955c-bgp9d                1/1     Running   0          8s
kubectl get crd
verticalpodautoscalers.autoscaling.k8s.io

VPA using Resource Metrics

A. Setup: Create a Deployment and VPA resource

Use the same deployment config to create a new deployment with "--vm-bytes", "850M". Then create a VPA resource in Recommendation Mode with updateMode: "Off".

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: autoscale-tester-recommender
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: autoscale-tester
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: autoscale-tester
      minAllowed:
        cpu: "500m"
        memory: "500Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"

  • minAllowed is an optional parameter that specifies the minimum CPU request and memory request allowed for the container. 
  • maxAllowed is an optional parameter that specifies the maximum CPU request and memory request allowed for the container.

B. Check the Pod’s Resource Utilization

Check the resource utilization of the pods. Below, you can see that only ~50Mi of memory is being used out of the 1000Mi requested, and only ~30m of CPU out of 1000m. This clearly indicates that the pod resources are underutilized.

kubectl top po
NAME                                CPU(cores)   MEMORY(bytes)
autoscale-tester-5d6b48d64f-8zgb9   39m          51Mi
autoscale-tester-5d6b48d64f-npts4   32m          50Mi
autoscale-tester-5d6b48d64f-vctx5   35m          50Mi

If you describe the VPA resource, you can see the Recommendations provided. (It may take some time to show them.)

kubectl describe vpa autoscale-tester-recommender
Name:         autoscale-tester-recommender
Namespace:    autoscale-tester
...
Recommendation:
  Container Recommendations:
    Container Name:  autoscale-tester
    Lower Bound:
      Cpu:     500m
      Memory:  500Mi
    Target:
      Cpu:     500m
      Memory:  500Mi
    Uncapped Target:
      Cpu:     93m
      Memory:  262144k
    Upper Bound:
      Cpu:     4
      Memory:  4Gi

C. Understand the VPA recommendations

Target: The recommended CPU request and memory request for the container that will be applied to the pod by VPA.

Uncapped Target: The recommended CPU request and memory request for the container if you didn’t configure upper/lower limits in the VPA definition. These values will not be applied to the pod. They’re used only as a status indication.

Lower Bound: The minimum recommended CPU request and memory request for the container. There is a --pod-recommendation-min-memory-mb flag that determines the minimum amount of memory the recommender will set—it defaults to 250MiB.

Upper Bound: The maximum recommended CPU request and memory request for the container. It helps the VPA updater avoid evicting pods that are already close to the recommended target values. As the recommender gathers more history, the Upper Bound is expected to converge toward the Target recommendation, as in the later snapshot below:

Recommendation:
  Container Recommendations:
    Container Name:  autoscale-tester
    Lower Bound:
      Cpu:     500m
      Memory:  500Mi
    Target:
      Cpu:     500m
      Memory:  500Mi
    Uncapped Target:
      Cpu:     93m
      Memory:  262144k
    Upper Bound:
      Cpu:     500m
      Memory:  1274858485

D. VPA processing with Update Mode Off/Auto

Now, if you check the logs of vpa-updater, you can see it's not processing VPA objects as the Update Mode is set as Off.

kubectl logs -f vpa-updater-675d47464b-k7xbx
1 updater.go:135] skipping VPA object autoscale-tester-recommender because its mode is not "Recreate" or "Auto"
1 updater.go:151] no VPA objects to process

VPA allows various Update Modes, detailed here.

Let's change the VPA updateMode to “Auto” to see the processing.

As soon as you do that, you can see vpa-updater has started processing objects, and it's terminating all 3 pods.

kubectl logs -f vpa-updater-675d47464b-k7xbx
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-8zgb9 with priority 1
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-npts4 with priority 1
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-vctx5 with priority 1
1 updater.go:193] evicting pod autoscale-tester-5d6b48d64f-8zgb9
1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"autoscale-tester", Name:"autoscale-tester-5d6b48d64f-8zgb9", UID:"ed8c54c7-a87a-4c39-a000-0e74245f18c6", APIVersion:"v1", ResourceVersion:"378376", FieldPath:""}):
type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

You can also check the logs of vpa-admission-controller:

kubectl logs -f vpa-admission-controller-bbf4f4cc7-cb6pb
Sending patches: [{add /metadata/annotations map[]} {add /spec/containers/0/resources/requests/cpu 500m} {add /spec/containers/0/resources/requests/memory 500Mi} {add /spec/containers/0/resources/limits/cpu 500m} {add /spec/containers/0/resources/limits/memory 500Mi} {add /metadata/annotations/vpaUpdates Pod resources updated by autoscale-tester-recommender: container 0: cpu request, memory request, cpu limit, memory limit} {add /metadata/annotations/vpaObservedContainers autoscale-tester}]

NOTE: Ensure that you have more than one running replica. Otherwise, the pods won’t be restarted, and vpa-updater will give you this warning:

1 pods_eviction_restriction.go:209] too few replicas for ReplicaSet autoscale-tester/autoscale-tester1-7698974f6. Found 1 live pods

Now, describe the new pods created and check that the resources match the Target recommendations:

kubectl get po
NAME                                READY   STATUS    RESTARTS   AGE
autoscale-tester-5d6b48d64f-5dlb7   1/1     Running   0          77s
autoscale-tester-5d6b48d64f-9wq4w   1/1     Running   0          37s
autoscale-tester-5d6b48d64f-qrlxn   1/1     Running   0          17s
kubectl describe po autoscale-tester-5d6b48d64f-5dlb7
Name:         autoscale-tester-5d6b48d64f-5dlb7
Namespace:    autoscale-tester
...
Limits:
  cpu:     500m
  memory:  500Mi
Requests:
  cpu:     500m
  memory:  500Mi
Environment:  <none>

The Target recommendation cannot go below the minAllowed defined in the VPA spec.
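
A minimal sketch of that capping behaviour (our own helper, not VPA code): the recommender's uncapped estimate is clamped into the [minAllowed, maxAllowed] range from the resourcePolicy, which is why the 93m / ~250Mi Uncapped Target above becomes a 500m / 500Mi Target.

def capped_target(uncapped, min_allowed, max_allowed):
    # Clamp the recommender's raw estimate into the allowed range.
    return min(max(uncapped, min_allowed), max_allowed)

# CPU in millicores, memory in bytes, using the values from the describe output above.
print(capped_target(93, 500, 4000))                              # 500 (Target CPU: 500m)
print(capped_target(262_144_000, 500 * 1024**2, 8 * 1024**3))    # 524288000 (Target memory: 500Mi)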

Fig:- Prometheus: Memory Usage Ratio

E. Stress Loading Pods

Let’s recreate the deployment with memory request and limit set to 2000Mi and "--vm-bytes", "500M".

Gradually stress load one of these pods to increase its memory utilization. You can log in to the pod and run stress --vm 1 --vm-bytes 1400M --timeout 120000s.

kubectl top po
NAME                                CPU(cores)   MEMORY(bytes)
autoscale-tester-5d6b48d64f-5dlb7   1000m        1836Mi
autoscale-tester-5d6b48d64f-9wq4w   252m         501Mi
autoscale-tester-5d6b48d64f-qrlxn   252m         501Mi

Fig:- Prometheus: memory utilized by each replica

You will notice that the VPA recommendation is also calculated accordingly and applied to all replicas.

kubectl describe vpa autoscale-tester-recommender
Name:         autoscale-tester-recommender
Namespace:    autoscale-tester
...
Recommendation:
  Container Recommendations:
    Container Name:  autoscale-tester
    Lower Bound:
      Cpu:     500m
      Memory:  500Mi
    Target:
      Cpu:     500m
      Memory:  628694953
    Uncapped Target:
      Cpu:     49m
      Memory:  628694953
    Upper Bound:
      Cpu:     500m
      Memory:  1553712527

Limits vs. Requests
VPA always works with the requests defined for a container, not the limits. The VPA recommendations are therefore applied to the container requests, and the limit-to-request ratio specified in the original container configuration is preserved for all containers.

For example, if the initial container configuration defines a 100MiB memory request and a 300MiB memory limit, then when the VPA target recommendation is 150MiB of memory, the container memory request will be updated to 150MiB and the memory limit to 450MiB.
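
A small sketch of that proportional update (an illustrative helper based on the ratio-preserving behaviour described above, not the admission controller's actual code):

def apply_recommendation(request, limit, target_request):
    # Keep the original limit:request ratio when the request is raised to the target.
    ratio = limit / request
    return target_request, target_request * ratio

# Original container: 100MiB request, 300MiB limit; VPA target: 150MiB.
print(apply_recommendation(100, 300, 150))   # (150, 450.0)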

Selective Container Scaling

If you have a pod with multiple containers and you want to opt some of them out, you can use the "Off" mode to turn off recommendations for those containers.

You can also set containerName: "*" to include all containers.

spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: autoscale-tester
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: autoscale-tester
      minAllowed:
        cpu: "500m"
        memory: "500Mi"
      maxAllowed:
        cpu: "4"
        memory: "4Gi"
    - containerName: opt-out-container
      mode: "Off"

Conclusion

Both the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler serve different purposes, and one can be more useful than the other depending on your application’s requirements.

The HPA can be useful when, for example, your application serves a large number of lightweight (low resource-consuming) requests. In that case, scaling the number of replicas distributes the workload across the pods. The VPA, on the other hand, can be useful when your application serves heavyweight requests that require more resources.

Related Articles:

1. A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

2. Know Everything About Spinnaker & How to Deploy Using Kubernetes Engine


Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings
