In this post, we implement a HorizontalPodAutoscaler driven by the custom metrics exposed through the Prometheus adapter.
Understanding prometheus-adapter Rules - Deep Dive
https://blog.encicle.com/prometheus-adapter-rule-ihae-deep-dive
If you followed the example above, the custom metric built from Traefik's metrics is now stored under three separate resources, as shown below.
# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq .resources[].name
"ingresses.networking.k8s.io/traefik_service_requests_sum_incr_1m"
"namespaces/traefik_service_requests_sum_incr_1m"
"pods/traefik_service_requests_sum_incr_1m"
Adding the pods/traefik_service_requests_sum_incr_1m metric for the traefik Deployment
Query the metric as shown below to check exactly what is being stored.
# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/traefik_service_requests_sum_incr_1m | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "traefik-6cc49fcdfb-762cm",
        "apiVersion": "/v1"
      },
      "metricName": "traefik_service_requests_sum_incr_1m",
      "timestamp": "2024-11-18T07:34:58Z",
      "value": "0",
      "selector": null
    },
    ...
  ]
}
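The same endpoint also accepts a single pod name in place of the *, which is handy for spot-checking one replica; the pod name below is taken from the output above:

# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/traefik-6cc49fcdfb-762cm/traefik_service_requests_sum_incr_1m | jq .items[0].value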
Deploy a HorizontalPodAutoscaler configured to reference the metric above, as shown below.
Note that the "kind" (Pod) from the metric API response corresponds to the metric "type" (Pods) in the HPA spec.
- type: Pods
  pods:
    metric:
      name: traefik_service_requests_sum_incr_1m
    target:
      type: AverageValue
      averageValue: 300
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: traefik
  namespace: default
spec:
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: traefik_service_requests_sum_incr_1m
      target:
        type: AverageValue
        averageValue: 300
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: traefik
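Assuming the manifest above is saved as traefik-hpa.yaml (the filename is arbitrary), apply it:

# kubectl apply -f traefik-hpa.yaml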
After applying it, the HorizontalPodAutoscaler shows up as follows.
# kubectl get hpa
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
...
traefik   Deployment/traefik   6%/80%, 0/300   1         20        1          5d2h
Looking at the details with describe, everything is represented accurately.
# kubectl describe hpa traefik
Name:                traefik
Namespace:           default
Labels:              app.kubernetes.io/instance=traefik-default
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=traefik
                     helm.sh/chart=traefik-33.0.0
Annotations:         <none>
CreationTimestamp:   Wed, 13 Nov 2024 14:03:48 +0900
Reference:           Deployment/traefik
Metrics:             ( current / target )
  resource cpu on pods (as a percentage of request):  5% (5m) / 80%
  "traefik_service_requests_sum_incr_1m" on pods:     0 / 300
Min replicas:        1
Max replicas:        20
Deployment pods:     1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age                From                       Message
  ----    ------             ----               ----                       -------
  Normal  SuccessfulRescale  4m5s (x2 over 5d)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
Setting up an Ingress with traefik on Kubernetes and configuring a service - https://blog.encicle.com/kubernetese-traefiklo-ingressleul-guseonghago-seobiseu-guseonghagi
With a service configured as in the post above, generating load against the echo pod (see the sketch below) pushes up Traefik's traefik_service_requests_sum_incr_1m metric, and the Pods start to scale out.
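A crude load generator is enough to trigger this. The sketch below assumes the echo service from the linked post is reachable through Traefik at the hypothetical host echo.example.com; substitute your own Ingress host.

# while true; do curl -s -o /dev/null http://echo.example.com/; done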
# kubectl describe hpa traefik
Name:                traefik
Namespace:           default
Labels:              app.kubernetes.io/instance=traefik-default
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=traefik
                     helm.sh/chart=traefik-33.0.0
Annotations:         <none>
CreationTimestamp:   Wed, 13 Nov 2024 14:03:48 +0900
Reference:           Deployment/traefik
Metrics:             ( current / target )
  resource cpu on pods (as a percentage of request):  345% (345m) / 80%
  "traefik_service_requests_sum_incr_1m" on pods:     19702666m / 300
Min replicas:        1
Max replicas:        20
Deployment pods:     4 current / 8 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 8
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric traefik_service_requests_sum_incr_1m
  ScalingLimited  True    ScaleUpLimit      the desired replica count is increasing faster than the maximum scale rate
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  5m51s (x2 over 5d)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  20s                 horizontal-pod-autoscaler  New size: 4; reason: pods metric traefik_service_requests_sum_incr_1m above target
  Normal  SuccessfulRescale  5s                  horizontal-pod-autoscaler  New size: 8; reason: pods metric traefik_service_requests_sum_incr_1m above target
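For reference, these numbers follow the standard HPA formula desiredReplicas = ceil(currentReplicas × currentAverage / targetAverage): ceil(4 × 19702.666 / 300) is roughly 263, which is capped by maxReplicas (20), and the intermediate jump from 4 to 8 comes from the default scale-up policy allowing at most a doubling per step, which is exactly what the ScaleUpLimit condition reports.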
Implementing a HorizontalPodAutoscaler for the echo-server Deployment using the Ingress-scoped metric
The metric above is collected from Traefik itself. Because the pods-scoped variant describes the traefik Deployment's own Pods, it cannot be tied to the echo-server Deployment. The variant split out to the Ingress resource, however, can be.
The data for the Ingress bound to the echo-server Deployment looks like this:
# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingresses.networking.k8s.io/echo-server-ing/traefik_service_requests_sum_incr_1m | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Ingress",
        "namespace": "default",
        "name": "echo-server-ing",
        "apiVersion": "networking.k8s.io/v1"
      },
      "metricName": "traefik_service_requests_sum_incr_1m",
      "timestamp": "2024-11-18T07:51:44Z",
      "value": "0",
      "selector": null
    }
  ]
}
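While load testing, it is convenient to watch just the value; the same endpoint can be filtered with jq:

# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingresses.networking.k8s.io/echo-server-ing/traefik_service_requests_sum_incr_1m | jq .items[0].value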
Wiring an Ingress into a HorizontalPodAutoscaler works slightly differently from the Pods approach above.
- type: Object
  object:
    metric:
      name: traefik_service_requests_sum_incr_1m
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: echo-server-ing
    target:
      type: AverageValue
      averageValue: 200
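One nuance of the Object type: with target type Value the HPA compares the metric against the target as-is, whereas AverageValue first divides it by the current replica count of the scale target, so desiredReplicas works out to ceil(metricTotal / averageValue). AverageValue is the right fit here, because the Ingress metric counts total requests across all echo-server replicas.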
As extracted from the API response above, the describedObject's apiVersion, kind, and name, together with the metric name, are all required.
Compose and apply the full manifest as follows.
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-server-dpm
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Object
    object:
      metric:
        name: traefik_service_requests_sum_incr_1m
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: echo-server-ing
      target:
        type: AverageValue
        averageValue: 200
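As before, assuming the manifest is saved as echo-server-hpa.yaml (again, an arbitrary name), apply it:

# kubectl apply -f echo-server-hpa.yaml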
After applying it, output like the following means the metric is wired up correctly.
# kubectl describe hpa echo-server-hpa
Name:                echo-server-hpa
Namespace:           default
Labels:              <none>
Annotations:         <none>
CreationTimestamp:   Wed, 13 Nov 2024 14:04:42 +0900
Reference:           Deployment/echo-server-dpm
Metrics:             ( current / target )
  resource cpu on pods (as a percentage of request):  1% (1m) / 80%
  "traefik_service_requests_sum_incr_1m" on Ingress/echo-server-ing (target average value):  0 / 200
Min replicas:        2
Max replicas:        20
Deployment pods:     2 current / 2 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  12m (x4 over 5d1h)  horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
Generating load then scales the Pods out, as shown below.
# kubectl describe hpa echo-server-hpa
Name:                echo-server-hpa
Namespace:           default
Labels:              <none>
Annotations:         <none>
CreationTimestamp:   Wed, 13 Nov 2024 14:04:42 +0900
Reference:           Deployment/echo-server-dpm
Metrics:             ( current / target )
  resource cpu on pods (as a percentage of request):  4% (4m) / 80%
  "traefik_service_requests_sum_incr_1m" on Ingress/echo-server-ing (target average value):  381778m / 200
Min replicas:        2
Max replicas:        20
Deployment pods:     3 current / 6 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 6
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric traefik_service_requests_sum_incr_1m(nil)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  6m18s (x4 over 5d1h)  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  33s                   horizontal-pod-autoscaler  New size: 3; reason: external metric traefik_service_requests_sum_incr_1m(nil) above target
  Normal  SuccessfulRescale  2s                    horizontal-pod-autoscaler  New size: 6; reason: external metric traefik_service_requests_sum_incr_1m(nil) above target