Using custom metrics exposed through the Prometheus adapter, we implement a HorizontalPodAutoscaler.

 

Understanding prometheus adapter rules - Deep Dive
https://blog.encicle.com/prometheus-adapter-rule-ihae-deep-dive

 

If you followed the example above, custom metrics for traefik are being stored split across three resources, as shown below.

# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ |jq .resources[].name
"ingresses.networking.k8s.io/traefik_service_requests_sum_incr_1m"
"namespaces/traefik_service_requests_sum_incr_1m"
"pods/traefik_service_requests_sum_incr_1m"

 

Adding the pods/traefik_service_requests_sum_incr_1m metric for the traefik Deployment

First, call the metric API as shown below to confirm the exact details of the stored metric.

# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/traefik_service_requests_sum_incr_1m  | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "traefik-6cc49fcdfb-762cm",
        "apiVersion": "/v1"
      },
      "metricName": "traefik_service_requests_sum_incr_1m",
      "timestamp": "2024-11-18T07:34:58Z",
      "value": "0",
      "selector": null
    },
    ...
  ]
}

Deploy a HorizontalPodAutoscaler configured as follows so that it references the metric above.

What is applied - the metric's "kind" (Pod) maps to the HPA metric "type" (Pods):

  - type: Pods
    pods:
      metric:
        name: traefik_service_requests_sum_incr_1m
      target:
        type: AverageValue
        averageValue: 300
The full manifest is as follows.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: traefik
  namespace: default
spec:
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: traefik_service_requests_sum_incr_1m
      target:
        type: AverageValue
        averageValue: 300
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: traefik

After applying it, checking the HorizontalPodAutoscaler shows the following.

# kubectl get hpa
NAME              REFERENCE                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
...
traefik           Deployment/traefik           6%/80%, 0/300   1         20        1          5d2h

Checking the details with describe shows the metric represented accurately.

# kubectl describe hpa traefik
Name:                                                  traefik
Namespace:                                             default
Labels:                                                app.kubernetes.io/instance=traefik-default
                                                       app.kubernetes.io/managed-by=Helm
                                                       app.kubernetes.io/name=traefik
                                                       helm.sh/chart=traefik-33.0.0
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 13 Nov 2024 14:03:48 +0900
Reference:                                             Deployment/traefik
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  5% (5m) / 80%
  "traefik_service_requests_sum_incr_1m" on pods:      0 / 300
Min replicas:                                          1
Max replicas:                                          20
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age                From                       Message
  ----    ------             ----               ----                       -------
  Normal  SuccessfulRescale  4m5s (x2 over 5d)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

 

Configuring Ingress with traefik on kubernetes and building a service - https://blog.encicle.com/kubernetese-traefiklo-ingressleul-guseonghago-seobiseu-guseonghagi

With a service configured as above, generating load against the echo pod drives up traefik's traefik_service_requests_sum_incr_1m metric, and the pods begin to scale up.

# kubectl describe hpa traefik
Name:                                                  traefik
Namespace:                                             default
Labels:                                                app.kubernetes.io/instance=traefik-default
                                                       app.kubernetes.io/managed-by=Helm
                                                       app.kubernetes.io/name=traefik
                                                       helm.sh/chart=traefik-33.0.0
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 13 Nov 2024 14:03:48 +0900
Reference:                                             Deployment/traefik
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  345% (345m) / 80%
  "traefik_service_requests_sum_incr_1m" on pods:      19702666m / 300
Min replicas:                                          1
Max replicas:                                          20
Deployment pods:                                       4 current / 8 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 8
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric traefik_service_requests_sum_incr_1m
  ScalingLimited  True    ScaleUpLimit      the desired replica count is increasing faster than the maximum scale rate
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  5m51s (x2 over 5d)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  20s                 horizontal-pod-autoscaler  New size: 4; reason: pods metric traefik_service_requests_sum_incr_1m above target
  Normal  SuccessfulRescale  5s                  horizontal-pod-autoscaler  New size: 8; reason: pods metric traefik_service_requests_sum_incr_1m above target
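The scale-up from 4 to 8 replicas above follows the core HPA formula, capped by the scale-up rate limit. A simplified sketch of the calculation (the function name and the rate-limit approximation are mine; the real controller also applies tolerance and stabilization windows):

```python
import math

def desired_replicas(current, metric_avg, target_avg, min_r, max_r):
    # Core HPA formula for a Pods metric with an AverageValue target:
    # desired = ceil(currentReplicas * currentAverage / targetAverage)
    desired = math.ceil(current * metric_avg / target_avg)
    # Rough model of the default scale-up policy: at most the larger of
    # doubling the pods or adding 4 pods per step.
    desired = min(desired, max(current * 2, current + 4))
    return max(min_r, min(desired, max_r))

# Snapshot from the describe output above: 4 pods averaging 19702666m
# (about 19702.7) requests per minute against a 300 target.
print(desired_replicas(4, 19702.666, 300, 1, 20))  # → 8
```

The raw desired count here is huge (ceil(4 * 19702.7 / 300) = 263), so the result is bounded by the scale-up limit first and by maxReplicas on later steps, which is exactly what the ScaleUpLimit condition above reports.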

 

Implementing a HorizontalPodAutoscaler for the echo-server deployment using the ingress metric

The custom metrics are scraped from traefik itself.

Because traefik metrics are traefik deployment data, they cannot be tied directly to the echo-server deployment.
However, the metric split out as an ingress resource can be.

The data for the ingress associated with the echo-server deployment looks like this:

# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingresses.networking.k8s.io/echo-server-ing/traefik_service_requests_sum_incr_1m | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Ingress",
        "namespace": "default",
        "name": "echo-server-ing",
        "apiVersion": "networking.k8s.io/v1"
      },
      "metricName": "traefik_service_requests_sum_incr_1m",
      "timestamp": "2024-11-18T07:51:44Z",
      "value": "0",
      "selector": null
    }
  ]
}

Applying an ingress metric to a HorizontalPodAutoscaler works slightly differently from the pod-based setup above.

  - type: Object
    object:
      metric:
        name: traefik_service_requests_sum_incr_1m
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: echo-server-ing
      target:
        type: AverageValue
        averageValue: 200

As shown above, all of the apiVersion, kind, name, and metric name values extracted from the API response are required.

Put together and apply the following configuration.

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-server-dpm
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Object
    object:
      metric:
        name: traefik_service_requests_sum_incr_1m
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: echo-server-ing
      target:
        type: AverageValue
        averageValue: 200
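One difference worth noting: with an Object metric and an AverageValue target there is a single metric value on the Ingress, and the controller divides it by the current replica count before comparing, so the replica math reduces to desired = ceil(value / averageValue). A minimal sketch with made-up numbers (function name is hypothetical; rate limits and stabilization are ignored):

```python
import math

def desired_from_object_metric(metric_value, average_value, min_r, max_r):
    # Object metric with an AverageValue target: the raw value on the
    # object is divided by the replica count before comparison, which
    # makes the desired count ceil(value / averageValue).
    desired = math.ceil(metric_value / average_value)
    return max(min_r, min(desired, max_r))

# Made-up example: 1150 requests/min on the Ingress, averageValue 200.
print(desired_from_object_metric(1150, 200, 2, 20))  # → 6
```

Unlike the Pods metric, the result does not depend on how many pods currently exist, only on the one value attached to the Ingress object.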

After applying it, if the output looks like the following, the integration is working.

# kubectl describe hpa echo-server-hpa
Name:                                                                                        echo-server-hpa
Namespace:                                                                                   default
Labels:                                                                                      <none>
Annotations:                                                                                 <none>
CreationTimestamp:                                                                           Wed, 13 Nov 2024 14:04:42 +0900
Reference:                                                                                   Deployment/echo-server-dpm
Metrics:                                                                                     ( current / target )
  resource cpu on pods  (as a percentage of request):                                        1% (1m) / 80%
  "traefik_service_requests_sum_incr_1m" on Ingress/echo-server-ing (target average value):  0 / 200
Min replicas:                                                                                2
Max replicas:                                                                                20
Deployment pods:                                                                             2 current / 2 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  12m (x4 over 5d1h)    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target

Generating load increases the pods as follows.

# kubectl describe hpa echo-server-hpa
Name:                                                                                        echo-server-hpa
Namespace:                                                                                   default
Labels:                                                                                      <none>
Annotations:                                                                                 <none>
CreationTimestamp:                                                                           Wed, 13 Nov 2024 14:04:42 +0900
Reference:                                                                                   Deployment/echo-server-dpm
Metrics:                                                                                     ( current / target )
  resource cpu on pods  (as a percentage of request):                                        4% (4m) / 80%
  "traefik_service_requests_sum_incr_1m" on Ingress/echo-server-ing (target average value):  381778m / 200
Min replicas:                                                                                2
Max replicas:                                                                                20
Deployment pods:                                                                             3 current / 6 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 6
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric traefik_service_requests_sum_incr_1m(nil)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  6m18s (x4 over 5d1h)  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  33s                   horizontal-pod-autoscaler  New size: 3; reason: external metric traefik_service_requests_sum_incr_1m(nil) above target
  Normal  SuccessfulRescale  2s                    horizontal-pod-autoscaler  New size: 6; reason: external metric traefik_service_requests_sum_incr_1m(nil) above target