Spring Boot Actuator Health Probe Integration Guide

Posted on 2026-01-29 · Categories: Java, JavaJar, Container, Kubernetes

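1. Background

1.1 What is a health probe?

In cloud-native and microservice architectures, health probes are the mechanism a container orchestration platform such as Kubernetes uses to monitor the running state of an application. By periodically calling the health check endpoints the application exposes, the platform can tell whether the application is working and make scheduling decisions accordingly.

Kubernetes supports three types of probes: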

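| Probe type | Purpose | Consequence of failure |
| --- | --- | --- |
| Liveness probe | Checks whether the application is still alive | The container is restarted |
| Readiness probe | Checks whether the application is ready to receive traffic | The Pod is removed from the Service endpoints |
| Startup probe | Checks whether the application has finished starting | Liveness and readiness probes are held back until it succeeds; if it keeps failing, the container is restarted |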

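1.2 Why are health probes needed?

In production, an application can run into all kinds of abnormal conditions:

- Deadlock: the process exists but cannot handle requests
- Memory leak: responses get slower and slower until the application can no longer serve
- Dependency failure: a database connection drops or a downstream service becomes unavailable
- Slow startup: the application needs a long time to initialize

Without health probes, Kubernetes can only judge the application by whether its process is still running and cannot detect these "zombie" states. With health probes, the platform can:

- Automatically restart unhealthy containers (liveness failure)
- Avoid sending traffic to instances that are not ready (readiness failure)
- Support zero-downtime rolling updates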

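1.3 Technology choice

For Spring Boot projects, the official Spring Boot Actuator module ships with a complete health check mechanism and native support for Kubernetes probes. Compared with hand-writing a /health endpoint, Actuator has the following advantages:

✅ Works out of the box with little configuration
✅ Health checks for databases, Redis, message queues and similar components are auto-configured
✅ Native support for Kubernetes liveness and readiness probes
✅ Extensible through custom health indicators
✅ Released in step with Spring Boot, so versions stay compatible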

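2. Project integration

2.1 Add the Maven dependency

Add the Actuator dependency inside the <dependencies> element of pom.xml:

```xml
<!-- Spring Boot Actuator for health probes -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

Note: because the project already inherits from spring-boot-starter-parent (version 2.3.5.RELEASE), no explicit version is needed; Maven uses the version managed by the parent POM. Kubernetes probe support was introduced in Spring Boot 2.3, so this version is sufficient.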

2.2 Configure application.yml

Add the following to src/main/resources/application.yml:

```yaml
# Actuator health probe configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info
      base-path: /actuator
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState,db
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
```

3. Configuration details

3.1 Endpoint exposure

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info
      base-path: /actuator
```

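| Property | Description |
| --- | --- |
| `exposure.include` | Endpoints exposed over HTTP. In Spring Boot 2.3 the default is `health` and `info` only. For security reasons, exposing `*` (all endpoints) is not recommended |
| `base-path` | Base path of the endpoints; defaults to `/actuator` and can be changed if needed |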

3.2 Health endpoint configuration

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState,db
```

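| Property | Description |
| --- | --- |
| `probes.enabled` | Enables Kubernetes probe support, which exposes the `/actuator/health/liveness` and `/actuator/health/readiness` endpoints |
| `show-details` | Level of health detail shown: `never`, `when-authorized`, or `always`. `when-authorized` is recommended for production |
| `group.liveness.include` | Checks included in the liveness probe; `livenessState` means only the application's own liveness is checked |
| `group.readiness.include` | Checks included in the readiness probe; `readinessState` plus `db` means the application's readiness state and the database connection are checked |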

3.3 Health state indicator configuration

```yaml
management:
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
```

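These two properties register Spring Boot's built-in liveness state and readiness state indicators. Probe support is switched on automatically only when Spring Boot detects that it is running on Kubernetes; elsewhere, for example on a developer machine, it must be enabled explicitly. The `probes.enabled: true` setting from section 3.2 is what exposes the /health/liveness and /health/readiness groups, and the two indicator properties make sure the `livenessState` and `readinessState` checks those groups refer to are present.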

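3.4 Liveness vs. readiness: design principles

Separating what the two probes check is essential:

| Probe | Should check | Should not check |
| --- | --- | --- |
| Liveness | Whether the application itself is stuck in an unrecoverable state (for example a deadlock) | External dependencies (database, downstream services) |
| Readiness | Whether the application can handle requests and whether the dependencies it needs are available | - |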

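Why should liveness not check external dependencies?

Suppose the liveness probe included the database check. During a short database outage:

1. The liveness check fails on every Pod
2. Kubernetes restarts every Pod
3. The database is still unavailable after the restart, so the checks keep failing
4. The result is a restart storm that makes the system even less stable

The right approach is for liveness to check only the application itself and for readiness to check dependencies. When the database fails, the Pods are removed from the Service but not restarted; once the database recovers, the readiness check passes again and traffic resumes automatically.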
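
The difference can be made concrete with a small, framework-free model. The sketch below is not Spring Boot's implementation; it assumes a simplified rule in which a health group reports DOWN as soon as any included component is DOWN, and the names `ProbeGroupModel` and `aggregate` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of health groups; hypothetical names, not Spring Boot API. */
public class ProbeGroupModel {

    public enum Status { UP, DOWN }

    /** A group is DOWN if any component it includes is DOWN, otherwise UP. */
    public static Status aggregate(Map<String, Status> components, List<String> included) {
        for (String name : included) {
            if (components.get(name) == Status.DOWN) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        // Scenario from the text: the application is fine, the database is briefly down.
        Map<String, Status> components = new LinkedHashMap<>();
        components.put("livenessState", Status.UP);
        components.put("readinessState", Status.UP);
        components.put("db", Status.DOWN);

        Status liveness = aggregate(components, List.of("livenessState"));
        Status readiness = aggregate(components, List.of("readinessState", "db"));

        System.out.println("liveness=" + liveness);   // stays UP, so the container keeps running
        System.out.println("readiness=" + readiness); // goes DOWN, so the Pod stops receiving traffic
    }
}
```

Had `db` been part of the liveness group as well, the same outage would have turned liveness DOWN on every Pod at once, which is exactly the restart storm described above.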

4. Deployment procedure

4.1 Step 1: release the Java application

Update dependencies and rebuild
```
mvn clean install
```

Verify locally
After starting the application, call the following endpoints to confirm the configuration is active:

```bash
# Overall health check
curl http://localhost:8080/actuator/health

# Liveness probe
curl http://localhost:8080/actuator/health/liveness

# Readiness probe
curl http://localhost:8080/actuator/health/readiness
```

Expected responses

Example response from /actuator/health/liveness:

```json
{
  "status": "UP",
  "components": {
    "livenessState": {
      "status": "UP"
    }
  }
}
```

Example response from /actuator/health/readiness:

```json
{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "MySQL",
        "validationQuery": "isValid()"
      }
    },
    "readinessState": {
      "status": "UP"
    }
  }
}
```

Build and push the image

```bash
# Package the application
mvn package -DskipTests

# Build the Docker image
docker build -t your-registry/xxx-xxxxx-xxxxxx:v1.1.0 .

# Push it to the image registry
docker push your-registry/xxx-xxxxx-xxxxxx:v1.1.0
```

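Deploy to Kubernetes
Update the Deployment to use the new image, and make sure the application runs normally and the health endpoints are reachable.

4.2 Step 2: configure the Kubernetes probes

Once the Java application is deployed and the health endpoints are confirmed to work, edit the Kubernetes Deployment and add the probes: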

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sms-alert-center
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sms-alert-center
  template:
    metadata:
      labels:
        app: sms-alert-center
    spec:
      containers:
      - name: sms-alert-center
        image: your-registry/xxx-xxxxx-xxxxxx:v1.1.0
        ports:
        - containerPort: 8080

        # Liveness probe
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60      # wait 60 seconds after container start before probing
          periodSeconds: 10            # probe every 10 seconds
          timeoutSeconds: 5            # timeout of a single probe
          failureThreshold: 3          # restart the container after 3 consecutive failures

        # Readiness probe
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30      # wait 30 seconds after container start before probing
          periodSeconds: 10            # probe every 10 seconds
          timeoutSeconds: 5            # timeout of a single probe
          failureThreshold: 3          # remove from the Service after 3 consecutive failures
          successThreshold: 1          # mark as ready after 1 success
```

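Probe parameters:

| Parameter | Description | Suggested value |
| --- | --- | --- |
| `initialDelaySeconds` | Wait time before the first probe; should be longer than the application's startup time | Liveness: 60s, Readiness: 30s |
| `periodSeconds` | Interval between probes | 10s |
| `timeoutSeconds` | Timeout of a single probe | 5s |
| `failureThreshold` | Number of consecutive failures before action is taken | 3 |
| `successThreshold` | Number of consecutive successes before the container counts as healthy again. Configurable only for readiness; it must be 1 for liveness and startup probes | 1 |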
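
To get a feel for what these values mean in wall-clock time, the following sketch does the arithmetic. It is a rough estimate under stated assumptions: probes fire exactly every `periodSeconds`, kubelet scheduling jitter is ignored, and in the worst case the last failing probe runs into its full timeout. `ProbeTiming` and its method names are illustrative and not part of any Kubernetes API.

```java
/** Back-of-the-envelope probe timing; hypothetical helper, not a Kubernetes API. */
public class ProbeTiming {

    /**
     * Earliest time after container start at which failureThreshold consecutive
     * failures can have been recorded: the first probe runs after the initial
     * delay, each further probe one period later.
     */
    public static int earliestActionSeconds(int initialDelay, int period, int failureThreshold) {
        return initialDelay + (failureThreshold - 1) * period;
    }

    /**
     * Rough upper bound on how long a running application can be broken before
     * the threshold is reached: it breaks right after a successful probe, then
     * failureThreshold probes fail, the last one only when its timeout expires.
     */
    public static int worstCaseDetectionSeconds(int period, int timeout, int failureThreshold) {
        return failureThreshold * period + timeout;
    }

    public static void main(String[] args) {
        // Liveness values from the Deployment above: 60s delay, 10s period, 5s timeout, 3 failures.
        System.out.println(earliestActionSeconds(60, 10, 3));
        System.out.println(worstCaseDetectionSeconds(10, 5, 3));
    }
}
```

With the suggested values, a container that never becomes healthy is restarted no earlier than about 80 seconds after start, and an application that hangs while running is detected within roughly 35 seconds.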

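4.3 Why release the application before configuring the probes?

This is an important best practice, for the following reasons:

- Avoiding failed deployments: if probes are configured while the application does not yet serve the health endpoints, the Pods cannot pass the health checks and end up restarting continuously or never becoming ready.
- Easier rollback: deploying in steps makes problems easier to locate. If something breaks after the application release, the code change is the cause; if it breaks after the probes are added, the probe parameters are unsuitable.
- Incremental verification: first make sure the application's health endpoints are reachable inside the cluster, then let Kubernetes depend on them.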

5. Advanced configuration

5.1 Custom health indicators

To add custom health check logic, implement the HealthIndicator interface:

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class AlertServiceHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // Custom check logic
        boolean alertServiceHealthy = checkAlertService();

        if (alertServiceHealthy) {
            return Health.up()
                    .withDetail("alertService", "Available")
                    .build();
        } else {
            return Health.down()
                    .withDetail("alertService", "Unavailable")
                    .withDetail("error", "Cannot connect to alert service")
                    .build();
        }
    }

    private boolean checkAlertService() {
        // Placeholder: implement the actual check here
        return true;
    }
}
```
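
The `checkAlertService()` stub above always returns true. One possible way to fill it in, assuming the alert service is reachable over TCP at a known host and port (both hypothetical here), is a connect attempt with a short timeout, so that the health endpoint never blocks for longer than the probe's `timeoutSeconds`. The sketch uses only the JDK, and `TcpCheck` is an illustrative name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper for a dependency check; uses only the JDK. */
public class TcpCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, unresolved or timed out: all count as "not available".
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders for wherever the alert service listens.
        System.out.println(isReachable("alert-service.example.internal", 9000, 1000));
    }
}
```

Inside `checkAlertService()` this could be called as `TcpCheck.isReachable(host, port, 1000)`, with the real address taken from configuration. A successful connect only shows that the port accepts connections; whether that is a strong enough signal depends on the service.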

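5.2 Add the custom indicator to the readiness probe

Spring Boot derives the indicator's name from the bean name by dropping the HealthIndicator suffix, so AlertServiceHealthIndicator is referenced as alertService: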

```yaml
management:
  endpoint:
    health:
      group:
        readiness:
          include: readinessState,db,alertService
```

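5.3 Security configuration

In production, access to the health endpoints should be restricted. Note that if the management endpoints are moved to a separate port as in the example below, the httpGet port of the probes in the Deployment has to be changed to that port (8081 here) as well: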

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info
  endpoint:
    health:
      show-details: when-authorized  # details are visible to authorized users only
  server:
    port: 8081  # expose the management endpoints on a separate port
```

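6. Common problems

Q1: The liveness probe fails right after the application starts and the container is restarted

Cause: initialDelaySeconds is too short and the application has not finished initializing.

Solution:

- Increase initialDelaySeconds so that it is longer than the application's startup time
- Consider a startupProbe (introduced as alpha in Kubernetes 1.16 and enabled by default since 1.18), which gives the application up to failureThreshold × periodSeconds to start before the liveness checks begin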

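Q2: The readiness probe fails intermittently

Cause: a dependency may be briefly unavailable, or the timeout may be too short.

Solution:

- Increase timeoutSeconds moderately
- Increase failureThreshold so that occasional failures do not cause traffic to be switched back and forth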

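Q3: The health endpoint returns DOWN although the application is actually fine

Cause: one of the auto-configured HealthIndicators is failing.

Solution:

- Check the details at /actuator/health to find out which component is DOWN
- If that check is not needed, disable it: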

```yaml
management:
  health:
    redis:
      enabled: false
```

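7. Summary

By integrating Spring Boot Actuator, we added Kubernetes-native health probe support to the sms-alert-center project. The change brings the following benefits:

- Higher availability: unhealthy instances are automatically restarted or removed from load balancing
- Zero-downtime deployment: during a rolling update, only Pods that pass the readiness check receive traffic
- Fast recovery: once a dependency recovers, the application resumes serving automatically
- Operational observability: the health endpoints show the application's state in real time

Remember the deployment best practice: release the application first, verify that the health endpoints work, and only then configure the Kubernetes probes. This keeps the transition smooth and avoids deployment risk.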
