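A server's health is like a pulse: monitoring it in real time is key to keeping a system stable. With the Java monitoring techniques below, you can keep a close eye on server status.
1. Why monitor server status?
In production, performance monitoring is the lifeline of a stable system. By collecting core metrics such as CPU usage, memory consumption, and disk space in real time, we can:
- Predict and prevent crashes: raise an alert before resources run out
- Optimize resource allocation: adjust the application deployment strategy sensibly
- Diagnose faults: locate performance bottlenecks quickly
- Plan capacity: back scaling decisions with data
Java offers several ways to obtain this information, from the JDK's built-in APIs to full-featured third-party libraries. This article walks through each of them.
2. Comparing monitoring approaches
Before choosing an implementation, let's look at the characteristics of each approach: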
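| Approach | Advantages | Drawbacks | Best suited for |
| --- | --- | --- | --- |
| JDK built-in APIs | No extra dependencies, cross-platform | Limited functionality, cumbersome to use | Basic monitoring needs |
| OSHI library | Comprehensive, cross-platform, friendly API | Requires an extra dependency | Full system monitoring |
| JMX | Standardized, supports remote monitoring | Complex configuration, performance overhead | Enterprise application monitoring |
| Sigar | Powerful, long-established | Rarely maintained, requires a native library | Legacy systems |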
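For most modern applications, OSHI (Operating System and Hardware Information) is currently the recommended choice: it is a Java library that reaches native system APIs through JNA, works across platforms, and does not require installing a separate native library.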
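To make the "JDK built-in APIs" row of the comparison concrete, here is a minimal sketch that uses only standard JDK classes. The class name `JdkBuiltinProbe` and its method names are made up for this illustration. It also shows why the row lists limited functionality: the standard `java.lang.management.OperatingSystemMXBean` exposes the load average and processor count, but not system-wide CPU or physical memory usage.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reads a few host metrics using only JDK classes (hypothetical helper for illustration). */
final class JdkBuiltinProbe {

    /** Number of logical processors visible to the JVM. */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** 1-minute load average; negative where the platform does not provide one. */
    static double loadAverage() {
        OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
        return osBean.getSystemLoadAverage();
    }

    /** Percentage of space in use on the partition holding the given path; 0 if its size is unknown. */
    static double diskUsagePercent(File path) {
        long total = path.getTotalSpace();
        if (total <= 0) {
            return 0.0;
        }
        long usable = path.getUsableSpace();
        return (total - usable) * 100.0 / total;
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
        System.out.println("Load average (1 min): " + loadAverage());
        for (File root : File.listRoots()) {
            System.out.printf("%s used: %.2f%%%n", root.getPath(), diskUsagePercent(root));
        }
    }
}
```

Anything beyond this (per-core usage, swap, file system types) needs either vendor-specific extensions or a library such as OSHI.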
3. Getting system information with OSHI
3.1 Setup
First, add the OSHI dependency to your project:
<!-- Maven dependency -->
<dependency>
    <groupId>com.github.oshi</groupId>
    <artifactId>oshi-core</artifactId>
    <version>6.4.7</version>
</dependency>

// Or, with Gradle
implementation 'com.github.oshi:oshi-core:6.4.7'
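OSHI 6.x supports Java 8 and later. It detects the operating system at runtime and calls the matching native APIs through JNA, so there is no native library to install by hand.
3.2 A complete server monitoring implementation
Below is a complete server status monitoring class that wraps the collection of each metric in its own method: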
import oshi.SystemInfo;
import oshi.hardware.CentralProcessor;
import oshi.hardware.GlobalMemory;
import oshi.hardware.HardwareAbstractionLayer;
import oshi.software.os.FileSystem;
import oshi.software.os.OSFileStore;
import oshi.software.os.OperatingSystem;
import oshi.util.FormatUtil;
import oshi.util.Util;
import java.text.DecimalFormat;
import java.util.*;
/**
* 服务器状态监控工具类
* 使用OSHI库获取CPU、内存、磁盘等核心信息
*/
public class ServerMonitor {
private static final SystemInfo systemInfo = new SystemInfo();
private static final HardwareAbstractionLayer hardware = systemInfo.getHardware();
private static final OperatingSystem os = systemInfo.getOperatingSystem();
// 用于CPU使用率计算的缓存
private static long[] previousTicks;
private static long previousTime;
/**
* 获取CPU信息
*/
public static Map<String, Object> getCpuInfo() {
Map<String, Object> cpuInfo = new LinkedHashMap<>();
CentralProcessor processor = hardware.getProcessor();
// CPU基本信息
cpuInfo.put("CPU名称", processor.getProcessorIdentifier().getName());
cpuInfo.put("物理核心数", processor.getPhysicalProcessorCount());
cpuInfo.put("逻辑核心数", processor.getLogicalProcessorCount());
// CPU使用率(需要计算)
double cpuUsage = calculateCpuUsage(processor);
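// Locale.ROOT keeps '.' as the decimal separator so the value can be parsed back later
cpuInfo.put("系统使用率", String.format(Locale.ROOT, "%.2f%%", cpuUsage));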
// CPU频率
long maxFreq = processor.getMaxFreq();
if (maxFreq > 0) {
cpuInfo.put("最大频率", FormatUtil.formatHertz(maxFreq));
}
// Per-logical-processor usage: snapshot the per-core ticks, wait briefly, then measure the delta
long[][] previousProcTicks = processor.getProcessorCpuLoadTicks();
Util.sleep(500);
double[] load = processor.getProcessorCpuLoadBetweenTicks(previousProcTicks);
List<Map<String, String>> perCoreUsage = new ArrayList<>();
for (int i = 0; i < load.length; i++) {
Map<String, String> coreMap = new HashMap<>();
coreMap.put("核心" + i, String.format("%.2f%%", load[i] * 100));
perCoreUsage.add(coreMap);
}
cpuInfo.put("各核心使用率", perCoreUsage);
return cpuInfo;
}
/**
* Computes overall CPU usage (percent) from the tick delta between two samples
*/
private static synchronized double calculateCpuUsage(CentralProcessor processor) {
    if (previousTicks == null) {
        // First call: take a baseline sample and wait so there is a delta to measure
        previousTicks = processor.getSystemCpuLoadTicks();
        Util.sleep(1000);
    }
    long[] ticks = processor.getSystemCpuLoadTicks();
    long totalCpu = 0;
    for (CentralProcessor.TickType type : CentralProcessor.TickType.values()) {
        totalCpu += ticks[type.getIndex()] - previousTicks[type.getIndex()];
    }
    long idle = ticks[CentralProcessor.TickType.IDLE.getIndex()]
            - previousTicks[CentralProcessor.TickType.IDLE.getIndex()];
    long iowait = ticks[CentralProcessor.TickType.IOWAIT.getIndex()]
            - previousTicks[CentralProcessor.TickType.IOWAIT.getIndex()];
    // Keep this sample as the baseline for the next call
    previousTicks = ticks;
    if (totalCpu <= 0) {
        return 0.0;
    }
    // Idle and I/O-wait ticks both count as "not busy"
    return (totalCpu - idle - iowait) * 100.0 / totalCpu;
}
/**
* 获取内存信息
*/
public static Map<String, String> getMemoryInfo() {
Map<String, String> memoryInfo = new LinkedHashMap<>();
GlobalMemory memory = hardware.getMemory();
// 总内存和可用内存
long totalMemory = memory.getTotal();
long availableMemory = memory.getAvailable();
long usedMemory = totalMemory - availableMemory;
memoryInfo.put("总内存", FormatUtil.formatBytes(totalMemory));
memoryInfo.put("已用内存", FormatUtil.formatBytes(usedMemory));
memoryInfo.put("可用内存", FormatUtil.formatBytes(availableMemory));
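// Locale.ROOT keeps '.' as the decimal separator so the value can be parsed back later
memoryInfo.put("使用率", String.format(Locale.ROOT, "%.2f%%", (usedMemory * 100.0) / totalMemory));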
// 交换空间信息(如果启用)
long swapTotal = memory.getVirtualMemory().getSwapTotal();
if (swapTotal > 0) {
long swapUsed = memory.getVirtualMemory().getSwapUsed();
memoryInfo.put("交换空间总量", FormatUtil.formatBytes(swapTotal));
memoryInfo.put("已用交换空间", FormatUtil.formatBytes(swapUsed));
memoryInfo.put("交换空间使用率",
String.format("%.2f%%", (swapUsed * 100.0) / swapTotal));
}
return memoryInfo;
}
/**
* 获取磁盘信息
*/
public static List<Map<String, String>> getDiskInfo() {
List<Map<String, String>> diskList = new ArrayList<>();
FileSystem fileSystem = os.getFileSystem();
List<OSFileStore> fileStores = fileSystem.getFileStores();
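// Fixed symbols so the formatted value always uses '.' and can be parsed with Double.parseDouble
DecimalFormat df = new DecimalFormat("#.##", java.text.DecimalFormatSymbols.getInstance(Locale.ROOT));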
for (OSFileStore fs : fileStores) {
// 跳过特殊文件系统(如内存文件系统)
if (fs.getType().toLowerCase().contains("tmpfs") ||
fs.getType().toLowerCase().contains("devtmpfs")) {
continue;
}
Map<String, String> diskInfo = new LinkedHashMap<>();
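long totalSpace = fs.getTotalSpace();
if (totalSpace <= 0) {
// Skip pseudo file systems that report no capacity (avoids dividing by zero)
continue;
}
long freeSpace = fs.getFreeSpace();
long usableSpace = fs.getUsableSpace();
long usedSpace = totalSpace - freeSpace;
double usagePercentage = (usedSpace * 100.0) / totalSpace;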
diskInfo.put("磁盘名称", fs.getName());
diskInfo.put("挂载点", fs.getMount());
diskInfo.put("文件系统类型", fs.getType());
diskInfo.put("总空间", FormatUtil.formatBytes(totalSpace));
diskInfo.put("已用空间", FormatUtil.formatBytes(usedSpace));
diskInfo.put("可用空间", FormatUtil.formatBytes(freeSpace));
diskInfo.put("使用率", df.format(usagePercentage) + "%");
// 添加警告信息(如果使用率超过85%)
if (usagePercentage > 85) {
diskInfo.put("状态", "警告: 磁盘空间不足");
} else if (usagePercentage > 70) {
diskInfo.put("状态", "注意: 磁盘空间紧张");
} else {
diskInfo.put("状态", "正常");
}
diskList.add(diskInfo);
}
return diskList;
}
/**
* 获取系统负载(Linux/Unix系统)
*/
public static Map<String, String> getSystemLoad() {
Map<String, String> loadInfo = new LinkedHashMap<>();
CentralProcessor processor = hardware.getProcessor();
// 系统负载平均值(1分钟,5分钟,15分钟)
double[] loadAverages = processor.getSystemLoadAverage(3);
if (loadAverages[0] >= 0) {
loadInfo.put("1分钟负载", String.format("%.2f", loadAverages[0]));
loadInfo.put("5分钟负载", String.format("%.2f", loadAverages[1]));
loadInfo.put("15分钟负载", String.format("%.2f", loadAverages[2]));
// 负载解释
int logicalCpuCount = processor.getLogicalProcessorCount();
String loadStatus;
double loadPerCpu = loadAverages[0] / logicalCpuCount;
if (loadPerCpu < 0.7) {
loadStatus = "正常";
} else if (loadPerCpu < 1.0) {
loadStatus = "偏高";
} else {
loadStatus = "过高";
}
loadInfo.put("负载状态", loadStatus);
loadInfo.put("建议", String.format("每核心负载: %.2f", loadPerCpu));
} else {
loadInfo.put("备注", "系统负载信息不可用(可能不是Linux/Unix系统)");
}
return loadInfo;
}
/**
* 获取系统基本信息
*/
public static Map<String, String> getSystemInfo() {
Map<String, String> sysInfo = new LinkedHashMap<>();
// 操作系统信息
sysInfo.put("操作系统", os.getFamily() + " " + os.getVersionInfo().getVersion());
sysInfo.put("系统架构", os.getBitness() + "位");
sysInfo.put("制造商", systemInfo.getHardware().getComputerSystem().getManufacturer());
sysInfo.put("型号", systemInfo.getHardware().getComputerSystem().getModel());
// 运行时间
long uptime = os.getSystemUptime();
long days = uptime / (24 * 3600);
long hours = (uptime % (24 * 3600)) / 3600;
long minutes = (uptime % 3600) / 60;
long seconds = uptime % 60;
sysInfo.put("运行时间",
String.format("%d天 %02d:%02d:%02d", days, hours, minutes, seconds));
// JVM信息
Runtime runtime = Runtime.getRuntime();
sysInfo.put("JVM总内存", FormatUtil.formatBytes(runtime.totalMemory()));
sysInfo.put("JVM可用内存", FormatUtil.formatBytes(runtime.freeMemory()));
sysInfo.put("JVM最大内存", FormatUtil.formatBytes(runtime.maxMemory()));
sysInfo.put("Java版本", System.getProperty("java.version"));
return sysInfo;
}
/**
* 获取所有监控信息的汇总报告
*/
public static Map<String, Object> getServerStatusReport() {
Map<String, Object> report = new LinkedHashMap<>();
report.put("采集时间", new Date());
report.put("系统信息", getSystemInfo());
report.put("CPU信息", getCpuInfo());
report.put("内存信息", getMemoryInfo());
report.put("磁盘信息", getDiskInfo());
report.put("系统负载", getSystemLoad());
// 整体健康状态评估
String healthStatus = assessServerHealth(report);
report.put("健康状态", healthStatus);
return report;
}
/**
* 评估服务器健康状态
*/
private static String assessServerHealth(Map<String, Object> report) {
List<String> warnings = new ArrayList<>();
// 检查CPU使用率
Map<String, Object> cpuInfo = (Map<String, Object>) report.get("CPU信息");
if (cpuInfo.containsKey("系统使用率")) {
String cpuUsageStr = (String) cpuInfo.get("系统使用率");
double cpuUsage = Double.parseDouble(cpuUsageStr.replace("%", ""));
if (cpuUsage > 90) {
warnings.add("CPU使用率过高: " + cpuUsageStr);
} else if (cpuUsage > 80) {
warnings.add("CPU使用率偏高: " + cpuUsageStr);
}
}
// 检查内存使用率
Map<String, String> memoryInfo = (Map<String, String>) report.get("内存信息");
String memoryUsageStr = memoryInfo.get("使用率");
double memoryUsage = Double.parseDouble(memoryUsageStr.replace("%", ""));
if (memoryUsage > 90) {
warnings.add("内存使用率过高: " + memoryUsageStr);
}
// 检查磁盘使用率
List<Map<String, String>> diskInfo = (List<Map<String, String>>) report.get("磁盘信息");
for (Map<String, String> disk : diskInfo) {
String diskUsageStr = disk.get("使用率");
double diskUsage = Double.parseDouble(diskUsageStr.replace("%", ""));
if (diskUsage > 90) {
warnings.add("磁盘[" + disk.get("磁盘名称") + "]空间严重不足: " + diskUsageStr);
} else if (diskUsage > 85) {
warnings.add("磁盘[" + disk.get("磁盘名称") + "]空间不足: " + diskUsageStr);
}
}
// 检查系统负载
Map<String, String> loadInfo = (Map<String, String>) report.get("系统负载");
if (loadInfo.containsKey("负载状态")) {
String loadStatus = loadInfo.get("负载状态");
if ("过高".equals(loadStatus)) {
warnings.add("系统负载过高: " + loadInfo.get("1分钟负载"));
}
}
if (warnings.isEmpty()) {
return "健康";
} else {
return "警告 – 存在" + warnings.size() + "个问题: " + String.join("; ", warnings);
}
}
/**
* 格式化输出监控报告
*/
public static void printServerStatusReport() {
Map<String, Object> report = getServerStatusReport();
System.out.println("========== 服务器状态监控报告 ==========");
System.out.println("采集时间: " + report.get("采集时间"));
System.out.println("健康状态: " + report.get("健康状态"));
System.out.println();
System.out.println("————— 系统信息 —————");
Map<String, String> systemInfo = (Map<String, String>) report.get("系统信息");
for (Map.Entry<String, String> entry : systemInfo.entrySet()) {
System.out.printf("%-15s: %s%n", entry.getKey(), entry.getValue());
}
System.out.println("\n————— CPU信息 —————");
Map<String, Object> cpuInfo = (Map<String, Object>) report.get("CPU信息");
for (Map.Entry<String, Object> entry : cpuInfo.entrySet()) {
if ("各核心使用率".equals(entry.getKey())) {
System.out.println("各核心使用率:");
List<Map<String, String>> coreUsage = (List<Map<String, String>>) entry.getValue();
for (int i = 0; i < coreUsage.size(); i++) {
Map<String, String> core = coreUsage.get(i);
System.out.printf(" 核心%d: %s%n", i, core.get("核心" + i));
}
} else {
System.out.printf("%-15s: %s%n", entry.getKey(), entry.getValue());
}
}
System.out.println("\n————— 内存信息 —————");
Map<String, String> memoryInfo = (Map<String, String>) report.get("内存信息");
for (Map.Entry<String, String> entry : memoryInfo.entrySet()) {
System.out.printf("%-15s: %s%n", entry.getKey(), entry.getValue());
}
System.out.println("\n————— 磁盘信息 —————");
List<Map<String, String>> diskInfo = (List<Map<String, String>>) report.get("磁盘信息");
for (Map<String, String> disk : diskInfo) {
System.out.println("磁盘: " + disk.get("磁盘名称") +
" [" + disk.get("挂载点") + "]");
System.out.printf(" 类型: %s, 总空间: %s%n",
disk.get("文件系统类型"), disk.get("总空间"));
System.out.printf(" 已用: %s, 可用: %s%n",
disk.get("已用空间"), disk.get("可用空间"));
System.out.printf(" 使用率: %s, 状态: %s%n",
disk.get("使用率"), disk.get("状态"));
System.out.println();
}
System.out.println("————— 系统负载 —————");
Map<String, String> loadInfo = (Map<String, String>) report.get("系统负载");
for (Map.Entry<String, String> entry : loadInfo.entrySet()) {
System.out.printf("%-15s: %s%n", entry.getKey(), entry.getValue());
}
System.out.println("\n========== 报告结束 ==========");
}
/**
* 主方法 – 测试用
*/
public static void main(String[] args) {
System.out.println("正在收集服务器状态信息…\n");
// 打印完整报告
printServerStatusReport();
// 也可以单独获取某项信息
// Map<String, Object> cpuInfo = getCpuInfo();
// System.out.println("CPU使用率: " + cpuInfo.get("系统使用率"));
// 定时监控示例
if (args.length > 0 && "monitor".equals(args[0])) {
System.out.println("\n启动定时监控,每10秒刷新一次…");
startPeriodicMonitoring(10);
}
}
/**
* 启动定时监控
*/
public static void startPeriodicMonitoring(int intervalSeconds) {
Timer timer = new Timer();
timer.scheduleAtFixedRate(new TimerTask() {
private int count = 0;
private final int maxCount = 30; // 最多监控30次
@Override
public void run() {
count++;
System.out.println("\n— 监控周期 #" + count + " —");
Map<String, Object> report = getServerStatusReport();
String healthStatus = (String) report.get("健康状态");
String cpuUsage = ((Map<String, Object>) report.get("CPU信息"))
.get("系统使用率").toString();
String memoryUsage = ((Map<String, String>) report.get("内存信息"))
.get("使用率");
System.out.printf("健康状态: %s, CPU: %s, 内存: %s%n",
healthStatus, cpuUsage, memoryUsage);
if (count >= maxCount) {
System.out.println("监控任务完成。");
timer.cancel();
}
}
}, 0, intervalSeconds * 1000L);
}
}
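The CPU figure in calculateCpuUsage comes from the difference between two tick samples, not from a single reading. The arithmetic can be isolated as a pure function and checked without OSHI. The sketch below is only an illustration: the class name `CpuTickMath` is hypothetical, and the array layout is assumed to mirror the order of OSHI's `TickType` enum (USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL).

```java
/** Tick-delta arithmetic as a pure function (hypothetical helper, independent of OSHI). */
final class CpuTickMath {
    // Assumed index layout: USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL
    static final int IDLE = 3;
    static final int IOWAIT = 4;

    /** Returns the busy share, in percent, of the interval between two tick samples. */
    static double usagePercent(long[] previous, long[] current) {
        long total = 0;
        for (int i = 0; i < current.length; i++) {
            total += current[i] - previous[i];
        }
        if (total <= 0) {
            return 0.0; // no time elapsed between the samples
        }
        // Idle and I/O-wait ticks both count as "not busy"
        long notBusy = (current[IDLE] - previous[IDLE]) + (current[IOWAIT] - previous[IOWAIT]);
        return (total - notBusy) * 100.0 / total;
    }

    public static void main(String[] args) {
        long[] first = {100, 0, 50, 800, 50, 0, 0, 0};
        long[] second = {130, 0, 70, 940, 60, 0, 0, 0};
        System.out.println(usagePercent(first, second));
    }
}
```

In the sample data, 200 ticks elapse between the two readings and 150 of them are idle or I/O wait, so 50 of 200 ticks are busy, which is 25 percent. Two identical samples give 0, which is why the very first reading needs a short wait before it means anything.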
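4. Monitoring Java applications with JMX
Beyond system-level monitoring, JMX (Java Management Extensions) is a powerful tool for monitoring the state of the Java application itself: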
import java.lang.management.*;
import java.util.List;
/**
* JMX监控工具类
*/
public class JmxMonitor {
/**
* 获取JVM内存使用情况
*/
public static void monitorJvmMemory() {
MemoryMXBean memoryMxBean = ManagementFactory.getMemoryMXBean();
MemoryUsage heapMemoryUsage = memoryMxBean.getHeapMemoryUsage();
MemoryUsage nonHeapMemoryUsage = memoryMxBean.getNonHeapMemoryUsage();
System.out.println("=== JVM内存监控 ===");
System.out.printf("堆内存: 已用=%.2fMB, 提交=%.2fMB, 最大=%.2fMB%n",
heapMemoryUsage.getUsed() / (1024.0 * 1024),
heapMemoryUsage.getCommitted() / (1024.0 * 1024),
heapMemoryUsage.getMax() / (1024.0 * 1024));
System.out.printf("非堆内存: 已用=%.2fMB, 提交=%.2fMB%n",
nonHeapMemoryUsage.getUsed() / (1024.0 * 1024),
nonHeapMemoryUsage.getCommitted() / (1024.0 * 1024));
}
/**
* 获取线程信息
*/
public static void monitorThreads() {
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
System.out.println("\n=== 线程监控 ===");
System.out.println("活动线程数: " + threadMxBean.getThreadCount());
System.out.println("守护线程数: " + threadMxBean.getDaemonThreadCount());
System.out.println("峰值线程数: " + threadMxBean.getPeakThreadCount());
System.out.println("启动的总线程数: " + threadMxBean.getTotalStartedThreadCount());
// 检测死锁
long[] deadlockedThreads = threadMxBean.findDeadlockedThreads();
if (deadlockedThreads != null && deadlockedThreads.length > 0) {
System.out.println("警告: 检测到死锁线程!");
}
}
/**
* 获取GC信息
*/
public static void monitorGarbageCollector() {
List<GarbageCollectorMXBean> gcMxBeans =
ManagementFactory.getGarbageCollectorMXBeans();
System.out.println("\n=== 垃圾回收监控 ===");
for (GarbageCollectorMXBean gcMxBean : gcMxBeans) {
System.out.printf("GC名称: %s%n", gcMxBean.getName());
System.out.printf(" 回收次数: %d%n", gcMxBean.getCollectionCount());
System.out.printf(" 回收时间: %dms%n", gcMxBean.getCollectionTime());
}
}
}
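The typed MXBean getters used in JmxMonitor are a convenience layer. The same values are also published as named attributes on the platform MBean server, which is how generic JMX clients address them. The following sketch reads two of them in-process by object name; the class name `JmxAttributeReader` is hypothetical, while the object names and attribute names are the standard ones for the platform Memory and Threading MXBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

/** Reads JVM metrics through the generic MBeanServer API (hypothetical helper). */
final class JmxAttributeReader {

    private static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + attribute + " of " + objectName, e);
        }
    }

    /** Heap bytes currently in use, taken from the composite HeapMemoryUsage attribute. */
    static long heapUsedBytes() {
        CompositeData usage = (CompositeData) readAttribute("java.lang:type=Memory", "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    /** Number of live threads, daemon and non-daemon. */
    static int liveThreadCount() {
        return (Integer) readAttribute("java.lang:type=Threading", "ThreadCount");
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("Live threads: " + liveThreadCount());
    }
}
```

Addressing metrics by name rather than by Java type is what lets a monitoring tool read them without having the application's classes on its classpath.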
5. Advanced monitoring: Spring Boot Actuator integration
For Spring Boot applications, monitoring can be integrated with little effort:
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true
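// Custom health check
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import java.util.Map;

@Component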
public class ServerHealthIndicator implements HealthIndicator {
@Override
public Health health() {
Map<String, Object> serverStatus = ServerMonitor.getServerStatusReport();
String healthStatus = (String) serverStatus.get("健康状态");
if ("健康".equals(healthStatus)) {
return Health.up()
.withDetail("服务器状态", "正常")
.withDetail("CPU使用率",
((Map<String, Object>) serverStatus.get("CPU信息"))
.get("系统使用率"))
.withDetail("内存使用率",
((Map<String, String>) serverStatus.get("内存信息"))
.get("使用率"))
.build();
} else {
return Health.down()
.withDetail("服务器状态", "异常")
.withDetail("问题", healthStatus)
.build();
}
}
}
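6. Visualizing monitoring data
The collected data can be exported for visualization, for example as Prometheus metrics:
// Prometheus metrics export example
import io.prometheus.client.*;
import java.util.List;
import java.util.Map;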
public class PrometheusExporter {
private static final Gauge cpuUsage = Gauge.build()
.name("server_cpu_usage_percent")
.help("CPU使用率百分比")
.register();
private static final Gauge memoryUsage = Gauge.build()
.name("server_memory_usage_percent")
.help("内存使用率百分比")
.register();
private static final Gauge diskUsage = Gauge.build()
.name("server_disk_usage_percent")
.help("磁盘使用率百分比")
.labelNames("mount_point")
.register();
public static void updateMetrics() {
Map<String, Object> report = ServerMonitor.getServerStatusReport();
// 更新CPU指标
String cpuUsageStr = ((Map<String, Object>) report.get("CPU信息"))
.get("系统使用率").toString();
cpuUsage.set(Double.parseDouble(cpuUsageStr.replace("%", "")));
// 更新内存指标
String memoryUsageStr = ((Map<String, String>) report.get("内存信息"))
.get("使用率");
memoryUsage.set(Double.parseDouble(memoryUsageStr.replace("%", "")));
// 更新磁盘指标
List<Map<String, String>> diskInfo =
(List<Map<String, String>>) report.get("磁盘信息");
for (Map<String, String> disk : diskInfo) {
String usageStr = disk.get("使用率");
diskUsage.labels(disk.get("挂载点"))
.set(Double.parseDouble(usageStr.replace("%", "")));
}
}
}
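It helps to know roughly what a scrape of such gauges looks like on the wire. The sketch below renders one labelled gauge in the Prometheus text exposition format using only the JDK. It is a hand-written illustration, not the client library's code, and the class name `PrometheusTextSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hand-rolled rendering of a gauge in Prometheus text format (illustration only). */
final class PrometheusTextSketch {

    /** Escapes backslash, double quote and newline, as the text format requires for label values. */
    static String escapeLabelValue(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Renders HELP and TYPE lines followed by one sample line per label value. */
    static String renderGauge(String name, String help, String labelName, Map<String, Double> samples) {
        StringBuilder out = new StringBuilder();
        out.append("# HELP ").append(name).append(' ').append(help).append('\n');
        out.append("# TYPE ").append(name).append(" gauge\n");
        for (Map.Entry<String, Double> sample : samples.entrySet()) {
            out.append(name)
               .append('{').append(labelName).append("=\"")
               .append(escapeLabelValue(sample.getKey())).append("\"} ")
               .append(sample.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> usageByMount = new LinkedHashMap<>();
        usageByMount.put("/", 42.5);
        usageByMount.put("/data", 71.0);
        System.out.print(renderGauge("server_disk_usage_percent", "Disk usage percentage",
                "mount_point", usageByMount));
    }
}
```

Each distinct label value becomes its own time series, so a label such as mount_point should only ever take a small, bounded set of values.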
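7. Best practices and caveats
7.1 Controlling the sampling frequency
- Real-time monitoring: collect key metrics every second
- Performance monitoring: collect every 5-30 seconds
- Capacity monitoring: collect hourly or daily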
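One way to run these tiers side by side is a `ScheduledExecutorService` with one schedule per tier. The sketch below is a minimal version under stated assumptions: the class name `TieredScheduler` is hypothetical and the tasks in `main` are placeholders for real collectors. Each task is wrapped in a try/catch because a periodic task that throws is not run again by the executor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs collection tasks at different intervals on one background thread (hypothetical helper). */
final class TieredScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "monitor-scheduler");
                thread.setDaemon(true); // do not keep the JVM alive just for monitoring
                return thread;
            });

    /** Schedules a task; a failure is reported and the task stays on its schedule. */
    void schedule(Runnable task, long period, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Without this catch, the exception would cancel all future runs of the task
                System.err.println("Collection task failed: " + e);
            }
        }, 0, period, unit);
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        TieredScheduler scheduler = new TieredScheduler();
        // Placeholder tasks; real code would call the collectors here
        scheduler.schedule(() -> System.out.println("key metrics"), 1, TimeUnit.SECONDS);
        scheduler.schedule(() -> System.out.println("performance metrics"), 5, TimeUnit.SECONDS);
        scheduler.schedule(() -> System.out.println("capacity metrics"), 1, TimeUnit.HOURS);
        Thread.sleep(6_000);
        scheduler.shutdown();
    }
}
```

A fixed delay is used here rather than a fixed rate so that a slow collection never causes runs to pile up back to back.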
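7.2 Exception handling
try {
    Map<String, Object> report = ServerMonitor.getServerStatusReport();
    // Process the monitoring data
} catch (Exception e) {
    // Log the error, but do not let it disrupt the main business flow
    logger.error("Failed to collect monitoring data", e);
    // Return degraded data or default values
}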
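The "return degraded data or default values" step can be factored into a small wrapper so that every collector gets the same treatment. This is a sketch only: the class `SafeCollector` and its method are hypothetical, and `System.err` stands in for whatever logger the project uses.

```java
import java.util.function.Supplier;

/** Wraps a metric collector so that a failure yields a fallback value (hypothetical helper). */
final class SafeCollector {

    /** Returns the collected value, or the fallback if collection throws or returns null. */
    static <T> T collectOrDefault(Supplier<T> collector, T fallback) {
        try {
            T value = collector.get();
            return value != null ? value : fallback;
        } catch (RuntimeException e) {
            // Stand-in for a real logger call
            System.err.println("Metric collection failed: " + e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        double ok = collectOrDefault(() -> 12.5, -1.0);
        double failed = collectOrDefault(() -> { throw new IllegalStateException("sensor offline"); }, -1.0);
        System.out.println(ok + " " + failed);
    }
}
```

With the class from this article, a call might look like `SafeCollector.collectOrDefault(ServerMonitor::getServerStatusReport, Collections.emptyMap())`, so callers always receive a map and never an exception.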
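7.3 Performance optimization
- Cache monitoring results: avoid collecting too frequently
- Collect asynchronously: keep the main thread unaffected
- Batch updates: reduce I/O operations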
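The caching point can be sketched as a small time-to-live wrapper: callers may ask as often as they like, but the expensive collection runs at most once per interval. The class name `CachedMetric` is hypothetical, and the injectable clock is an assumption made here so the behavior can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches a collected value for a fixed time-to-live (hypothetical helper). */
final class CachedMetric<T> {
    private final Supplier<T> loader;
    private final long ttlMillis;
    private final LongSupplier clock;
    private T value;
    private long loadedAt;
    private boolean loaded;

    CachedMetric(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Convenience constructor that uses the system clock. */
    CachedMetric(Supplier<T> loader, long ttlMillis) {
        this(loader, ttlMillis, System::currentTimeMillis);
    }

    /** Returns the cached value, reloading it once the time-to-live has elapsed. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

For example, `new CachedMetric<>(ServerMonitor::getServerStatusReport, 5_000)` would let a health endpoint be polled every second while the underlying report is rebuilt only every five seconds.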
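8. Summary
After this walkthrough, you should be familiar with the main ways to obtain server status from Java: OSHI for system-level metrics, JMX for the JVM's own state, and Spring Boot Actuator with Prometheus for integration and export.
In real projects, choose a monitoring approach that fits your needs. For simple projects, the ServerMonitor class from this article can be used directly; for enterprise applications, consider integrating Spring Boot Actuator or a dedicated APM tool such as SkyWalking or Pinpoint.
Monitoring is not only a technical matter but also a reflection of operations culture and engineering practice. A good monitoring setup noticeably improves system stability and team productivity, and it is an essential skill for every responsible developer and operations engineer.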