1. Enable the following node_exporter flag
--collector.mountstats
Data source: the mountstats collector exposes filesystem statistics from /proc/self/mountstats, including detailed NFS client statistics.
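To see which mounts the collector would report on, the per-mount header lines of /proc/self/mountstats can be scanned directly. The following is a minimal sketch, assuming the usual "device ... mounted on ... with fstype ..." header format; the `/mnt/data` mount point in the sample is made up for illustration, and `list_nfs_mounts` is a hypothetical helper, not part of node_exporter.

```python
import re

# Matches the per-mount header line of /proc/self/mountstats.
DEVICE_RE = re.compile(r"^device (\S+) mounted on (\S+) with fstype (\S+)")


def list_nfs_mounts(text):
    """Return (export, mountpoint, fstype) for every NFS mount in mountstats text."""
    mounts = []
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m and m.group(3).startswith("nfs"):
            mounts.append(m.groups())
    return mounts


# Illustrative sample; the mount point /mnt/data is an assumption.
SAMPLE = """\
device rootfs mounted on / with fstype rootfs
device file01.adas.com:/gri-ziyandata-2/ mounted on /mnt/data with fstype nfs statvers=1.1
"""

if __name__ == "__main__":
    for export, mountpoint, fstype in list_nfs_mounts(SAMPLE):
        print(export, mountpoint, fstype)
```

On a real host, the same function can be fed the contents of the file, for example `list_nfs_mounts(open("/proc/self/mountstats").read())`.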
Viewing the same statistics on the operating system:
[root@zj-eflops-a100-91 ~]# rpm -qf /usr/sbin/mountstats
nfs-utils-1.3.0-0.68.el7.2.x86_64
mountstats nfsstat /var/lib/kubelet/pods/5aa344a6-76c7-4b53-9f58-f6ebff83c975/volumes/kubernetes.io~nfs/prod-label-ns-gpu-ocs-shucai-data
Client rpc stats:
calls retrans authrefrsh
92852223 92852224 92852223
Client nfs v3
null getattr setattr lookup access readlink
1 0% 11139780 11% 12642601 13% 22792286 24% 14395775 15% 68 0%
read write create mkdir symlink mknod
17702695 19% 6815704 7% 5545747 5% 2178 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 249 0%
fsstat fsinfo pathconf commit
682174 0% 7 0% 1 0% 1132957 1%
mountstats iostat 1 5 /var/lib/kubelet/pods/5aa344a6-76c7-4b53-9f58-f6ebff83c975/volumes/kubernetes.io~nfs/prod-label-ns-gpu-ocs-shucai-data
file01.adas.com:/gri-ziyandata-2/ mounted on /var/lib/kubelet/pods/5aa344a6-76c7-4b53-9f58-f6ebff83c975/volumes/kubernetes.io~nfs/prod-label-ns-gpu-ocs-shucai-data:
ops/s rpc bklog
172.151 0.000
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
32.787 10348.136 315.621 0 (0.0%) 41.186 42.923
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
12.622 6770.546 536.413 0 (0.0%) 20.658 38.108
file01.adas.com:/gri-ziyandata-2/ mounted on /var/lib/kubelet/pods/5aa344a6-76c7-4b53-9f58-f6ebff83c975/volumes/kubernetes.io~nfs/prod-label-ns-gpu-ocs-shucai-data:
ops/s rpc bklog
972.000 0.000
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
92.000 33058.637 359.333 0 (0.0%) 38.587 39.630
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
715.000 11960.395 16.728 0 (0.0%) 85.057 148.130
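The columns in the iostat output are related: kB/op is the average transfer size per operation, which should equal kB/s divided by ops/s over the same interval. The sketch below checks this against the second sample above; `kb_per_op` is a hypothetical helper name, not part of mountstats.

```python
def kb_per_op(kb_per_s, ops_per_s):
    """Average transfer size per operation; 0.0 when the mount was idle."""
    if ops_per_s == 0:
        return 0.0
    return kb_per_s / ops_per_s


if __name__ == "__main__":
    # Values copied from the second sample: (kB/s, ops/s) for read and write.
    print("read  kB/op:", round(kb_per_op(33058.637, 92.000), 3))
    print("write kB/op:", round(kb_per_op(11960.395, 715.000), 3))
```

In that sample the writes are much smaller per operation than the reads, which is consistent with the higher write ops/s at a lower write kB/s.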
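2. node_mountstats_nfs metrics
node_mountstats_nfs_total_read_bytes_total: total bytes read from the NFS server
Average read throughput: avg(irate(node_mountstats_nfs_total_read_bytes_total{cluster=~"$cluster",instance=~"$instance",export=~"$export"}[1m])) by (export,instance)
node_mountstats_nfs_total_write_bytes_total: total bytes written to the NFS server
Average write throughput: avg(irate(node_mountstats_nfs_total_write_bytes_total{cluster=~"$cluster",instance=~"$instance",export=~"$export"}[1m])) by (export,instance)
node_mountstats_nfs_operations_request_time_seconds_total: cumulative request time per operation, from enqueue to completion
Read request time: avg(irate(node_mountstats_nfs_operations_request_time_seconds_total{operation="READ",cluster=~"$cluster",instance=~"$instance",export=~"$export"}[1m])) by (export,instance,operation)
Note that this query yields seconds of request time accumulated per second, not a per-operation latency. To get the average latency per operation, divide it by the matching irate of node_mountstats_nfs_operations_requests_total.
node_mountstats_nfs_operations_response_time_seconds_total: cumulative response time per operation, from transmission to reply
Write response time: avg(irate(node_mountstats_nfs_operations_response_time_seconds_total{operation="WRITE",cluster=~"$cluster",instance=~"$instance",export=~"$export"}[1m])) by (export,instance,operation)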
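Because the request-time and response-time metrics are cumulative counters, a per-operation latency has to be derived from two scrapes: the increase in time divided by the increase in request count. This is the same arithmetic as dividing the rate of a *_time_seconds_total series by the rate of node_mountstats_nfs_operations_requests_total in PromQL. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def avg_latency_seconds(time_prev, time_now, reqs_prev, reqs_now):
    """Mean per-operation latency between two scrapes of cumulative counters.

    Returns None when no requests completed in the interval or when a
    counter went backwards (for example after a remount).
    """
    d_time = time_now - time_prev
    d_reqs = reqs_now - reqs_prev
    if d_reqs <= 0 or d_time < 0:
        return None
    return d_time / d_reqs


if __name__ == "__main__":
    # Hypothetical scrapes: 4 s of cumulative READ time over 100 requests.
    latency = avg_latency_seconds(100.0, 104.0, 1000, 1100)
    print(f"average latency: {latency * 1000:.1f} ms")
```

Returning None for idle intervals avoids a division by zero and keeps resets from showing up as negative latency.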
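Other commonly used node_mountstats_nfs metrics:
node_mountstats_nfs_operations_requests_total: number of requests performed per operation
Read requests per second (IOPS): avg(irate(node_mountstats_nfs_operations_requests_total{operation="READ",cluster=~"$cluster",instance=~"$instance",export=~"$export"}[1m])) by (export,instance,operation)
node_mountstats_nfs_transport_bad_transaction_ids_total: number of responses carrying a transaction ID unknown to the client
node_mountstats_nfs_transport_connect_total: number of TCP connections the client has made to the NFS server
node_mountstats_nfs_transport_bind_total: number of times the client had to establish a connection from scratch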
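node_nfs_operations_total:
Description: counts the various operations performed over NFS. Note that this name does not appear in the node_exporter HELP output listed later in this article; the counter exposed there is node_nfs_requests_total.
Labels: usually include the operation type (read, write, get attributes, and so on), which lets you tell the different kinds of NFS operations apart.
Example query: node_nfs_operations_total{operation="read"}
node_nfs_requests_total:
Description: the number of NFS procedures invoked; wrap it in rate() to get a request rate.
Labels: usually include the type of operation, such as read or write.
Example query: rate(node_nfs_requests_total[5m])
With these metrics you can monitor and analyze the performance of NFS mounts and diagnose bottlenecks or failures. In PromQL they can be combined into more complex monitoring and alerting rules, for example to watch the error rate of a specific operation or to flag latency that is too high.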
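node_mountstats_nfs_operations_request_time_seconds_total records the cumulative time, in seconds, from when a request was enqueued on the client to when it was completely handled. It therefore includes client-side queueing, network transfer, and server processing, and corresponds to "avg exe" in the mountstats iostat output.
node_mountstats_nfs_operations_response_time_seconds_total records the cumulative time, in seconds, from when a request was transmitted to when its reply was received. It covers the network round trip plus server processing, and corresponds to "avg RTT" in the mountstats iostat output.
Its HELP text reads: Duration all requests took to get a reply back after a request for a given operation was transmitted, in seconds.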
node_mountstats_nfs_transport_bad_transaction_ids_total
Number of times the NFS server sent a response with a transaction ID unknown to this client.
node_mountstats_nfs_operations_transmissions_total
Number of times an actual RPC request has been transmitted for a given operation.
In other words, the number of RPC transmissions; it exceeds the request count when requests have to be retransmitted.
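Since each request is normally transmitted once, the gap between transmissions and requests gives a retransmission count, which is how the "retrans" column in the mountstats iostat output can be read. The sketch below turns increases of the two counters over one interval into a ratio; the function name and sample numbers are assumptions for illustration.

```python
def retransmission_ratio(transmissions, requests):
    """Fraction of requests that needed an extra transmission in an interval.

    Both arguments are increases of the respective counters over the
    same interval, not raw cumulative values.
    """
    if requests <= 0:
        return 0.0
    # Clamp at zero in case the two counters were sampled slightly apart.
    return max(transmissions - requests, 0) / requests


if __name__ == "__main__":
    # Hypothetical interval: 1010 transmissions for 1000 requests.
    print(f"{retransmission_ratio(1010, 1000):.2%}")
```

A ratio that stays above zero for a sustained period usually points at network loss or an overloaded server rather than at the client.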
Metric descriptions built into node_exporter:
# HELP node_mountstats_nfs_age_seconds_total The age of the NFS mount in seconds.
# TYPE node_mountstats_nfs_age_seconds_total counter
# HELP node_mountstats_nfs_direct_read_bytes_total Number of bytes read using the read() syscall in O_DIRECT mode.
# TYPE node_mountstats_nfs_direct_read_bytes_total counter
# HELP node_mountstats_nfs_direct_write_bytes_total Number of bytes written using the write() syscall in O_DIRECT mode.
# TYPE node_mountstats_nfs_direct_write_bytes_total counter
# HELP node_mountstats_nfs_event_attribute_invalidate_total Number of times cached inode attributes are invalidated.
# TYPE node_mountstats_nfs_event_attribute_invalidate_total counter
# HELP node_mountstats_nfs_event_data_invalidate_total Number of times an inode cache is cleared.
# TYPE node_mountstats_nfs_event_data_invalidate_total counter
# HELP node_mountstats_nfs_event_dnode_revalidate_total Number of times cached dentry nodes are re-validated from the server.
# TYPE node_mountstats_nfs_event_dnode_revalidate_total counter
# HELP node_mountstats_nfs_event_inode_revalidate_total Number of times cached inode attributes are re-validated from the server.
# TYPE node_mountstats_nfs_event_inode_revalidate_total counter
# HELP node_mountstats_nfs_event_jukebox_delay_total Number of times the NFS server indicated EJUKEBOX; retrieving data from offline storage.
# TYPE node_mountstats_nfs_event_jukebox_delay_total counter
# HELP node_mountstats_nfs_event_pnfs_read_total Number of NFS v4.1+ pNFS reads.
# TYPE node_mountstats_nfs_event_pnfs_read_total counter
# HELP node_mountstats_nfs_event_pnfs_write_total Number of NFS v4.1+ pNFS writes.
# TYPE node_mountstats_nfs_event_pnfs_write_total counter
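Note on pNFS: the performance advantage of pNFS depends on the storage back-end design (for example an RDMA network or a direct-attached SSD architecture), so a deployment needs to match the hardware capabilities.
# HELP node_mountstats_nfs_event_short_read_total Number of times the NFS server gave less data than expected while reading.
# TYPE node_mountstats_nfs_event_short_read_total counter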
# HELP node_mountstats_nfs_event_short_write_total Number of times the NFS server wrote less data than expected while writing.
# TYPE node_mountstats_nfs_event_short_write_total counter
# HELP node_mountstats_nfs_event_silly_rename_total Number of times a file was removed while still open by another process.
# TYPE node_mountstats_nfs_event_silly_rename_total counter
# HELP node_mountstats_nfs_event_truncation_total Number of times files have been truncated.
# TYPE node_mountstats_nfs_event_truncation_total counter
# HELP node_mountstats_nfs_event_vfs_access_total Number of times permissions have been checked.
# TYPE node_mountstats_nfs_event_vfs_access_total counter
# HELP node_mountstats_nfs_event_vfs_file_release_total Number of times files have been closed and released.
# TYPE node_mountstats_nfs_event_vfs_file_release_total counter
# HELP node_mountstats_nfs_event_vfs_flush_total Number of pending writes that have been forcefully flushed to the server.
# TYPE node_mountstats_nfs_event_vfs_flush_total counter
# HELP node_mountstats_nfs_event_vfs_fsync_total Number of times fsync() has been called on directories and files.
# TYPE node_mountstats_nfs_event_vfs_fsync_total counter
# HELP node_mountstats_nfs_event_vfs_getdents_total Number of times directory entries have been read with getdents().
# TYPE node_mountstats_nfs_event_vfs_getdents_total counter
# HELP node_mountstats_nfs_event_vfs_lock_total Number of times locking has been attempted on a file.
# TYPE node_mountstats_nfs_event_vfs_lock_total counter
# HELP node_mountstats_nfs_event_vfs_lookup_total Number of times a directory lookup has occurred.
# TYPE node_mountstats_nfs_event_vfs_lookup_total counter
# HELP node_mountstats_nfs_event_vfs_open_total Number of times cached inode attributes are invalidated.
# TYPE node_mountstats_nfs_event_vfs_open_total counter
# HELP node_mountstats_nfs_event_vfs_read_page_total Number of pages read directly via mmap()'d files.
# TYPE node_mountstats_nfs_event_vfs_read_page_total counter
# HELP node_mountstats_nfs_event_vfs_read_pages_total Number of times a group of pages have been read.
# TYPE node_mountstats_nfs_event_vfs_read_pages_total counter
# HELP node_mountstats_nfs_event_vfs_setattr_total Number of times directory entries have been read with getdents().
# TYPE node_mountstats_nfs_event_vfs_setattr_total counter
# HELP node_mountstats_nfs_event_vfs_update_page_total Number of updates (and potential writes) to pages.
# TYPE node_mountstats_nfs_event_vfs_update_page_total counter
# HELP node_mountstats_nfs_event_vfs_write_page_total Number of pages written directly via mmap()'d files.
# TYPE node_mountstats_nfs_event_vfs_write_page_total counter
# HELP node_mountstats_nfs_event_vfs_write_pages_total Number of times a group of pages have been written.
# TYPE node_mountstats_nfs_event_vfs_write_pages_total counter
# HELP node_mountstats_nfs_event_write_extension_total Number of times a file has been grown due to writes beyond its existing end.
# TYPE node_mountstats_nfs_event_write_extension_total counter
# HELP node_mountstats_nfs_operations_major_timeouts_total Number of times a request has had a major timeout for a given operation.
# TYPE node_mountstats_nfs_operations_major_timeouts_total counter
# HELP node_mountstats_nfs_operations_queue_time_seconds_total Duration all requests spent queued for transmission for a given operation before they were sent, in seconds.
# TYPE node_mountstats_nfs_operations_queue_time_seconds_total counter
# HELP node_mountstats_nfs_operations_received_bytes_total Number of bytes received for a given operation, including RPC headers and payload.
# TYPE node_mountstats_nfs_operations_received_bytes_total counter
# HELP node_mountstats_nfs_operations_request_time_seconds_total Duration all requests took from when a request was enqueued to when it was completely handled for a given operation, in seconds.
# TYPE node_mountstats_nfs_operations_request_time_seconds_total counter
# HELP node_mountstats_nfs_operations_requests_total Number of requests performed for a given operation.
# TYPE node_mountstats_nfs_operations_requests_total counter
# HELP node_mountstats_nfs_operations_response_time_seconds_total Duration all requests took to get a reply back after a request for a given operation was transmitted, in seconds.
# TYPE node_mountstats_nfs_operations_response_time_seconds_total counter
# HELP node_mountstats_nfs_operations_sent_bytes_total Number of bytes sent for a given operation, including RPC headers and payload.
# TYPE node_mountstats_nfs_operations_sent_bytes_total counter
# HELP node_mountstats_nfs_operations_transmissions_total Number of times an actual RPC request has been transmitted for a given operation.
# TYPE node_mountstats_nfs_operations_transmissions_total counter
# HELP node_mountstats_nfs_read_bytes_total Number of bytes read using the read() syscall.
# TYPE node_mountstats_nfs_read_bytes_total counter
# HELP node_mountstats_nfs_read_pages_total Number of pages read directly via mmap()'d files.
# TYPE node_mountstats_nfs_read_pages_total counter
# HELP node_mountstats_nfs_total_read_bytes_total Number of bytes read from the NFS server, in total.
# TYPE node_mountstats_nfs_total_read_bytes_total counter
# HELP node_mountstats_nfs_total_write_bytes_total Number of bytes written to the NFS server, in total.
# TYPE node_mountstats_nfs_total_write_bytes_total counter
# HELP node_mountstats_nfs_transport_backlog_queue_total Total number of items added to the RPC backlog queue.
# TYPE node_mountstats_nfs_transport_backlog_queue_total counter
# HELP node_mountstats_nfs_transport_bad_transaction_ids_total Number of times the NFS server sent a response with a transaction ID unknown to this client.
# TYPE node_mountstats_nfs_transport_bad_transaction_ids_total counter
# HELP node_mountstats_nfs_transport_bind_total Number of times the client has had to establish a connection from scratch to the NFS server.
# TYPE node_mountstats_nfs_transport_bind_total counter
# HELP node_mountstats_nfs_transport_connect_total Number of times the client has made a TCP connection to the NFS server.
# TYPE node_mountstats_nfs_transport_connect_total counter
# HELP node_mountstats_nfs_transport_idle_time_seconds Duration since the NFS mount last saw any RPC traffic, in seconds.
# TYPE node_mountstats_nfs_transport_idle_time_seconds gauge
# HELP node_mountstats_nfs_transport_maximum_rpc_slots Maximum number of simultaneously active RPC requests ever used.
# TYPE node_mountstats_nfs_transport_maximum_rpc_slots gauge
# HELP node_mountstats_nfs_transport_pending_queue_total Total number of items added to the RPC transmission pending queue.
# TYPE node_mountstats_nfs_transport_pending_queue_total counter
# HELP node_mountstats_nfs_transport_receives_total Number of RPC responses for this mount received from the NFS server.
# TYPE node_mountstats_nfs_transport_receives_total counter
# HELP node_mountstats_nfs_transport_sending_queue_total Total number of items added to the RPC transmission sending queue.
# TYPE node_mountstats_nfs_transport_sending_queue_total counter
# HELP node_mountstats_nfs_transport_sends_total Number of RPC requests for this mount sent to the NFS server.
# TYPE node_mountstats_nfs_transport_sends_total counter
# HELP node_mountstats_nfs_write_bytes_total Number of bytes written using the write() syscall.
# TYPE node_mountstats_nfs_write_bytes_total counter
# HELP node_mountstats_nfs_write_pages_total Number of pages written directly via mmap()'d files.
# TYPE node_mountstats_nfs_write_pages_total counter
# HELP node_nfs_connections_total Total number of NFSd TCP connections.
# TYPE node_nfs_connections_total counter
# HELP node_nfs_packets_total Total NFSd network packets (sent+received) by protocol type.
# TYPE node_nfs_packets_total counter
# HELP node_nfs_requests_total Number of NFS procedures invoked.
# TYPE node_nfs_requests_total counter
# HELP node_nfs_rpc_authentication_refreshes_total Number of RPC authentication refreshes performed.
# TYPE node_nfs_rpc_authentication_refreshes_total counter
# HELP node_nfs_rpc_retransmissions_total Number of RPC transmissions performed.
# TYPE node_nfs_rpc_retransmissions_total counter
# HELP node_nfs_rpcs_total Total number of RPCs performed.
# TYPE node_nfs_rpcs_total counter
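Most of the metrics listed above are counters, but a few (such as the idle time and maximum RPC slots) are gauges, and only counters should be wrapped in rate() or irate(). The following sketch parses "# HELP" and "# TYPE" lines into a lookup table so the two kinds can be separated; `parse_metadata` and `names_of_type` are hypothetical helper names, and the sample reuses four lines from the listing.

```python
def parse_metadata(text):
    """Map metric name -> {"help": ..., "type": ...} from exposition comments."""
    meta = {}
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "#" and parts[1] in ("HELP", "TYPE"):
            _, kind, name, value = parts
            meta.setdefault(name, {})[kind.lower()] = value.strip()
    return meta


def names_of_type(meta, metric_type):
    """Return the sorted metric names declared with the given TYPE."""
    return sorted(n for n, m in meta.items() if m.get("type") == metric_type)


SAMPLE = """\
# HELP node_mountstats_nfs_transport_idle_time_seconds Duration since the NFS mount last saw any RPC traffic, in seconds.
# TYPE node_mountstats_nfs_transport_idle_time_seconds gauge
# HELP node_mountstats_nfs_transport_sends_total Number of RPC requests for this mount sent to the NFS server.
# TYPE node_mountstats_nfs_transport_sends_total counter
"""

if __name__ == "__main__":
    meta = parse_metadata(SAMPLE)
    print("gauges:  ", names_of_type(meta, "gauge"))
    print("counters:", names_of_type(meta, "counter"))
```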
3. Related tools
[root@iZj7z01lkm7ylw7zzxnahnZ yum.repos.d]# rpm -qf /usr/sbin/nfsiostat
nfs-utils-1.3.0-0.68.el7.2.x86_64
nfsiostat
10.79.249.2:/ mounted on /var/lib/kubelet/pods/47ec3f6d-85c9-4bd9-97db-e1cdcbe8c865/volumes/kubernetes.io~csi/prod-core-jobs-qilin-ocs-nas4-pvc/mount:
op/s rpc bklog
1.56 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
[root@iZj7z01lkm7ylw7zzxnahnZ yum.repos.d]# rpm -qf /usr/bin/nfsiostat-sysstat
sysstat-10.1.5-19.el7.x86_64
nfsiostat-sysstat
Linux 3.10.0-1160.53.1.el7.x86_64 (iZj7z01lkm7ylw7zzxnahnZ) 08/21/2025 _x86_64_ (24 CPU)
Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
file01.adas.com:/sjbh/data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
file01.adas.com:/gri-ziyandata-2 840.80 72.12 0.00 0.00 840.79 72.12 4.90 0.97 0.45
10.79.248.89:/ 7.25 0.00 0.00 0.00 7.26 0.00 0.01 0.01 0.00
10.193.66.201:/nfs/sjbh-nas3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10.79.249.2:/ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[root@iZj7z01lkm7ylw7zzxnahnZ yum.repos.d]# rpm -qf /usr/sbin/nfsstat
nfs-utils-1.3.0-0.68.el7.2.x86_64
[root@iZj7z01lkm7ylw7zzxnahnZ yum.repos.d]# nfsstat
Client rpc stats:
calls retrans authrefrsh
10505407 1 10506168
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 892600 11% 1506348 20% 1422307 18% 623016 8% 120 0%
read write create mkdir symlink mknod
1421713 18% 754931 10% 754927 10% 6734 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 1247 0% 9655 0%
fsstat fsinfo pathconf commit
73185 0% 15440 0% 7720 0% 10 0%
Client nfs v4:
null read write commit open open_conf
0 0% 1148560 38% 0 0% 0 0% 0 0% 0 0%
open_noat open_dgrd close setattr fsinfo renew
346970 11% 0 0% 346983 11% 0 0% 1044 0% 0 0%
setclntid confirm lock lockt locku access
0 0% 0 0% 0 0% 0 0% 0 0% 23875 0%
getattr lookup lookup_root remove rename link
900402 29% 222440 7% 522 0% 0 0% 0 0% 0 0%
symlink create pathconf statfs readlink readdir
0 0% 0 0% 522 0% 9284 0% 0 0% 5743 0%
server_caps delegreturn getacl setacl fs_locations rel_lkowner
1566 0% 0 0% 0 0% 0 0% 0 0% 0 0%
secinfo exchange_id create_ses destroy_ses sequence get_lease_t
0 0% 0 0% 329 0% 330 0% 327 0% 0 0%
reclaim_comp layoutget getdevinfo layoutcommit layoutreturn getdevlist
1 0% 329 0% 0 0% 0 0% 0 0% 0 0%
(null)
522 0%
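The "Client rpc stats" block at the top of the nfsstat output is enough to estimate the overall retransmission percentage on a client. A minimal sketch, assuming the three-column "calls retrans authrefrsh" layout shown above; `rpc_retrans_percent` is a hypothetical helper, not part of nfs-utils.

```python
def rpc_retrans_percent(nfsstat_text):
    """Return retrans as a percentage of calls, or None if the block is missing."""
    lines = nfsstat_text.splitlines()
    for i, line in enumerate(lines):
        if line.split() == ["calls", "retrans", "authrefrsh"] and i + 1 < len(lines):
            fields = lines[i + 1].split()
            if len(fields) != 3:
                return None
            calls, retrans = int(fields[0]), int(fields[1])
            return 100.0 * retrans / calls if calls else 0.0
    return None


# Copied from the nfsstat output above.
SAMPLE = """\
Client rpc stats:
calls retrans authrefrsh
10505407 1 10506168
"""

if __name__ == "__main__":
    print(f"retrans: {rpc_retrans_percent(SAMPLE):.6f}%")
```

These counters are cumulative since boot, so the percentage is a long-term average; comparing two runs taken some time apart gives a better picture of current behavior.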
(Editor: liangzh)