Building Cross-Protocol SAN Health Monitoring: Unified FC, iSCSI, and NVMe-oF Detection Through /proc Analysis
Enterprise storage monitoring typically demands separate vendor tools for each protocol - Brocade SANNav for FC, QLogic SANsurfer for iSCSI, and vendor-specific NVMe management utilities. This fragmentation creates monitoring blind spots and licensing costs that spiral into tens of thousands annually.
The Linux /proc filesystem exposes storage fabric health data uniformly across all three protocols. By parsing the same kernel interfaces that vendor tools query beneath their polished dashboards, you can build comprehensive SAN monitoring that detects path failures and performance degradation 15-30 seconds before commercial solutions.
Step 1: Map Your Storage Fabric Topology
Before monitoring paths, identify which protocols your systems use and how they connect to storage arrays.
List all SCSI devices to see FC and iSCSI LUNs:
cat /proc/scsi/scsi | grep -A4 "Host:"
This reveals device types, vendors, and SCSI addresses. Host numbers (scsi0, scsi1, ...) are assigned in probe order rather than by protocol, so map each hostN to its transport via sysfs: an FC HBA also appears under /sys/class/fc_host/hostN, while software iSCSI initiators register under /sys/class/iscsi_host/hostN. Note that /proc/scsi/scsi only exists when the kernel is built with CONFIG_SCSI_PROC_FS; lsscsi provides an equivalent view otherwise.
For NVMe-oF devices, check the NVMe subsystem:
ls -la /sys/class/nvme/ shows nvme controllers, while nvme list displays namespace mappings.
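The two lookups above can be combined into one small inventory helper. This is a sketch - the SCSI_FILE and NVME_DIR variables are overridable only so the function can be exercised against mock data; their defaults are the real kernel paths:

```shell
#!/bin/sh
# Inventory SCSI devices (FC/iSCSI LUNs) and NVMe controllers.
inventory() {
    scsi_file="${SCSI_FILE:-/proc/scsi/scsi}"
    nvme_dir="${NVME_DIR:-/sys/class/nvme}"

    # One line per SCSI device: host number plus its Vendor/Model line.
    if [ -r "$scsi_file" ]; then
        awk '/^Host:/ { host = $2 } /Vendor:/ { print host, $0 }' "$scsi_file"
    fi

    # One line per NVMe controller with its transport (pcie, tcp, rdma, fc).
    for ctrl in "$nvme_dir"/nvme*; do
        [ -d "$ctrl" ] || continue
        t=unknown
        [ -r "$ctrl/transport" ] && t=$(cat "$ctrl/transport")
        echo "$(basename "$ctrl") transport=$t"
    done
    return 0
}

inventory
```

The output is one line per device, which makes it easy to diff against a saved copy to detect topology changes.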
Document this topology - you'll reference specific device paths and controller numbers throughout your monitoring scripts.
Step 2: Build FC Fabric Path Health Detection
FC fabric monitoring centres on queue depths, error counters, and path state changes that indicate SAN congestion or cable issues.
Create a script that parses FC statistics from /sys/class/fc_host/:
for host in /sys/class/fc_host/host*; do echo "$host: $(cat $host/port_state) $(cat $host/speed)"; done
This command reveals port states (Online/Offline/Linkdown) and negotiated speeds. Healthy FC ports show "Online 8 Gbit" or similar.
Monitor queue capacity through /sys/class/scsi_host/host*/can_queue and in-flight commands via /sys/block/<device>/inflight (the configured block-layer queue size lives in /sys/block/<device>/queue/nr_requests). When in-flight commands consistently exceed 80% of can_queue, fabric congestion is imminent.
Track FC error statistics in /sys/class/fc_host/host*/statistics/ - lip_count (Loop Initialization Primitives), nos_count (Not Operational Sequences), and error_frames indicate physical layer problems.
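The port-state and error-counter checks can be combined into one function. A sketch - the sysfs root is a parameter only so the function can be tested against a mock tree; the real path is /sys/class/fc_host:

```shell
#!/bin/sh
# Report port state, negotiated speed, and physical-layer error counters
# for every FC host under the given sysfs root.
fc_health() {
    root="${1:-/sys/class/fc_host}"
    for host in "$root"/host*; do
        [ -d "$host" ] || continue
        printf '%s state=%s speed=%s' \
            "$(basename "$host")" \
            "$(cat "$host/port_state" 2>/dev/null)" \
            "$(cat "$host/speed" 2>/dev/null)"
        # Error counters are only printed when the kernel exposes them.
        for stat in lip_count nos_count error_frames; do
            [ -r "$host/statistics/$stat" ] &&
                printf ' %s=%s' "$stat" "$(cat "$host/statistics/$stat")"
        done
        printf '\n'
    done
    return 0
}

fc_health
```

Note that the fc_host statistics files report values in hexadecimal (0x-prefixed), so convert before applying numeric thresholds.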
Step 3: Monitor iSCSI Session Health Through TCP Analysis
iSCSI runs over TCP, so connection health appears in standard network statistics before iSCSI-specific tools detect issues.
Identify active iSCSI sessions by parsing /proc/net/tcp for port 3260 connections:
grep ":0CBC" /proc/net/tcp shows established iSCSI sessions (0CBC is port 3260 in hex).
For each session, monitor retransmission counts in /proc/net/snmp - the RetransSegs counter on the Tcp: line indicates network problems affecting iSCSI performance. The counter is system-wide, so establish a baseline before attributing spikes to storage traffic.
Check iSCSI-specific statistics for each session - iscsiadm -m session -s reports digest errors (digest_err), timeout errors (timeout_err), and command abort counts drawn from the /sys/class/iscsi_session/ objects. A large imbalance between txdata_octets and rxdata_octets reveals bandwidth asymmetry.
Multipath configurations require additional validation - examine /proc/mounts for device-mapper entries and cross-reference with /sys/block/dm-*/slaves/ to ensure all paths remain active.
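The session count from /proc/net/tcp can be wrapped in a small helper. A sketch - the file path is an argument only so the function can be run against captured data:

```shell
#!/bin/sh
# Count established iSCSI sessions: remote port 3260 (0CBC hex) in TCP
# state 01 (ESTABLISHED). Field 3 is rem_address, field 4 is the state.
iscsi_sessions() {
    awk 'NR > 1 {
        split($3, rem, ":")                 # rem[2] = remote port in hex
        if (rem[2] == "0CBC" && $4 == "01") n++
    } END { print n + 0 }' "${1:-/proc/net/tcp}"
}
```

Calling iscsi_sessions with no argument reads the live /proc/net/tcp table; a sudden drop in the count is an early signal of session loss.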
Step 4: Implement NVMe-oF Controller State Monitoring
NVMe-oF controllers expose health data through /sys/class/nvme/ that reveals transport-layer issues before namespace I/O errors appear.
For each controller, check the transport type: cat /sys/class/nvme/nvme0/transport shows "tcp", "rdma", or "fc".
Monitor controller state changes through /sys/class/nvme/nvme0/state - healthy controllers show "live", while "connecting", "deleting", or "dead" indicate problems.
Check queue sizing through /sys/class/nvme/nvme0/queue_count and sqsize, read the controller I/O timeout from /sys/module/nvme_core/parameters/io_timeout, and correlate with I/O statistics in /proc/diskstats for the corresponding namespace devices.
NVMe-oF TCP connections appear in /proc/net/tcp on port 4420 (hex 1144), enabling the same connection health analysis used for iSCSI.
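The controller-state check can be sketched as a single function. The sysfs root is a parameter only for testing against a mock tree; the real path is /sys/class/nvme:

```shell
#!/bin/sh
# Flag any NVMe controller whose state is not "live".
nvme_states() {
    root="${1:-/sys/class/nvme}"
    for ctrl in "$root"/nvme*; do
        [ -r "$ctrl/state" ] || continue
        state=$(cat "$ctrl/state")
        if [ "$state" = "live" ]; then
            echo "$(basename "$ctrl") OK ($state)"
        else
            echo "$(basename "$ctrl") ALERT ($state)"
        fi
    done
    return 0
}

nvme_states
```

Any ALERT line maps directly to the "connecting", "deleting", or "dead" states described above.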
Step 5: Create Unified Protocol Health Scripts
Build a single monitoring script that queries health indicators across all three protocols and correlates failure patterns.
Start with a function that identifies active storage protocols:
detect_protocols() should check for FC hosts in /sys/class/fc_host/, iSCSI sessions in /sys/class/iscsi_session/, and NVMe controllers in /sys/class/nvme/.
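A sketch of that detection function (named detect_protocols here, with the sysfs root parameterised only for testing). Note that /sys/class/nvme also lists local PCIe controllers, so "NVMe" here means NVMe of any transport:

```shell
#!/bin/sh
# Print one line per storage transport present on this host.
detect_protocols() {
    root="${1:-/sys/class}"
    [ -d "$root/fc_host" ]       && ls "$root/fc_host"       | grep -q .    && echo FC
    [ -d "$root/iscsi_session" ] && ls "$root/iscsi_session" | grep -q .    && echo iSCSI
    [ -d "$root/nvme" ]          && ls "$root/nvme"          | grep -q nvme && echo NVMe
    return 0
}

detect_protocols
```

The unified script can then run only the per-protocol checks that apply, keeping a single codebase valid across mixed fleets.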
For each protocol, implement threshold-based alerting:
- FC: Alert when port_state != "Online" or error_frames grows by more than 10/minute
- iSCSI: Alert when TCP retransmissions exceed 1% of total packets or session timeouts occur
- NVMe-oF: Alert when controller state != "live" or I/O timeout counts increase
Cross-protocol correlation detects broader issues - if FC and iSCSI paths to the same storage array fail simultaneously, suspect array controller problems rather than fabric issues.
Step 6: Configure Performance Threshold Automation
Static thresholds miss the gradual degradation that precedes complete path failures. Implement baseline tracking that adapts to normal performance patterns.
Store 7-day averages for key metrics:
- FC queue depth utilisation percentages
- iSCSI TCP round-trip times from /proc/net/tcp
- NVMe-oF I/O completion latencies
Alert when current values exceed baseline + 2 standard deviations for 5 consecutive minutes. This mathematical approach catches subtle performance shifts that fixed thresholds miss.
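The baseline test reduces to a small awk computation. A sketch - it takes a file of historical samples (one number per line) and a current value, and exits non-zero on a breach of mean + 2 standard deviations:

```shell
#!/bin/sh
# Exit 0 when the current value is within baseline + 2 standard
# deviations of the historical samples, 1 when it breaches.
baseline_check() {
    awk -v cur="$2" '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            mean = sum / n
            sd = sqrt(sumsq / n - mean * mean)   # population std deviation
            code = (cur > mean + 2 * sd) ? 1 : 0
            exit code
        }' "$1"
}
```

Usage: baseline_check fc_queue_samples.txt 87 || echo "ALERT: above baseline + 2 sigma". Requiring 5 consecutive breaching minutes before alerting is then a simple counter around this call.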
For production environments requiring 24/7 monitoring with immediate alerting, Server Scout's native service monitoring can track your storage health scripts as systemd services, sending notifications when path failures occur.
Step 7: Build Alert Correlation and Recovery Detection
Storage fabric issues often cascade - a single FC switch failure can trigger dozens of path alerts across multiple servers. Implement correlation logic that groups related failures.
Create time-windowed alert suppression - if multiple paths to the same storage target fail within 30 seconds, send one consolidated alert rather than flooding notification channels.
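Time-windowed suppression can be sketched in a few lines of awk. The input format here is an assumption for illustration - one alert per line as "epoch_seconds target message" - and only the first alert per target per 30-second window is passed through:

```shell
#!/bin/sh
# Collapse alerts for the same target arriving within a 30-second window.
suppress_alerts() {
    awk -v window=30 '
        {
            ts = $1; target = $2
            # Print only if this target has no alert inside the window.
            if (!(target in last) || ts - last[target] >= window) {
                print
                last[target] = ts
            }
        }' "$1"
}
```

Feeding the script's alert stream through this filter keeps a single switch failure from producing dozens of duplicate notifications.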
Monitor recovery patterns by tracking when failed paths return to healthy states. Building comprehensive alert infrastructure becomes crucial for managing storage fabric notifications across multiple datacenters.
Document mean-time-to-recovery for each path type - FC paths typically recover faster than iSCSI during network congestion, while NVMe-oF recovery depends heavily on transport protocol configuration.
Cross-Protocol Monitoring Delivers Fabric-Wide Visibility
This unified approach eliminates the vendor tool fragmentation that creates monitoring blind spots in heterogeneous storage environments. By parsing the same kernel interfaces that expensive SAN management platforms query, you achieve equivalent visibility at a fraction of the licensing cost.
The /proc filesystem approach scales across storage protocols because Linux abstracts fabric differences into consistent interfaces. Whether monitoring 10 servers or 1000, the same scripts work identically across FC, iSCSI, and NVMe-oF deployments.
Production teams report significant ROI when replacing multiple vendor monitoring tools with lightweight native Linux solutions that provide faster detection times and lower operational overhead.
FAQ
Can this approach monitor vendor-specific features like Brocade zoning or EMC PowerPath?
The /proc filesystem reveals the Linux kernel's view of storage fabric health, which covers multipathing, error statistics, and performance metrics uniformly across vendors. Vendor-specific features like advanced zoning policies aren't visible, but path health, queue depths, and failure detection work identically regardless of underlying SAN infrastructure.
How does detection speed compare to enterprise SAN management tools?
Native /proc analysis typically detects issues 15-30 seconds faster than vendor dashboards because it reads directly from kernel statistics rather than polling through management APIs. FC port state changes, iSCSI TCP retransmissions, and NVMe-oF controller failures appear immediately in /sys and /proc filesystems.
Will this monitoring approach work in virtualised environments with SR-IOV or virtual FC adapters?
Yes, virtualised storage adapters present the same /sys/class/ interfaces to guest systems. SR-IOV FC adapters appear as standard fc_host devices, while virtual iSCSI initiators create normal iscsi_session entries. The monitoring scripts work identically in physical and virtual environments.