Understanding Your Server's Storage Performance
Disk metrics are among the most critical indicators of server health and performance. Unlike CPU and memory, which can recover quickly from temporary spikes, disk problems tend to be persistent and can bring an entire system to a halt. Server Scout monitors both disk usage and I/O performance across multiple tiers to give you comprehensive visibility into your storage subsystem.
Storage is typically the slowest component in your server's architecture. Where RAM access is measured in nanoseconds and a CPU clock cycle in fractions of a nanosecond, disk operations are measured in milliseconds, orders of magnitude slower. This makes disk the most common bottleneck in server performance, which is why Server Scout tracks disk metrics so closely.
Disk Usage Metrics (Slow Tier - Every 5 Minutes)
Server Scout collects disk usage statistics every 5 minutes from all mounted filesystems, filtering out pseudo-filesystems like tmpfs, proc, and sysfs to focus on real storage devices.
Primary Disk Metrics
| Metric | Description | Source |
|---|---|---|
| disk_percent | Usage percentage of the primary mount point (/) | df command output |
| disk_used_gb | Disk space used on the primary mount in gigabytes | df command output |
| disk_total_gb | Total disk capacity of the primary mount in gigabytes | df command output |
The disk_percent metric is your primary indicator of storage health. Most system administrators follow the 80/90 rule: start investigating when usage exceeds 80%, take urgent action when it reaches 90%. Unlike memory pressure, which can be resolved by freeing cache, or CPU spikes, which often resolve themselves, disk space only decreases through deliberate action—files don't delete themselves.
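The 80/90 rule above is easy to encode. Below is a minimal sketch using only Python's standard library; the threshold values and function names are illustrative, not part of Server Scout's actual agent:

```python
import shutil

def classify_usage(percent):
    """Apply the 80/90 rule: investigate at 80%, act urgently at 90%."""
    if percent >= 90:
        return "urgent"
    if percent >= 80:
        return "investigate"
    return "ok"

# shutil.disk_usage reports the same totals df uses for the mount
usage = shutil.disk_usage("/")
percent = usage.used / usage.total * 100
print(f"/ is {percent:.1f}% full: {classify_usage(percent)}")
```

Separating the classification from the measurement makes the thresholds easy to test and to adjust per-server.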
Comprehensive Mount Point Monitoring
Server Scout doesn't just monitor your root filesystem. The disk_mounts array provides detailed information about every mounted storage device on your system:
| Field | Description | Purpose |
|---|---|---|
| mount | Mount path (e.g., /, /home, /var) | Identifies the filesystem location |
| device | Block device (e.g., /dev/sda1, /dev/nvme0n1p1) | Shows the underlying hardware |
| fs | Filesystem type (ext4, xfs, btrfs, etc.) | Indicates filesystem capabilities and limitations |
| total_gb | Total capacity in gigabytes | Baseline for calculating usage |
| used_gb | Used space in gigabytes | Actual consumption |
| percent | Usage percentage | Quick visual indicator |
| inodes_percent | Inode usage percentage | Tracks filesystem metadata usage |
| ro | Read-only flag | Critical error indicator |
The Hidden Threat: Inode Exhaustion
The inodes_percent metric tracks something many administrators overlook until it causes problems. Inodes are filesystem metadata structures that track individual files and directories. Every file consumes exactly one inode, regardless of its size.
You can have gigabytes of free disk space but still be unable to create new files if you've exhausted your inodes. This commonly occurs on:
- Mail servers with millions of small email files
- Log directories with numerous small log files
- Development environments with extensive dependency trees (looking at you, node_modules)
- Backup systems storing many incremental snapshots
When inodes_percent approaches 100%, you'll start seeing "No space left on device" errors even with available disk space. The solution isn't freeing disk space—it's deleting files to free inodes.
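You can check inode usage yourself with `os.statvfs`, which exposes the total (`f_files`) and free (`f_ffree`) inode counts. This is a standard-library sketch, not Server Scout's implementation:

```python
import os

def inode_percent(path="/"):
    """Return inode usage percentage for the filesystem containing path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems (e.g., btrfs) allocate inodes dynamically
        # and report no fixed inode table.
        return 0.0
    used = st.f_files - st.f_ffree
    return used / st.f_files * 100

print(f"inode usage on /: {inode_percent('/'):.1f}%")
```

The same `df -i` command-line check reports these numbers, but reading them via `statvfs` avoids spawning a process.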
Read-Only Remount: The Canary in the Coal Mine
The ro flag in the mount information serves as an early warning system for serious filesystem problems. When the Linux kernel detects filesystem corruption, hardware errors, or other critical issues, it often remounts the filesystem as read-only to prevent further damage.
A filesystem that has been remounted read-only indicates:
- Potential disk hardware failure
- Filesystem corruption
- I/O errors or timeouts
- Power loss during critical write operations
This requires immediate attention, as the system can no longer write to the affected filesystem.
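One way to detect a read-only remount is to scan /proc/mounts, where the fourth column lists mount options. The sketch below is a hypothetical detector, kept pure so the parsing is testable; the agent's own filtering logic may differ:

```python
def read_only_mounts(mounts_text):
    """Return mount points whose options include the 'ro' flag.

    mounts_text is the content of /proc/mounts, one mount per line:
    device mountpoint fstype options dump pass
    """
    ro = []
    for line in mounts_text.splitlines():
        parts = line.split()
        # "ro" must be a whole option, not a substring of e.g. "errors=..."
        if len(parts) >= 4 and "ro" in parts[3].split(","):
            ro.append(parts[1])
    return ro

with open("/proc/mounts") as f:
    print("read-only mounts:", read_only_mounts(f.read()))
```

Note that some mounts (e.g., /sys/fs/cgroup on certain systems, or squashfs images) are read-only by design, so an alerting rule should compare against the expected mount options rather than flagging every `ro` entry.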
Disk I/O Metrics (Medium Tier - Every 30 Seconds)
While usage metrics tell you how much space you're consuming, I/O metrics reveal how hard your storage subsystem is working. Server Scout collects these every 30 seconds to capture performance patterns.
| Metric | Description | Source |
|---|---|---|
| disk_io_read_bytes | Cumulative bytes read from all block devices | /proc/diskstats field 6 × 512 |
| disk_io_write_bytes | Cumulative bytes written to all block devices | /proc/diskstats field 10 × 512 |
These are cumulative counters that continuously increase throughout system uptime. The Server Scout dashboard converts these into meaningful rates (bytes per second) by calculating the delta between consecutive measurements.
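The delta-to-rate conversion is simple arithmetic, with one subtlety: a cumulative counter resets to zero on reboot, so a negative delta should be discarded rather than reported as a huge negative rate. A minimal sketch (the guard strategy is an assumption about how such resets are typically handled, not a description of the dashboard's code):

```python
def io_rate(prev_bytes, curr_bytes, interval_s=30):
    """Convert two cumulative byte counters into a bytes-per-second rate.

    Counters reset on reboot; treat a negative delta as zero rather
    than reporting a nonsensical negative rate.
    """
    delta = curr_bytes - prev_bytes
    return max(delta, 0) / interval_s

# 3 MB written over a 30-second interval -> 100 kB/s
print(io_rate(0, 3_000_000, 30))
```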
Understanding I/O Patterns
Read and write patterns tell different stories about your system's behaviour:
High read I/O typically indicates:
- Database query processing
- File serving (web servers, file shares)
- Backup or archival operations
- Search indexing
- Cold cache scenarios where data isn't in memory
High write I/O suggests:
- Log generation
- Database transactions
- Temporary file creation
- Swap activity (concerning if persistent)
- File downloads or data ingestion
The agent parses these values directly from /proc/diskstats, converting the sector counts (512-byte units) to bytes for easier interpretation. This approach provides accurate, low-overhead monitoring without executing external commands in the collection path.
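A simplified version of that parsing looks like the sketch below. Counting whitespace-separated columns from 1 (major, minor, device name, then the stats), field 6 is sectors read and field 10 is sectors written; both are in 512-byte units regardless of the device's physical sector size. The device-name filter here is illustrative only; the agent's real filtering rules are not documented in this section:

```python
def diskstats_totals(text):
    """Sum read/written bytes across block devices in /proc/diskstats.

    Columns (1-indexed): major, minor, name, then the I/O stats;
    field 6 = sectors read, field 10 = sectors written.
    """
    read_bytes = write_bytes = 0
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue
        name = parts[2]
        # Simplified filter: skip loopback and ramdisk pseudo-devices.
        if name.startswith(("loop", "ram")):
            continue
        read_bytes += int(parts[5]) * 512   # field 6, 0-indexed as 5
        write_bytes += int(parts[9]) * 512  # field 10, 0-indexed as 9
    return read_bytes, write_bytes

with open("/proc/diskstats") as f:
    r, w = diskstats_totals(f.read())
print(f"read: {r} bytes, written: {w} bytes")
```

Note that summing every line double-counts partitions alongside their parent devices; a production collector would restrict itself to whole-disk entries.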
The Relationship Between Usage and I/O
Disk usage and I/O metrics are interconnected in ways that can create cascading performance problems:
Disk Space and Write Performance
When disk usage approaches 100%, write operations begin failing, but the problems start well before that point. Most filesystems reserve 5-10% of space for the root user to ensure system operations can continue even when the disk appears "full" to regular users.
As available space decreases:
- Filesystem fragmentation increases
- Write performance degrades
- Temporary file operations fail
- Log rotation may stop working
- Applications may crash or behave unpredictably
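The root reserve described above is visible directly in `os.statvfs`: `f_bfree` counts all free blocks, while `f_bavail` counts only those available to unprivileged users, and the gap between them is the reserved space. A small standard-library sketch (the 5% figure is the common ext4 default, not a universal value):

```python
import os

def space_view(path="/"):
    """Return (free_gb, avail_gb) for the filesystem containing path.

    f_bfree  = free blocks overall (including the root reserve)
    f_bavail = free blocks available to unprivileged users
    """
    st = os.statvfs(path)
    free_gb = st.f_bfree * st.f_frsize / 1024**3
    avail_gb = st.f_bavail * st.f_frsize / 1024**3
    return free_gb, avail_gb

free_gb, avail_gb = space_view("/")
print(f"free: {free_gb:.1f} GB, available to users: {avail_gb:.1f} GB")
```

This distinction explains why df can show a few percent of free space while non-root applications already receive "No space left on device" errors.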
I/O Wait and System Performance
Heavy disk I/O directly impacts the cpu_iowait metric you'll see in Server Scout's CPU monitoring. When processes are waiting for disk operations to complete, they're not using CPU cycles, but they're also not making progress. High iowait percentages indicate your system is spending significant time waiting for storage operations.
This creates a performance bottleneck where:
- Applications become unresponsive
- Load averages increase without high CPU usage
- User experience degrades significantly
- System appears to "hang" during heavy I/O
Practical Monitoring Thresholds
Effective disk monitoring requires different approaches for usage and I/O metrics:
Usage Thresholds
- 80% usage: Begin investigation and cleanup planning
- 90% usage: Urgent action required
- 95% usage: Critical—immediate intervention needed
- 90% inode usage: Start monitoring file creation patterns
- Any read-only remount: Emergency response required
I/O Rate Guidelines
I/O thresholds depend heavily on your hardware, but general patterns include:
- Sustained writes >100MB/s: Monitor for application issues
- Read rates consistently exceeding write rates: Healthy pattern for most applications
- Write rates consistently exceeding read rates: May indicate logging issues or data ingestion problems
- Sudden I/O pattern changes: Investigate for new processes or system changes
Data Collection and Storage Efficiency
Server Scout's approach to disk monitoring balances comprehensiveness with efficiency:
Usage metrics are collected every 5 minutes because disk space changes gradually. More frequent collection would waste resources without providing meaningful additional insight.
I/O metrics are collected every 30 seconds because I/O patterns can change rapidly and you need sufficient granularity to identify performance problems and correlate them with application behaviour.
The agent reads directly from /proc/diskstats rather than executing commands like iostat or iotop, maintaining its lightweight footprint while providing accurate data.
Using Disk Metrics for Proactive Management
Effective disk monitoring goes beyond reactive alerting. Use these metrics proactively:
- Trend analysis: Track usage growth rates to predict when intervention will be needed
- Pattern recognition: Identify daily, weekly, or monthly I/O patterns to optimise maintenance windows
- Capacity planning: Use historical data to size future storage requirements
- Performance correlation: Compare disk I/O with application performance metrics to identify bottlenecks
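Trend analysis can start as simply as a linear projection over the collected usage history. The sketch below assumes evenly meaningful (day, used_gb) samples and a roughly linear growth rate; real capacity planning would fit more points and handle seasonality:

```python
def days_until_full(samples, total_gb):
    """Project when a disk fills, given (day, used_gb) samples.

    Uses the growth rate between the first and last sample.
    Returns None if usage is flat or shrinking.
    """
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (d1 - d0)  # GB per day
    if rate <= 0:
        return None
    return (total_gb - u1) / rate

# 400 GB used, growing to 410 GB over 10 days, on a 500 GB disk:
print(days_until_full([(0, 400), (10, 410)], 500))
```

Projections like this turn the 5-minute usage samples into an actionable date ("this volume fills in ~90 days") rather than a reactive alert.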
The 5-minute collection interval for usage metrics provides sufficient granularity for trend analysis while the 30-second I/O collection captures performance variations without overwhelming you with data.
Understanding your disk metrics helps you maintain system reliability, plan capacity upgrades, and identify performance bottlenecks before they impact users. In Server Scout's dashboard, you can view these metrics across multiple time ranges to spot both immediate issues and long-term trends, ensuring your storage subsystem remains healthy and performant.