The €80,000 Mainframe Monitoring Problem
A mid-sized financial services company recently asked their IBM rep about monitoring visibility for 200+ COBOL batch jobs running on z/OS. The quote came back at €80,000 annually for IBM System Automation with application monitoring extensions. The same day, their Linux team demonstrated complete process visibility using system accounting files and 50 lines of bash.
This story repeats across every organisation running legacy COBOL workloads. Vendors push expensive proprietary solutions while ignoring the fact that system-level process accounting provides deeper visibility than most application-specific monitoring tools.
Why Standard Process Accounting Outperforms Proprietary Tools
COBOL programs exhibit remarkably predictable resource consumption patterns. A nightly batch job that processes customer transactions will use consistent CPU time, memory, and disk I/O when healthy. When something goes wrong, these patterns shift dramatically before any application-level error appears.
System accounting captures every process execution without requiring code changes or application instrumentation. The accounting file contains process ID, CPU time, memory usage, exit codes, and runtime duration for every program execution. This data reveals health trends that expensive monitoring dashboards miss.
Consider a typical COBOL batch sequence. Program A reads transaction files, Program B validates data integrity, Program C updates the database, Program D generates reports. Each program's resource usage follows a predictable pattern based on data volume. "Troubleshooting Load Spikes: When Top Shows Nothing but Load Average Says Otherwise" shows how CPU pattern analysis reveals performance degradation long before errors surface.
Building Visibility with Unix Process Accounting
Process Runtime Pattern Detection
The lastcomm command reads accounting files and shows process execution history. A COBOL program that normally completes in 15 minutes but suddenly takes 45 minutes indicates underlying problems with data quality, database locks, or filesystem performance.
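A minimal sketch of pulling per-execution CPU seconds for one program out of lastcomm-style output. The program name "custbatch" and the sample records in the heredoc are illustrative, not from the source; on a live system you would pipe `lastcomm custbatch` in instead of the heredoc, and the field position may differ if the flags column is empty.

```shell
# Extract the CPU-seconds column for a hypothetical program "custbatch".
# lastcomm prints: command, flags, user, tty, CPU seconds, "secs", start time.
cpu_secs=$(awk '$1 == "custbatch" { print $5 }' <<'EOF'
custbatch        S     batch    __         912.31 secs Mon Mar  3 02:00
custbatch        S     batch    __         934.07 secs Tue Mar  4 02:00
custbatch        S     batch    __        2701.55 secs Wed Mar  5 02:00
EOF
)
echo "$cpu_secs"
```

The third run's jump from roughly 900 to 2700 CPU seconds is exactly the kind of shift worth alerting on before the job itself fails.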
Creating baseline measurements takes just a few weeks of historical data. Track each program's typical CPU seconds, real-time duration, and memory usage. Alert when runtime exceeds 150% of the historical average or when CPU time doesn't correlate with elapsed time.
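The 150% rule above can be sketched in a few lines of awk. The baseline CPU-second samples and the latest measurement are made-up numbers; in practice they would come from parsed accounting records for one program.

```shell
# Compare the latest run against 150% of the historical average CPU time.
baseline="912.31 934.07 901.88 945.12"   # illustrative prior runs
latest=2701.55                           # illustrative current run
alert=$(echo "$baseline $latest" | awk -v n=4 '{
    sum = 0
    for (i = 1; i <= n; i++) sum += $i   # mean of the n baseline runs
    avg = sum / n
    status = ($(n + 1) > 1.5 * avg) ? "ALERT" : "OK"
    print status
}')
echo "$alert"
```

The same structure works for the lower bound (suspiciously fast runs) by adding a second comparison against a fraction of the average.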
Memory and Resource Tracking
COBOL programs often reveal memory leaks through gradual increases in peak memory usage over multiple executions. System accounting captures maximum resident set size without impacting program performance. Unlike application monitoring tools that sample memory periodically, process accounting records the true peak usage for the entire execution.
Tracking resource patterns becomes straightforward when every program execution generates accounting records. Parse the accounting files daily and build trend analysis showing resource usage over time.
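A sketch of the daily trend analysis, assuming the accounting records have already been reduced to one line per day of date and peak resident KB for a single program. The file layout and numbers are illustrative; the point is that a steady climb in peak memory across executions is a leak signature.

```shell
# Percent growth in daily peak RSS across the window (hypothetical summary data).
growth=$(awk 'NR == 1 { first = $2 } { last = $2 }
              END { printf "%.0f", 100 * (last - first) / first }' <<'EOF'
2024-03-01 51200
2024-03-02 53400
2024-03-03 56100
2024-03-04 61300
EOF
)
echo "peak RSS grew ${growth}% over the window"
```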
Implementation Strategy for Legacy Environments
Log Forwarding to Modern Linux Infrastructure
Most mainframe environments already forward system logs to Linux servers using syslog-ng or rsyslog. Adding process accounting data to these streams requires minimal configuration changes. Once accounting records reach Linux systems, "The Small File Problem: When Applications Create Millions of Tiny Files" explains how to handle the volume of individual process records efficiently.
The accounting data pipeline becomes: mainframe accounting → log forwarding → Linux aggregation → threshold analysis → alerts. This approach leverages existing log infrastructure without requiring new network connections or firewall changes.
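One way to sketch the "accounting → log forwarding" hop is to format each parsed record as a key=value syslog line and hand it to logger(1), so it rides the rsyslog or syslog-ng forwarding already in place. The tag, field names, and values here are assumptions for illustration.

```shell
# Format one accounting record as a single syslog-friendly line.
fmt_acct_msg() {
    # args: program, CPU seconds, elapsed seconds, return code
    printf 'prog=%s cpu_secs=%s elapsed_secs=%s rc=%s' "$1" "$2" "$3" "$4"
}
msg=$(fmt_acct_msg custbatch 912.31 905 0)
echo "$msg"
# On a live system, emit it onto the existing syslog stream instead:
# logger -t cobol-acct "$msg"
```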
Alert Thresholds Based on Historical Behavior
Instead of arbitrary thresholds, use historical process behavior to define normal ranges. A program that typically uses 2-4 CPU seconds should alert when it exceeds 8 CPU seconds or uses under 1 CPU second. Both extremes indicate problems.
Exit codes provide the clearest health indicator. COBOL programs with proper error handling return specific exit codes for different failure conditions. Monitor exit code patterns and alert on any non-zero exits or unusual code frequencies.
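The exit-code check reduces to a one-line filter once records reach the aggregation host. The two-column format (program name, exit code) and the sample lines are illustrative; any non-zero exit surfaces in the alert output.

```shell
# Report every program in the record stream that exited non-zero.
bad=$(awk '$2 != 0 { print $1 "=" $2 }' <<'EOF'
custbatch 0
validate 0
dbupdate 12
reportgen 0
EOF
)
echo "${bad:-all exits clean}"
```

Counting how often each code appears over time (for example with `sort | uniq -c`) covers the "unusual code frequencies" case as well.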
Why This Approach Succeeds
System-level monitoring works because COBOL applications exhibit consistent behavior patterns. Unlike modern microservices with unpredictable load patterns, batch COBOL programs process similar data volumes on predictable schedules. This consistency makes threshold-based alerting extremely reliable.
Mainframe system administrators understand process accounting better than they understand proprietary monitoring agents. The tools are already installed, the data is already collected, and the skills exist within the organisation.
Server Scout's bash-based approach demonstrates how lightweight monitoring agents can bridge legacy and modern infrastructure without introducing complex dependencies or resource overhead.
Expensive proprietary tools add complexity without improving visibility. System accounting provides the data needed to monitor COBOL application health effectively. The challenge isn't data collection; it's parsing and analysing the data that already exists.
FAQ
Can system accounting detect COBOL application errors before batch job failures occur?
Yes, resource usage patterns change significantly before COBOL programs fail. Monitoring CPU time ratios, memory consumption trends, and runtime duration provides 15-30 minutes of early warning before most batch job failures.
How does process accounting compare to CICS transaction monitoring for online COBOL programs?
Process accounting works best for batch COBOL jobs with predictable resource patterns. Online CICS transactions need transaction-level monitoring since individual processes are short-lived and resource usage varies dramatically based on transaction types.
What's the storage overhead of collecting process accounting data for hundreds of COBOL programs?
Process accounting files typically consume 50-100 MB per month for extensive COBOL workloads. The data compresses well and older records can be archived, making storage costs negligible compared to proprietary monitoring license fees.