Monitoring disk utilization is a critical task for maintaining the health of your cloud infrastructure. Whether you’re using AWS EC2, Azure VMs, or Google Cloud Compute Engine instances, an unmonitored disk can lead to critical system failures. This blog provides a detailed guide to automating disk utilization alerts using Python, cron jobs, and Slack. By the end of this guide, you’ll be able to:
- Launch cloud instances using Terraform.
- Set up a Python script to monitor disk utilization.
- Automate alerts using a cron job.
- Send notifications to Slack.
This approach saves time, prevents outages, and ensures proactive cloud management.
Disk utilization monitoring is often overlooked until it becomes a problem, but proactive monitoring helps avoid disruptions in applications and services. Let’s dive into the technical details to ensure your cloud infrastructure is both efficient and reliable.
Prerequisites
Before we start, make sure you have the following:
- A cloud account on AWS, Azure, or GCP.
- Basic familiarity with Python, Terraform, and shell scripting.
- Slack workspace with a webhook URL for sending alerts.
- Terraform installed on your local machine.
- Python 3.x installed with basic knowledge of psutil and requests modules.
- IAM permissions to launch and configure instances in your cloud provider.
Having these prerequisites in place ensures a smooth implementation of the solution described below.
High-Level Architecture
Workflow Overview
- Use Terraform to launch a cloud instance with user data scripts.
- Inject a Python script to monitor disk usage.
- Configure a cron job to run the script every 2 hours.
- Send alerts to Slack when disk utilization exceeds a threshold.
Below is the architecture diagram:

[Terraform] --> [Launch Instance] --> [User Data Scripts] --> [Disk Monitoring Script + Cron] --> [Slack Alerts]
By automating these tasks, you minimize manual intervention and maintain a robust monitoring solution across cloud environments.
Writing the Python Script
Overview of the Python Script
The Python script will:
- Check disk utilization using the psutil library.
- Format a message and send it to Slack using the requests library.
Python Code
Here’s a sample script:
```python
import os

import psutil
import requests


def check_disk_utilization(threshold, slack_webhook_url):
    disk_usage = psutil.disk_usage('/')
    used_percentage = disk_usage.percent
    if used_percentage > threshold:
        message = f":warning: Disk utilization is at {used_percentage}%! Take action now."
        payload = {"text": message}
        response = requests.post(slack_webhook_url, json=payload)
        if response.status_code == 200:
            print("Alert sent to Slack successfully.")
        else:
            print(f"Failed to send alert: {response.text}")


if __name__ == "__main__":
    SLACK_WEBHOOK_URL = os.getenv("SLACK_WEBHOOK_URL")
    THRESHOLD = int(os.getenv("THRESHOLD", 80))  # Default threshold is 80%
    check_disk_utilization(THRESHOLD, SLACK_WEBHOOK_URL)
```
- Key Points:
- The script retrieves disk usage with psutil.disk_usage('/').
- It sends an alert to Slack only if utilization exceeds the defined threshold.
Setting Up Automation with Terraform
Terraform Configuration
The Terraform script will:
- Launch a cloud instance.
- Configure networking resources like VPC, subnets, and security groups.
- Add user data scripts for the Python and cron setup.
Below is the main.tf file:
```hcl
provider "aws" {
  region = "us-east-1"
}

variable "slack_webhook_url" {
  description = "Slack incoming-webhook URL used by the monitoring script"
  type        = string
  sensitive   = true
}

resource "aws_vpc" "main_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "MainVPC"
  }
}

resource "aws_subnet" "main_subnet" {
  vpc_id                  = aws_vpc.main_vpc.id
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "us-east-1a"
}

resource "aws_security_group" "instance_sg" {
  vpc_id = aws_vpc.main_vpc.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "disk_monitor" {
  ami           = "ami-0c55b159cbfafe1f0" # Example AMI; use a current one for your region
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.main_subnet.id

  # Inside a VPC, reference security groups by ID, not by name.
  vpc_security_group_ids = [aws_security_group.instance_sg.id]

  # Render the Python template first, then inject it into the shell template.
  user_data = templatefile("script.sh.tpl", {
    python_script = templatefile("script.py.tpl", {
      slack_webhook_url = var.slack_webhook_url
      threshold         = 80
    })
  })

  tags = {
    Name = "DiskMonitorInstance"
  }
}
```
This configuration ensures your instance is secure and connected to a proper network for access and monitoring.
Creating User Data Templates
The Python Script Template (script.py.tpl)
This template dynamically injects environment variables for the Slack webhook and threshold.
```python
import psutil
import requests


def check_disk_utilization(threshold, slack_webhook_url):
    disk_usage = psutil.disk_usage('/')
    used_percentage = disk_usage.percent
    if used_percentage > threshold:
        message = f":warning: Disk utilization is at {used_percentage}%! Take action now."
        payload = {"text": message}
        requests.post(slack_webhook_url, json=payload)


if __name__ == "__main__":
    SLACK_WEBHOOK_URL = "${slack_webhook_url}"
    THRESHOLD = ${threshold}
    check_disk_utilization(THRESHOLD, SLACK_WEBHOOK_URL)
```
The Shell Script Template (script.sh.tpl)
This script installs Python dependencies and sets up the cron job to run the Python script every 2 hours.
```bash
#!/bin/bash
# Update and install dependencies
yum update -y
yum install -y python3
pip3 install psutil requests

# Add Python script
cat <<EOF > /opt/disk_monitor.py
${python_script}
EOF

# Make the script executable
chmod +x /opt/disk_monitor.py

# Add cron job to run the script every 2 hours
(crontab -l 2>/dev/null; echo "0 */2 * * * python3 /opt/disk_monitor.py") | crontab -

# Start cron service
service crond start
```
This ensures that the necessary dependencies are installed, and the monitoring script is executed periodically.
Adding the Python Script to Cron
In the script.sh.tpl, we used:
(crontab -l 2>/dev/null; echo "0 */2 * * * python3 /opt/disk_monitor.py") | crontab -
This runs the Python script every 2 hours without manual intervention, and starting the cron service ensures the scheduled job actually executes.
Testing the Setup
- Run the Terraform script to launch the instance:

```bash
terraform init
terraform apply
```

- SSH into the instance to verify the files and cron setup:

```bash
crontab -l
```

- Simulate high disk usage and confirm the Slack alert.
- Use commands like dd to create large files and monitor the script behavior.
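If you would rather not fill a real disk, the alert condition can also be exercised in isolation by feeding it a fabricated reading. This is a minimal sketch: `Usage` is a stand-in for the named tuple `psutil.disk_usage()` returns, and `should_alert` mirrors the threshold comparison used in the monitoring script.

```python
from collections import namedtuple

# Stand-in for the named tuple that psutil.disk_usage() returns.
Usage = namedtuple("Usage", ["total", "used", "free", "percent"])


def should_alert(usage, threshold):
    # The same condition the monitoring script applies.
    return usage.percent > threshold


nearly_full = Usage(total=100, used=95, free=5, percent=95.0)
print(should_alert(nearly_full, 80))  # 95% exceeds an 80% threshold
print(should_alert(nearly_full, 99))  # 95% is below a 99% threshold
```

This lets you confirm the threshold logic before waiting for a real disk to fill up; the dd approach then verifies the full end-to-end path, including the Slack delivery.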
Enhancing and Scaling the Setup
Once you have verified the basic setup, there are multiple ways to enhance and scale this solution:
Adding Multi-Disk Support
If your cloud instance has multiple disks, modify the Python script to check utilization for all mounted disks. Use psutil.disk_partitions() to iterate through every partition:
```python
for partition in psutil.disk_partitions():
    usage = psutil.disk_usage(partition.mountpoint)
    if usage.percent > threshold:
        message = f":warning: Disk {partition.device} is at {usage.percent}% utilization!"
        payload = {"text": message}
        requests.post(slack_webhook_url, json=payload)
```
Configuring Alerts for Multiple Channels
If you want to send alerts to different Slack channels based on the severity, you can integrate multiple Slack webhook URLs and classify thresholds as Warning, Critical, etc.
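One way to sketch this routing: classify each reading into a severity tier, then pick the matching webhook. The webhook URLs and the 80%/90% tier boundaries below are illustrative placeholders, not values from the original setup; the actual post would use requests.post exactly as in the main script.

```python
def classify(percent, warning=80, critical=90):
    """Map a utilization percentage to a severity tier, or None if healthy."""
    if percent >= critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return None


# Hypothetical webhook URLs, one per Slack channel; replace with your own.
WEBHOOKS = {
    "warning": "https://hooks.slack.com/services/T000/B000/WARN",
    "critical": "https://hooks.slack.com/services/T000/B000/CRIT",
}


def build_alert(percent):
    """Return a (webhook, payload) pair for a reading, or None if no alert is due."""
    severity = classify(percent)
    if severity is None:
        return None
    payload = {"text": f":warning: [{severity.upper()}] Disk utilization is at {percent}%"}
    return WEBHOOKS[severity], payload

# The monitoring script would then send with: requests.post(webhook, json=payload)
```

Keeping the thresholds as parameters makes it easy to tune each tier per environment without touching the routing logic.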
Scaling Across Multiple Instances
To monitor multiple instances in the same cloud environment:
- Use Terraform to deploy the setup across multiple instances.
- Configure a centralized monitoring service like Amazon CloudWatch, Azure Monitor, or Google Cloud's operations suite (formerly Stackdriver) for aggregated alerts.
Adding Logging and Metrics
Integrate logging to keep track of disk usage trends over time. Use Python’s logging module to log utilization percentages to a file or an external monitoring service.
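A minimal sketch with the standard logging module follows; the log path, format, and the `log_utilization` helper are illustrative choices, and in the monitoring script you would call it with the percentage from psutil.disk_usage('/').

```python
import logging

# Log utilization readings to a file so trends can be reviewed later.
logging.basicConfig(
    filename="disk_monitor.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,  # reset any previously configured handlers
)


def log_utilization(mountpoint, percent, threshold=80):
    """Record a reading; escalate the log level when the threshold is breached."""
    if percent > threshold:
        logging.warning("Disk %s at %.1f%% (threshold %d%%)", mountpoint, percent, threshold)
    else:
        logging.info("Disk %s at %.1f%%", mountpoint, percent)


log_utilization("/", 42.5)
```

Shipping this log file to your cloud provider's log service then gives you the historical trend data alongside the real-time Slack alerts.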
Conclusion
In this blog, we’ve demonstrated a robust way to automate disk utilization monitoring for cloud instances using Python, cron jobs, and Slack. This setup is cost-effective, highly customizable, and adaptable across major cloud providers like AWS, Azure, and GCP.
With the power of Terraform, you can launch instances with pre-configured monitoring scripts, ensuring consistent and automated deployment. By leveraging Slack for notifications, you maintain real-time awareness of your infrastructure health.
Disk utilization monitoring is just one of many steps toward a proactive cloud management strategy. Implementing this solution not only prevents outages but also promotes efficient resource utilization, helping you save time and operational costs.