    How to Set Up Disk Utilization Alerts for Cloud Instances

    By ayush.mandal11@gmail.com | January 18, 2025

    Monitoring disk utilization is a critical task for maintaining the health of your cloud infrastructure. Whether you’re using AWS EC2, Azure VMs, or Google Cloud Compute Instances, an unmonitored disk can lead to severe system failures. This blog provides a detailed guide to automating disk utilization alerts using Python, cron jobs, and Slack. By the end of this guide, you’ll be able to:

    • Launch cloud instances using Terraform.
    • Set up a Python script to monitor disk utilization.
    • Automate alerts using a cron job.
    • Send notifications to Slack.

    This approach saves time, prevents outages, and ensures proactive cloud management.

    Disk utilization monitoring is often overlooked until it becomes a problem, but proactive monitoring helps avoid disruptions in applications and services. Let’s dive into the technical details to ensure your cloud infrastructure is both efficient and reliable.


    Table of Contents

    • Prerequisites
    • High-Level Architecture
      • Workflow Overview
    • Writing the Python Script
      • Overview of the Python Script
      • Python Code
    • Setting Up Automation with Terraform
      • Terraform Configuration
    • Creating User Data Templates
      • The Python Script Template (script.py.tpl)
      • The Shell Script Template (script.sh.tpl)
    • Adding the Python Script to Cron
    • Testing the Setup
    • Enhancing and Scaling the Setup
      • Adding Multi-Disk Support
      • Configuring Alerts for Multiple Channels
      • Scaling Across Multiple Instances
      • Adding Logging and Metrics
    • Conclusion
      • References

    Prerequisites

    Before we start, make sure you have the following:

    • A cloud account on AWS, Azure, or GCP.
    • Basic familiarity with Python, Terraform, and shell scripting.
    • Slack workspace with a webhook URL for sending alerts (you can verify the webhook with the snippet after this list).
    • Terraform installed on your local machine.
    • Python 3.x installed, along with basic familiarity with the psutil and requests modules.
    • IAM permissions to launch and configure instances in your cloud provider.

    Having these prerequisites in place ensures a smooth implementation of the solution described below.


    High-Level Architecture

    Workflow Overview

    1. Use Terraform to launch a cloud instance with user data scripts.
    2. Inject a Python script to monitor disk usage.
    3. Configure a cron job to run the script every 2 hours.
    4. Send alerts to Slack when disk utilization exceeds a threshold.

    Below is the architecture diagram:

    [Terraform] --> [Launch Instance] --> [User Data Scripts] --> [Disk Monitoring Script + Cron] --> [Slack Alerts]

    By automating these tasks, you minimize manual intervention and maintain a robust monitoring solution across cloud environments.


    Writing the Python Script

    Overview of the Python Script

    The Python script will:

    • Check disk utilization using the psutil library.
    • Format a message and send it to Slack using the requests library.

    Python Code

    Here’s a sample script:

    import os
    import psutil
    import requests
    
    def check_disk_utilization(threshold, slack_webhook_url):
        disk_usage = psutil.disk_usage('/')
        used_percentage = disk_usage.percent
    
        if used_percentage > threshold:
            message = f":warning: Disk utilization is at {used_percentage}%! Take action now."
            payload = {"text": message}
            response = requests.post(slack_webhook_url, json=payload)
    
            if response.status_code == 200:
                print("Alert sent to Slack successfully.")
            else:
                print(f"Failed to send alert: {response.text}")
    
    if __name__ == "__main__":
        SLACK_WEBHOOK_URL = os.getenv("SLACK_WEBHOOK_URL")
        THRESHOLD = int(os.getenv("THRESHOLD", "80"))  # Default threshold is 80%
    
        if not SLACK_WEBHOOK_URL:
            raise SystemExit("SLACK_WEBHOOK_URL environment variable is not set")
    
        check_disk_utilization(THRESHOLD, SLACK_WEBHOOK_URL)
    • Key Points:
      • The script retrieves disk usage with psutil.disk_usage('/').
      • It sends an alert to Slack only when utilization exceeds the defined threshold.

    Setting Up Automation with Terraform

    Terraform Configuration

    The Terraform script will:

    • Launch a cloud instance.
    • Configure networking resources like VPC, subnets, and security groups.
    • Add user data scripts for the Python and cron setup.

    Below is the updated main.tf file:

    provider "aws" {
      region = "us-east-1"
    }
    
    resource "aws_vpc" "main_vpc" {
      cidr_block = "10.0.0.0/16"
      enable_dns_support = true
      enable_dns_hostnames = true
      tags = {
        Name = "MainVPC"
      }
    }
    
    resource "aws_subnet" "main_subnet" {
      vpc_id            = aws_vpc.main_vpc.id
      cidr_block        = "10.0.1.0/24"
      map_public_ip_on_launch = true
      availability_zone = "us-east-1a"
    }
    
    resource "aws_security_group" "instance_sg" {
      vpc_id = aws_vpc.main_vpc.id
    
      ingress {
        from_port   = 22
        to_port     = 22
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }
    
      ingress {
        from_port   = 80
        to_port     = 80
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }
    
      egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }
    
    resource "aws_instance" "disk_monitor" {
      ami           = "ami-0c55b159cbfafe1f0" # Example AMI
      instance_type = "t2.micro"
      subnet_id     = aws_subnet.main_subnet.id
      security_groups = [aws_security_group.instance_sg.name]
    
      user_data = templatefile("script.sh.tpl", {})
    
      tags = {
        Name = "DiskMonitorInstance"
      }
    }

    This configuration ensures your instance is secure and connected to a proper network for access and monitoring.


    Creating User Data Templates

    The Python Script Template (script.py.tpl)

    This template dynamically injects environment variables for the Slack webhook and threshold.

    import os
    import psutil
    import requests
    
    def check_disk_utilization(threshold, slack_webhook_url):
        disk_usage = psutil.disk_usage('/')
        used_percentage = disk_usage.percent
    
        if used_percentage > threshold:
            message = f":warning: Disk utilization is at {used_percentage}%! Take action now."
            payload = {"text": message}
            requests.post(slack_webhook_url, json=payload)
    
    if __name__ == "__main__":
        SLACK_WEBHOOK_URL = "${slack_webhook_url}"
        THRESHOLD = ${threshold}
    
        check_disk_utilization(THRESHOLD, SLACK_WEBHOOK_URL)

    The Shell Script Template (script.sh.tpl)

    This script installs Python dependencies and sets up the cron job to run the Python script every 2 hours.

    #!/bin/bash
    
    # Update and install dependencies
    yum update -y
    yum install -y python3 python3-pip
    pip3 install psutil requests
    
    # Write the rendered Python script (quoted heredoc prevents shell expansion)
    cat <<'EOF' > /opt/disk_monitor.py
    ${python_script}
    EOF
    
    # Make the script executable
    chmod +x /opt/disk_monitor.py
    
    # Add cron job to run the script every 2 hours
    (crontab -l 2>/dev/null; echo "0 */2 * * * python3 /opt/disk_monitor.py") | crontab -
    
    # Start cron service
    service crond start

    This ensures that the necessary dependencies are installed, and the monitoring script is executed periodically.


    Adding the Python Script to Cron

    In the script.sh.tpl, we used:

    (crontab -l 2>/dev/null; echo "0 */2 * * * python3 /opt/disk_monitor.py") | crontab -

    This ensures the Python script runs every 2 hours without manual intervention. The cron service is started to ensure the job execution.


    Testing the Setup

    1. Run Terraform to launch the instance: terraform init && terraform apply
    2. SSH into the instance and verify the cron entry: crontab -l
    3. Simulate high disk usage and confirm the Slack alert arrives.
      • Use a command like dd to create a large file and watch the script’s behavior (see the example below).

    Enhancing and Scaling the Setup

    Once you have verified the basic setup, there are multiple ways to enhance and scale this solution:


    Adding Multi-Disk Support

    If your cloud instance has multiple disks, modify the Python script to check utilization for every mounted filesystem. Use psutil.disk_partitions() to iterate over the partitions, as in the loop below.

    for partition in psutil.disk_partitions():
        usage = psutil.disk_usage(partition.mountpoint)
        if usage.percent > threshold:
            message = f":warning: Disk {partition.device} is at {usage.percent}% utilization!"
            payload = {"text": message}
            requests.post(slack_webhook_url, json=payload)
    

    Configuring Alerts for Multiple Channels

    If you want to send alerts to different Slack channels based on the severity, you can integrate multiple Slack webhook URLs and classify thresholds as Warning, Critical, etc.

    Scaling Across Multiple Instances

    To monitor multiple instances in the same cloud environment:

    • Use Terraform to deploy the setup across multiple instances (a count-based sketch follows this list).
    • Configure a centralized monitoring service such as Amazon CloudWatch, Azure Monitor, or Google Cloud Monitoring (formerly Stackdriver) for aggregated alerts.

    Adding Logging and Metrics

    Integrate logging to keep track of disk usage trends over time. Use Python’s logging module to log utilization percentages to a file or an external monitoring service.



    Conclusion

    In this blog, we’ve demonstrated a robust way to automate disk utilization monitoring for cloud instances using Python, cron jobs, and Slack. This setup is cost-effective, highly customizable, and adaptable across major cloud providers like AWS, Azure, and GCP.

    With the power of Terraform, you can launch instances with pre-configured monitoring scripts, ensuring consistent and automated deployment. By leveraging Slack for notifications, you maintain real-time awareness of your infrastructure health.

    Disk utilization monitoring is just one of many steps toward a proactive cloud management strategy. Implementing this solution not only prevents outages but also promotes efficient resource utilization, helping you save time and operational costs.


    References

    • Python psutil Library
    • AWS EC2 User Data
    • Azure Virtual Machine Extensions