Autoresizing Persistent Disks in Compute Engine

Got a challenge the other day:

Is it possible to automatically resize a Persistent Disk in Google Compute Engine?

The answer is yes – with a few caveats.  

This solution really only works with Persistent Disks that are not root. Root disks seem to need a reboot to make this work – and automatically rebooting seems like a bad idea. So if you run it on a root disk it will work, but the extra space won’t be available until you manually reboot the machine.

Be careful with quotas. My solution here has a default max disk size of 64TB because that is the largest size a GCE disk can be. You may want to be more conservative with your limits because disk size = money. Your account also has a quota on the amount of SSD you can assign. As of this writing it is 2TB. You can always raise it, but this script cannot get around your quota, and will fail if it tries to exceed it.
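To get a feel for how quickly a multiplying resize reaches a cap, here is a small sketch that lists the sizes a disk would pass through. The function name growth_steps and the 500GB starting size are made up for illustration, and 2048GB is only a stand-in for the 2TB quota mentioned above.

```shell
#!/bin/bash
# growth_steps START_GB FACTOR MAX_GB
# Print each size (in GB) a disk would reach when repeatedly multiplied
# by FACTOR, stopping before a step would exceed MAX_GB.
# Hypothetical helper for illustration; all arguments are whole numbers.
growth_steps() {
    local size=$1 factor=$2 max=$3
    if [ "$factor" -lt 2 ]; then
        echo "factor must be at least 2" >&2
        return 1
    fi
    while [ $(( size * factor )) -le "$max" ]; do
        size=$(( size * factor ))
        echo "$size"
    done
}

# A 500GB disk, doubling, capped at 2048GB (assumed stand-in for a 2TB quota).
growth_steps 500 2 2048
```

With these numbers the disk can only double twice (to 1000GB and then 2000GB) before the next step would break the cap, which is a useful sanity check when picking a limit.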

All that out of the way, let’s give this a shot.

Step 1 – Script it

The first step is to put together a script that:

  • Checks the utilization of a disk.
  • If the utilization is too high, resizes the disk in Google Cloud Platform
  • Then also resizes the disk on the host OS.
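The first bullet comes down to reading the usage column out of df. A minimal sketch of that step, assuming a df that supports the POSIX -P flag; the function name usage_percent is hypothetical and is not part of the script below.

```shell
#!/bin/bash
# usage_percent: read `df -P` output on stdin and print the Capacity
# column of the last data row as a bare number (no % sign).
# Hypothetical helper for illustration.
usage_percent() {
    awk 'NR > 1 { sub(/%/, "", $5); pct = $5 } END { print pct }'
}

# -P asks df for POSIX output, which keeps each filesystem on one line
# even when the device name is long.
df -P / | usage_percent
```

Skipping the header row explicitly and asking for one line per filesystem makes the parse a little less dependent on how df chooses to wrap its output.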

There are a couple of other things we want to configure in this script:

  • What is the threshold percent that is high enough to resize the disk?
  • What is the factor by which we’ll increase the disk? Double it? Triple it?
  • What is the maximum limit to which we will increase the disk?
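Before looking at a full script, it may help to see how those three settings interact. This is only a sketch of the decision, separated from any gcloud or df calls; the function name decide_resize and its output strings are made up for illustration.

```shell
#!/bin/bash
# decide_resize USAGE THRESHOLD CURRENT_GB FACTOR MAX_GB
# Prints the proposed size in GB when a resize is warranted, or a short
# reason when it is not. Hypothetical helper; all arguments are whole numbers.
decide_resize() {
    local usage=$1 threshold=$2 current=$3 factor=$4 max=$5
    if [ "$usage" -lt "$threshold" ]; then
        echo "within-threshold"
        return 0
    fi
    local proposed=$(( current * factor ))
    if [ "$proposed" -gt "$max" ]; then
        echo "over-max"
        return 0
    fi
    echo "$proposed"
}

decide_resize 95 90 100 2 64000     # over threshold, so 200 is proposed
decide_resize 40 90 100 2 64000     # under threshold, nothing to do
decide_resize 95 90 40000 2 64000   # 80000 would exceed the cap
```

Keeping the decision free of side effects like this makes it easy to try different thresholds and factors without touching a real disk.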

Keeping all of that in mind, here is my solution in Bash for Debian (our default OS choice on Compute Engine). As you can see, it’s a mix of gcloud commands and df.

#!/bin/bash

# Usage info
show_help() {
cat << EOF
Usage: ${0##*/} -d CLOUDDISK [-t THRESHOLD] [-f FACTOR] [-m MAX]
Checks the disk utilization of CLOUDDISK and if it is over the THRESHOLD
increase the disk size by multiplying current size by FACTOR as long as it
does not exceed MAX.
    -c              Check to make sure you have properly authorized service 
                    account. 
                    SUCCESS = display from gcloud compute disks list
                    FAILURE = ERROR - Insufficient Permission
    -h              Display this help and exit
    -d CLOUDDISK    The Google Cloud Disk name to check. This name can be seen
                    running 'gcloud compute disks list'
    -t THRESHOLD    The percentage (0-100) above which to resize the disk. 
                    DEFAULT 90
    -f FACTOR       The multiplier to resize the disk by. A 1GB disk with
                    a factor of 2 will be resized to 2GB. 
                    DEFAULT 2.
    -m MAX          The limit in GB beyond which we will not resize a disk. 
                    DEFAULT 64000GB.
Examples:
Run with defaults on a disk named 'storage' - 
    ${0##*/} -d storage

Check if the disk 'storage' is more than 50% usage, if so quadruple the disk 
to a limit of 1000GB 
    ${0##*/} -d storage -t 50 -f 4 -m 1000
    
EOF
}

check_perms() {
    /usr/local/bin/gcloud compute disks list

}

# Initialize our own variables:
THRESHOLD=90
FACTOR=2
MAX=64000
while getopts "d:t:m:f:hc" opt; do
    case "$opt" in
        h)
            show_help >&2
            exit
            ;;
        c)
            check_perms >&2
            exit
            ;;    
        d)  
            CLOUDDISK=$OPTARG
            ;;
        t)  
            THRESHOLD=$OPTARG
            ;;
        m)  
            MAX=$OPTARG
            ;;        
        f)  
            FACTOR=$OPTARG
            ;;
    esac
done
if [ "$CLOUDDISK" = "" ]
then
    echo "You must set a CLOUDDISK using -d option. Run ${0##*/} -h for more help. "
    exit
fi

# Get variables for scale parameters
LOCALDISK=`readlink -f /dev/disk/by-id/google-$CLOUDDISK`

# Get current usage in percentage expressed as a number between 0-100
tmp=`df $LOCALDISK | awk '{ print $5 }' | tail -n 1`
USAGE="${tmp//%}"

# Check to see if disk is over threshold. 
if [ $USAGE -lt $THRESHOLD ]
then
        echo "Disk is within threshold"
        exit
else
        echo "Disk is over threshold, attempting to resize"
fi

# Get Current size of disk
tmp2=`df -BG $LOCALDISK | awk '{ print $2 }' | tail -n 1`
CURRENTSIZE="${tmp2//G}"

# Compute next size of disk. 
PROPOSEDSIZE=$(( CURRENTSIZE * FACTOR ))
if [ $PROPOSEDSIZE -gt $MAX ]
then
        echo "Proposed disk size ($PROPOSEDSIZE)GB is higher than the max allowed ($MAX)GB."
        exit
else
        echo "Proposed disk size acceptable, attempting to resize"
fi

# RESIZE IT
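# Look up the disk's zone by name; --format prints just the zone's short name.
ZONE=`/usr/local/bin/gcloud compute disks list --filter="name=$CLOUDDISK" --format="value(zone.basename())" | tail -n 1`
/usr/local/bin/gcloud compute disks resize "$CLOUDDISK" --size "$PROPOSEDSIZE"GB --zone "$ZONE" --quiet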

# Tell the OS that the disk has been resized.
sudo resize2fs /dev/disk/by-id/google-"$CLOUDDISK"

Source is also available in GitHub.

You can find the reference for the gcloud commands in the documentation.
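One thing to be aware of: the script in Step 1 feeds -t, -f and -m straight into integer comparisons and shell arithmetic, so a value like 1.5 or "ten" makes the shell complain rather than being rejected cleanly. A minimal sketch of an up-front check follows. The function names are hypothetical, and the rule that FACTOR must be at least 2 is an assumption (a factor of 1 would never grow the disk).

```shell
#!/bin/bash
# True when the argument is a non-empty string of digits only.
is_whole_number() {
    case "$1" in
        '' | *[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# validate_options THRESHOLD FACTOR MAX
# Returns 0 when all three are usable; otherwise prints a reason to
# stderr and returns 1. Hypothetical helper for illustration.
validate_options() {
    local threshold=$1 factor=$2 max=$3
    if ! is_whole_number "$threshold" || [ "$threshold" -gt 100 ]; then
        echo "THRESHOLD must be a whole number from 0 to 100" >&2
        return 1
    fi
    if ! is_whole_number "$factor" || [ "$factor" -lt 2 ]; then
        echo "FACTOR must be a whole number of 2 or more" >&2
        return 1
    fi
    if ! is_whole_number "$max" || [ "$max" -lt 1 ]; then
        echo "MAX must be a whole number of GB, 1 or more" >&2
        return 1
    fi
    return 0
}

validate_options 90 2 64000 && echo "options look fine"
validate_options 90 1.5 64000 || echo "rejected"
```

A check like this could be called right after the getopts loop, exiting with a non-zero status if it fails.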

Step 2 – Authorize it

The next step is to make sure this script can run at all.  To do that we have to delve into Cloud IAM.

First we want to create a service account. During this process we have the option to ‘Furnish a new private key’. This will cause a key file to be downloaded at the end of account creation. Choose JSON and keep track of the JSON file that gets downloaded after you click ‘Create’.

Add the service account to the IAM role – Compute Storage Admin. Then remove the service account from the project level role – Editor. We want it to have as little permission as it needs.    

Copy the JSON file to the Compute Engine machine to which the disk you wish to monitor is attached.

Authorize the service account using the following command.

gcloud auth activate-service-account --key-file [YOUR KEY FILE].json
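Since the key file is copied between machines by hand, a quick sanity check before activating it can save some head scratching. The sketch below only looks for the "type": "service_account" field that service account JSON keys carry. The function name and the key path are hypothetical, and tightening the file mode to 600 is a suggestion rather than something gcloud requires.

```shell
#!/bin/bash
# looks_like_service_account_key FILE
# Succeeds when FILE is readable and contains "type": "service_account".
# Hypothetical helper; this is a cheap text check, not a full JSON parse.
looks_like_service_account_key() {
    local key_file=$1
    [ -r "$key_file" ] || return 1
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"
}

KEY_FILE="$HOME/autoscale-disk-key.json"   # hypothetical path
if looks_like_service_account_key "$KEY_FILE"; then
    chmod 600 "$KEY_FILE"                  # keep the key private to this user
    gcloud auth activate-service-account --key-file "$KEY_FILE"
else
    echo "$KEY_FILE does not look like a service account key" >&2
fi
```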

My co-worker, Sandeep, has a good video tutorial about service accounts if you need more information.

Step 3 – Test it

Assuming you have installed the autoscale-disk script from step 1,  and you set up permissions correctly, you are ready to test it.  

To check the permissions, run:

autoscale-disk -c

If you see the output of gcloud compute disks list there, you got it right. If you do not, you will see an ‘ERROR - Insufficient Permission’ message instead.
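If you would rather have a one-word answer than read through gcloud output, the same check can be reduced to an exit status. This is a sketch only; report_perms is a made-up name, and true and false stand in for the real gcloud call so the example runs anywhere.

```shell
#!/bin/bash
# report_perms COMMAND [ARGS...]
# Runs the given command quietly and prints SUCCESS or FAILURE based on
# its exit status. Hypothetical helper for illustration.
report_perms() {
    if "$@" > /dev/null 2>&1; then
        echo "SUCCESS"
    else
        echo "FAILURE"
    fi
}

# On the Compute Engine machine you would run something like:
#   report_perms gcloud compute disks list
# Stand-ins that behave like an authorized and an unauthorized call:
report_perms true
report_perms false
```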

Step 4 – Cron it

Once you have the script installed, and you have tested it – it’s time to set it and forget it. Add it to crontab with your desired settings.

I’m setting this up to check every minute, because it’s pretty lightweight when it isn’t actually resizing disks. However do what you will. You might also want to pipe the output to a log. Again, your call.
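As a rough illustration of what such an entry could look like, the sketch below prints a crontab line that runs the script every minute and appends its output to a log. The install path /usr/local/bin/autoscale-disk, the disk name storage and the log path are all assumptions, and cron_entry is a made-up helper.

```shell
#!/bin/bash
# cron_entry SCRIPT DISK LOGFILE
# Print a crontab line that runs SCRIPT against DISK every minute and
# appends both stdout and stderr to LOGFILE. Hypothetical helper.
cron_entry() {
    printf '* * * * * %s -d %s >> %s 2>&1\n' "$1" "$2" "$3"
}

# Assumed install path, disk name and log location:
cron_entry /usr/local/bin/autoscale-disk storage /var/log/autoscale-disk.log

# To append the line to the current user's crontab, something like:
#   ( crontab -l 2>/dev/null; cron_entry /usr/local/bin/autoscale-disk storage /var/log/autoscale-disk.log ) | crontab -
```

The user whose crontab this lands in needs to be able to write to the log file and to run the sudo resize2fs step without a password prompt.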

Conclusions

There you have it, autoscaling a disk based on utilization with a cron job. What I love about this idea is that it is so very cloudy. On prem, even if you have a pool of storage, eventually you run out, so sizing up a disk isn’t a sure thing.  But in a cloud world, if you need more it’s always just an API call away.

 

One thought on “Autoresizing Persistent Disks in Compute Engine”

  1. This is cool !

    I finally got it to work … A couple of comments on resizing the root partition …

    1. As you pointed out, root does require a reboot but no additional manual interaction is required

    2. While auto-rebooting is a bad idea, not auto-rebooting is not necessarily a good idea either. For the root partition the df command will keep reporting the capacity condition that triggered the resize and with every cron job the disk will be resized up to the max of 6400GB. Maybe you want to add something that indicates that a resize is not yet in effect as the node has not been rebooted, preventing a further increase in size when in this state.

    3. At least with the CentOS 7 instances I tried /etc/sudoers per default includes requiretty preventing sudo execution from the script

    4. The OS root (/) mounts a partition /dev/sda1. The partition maps to /dev/disk/by-id/google-[DISK NAME]-part1, e.g. google-big-data-vm11-part1. So df needs to execute on that corresponding device file.

    Thank you for putting this out there.

    Arend
