An Explanation of LVM Snapshots

Alex Brandt 2011-02-15 16:41 Comments

Introduction

It seems that disk snapshots have become a hot topic and a confusing topic. I intend to simply outline what snapshots look like in terms of the lower layers of abstraction and nothing more. Snapshots are built into things like LVM, SAN, &c but I will not be covering those technologies. Instead, what I intend to clarify is how snapshots work in LVM.

Lower Layers

Layer 1: The Hard Disk

A hard disk is historically split into several pieces called partitions. These logically separate the blocks on a disk and allow us to make sure that disks are not over-allocated for one particular purpose or lose access to a system if a critical partition (anything with run in the name) is filled to capacity.

Layer 2: LVM Physical Volume

The partitions that disks are carved into typically get turned into LVM physical volumes. There is a one to one mapping between partitions on hard drives (or whole disks as they don't need to be partitioned) and physical volumes. LVM physical volumes connect the underlying disk structure to the organization that LVM is going to provide.

Layer 3: LVM Volume Groups

The physical volumes we create can be grouped together into volume groups, groups of extents (the smallest unit of disk in LVM) that can then be re-allocated to logical volumes.

Layer 4: LVM Logical Volumes

By carving our pools, volume groups, up into particular chunks we have come full circle and are using our disk (now a few layers removed) as a group of extents that get partitioned again. At this layer the partitions are logical volumes and can be allocated to particular purposes just like the partitions we started with.

It should be evident that we have created small units of contiguous extents (not contiguous physical blocks) that map through the aforementioned layers back to physical blocks on a disk somewhere. This abstraction allows us to perform the magic involved with LVM snapshots.

LVM Snapshots

For certain purposes (database backups, &c), it is necessary to have the disk being backed up in a consistent state. If we want these conditions it usually involves stopping any services that write to that region of the disk and ensuring the changes have been flushed and that nothing further will write to that region. Stopping production services for backup purposes is a possibility if you've got the cash to create a redundant environment but this becomes a much simpler problem when using LVM snapshots.

An LVM snapshot is a temporary region of disk set aside to preserve the consistency of the region it's a snapshot of. When an LVM snapshot is created the following happens:

LVM reserves space for the original blocks that get changed by writes; this is the snapshot space (name and size in the LVM command)
LVM creates a new device (e.g. /dev/mapper/vg-snapshot) to access the disk contents at the snapshot's creation time
When a write occurs on the original disk the original extent gets written into the snapshot area

The reason that LVM snapshots don't use a differencing disk is speed. With this setup for snapshots we have instant creation and instant deletion:

creation:	Since we only create a new device node that passes through to the original logical volume for anything not contained in the snapshot, access continues normally until things start getting written to the logical volume. Once writes have happened we can short circuit this pass-through with the extents in the snapshot that map to locations in the logical volume.
deletion:	Since only the original, un-interesting extents are ever stored in the snapshot it doesn't contain any long-lived information that needs to be applied back to the original disk and can simply be dropped without concern.

Of course, this means that when we are reading this data or writing to the logical volume, we have extra operations to perform that didn't occur before the snapshot existed:

read:	Check that the extent requested hasn't been added to the snapshot location and read from the logical volume if this is the case.
write:	Write the new extent to the logical volume but also write the original contents and location of that extent into the snapshot region.

Conclusion

LVM snapshots are a convenient way of getting a consistent view of a particular logical volume while doing backups or other sensitive-to-write operations. One caveat to keep in mind (besides the ever so slight write performance drop), is the size of the snapshot dictates how much information can be written to the logical volume before the snapshot is useless and is no longer valid. Keep the amount of writing you expect for the logical volume in mind when determining the size of the snapshot you're creating.