Overview
Our customer had a vendor product which used a MySQL database. The nightly backup of the database was preventing users from accessing the data for the duration of the backup. As the data grew, the backup was taking longer and longer, and this was causing problems with other system activity – timeouts and failures had become an issue.
There are various approaches to backing up MySQL which we don’t cover here. In this case, it was not feasible to change the customer’s backup procedure at the time, so this workaround, which took just a couple of hours to implement and test, helped them avoid regular downtime for a few months until an alternative backup approach could be rolled out.
We reduced downtime from over an hour to a few seconds
Before we addressed the problem, our customer’s data would be inaccessible for at least one hour each night. After we implemented our solution, the downtime was reduced to a few seconds.
If you are interested in reading the technical details of this solution, we share them below. If you’d prefer to speak to us about a similar problem in your organisation, please fill in the contact form on the right and we will get in touch.
Ewan’s Technical Notes
A redeployment of the vendor software was already on the horizon, including database replication as the vendor’s preferred backup strategy. In the meantime, what was the most cost effective way to prevent the old backup from disabling the system?
LVM2 – Logical Volume Management
I had previously used LVM2 snapshots to perform low level file system backups on a running system. The commands are well documented elsewhere, so a summary should suffice: If a file system (volume) has been created in a volume group with enough free space left over, an instantaneous snapshot can be taken and maintained by LVM2, using copy-on-write. My previous use of LVM2 was to create such snapshots and run “dump” on those to perform a backup, in the knowledge that if I’d got my sums right, the snapshot would remain valid long enough to complete the backup, without interrupting any continued attempts to change the real file system.
Of course, database backups should not normally be a simple backup of the file system. Even if a snapshot is taken to guarantee no changes to the data being backed up during the process, a restore does involve an element of recovery – to the database, it looks like the system was shut down uncleanly. How could the instant snapshot to be used to take a conventional database backup?
The solution was simple, and had been hinted at elsewhere in my Google searches: mount the LVM2 snapshot as a normal file system, and create a second MySQL configuration to allow it to be accessed, albeit through a non-standard MySQL port. Then a backup could be done by pointing “mysqldump” to this copy. After the backup, the second MySQL would be shut down, the file system unmounted and the snapshot deleted.
Using LVM Snapshots to backup MySQL
So that the second MySQL would not “wake up” to a file system requiring recovery, there was the need to lock the tables just like the previous “mysqldump” would do – but only for long enough to create the LVM2 snapshot, which is just a few seconds. Reducing the system pause to such a short time meant there were no longer any timeouts or failures.
This change to the backups was quick to implement and bought us time to properly roll out new deployments of the vendor software with an entirely different mechanism for backups. Having previously enjoyed using LVM2 for more conventional uses, this was an interesting and novel use which gave us a few months to breathe more easily.