I recently upgraded an 8-node Cassandra cluster (logically split over 2 data centres) from version 1.0 to 1.2, in rolling fashion. It took me quite a while of experimentation in a sandpit environment before I came up with a formula that I had the confidence to use in production.
Any data that I cared about had a replication factor of 3 in each data centre.
The first step in this journey is of course to follow the often excellent documentation on the Datastax website:
1.1 upgrade guide
1.2 upgrade guide
I've included the v1.1 upgrade pages for reasons that will become apparent.
These upgrade guides are rather economical, and it is worth reading around the forums to fill in the detail.
First I'll describe my journey, which started by following the v1.2 upgrade guide. Since the database was already running v1.0, upgrading straight to v1.2 is considered legal. This involved a number of prerequisite checks, such as modifying queries as stipulated in the guide, checking the NEWS.txt file for any upgrade-related insights and likewise with the CHANGES.txt file. All fairly standard.
With that out of the way, the first step was to take a snapshot per node. I strongly recommend copying the snapshots to a location outside of cassandra's control. As I discovered during testing, part of the upgrade process is to re-organise the data file structure from:
<path>/KS/filename (1.0 file name format)
to:
<path>/KS/CF/filename (1.2 file name format)
which includes the snapshot files! That leaves you with two headaches. First, should you wish to revert from 1.2 to 1.0, you need to grab all the snapshot files, convert them back to the original file name format and move them back to the original directory structure. Secondly, and perhaps more worryingly, if the upgrade fails it could leave your snapshot files in a mess, with some of them in the original format/location, some in the new and some possibly missing. I can vouch for this, having experienced this very issue during testing. There was no return. Since it was a test cluster I would have been happy enough to blow it away and start again; in the end I elected to use cassandra's excellent capability to rebuild a node from the surviving nodes, none of which had been upgraded at that point.
Tip #1: Copy/rsync snapshots to a directory outside of cassandra's control, or better still, to another server entirely.
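As a rough illustration, the snapshot-and-copy step can be scripted per node along these lines. The data path and backup destination are assumptions - adjust them to your own setup:

#!/bin/bash
# Snapshot this node and copy the snapshot directories off-box.
DATA_DIR=/var/lib/cassandra/data                      # your data_file_directories setting
BACKUP=backuphost:/backups/cassandra/$(hostname -s)   # anywhere outside cassandra's control

nodetool -h localhost snapshot

# In the 1.0 layout, snapshots land under <data_dir>/<KS>/snapshots/, which is
# still inside the data directory - hence copying them somewhere safe.
rsync -av --relative "$DATA_DIR"/./*/snapshots/ "$BACKUP"/

Keeping a copy outside the data directory is what saves you if the 1.2 startup re-organises the snapshot files as described above.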
My first upgrade attempt comes right out of the guide.
Iteration 1 Upgrade
Per Node (a rough shell sketch of these steps follows the list):
1. snapshot cassandra.
2. copy snapshots to a remote server (or a new directory on the same server if you have space)
3. nodetool drain
4. stop cassandra
5. update the cassandra 1.2 binaries.
6. start cassandra 1.2
7. upgrade sstables: nodetool -h localhost upgradesstables -a
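Condensed into a script, a single node in iteration 1 looks roughly like this. The package name, version pin, service commands and paths are illustrative; substitute whatever install mechanism you actually use:

#!/bin/bash
# Iteration 1, run on one node at a time.
set -e

# 1-2. snapshot and copy off-box (see the earlier sketch)
nodetool -h localhost snapshot
rsync -av --relative /var/lib/cassandra/data/./*/snapshots/ backuphost:/backups/cassandra/$(hostname -s)/

# 3-4. flush memtables and stop the node
nodetool -h localhost drain
sudo service cassandra stop

# 5. swap the binaries for 1.2 (package manager shown; a tarball works equally well)
sudo apt-get install -y cassandra=1.2.x   # illustrative version pin

# 6-7. bring the node back on 1.2 and rewrite every sstable in the new format
sudo service cassandra start
nodetool -h localhost upgradesstables -a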
On my spinning-disk cluster, upgradesstables runs at about 1 GB every 2-3 minutes. Given the quantity of data on each of my nodes, I would expect this task to last around 4 hours per node! The only realistic way to update an 8-node cluster on that basis is one node per day, stretching the complete upgrade over a week (as the upgrade was to happen at night). N.B. the documentation also tells you not to move, repair or bootstrap nodes until the upgrade is complete. That sounds pretty impractical.
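A quick back-of-envelope check makes the point: take the Load figure that nodetool reports for a node and divide by the observed throughput. The 100 GB below is purely illustrative:

# Load per node is reported by nodetool
nodetool -h localhost info | grep Load

# e.g. ~100 GB at roughly 1 GB per 2.5 minutes:
#   100 * 2.5 = 250 minutes, i.e. a little over 4 hours for that node alone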
So what happens if you need to cut back to 1.0? There is no easy answer to this, and I think the only sensible approach is to NOT cut back to 1.0 after a few hours of operating on 1.2. That means you will only be part way through upgrading your first node when you must decide to commit to it. The implication is to thoroughly test your applications against cassandra 1.2, i.e. take every precaution to avoid having to roll back.
However, I have a trick up my sleeve that ameliorates circumstances somewhat and I'll come on to that a bit later.
Tip #2: Functionally and non-functionally test your apps as thoroughly as you can, as you REALLY don't want to cut back to cassandra 1.0. Copy your production data to a new cluster if possible, even onto feeble VMs, and test the upgrade (I did this).
This is where it gets interesting. On checking the log file after firing up cassandra 1.2, i.e. before the upgradesstables step, I noticed these errors:
INFO [HANDSHAKE-/[ip redacted]] 2014-01-15 12:58:48,047 OutboundTcpConnection.java (line 408) Cannot handshake version with /[ip redacted]
INFO [HANDSHAKE-/[ip redacted]] 2014-01-15 12:58:48,048 OutboundTcpConnection.java (line 399) Handshaking version with /[ip redacted]
...etc for all nodes...
Googling around, I came across this issue on the Apache JIRA, in which Jonathan Ellis states that for rolling upgrades you cannot jump major versions, i.e. you must follow 1.0 > 1.1 > 1.2 (https://issues.apache.org/jira/browse/CASSANDRA-5740).
Iteration 2 Upgrade
As per iteration 1, but done twice: once for 1.0 > 1.1 and again for 1.1 > 1.2.
Needless to say I experimented with such an approach, but was wholly unenamoured by the prospect of having to run upgradesstables twice per node!
So I dug a little deeper into this error. As far as I could make out, range queries would fail on a mixed-version cluster, and some nodetool commands would fail too, such as nodetool ring. The natural question is then: can I not complete the binary upgrade on all nodes first and then upgrade the sstables? That would be the silver bullet, as I could upgrade directly from 1.0 to 1.2. Upgrading the binaries takes only a few minutes and just might be acceptable at night when traffic drops off a cliff.
My new direction was affirmed in this posting, where Aaron Morton of The Last Pickle suggests the same approach, which gave me a warm and cuddly feeling!
Things were starting to take shape. The plan was to upgrade the binaries on a rolling basis across all 8 nodes, then run upgradesstables staggered but overlapping so that the full end-to-end upgrade would last at most 1 day.
Ultimately, the method I went with was as follows:
Iteration 3
Upgrade binaries (per node; the pre/post checks are sketched as a script after the list)
1. take a snapshot
2. copy to backup server
3. test retrieving a record
4. save a list of all datafiles
5. run nodetool ring and make sure all nodes are as expected
6. stop cassandra
7. start cassandra
8. test retrieving a record
9. clear snapshots (no need to have them lurking around as they have been copied elsewhere)
10. upgrade binaries.
11. rolling restart application nodes*
12. nodetool describecluster
13. nodetool ring
* - I found that the upgrade caused Hector to mark the node as dead and gone; the node would only be recognised again after a restart of the application.
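The checks in steps 3-5, 8 and 12-13 are worth scripting so they are run identically before and after the binary swap on every node. A minimal sketch, assuming a known keyspace/column family and the usual data path (all of which are placeholders):

#!/bin/bash
# Pre/post checks for the per-node binary upgrade (iteration 3).
set -e
DATA_DIR=/var/lib/cassandra/data
CHECK_DIR=/var/tmp/upgrade-checks
mkdir -p "$CHECK_DIR"

# save a list of all data files, so a later diff shows exactly what changed
find "$DATA_DIR" -name '*-Data.db' | sort > "$CHECK_DIR/datafiles-$(date +%Y%m%d%H%M).txt"

# ring and cluster health - every node Up/Normal, and a single schema version
nodetool -h localhost ring            | tee "$CHECK_DIR/ring.txt"
nodetool -h localhost describecluster | tee "$CHECK_DIR/describecluster.txt"

# test retrieving a known record - shown here with cassandra-cli and a small
# command file; use whichever client your applications use
echo "use MyKeyspace; get MyColumnFamily['known-row-key'];" > /tmp/readcheck.cli
cassandra-cli -h localhost -f /tmp/readcheck.cli

Running the same script before and after the binary upgrade makes it easy to spot a node that didn't come back cleanly.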
Cutback Plan
So what about cutback options now? The cluster has 4 nodes per data centre, 8 nodes in total. Bearing in mind that each data centre contains a complete set of data, the cutback plan was to reinstall the 1.0 binaries and rebuild the data from the other nodes (remember that I am using a replication factor of 3 per DC). So if the upgrade of node 1 in DC1 failed, I would cut back, as I could not be certain that other nodes would not fail in the same way and I can only withstand 1 node failure per DC. The same applied to nodes 2 and 3. However, if node 4 failed I would proceed to upgrade the nodes in DC2, because I could live with one node being down in each DC and could resolve that shortly after the upgrade completed.
Upgrade sstables
I staggered this across the nodes with some overlap, such that the whole process took about 7-8 hours.
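One way to stagger it is to kick the runs off from a single admin host with a fixed offset between nodes, so that only a few nodes are rewriting sstables at any one time. The host names and the 30-minute offset below are illustrative, and nodetool needs JMX access to each node:

#!/bin/bash
# Start upgradesstables on each node in turn, overlapping the runs.
NODES="dc1-node1 dc1-node2 dc1-node3 dc1-node4 dc2-node1 dc2-node2 dc2-node3 dc2-node4"

for node in $NODES; do
    nodetool -h "$node" upgradesstables -a > "upgradesstables-$node.log" 2>&1 &
    sleep 1800   # start the next node 30 minutes later
done
wait
echo "upgradesstables complete on all nodes"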
Test cutback (all nodes; sketched as a script after the list)
1. shutdown cassandra
2. clear data directories
3. restore 1.0 files from the backup server
4. restore 1.0 binaries
5. start cassandra.
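For completeness, a sketch of the cutback test as run on each node. The paths, package version and backup location are placeholders again, it assumes it is run as root (add sudo as needed), and it assumes the snapshots were copied off-box with the earlier script, i.e. in the 1.0 <KS>/snapshots/<tag>/ layout:

#!/bin/bash
# Cutback test: put a node back on 1.0 from the off-box snapshot copy.
set -e
DATA_DIR=/var/lib/cassandra/data
BACKUP=backuphost:/backups/cassandra/$(hostname -s)

# 1-2. stop the node and clear the (now 1.2-format) data directories
service cassandra stop
rm -rf "$DATA_DIR"/*

# 3. restore the 1.0 snapshot files and move them up into the flat <KS>/ layout
rsync -av "$BACKUP"/ "$DATA_DIR"/
for snapdir in "$DATA_DIR"/*/snapshots/*/; do
    ks_dir=$(dirname "$(dirname "$snapdir")")
    mv "$snapdir"* "$ks_dir"/
done
rm -rf "$DATA_DIR"/*/snapshots
chown -R cassandra:cassandra "$DATA_DIR"   # adjust the user if yours differs

# 4-5. reinstall the 1.0 binaries and start back up
apt-get install -y cassandra=1.0.x   # illustrative version pin
service cassandra start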