Jump Hash Sharding Algorithm

Mike's Notes

Google researchers developed a high-speed sharding algorithm called jump hash and shared it with the community. I discovered it via Quastor's weekly engineering newsletter.

Resources

Article from Quastor

How Booking.com Scaled Their Customer Review System

Booking.com is one of the largest online travel agencies in the world; over 35 million airplane tickets and a billion hotel room nights were booked through its app and website in 2023.

Whenever you use Booking.com to book a hotel room or flight, you’re prompted to leave a customer review. So far, they have nearly 250 million customer reviews that need to be stored, aggregated, and filtered.

Storing this many reviews on a single machine is not possible due to the size of the data, so Booking.com partitions the data across several shards. They also run replicas of each shard for redundancy and replicate the entire system across several availability zones. They wrote a great blog post on how they do this.

Sharding is done based on a field of the data, called the partition key. For this, Booking.com uses the internal ID of an accommodation. The hotel/property/airline’s internal ID would be used to determine which shard its customer reviews would be stored on.

A basic way of doing this is with the modulo operator.

      Accommodation ID % number of shards = Shard ID
    
If the accommodation internal ID is 10125 and you have 90 shards in the cluster, then customer reviews for that accommodation would go to shard 45 (equal to 10125 % 90).
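As a quick sanity check, the modulo scheme above can be sketched in a couple of lines of Python (the function name is illustrative, not from the article):

```python
def modulo_shard(accommodation_id: int, num_shards: int) -> int:
    """Route an accommodation's reviews to a shard via simple modulo."""
    return accommodation_id % num_shards

print(modulo_shard(10125, 90))  # 45, matching the example above
```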

Challenges with Scaling

The challenge with this sharding scheme comes when you want to add/remove machines to your cluster (change the number of shards).

Booking.com expected a huge boom in users during the summer. They forecasted that they would be seeing some of the highest traffic ever and they needed to come up with a scaling strategy.

However, adding new machines to the cluster means rearranging data across the shards.

Let’s go back to our example with internal ID 10125. With 90 shards in our cluster, that accommodation would get mapped to shard 45. If we add 10 shards to our cluster, then that accommodation will now be mapped to shard 25 (equal to 10125 % 100).

This process is called resharding, and it’s quite complex with our current scheme. A lot of data gets rearranged, and you have to deal with ambiguity while it happens: the routing layer won’t know whether accommodation 10125 has already been resharded and now lives on shard 25, or whether it’s still stuck in the processing queue with its data on shard 45.
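To get a feel for how disruptive modulo resharding is, here’s a small sketch (my own illustration, not from the article) that counts how many keys change shards when the cluster grows from 90 to 100 shards:

```python
old_shards, new_shards = 90, 100

# Checking one full cycle of lcm(90, 100) = 900 consecutive IDs is
# enough; the mapping pattern repeats beyond that.
keys = range(900)
moved = sum(1 for k in keys if k % old_shards != k % new_shards)

print(f"{moved}/{len(keys)} keys moved")  # 810/900 -> 90% of all data moves
```

So under the modulo scheme, adding just 10 shards forces roughly 90% of all accommodation IDs to migrate, even though the cluster only grew by about 11%.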

The solution to this is a family of algorithms called Consistent Hashing. These algorithms minimize the number of keys that need to be remapped when you add/remove shards to the system.

ByteByteGo did a great video on Consistent Hashing (with some awesome visuals), so I’d highly recommend watching that if you’re unfamiliar with the concept. Their explanation was the clearest out of all the videos/articles I read on the topic.

Using the Jump Hash Sharding Algorithm

For their sharding scheme, Booking.com now uses the Jump Hash sharding algorithm, a consistent hashing algorithm created at Google. It’s extremely fast, uses minimal memory, and is simple to implement (it can be expressed in about five lines of code).
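The algorithm itself was published by Google’s Lamping and Veach; here is a Python port of their C++ version (the bit mask emulates C++’s unsigned 64-bit overflow, which Python integers don’t do on their own):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential generator step
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Monotonicity check: growing the cluster from 90 to 100 shards only
# ever moves a key onto one of the NEW buckets (90..99), never from
# one old bucket to another.
for key in range(100_000):
    old, new = jump_hash(key, 90), jump_hash(key, 100)
    assert new == old or new >= 90
```

Note that jump hash numbers buckets from 0, so a 90-shard cluster has buckets 0 through 89; the article’s 1-based shard labels are just an offset from that.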

With Jump Hash, Booking.com can rely on a property called monotonicity. This property states that when you add new shards to the cluster, data will only move from old shards to new shards; thus there is no unnecessary rearrangement.

With the previous scheme, we had 90 shards at first (labeled shard 1 to 90) and then added 10 new shards (labeled 91 to 100).

Accommodation ID 10125 was getting remapped from shard 45 to shard 25; it was getting moved from one of the old shards in the cluster to another old shard. This data transfer is pointless and doesn’t benefit end users.

What you want is monotonicity, where data is only transferred from shards 1-90 onto the new shards 91-100. That transfer serves a purpose: it balances load between the old and new shards, so you don’t end up with hot and cold shards.

The Process for Adding New Shards

Booking.com set a clear process for adding new shards to the cluster.

They provision the new hardware, and coordinator nodes then figure out which keys will be remapped to the new shards and load them.

The resharding process begins and the old accommodation IDs are transferred over to the new shards, but the remapped keys are not deleted from the old shards during this process.

This allows the routing layer to ignore the in-progress resharding and continue directing traffic for remapped accommodation IDs to their old locations.

Once the resharding process is complete, the routing layer is notified and starts directing traffic to the new shards (the remapped accommodation ID locations).

Then, the old shards can be updated to delete the keys that were remapped.
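Putting those steps together, here is a toy sketch of the cut-over (my own illustration; the in-memory dict "shards", the `reshard` name, and the pluggable `hash_fn` are assumptions, not Booking.com’s actual code):

```python
def reshard(shards, new_count, hash_fn):
    """Grow `shards` (a list of dicts: accommodation ID -> reviews) and
    migrate keys in the three phases described above."""
    old_count = len(shards)
    shards.extend({} for _ in range(new_count - old_count))

    # Phase 1: the coordinator copies remapped keys to their new shards.
    # Old copies are kept, so the routing layer can keep serving reads
    # from the old locations while this runs.
    moves = []
    for old_id in range(old_count):
        for key, reviews in shards[old_id].items():
            new_id = hash_fn(key, new_count)
            if new_id != old_id:
                shards[new_id][key] = reviews
                moves.append((old_id, key))

    # Phase 2 (implicit here): the routing layer is told to start
    # looking keys up with hash_fn(key, new_count).

    # Phase 3: only now are the stale copies deleted from the old shards.
    for old_id, key in moves:
        del shards[old_id][key]
    return shards

# Any consistent hash (jump hash in Booking.com's case) plugs in as
# hash_fn; plain modulo is used below only to keep the sketch
# dependency-free.
hash_fn = lambda k, n: k % n
shards = [dict() for _ in range(3)]
for k in range(12):
    shards[hash_fn(k, 3)][k] = f"reviews for accommodation {k}"

reshard(shards, 4, hash_fn)
assert all(k in shards[hash_fn(k, 4)] for k in range(12))
```

The key design point is the copy-then-delete ordering: because remapped keys exist in both places during phase 1, a routing layer that hasn’t switched over yet still finds every key at its old location.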
