Mikes Notes
Google researchers have developed a high-speed sharding algorithm called the Jump Hash Sharding Algorithm, which has been shared with the community. I discovered this in a weekly engineering newsletter from Quastor.
Resources
- A Fast, Minimal Memory, Consistent Hash Algorithm by John Lamping and Eric Veach, from Google. (PDF)
- https://blog.quastor.org/p/bookingcom-scaled-customer-review-system
- https://www.youtube.com/watch?v=UF9Iqmg94tk
- https://medium.com/booking-com-development/scaling-our-customer-review-system-for-peak-traffic-cb19be434edf
Article from Quastor
How Booking.com Scaled Their Customer Review System
Booking.com is one of the largest online travel agencies in the world;
they booked over 35 million airplane tickets and a billion hotel room
nights through the app/website in 2023.
Whenever you use Booking.com to book a hotel room/flight, you’re prompted
to leave a customer review. So far, they have accumulated nearly 250 million
customer reviews that need to be stored, aggregated, and filtered.
Storing this many reviews on a single machine is not possible due to the
size of the data, so Booking.com has partitioned the data across several
shards. They also run replicas of each shard for redundancy and have the
entire system replicated across several availability zones. They wrote a
great blog post on how they do this.
Sharding is done based on a field of the data, called the partition key.
For this, Booking.com uses the internal ID of an accommodation: the
hotel/property/airline's internal ID determines which shard its customer
reviews are stored on.
A basic way of doing this is with the modulo operator.
Accommodation ID % number of shards = Shard ID
If the accommodation internal ID is 10125 and you have 90 shards in the
cluster, then customer reviews for that accommodation would go to shard 45
(equal to 10125 % 90).
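Here is a minimal sketch of that modulo routing in Python (the function name
is just for illustration; this is not Booking.com's actual code):

def modulo_shard(accommodation_id: int, num_shards: int) -> int:
    # Simple modulo sharding: the remainder of the ID picks the shard.
    return accommodation_id % num_shards

print(modulo_shard(10125, 90))   # 45 -> reviews for accommodation 10125 go to shard 45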
Challenges with Scaling
The challenge with this sharding scheme comes when you want to change the
number of shards by adding or removing machines in your cluster.
Booking.com expected a huge boom in users during the summer. They
forecasted that they would be seeing some of the highest traffic ever and
they needed to come up with a scaling strategy.
However, adding new machines to the cluster means rearranging almost all of
the data across the shards.
Let’s go back to our example with internal ID 10125. With 90 shards in our
cluster, that accommodation would get mapped to shard 45. If we add 10
shards to our cluster, then that accommodation will now be mapped to shard
25 (equal to 10125 % 100).
This process is called resharding, and it’s quite complex with our current
scheme. You have a lot of data being rearranged and you’ll have to deal
with ambiguity during the resharding process. Your routing layer won't know
whether accommodation 10125 has already been resharded (moved to the new
shard) and is now on shard 25, or whether it's still sitting in the
processing queue with its data still located on shard 45.
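To get a feel for how much data modulo resharding shuffles, here is a small
illustrative check in Python, reusing the modulo_shard sketch above: growing
the cluster from 90 to 100 shards relocates roughly 90% of all keys.

moved = sum(1 for key in range(100_000)
            if modulo_shard(key, 90) != modulo_shard(key, 100))
print(moved / 100_000)   # ~0.9 -> about 90% of accommodations change shards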
The solution to this is a family of algorithms called Consistent Hashing.
These algorithms minimize the number of keys that need to be remapped when
you add or remove shards from the system.
ByteByteGo did a great video on Consistent Hashing (with some awesome
visuals), so I’d highly recommend watching that if you’re unfamiliar with
the concept. Their explanation was the clearest out of all the
videos/articles I read on the topic.
Using the Jump Hash Sharding Algorithm
For their sharding scheme, Booking.com now uses the Jump Hash Sharding
algorithm, a consistent hashing algorithm created at Google. It's extremely
fast, uses minimal memory, and is simple to implement (it can be expressed in
about five lines of code).
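For reference, this is the algorithm from the Lamping and Veach paper,
translated from their C++ into Python (the bitmask emulates 64-bit unsigned
overflow; the large constant is the linear-congruential multiplier from the
paper):

def jump_consistent_hash(key: int, num_buckets: int) -> int:
    # Returns a bucket (shard) number in the range [0, num_buckets).
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

Note that jump hash numbers buckets from 0, so a 100-shard cluster has shards
0 through 99.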
With Jump Hash, Booking.com can rely on a property called monotonicity.
This property states that when you add new shards to the cluster, data
will only move from old shards to new shards; thus there is no unnecessary
rearrangement.
With the previous scheme, we had 90 shards at first (numbered 0 to 89) and
then added 10 new shards (numbered 90 to 99).
Accommodation ID 10125 was getting remapped from shard 45 to shard 25; it
was getting moved from one of the old shards in the cluster to another old
shard. This data transfer is pointless and doesn’t benefit end users.
What you want is monotonicity, where data is only transferred from the old
shards (0-89) onto the new shards (90-99). This data transfer serves a
purpose because it balances the load between the old and new shards, so you
don't end up with hot and cold shards.
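You can check this property directly with the jump_consistent_hash sketch
above (illustrative Python; shard numbers are 0-based, so the 10 new shards
are 90 through 99):

moved = [key for key in range(100_000)
         if jump_consistent_hash(key, 90) != jump_consistent_hash(key, 100)]

# Every key that moves lands on one of the new shards (90-99), never on another old shard.
assert all(jump_consistent_hash(key, 100) >= 90 for key in moved)

print(len(moved) / 100_000)   # ~0.1 -> only about 10% of keys move, the share the new shards should absorb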
The Process for Adding New Shards
Booking.com set a clear process for adding new shards to the cluster.
They provision the new hardware and then have coordinator nodes figure out
which keys will be remapped to the new shards and load them there.
The resharding process begins and the remapped accommodation IDs are copied
over to the new shards, but they are not deleted from the old shards during
this process.
This allows the routing layer to ignore the resharding and keep directing
traffic for the remapped accommodation IDs to their old locations.
Once the resharding process is complete, the routing layer is made aware and
starts directing traffic to the new shards (the new locations of the
remapped accommodation IDs).
Then, the old shards can be updated to delete the keys that were remapped.
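Putting the steps together, here is a toy-model sketch of that workflow in
Python, reusing jump_consistent_hash from above. Plain dicts stand in for
shards and the routing switch is only noted in a comment, so this is an
illustration of the steps described in the article, not Booking.com's
implementation:

def reshard(shards, old_count, new_count):
    # shards: list of dicts, one per shard, mapping accommodation ID -> review data.
    shards.extend({} for _ in range(new_count - old_count))  # provision the new (empty) shards

    # 1. Work out which keys get remapped; with jump hash a key only moves if its
    #    new bucket is one of the newly added shards (>= old_count).
    remapped = [key for shard in shards[:old_count] for key in shard
                if jump_consistent_hash(key, new_count) >= old_count]

    # 2. Copy the remapped keys to the new shards WITHOUT deleting them from the old
    #    shards, so the routing layer can keep serving them from the old locations.
    for key in remapped:
        shards[jump_consistent_hash(key, new_count)][key] = \
            shards[jump_consistent_hash(key, old_count)][key]

    # 3. The routing layer is told the reshard is complete and switches to new_count
    #    (not modelled here).

    # 4. Only then are the stale copies removed from the old shards.
    for key in remapped:
        del shards[jump_consistent_hash(key, old_count)][key]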