Horizontal Partitioning in System Design
Horizontal partitioning, also known as sharding, is a technique used in database and system design to distribute the rows of a dataset across multiple servers or machines. The goal is to spread the data evenly so that each server handles only a subset of it, rather than a single server storing everything. This can improve the performance, scalability, and availability of the system. To implement horizontal partitioning, the data is typically divided into smaller subsets, called shards, based on a partition key such as a user ID, and each shard is stored on a separate server. Horizontal partitioning can be done manually or with a partitioning tool.
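The idea of routing each record to a shard by its partition key can be sketched in a few lines. This is a minimal illustration, not a production design: the shard count, the `ShardedStore` class, and the use of a simple modulo on the user ID are all assumptions made for the example, and each in-memory dictionary merely stands in for a separate server.

```python
NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key (here a numeric user ID) to a shard index."""
    return user_id % num_shards


class ShardedStore:
    """Hypothetical store: each dict stands in for one server's subset of rows."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id: int, row: dict) -> None:
        # The partition key alone decides which shard receives the row.
        self.shards[shard_for(user_id, self.num_shards)][user_id] = row

    def get(self, user_id: int):
        # Reads only touch the one shard that can hold this key.
        return self.shards[shard_for(user_id, self.num_shards)].get(user_id)


store = ShardedStore()
for uid in range(10):
    store.put(uid, {"user_id": uid, "name": f"user{uid}"})

print(store.get(7))                      # {'user_id': 7, 'name': 'user7'}
print([len(s) for s in store.shards])    # [3, 3, 2, 2]
```

Because every lookup by user ID touches exactly one shard, no single dictionary has to hold or serve the whole dataset.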
There are several techniques for implementing horizontal partitioning:
- Range-based partitioning: Data is partitioned based on a range of values for a specific attribute, such as a date range or a range of numerical values.
- Hash-based partitioning: Data is partitioned based on a hash function applied to a specific attribute, such as a user ID. With a well-chosen hash function and keys that are not heavily skewed, this technique tends to distribute data evenly across partitions.
- List partitioning: Data is partitioned based on a list of values for a specific attribute, such as a list of customer IDs.
- Composite partitioning: This technique combines two or more partitioning methods to partition the data.
- Bucket partitioning: In this technique, data is partitioned into a fixed number of buckets, and each bucket is stored on a separate server.
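The first three techniques in the list above can be sketched as small routing functions. The boundaries, the partition count, and the country-code mapping are hypothetical values picked for the example; a real system would derive them from its own data.

```python
import bisect
import hashlib

# Range-based: sorted lower bounds of partitions 1..3 (assumed values).
# Partition 0 holds IDs below 1000, partition 3 holds IDs of 3000 and above.
RANGE_BOUNDS = [1000, 2000, 3000]


def range_partition(order_id: int) -> int:
    """Return the index of the range that contains order_id."""
    return bisect.bisect_right(RANGE_BOUNDS, order_id)


def hash_partition(user_id: str, num_partitions: int = 4) -> int:
    """Hash the key with SHA-256 so the result is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# List-based: an explicit, hand-maintained mapping of values to partitions
# (hypothetical country codes used as the attribute).
LIST_MAP = {"US": 0, "CA": 0, "DE": 1, "FR": 1, "IN": 2}


def list_partition(country: str) -> int:
    """Look up the partition; unmapped values raise KeyError."""
    return LIST_MAP[country]


print(range_partition(999))    # 0
print(range_partition(1500))   # 1
print(range_partition(5000))   # 3
print(list_partition("DE"))    # 1
print(hash_partition("user-42") in range(4))  # True
```

Range partitioning keeps neighbouring keys together, which suits range scans but can create hot partitions; hashing scatters neighbouring keys, which evens out load at the cost of efficient range queries.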
Choose the partitioning technique based on the characteristics of the data and the specific requirements of the system. Techniques can also be combined, as in composite partitioning, when a single method does not give good enough results.
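As one possible illustration of combining methods, the sketch below partitions first by year (a range-style split) and then hashes the customer ID into a fixed number of sub-partitions within that year. The function name, the choice of attributes, and the use of CRC32 as the hash are assumptions for the example only.

```python
import zlib
from typing import Tuple


def composite_partition(order_year: int, customer_id: int,
                        buckets_per_year: int = 4) -> Tuple[int, int]:
    """Return (year, bucket): range-style split on year, hash split within it."""
    # CRC32 is used here only because it is deterministic and in the stdlib.
    bucket = zlib.crc32(str(customer_id).encode("utf-8")) % buckets_per_year
    return (order_year, bucket)


year, bucket = composite_partition(2023, 42)
print(year)                 # 2023
print(0 <= bucket < 4)      # True
```

With this layout, a query for one year only touches that year's partitions, while the hash step keeps any single year from landing entirely on one server.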
Horizontal partitioning and vertical partitioning are two different techniques used in database and system design to scale and distribute data.
Horizontal partitioning is typically used when the data is growing too large to be stored on a single server or when the system needs to handle a large number of concurrent requests. It distributes the rows across multiple servers, which can improve the performance, scalability, and availability of the system.
Vertical partitioning, on the other hand, splits a table into smaller tables, each containing a subset of the columns. It is used to improve query performance by reducing the amount of data that needs to be read from disk, and to reduce contention for locks on the table.
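The difference between the two approaches is easiest to see on a tiny table. In the sketch below the table is a list of dictionaries, and the column names, sample rows, and helper functions are all hypothetical: horizontal splitting keeps every column but only some rows per partition, while vertical splitting keeps every row but only some columns per table.

```python
rows = [
    {"id": 1, "name": "Ana", "bio": "Writes about databases"},
    {"id": 2, "name": "Ben", "bio": "Runs the ops team"},
    {"id": 3, "name": "Cy", "bio": "Builds data pipelines"},
    {"id": 4, "name": "Di", "bio": "Maintains the API"},
]


def horizontal_split(table, num_partitions):
    """Split by rows: each partition has all columns, a subset of rows."""
    parts = [[] for _ in range(num_partitions)]
    for row in table:
        parts[row["id"] % num_partitions].append(row)
    return parts


def vertical_split(table, column_groups):
    """Split by columns: each table has all rows, a subset of columns."""
    return [[{col: row[col] for col in cols} for row in table]
            for cols in column_groups]


h_parts = horizontal_split(rows, 2)
print([r["id"] for r in h_parts[0]])   # [2, 4]
print([r["id"] for r in h_parts[1]])   # [1, 3]

# The key column "id" is repeated in both tables so rows can be rejoined.
v_parts = vertical_split(rows, [["id", "name"], ["id", "bio"]])
print(v_parts[0][0])                   # {'id': 1, 'name': 'Ana'}
print(v_parts[1][0])                   # {'id': 1, 'bio': 'Writes about databases'}
```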
We should consider horizontal partitioning over vertical partitioning when:
1. The data size is so large that it cannot fit on a single machine
2. The number of concurrent requests is too high for a single machine to handle
3. The goal is to scale the system horizontally
4. The data is not easily divisible into smaller subsets based on the columns
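The first two criteria above lend themselves to a rough back-of-the-envelope check. The sketch below is only a heuristic under stated assumptions: the function name is hypothetical, the input figures are made up for illustration, and it ignores replication, headroom, and uneven key distribution.

```python
import math


def min_shards(data_gb: float, node_capacity_gb: float,
               peak_rps: float, node_rps: float) -> int:
    """Estimate the smallest shard count that satisfies storage and load."""
    by_storage = math.ceil(data_gb / node_capacity_gb)  # criterion 1
    by_load = math.ceil(peak_rps / node_rps)            # criterion 2
    return max(by_storage, by_load, 1)


# Hypothetical figures: 5 TB of data on 2 TB nodes, 30,000 requests/s
# at peak, and 5,000 requests/s per node.
print(min_shards(5000, 2000, 30000, 5000))  # 6

# A result of 1 suggests a single machine may still be enough.
print(min_shards(500, 2000, 1000, 5000))    # 1
```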
Each approach has its own set of trade-offs, so evaluate the specific requirements of the system before deciding which technique to use.