Vertical Partitioning in System Design
Vertical partitioning, also known as column partitioning, is a technique used in database and system design to split a table into smaller tables, each containing a subset of the columns. The goals of vertical partitioning are to reduce the amount of data that must be read from disk, improve query performance, and reduce lock contention on the table.
To implement vertical partitioning, the columns of a table are grouped into smaller subsets based on how the data is used and on the specific requirements of the system. For example, columns that are frequently queried together might be placed in the same table, while less frequently used columns might be placed in a separate table. Each resulting table keeps a copy of the primary key, so the original row can be reconstructed by joining the tables on that key.
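Here is a minimal sketch of the idea using Python's built-in sqlite3 module. The users_core and users_profile tables and their columns are hypothetical. One narrow table holds the frequently queried columns and a second table holds the rarely used ones. Both repeat the user_id primary key, so a join can reassemble the full row.

```python
import sqlite3

# Hypothetical split of a wide "users" table into a frequently queried
# partition (users_core) and a rarely used one (users_profile).
# Both partitions repeat the primary key so rows can be rejoined.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_core (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email    TEXT NOT NULL
);
CREATE TABLE users_profile (
    user_id INTEGER PRIMARY KEY REFERENCES users_core(user_id),
    bio     TEXT,
    avatar  BLOB
);
""")

conn.execute("INSERT INTO users_core VALUES (1, 'alice', 'alice@example.com')")
conn.execute("INSERT INTO users_profile VALUES (1, 'Likes databases.', NULL)")

# Common query: reads only the narrow partition.
hot_row = conn.execute(
    "SELECT username, email FROM users_core WHERE user_id = ?", (1,)
).fetchone()

# Rare query: needs columns from both partitions, so it joins on user_id.
full_row = conn.execute("""
    SELECT c.username, c.email, p.bio
    FROM users_core AS c
    JOIN users_profile AS p ON p.user_id = c.user_id
    WHERE c.user_id = ?
""", (1,)).fetchone()

print(hot_row)   # ('alice', 'alice@example.com')
print(full_row)  # ('alice', 'alice@example.com', 'Likes databases.')
```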
Vertical partitioning can be done manually or with a partitioning tool. When deciding how to split the columns, consider the size of the table, the number of rows and columns, and the types of queries that will be executed against it.
Vertical partitioning also has trade-offs. The schema becomes more complex, and queries that need columns from more than one partition require joins. The partitions can also become inconsistent if the writes that belong to one logical row are not applied together.
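One common way to limit the inconsistency risk is to write every partition of a logical row inside a single transaction. This sketch is one approach, not the only one. It assumes the same hypothetical two-table split and uses the sqlite3 connection as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_core (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE users_profile (user_id INTEGER PRIMARY KEY, bio TEXT NOT NULL);
""")

def create_user(conn, user_id, username, bio):
    """Insert one logical row into both partitions, or into neither."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO users_core VALUES (?, ?)", (user_id, username))
        conn.execute("INSERT INTO users_profile VALUES (?, ?)", (user_id, bio))

create_user(conn, 1, "alice", "Likes databases.")

try:
    # bio is None, so the second INSERT violates NOT NULL and the
    # first INSERT (into users_core) is rolled back with it.
    create_user(conn, 2, "bob", None)
except sqlite3.IntegrityError:
    pass

core_ids = [r[0] for r in conn.execute("SELECT user_id FROM users_core ORDER BY user_id")]
profile_ids = [r[0] for r in conn.execute("SELECT user_id FROM users_profile ORDER BY user_id")]
print(core_ids, profile_ids)  # [1] [1]
```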
There are several techniques for implementing vertical partitioning:
- Functional Dependency-based partitioning: Data is partitioned based on the functional dependencies between columns. For example, if column A functionally determines column B, then columns A and B would be placed in the same table.
- Access Frequency-based partitioning: Data is partitioned based on the access frequency of the columns. For example, columns that are frequently queried together would be placed in the same table, while less frequently used columns would be placed in a separate table.
- Data Type-based partitioning: Data is partitioned based on the data types of the columns. For example, columns with large binary data (BLOBs) would be placed in a separate table from columns with smaller data types like integers.
- Update Frequency-based partitioning: Data is partitioned based on how frequently the columns are updated. For example, columns that are updated frequently would be placed in a separate table from columns that are rarely or never updated.
- Security-based partitioning: Data is partitioned based on the security requirements of the columns. For example, sensitive columns would be placed in a separate table with restricted access.
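Access frequency-based partitioning can be sketched under a few simplifying assumptions. The workload is a hypothetical list of observed queries, each represented by the set of columns it reads. A column counts as "hot" when at least threshold of the queries read it. This scores each column on its own rather than measuring which columns are queried together, so it approximates the technique and is not a full affinity-based algorithm.

```python
from collections import Counter

def split_by_access_frequency(columns, primary_key, workload, threshold=0.5):
    """Split columns into hot/cold groups by the fraction of queries that read them.

    columns: all column names in the table, including the primary key
    workload: list of sets; each set holds the columns one query reads
    threshold: minimum fraction of queries for a column to count as hot
    """
    counts = Counter(col for query in workload for col in query)
    total = len(workload)
    hot, cold = [primary_key], [primary_key]  # both partitions keep the key
    for col in columns:
        if col == primary_key:
            continue
        frequency = counts[col] / total if total else 0.0
        (hot if frequency >= threshold else cold).append(col)
    return hot, cold

# Hypothetical table and workload.
columns = ["user_id", "username", "email", "bio", "avatar"]
workload = [
    {"user_id", "username", "email"},
    {"user_id", "username"},
    {"user_id", "email"},
    {"user_id", "bio", "avatar"},
]
hot, cold = split_by_access_frequency(columns, "user_id", workload)
print(hot)   # ['user_id', 'username', 'email']
print(cold)  # ['user_id', 'bio', 'avatar']
```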
Choose the partitioning technique based on the characteristics of the data and the specific requirements of the system. Several techniques can also be combined, for example by isolating sensitive columns first and then splitting the remaining ones by access frequency.
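The hypothetical function below shows one way to combine techniques. It applies rules in a fixed priority order: security first, then data type, then access frequency. The input formats (schema, sensitive, access_freq) and the partition names are assumptions made for illustration. The primary key is left out of schema because it would be copied into every partition.

```python
def assign_partitions(schema, sensitive, access_freq, hot_threshold=0.5):
    """Assign each non-key column to a partition; the first matching rule wins.

    schema: dict mapping column name -> SQL type name (primary key excluded)
    sensitive: set of columns that need restricted access
    access_freq: dict mapping column name -> fraction of queries that read it
    """
    partitions = {}
    for col, col_type in schema.items():
        if col in sensitive:                      # security-based rule
            partitions[col] = "secure"
        elif col_type.upper() == "BLOB":          # data type-based rule
            partitions[col] = "blob"
        elif access_freq.get(col, 0.0) >= hot_threshold:  # access frequency rule
            partitions[col] = "hot"
        else:
            partitions[col] = "cold"
    return partitions

# Hypothetical inputs.
schema = {"username": "TEXT", "email": "TEXT", "ssn": "TEXT", "avatar": "BLOB", "bio": "TEXT"}
sensitive = {"ssn"}
access_freq = {"username": 0.9, "email": 0.6, "bio": 0.1, "avatar": 0.05, "ssn": 0.02}

result = assign_partitions(schema, sensitive, access_freq)
print(result)
# {'username': 'hot', 'email': 'hot', 'ssn': 'secure', 'avatar': 'blob', 'bio': 'cold'}
```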
Vertical partitioning and horizontal partitioning are two different techniques used in database and system design to scale and distribute data. Vertical partitioning splits a table by columns, while horizontal partitioning splits it by rows.
Vertical partitioning is typically used to improve query performance by reducing the amount of data read from disk, and to reduce lock contention on the table. It is especially useful when the data fits on a single machine but the table has many columns, some of which are rarely used. Moving those columns into a separate table lets the common queries read only the columns they need.
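A rough worked estimate shows why this helps. The sketch assumes a row-oriented store in which a full scan reads every column of every row, and it uses made-up average column widths. It ignores page overhead, indexes, and compression, and it only applies to queries that need nothing but the hot columns.

```python
def bytes_scanned(num_rows, column_widths):
    """Bytes read by a full scan in a row store that reads whole rows."""
    return num_rows * sum(column_widths.values())

# Hypothetical average column widths, in bytes.
hot = {"user_id": 8, "username": 16, "email": 32}
cold = {"bio": 500, "preferences": 1500}
rows = 1_000_000

unpartitioned = bytes_scanned(rows, {**hot, **cold})  # 2,056,000,000 bytes
hot_only = bytes_scanned(rows, hot)                   # 56,000,000 bytes
print(f"{unpartitioned / hot_only:.1f}")              # 36.7 (times less data read)
```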
Horizontal partitioning, on the other hand, is used when the data grows too large to be stored on a single server or when the system needs to handle a large number of concurrent requests. It distributes the rows of a table across multiple servers, which can improve the performance, scalability, and availability of the system.
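The difference in what each technique splits can be sketched with plain Python data structures. The functions below are hypothetical illustrations. horizontal_split hash-partitions whole rows across num_shards shards. vertical_split gives every partition all of the rows, but only the key plus one group of columns.

```python
import hashlib

def shard_for(key, num_shards):
    """Hash partitioning: map a row's partition key to a shard index."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def horizontal_split(rows, key_column, num_shards):
    """Horizontal partitioning: each shard holds complete rows, but only some of them."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_column], num_shards)].append(row)
    return shards

def vertical_split(rows, key_column, column_groups):
    """Vertical partitioning: each partition holds every row, but only some columns.

    The key column is copied into every partition so rows can be rejoined.
    """
    partitions = []
    for group in column_groups:
        partitions.append(
            [{key_column: row[key_column], **{c: row[c] for c in group}} for row in rows]
        )
    return partitions

rows = [{"user_id": i, "username": f"user{i}", "bio": f"bio {i}"} for i in range(10)]
shards = horizontal_split(rows, "user_id", num_shards=3)
parts = vertical_split(rows, "user_id", [["username"], ["bio"]])

print(sum(len(s) for s in shards))  # 10: every row lands on exactly one shard
print([len(p) for p in parts])      # [10, 10]: every partition has every row
print(sorted(parts[0][0]))          # ['user_id', 'username']
```

Taking the hash modulo the shard count is the simplest routing rule. The drawback is that changing num_shards moves most keys to a different shard, so systems that reshard often tend to use consistent hashing or range partitioning instead.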
We should consider vertical partitioning over horizontal partitioning when:
1. The data is small enough to fit on a single machine.
2. The table has many columns, and some of them are rarely used.
3. The goal is to improve query performance by reducing the amount of data read from disk.
4. The goal is to reduce lock contention on the table.
Each approach has its own trade-offs, so evaluate the specific requirements of the system before choosing one. In some cases, a combination of vertical and horizontal partitioning may be the best solution.
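One way to combine the two techniques is to split the columns first and then shard each resulting table by the same key. The partitions of a given row then land on the same shard and can be joined locally. The sketch below assumes hypothetical column groups and a simple hash-modulo routing rule.

```python
import hashlib
from collections import defaultdict

def shard_for(key, num_shards):
    """Hash-modulo routing: map a partition key to a shard index."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def hybrid_layout(rows, key_column, column_groups, num_shards):
    """Split each row into column groups, then shard every group by the same key.

    Returns {(group_name, shard_index): [partial rows]}.
    """
    layout = defaultdict(list)
    for row in rows:
        shard = shard_for(row[key_column], num_shards)  # one shard for all groups of this row
        for name, cols in column_groups.items():
            partial = {key_column: row[key_column]}
            partial.update({c: row[c] for c in cols})
            layout[(name, shard)].append(partial)
    return dict(layout)

def find_shard(layout, group_name, key_column, key):
    """Return the shard index holding `key` in the given group, or None."""
    for (name, shard), part_rows in layout.items():
        if name == group_name and any(r[key_column] == key for r in part_rows):
            return shard
    return None

rows = [{"user_id": i, "username": f"user{i}", "bio": f"bio {i}"} for i in range(20)]
groups = {"users_core": ["username"], "users_profile": ["bio"]}
layout = hybrid_layout(rows, "user_id", groups, num_shards=4)

# Both vertical partitions of every user's row live on the same shard.
colocated = all(
    find_shard(layout, "users_core", "user_id", i)
    == find_shard(layout, "users_profile", "user_id", i)
    for i in range(20)
)
print(colocated)  # True
```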