Managing Turbo CFS Directories

Last updated: 2023-08-29 10:40:47

Use Cases

This practice helps you achieve higher performance and lower latency on a Turbo file system by organizing directories and files more effectively.

Background

When a client accesses a file system, its operations go through the virtual file system (VFS) layer. When multiple processes read and write the same file, the VFS layer grants and releases locks serially. If a client's I/O is concentrated in an extremely large directory at some level of the directory tree, these serial VFS lock operations can increase I/O latency.
As a strongly consistent file system, a Turbo file system keeps data consistent for any client reading or writing at any time. This also means that client I/O operations involve granting and releasing distributed locks on the backend. The Turbo backend handles lock requests concurrently through its distributed metadata services. Even so, to keep latency low, we recommend that you manage the number of directories and files according to the suggestions in this document.

Suggestions

If your workload is mostly reads, with no frequent delete/update/create operations, we recommend that you keep the number of subdirectories and files in a single directory between 10 and 1 million.
If your workload involves frequent delete/update/create operations, we recommend that you keep the number of subdirectories and files in a single directory between 10 and 10,000.
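
To check an existing directory tree against the upper bounds above, a small scan along the following lines may help. This is a minimal sketch, not an official tool: the function name `find_oversized_dirs` and the two constants are hypothetical names introduced here, and the script assumes the file system is mounted locally and readable by the current user.

```python
import os

# Upper bounds taken from the suggestions above (constant names are illustrative).
READ_MOSTLY_LIMIT = 1_000_000   # mostly-read workloads
WRITE_HEAVY_LIMIT = 10_000      # frequent delete/update/create workloads


def find_oversized_dirs(root, limit):
    """Return (path, entry_count) for each directory under root
    whose number of direct entries (files + subdirectories) exceeds limit."""
    oversized = []
    stack = [root]
    while stack:
        current = stack.pop()
        count = 0
        with os.scandir(current) as entries:
            for entry in entries:
                count += 1
                # Descend into real subdirectories only; do not follow symlinks.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
        if count > limit:
            oversized.append((current, count))
    return oversized
```

For example, `find_oversized_dirs("/mnt/turbo", WRITE_HEAVY_LIMIT)` would list directories holding more than 10,000 entries, where `/mnt/turbo` is a placeholder mount point. Listing very large directories generates metadata requests of its own, so a scan like this is probably best run outside peak hours.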

Common directory structures

Sort by hash code: Use a 64-character hexadecimal hash code (for example, a SHA-256 digest). The first two characters form the first-level subdirectory, the next two characters form the second-level subdirectory, and files are stored under the second-level subdirectory. Two hexadecimal characters give 256 values per level, so there are 256 × 256 = 65,536 second-level directories. If the hash function distributes values uniformly, files are spread evenly across these directories. With 600 million files, each directory holds roughly 10,000 files on average (600,000,000 / 65,536 ≈ 9,155). With 1.2 billion files, each directory holds roughly 20,000 files on average (≈ 18,311).
Sort by time: The date (year, month, and day) forms the first-level subdirectory, the hour forms the second-level subdirectory, and the minute forms an optional third-level subdirectory.
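
The two layouts above can be sketched as path-building helpers. This is an illustrative example under stated assumptions: the function names, the `/mnt/turbo` mount point, the choice of SHA-256 over the file name as the 64-character hash code, and the `YYYYMMDD` date format are all assumptions made here, not requirements of the Turbo file system.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath


def hash_sharded_path(root, name):
    """Build <root>/<hash[0:2]>/<hash[2:4]>/<name> from a 64-char hex digest."""
    # Assumption: the hash code is the SHA-256 hex digest of the file name.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / name


def time_sharded_path(root, name, ts, include_minute=True):
    """Build <root>/<YYYYMMDD>/<HH>[/<MM>]/<name> from a timestamp."""
    parts = [ts.strftime("%Y%m%d"), ts.strftime("%H")]
    if include_minute:
        # The minute level is optional, as noted above.
        parts.append(ts.strftime("%M"))
    return PurePosixPath(root, *parts, name)


if __name__ == "__main__":
    print(hash_sharded_path("/mnt/turbo", "abc"))
    print(time_sharded_path("/mnt/turbo", "a.log", datetime(2023, 8, 29, 10, 40)))
```

Hash sharding spreads files evenly regardless of when they are written, while time sharding keeps files from the same period together, which may suit log-style data that is read or expired by time range.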