I/O scheduler tunables

Deadline and SIO:

Quote:

fifo_batch: This parameter controls the maximum number of requests per batch. It tunes the balance between per-request latency and aggregate throughput. When low latency is the primary concern, smaller is better (a value of 1 yields first-come, first-served behavior). Increasing fifo_batch generally improves throughput at the cost of latency variation. The default is 16.
front_merges: A request entering the scheduler may be contiguous to a request already on the queue: it either fits at the back of that request (a back merge candidate) or at the front (a front merge candidate). Back merges are typically much more common than front merges. You can set this tunable to 0 if you know your workload will never generate front merges; otherwise leave it at its default value of 1.
read_expire: In all three schedulers, there is some form of deadline on servicing each read request, since the focus is read latency. When a read request first enters the I/O scheduler, it is assigned a deadline of the current time plus the read_expire value, in milliseconds. The default value is 500 ms.
write_expire: Similar to read_expire, but applied to write requests. The default value is 5000 ms.
writes_starved: Reads are normally favored over writes, but they cannot starve writes forever: this tunable controls how many read batches may be processed before a single write batch is dispatched, after which pending writes get the same priority as reads. The higher this is set, the more preference is given to reads. The default value is 1.
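All of these knobs live in sysfs while deadline (or SIO) is the active scheduler. Below is a minimal sketch of reading and adjusting them from Python; the device name sda and the show/set_tunable helpers are assumptions chosen for illustration, and writing values requires root.

from pathlib import Path

# Assumed device name for illustration; on many phones the eMMC appears as mmcblk0 instead.
DEV = "sda"
IOSCHED = Path(f"/sys/block/{DEV}/queue/iosched")

def show(name):
    # Print the current value of a tunable, if the file exists.
    f = IOSCHED / name
    if f.is_file():
        print(name, "=", f.read_text().strip())

def set_tunable(name, value):
    # Write a new value; needs root, and the matching scheduler must be active.
    (IOSCHED / name).write_text(str(value) + "\n")

for name in ("fifo_batch", "front_merges", "read_expire", "write_expire", "writes_starved"):
    show(name)

# Example: trade aggregate throughput for lower per-request latency.
# set_tunable("fifo_batch", 1)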

Noop:

Quote:

add_random: In some cases, the overhead of I/O events contributing to the entropy pool for /dev/random is measurable. In such cases, it may be desirable to set this value to 0.
nomerges: This tunable is primarily a debugging aid. Most workloads benefit from request merging (even on faster storage such as SSDs). In some cases, however, it is desirable to disable merging, such as when you want to see how many IOPS a storage back-end can process without disabling read-ahead or performing random I/O.
nr_requests: If you have a latency-sensitive application, then you should consider lowering the value of nr_requests in your request queue and limiting the command queue depth on the storage to a low number (even as low as 1), so that writeback I/O cannot allocate all of the available request descriptors and fill up the device queue with write I/O. Once nr_requests have been allocated, all other processes attempting to perform I/O will be put to sleep to wait for requests to become available. This makes things more fair, as the requests are then distributed in a round-robin fashion (instead of letting one process consume them all in rapid succession).
optimal_io_size: In some circumstances, the underlying storage will report an optimal I/O size. This is most common in hardware and software RAID, where the optimal I/O size is the stripe size. If this value is reported, applications should issue I/O aligned to and in multiples of the optimal I/O size whenever possible.
rotational: Traditional hard disks have been rotational (made up of spinning platters). SSDs, however, are not. Most SSDs will advertise this properly. If, however, you come across a device that does not advertise this flag properly, it may be necessary to set rotational to 0 manually; when rotational is disabled, the I/O elevator does not use logic that is meant to reduce seeks, since there is little penalty for seek operations on non-rotational media.
rq_affinity: I/O completions can be processed on a different CPU from the one that issued the I/O. Setting rq_affinity to 1 causes the kernel to deliver completions to the CPU on which the I/O was issued. This can improve CPU data caching effectiveness.
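Because noop itself exposes no scheduler-specific knobs, the tunables above sit directly under /sys/block/<dev>/queue/ rather than in the iosched/ subdirectory. A minimal sketch, assuming a device named sda, that dumps them and shows how a value such as rotational could be overridden (writes need root):

from pathlib import Path

# Assumed device name; adjust for your system (sdb, mmcblk0, nvme0n1, ...).
QUEUE = Path("/sys/block/sda/queue")

# These files belong to the block layer itself, not to the noop scheduler.
for name in ("add_random", "nomerges", "nr_requests",
             "optimal_io_size", "rotational", "rq_affinity"):
    f = QUEUE / name
    if f.is_file():
        print(name, "=", f.read_text().strip())

# Example: mark the device as non-rotational if it mis-advertises itself (needs root).
# (QUEUE / "rotational").write_text("0\n")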

CFQ:

Quote:

back_seek_max: CFQ anticipates that the next request may require moving backwards from the current position on the disk. Because backward seeks can be time consuming, this setting, given in KB, limits the maximum distance the scheduler will go back. The default value is 16 KB. Do note that in a cellphone or tablet the storage is flash memory, so there is no disk head to reposition; this tunable is therefore less relevant there, as backward reads are not particularly expensive.
back_seek_penalty: This parameter is used to compute the cost of backward seeking. If the backward distance of a request is only 1/back_seek_penalty of the forward distance, the seeking cost of the two requests is considered equivalent and the scheduler will not bias toward one or the other. The default is 2, so if the backward distance is at most half the forward distance, CFQ considers the backward request close enough to the current head location to treat it as a forward request.
fifo_expire_async & fifo_expire_sync: fifo_expire_async sets the timeout of asynchronous requests; CFQ maintains a FIFO (first-in, first-out) list to manage timed-out requests. The default value is 250 ms; a smaller value means a request is considered expired more quickly. fifo_expire_sync applies the same idea to synchronous requests, with a default of 125 ms.
group_idle: If this is set, CFQ idles on the last process issuing I/O in a cgroup. It should be set to 1 when using proportional-weight I/O cgroups with slice_idle set to 0, as is appropriate for fast flash storage (see the sketch after this list).
group_isolation: If set to 1, there is stronger isolation between groups at the expense of throughput. If disabled, the scheduler is biased toward sequential requests; when enabled, group isolation provides fairness for both sequential and random workloads. The default value is 0 (disabled).
low_latency: When set to 1, CFQ favors fairness over throughput by giving each process issuing I/O on a device a maximum wait time of 300 ms. When disabled (set to 0), target latency is ignored and each process in the system gets a full time slice. This is enabled by default.
quantum: This option controls the maximum number of requests dispatched to the device at a time. The default value is 8. Increasing the value can improve performance, but the latency of some I/O may increase because more requests are buffered inside the storage.
slice_async: This parameter controls the time slice allotted to asynchronous (write) requests. The default value is 40 ms.
slice_idle: When a task has no more requests to submit in its time slice, the scheduler waits for a while before scheduling the next thread, to improve locality. The default here is 0, indicating no idling; however, a zero value increases the overall number of seeks on rotational media, so a non-zero value may be beneficial there.
slice_sync: This setting determines the time slice allotted to a process issuing synchronous I/O. The default is 100 ms.
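To make the slice_idle/group_idle combination above concrete, here is a small sketch that applies it and then dumps a few other CFQ tunables. The device name sda and the write_if_present helper are assumptions for illustration; the files exist only while cfq is the active scheduler, and writes need root.

from pathlib import Path

# Assumed device name; these files exist only while cfq is the active scheduler.
IOSCHED = Path("/sys/block/sda/queue/iosched")

def write_if_present(name, value):
    # Best-effort write of a CFQ tunable (needs root).
    f = IOSCHED / name
    if f.is_file():
        f.write_text(value + "\n")

# The flash-friendly combination described above: no per-queue idling,
# but keep idling per cgroup so proportional-weight I/O control still works.
write_if_present("slice_idle", "0")
write_if_present("group_idle", "1")
write_if_present("low_latency", "1")

for name in ("back_seek_max", "back_seek_penalty", "quantum", "slice_sync"):
    f = IOSCHED / name
    if f.is_file():
        print(name, "=", f.read_text().strip())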

BFQ:

Quote:

timeout_sync & timeout_async: These parameters determine the maximum disk time given to a task, for synchronous and asynchronous queues respectively. They allow the user to control the latencies imposed by the scheduler.
max_budget: This determines how much service a queue may receive, measured in sectors on disk. A larger value increases throughput for individual tasks and for the system, in proportion to the percentage of sequential requests issued; the consequence is an increase in the maximum latency a request may incur. The default value is 0, which enables auto-tuning.
max_budget_async_rq: This setting determines the maximum number of requests served from the asynchronous queues before a new queue is selected.
low_latency: When this is set to 1 (the default), interactive and soft real-time applications experience lower latency.
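A short sketch for inspecting BFQ's knobs. The device name sda is an assumption, the files appear only while bfq is the active scheduler, and older out-of-tree BFQ builds may expose a slightly different set of tunables than recent kernels.

from pathlib import Path

# Assumed device name; these files exist only while bfq is the active scheduler.
IOSCHED = Path("/sys/block/sda/queue/iosched")

for name in ("low_latency", "max_budget", "timeout_sync"):
    f = IOSCHED / name
    if f.is_file():
        print(name, "=", f.read_text().strip())

# Example: drop the interactive/soft real-time latency heuristics
# when raw throughput matters more (needs root).
# (IOSCHED / "low_latency").write_text("0\n")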

ROW:

Quote:

hp_read_quantum: Dispatch quantum for the high priority READ queue. Default: 10
rp_read_quantum: Dispatch quantum for the regular priority READ queue. Default: 100
hp_swrite_quantum: Dispatch quantum for the high priority Synchronous WRITE queue. Default: 1
rp_swrite_quantum: Dispatch quantum for the regular priority Synchronous WRITE queue. Default: 1
rp_write_quantum: Dispatch quantum for the regular priority WRITE queue. Default: 1
lp_read_quantum: Dispatch quantum for the low priority READ queue. Default: 1
lp_swrite_quantum: Dispatch quantum for the low priority Synchronous WRITE queue. Default: 1
read_idle: Length of idling on the READ queue, in milliseconds (in case idling is enabled on that queue). Default: 5 ms
read_idle_freq: Frequency of inserting READ requests that will trigger idling, i.e. the time in milliseconds between inserting two READ requests. Default: 5 ms
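ROW ships in Android/CodeAurora kernels rather than in mainline Linux, so the paths below are an assumption that only holds on such a kernel with row selected; mmcblk0 is a typical eMMC device name used here purely for illustration, and the file names follow the tunables listed above.

from pathlib import Path

# Assumed: an Android/CodeAurora kernel with the row scheduler active on mmcblk0.
IOSCHED = Path("/sys/block/mmcblk0/queue/iosched")

quanta = ("hp_read_quantum", "rp_read_quantum", "hp_swrite_quantum",
          "rp_swrite_quantum", "rp_write_quantum", "lp_read_quantum",
          "lp_swrite_quantum", "read_idle", "read_idle_freq")

for name in quanta:
    f = IOSCHED / name
    if f.is_file():
        print(name, "=", f.read_text().strip())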

VR and Zen:

Quote:

rev_penalty: Penalty for reversing the head direction.
fifo_batch: Number of requests to issue before checking for expired requests.
sync_expire: Deadline for synchronous requests.
async_expire: Deadline for asynchronous requests.
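Whichever of these schedulers a kernel ships, the active one is selected through the same sysfs file. A minimal sketch, assuming a device named sda; vr and zen appear in the list only on custom kernels that include them, and switching requires root.

from pathlib import Path

# Reading this file lists every compiled-in scheduler, with the active one
# shown in brackets, e.g. "noop deadline [cfq]".
SCHED = Path("/sys/block/sda/queue/scheduler")

if SCHED.is_file():
    print("schedulers:", SCHED.read_text().strip())

# Example: switch to zen if the kernel provides it (needs root).
# SCHED.write_text("zen\n")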
