Better perf for Android using SchedTune and SCHED_DEADLINE

By 用户9732312 · Published 2022-05-13 · Column: ADAS性能优化 (ADAS Performance Optimization)

SCHED_FIFO in Android (today)
● Used for some latency-sensitive tasks
  ○ SurfaceFlinger (3-8 ms every 16 ms, RT priority 98)
  ○ Audio (<1 ms every 3-5 ms, low RT priority)
  ○ schedfreq kthread(s) (sporadic and unbounded, RT priority 50)
  ○ others
● Other latency-sensitive tasks that are NOT SCHED_FIFO
  ○ UI thread (where app code resides; handles most animation and input events)
  ○ Render thread (generates the actual OpenGL commands used to draw the UI)
  ○ not SCHED_FIFO because
    ■ load-balancing CPU selection is naive
    ■ RT throttling is too strict
    ■ there is a risk that these tasks can DoS CPUs
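For context, the sketch below shows how a thread ends up on an RT runqueue like the tasks listed above: a plain sched_setscheduler() call. The priority value 98 simply mirrors the SurfaceFlinger figure quoted above; the function name and the standalone program are illustrative, not actual platform code.

```c
/* Minimal sketch: switch the calling thread to SCHED_FIFO.
 * Needs CAP_SYS_NICE (or root); priority 98 mirrors the
 * SurfaceFlinger figure quoted above. */
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static int make_rt_fifo(int prio)
{
        struct sched_param sp;

        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = prio;                 /* 1..99, higher wins */

        if (sched_setscheduler(0 /* self */, SCHED_FIFO, &sp) < 0) {
                fprintf(stderr, "SCHED_FIFO failed: %s\n", strerror(errno));
                return -1;
        }
        return 0;
}

int main(void)
{
        return make_rt_fifo(98) ? 1 : 0;
}
```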

SCHED_FIFO (and beyond?)
● Use SCHED_FIFO for the UI and Render threads
  ○ Userspace support already in N-DR (to be released in AOSP in the Dec timeframe)
  ○ EAS-integrated RT CPU selection in flight (to be part of the MR2 release)
  ○ Results: ~10% (90th), ~12% (95th) and ~23% (99th percentile) improvements in perf/Watt on jank benchmarks
● TEMP_FIFO
  ○ demote to CFS instead of throttling (RT throttling)
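TEMP_FIFO is proposed as in-kernel behaviour, so it cannot be reproduced from userspace; the snippet below only illustrates what "demote to CFS instead of throttling" amounts to, namely a policy switch from SCHED_FIFO back to SCHED_OTHER, shown here for the calling thread under that assumption.

```c
/* Illustration only: the effect of a TEMP_FIFO-style demotion is a
 * switch from SCHED_FIFO back to SCHED_OTHER (CFS).  The real proposal
 * would perform this inside the kernel when the RT budget is exhausted. */
#include <sched.h>
#include <string.h>

static int demote_to_cfs(void)
{
        struct sched_param sp;

        memset(&sp, 0, sizeof(sp));     /* CFS ignores sched_priority */
        return sched_setscheduler(0 /* self */, SCHED_OTHER, &sp);
}
```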

SCHED_DEADLINE (instead of SCHED_FIFO?)
✓ The long-term ambition is to provide better QoS using SCHED_DEADLINE
  https://linuxplumbersconf.org/2015/ocw//system/presentations/3063/original/lelli_slides.pdf
✓ If prototyping results are positive, mainline adoption of the required modifications should be easier to achieve (w.r.t. modifying SCHED_FIFO)
✗ Missing features (https://github.com/jlelli/sched-deadline/wiki/TODOs)
  ■ reclaiming (short-term flexibility)
  ■ integration with schedutil
  ■ cgroup-based scheduling
  ■ demotion to CFS
● The guinea pig for the next steps will probably be SurfaceFlinger (16 ms period, 3-8 ms runtime)
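If SurfaceFlinger were moved to SCHED_DEADLINE, the parameters would come straight from the numbers on this slide: a worst-case runtime of 8 ms out of a 16 ms period. The sketch below is a hypothetical example of that call; sched_setattr() has no glibc wrapper, so it goes through syscall(), and the struct layout is the one documented for the syscall.

```c
/* Sketch: SCHED_DEADLINE with the SurfaceFlinger figures from the slide
 * (worst-case 8 ms runtime every 16 ms).  Illustrative values only. */
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;       /* used by SCHED_OTHER/BATCH */
        uint32_t sched_priority;   /* used by SCHED_FIFO/RR */
        uint64_t sched_runtime;    /* SCHED_DEADLINE fields, in ns */
        uint64_t sched_deadline;
        uint64_t sched_period;
};

static int set_deadline(pid_t tid)
{
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size           = sizeof(attr);
        attr.sched_policy   = SCHED_DEADLINE;
        attr.sched_runtime  =  8 * 1000 * 1000;   /* 8 ms budget per period */
        attr.sched_deadline = 16 * 1000 * 1000;   /* finish within the frame */
        attr.sched_period   = 16 * 1000 * 1000;   /* one 60 Hz frame */

        return syscall(SYS_sched_setattr, tid, &attr, 0);
}
```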

SchedTune in a Nutshell
● Enables the collection of task-related information from informed runtimes
  ○ using a localized tuning interface to balance energy efficiency vs performance boost
  ○ extending Sched{Freq,Util} for OPP selection and EAS for task placement
● OPP selection: running at a higher/lower OPP
  ○ makes a CPU appear artificially more (or less) utilized than it actually is
  ○ depending on which tasks are currently active on that CPU
● Task placement: biasing CPU selection in the wake-up path
  ○ based on an evaluation of the power-vs-performance trade-off
  ○ using a performance-index definition which helps answer: how much power are we willing to spend to get a certain speedup in a task's time-to-completion?
● Uses CGroups to provide both global and per-task boosting
  ○ simple yet effective support for task classification
  ○ allows for more advanced use cases where the boost value is tuned at run-time, e.g. replacing the powersave/performance governors, support for touch boosting, ...
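On AOSP EAS kernels the CGroups side of this is typically exposed as a "schedtune" controller mounted under /dev/stune, with a per-group schedtune.boost attribute and the usual cgroup v1 tasks file; those paths and names are the Android convention and are assumed here rather than guaranteed by mainline. A minimal sketch of driving that interface:

```c
/* Sketch of driving the SchedTune CGroup interface as mounted on
 * Android EAS kernels (/dev/stune is an assumption, not mainline). */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%s", val);
        return fclose(f);
}

int main(void)
{
        /* Boost every task in the (hypothetical) "top-app" group by 10%. */
        write_str("/dev/stune/top-app/schedtune.boost", "10");

        /* Classify the current task into that group (cgroup v1: writing
         * "0" to the tasks file moves the writing task). */
        write_str("/dev/stune/top-app/tasks", "0");
        return 0;
}
```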

SchedTune Discussion Points
● Is the CGroups interface a viable solution for mainline integration?
  ○ CGroups v2 discussions about a per-process (instead of per-task) interface?
  ○ Are the implied overheads (e.g. for moving tasks) acceptable?
● How can we improve the definition of SchedTune's performance index?
  ○ How much is task performance affected by a certain scheduling decision?
  ○ How can we factor in all the potential slow-down threats? e.g. co-scheduling, higher-priority tasks, blocked utilization, interrupt pressure, etc.
● Is negative boosting useful? Can we prove it useful and improve its support?
  ○ Where/when is it useful to artificially lower the perceived utilization of a task? Identify use cases, e.g. background tasks, memory-bound tasks

Performance Boosting: What Does it Mean?
● Speed up the time-to-completion of a task activation
  ○ by running on a higher-capacity CPU and/or at a higher OPP
    ■ i.e. small tasks on big cores and/or using higher OPPs
● To achieve this goal we need:
  ○ A) A boosting strategy
    ■ evaluate how much "CPU bandwidth" is required by a task
  ○ B) A CPU-selection biasing mechanism
    ■ select a cluster/CPU which (can) provide that bandwidth
    ■ evaluate whether the energy-performance trade-off is acceptable
  ○ C) An OPP-selection biasing mechanism
    ■ configure the selected CPU to provide (at least) that bandwidth
    ■ ... but possibly only while a boosted task is RUNNABLE on that CPU
  ○ ... and do all of that with no noticeable overhead

Patches Availability and List Discussions
● The initial full stack has been split into two series
  ○ 1) Non-EAS-dependent bits
    ■ OPP selection biasing
    ■ global boosting strategy
    ■ CGroups-based per-task boosting support
    Posted on LKML as RFCv1 [1] and RFCv2 [2]
  ○ 2) EAS-dependent bits
    ■ CPU selection biasing
    ■ energy model filtering
    Available on AOSP and LSK for kernels 3.18 and 4.4 [3,4]

[1] https://lkml.org/lkml/2015/9/15/679
[2] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1259645.html
[3] https://android.googlesource.com/kernel/common/+/android-3.18
[4] https://android.googlesource.com/kernel/common/+/android-4.4

Boosting Strategy: Bandwidth Margin Computation
● Task utilization defines the task's required CPU bandwidth
  ○ to boost a task we need to inflate this requirement by adding a "margin"
  ○ many different strategies/policies can be defined
● Main goals
  ○ A well-defined meaning from user-space
    ■ 0% boost: run at the minimum required capacity (maximum energy efficiency)
    ■ 100% boost: run at the maximum possible speed (minimum time-to-completion)
    ■ 50%? ==> "something" exactly in between the previous two
  ○ Easy integration with SchedFreq and EAS
    ■ by working on top of the already-used signals
    ■ thus providing a different "view" of the SEs'/RQs' utilization signals

Signal Proportional Compensation (SPC)
● The boost value is converted into an additional margin
  ○ which is computed to compensate towards maximum performance
    ■ i.e. the boost margin is a function of the current and maximum capacity:

      margin = boost_pct * (max_capacity - cur_capacity),   boost_pct ∈ [0, 1]
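A minimal sketch of the SPC formula above, using the kernel's 1024-based capacity scale and an integer percentage for the boost value so the computation stays in integer math (the function and constant names are illustrative, not the kernel's own):

```c
/* Sketch of the SPC margin: margin = boost_pct * (max_capacity - cur_capacity),
 * with capacities on the usual 1024-based scale and boost_pct in percent. */
#define CAPACITY_SCALE 1024UL

static unsigned long spc_margin(unsigned long cur_capacity,
                                unsigned int boost_pct /* 0..100 */)
{
        unsigned long headroom = CAPACITY_SCALE - cur_capacity;

        return (headroom * boost_pct) / 100;
}

/* Example with cur_capacity = 300:
 *   boost   0% -> margin   0 -> boosted capacity  300 (max efficiency)
 *   boost  50% -> margin 362 -> boosted capacity  662 (in between)
 *   boost 100% -> margin 724 -> boosted capacity 1024 (min time-to-completion) */
```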

SchedTune Performance Index
● Based on the composition of two metrics:

      Perf_idx = SpeedUp_idx - Delay_idx

● SpeedUp index: how much faster can the task run?

      SpeedUp_idx = SUI = cpu_boosted_capacity - task_util

● Delay index: how much can the task be slowed down?

      Delay_idx = DLI = 1024 * cpu_util / (task_util + cpu_util)
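Composing the two metrics is straightforward once the utilization and capacity signals are available; the sketch below simply restates the formulas above on the 1024-based scale, with an added guard against a zero denominator (the function name and that guard are illustrative additions):

```c
/* Sketch of the performance index: Perf_idx = SUI - DLI, with all
 * inputs on the 1024-based utilization/capacity scale. */
static long schedtune_perf_idx(unsigned long cpu_boosted_capacity,
                               unsigned long cpu_util,
                               unsigned long task_util)
{
        long speedup_idx = (long)cpu_boosted_capacity - (long)task_util;
        unsigned long denom = task_util + cpu_util;
        long delay_idx = denom ? (long)(1024UL * cpu_util / denom) : 0;

        return speedup_idx - delay_idx;
}
```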

Originally published 2017-03-08 on the WeChat public account "Android性能优化" (Android Performance Optimization); republished via the Tencent Cloud community.
