At this point, I generated an off-cpu flamegraph using Linux perf_events to see why we entered this state...(~58684 samples)
$ sudo perf script -f time,comm,pid,tid,event,ip,sym,dso,trace -i sched.data | ~/FlameGraph.../stackcollapse-perf-sched.awk | ~/FlameGraph/flamegraph.pl --color=io --countname=us >off-cpu.svg
Note...In an off-cpu flamegraph, the width of a bar is proportional to the total time spent off-cpu in that frame, including every code path that passes through it.
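
To make the width rule concrete, here is a minimal Python sketch of how bar widths fall out of folded stacks. It assumes the usual folded format that flamegraph.pl reads (semicolon-separated frames, then a space and a number), with the number interpreted as microseconds off-cpu, as the `--countname=us` label in the command above suggests. The sample stacks and the `frame_widths` helper are hypothetical and are not taken from the trace in this post.

```python
from collections import defaultdict

# Hypothetical folded-stack lines: "frame;frame;... <microseconds off-cpu>"
FOLDED = """\
app;sys_read;io_schedule 700
app;sys_futex;futex_wait 200
app;sys_read;io_schedule 100
"""


def frame_widths(folded_text):
    """Return {stack-prefix tuple: fraction of total off-cpu time}."""
    totals = defaultdict(int)
    grand_total = 0
    for line in folded_text.splitlines():
        if not line.strip():
            continue
        # The value is the last space-separated field; the rest is the stack.
        stack, _, value = line.rpartition(" ")
        usec = int(value)
        grand_total += usec
        frames = stack.split(";")
        # A bar covers every stack that passes through it,
        # so credit the time to each prefix of the stack.
        for depth in range(1, len(frames) + 1):
            totals[tuple(frames[:depth])] += usec
    if grand_total == 0:
        return {}
    return {prefix: t / grand_total for prefix, t in totals.items()}


if __name__ == "__main__":
    for prefix, frac in sorted(frame_widths(FOLDED).items()):
        print(f"{';'.join(prefix):35s} {frac:6.1%}")
```

In this made-up input, the `sys_read` path accounts for 800 of 1000 microseconds, so its bar would span 80% of the graph regardless of how many individual scheduler events contributed to it.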