diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt new file mode 100644 index 0000000000000000000000000000000000000000..cbf7ae412da4ab915a345582cf6b5fc93ca00ed5 --- /dev/null +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -0,0 +1,202 @@ +REDUCING OS JITTER DUE TO PER-CPU KTHREADS + +This document lists per-CPU kthreads in the Linux kernel and presents +options to control their OS jitter. Note that non-per-CPU kthreads are +not listed here. To reduce OS jitter from non-per-CPU kthreads, bind +them to a "housekeeping" CPU dedicated to such work. + + +REFERENCES + +o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. + +o Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs. + +o man taskset: Using the taskset command to bind tasks to sets + of CPUs. + +o man sched_setaffinity: Using the sched_setaffinity() system + call to bind tasks to sets of CPUs. + +o /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, + writing "0" to offline and "1" to online. + +o In order to locate kernel-generated OS jitter on CPU N: + + cd /sys/kernel/debug/tracing + echo 1 > max_graph_depth # Increase the "1" for more detail + echo function_graph > current_tracer + # run workload + cat per_cpu/cpuN/trace + + +KTHREADS + +Name: ehca_comp/%u +Purpose: Periodically process Infiniband-related work. +To reduce its OS jitter, do any of the following: +1. Don't use eHCA Infiniband hardware, instead choosing hardware + that does not require per-CPU kthreads. This will prevent these + kthreads from being created in the first place. (This will + work for most people, as this hardware, though important, is + relatively old and is produced in relatively low unit volumes.) +2. Do all eHCA-Infiniband-related work on other CPUs, including + interrupts. +3. Rework the eHCA driver so that its per-CPU kthreads are + provisioned only on selected CPUs. + + +Name: irq/%d-%s +Purpose: Handle threaded interrupts. +To reduce its OS jitter, do the following: +1. Use irq affinity to force the irq threads to execute on + some other CPU. + +Name: kcmtpd_ctr_%d +Purpose: Handle Bluetooth work. +To reduce its OS jitter, do one of the following: +1. Don't use Bluetooth, in which case these kthreads won't be + created in the first place. +2. Use irq affinity to force Bluetooth-related interrupts to + occur on some other CPU and furthermore initiate all + Bluetooth activity on some other CPU. + +Name: ksoftirqd/%u +Purpose: Execute softirq handlers when threaded or when under heavy load. +To reduce its OS jitter, each softirq vector must be handled +separately as follows: +TIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by forcing + both kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force + the CPU offline, then bring it back online. This forces + recurring timers to migrate elsewhere. If you are concerned + with multiple CPUs, force them all offline before bringing the + first one back online. Once you have onlined the CPUs in question, + do not offline any other CPUs, because doing so could force the + timer back onto one of the CPUs in question. +NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following: +1. Force networking interrupts onto other CPUs. +2. Initiate any network I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_IOPOLL_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O and block-I/O polling on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +TASKLET_SOFTIRQ: Do one or more of the following: +1. Avoid use of drivers that use tasklets. (Such drivers will contain + calls to things like tasklet_schedule().) +2. Convert all drivers that you must use from tasklets to workqueues. +3. Force interrupts for drivers using tasklets onto other CPUs, + and also do I/O involving these drivers on other CPUs. +SCHED_SOFTIRQ: Do all of the following: +1. Avoid sending scheduler IPIs to the CPU to be de-jittered, + for example, ensure that at most one runnable kthread is present + on that CPU. If a thread that expects to run on the de-jittered + CPU awakens, the scheduler will send an IPI that can result in + a subsequent SCHED_SOFTIRQ. +2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using the + "nohz_full=" boot parameter. This reduces the number of + scheduler-clock interrupts that the de-jittered CPU receives, + minimizing its chances of being selected to do the load balancing + work that runs in SCHED_SOFTIRQ context. +3. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by + forcing both kernel threads and interrupts to execute elsewhere. + This further reduces the number of scheduler-clock interrupts + received by the de-jittered CPU. +HRTIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle. For example, avoid system calls and force both + kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the + CPU offline, then bring it back online. This forces recurring + timers to migrate elsewhere. If you are concerned with multiple + CPUs, force them all offline before bringing the first one + back online. Once you have onlined the CPUs in question, do not + offline any other CPUs, because doing so could force the timer + back onto one of the CPUs in question. +RCU_SOFTIRQ: Do at least one of the following: +1. Offload callbacks and keep the CPU in either dyntick-idle or + adaptive-ticks state by doing all of the following: + a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using + the "nohz_full=" boot parameter. Bind the rcuo kthreads + to housekeeping CPUs, which can tolerate OS jitter. + b. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. +2. Enable RCU to do its processing remotely via dyntick-idle by + doing all of the following: + a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. + b. Ensure that the CPU goes idle frequently, allowing other + CPUs to detect that it has passed through an RCU quiescent + state. If the kernel is built with CONFIG_NO_HZ_FULL=y, + userspace execution also allows other CPUs to detect that + the CPU in question has passed through a quiescent state. + c. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. + +Name: rcuc/%u +Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. +To reduce its OS jitter, do at least one of the following: +1. Build the kernel with CONFIG_PREEMPT=n. This prevents these + kthreads from being created in the first place, and also obviates + the need for RCU priority boosting. This approach is feasible + for workloads that do not require high degrees of responsiveness. +2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these + kthreads from being created in the first place. This approach + is feasible only if your workload never requires RCU priority + boosting, for example, if you ensure frequent idle time on all + CPUs that might execute within the kernel. +3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y, + which offloads all RCU callbacks to kthreads that can be moved + off of CPUs susceptible to OS jitter. This approach prevents the + rcuc/%u kthreads from having any work to do, so that they are + never awakened. +4. Ensure that the CPU never enters the kernel, and, in particular, + avoid initiating any CPU hotplug operations on this CPU. This is + another way of preventing any callbacks from being queued on the + CPU, again preventing the rcuc/%u kthreads from having any work + to do. + +Name: rcuob/%d, rcuop/%d, and rcuos/%d +Purpose: Offload RCU callbacks from the corresponding CPU. +To reduce its OS jitter, do at least one of the following: +1. Use affinity, cgroups, or other mechanism to force these kthreads + to execute on some other CPU. +2. Build with CONFIG_RCU_NOCB_CPUS=n, which will prevent these + kthreads from being created in the first place. However, please + note that this will not eliminate OS jitter, but will instead + shift it to RCU_SOFTIRQ. + +Name: watchdog/%u +Purpose: Detect software lockups on each CPU. +To reduce its OS jitter, do at least one of the following: +1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these + kthreads from being created in the first place. +2. Echo a zero to /proc/sys/kernel/watchdog to disable the + watchdog timer. +3. Echo a large number of /proc/sys/kernel/watchdog_thresh in + order to reduce the frequency of OS jitter due to the watchdog + timer down to a level that is acceptable for your workload. diff --git a/arch/Kconfig b/arch/Kconfig index dd0e8eb8042f746cc7072c0bf67d1fac8c0d4a7e..a4429bcd609ec8a3761e064d200d0646c3cae5f1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -213,6 +213,9 @@ config USE_GENERIC_SMP_HELPERS config GENERIC_SMP_IDLE_THREAD bool +config GENERIC_IDLE_POLL_SETUP + bool + # Select if arch init_task initializer is different to init/init_task.c config ARCH_INIT_TASK bool diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c index 8b86c0c68edf4458e894433a45ac8d5b80f673a0..d5585f5e038ee1080f17777792a6927d5897e5ea 100644 --- a/kernel/cpu/idle.c +++ b/kernel/cpu/idle.c @@ -40,11 +40,13 @@ __setup("hlt", cpu_idle_nopoll_setup); static inline int cpu_idle_poll(void) { + rcu_idle_enter(); trace_cpu_idle_rcuidle(0, smp_processor_id()); local_irq_enable(); while (!need_resched()) cpu_relax(); trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); + rcu_idle_exit(); return 1; }