1. 01 Jun, 2021 2 commits
  2. 26 May, 2021 1 commit
    • F
      sched: Stop PF_NO_SETAFFINITY from being inherited by various init system threads · a8ea6fc9
      Frederic Weisbecker committed
      Commit:
      
        00b89fe0 ("sched: Make the idle task quack like a per-CPU kthread")
      
      ... added PF_KTHREAD | PF_NO_SETAFFINITY to the idle kernel threads.
      
      Unfortunately these properties are inherited by the init/0 children
      through kernel_thread() calls: init/1 and kthreadd. There are several
      side effects to that:
      
      1) kthreadd affinity can not be reset anymore from userspace. Also
         PF_NO_SETAFFINITY propagates to all kthreadd children, including
         the unbound kthreads. Therefore it's not possible anymore to overwrite
         the affinity of any of them. Here is an example of warning reported
         by rcutorture:
      
      		WARNING: CPU: 0 PID: 116 at kernel/rcu/tree_nocb.h:1306 rcu_bind_current_to_nocb+0x31/0x40
      		Call Trace:
      		 rcu_torture_fwd_prog+0x62/0x730
      		 kthread+0x122/0x140
      		 ret_from_fork+0x22/0x30
      
      2) init/1 does an exec() in the end which clears both
         PF_KTHREAD and PF_NO_SETAFFINITY so we are fine once kernel_init()
         escapes to userspace. But until then, no initcall or init code can
         successfully call sched_setaffinity() on init/1.
      
         Also PF_KTHREAD looks legit on init/1 before it calls exec() but
         we had better be careful about unknown side effects it may introduce.
      
      One way to solve the PF_NO_SETAFFINITY issue is to not inherit this flag
      on copy_process() at all. The cases where it matters are:
      
      * fork_idle(): explicitly sets the flag already.
      * fork() syscalls: userspace tasks that shouldn't be concerned by that.
      * create_io_thread(): the callers explicitly attribute the flag to the
                            newly created tasks.
      * kernel_thread():
      	- Fix the issues on init/1 and kthreadd
      	- Fix the issues on kthreadd children.
      	- Usermode helper created by an unbound workqueue. This shouldn't
      	  matter. In the worst case it gives more control to userspace
      	  on setting affinity to these short living tasks although this can
      	  be tuned with inherited unbound workqueues affinity already.
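
A minimal user-space sketch of the approach described above, not the kernel code itself. `struct toy_task` and the `toy_*` helpers are hypothetical stand-ins for `struct task_struct`, `copy_process()`, `fork_idle()` and the affinity check; the flag values are meant to mirror `include/linux/sched.h` but should be treated as illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; assumed to match include/linux/sched.h. */
#define PF_KTHREAD        0x00200000u
#define PF_NO_SETAFFINITY 0x04000000u

/* Hypothetical stand-in for struct task_struct. */
struct toy_task {
	unsigned int flags;
};

/* Toy copy_process(): the child copies the parent's flags, but
 * PF_NO_SETAFFINITY is never inherited. */
static struct toy_task toy_copy_process(const struct toy_task *parent)
{
	assert(parent);
	struct toy_task child = *parent;

	child.flags &= ~PF_NO_SETAFFINITY;
	return child;
}

/* Toy fork_idle(): idle tasks get the flag set explicitly after the copy. */
static struct toy_task toy_fork_idle(const struct toy_task *parent)
{
	struct toy_task idle = toy_copy_process(parent);

	idle.flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
	return idle;
}

/* Toy affinity check: a task carrying PF_NO_SETAFFINITY refuses changes. */
static bool toy_can_set_affinity(const struct toy_task *t)
{
	return !(t->flags & PF_NO_SETAFFINITY);
}
```

In this model a child of init/0 created through `toy_copy_process()` keeps PF_KTHREAD but accepts affinity changes again, while a task created through `toy_fork_idle()` still refuses them.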
      
      Fixes: 00b89fe0 ("sched: Make the idle task quack like a per-CPU kthread")
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/r/20210525235849.441842-1-frederic@kernel.org
  3. 19 May, 2021 4 commits
  4. 18 May, 2021 3 commits
  5. 13 May, 2021 6 commits
    • P
      sched/isolation: Reconcile rcu_nocbs= and nohz_full= · 915a2bc3
      Paul Gortmaker committed
      We have a mismatch between RCU and isolation -- in relation to what is
      considered the maximum valid CPU number.
      
      This matters because nohz_full= and rcu_nocbs= are joined at the hip; in
      fact the former will enforce the latter.  So we don't want a CPU mask to
      be valid for one and denied for the other.
      
      The difference first appeared in v4.15; further details are below.
      
      As it is confusing to anyone who isn't looking at the code regularly, a
      reminder is in order; three values exist here:
      
        CONFIG_NR_CPUS  - compiled in maximum cap on number of CPUs supported.
        nr_cpu_ids      - possible # of CPUs (typically reflects what ACPI says)
        cpus_present    - actual number of present/detected/installed CPUs.
      
      For this example, I'll refer to NR_CPUS=64 from "make defconfig" and
      nr_cpu_ids=6 for ACPI reporting on a board that could run a six core,
      and present=4 for a quad that is physically in the socket.  From dmesg:
      
       smpboot: Allowing 6 CPUs, 2 hotplug CPUs
       setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:6 nr_node_ids:1
       rcu: 	RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=6.
       smp: Brought up 1 node, 4 CPUs
      
      And from userspace, see:
      
         paul@trash:/sys/devices/system/cpu$ cat present
         0-3
         paul@trash:/sys/devices/system/cpu$ cat possible
         0-5
         paul@trash:/sys/devices/system/cpu$ cat kernel_max
         63
      
      Everything is fine if we boot 5x5 for rcu/nohz:
      
        Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-5 rcu_nocbs=2-5 root=/dev/sda1 ro
        NO_HZ: Full dynticks CPUs: 2-5.
        rcu: 	Offload RCU callbacks from CPUs: 2-5.
      
      ..even though there is no CPU 4 or 5.  Both RCU and nohz_full are OK.
      Now we push that beyond 6 but below NR_CPUS, and with 15x15 we get:
      
        Command line: BOOT_IMAGE=/boot/bzImage rcu_nocbs=2-15 nohz_full=2-15 root=/dev/sda1 ro
        rcu: 	Note: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.
        rcu: 	Offload RCU callbacks from CPUs: 2-5.
      
      These are both functionally equivalent, as we are only changing flags on
      phantom CPUs that don't exist, but note the kernel interpretation changes.
      And worse, it only changes for one of the two - which is the problem.
      
      RCU doesn't care if you want to restrict the flags on phantom CPUs but
      clearly nohz_full does after this change from v4.15.
      
       edb93821: ("sched/isolation: Move isolcpus= handling to the housekeeping code")
      
       -       if (cpulist_parse(str, non_housekeeping_mask) < 0) {
       -               pr_warn("Housekeeping: Incorrect nohz_full cpumask\n");
       +       err = cpulist_parse(str, non_housekeeping_mask);
       +       if (err < 0 || cpumask_last(non_housekeeping_mask) >= nr_cpu_ids) {
       +               pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
      
      To be clear, the sanity check on "possible" (nr_cpu_ids) is new here.
      
      The goal was reasonable: not wanting housekeeping to land on a
      not-possible CPU. But note two things:
      
        1) this is an exclusion list, not an inclusion list; we are tracking
           non_housekeeping CPUs; not ones who are explicitly assigned housekeeping
      
        2) we went one further in 9219565a ("sched/isolation: Require a present CPU in housekeeping mask")
           - ensuring that housekeeping was sanity checking against present and not just possible CPUs.
      
      To be clear, this means the check added in v4.15 is doubly redundant.
      And more importantly, overly strict/restrictive.
      
      We care now, because the bitmap boot arg parsing now knows that a value
      of "N" is NR_CPUS; the size of the bitmap, but the bitmap code doesn't
      know anything about the subtleties of our max/possible/present CPU
      specifics as outlined above.
      
      So drop the check added in v4.15 (edb93821) and make RCU and
      nohz_full both in alignment again on NR_CPUS so "N" works for both,
      and then they can fall back to nr_cpu_ids internally just as before.
      
        Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-N rcu_nocbs=2-N root=/dev/sda1 ro
        NO_HZ: Full dynticks CPUs: 2-5.
        rcu: 	Offload RCU callbacks from CPUs: 2-5.
      
      As shown above, with this change, RCU and nohz_full are in sync, even
      with the use of the "N" placeholder.  Same result is achieved with "15".
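
To make the difference concrete, here is a small user-space model of the two behaviours, assuming a 64-bit mask stands in for the cpumask. All `toy_*` names are hypothetical, "N" is assumed to resolve to the last bit of an NR_CPUS-sized bitmap, and the clamp step only approximates the "fall back to nr_cpu_ids internally" behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 64u /* compiled-in cap, as in the defconfig example */

/* Mask with bits first..last set (inclusive); last must be < TOY_NR_CPUS. */
static uint64_t toy_range_mask(unsigned int first, unsigned int last)
{
	assert(first <= last && last < TOY_NR_CPUS);
	uint64_t upto_last = (last == 63) ? UINT64_MAX
					  : ((UINT64_C(1) << (last + 1)) - 1);
	uint64_t below_first = (UINT64_C(1) << first) - 1;

	return upto_last & ~below_first;
}

/* Index of the highest set bit; the mask must be non-empty. */
static unsigned int toy_mask_last(uint64_t mask)
{
	assert(mask != 0);
	unsigned int last = 0;

	while (mask >>= 1)
		last++;
	return last;
}

/* Model of the v4.15 check: reject a mask that reaches nr_cpu_ids or beyond. */
static bool toy_old_check_ok(uint64_t mask, unsigned int nr_cpu_ids)
{
	return toy_mask_last(mask) < nr_cpu_ids;
}

/* Model of the behaviour after the change: accept anything that fits in
 * NR_CPUS bits, then restrict the mask to possible CPUs. */
static uint64_t toy_clamp_to_possible(uint64_t mask, unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids >= 1 && nr_cpu_ids <= TOY_NR_CPUS);
	return mask & toy_range_mask(0, nr_cpu_ids - 1);
}
```

With nr_cpu_ids=6, the masks for "2-15" and "2-N" both fail the old check but clamp to the same 2-5 range, which matches the boot output shown above.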
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/r/20210419042659.1134916-1-paul.gortmaker@windriver.com
    • A
      sched: Make multiple runqueue task counters 32-bit · e6fe3f42
      Alexey Dobriyan committed
      Make:
      
      	struct dl_rq::dl_nr_migratory
      	struct dl_rq::dl_nr_running
      
      	struct rt_rq::rt_nr_boosted
      	struct rt_rq::rt_nr_migratory
      	struct rt_rq::rt_nr_total
      
      	struct rq::nr_uninterruptible
      
      32-bit.
      
      If the total number of tasks can't exceed 2**32 (and is lower still due
      to futex pid limits), then per-runqueue counters can't either.
      
      This patchset has been sponsored by REX Prefix Eradication Society.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20210422200228.1423391-4-adobriyan@gmail.com
    • A
      sched: Make nr_iowait_cpu() return 32-bit value · 8fc2858e
      Alexey Dobriyan committed
      Runqueue ->nr_iowait counters are 32-bit anyway.
      
      Propagate 32-bitness into other code, but don't try too hard.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20210422200228.1423391-3-adobriyan@gmail.com
    • A
      sched: Make nr_iowait() return 32-bit value · 97455168
      Alexey Dobriyan committed
      Creating 2**32 tasks to wait in D-state is impossible and wasteful.
      
      Return "unsigned int" and save on REX prefixes.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20210422200228.1423391-2-adobriyan@gmail.com
    • A
      sched: Make nr_running() return 32-bit value · 01aee8fd
      Alexey Dobriyan committed
      Creating 2**32 tasks is impossible due to futex pid limits and wasteful
      anyway. Nobody has done it.
      
      Bring nr_running() into 32-bit world to save on REX prefixes.
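
As a rough illustration of the idea, assuming a simplified model: the sketch below sums per-runqueue counters into an `unsigned int`, so the arithmetic stays in 32-bit operations (on x86-64 these generally need no REX.W prefix, which is the saving the message refers to). `struct toy_rq` and `toy_nr_running()` are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rq, reduced to one counter. */
struct toy_rq {
	unsigned int nr_running;
};

/* Sum the per-runqueue counters; the total is assumed to fit in 32 bits
 * because the number of tasks in the system is bounded well below 2**32. */
static unsigned int toy_nr_running(const struct toy_rq *rqs, size_t nr_cpus)
{
	assert(rqs != NULL || nr_cpus == 0);
	unsigned int sum = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		sum += rqs[i].nr_running;
	return sum;
}
```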
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20210422200228.1423391-1-adobriyan@gmail.com
    • I
      sched: Fix leftover comment typos · cc00c198
      Ingo Molnar committed
      A few more snuck in. Also capitalize 'CPU' while at it.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 12 May, 2021 24 commits