1. 12 5月, 2016 6 次提交
    • P
      sched/fair: Fix fairness issue on migration · 2f950354
      Peter Zijlstra 提交于
      Pavan reported that in the presence of very light tasks (or cgroups)
      the placement of migrated tasks can cause severe fairness issues.
      
      The problem is that enqueue_entity() places the task before it updates
      time, thereby it can place the task far in the past (remember that
      light tasks will shoot virtual time forward at a high speed, so in
      relation to the pre-existing light task, we can land far in the past).
      
      This is done because update_curr() needs the current task, and we
      might be placing the current task.
      
      The obvious solution is to differentiate between the current and any
      other task; placing the current before we update time, and placing any
      other task after, such that !curr tasks end up at the current moment
      in time, and not in the past.
      
      This commit re-introduces the previously reverted commit:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      ... which is now safe to do, after we've also fixed another
      underlying bug first, in:
      
        sched/fair: Prepare to fix fairness problems on migration
      
      and cleaned up other details in the migration code:
      
        sched/core: Kill sched_class::task_waking
      Reported-by: NPavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2f950354
    • P
      sched/core: Kill sched_class::task_waking to clean up the migration logic · 59efa0ba
      Peter Zijlstra 提交于
      With sched_class::task_waking being called only when we do
      set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
      and eliminate sched_class::task_waking entirely.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      59efa0ba
    • P
      sched/fair: Prepare to fix fairness problems on migration · b5179ac7
      Peter Zijlstra 提交于
      Mike reported that our recent attempt to fix migration problems:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      broke interactivity and the signal starve test. We reverted that
      commit and now let's try it again more carefully, with some other
      underlying problems fixed first.
      
      One problem is that I assumed ENQUEUE_WAKING was only set when we do a
      cross-cpu wakeup (migration), which isn't true. This means we now
      destroy the vruntime history of tasks and wakeup-preemption suffers.
      
      Cure this by making my assumption true, only call
      sched_class::task_waking() when we do a cross-cpu wakeup. This avoids
      the indirect call in the case we do a local wakeup.
      Reported-by: NMike Galbraith <mgalbraith@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Fixes: 3a47d512 ("sched/fair: Fix fairness issue on migration")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b5179ac7
    • P
      sched/fair: Move record_wakee() · c58d25f3
      Peter Zijlstra 提交于
      Since I want to make ->task_woken() conditional on the task getting
      migrated, we cannot use it to call record_wakee().
      
      Move it to select_task_rq_fair(), which gets called in almost all the
      same conditions. The only exception is if the woken task (@p) is
      CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c58d25f3
    • I
      Merge branch 'smp/hotplug' into sched/core, to resolve conflicts · 4eb86765
      Ingo Molnar 提交于
      Conflicts:
      	kernel/sched/core.c
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4eb86765
    • I
      eb60b3e5
  2. 11 5月, 2016 4 次提交
  3. 10 5月, 2016 10 次提交
    • X
      sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Xunlei Pang 提交于
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, check in find_lock_later_rq() after double_lock_balance(), if the
      scheduling class of the deadline task was changed, break and retry.
      
      Apply the same logic to RT tasks.
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      13b5ab02
    • T
      x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n · 8d415ee2
      Thomas Gleixner 提交于
      Josef reported that the uncore driver trips over with CONFIG_SMP=n because
      x86_max_cores is 16 instead of 12.
      
      The reason is, that for SMP=n the extended topology detection is a NOOP and
      the cache leaf is used to determine the number of cores. That's wrong in two
      aspects:
      
      1) The cache leaf enumerates the maximum addressable number of cores in the
         package, which is obviously not correct
      
      2) UP has no business with topology bits at all.
      
      Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n
      Reported-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team <Kernel-team@fb.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
      8d415ee2
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2d0bd953
      Linus Torvalds 提交于
      Pull libnvdimm build fix from Dan Williams:
       "A build fix for the usage of HPAGE_SIZE in the last libnvdimm pull
        request.
      
        I have taken note that the kbuild robot build success test does not
        include results for alpha_allmodconfig.  Thanks to Guenter for the
        report.  It's tagged for -stable since the original fix will land
        there and cause build problems"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure
      2d0bd953
    • A
      perf/core: Change the default paranoia level to 2 · 0161028b
      Andy Lutomirski 提交于
      Allowing unprivileged kernel profiling lets any user dump follow kernel
      control flow and dump kernel registers.  This most likely allows trivial
      kASLR bypassing, and it may allow other mischief as well.  (Off the top
      of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
      could be quite interesting.)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0161028b
    • L
      Merge branch 'akpm' (patches from Andrew) · 5c56b563
      Linus Torvalds 提交于
      Merge fixes from Andrew Morton:
       "2 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        zsmalloc: fix zs_can_compact() integer overflow
        Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
      5c56b563
    • S
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky 提交于
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very-very long time and
      'total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44f43e99
    • R
      Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan"" · 1e92a61c
      Robin Humble 提交于
      This reverts the 4.6-rc1 commit 7e2bc81d ("proc/base: make prompt
      shell start from new line after executing "cat /proc/$pid/wchan")
      because it breaks /proc/$PID/whcan formatting in ps and top.
      
      Revert also because the patch is inconsistent - it adds a newline at the
      end of only the '0' wchan, and does not add a newline when
      /proc/$PID/wchan contains a symbol name.
      
      eg.
      $ ps -eo pid,stat,wchan,comm
      PID STAT WCHAN  COMMAND
      ...
      1189 S    -      dbus-launch
      1190 Ssl  0
      dbus-daemon
      1198 Sl   0
      lightdm
      1299 Ss   ep_pol systemd
      1301 S    -      (sd-pam)
      1304 Ss   wait   sh
      Signed-off-by: NRobin Humble <plaguedbypenguins@gmail.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e92a61c
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · b507146b
      Linus Torvalds 提交于
      Pull crypto fixes from Herbert Xu:
       "This fixes the following issues:
      
         - bug in ahash SG list walking that may lead to crashes
      
         - resource leak in qat
      
         - missing RSA dependency that causes it to fail"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: rsa - select crypto mgr dependency
        crypto: hash - Fix page length clamping in hash walk
        crypto: qat - fix adf_ctl_drv.c:undefined reference to adf_init_pf_wq
        crypto: qat - fix invalid pf2vf_resp_wq logic
      b507146b
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26acc792
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Check klogctl failure correctly, from Colin Ian King.
      
       2) Prevent OOM when under memory pressure in flowcache, from Steffen
          Klassert.
      
       3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu.
      
       4) Memory barrier and multicast handling fixes in bnxt_en, from Michael
          Chan.
      
       5) Endianness bug in mlx5, from Daniel Jurgens.
      
       6) Fix disconnect handling in VSOCK, from Ian Campbell.
      
       7) Fix locking of netdev list walking in get_bridge_ifindices(), from
          Nikolay Aleksandrov.
      
       8) Bridge multicast MLD parser can look at wrong packet offsets, fix
          from Linus Lüssing.
      
       9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru.
      
      10) Fix missing setting of encapsulation before inner handling completes
          in udp_offload code, from Jarno Rajahalme.
      
      11) Missing rollbacks during LAG join and flood configuration failures
          in mlxsw driver, from Ido Schimmel.
      
      12) Fix error code checks in netxen driver, from Dan Carpenter.
      
      13) Fix key size in new macsec driver, from Sabrina Dubroca.
      
      14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
        net/mlx5e: make VXLAN support conditional
        Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"
        macsec: key identifier is 128 bits, not 64
        Documentation/networking: more accurate LCO explanation
        macvtap: segmented packet is consumed
        tools: bpf_jit_disasm: check for klogctl failure
        qede: uninitialized variable in qede_start_xmit()
        netxen: netxen_rom_fast_read() doesn't return -1
        netxen: reversed condition in netxen_nic_set_link_parameters()
        netxen: fix error handling in netxen_get_flash_block()
        mlxsw: spectrum: Add missing rollback in flood configuration
        mlxsw: spectrum: Fix rollback order in LAG join failure
        udp_offload: Set encapsulation before inner completes.
        udp_tunnel: Remove redundant udp_tunnel_gro_complete().
        qede: prevent chip hang when increasing channels
        net: ipv6: tcp reset, icmp need to consider L3 domain
        bridge: fix igmp / mld query parsing
        net: bridge: fix old ioctl unlocked net device walk
        VSOCK: do not disconnect socket when peer has shutdown SEND only
        net/mlx4_en: Fix endianness bug in IPV6 csum calculation
        ...
      26acc792
    • J
      compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16() · 8634de6d
      Josh Poimboeuf 提交于
      gcc support for __builtin_bswap16() was supposedly added for powerpc in
      gcc 4.6, and was then later added for other architectures in gcc 4.8.
      
      However, Stephen Rothwell reported that attempting to use it on powerpc
      in gcc 4.6 fails with:
      
        lib/vsprintf.c:160:2: error: initializer element is not constant
        lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]')
        lib/vsprintf.c:160:2: error: initializer element is not constant
        lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]')
        ...
      
      I'm not entirely sure what those errors mean, but I don't see them on
      gcc 4.8.  So let's consider gcc 4.8 to be the official starting point
      for __builtin_bswap16().
      
      Arnd Bergmann adds:
       "I found the commit in gcc-4.8 that replaced the powerpc-specific
        implementation of __builtin_bswap16 with an architecture-independent
        one.  Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to
        the lhbrx/sthbrx instructions, so it ended up not being a constant,
        though the intent of the patch was mainly to add support for the
        builtin to x86:
      
          https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624
      
        has the patch that went into gcc-4.8 and more information."
      
      Fixes: 7322dd75 ("byteswap: try to avoid __builtin_constant_p gcc bug")
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8634de6d
  4. 09 5月, 2016 11 次提交
  5. 08 5月, 2016 7 次提交
  6. 07 5月, 2016 2 次提交