1. 22 Sep, 2016 3 commits
  2. 20 Sep, 2016 1 commit
  3. 10 Sep, 2016 4 commits
    • Revert "sched/fair: Make update_min_vruntime() more readable" · de58af87
      Committed by Peter Zijlstra
      There's a bug in this commit:
      
         97a7142f ("sched/fair: Make update_min_vruntime() more readable")
      
      ... when !rb_leftmost && curr we fail to advance min_vruntime.
      
      So revert it.
      Reported-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
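The failure case named in the revert above (a running entity but no leftmost entity in the tree) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct rq_model`, its fields, and `update_min_vruntime_model()` are hypothetical stand-ins for the kernel's `struct cfs_rq` and `update_min_vruntime()`, and the logic assumes the restored behaviour is "take the smaller of curr and leftmost, and never move backwards".

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe max/min on 64-bit virtual runtimes. */
static uint64_t max_vruntime(uint64_t max, uint64_t v)
{
	return ((int64_t)(v - max) > 0) ? v : max;
}

static uint64_t min_vruntime(uint64_t min, uint64_t v)
{
	return ((int64_t)(v - min) < 0) ? v : min;
}

/* Hypothetical model of the fields that matter here. */
struct rq_model {
	uint64_t min_vruntime;
	bool has_curr;              /* models cfs_rq->curr != NULL */
	uint64_t curr_vruntime;
	bool has_leftmost;          /* models cfs_rq->rb_leftmost != NULL */
	uint64_t leftmost_vruntime;
};

void update_min_vruntime_model(struct rq_model *rq)
{
	uint64_t vruntime = rq->min_vruntime;

	/* With only a running entity, its vruntime is the candidate. */
	if (rq->has_curr)
		vruntime = rq->curr_vruntime;

	if (rq->has_leftmost) {
		if (!rq->has_curr)
			vruntime = rq->leftmost_vruntime;
		else
			vruntime = min_vruntime(vruntime, rq->leftmost_vruntime);
	}

	/* Never move min_vruntime backwards. */
	rq->min_vruntime = max_vruntime(rq->min_vruntime, vruntime);
}
```

In this model, a queue with `has_curr` set and `has_leftmost` clear still advances `min_vruntime` to the running entity's vruntime, which is the case the commit message says the reverted change missed.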
    • perf/core: Fix aux_mmap_count vs aux_refcount order · b79ccadd
      Committed by Alexander Shishkin
      The order of accesses to ring buffer's aux_mmap_count and aux_refcount
      has to be preserved across the users, namely perf_mmap_close() and
      perf_aux_output_begin(), otherwise the inversion can result in the latter
      holding the last reference to the aux buffer and subsequently free'ing
      it in atomic context, triggering a warning.
      
      > ------------[ cut here ]------------
      > WARNING: CPU: 0 PID: 257 at kernel/events/ring_buffer.c:541 __rb_free_aux+0x11a/0x130
      > CPU: 0 PID: 257 Comm: stopbug Not tainted 4.8.0-rc1+ #2596
      > Call Trace:
      >  [<ffffffff810f3e0b>] __warn+0xcb/0xf0
      >  [<ffffffff810f3f3d>] warn_slowpath_null+0x1d/0x20
      >  [<ffffffff8121182a>] __rb_free_aux+0x11a/0x130
      >  [<ffffffff812127a8>] rb_free_aux+0x18/0x20
      >  [<ffffffff81212913>] perf_aux_output_begin+0x163/0x1e0
      >  [<ffffffff8100c33a>] bts_event_start+0x3a/0xd0
      >  [<ffffffff8100c42d>] bts_event_add+0x5d/0x80
      >  [<ffffffff81203646>] event_sched_in.isra.104+0xf6/0x2f0
      >  [<ffffffff8120652e>] group_sched_in+0x6e/0x190
      >  [<ffffffff8120694e>] ctx_sched_in+0x2fe/0x5f0
      >  [<ffffffff81206ca0>] perf_event_sched_in+0x60/0x80
      >  [<ffffffff81206d1b>] ctx_resched+0x5b/0x90
      >  [<ffffffff81207281>] __perf_event_enable+0x1e1/0x240
      >  [<ffffffff81200639>] event_function+0xa9/0x180
      >  [<ffffffff81202000>] ? perf_cgroup_attach+0x70/0x70
      >  [<ffffffff8120203f>] remote_function+0x3f/0x50
      >  [<ffffffff811971f3>] flush_smp_call_function_queue+0x83/0x150
      >  [<ffffffff81197bd3>] generic_smp_call_function_single_interrupt+0x13/0x60
      >  [<ffffffff810a6477>] smp_call_function_single_interrupt+0x27/0x40
      >  [<ffffffff81a26ea9>] call_function_single_interrupt+0x89/0x90
      >  [<ffffffff81120056>] finish_task_switch+0xa6/0x210
      >  [<ffffffff81120017>] ? finish_task_switch+0x67/0x210
      >  [<ffffffff81a1e83d>] __schedule+0x3dd/0xb50
      >  [<ffffffff81a1efe5>] schedule+0x35/0x80
      >  [<ffffffff81128031>] sys_sched_yield+0x61/0x70
      >  [<ffffffff81a25be5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      > ---[ end trace 6235f556f5ea83a9 ]---
      
      This patch puts the checks in perf_aux_output_begin() in the same order
      as that of perf_mmap_close().
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-3-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix a race between mmap_close() and set_output() of AUX events · 767ae086
      Committed by Alexander Shishkin
      In the mmap_close() path we need to stop all the AUX events that are
      writing data to the AUX area that we are unmapping, before we can
      safely free the pages. To determine if an event needs to be stopped,
      we're comparing its ->rb against the one that's getting unmapped.
      However, a SET_OUTPUT ioctl may turn up inside an AUX transaction
      and swizzle event::rb to some other ring buffer, but the transaction
      will keep writing data to the old ring buffer until the event gets
      scheduled out. At this point, mmap_close() will skip over such an
      event and will proceed to free the AUX area, while it's still being
      used by this event, which will set off a warning in the mmap_close()
      path and cause memory corruption.
      
      To avoid this, always stop an AUX event before its ->rb is updated;
      this will release the (potentially) last reference on the AUX area
      of the buffer. If the event gets restarted, its new ring buffer will
      be used. If another SET_OUTPUT comes and switches it back to the
      old ring buffer that's getting unmapped, it's also fine: this
      ring buffer's aux_mmap_count will be zero and AUX transactions won't
      start any more.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm: fix cache mode of dax pmd mappings · 9049771f
      Committed by Dan Williams
      track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
      uncacheable, rendering them impractical for application usage.  DAX-pte
      mappings are cached and the goal of establishing DAX-pmd mappings is to
      attain more performance, not dramatically less (3 orders of magnitude).
      
      track_pfn_insert() relies on a previous call to reserve_memtype() to
      establish the expected page_cache_mode for the range.  While memremap()
      arranges for reserve_memtype() to be called, devm_memremap_pages() does
      not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
      tracking without a vma, and arrange for devm_memremap_pages() to
      establish the write-back-cache reservation in the memtype tree.
      
      Cc: <stable@vger.kernel.org>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Toshi Kani <toshi.kani@hpe.com>
      Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  4. 05 Sep, 2016 14 commits
  5. 02 Sep, 2016 6 commits
    • tick/nohz: Fix softlockup on scheduler stalls in kvm guest · 08d07259
      Committed by Wanpeng Li
      Since commit 1f3b0f82 ("tick/nohz: Optimize nohz idle enter"),
      tick_nohz_start_idle() is no longer called if the idle tick can't be
      stopped. As a result, after a suspend/resume of the host machine, a full
      dynticks kvm guest will softlockup:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
       Call Trace:
        default_idle+0x31/0x1a0
        arch_cpu_idle+0xf/0x20
        default_idle_call+0x2a/0x50
        cpu_startup_entry+0x39b/0x4d0
        rest_init+0x138/0x140
        ? rest_init+0x5/0x140
        start_kernel+0x4c1/0x4ce
        ? set_init_arg+0x55/0x55
        ? early_idt_handler_array+0x120/0x120
        x86_64_start_reservations+0x24/0x26
        x86_64_start_kernel+0x142/0x14f
      
      In addition, cat /proc/stat | grep cpu in guest or host:
      
      cpu  398 16 5049 15754 5490 0 1 46 0 0
      cpu0 206 5 450 0 0 0 1 14 0 0
      cpu1 81 0 3937 3149 1514 0 0 9 0 0
      cpu2 45 6 332 6052 2243 0 0 11 0 0
      cpu3 65 2 328 6552 1732 0 0 11 0 0
      
      The idle and iowait values are unexpectedly 0 for cpu0 (housekeeping).
      
      The bug is present in both guest and host kernels, and both show the
      cpu0 idle and iowait accounting issue. However, the host kernel's
      suspend/resume path (among others) touches the watchdog, which avoids
      the softlockup there.
      
      - The watchdog will not be touched in the tick_nohz_stop_idle path (it
        needs to be touched, since the scheduler stall is expected) if the
        idle_active flag is not detected.
      - The idle and iowait states will not be accounted when exiting the idle
        loop (resched or interrupt) if the idle start time and idle_active
        flag are not set.
      
      This patch fixes it by reverting commit 1f3b0f82, since being unable to
      stop the idle tick doesn't mean the CPU can't be idle.
      
      Fixes: 1f3b0f82 ("tick/nohz: Optimize nohz idle enter")
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
      Cc: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
      Cc: stable@vger.kernel.org
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd · 735f2770
      Committed by Michal Hocko
      Commit fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
      exit") has caused a subtle regression in nscd which uses
      CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
      shared databases, so that the clients are notified when nscd is
      restarted.  Now, when nscd uses a non-persistent database, clients that
      have it mapped keep thinking the database is being updated by nscd, when
      in fact nscd has created a new (anonymous) one (for non-persistent
      databases it uses an unlinked file as backend).
      
      The original proposal for the CLONE_CHILD_CLEARTID change claimed
      (https://lkml.org/lkml/2006/10/25/233):
      
      : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
      : on behalf of pthread_create() library calls.  This feature is used to
      : request that the kernel clear the thread-id in user space (at an address
      : provided in the syscall) when the thread disassociates itself from the
      : address space, which is done in mm_release().
      :
      : Unfortunately, when a multi-threaded process incurs a core dump (such as
      : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
      : the other threads, which then proceed to clear their user-space tids
      : before synchronizing in exit_mm() with the start of core dumping.  This
      : misrepresents the state of process's address space at the time of the
      : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
      : problems (misleading him/her to conclude that the threads had gone away
      : before the fault).
      :
      : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
      : core dump has been initiated.
      
      The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
      seems to have a larger scope than the original patch asked for.  It
      seems that limiting the scope of the check to core dumping should work
      for the SIGSEGV issue described above.
      
      [Changelog partly based on Andreas' description]
      Fixes: fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
      Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: William Preston <wpreston@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Andreas Schwab <schwab@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, mempolicy: task->mempolicy must be NULL before dropping final reference · c11600e4
      Committed by David Rientjes
      KASAN allocates memory from the page allocator as part of
      kmem_cache_free(), and that can reference current->mempolicy through any
      number of allocation functions.  It needs to be NULL'd out before the
      final reference is dropped to prevent a use-after-free bug:
      
      	BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
      	CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
      	...
      	Call Trace:
      		dump_stack
      		kasan_object_err
      		kasan_report_error
      		__asan_report_load2_noabort
      		alloc_pages_current	<-- use after free
      		depot_save_stack
      		save_stack
      		kasan_slab_free
      		kmem_cache_free
      		__mpol_put		<-- free
      		do_exit
      
      This patch sets current->mempolicy to NULL before dropping the final
      reference.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
      Fixes: cd11016e ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
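The ordering described in the mempolicy fix above (clear the task's pointer first, then drop the final reference) can be sketched in user space. Everything here is a hypothetical model: `struct policy_model`, `struct task_model`, `alloc_hook()` and `exit_task_mempolicy()` are illustrative names, and `alloc_hook()` merely stands in for an allocation that inspects the current task's policy while the policy object is being freed.

```c
#include <stddef.h>
#include <stdlib.h>

struct policy_model { int refcnt; };
struct task_model { struct policy_model *mempolicy; };

static struct task_model current_task;
static int hook_saw_stale_pointer;

/*
 * Models an allocation made from inside the free path: it looks at
 * current_task.mempolicy while the policy is being torn down.
 */
static void alloc_hook(const struct policy_model *being_freed)
{
	if (current_task.mempolicy == being_freed)
		hook_saw_stale_pointer = 1;
}

/* Drop one reference; free the object when the last one goes away. */
static void policy_put(struct policy_model *pol)
{
	if (pol && --pol->refcnt == 0) {
		alloc_hook(pol);
		free(pol);
	}
}

void exit_task_mempolicy(void)
{
	struct policy_model *pol = current_task.mempolicy;

	current_task.mempolicy = NULL;  /* unpublish the pointer first */
	policy_put(pol);                /* then drop the final reference */
}
```

Because the pointer is cleared before `policy_put()` runs, nothing reached from the free path can observe the object that is about to be released.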
    • printk/nmi: avoid direct printk()-s from __printk_nmi_flush() · 19feeff1
      Committed by Sergey Senozhatsky
      __printk_nmi_flush() can be called from nmi_panic(), therefore it has to
      test whether it's executed in NMI context and thus must route the
      messages through deferred printk() or via direct printk().
      
      This is to avoid potential deadlocks, as described in commit
      cf9b1106 ("printk/nmi: flush NMI messages on the system panic").
      
      However there remain two places where __printk_nmi_flush() does
      unconditional direct printk() calls:
      
       - pr_err("printk_nmi_flush: internal error ...")
       - pr_cont("\n")
      
      Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
      function, which takes care of in_nmi(), and use it in
      __printk_nmi_flush() for printing and error-reporting.
      
      Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kconfig: tinyconfig: provide whole choice blocks to avoid warnings · 236dec05
      Committed by Arnd Bergmann
      Using "make tinyconfig" produces a couple of annoying warnings that show
      up for build test machines all the time:
      
          .config:966:warning: override: NOHIGHMEM changes choice state
          .config:965:warning: override: SLOB changes choice state
          .config:963:warning: override: KERNEL_XZ changes choice state
          .config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
          .config:933:warning: override: SLOB changes choice state
          .config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
          .config:870:warning: override: SLOB changes choice state
          .config:868:warning: override: KERNEL_XZ changes choice state
          .config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
      
      I've made a previous attempt at fixing them and we discussed a number of
      alternatives.
      
      I tried changing the Makefile to use "merge_config.sh -n
      $(fragment-list)" but couldn't get that to work properly.
      
      This is yet another approach, based on the observation that we do want
      to see a warning for conflicting 'choice' options, and that we can
      simply make them non-conflicting by listing all other options as
      disabled.  This is a trivial patch that we can apply independent of
      plans for other changes.
      
      Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de
      Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log
      https://patchwork.kernel.org/patch/9212749/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: fix double-free when failing to relocate the purgatory · 070c43ee
      Committed by Thiago Jung Bauermann
      If kexec_apply_relocations fails, kexec_load_purgatory frees pi->sechdrs
      and pi->purgatory_buf.  This is redundant, because in case of error
      kimage_file_prepare_segments calls kimage_file_post_load_cleanup, which
      will also free those buffers.
      
      This causes two warnings like the following, one for pi->sechdrs and the
      other for pi->purgatory_buf:
      
        kexec-bzImage64: Loading purgatory failed
        ------------[ cut here ]------------
        WARNING: CPU: 1 PID: 2119 at mm/vmalloc.c:1490 __vunmap+0xc1/0xd0
        Trying to vfree() nonexistent vm area (ffffc90000e91000)
        Modules linked in:
        CPU: 1 PID: 2119 Comm: kexec Not tainted 4.8.0-rc3+ #5
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x4d/0x65
          __warn+0xcb/0xf0
          warn_slowpath_fmt+0x4f/0x60
          ? find_vmap_area+0x19/0x70
          ? kimage_file_post_load_cleanup+0x47/0xb0
          __vunmap+0xc1/0xd0
          vfree+0x2e/0x70
          kimage_file_post_load_cleanup+0x5e/0xb0
          SyS_kexec_file_load+0x448/0x680
          ? putname+0x54/0x60
          ? do_sys_open+0x190/0x1f0
          entry_SYSCALL_64_fastpath+0x13/0x8f
        ---[ end trace 158bb74f5950ca2b ]---
      
      Fix by setting pi->sechdrs and pi->purgatory_buf to NULL, since vfree
      won't try to free a NULL pointer.
      
      Link: http://lkml.kernel.org/r/1472083546-23683-1-git-send-email-bauerman@linux.vnet.ibm.com
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
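The kexec fix above relies on a common idiom: after freeing a buffer on an error path, reset the pointer to NULL so that a later common cleanup becomes a no-op. A minimal user-space sketch follows, assuming `free()` as a stand-in for `vfree()` (both accept NULL); `struct purgatory_model`, `load_failed()` and `post_load_cleanup()` are hypothetical names, not the kernel functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two buffers named in the commit message. */
struct purgatory_model {
	void *sechdrs;
	void *purgatory_buf;
};

/* Error path of the loader: free the buffers and forget them. */
void load_failed(struct purgatory_model *pi)
{
	free(pi->sechdrs);
	pi->sechdrs = NULL;

	free(pi->purgatory_buf);
	pi->purgatory_buf = NULL;
}

/* Common cleanup that the caller runs after any failure. */
void post_load_cleanup(struct purgatory_model *pi)
{
	free(pi->sechdrs);          /* free(NULL) is a no-op */
	pi->sechdrs = NULL;

	free(pi->purgatory_buf);
	pi->purgatory_buf = NULL;
}
```

Calling `load_failed()` followed by `post_load_cleanup()` frees each buffer exactly once; without the NULL assignments in `load_failed()`, the second call would free the same addresses again.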
  6. 01 Sep, 2016 2 commits
  7. 31 Aug, 2016 1 commit
  8. 27 Aug, 2016 2 commits
  9. 24 Aug, 2016 3 commits
    • perf/core: Use this_cpu_ptr() when stopping AUX events · 8b6a3fe8
      Committed by Will Deacon
      When tearing down an AUX buf for an event via perf_mmap_close(),
      __perf_event_output_stop() is called on the event's CPU to ensure that
      trace generation is halted before the process of unmapping and
      freeing the buffer pages begins.
      
      The callback is performed via cpu_function_call(), which ensures that it
      runs with interrupts disabled and is therefore not preemptible.
      Unfortunately, the current code grabs the per-cpu context pointer using
      get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair
      the call with put_cpu_ptr(), leading to a preempt_count() imbalance and
      a BUG when freeing the AUX buffer later on:
      
        WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120
        Modules linked in:
        [...]
        Call Trace:
         [<ffffffff813379dd>] dump_stack+0x4f/0x72
         [<ffffffff81059ff6>] __warn+0xc6/0xe0
         [<ffffffff8105a0c8>] warn_slowpath_null+0x18/0x20
         [<ffffffff8112761c>] __rb_free_aux+0x10c/0x120
         [<ffffffff81128163>] rb_free_aux+0x13/0x20
         [<ffffffff8112515e>] perf_mmap_close+0x29e/0x2f0
         [<ffffffff8111da30>] ? perf_iterate_ctx+0xe0/0xe0
         [<ffffffff8115f685>] remove_vma+0x25/0x60
         [<ffffffff81161796>] exit_mmap+0x106/0x140
         [<ffffffff8105725c>] mmput+0x1c/0xd0
         [<ffffffff8105cac3>] do_exit+0x253/0xbf0
         [<ffffffff8105e32e>] do_group_exit+0x3e/0xb0
         [<ffffffff81068d49>] get_signal+0x249/0x640
         [<ffffffff8101c273>] do_signal+0x23/0x640
         [<ffffffff81905f42>] ? _raw_write_unlock_irq+0x12/0x30
         [<ffffffff81905f69>] ? _raw_spin_unlock_irq+0x9/0x10
         [<ffffffff81901896>] ? __schedule+0x2c6/0x710
         [<ffffffff810022a4>] exit_to_usermode_loop+0x74/0x90
         [<ffffffff81002a56>] prepare_exit_to_usermode+0x26/0x30
         [<ffffffff81906d1b>] retint_user+0x8/0x10
      
      This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is
      already disabled by the caller.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 95ff4ca2 ("perf/core: Free AUX pages in unmap path")
      Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
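The preempt_count() imbalance described above can be modelled with a plain counter. This is a toy sketch, not kernel code: the `*_model` functions are hypothetical stand-ins that only mimic the counting side effect of `get_cpu_ptr()` / `put_cpu_ptr()` and the absence of one in `this_cpu_ptr()`.

```c
static int preempt_count_model;   /* models preempt_count() */
static int per_cpu_ctx;           /* models the per-CPU context */

/* Like get_cpu_ptr(): disables preemption, so it bumps the count. */
static int *get_cpu_ptr_model(void)
{
	preempt_count_model++;
	return &per_cpu_ctx;
}

/* Like put_cpu_ptr(): re-enables preemption. */
static void put_cpu_ptr_model(void)
{
	preempt_count_model--;
}

/* Like this_cpu_ptr(): no effect on the preemption count. */
static int *this_cpu_ptr_model(void)
{
	return &per_cpu_ctx;
}

/* Unpaired get: leaves the count one too high. */
void stop_unpaired(void)
{
	int *ctx = get_cpu_ptr_model();
	(void)ctx;
}

/* Correctly paired get/put: balanced, but redundant if the caller
 * already runs with preemption disabled. */
void stop_paired(void)
{
	int *ctx = get_cpu_ptr_model();
	(void)ctx;
	put_cpu_ptr_model();
}

/* The approach the commit message describes: no count change at all. */
void stop_this_cpu(void)
{
	int *ctx = this_cpu_ptr_model();
	(void)ctx;
}
```

Only `stop_unpaired()` changes the counter across a call, which corresponds to the imbalance that later trips the warning when the AUX buffer is freed.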
    • timekeeping: Cap array access in timekeeping_debug · a4f8f666
      Committed by John Stultz
      It was reported that hibernation could fail on the 2nd attempt, where the
      system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
      claim_dma_lock(), because the lock has already been taken.
      
      However, no other process is actually trying to grab this lock on that
      problematic platform.
      
      Further investigation showed that the problem is triggered by setting
      /sys/power/pm_trace to 1 before the 1st hibernation.
      
      Once pm_trace is enabled, the RTC becomes meaningless after suspend.
      Meanwhile, some BIOSes adjust an 'invalid' RTC (e.g., earlier than 1970)
      to the release date of the motherboard during the POST stage. After
      resume, it may therefore seem that the system had a very long sleep
      time, which is a completely meaningless value.
      
      Then in timekeeping_resume -> tk_debug_account_sleep_time, if bit 31 of
      the sleep time happens to be set, fls() returns 32 and we add 1 to
      sleep_time_bin[32], which causes an out-of-bounds array access and
      therefore memory being overwritten.
      
      As depicted by System.map:
      0xffffffff81c9d080 b sleep_time_bin
      0xffffffff81c9d100 B dma_spin_lock
      the dma_spin_lock.val is set to 1, which caused this problem.
      
      This patch adds a sanity check in tk_debug_account_sleep_time()
      to ensure we don't index past the sleep_time_bin array.
      
      [jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the
       issue slightly differently, but borrowed his excellent explanation of the
       issue here.]
      
      Fixes: 5c83545f "power: Add option to log time spent in suspend"
      Reported-by: Janek Kozicki <cosurgi@gmail.com>
      Reported-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: linux-pm@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xunlei Pang <xpang@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: stable <stable@vger.kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
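The out-of-bounds index described above can be reproduced and capped in a few lines. This is a sketch under assumptions: the array size of 32 is inferred from the commit text (index 32 is out of bounds, and the System.map symbols are 0x80 bytes apart), `fls_model()` is a portable reimplementation of the `fls()` semantics, and `account_sleep_time()` is a hypothetical name rather than the kernel function.

```c
#include <stdint.h>

#define NUM_BINS 32

static unsigned int sleep_time_bin[NUM_BINS];

/* Portable fls(): 1-based position of the highest set bit, 0 for x == 0. */
static int fls_model(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Account one sleep of the given length; returns the bin that was used. */
int account_sleep_time(uint32_t sleep_secs)
{
	int bin = fls_model(sleep_secs);

	/* Sanity check: fls() can return 32, but valid indexes end at 31. */
	if (bin > NUM_BINS - 1)
		bin = NUM_BINS - 1;

	sleep_time_bin[bin]++;
	return bin;
}
```

With the cap in place, a bogus sleep time with bit 31 set lands in the last bin instead of writing past the end of the array into whatever symbol follows it.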
    • timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING · 27727df2
      Committed by John Stultz
      When I added some extra sanity checking in timekeeping_get_ns() under
      CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
      method was using timekeeping_get_ns().
      
      Thus the locking added to the debug checks broke the NMI-safety of
      __ktime_get_fast_ns().
      
      This patch open-codes the timekeeping_get_ns() logic for
      __ktime_get_fast_ns(), so we can avoid any deadlocks in NMI.
      
      Fixes: 4ca22c26 "timekeeping: Add warnings when overflows or underflows are observed"
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: stable <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 22 Aug, 2016 2 commits
  11. 18 Aug, 2016 2 commits
    • sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
      Committed by Stanislaw Gruszka
      Commit:
      
        d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
      
      started accounting thread group tasks pending runtime in thread_group_cputime().
      
      Another commit:
      
        6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
      
      updated scheduler runtime statistics (call update_curr()) when reading task pending
      runtime. Those changes cause bad performance of the SYS_times() and
      SYS_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
      larger systems with many CPUs.
      
      While we would like to have cpuclock monotonicity kept i.e. have
      problems fixed by above commits stay fixed, we also would like to have
      good performance.
      
      However, the change from commit d670ec13 is no longer needed to solve
      the problem addressed by that commit, because of the change from the
      second commit 6e998916, which gives us room for optimization. Since we
      update the task while reading its pending runtime in
      task_sched_runtime(), clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will
      see updated values, and on the testcase from d670ec13 the process
      cpuclock will not be smaller than the thread cpuclock.
      
      I tested the patch on testcases from commits d670ec13,
      6e998916 and some other cpuclock/cputimers testcases and
      did not find cpuclock monotonicity problems or other malfunctions.
      
      This patch has the drawback that we will not provide thread group cputime
      up-to-date to the last moment. For example when arming cputime timer,
      we will arm it with possibly a bit outdated values and that timer will
      trigger earlier compared to behaviour without the patch. However that
      was the behaviour before d670ec13 commit (kernel v3.1) so it's
      unlikely to affect applications.
      
      Patch improves related syscall performance, as measured by Giovanni's
      benchmarks described in commit:
      
        6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
      
      The benchmark results are:
      
      SYS_clock_gettime():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
        5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
        8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
        12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
        21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
        30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
        48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
        79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
        110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
        128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
      
      SYS_times():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
        5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
        8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
        12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
        21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
        30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
        48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
        79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
        110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
        128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
      Reported-and-tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Let asymmetric CPU configurations balance at wake-up · 3273163c
      Committed by Morten Rasmussen
      Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
      SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
      configurations SD_WAKE_AFFINE is only desirable if the waking task's
      compute demand (utilization) is suitable for the waking CPU and the
      previous CPU, and all CPUs within their respective
      SD_SHARE_PKG_RESOURCES domains (sd_llc). If not, let wakeup balancing
      take over (find_idlest_{group, cpu}()).
      
      This patch makes affine wake-ups conditional on whether both the waker
      CPU and the previous CPU have sufficient capacity for the waking task,
      assuming that the CPU capacities within an SD_SHARE_PKG_RESOURCES
      domain (sd_llc) are homogeneous.
      Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1469453670-2660-10-git-send-email-morten.rasmussen@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
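The capacity condition described above (both the waking CPU and the previous CPU must have sufficient capacity for the task before an affine wake-up is allowed) might be expressed roughly as below. This is a sketch with loudly assumed details: the function name, the 1024 capacity scale, and the 1280 margin (about 25% headroom) are illustrative choices, not values stated in the commit message.

```c
#include <stdbool.h>

#define CAPACITY_SCALE  1024L  /* assumed fixed-point unit for capacity */
#define CAPACITY_MARGIN 1280L  /* assumed headroom factor, ~25% over 1024 */

/*
 * Returns true if an affine wake-up is acceptable from a capacity point
 * of view, i.e. the smaller of the two candidate CPUs still fits the
 * task's utilization with some headroom.
 */
bool wake_affine_capacity_ok(long waker_cap, long prev_cap, long task_util)
{
	long min_cap = waker_cap < prev_cap ? waker_cap : prev_cap;

	return min_cap * CAPACITY_SCALE >= task_util * CAPACITY_MARGIN;
}
```

When this check fails, the commit message's intent is that wakeup balancing (find_idlest_{group, cpu}()) takes over instead of the affine path; how the real kernel wires that decision in is not shown here.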