1. 14 September 2016, 26 commits
  2. 10 September 2016, 7 commits
    • perf/x86/intel: Fix PEBSv3 record drain · 8ef9b845
      Committed by Peter Zijlstra
      Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
      the perf fuzzer.
      
      This means the PEBSv3 record included a status bit for an inactive
      event, something that _should_ not happen.
      
      Move the code that filters the status bits against our known PEBS
      events up a spot to guarantee we only deal with events we know about.
      
      Further add "continue" statements to the WARN_ON_ONCE()s such that
      we'll not die nor generate silly events in case we ever do hit them
      again.
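
      A rough sketch of the fixed flow (names approximated from the PEBS
      drain path in arch/x86/events/intel/ds.c, not the verbatim code):

      	/* filter against known PEBS events *before* resolving them */
      	pebs_status = p->status & cpuc->pebs_enabled;

      	for_each_set_bit(bit, (unsigned long *)&pebs_status, MAX_PEBS_EVENTS) {
      		struct perf_event *event = cpuc->events[bit];

      		/* don't die or emit a bogus sample if it happens again */
      		if (WARN_ON_ONCE(!event))
      			continue;
      		if (WARN_ON_ONCE(!event->attr.precise_ip))
      			continue;

      		/* ... process the record for 'event' ... */
      	}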
      Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: stable@vger.kernel.org
      Fixes: a3d86542 ("perf/x86/intel/pebs: Add PEBSv3 decoding")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/bts: Kill a silly warning · ef9ef3be
      Committed by Alexander Shishkin
      At the moment, intel_bts will WARN() out if there is more than one
      event writing to the same ring buffer, via SET_OUTPUT, and will only
      send data from one event to a buffer.
      
      There is no reason to have this warning in, so kill it.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/bts: Fix BTS PMI detection · 4d4c4741
      Committed by Alexander Shishkin
      Since BTS doesn't have a dedicated PMI status bit, the driver needs to
      take extra care to check for the condition that triggers it to avoid
      spurious NMI warnings.
      
      Regardless of the local BTS context state, the only way of knowing that
      the NMI is ours is to compare the write pointer against the interrupt
      threshold.
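
      A hedged sketch of the check (the debug_store fields exist in the
      kernel; the surrounding logic is simplified):

      	struct debug_store *ds = this_cpu_ptr(&cpu_hw_events)->ds;

      	/* the NMI is ours only if the write pointer crossed the threshold */
      	if (ds->bts_index >= ds->bts_interrupt_threshold)
      		handled++;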
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/bts: Fix confused ordering of PMU callbacks · a9a94401
      Committed by Alexander Shishkin
      The intel_bts driver is using a CPU-local 'started' variable to order
      callbacks and PMIs and make sure that AUX transactions don't get messed
      up. However, the ordering rules with regard to this variable are a complete
      mess, which recently resulted in perf_fuzzer-triggered warnings and
      panics.
      
      The general ordering rule that this patch enforces is that this
      cpu-local variable only be set when the cpu-local AUX transaction is
      active; consequently, this variable is to be checked before the
      AUX-related bits can be touched.
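
      A simplified sketch of the enforced rule ('bts' is the per-cpu context;
      helper names are approximations, not the verbatim driver code):

      	static void bts_event_start(struct perf_event *event, int flags)
      	{
      		__bts_event_start(event);	/* AUX transaction goes active */
      		WRITE_ONCE(bts->started, 1);	/* only now advertise it */
      	}

      	static void bts_event_stop(struct perf_event *event, int flags)
      	{
      		WRITE_ONCE(bts->started, 0);	/* hide it first ... */
      		__bts_event_stop(event);	/* ... then tear it down */
      	}

      	void intel_bts_interrupt(void)
      	{
      		if (!READ_ONCE(bts->started))
      			return;		/* check before touching AUX bits */
      		/* ... drain the buffer ... */
      	}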
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm: fix cache mode of dax pmd mappings · 9049771f
      Committed by Dan Williams
      track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
      uncacheable rendering them impractical for application usage.  DAX-pte
      mappings are cached and the goal of establishing DAX-pmd mappings is to
      attain more performance, not dramatically less (3 orders of magnitude).
      
      track_pfn_insert() relies on a previous call to reserve_memtype() to
      establish the expected page_cache_mode for the range.  While memremap()
      arranges for reserve_memtype() to be called, devm_memremap_pages() does
      not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
      tracking without a vma, and arrange for devm_memremap_pages() to
      establish the write-back-cache reservation in the memtype tree.
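
      A sketch of the vma-less path in track_pfn_insert() (the pat helpers
      are real; the body is an approximation of the fix):

      	void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
      			      pfn_t pfn)
      	{
      		enum page_cache_mode pcm;

      		if (!pat_enabled())
      			return;

      		/* no vma required: consult the memtype tree directly */
      		pcm = lookup_memtype(pfn_t_to_phys(pfn));
      		*prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
      				 cachemode2protval(pcm));
      	}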
      
      Cc: <stable@vger.kernel.org>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Toshi Kani <toshi.kani@hpe.com>
      Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • alpha: fix copy_from_user() · 2561d309
      Committed by Al Viro
      It should clear the destination even when access_ok() fails.
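
      A sketch of the required semantics (the real fix is in alpha's uaccess
      code; names here are approximations):

      	long copy_from_user(void *to, const void __user *from, long n)
      	{
      		if (likely(__access_ok((unsigned long)from, n)))
      			n = __copy_user(to, from, n);	/* returns bytes not copied */
      		else
      			memset(to, 0, n);	/* failed access_ok(): zero it all */
      		return n;
      	}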
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • perf/x86/amd/uncore: Prevent use after free · 7d762e49
      Committed by Sebastian Andrzej Siewior
      The recent conversion of the cpu hotplug support in the uncore driver
      introduced a regression due to the way the callbacks are invoked at
      initialization time.
      
      The old code called the prepare/starting/online function on each online cpu
      as a block. The new code registers the hotplug callbacks in the core for
      each state. The core invokes the callbacks at each registration on all
      online cpus.
      
      The code implicitly relied on the prepare/starting/online callbacks being
      called as a combo on a particular cpu, which was not obvious and completely
      undocumented.
      
      The resulting subtle wreckage happens due to the way how the uncore code
      manages shared data structures for cpus which share an uncore resource in
      hardware. The sharing is determined in the cpu starting callback, but the
      prepare callback allocates per cpu data for the upcoming cpu because
      potential sharing is unknown at this point. If the starting callback finds
      an online cpu which shares the hardware resource it takes a refcount on the
      percpu data of that cpu and puts its own data structure into a
      'free_at_online' pointer of that shared data structure. The online callback
      frees that.
      
      With the old model this worked because in a starting callback only one
      unused structure (the one of the starting cpu) was available. The new code
      allocates the data structures for all cpus when the prepare callback is
      registered.
      
      Now the starting function iterates through all online cpus and looks for a
      data structure (skipping its own) which has a matching hardware id. The id
      member of the data structure is initialized to 0, but the hardware id can
      be 0 as well. The resulting wreckage is:
      
        CPU0 finds a matching id on CPU1, takes a refcount on CPU1's data and puts
        its own data structure into CPU1's data structure to be freed.
      
        CPU1 skips CPU0 because the data structure is allegedly its own unused one.
        It finds a matching id on CPU2, takes a refcount on CPU2's data and puts
        its own data structure into CPU2's data structure to be freed.
      
        ....
      
      Now the online callbacks are invoked.
      
        CPU0 has a pointer to CPU1's data and frees the original CPU0 data. So
        far so good.
      
        CPU1 has a pointer to CPU2's data and frees the original CPU1 data, which
        is still referenced by CPU0 ---> Booom
      
      So there are two issues to be solved here:
      
      1) The id field must be initialized at allocation time to a value which
         cannot be a valid hardware id, i.e. -1
      
         This prevents the above scenario, but now CPU1 and CPU2 both stick their
         own data structure into the free_at_online pointer of CPU0. So we leak
         CPU1s data structure.
      
      2) Fix the memory leak described in #1
      
         Instead of having a single pointer, use a hlist to enqueue the
         superfluous data structures which are then freed by the first cpu
         invoking the online callback (see the sketch below).
      
      Ideally we should know the sharing _before_ invoking the prepare callback,
      but that's way beyond the scope of this bug fix.
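
      A condensed sketch of both fixes (field and helper names approximate
      arch/x86/events/amd/uncore.c):

      	/* 1) allocate with an id no hardware can have */
      	uncore->id = -1;

      	/* 2) starting callback: enqueue the superfluous structure on a
      	 *    hlist instead of overwriting a single pointer */
      	hlist_add_head(&this->node, &that->free_when_cpu_online);

      	/* the first cpu reaching the online callback drains the list */
      	hlist_for_each_entry_safe(cur, tmp, &uncore->free_when_cpu_online, node)
      		kfree(cur);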
      
      [ tglx: Rewrote changelog ]
      
      Fixes: 96b2bd38 ("perf/x86/amd/uncore: Convert to hotplug state machine")
      Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 09 September 2016, 3 commits
    • arm64: use preempt_disable_notrace in _percpu_read/write · 2b974344
      Committed by Chunyan Zhang
      When debug preempt or the preempt tracer is enabled, preempt_count_add/sub()
      can be traced by function and function graph tracing, and
      preempt_disable/enable() would call preempt_count_add/sub(), so in the
      Ftrace subsystem we should use preempt_disable/enable_notrace() instead.
      
      In commit 345ddcc8 ("ftrace: Have set_ftrace_pid use the bitmap
      like events do") this_cpu_read() was added to
      trace_graph_entry(), and if this_cpu_read() calls preempt_disable(), the
      graph tracer will go into a recursive loop, even if tracing_on is
      disabled.
      
      So this patch changes this_cpu_read() to use
      preempt_enable/disable_notrace() instead.
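
      A sketch of the change, adapted from arch/arm64/include/asm/percpu.h
      (abbreviated; the write side is symmetric):

      	#define _percpu_read(pcp)						\
      	({									\
      		typeof(pcp) __retval;						\
      		preempt_disable_notrace();					\
      		__retval = (typeof(pcp))__percpu_read(raw_cpu_ptr(&(pcp)),	\
      						      sizeof(pcp));		\
      		preempt_enable_notrace();					\
      		__retval;							\
      	})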
      
      Since Yonghui Yang helped a lot to find the root cause of this problem,
      also add his SOB.
      Signed-off-by: Yonghui Yang <mark.yang@spreadtrum.com>
      Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb() · 872c63fb
      Committed by Will Deacon
      smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
      to a full barrier, such that prior stores are ordered with respect to
      loads and stores occurring inside the critical section.
      
      Unfortunately, the core code defines the barrier as smp_wmb(), which
      is insufficient to provide the required ordering guarantees when used in
      conjunction with our load-acquire-based spinlock implementation.
      
      This patch overrides the arm64 definition of smp_mb__before_spinlock()
      to map to a full smp_mb().
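
      The override itself is a one-liner, sketched here as it would appear in
      the arm64 spinlock header:

      	#define smp_mb__before_spinlock()	smp_mb()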
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • kvm-arm: Unmap shadow pagetables properly · 293f2936
      Committed by Suzuki K Poulose
      On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
      mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
      the userspace buffer gets unmapped. However, when the Hypervisor
      process exits without explicit unmap of the guest buffers, the only
      notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release)
      which does nothing on arm. Later this causes us to access pages that
      were already released [via exit_mmap() -> unmap_vmas()] when we actually
      get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
      kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
      which unmaps any free'd pages from the linear map.
      
       [  757.644120] Unable to handle kernel paging request at virtual address
        ffff800661e00000
       [  757.652046] pgd = ffff20000b1a2000
       [  757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
        *pmd=00000047fcc7c003, *pte=00e8004661e00712
       [  757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
       [  757.672041] Modules linked in:
       [  757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G      D
       4.8.0-rc1 #3
       [  757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
        BIOS 3.06.15 Aug 19 2016
       [  757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
       [  757.698840] PC is at __flush_dcache_area+0x1c/0x40
       [  757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
       [  757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
       ...
       [  758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
       [  758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
       [  758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
       [  758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
       [  758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
       [  758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
       [  758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
       [  758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
       [  758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
       [  758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
       [  758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
       [  758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
       [  758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
       [  758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
       [  758.432608] [<ffff200008083624>] work_pending+0x10/0x14
       [  758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)
      
      This patch fixes the issue by moving the kvm_free_stage2_pgd() to
      kvm_arch_flush_shadow_all().
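
      In sketch form, the fix (close to the actual arm/arm64 change):

      	void kvm_arch_flush_shadow_all(struct kvm *kvm)
      	{
      		/* tear down stage2 while the mm is still alive */
      		kvm_free_stage2_pgd(kvm);
      	}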
      
      Cc: <stable@vger.kernel.org> # 3.9+
      Tested-by: Itaru Kitayama <itaru.kitayama@riken.jp>
      Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
      Reported-by: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  4. 08 September 2016, 4 commits
    • x86, clock: Fix kvm guest tsc initialization · a4497a86
      Committed by Prarit Bhargava
      When booting a kvm guest on AMD with the latest kernel the following
      messages are displayed in the boot log:
      
       tsc: Unable to calibrate against PIT
       tsc: HPET/PMTIMER calibration failed
      
      aa297292 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
      introduced a change to account for a difference in cpu and tsc frequencies for
      Intel SKL processors. Before this change the native tsc code set
      x86_platform.calibrate_tsc to native_calibrate_tsc(), which is a hardware
      calibration of the tsc, and in tsc_init() executed
      
      	tsc_khz = x86_platform.calibrate_tsc();
      	cpu_khz = tsc_khz;
      
      The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and
      executed the same tsc_init() function.  This meant that KVM guests did not
      execute the native hardware calibration function.
      
      After aa297292, there are separate native calibrations for cpu_khz and
      tsc_khz.  The code sets x86_platform.calibrate_tsc to native_calibrate_tsc()
      which is now an Intel specific calibration function, and
      x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old"
      native_calibrate_tsc() function (ie, the native hardware calibration
      function).
      
      tsc_init() now does
      
      	cpu_khz = x86_platform.calibrate_cpu();
      	tsc_khz = x86_platform.calibrate_tsc();
      	if (tsc_khz == 0)
      		tsc_khz = cpu_khz;
      	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
      		cpu_khz = tsc_khz;
      
      The kvm code should not call the hardware initialization in
      native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that
      prior to aa297292.
      
      This patch resolves this issue by setting x86_platform.calibrate_cpu to
      kvm_get_tsc_khz().
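
      In sketch form, the kvmclock initialization then does (simplified):

      	x86_platform.calibrate_tsc = kvm_get_tsc_khz;
      	x86_platform.calibrate_cpu = kvm_get_tsc_khz;	/* the fix */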
      
      v2: I had originally set x86_platform.calibrate_cpu to
      cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf
      in that function is not available in KVM.  I have changed the function
      pointer to kvm_get_tsc_khz().
      
      Fixes: aa297292 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Len Brown <len.brown@intel.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/apic: Fix num_processors value in case of failure · c291b015
      Committed by Dou Liyang
      If the topology package map check of the APIC ID and the CPU fails,
      we don't generate the processor info for that APIC ID, yet we increase
      disabled_cpus by one - which is buggy.
      
      Only increase num_processors once we are sure we don't fail.
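
      A sketch of the reordering in generic_processor_info() (simplified):

      	if (topology_update_package_map(apicid, cpu) < 0) {
      		disabled_cpus++;
      		return -ENOSPC;		/* bail out before accounting */
      	}

      	num_processors++;	/* counted only once we know we can't fail */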
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
      [ Rewrote the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "ARM: tegra: fix erroneous address in dts" · d8b795f5
      Committed by Olof Johansson
      This reverts commit b5c86b74.
      
      This is no longer needed due to other changes going into 4.8 to rename
      the unit addresses on a large number of device nodes. So it was picked up
      for v4.8-rc1 in error.
      Reported-by: Ralf Ramsauer <ralf@ramses-pyramidenbau.de>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET · f077aaf0
      Committed by Paul Mackerras
      In commit c60ac569 ("powerpc: Update kernel VSID range", 2013-03-13)
      we lost a check on the region number (the top four bits of the effective
      address) for addresses below PAGE_OFFSET.  That commit replaced a check
      that the top 18 bits were all zero with a check that bits 46 - 59 were
      zero (performed for all addresses, not just user addresses).
      
      This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
      and we will insert a valid SLB entry for it.  The VSID used will be the
      same as if the top 4 bits were 0, but the page size will be some random
      value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
      array in the paca.  If that page size is the same as would be used for
      region 0, then userspace just has an alias of the region 0 space.  If the
      page size is different, then no HPTE will be found for the access, and
      the process will get a SIGSEGV (since hash_page_mm() will refuse to create
      a HPTE for the bogus address).
      
      The access beyond the end of the mm_ctx_high_slices_psize array can be at
      most 5.5MB past it, and so will be in RAM somewhere.  Since the access
      is a load performed in real mode, it won't fault or crash the kernel.
      At most this bug could perhaps leak a little bit of information about
      blocks of 32 bytes of memory located at offsets of i * 512kB past the
      paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
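
      Pseudocode for the restored check (the real fix is in the SLB miss
      handler, arch/powerpc/mm/slb_low.S):

      	/* user path: the region number (top 4 bits of ea) must be zero */
      	if ((ea >> 60) != 0)
      		return;		/* no SLB entry -> the access gets a SIGSEGV */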
      
      Fixes: c60ac569 ("powerpc: Update kernel VSID range")
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>