1. 05 7月, 2017 4 次提交
    • A
      x86/mm: Stop calling leave_mm() in idle code · 43858b4f
      Andy Lutomirski 提交于
      Now that lazy TLB suppresses all flush IPIs (as opposed to all but
      the first), there's no need to leave_mm() when going idle.
      
      This means we can get rid of the rcuidle hack in
      switch_mm_irqs_off() and we can unexport leave_mm().
      
      This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
      since it has no callers any more.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      43858b4f
    • A
      x86/mm: Rework lazy TLB mode and TLB freshness tracking · 94b1b03b
      Andy Lutomirski 提交于
      x86's lazy TLB mode used to be fairly weak -- it would switch to
      init_mm the first time it tried to flush a lazy TLB.  This meant an
      unnecessary CR3 write and, if the flush was remote, an unnecessary
      IPI.
      
      Rewrite it entirely.  When we enter lazy mode, we simply remove the
      CPU from mm_cpumask.  This means that we need a way to figure out
      whether we've missed a flush when we switch back out of lazy mode.
      I use the tlb_gen machinery to track whether a context is up to
      date.
      
      Note to reviewers: this patch, my itself, looks a bit odd.  I'm
      using an array of length 1 containing (ctx_id, tlb_gen) rather than
      just storing tlb_gen, and making it at array isn't necessary yet.
      I'm doing this because the next few patches add PCID support, and,
      with PCID, we need ctx_id, and the array will end up with a length
      greater than 1.  Making it an array now means that there will be
      less churn and therefore less stress on your eyeballs.
      
      NB: This is dubious but, AFAICT, still correct on Xen and UV.
      xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
      patch changes the way that mm_cpumask() works.  This should be okay,
      since Xen *also* iterates all online CPUs to find all the CPUs it
      needs to twiddle.
      
      The UV tlbflush code is rather dated and should be changed.
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99, so all the
      bookkeeping overhead is gone.
      
      MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
      an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
      This is intended to be a nearly worst-case test.
      
        patched:         13.4µs
        unpatched:       21.6µs
      
      Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
      nrounds = 100, 256M data
      
        patched:         1.1 seconds or so
        unpatched:       1.9 seconds or so
      
      The sleepup on Vitaly's test appearss to be because it spends a lot
      of time blocked on mmap_sem, and this patch avoids sending IPIs to
      blocked CPUs.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      94b1b03b
    • A
      x86/mm: Track the TLB's tlb_gen and update the flushing algorithm · b0579ade
      Andy Lutomirski 提交于
      There are two kernel features that would benefit from tracking
      how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
      it up to date in real time:
      
       - Lazy mm switching currently works by switching to init_mm when
         it would otherwise flush.  This is wasteful: there isn't fundamentally
         any need to update CR3 at all when going lazy or when returning from
         lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
         we should just stop trying to keep the TLB coherent when we go lazy and,
         when unlazying, check whether we missed any flushes.
      
       - PCID will let us keep recent user contexts alive in the TLB.  If we
         start doing this, we need a way to decide whether those contexts are
         up to date.
      
      On some paravirt systems, remote TLBs can be flushed without IPIs.
      This won't update the target CPUs' tlb_gens, which may cause
      unnecessary local flushes later on.  We can address this if it becomes
      a problem by carefully updating the target CPU's tlb_gen directly.
      
      By itself, this patch is a very minor optimization that avoids
      unnecessary flushes when multiple TLB flushes targetting the same CPU
      race.  The complexity in this patch would not be worth it on its own,
      but it will enable improved lazy TLB tracking and PCID.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1210fb244bc9cbe7677f7f0b72db4d359675f24b.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b0579ade
    • A
      x86/mm: Give each mm TLB flush generation a unique ID · f39681ed
      Andy Lutomirski 提交于
      This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
      ctx_id uniquely identifies the mm_struct and will never be reused.
      For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
      count of the number of times that a TLB flush has been requested.
      The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
      flush actions and will be used in subsequent patches to reliably
      determine whether all needed TLB flushes have occurred on a given
      CPU.
      
      This patch is split out for ease of review.  By itself, it has no
      real effect other than creating and updating the new variables.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/413a91c24dab3ed0caa5f4e4d017d87b0857f920.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f39681ed
  2. 03 7月, 2017 3 次提交
    • T
      parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs · 33f9e024
      Thomas Bogendoerfer 提交于
      Enabling parport pc driver on a B2600 (and probably other 64bit PARISC
      systems) produced following BUG:
      
      CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc5-30198-g1132d5e7 #156
      task: 000000009e050000 task.stack: 000000009e04c000
      
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001101111111100001111 Not tainted
      r00-03  000000ff0806ff0f 000000009e04c990 0000000040871b78 000000009e04cac0
      r04-07  0000000040c14de0 ffffffffffffffff 000000009e07f098 000000009d82d200
      r08-11  000000009d82d210 0000000000000378 0000000000000000 0000000040c345e0
      r12-15  0000000000000005 0000000040c345e0 0000000000000000 0000000040c9d5e0
      r16-19  0000000040c345e0 00000000f00001c4 00000000f00001bc 0000000000000061
      r20-23  000000009e04ce28 0000000000000010 0000000000000010 0000000040b89e40
      r24-27  0000000000000003 0000000000ffffff 000000009d82d210 0000000040c14de0
      r28-31  0000000000000000 000000009e04ca90 000000009e04cb40 0000000000000000
      sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404aece0 00000000404aece4
       IIR: 03ffe01f    ISR: 0000000010340000  IOR: 000001781304cac8
       CPU:        0   CR30: 000000009e04c000 CR31: 00000000e2976de2
       ORIG_R28: 0000000000000200
       IAOQ[0]: sba_dma_supported+0x80/0xd0
       IAOQ[1]: sba_dma_supported+0x84/0xd0
       RP(r2): parport_pc_probe_port+0x178/0x1200
      
      Cause is a call to dma_coerce_mask_and_coherenet in parport_pc_probe_port,
      which PARISC DMA API doesn't handle very nicely. This commit gives back
      DMA_ERROR_CODE for DMA API calls, if device isn't capable of DMA
      transaction.
      
      Cc: <stable@vger.kernel.org> # v3.13+
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      33f9e024
    • H
      parisc: Report SIGSEGV instead of SIGBUS when running out of stack · 24746231
      Helge Deller 提交于
      When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
      instead of the expected SIGSEGV signal.
      
      This example shows how the kernel faults:
      do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
      trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000
      
      The vma->vm_end value is the first address which does not belong to the vma, so
      adjust the check to include vma->vm_end to the range for which to send the
      SIGSEGV signal.
      
      This patch unbreaks building the debian libsigsegv package.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHelge Deller <deller@gmx.de>
      24746231
    • E
      parisc: use compat_sys_keyctl() · b0f94efd
      Eric Biggers 提交于
      Architectures with a compat syscall table must put compat_sys_keyctl()
      in it, not sys_keyctl().  The parisc architecture was not doing this;
      fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Acked-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      b0f94efd
  3. 01 7月, 2017 1 次提交
  4. 30 6月, 2017 15 次提交
    • J
      objtool, x86: Add several functions and files to the objtool whitelist · c207aee4
      Josh Poimboeuf 提交于
      In preparation for an objtool rewrite which will have broader checks,
      whitelist functions and files which cause problems because they do
      unusual things with the stack.
      
      These whitelists serve as a TODO list for which functions and files
      don't yet have undwarf unwinder coverage.  Eventually most of the
      whitelists can be removed in favor of manual CFI hint annotations or
      objtool improvements.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c207aee4
    • A
      x86/mm: Delete a big outdated comment about TLB flushing · 8781fb7e
      Andy Lutomirski 提交于
      The comment describes the old explicit IPI-based flush logic, which
      is long gone.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/55e44997e56086528140c5180f8337dc53fb7ffc.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8781fb7e
    • A
      x86/mm: Don't reenter flush_tlb_func_common() · bc0d5a89
      Andy Lutomirski 提交于
      It was historically possible to have two concurrent TLB flushes
      targetting the same CPU: one initiated locally and one initiated
      remotely.  This can now cause an OOPS in leave_mm() at
      arch/x86/mm/tlb.c:47:
      
              if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
                      BUG();
      
      with this call trace:
       flush_tlb_func_local arch/x86/mm/tlb.c:239 [inline]
       flush_tlb_mm_range+0x26d/0x370 arch/x86/mm/tlb.c:317
      
      Without reentrancy, this OOPS is impossible: leave_mm() is only
      called if we're not in TLBSTATE_OK, but then we're unexpectedly
      in TLBSTATE_OK in leave_mm().
      
      This can be caused by flush_tlb_func_remote() happening between
      the two checks and calling leave_mm(), resulting in two consecutive
      leave_mm() calls on the same CPU with no intervening switch_mm()
      calls.
      
      We never saw this OOPS before because the old leave_mm()
      implementation didn't put us back in TLBSTATE_OK, so the assertion
      didn't fire.
      
      Nadav noticed the reentrancy issue in a different context, but
      neither of us realized that it caused a problem yet.
      Reported-by: NLevin, Alexander (Sasha Levin) <alexander.levin@verizon.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Fixes: 3d28ebce ("x86/mm: Rework lazy TLB to track the actual loaded mm")
      Link: http://lkml.kernel.org/r/855acf733268d521c9f2e191faee2dcc23a29729.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bc0d5a89
    • P
      x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings · 236222d3
      Paolo Abeni 提交于
      According to the Intel datasheet, the REP MOVSB instruction
      exposes a pretty heavy setup cost (50 ticks), which hurts
      short string copy operations.
      
      This change tries to avoid this cost by calling the explicit
      loop available in the unrolled code for strings shorter
      than 64 bytes.
      
      The 64 bytes cutoff value is arbitrary from the code logic
      point of view - it has been selected based on measurements,
      as the largest value that still ensures a measurable gain.
      
      Micro benchmarks of the __copy_from_user() function with
      lengths in the [0-63] range show this performance gain
      (shorter the string, larger the gain):
      
       - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
       - in the [72%-9%] range on Intel Core i7-4810MQ
      
      Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
      8216 - show no difference, because they do not expose the
      ERMS feature bit.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
      [ Clarified the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      236222d3
    • C
      perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static · e91c8d97
      Colin Ian King 提交于
      A few minor clean-ups: constify the lbr_desc[] array and make
      local function lbr_from_signext_quirk_rd() static to fix a sparse warning:
      
        "symbol 'lbr_from_signext_quirk_rd' was not declared. Should it be static?"
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170629091406.9870-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e91c8d97
    • K
      x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging · a24261d7
      Kirill A. Shutemov 提交于
      KASLR uses hack to detect whether we booted via startup_32() or
      startup_64(): it checks what is loaded into cr3 and compares it to
      _pgtables. _pgtables is the array of page tables where early code
      allocates page table from.
      
      KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but
      that's not true if we booted with 5-level paging enabled. In this case top
      level page table is allocated separately and only the first p4d page table
      is allocated from the array.
      
      Let's modify the check to cover both 4- and 5-level paging cases.
      
      The patch also renames 'level4p' to 'top_level_pgt' as it now can hold
      page table for 4th or 5th level, depending on configuration.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a24261d7
    • B
      x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug · 8eabf42a
      Baoquan He 提交于
      Kernel text KASLR is separated into physical address and virtual
      address randomization. And for virtual address randomization, we
      only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE.
      So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
      but not the original kernel loading address 'output'.
      
      The bug will cause kernel boot failure if kernel is loaded at a different
      position than the address, 16M, which is decided at compiled time.
      Kexec/kdump is such practical case.
      
      To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial
      value.
      Tested-by: NDave Young <dyoung@redhat.com>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8391c73c ("x86/KASLR: Randomize virtual address separately")
      Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8eabf42a
    • B
      x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization · b892cb87
      Baoquan He 提交于
      For kernel text KASLR, the virtual address is confined to area of 1G,
      [0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of
      virtual address randomization, we only randomize to get an offset
      between 16M and 1G, then add this offset to the starting address,
      0xffffffff80000000. Here 16M is the offset which is decided at linking
      stage. So the amount of the local variable 'virt_addr' which respresents
      the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE.
      
      Add a debug check for the offset. If out of bounds, print error
      message and hang there.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b892cb87
    • J
      MIPS: Avoid accidental raw backtrace · 85423636
      James Hogan 提交于
      Since commit 81a76d71 ("MIPS: Avoid using unwind_stack() with
      usermode") show_backtrace() invokes the raw backtracer when
      cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels
      where user and kernel address spaces overlap.
      
      However this is used by show_stack() which creates its own pt_regs on
      the stack and leaves cp0_status uninitialised in most of the code paths.
      This results in the non deterministic use of the raw back tracer
      depending on the previous stack content.
      
      show_stack() deals exclusively with kernel mode stacks anyway, so
      explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure
      we get a useful backtrace.
      
      Fixes: 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode")
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.15+
      Patchwork: https://patchwork.linux-mips.org/patch/16656/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      85423636
    • P
      MIPS: Perform post-DMA cache flushes on systems with MAARs · cad482c1
      Paul Burton 提交于
      Recent CPUs from Imagination Technologies such as the I6400 or P6600 are
      able to speculatively fetch data from memory into caches. This means
      that if used in a system with non-coherent DMA they require that caches
      be invalidated after a device performs DMA, and before the CPU reads the
      DMA'd data, in order to ensure that stale values weren't speculatively
      prefetched.
      
      Such CPUs also introduced Memory Accessibility Attribute Registers
      (MAARs) in order to control the regions in which they are allowed to
      speculate. Thus we can use the presence of MAARs as a good indication
      that the CPU requires the above cache maintenance. Use the presence of
      MAARs to determine the result of cpu_needs_post_dma_flush() in the
      default case, in order to handle these recent CPUs correctly.
      
      Note that the return type of cpu_needs_post_dma_flush() is changed to
      bool, such that it's clearer what's happening when cpu_has_maar is cast
      to bool for the return value. If this patch were backported to a
      pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int
      we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is
      currently 1ull<<30, so when truncated to an int gives a non-zero value
      anyway, but even so the implicit conversion from long long int to bool
      makes it clearer to understand what will happen than the implicit
      conversion from long long int to int would. The bool return type also
      fits this usage better semantically, so seems like an all-round win.
      
      Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting
      the return type change.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Reviewed-by: NBryan O'Donoghue <pure.logic@nexus-software.ie>
      Tested-by: NBryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Ed Blake <ed.blake@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16363/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      cad482c1
    • P
      MIPS: Fix IRQ tracing & lockdep when rescheduling · d8550860
      Paul Burton 提交于
      When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
      from arch/mips/kernel/entry.S we disable interrupts. This is true
      regardless of whether we reach work_resched from syscall_exit_work,
      resume_userspace or by looping after calling schedule(). Although we
      disable interrupts in these paths we don't call trace_hardirqs_off()
      before calling into C code which may acquire locks, and we therefore
      leave lockdep with an inconsistent view of whether interrupts are
      disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are
      both enabled.
      
      Without tracing this interrupt state lockdep will print warnings such
      as the following once a task returns from a syscall via
      syscall_exit_partial with TIF_NEED_RESCHED set:
      
      [   49.927678] ------------[ cut here ]------------
      [   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
      [   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
      [   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
      [   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
      [   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
      [   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
      [   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
      [   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
      [   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
      [   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
      [   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
      [   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
      [   50.072327]         ...
      [   50.076087] Call Trace:
      [   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
      [   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
      [   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
      [   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
      [   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
      [   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
      [   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
      [   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
      [   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
      [   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
      [   50.148405] possible reason: unannotated irqs-off.
      [   50.154600] irq event stamp: 400463
      [   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
      [   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
      [   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
      [   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128
      
      Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
      when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
      schedule() following the work_resched label because:
      
       1) Interrupts are disabled regardless of the path we take to reach
          work_resched() & schedule().
      
       2) Performing the tracing here avoids the need to do it in paths which
          disable interrupts but don't call out to C code before hitting a
          path which uses the RESTORE_SOME macro that will call
          trace_hardirqs_on() or trace_hardirqs_off() as appropriate.
      
      We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
      syscall_trace_leave() for similar reasons, ensuring that lockdep has a
      consistent view of state after we re-enable interrupts.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org>
      Patchwork: https://patchwork.linux-mips.org/patch/15385/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      d8550860
    • P
      MIPS: pm-cps: Drop manual cache-line alignment of ready_count · 161c51cc
      Paul Burton 提交于
      We allocate memory for a ready_count variable per-CPU, which is accessed
      via a cached non-coherent TLB mapping to perform synchronisation between
      threads within the core using LL/SC instructions. In order to ensure
      that the variable is contained within its own data cache line we
      allocate 2 lines worth of memory & align the resulting pointer to a line
      boundary. This is however unnecessary, since kmalloc is guaranteed to
      return memory which is at least cache-line aligned (see
      ARCH_DMA_MINALIGN). Stop the redundant manual alignment.
      
      Besides cleaning up the code & avoiding needless work, this has the side
      effect of avoiding an arithmetic error found by Bryan on 64 bit systems
      due to the 32 bit size of the former dlinesz. This led the ready_count
      variable to have its upper 32b cleared erroneously for MIPS64 kernels,
      causing problems when ready_count was later used on MIPS64 via cpuidle.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Fixes: 3179d37e ("MIPS: pm-cps: add PM state entry code for CPS systems")
      Reported-by: NBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Reviewed-by: NBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Tested-by: NBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v3.16+
      Patchwork: https://patchwork.linux-mips.org/patch/15383/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      161c51cc
    • D
      ARM: 8685/1: ensure memblock-limit is pmd-aligned · 9e25ebfe
      Doug Berger 提交于
      The pmd containing memblock_limit is cleared by prepare_page_table()
      which creates the opportunity for early_alloc() to allocate unmapped
      memory if memblock_limit is not pmd aligned causing a boot-time hang.
      
      Commit 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      attempted to resolve this problem, but there is a path through the
      adjust_lowmem_bounds() routine where if all memory regions start and
      end on pmd-aligned addresses the memblock_limit will be set to
      arm_lowmem_limit.
      
      Since arm_lowmem_limit can be affected by the vmalloc early parameter,
      the value of arm_lowmem_limit may not be pmd-aligned. This commit
      corrects this oversight such that memblock_limit is always rounded
      down to pmd-alignment.
      
      Fixes: 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Suggested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      9e25ebfe
    • K
      x86/ftrace: Exclude functions in head64.c from function-tracing · bb43dbc5
      Kirill A. Shutemov 提交于
      A recent commit moved most logic of early boot up from startup_64() written
      in assembly to __startup_64() written in C.
      
      Fengguang reported breakage due to the change. It was tracked down to
      CONFIG_FUNCTION_TRACER being enabled.
      
      Tracing this function is not possible because it's invoked from the
      earliest boot stage before the relocation fixups have been done. It is the
      function doing the relocation.
      
      Exclude it from being built with tracer stubs.
      
      Fixes: c88d7150 ("x86/boot/64: Rewrite startup_64() in C")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: lkp@01.org
      Link: http://lkml.kernel.org/r/20170627115948.17938-1-kirill.shutemov@linux.intel.com
      bb43dbc5
    • K
      perf/x86/intel/uncore: Fix wrong box pointer check · 80c65fdb
      Kan Liang 提交于
      Should not init a NULL box. It will cause system crash.
      The issue looks like caused by a typo.
      
      This was not noticed because there is no NULL box. Also, for most
      boxes, they are enabled by default. The init code is not critical.
      
      Fixes: fff4b87e ("perf/x86/intel/uncore: Make package handling more robust")
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
      80c65fdb
  5. 29 6月, 2017 6 次提交
  6. 28 6月, 2017 11 次提交