1. 10 Jan, 2017 2 commits
    • arm64: add missing printk newlines · 117f5727
      Committed by Mark Rutland
      A few printk calls in arm64 omit a trailing newline, even though there
      is no subsequent KERN_CONT printk associated with them, and we actually
      want a newline.
      
      This can result in unrelated lines being appended, rather than appearing
      on a new line. Additionally, timestamp prefixes may appear in-line. This
      makes the logs harder to read than necessary.
      
      Avoid this by adding a trailing newline.
      
      These were found with a shortlist generated by:
      
      $ git grep 'pr\(intk\|_.*\)(.*)' -- arch/arm64 | grep -v pr_fmt | grep -v '\\n"'
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
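The shortlist pipeline quoted in this commit can be approximated outside of git. The following is a minimal Python sketch, not part of the commit: the function name `printk_shortlist` is hypothetical, and like the original pipeline it only yields candidates for manual review (it cannot see a newline that sits on a continuation line, for example).

```python
import re

# Same shape as the grep pattern 'pr\(intk\|_.*\)(.*)':
# either printk(...) or pr_<anything>(...).
PRINT_CALL = re.compile(r'pr(?:intk|_.*)\(.*\)')


def printk_shortlist(lines):
    """Return the lines that contain a print call but no backslash-n before a quote."""
    shortlist = []
    for line in lines:
        if not PRINT_CALL.search(line):
            continue  # not a printk/pr_* call
        if 'pr_fmt' in line:
            continue  # mirrors: grep -v pr_fmt
        if '\\n"' in line:
            continue  # mirrors: grep -v '\\n"' (literal backslash, n, quote)
        shortlist.append(line)
    return shortlist
```

Feeding it the lines of a source file returns only the print calls whose format string does not visibly end in a newline, which is the set the commit author then inspected by hand.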
    • arm64: Don't trace __switch_to if function graph tracer is enabled · 8f4b326d
      Committed by Joel Fernandes
      Function graph tracer shows negative time (wrap around) when tracing
      __switch_to if the nosleep-time trace option is enabled.
      
      Time compensation for nosleep-time is done by an ftrace probe on
      sched_switch. This doesn't work well for the following events (with
      letters representing timestamps):
      A - sched_switch probe called for task T switch out
      B - __switch_to calltime is recorded
      C - sched_switch probe called for task T switch in
      D - __switch_to rettime is recorded
      
      If C - A > D - B, then we end up overcompensating for the time spent in
      __switch_to, giving rise to negative times in the trace output.
      
      On x86, __switch_to is not traced if function graph tracer is enabled.
      Do the same for arm64 as well.
      
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
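The overcompensation condition in this commit can be checked with plain arithmetic. The sketch below is a simplified Python model, not the ftrace implementation: it assumes the compensation amounts to subtracting the sleep interval seen by the sched_switch probe from the raw duration, and the function names are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def compensated_duration(a, b, c, d):
    """Duration reported for __switch_to under a simplified nosleep-time model.

    a: sched_switch probe called, task T switched out
    b: __switch_to calltime recorded
    c: sched_switch probe called, task T switched in
    d: __switch_to rettime recorded
    """
    sleep_time = c - a   # interval the probe attributes to sleeping
    raw = d - b          # uncompensated time spent in __switch_to
    return raw - sleep_time  # negative whenever c - a > d - b


def as_u64(value):
    """Show how a negative duration looks when stored in an unsigned 64-bit field."""
    return value & U64_MASK
```

With the timestamps ordered A < B < C < D, the probe interval C - A always exceeds D - B when B - A > D - C, so the modelled duration goes negative and appears as a huge wrapped value once treated as unsigned.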
  2. 05 Jan, 2017 5 commits
    • KVM: VMX: remove duplicated declaration · 69130ea1
      Committed by Jan Dakinevich
      The declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occurs twice in the code,
      probably as the result of an unsuccessful merge.
      Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: MIPS: Flush KVM entry code from icache globally · 32eb12a6
      Committed by James Hogan
      Flush the KVM entry code from the icache on all CPUs, not just the one
      that built the entry code.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: MIPS: Don't clobber CP0_Status.UX · 4c881451
      Committed by James Hogan
      On 64-bit kernels, MIPS KVM will clear CP0_Status.UX to prevent the
      guest (running in user mode) from accessing the 64-bit memory segments.
      However the previous value of CP0_Status.UX is never restored when
      exiting from the guest.
      
      If the user process uses 64-bit addressing (the n64 ABI) this can result
      in address error exceptions from the kernel if it needs to deliver a
      signal before returning to user mode, as the kernel will need to write a
      sigframe to high user addresses on the user stack which are disallowed
      by CP0_Status.UX=0.
      
      This is fixed by explicitly setting SX and UX again when exiting from
      the guest, and explicitly clearing those bits when returning to the
      guest. Having the SX and UX bits set when handling guest exits (rather
      than only when exiting to userland) will be helpful when we support VZ,
      since we shouldn't need to directly read or write guest memory, so it
      will be valid for cache management IPIs to access host user addresses.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 4.8.x-
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • arm64: restore get_current() optimisation · 9d84fb27
      Committed by Mark Rutland
      Commit c02433dd ("arm64: split thread_info from task stack")
      inverted the relationship between get_current() and
      current_thread_info(), with sp_el0 now holding the current task_struct
      rather than the current thread_info. The new implementation of
      get_current() prevents the compiler from being able to optimize repeated
      calls to either, resulting in a noticeable penalty in some
      microbenchmarks.
      
      This patch restores the previous optimisation by implementing
      get_current() in the same way as our old current_thread_info(), using a
      non-volatile asm statement.
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: mm: fix show_pte KERN_CONT fallout · 6ef4fb38
      Committed by Mark Rutland
      Recent changes made KERN_CONT mandatory for continued lines. In the
      absence of KERN_CONT, a newline may be implicitly inserted by the core
      printk code.
      
      In show_pte, we (erroneously) use printk without KERN_CONT for continued
      prints, resulting in output being split across a number of lines, and
      not matching the intended output, e.g.
      
      [ff000000000000] *pgd=00000009f511b003
      , *pud=00000009f4a80003
      , *pmd=0000000000000000
      
      Fix this by using pr_cont() for all the continuations.
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
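The split output quoted in this commit follows from how records are joined into lines. The sketch below is a toy Python model of that rule, not the printk code itself: `render_log` and the `(text, is_cont)` record shape are hypothetical, and the only behaviour modelled is that a record without the continuation flag always starts a new line.

```python
def render_log(records):
    """Join (text, is_cont) records into output lines.

    A record is appended to the previous line only if it is flagged as a
    continuation and the previous line has not been terminated by a newline.
    Any other record starts a new line, which models the implicit newline.
    """
    lines = []
    line_open = False  # True while the last line lacks a terminating newline
    for text, is_cont in records:
        ends_line = text.endswith("\n")
        body = text[:-1] if ends_line else text
        if is_cont and line_open:
            lines[-1] += body
        else:
            lines.append(body)
        line_open = not ends_line
    return lines


PARTS = [
    "[ff000000000000] *pgd=00000009f511b003",
    ", *pud=00000009f4a80003",
    ", *pmd=0000000000000000\n",
]

# Plain printk for every part: each record becomes its own line.
broken = render_log([(p, False) for p in PARTS])

# pr_cont() for the continuations: everything stays on one line.
fixed = render_log([(PARTS[0], False)] + [(p, True) for p in PARTS[1:]])
```

In this model `broken` holds three separate lines, matching the output shown in the commit message, while `fixed` holds the single intended line.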
  3. 04 Jan, 2017 3 commits
  4. 03 Jan, 2017 3 commits
  5. 02 Jan, 2017 9 commits
  6. 31 Dec, 2016 1 commit
  7. 30 Dec, 2016 5 commits
    • arm64: dts: vexpress: Support GICC_DIR operations · 1dff32d7
      Committed by Sudeep Holla
      The GICv2 CPU interface registers span across 8K, not 4K as indicated in
      the DT.  Only the GICC_DIR register is located after the initial 4K
      boundary, leaving a functional system but without support for separately
      EOI'ing and deactivating interrupts.
      
      After this change the system supports split priority drop and interrupt
      deactivation. This patch is based on a similar one from Christoffer Dall:
      commit 368400e2 ("ARM: dts: vexpress: Support GICC_DIR operations")
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    • ARM: dts: vexpress: Support GICC_DIR operations · 368400e2
      Committed by Christoffer Dall
      The GICv2 CPU interface registers span across 8K, not 4K as indicated in
      the DT.  Only the GICC_DIR register is located after the initial 4K
      boundary, leaving a functional system but without support for separately
      EOI'ing and deactivating interrupts.
      
      After this change the system supports split priority drop and interrupt
      deactivation.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      [sudeep.holla@arm.com: included same fix for tc1 platform too]
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
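Why a 4K mapping leaves the system functional but without GICC_DIR can be seen from the register offsets alone. The following Python sketch is illustrative only: the offsets are taken from the GICv2 CPU interface register map, while `region_covers` and the 4-byte register width parameter are hypothetical helpers for the check.

```python
GICC_EOIR = 0x0010  # End of Interrupt Register, inside the first 4K
GICC_DIR = 0x1000   # Deactivate Interrupt Register, first word past 4K

SIZE_4K = 0x1000
SIZE_8K = 0x2000


def region_covers(size, offset, width=4):
    """Return True if a register of `width` bytes at `offset` lies inside the region."""
    return offset >= 0 and offset + width <= size
```

With a 4K region, `region_covers(SIZE_4K, GICC_EOIR)` holds but `region_covers(SIZE_4K, GICC_DIR)` does not; widening the region to 8K makes GICC_DIR reachable, which is what enables split priority drop and deactivation.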
    • parisc: Drop TIF_RESTORE_SIGMASK and switch to generic code · 1fe0a7e0
      Committed by Helge Deller
      Commit 7e781418 ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code")
      introduced code with which the "restore sigmask" flag lives in task_struct
      instead of ti->flags. Let's use this optimization on parisc too.
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Mark cr16 clocksource unstable on SMP systems · 41744213
      Committed by Helge Deller
      The cr16 interval timer of each CPU is not synchronized with the cr16
      timers of other CPUs in an SMP system. So, delay the registration of the
      cr16 clocksource until all CPUs have been detected and then - if we are
      on an SMP machine - mark the cr16 clocksource as unstable and lower its
      rating before registering it with the clocksource framework.
      
      This patch fixes the stalled CPU warnings which we have seen since the
      introduction of the cr16 clocksource.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v4.8+
    • mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Committed by Linus Torvalds
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And on architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hiccup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: Nick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
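The semantics of the new primitive, "clear a bit in a byte, and test the sign bit", can be modelled in a few lines. This is a plain Python sketch of the result only, not the kernel code and not atomic: placing the lock bit at bit 0 is an assumption made for illustration, the waiters bit at bit 7 comes from the commit text, and `unlock_page_model` is a hypothetical wrapper.

```python
PG_LOCKED = 0   # assumption for illustration: lock bit in bit 0 of the byte
PG_WAITERS = 7  # from the commit: waiters bit moved to bit 7, the byte's sign bit


def clear_bit_unlock_is_negative_byte(nr, byte):
    """Clear bit `nr` of an 8-bit value.

    Returns (new_byte, negative), where `negative` is True if bit 7 of the
    result is set. This mirrors the described generic fallback of clearing
    the bit and then testing bit 7, without modelling atomicity.
    """
    new_byte = byte & ~(1 << nr) & 0xFF
    return new_byte, bool(new_byte & (1 << PG_WAITERS))


def unlock_page_model(flags_byte):
    """Drop the lock bit and report whether waiters would need a wakeup."""
    return clear_bit_unlock_is_negative_byte(PG_LOCKED, flags_byte)
```

On x86, the commit obtains the same answer from the sign flag left behind by the atomic "and" that clears the lock bit, so no second access to the word is needed. The model cannot show that saving; it only pins down what value the primitive returns.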
  8. 29 Dec, 2016 1 commit
  9. 28 Dec, 2016 11 commits