1. 21 Feb, 2013 1 commit
    • D
      sparc64: Fix tsb_grow() in atomic context. · 0fbebed6
      David S. Miller committed
      If our first THP installation for an MM is via the set_pmd_at() done
      during khugepaged's collapsing we'll end up in tsb_grow() trying to do
      a GFP_KERNEL allocation with several locks held.
      
      Simply using GFP_ATOMIC in this situation is not the best option
      because we really can't have this fail, so we'd really like to keep
      this an order 0 GFP_KERNEL allocation if possible.
      
      Also, doing the TSB allocation from khugepaged is a really bad idea
      because we'll allocate it potentially from the wrong NUMA node in that
      context.
      
      So what we do is defer the hugepage TSB allocation until the first TLB
      miss we take on a hugepage.  This is slightly tricky because we have
      to handle two unusual cases:
      
      1) Taking the first hugepage TLB miss in the window trap handler.
         We'll call the winfix_trampoline when that is detected.
      
      2) An initial TSB allocation via TLB miss races with a hugetlb
         fault on another cpu running the same MM.  We handle this by
         unconditionally loading the TSB we see into the current cpu
         even if it's non-NULL at hugetlb_setup time.
      Reported-by: Meelis Roos <mroos@ut.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
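The deferral described in this commit can be pictured with a small user-space sketch. Everything here is hypothetical (`mm_sketch`, `cpu_sketch`, `hugetlb_setup_sketch`, the 8192-byte size); it is not the kernel code and it omits all locking. It only illustrates the two ideas from the message: allocate on the first hugepage TLB miss, and always load whatever TSB is visible into the current cpu, even when another cpu already allocated it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct mm_sketch {
    void *huge_tsb;          /* NULL until the first hugepage TLB miss */
};

struct cpu_sketch {
    void *loaded_huge_tsb;   /* the TSB this cpu currently has loaded */
};

/*
 * Called on a hugepage TLB miss, i.e. in a context where a sleeping
 * allocation is acceptable (assumption of this sketch).
 */
static int hugetlb_setup_sketch(struct mm_sketch *mm, struct cpu_sketch *cpu)
{
    assert(mm != NULL && cpu != NULL);

    /* Allocate only if nobody has done so yet. */
    if (mm->huge_tsb == NULL) {
        mm->huge_tsb = calloc(1, 8192);   /* size is illustrative only */
        if (mm->huge_tsb == NULL)
            return -1;
    }

    /* Unconditionally load the TSB we see, even if it was non-NULL. */
    cpu->loaded_huge_tsb = mm->huge_tsb;
    return 0;
}
```

If a second cpu takes its first hugepage miss after the allocation, it skips the allocation but still ends up with the shared TSB loaded, which is the race case the message calls out.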
  2. 11 Oct, 2012 1 commit
    • D
      sparc64: Fix deficiencies in sun4v error reporting. · f88620b9
      David S. Miller committed
      Missing error types, attributes, and report fields.  Pad out
      to 64-bytes.
      
      Make string reporting cleaner and easier to extend in the future using
      "const char *" arrays that index by either bit position, or absolute
      field value.
      
      Report the raw 64-byte error report as a sequence of u64s before the
      annotated version.
      
      Only report fields which are valid, given the context and the
      attribute bits which are set.
      
      For shutdown requests, use the local copy of the error report not the
      one we just freed up back to the queue.  Also, use orderly_poweroff()
      just like the Domain Services shutdown request code does.
      
      If the real-address reported is "-1" (unknown) try to disassemble the
      instruction to report the effective address of the access.  Only do
      this in privileged mode.
      Signed-off-by: David S. Miller <davem@davemloft.net>
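The "const char *" array approach mentioned in this commit can be sketched as follows. The table contents and all names ending in `_sketch` are made up for illustration and do not reproduce the real sun4v error types or attribute bits; the point is only that one table is indexed by bit position and the other by absolute field value, so adding a new entry means adding one string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table indexed by bit position in an attribute word. */
static const char *const attr_names_sketch[] = {
    "processor",   /* bit 0 */
    "memory",      /* bit 1 */
    "pio",         /* bit 2 */
};

/* Hypothetical table indexed by the absolute value of a type field. */
static const char *const type_names_sketch[] = {
    "undefined",     /* value 0 */
    "uncorrected",   /* value 1 */
    "shutdown",      /* value 2 */
};

static const char *type_name_sketch(unsigned int type)
{
    if (type < ARRAY_LEN(type_names_sketch))
        return type_names_sketch[type];
    return "unknown";
}

/* Write the names of all set, known bits into buf; return how many. */
static size_t format_attrs_sketch(unsigned int attrs, char *buf, size_t len)
{
    size_t used = 0, count = 0, bit;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (bit = 0; bit < ARRAY_LEN(attr_names_sketch); bit++) {
        const char *name = attr_names_sketch[bit];
        size_t need;

        if (!(attrs & (1u << bit)))
            continue;
        need = strlen(name) + (count ? 1 : 0);
        if (used + need >= len)
            break;                      /* no room left, stop here */
        if (count)
            buf[used++] = ' ';
        memcpy(buf + used, name, strlen(name) + 1);
        used += strlen(name);
        count++;
    }
    return count;
}
```

Bits or values without a table entry are simply skipped or reported as "unknown", which keeps the reporting code unchanged when the tables grow.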
  3. 09 Oct, 2012 2 commits
    • D
      sparc64: Support transparent huge pages. · 9e695d2e
      David Miller committed
      This is relatively easy since PMD's now cover exactly 4MB of memory.
      
      Our PMD entries are 32-bits each, so we use a special encoding.  The
      lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
      because sparc64's page tables are purely software entities so we can use
      whatever encoding scheme we want.  We just have to make the TLB miss
      assembler page table walkers aware of the layout.
      
      set_pmd_at() works much like set_pte_at() but it has to operate in two
      regimes.  In the first regime we are going to a huge page from a table
      of non-huge PTEs, so we have to queue up TLB flushes based upon what
      mappings are valid in the PTE table.  In the second regime we are going
      from huge-page to non-huge-page, and in that case we need only queue up
      a single TLB flush to push out the huge page mapping.
      
      We still have 5 bits remaining in the huge PMD encoding so we can very
      likely support any new pieces of THP state tracking that might get added
      in the future.
      
      With lots of help from Johannes Weiner.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
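As a rough illustration of the encoding idea in this commit, the sketch below tests the lowest bit of a 32-bit PMD value to decide how to interpret it. Only the fact that the lowest bit is the huge indicator comes from the commit message; the macro name `PMD_ISHUGE_SKETCH`, the helper names, and the sample values are assumptions, and the layout of the remaining bits is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lowest bit selects the interpretation (per the commit message). */
#define PMD_ISHUGE_SKETCH 0x1u

/* True if the 32-bit PMD value should be read as a huge-page entry. */
static bool pmd_is_huge_sketch(uint32_t pmd)
{
    return (pmd & PMD_ISHUGE_SKETCH) != 0;
}

/* Mark an entry as huge; the caller must pass a not-yet-huge value. */
static uint32_t pmd_mark_huge_sketch(uint32_t pmd)
{
    assert(!pmd_is_huge_sketch(pmd));
    return pmd | PMD_ISHUGE_SKETCH;
}
```

Because the page tables are software-only, a walker just needs the same one-bit test before choosing how to decode the rest of the entry.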
    • S
      readahead: fault retry breaks mmap file read random detection · 45cac65b
      Shaohua Li committed
      .fault can now retry.  The retry can break the state machine of .fault.  In
      filemap_fault, if the page is missing, ra->mmap_miss is increased.  On the
      second try, since the page is now in the page cache, ra->mmap_miss is
      decreased.  Both happen within one fault, so we can't detect random mmap
      file access.
      
      Add a new flag to indicate that .fault has been tried once.  On the second
      try, skip decreasing ra->mmap_miss.  The filemap_fault state machine is fine
      with this.
      
      I only tested x86 and didn't test other archs; the change for the other
      archs looks obvious, but who knows :)
      Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
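The effect of the new flag can be shown with a toy model of the miss counter. The struct, the flag name `FAULT_TRIED_SKETCH`, and the function below are hypothetical simplifications, not the kernel's filemap_fault; they only model "increase on a miss, decrease on a hit unless this is the retry of the same fault".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag meaning "this fault has already been tried once". */
#define FAULT_TRIED_SKETCH 0x1u

struct ra_sketch {
    unsigned int mmap_miss;   /* rough count of mmap cache misses */
};

static void fault_sketch(struct ra_sketch *ra, bool page_cached,
                         unsigned int flags)
{
    assert(ra != NULL);

    if (!page_cached) {
        ra->mmap_miss++;      /* miss: the page has to be read in */
        return;
    }
    /* Hit: only credit it when this is not the retry of the same fault. */
    if (!(flags & FAULT_TRIED_SKETCH) && ra->mmap_miss > 0)
        ra->mmap_miss--;
}
```

Without the flag, a miss followed by its own retry nets out to zero and random access is invisible; with the flag, the miss stays counted.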
  4. 05 Apr, 2012 1 commit
  5. 01 Jul, 2011 1 commit
    • P
      perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Peter Zijlstra committed
      The nmi parameter indicated if we could do wakeups from the current
      context, if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 01 Mar, 2010 1 commit
  7. 21 Jan, 2010 1 commit
  8. 11 Dec, 2009 2 commits
  9. 03 Aug, 2009 1 commit
  10. 22 Jun, 2009 1 commit
  11. 04 Feb, 2009 1 commit
    • D
      sparc64: Kill bogus TPC/address truncation during 32-bit faults. · 9b026058
      David S. Miller committed
      This builds upon eeabac73
      ("sparc64: Validate kernel generated fault addresses on sparc64.")
      
      Upon further consideration, we actually should never see any
      fault addresses for 32-bit tasks with the upper 32-bits set.
      
      If it does ever happen, by definition it's a bug.  Whatever
      context created that fault would only have that fault satisfied
      if we used the full 64-bit address.  If we truncate it, we'll
      always fault the wrong address and we'll always loop faulting
      forever.
      
      So catch such conditions and mark them as errors always.  Log
      the error and fail the fault.
      Signed-off-by: David S. Miller <davem@davemloft.net>
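The rule stated in this commit (a 32-bit task must never produce a fault address with any of the upper 32 bits set, and such a fault is failed rather than truncated) can be written as a tiny check. The function name and return convention are assumptions for illustration; the real handler also logs the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return 0 if the fault address is acceptable, -1 if it must be treated
 * as an error. For 32-bit tasks, any set bit above bit 31 is a bug by
 * definition, so the fault is failed instead of being truncated.
 */
static int check_fault_address_sketch(bool is_32bit_task, uint64_t addr)
{
    if (is_32bit_task && (addr >> 32) != 0)
        return -1;
    return 0;
}
```

Failing the fault avoids the endless loop that truncation would cause, since the truncated address is never the one that actually faulted.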
  12. 03 Feb, 2009 1 commit
    • D
      sparc64: Validate kernel generated fault addresses on sparc64. · eeabac73
      David S. Miller committed
      In order to handle all of the cases of address calculation overflow
      properly, we run sparc 32-bit processes in "address masking" mode
      when running on a 64-bit kernel.
      
      Address masking mode zeros out the top 32-bits of the address
      calculated for every load and store instruction.
      
      However, when we're in privileged mode we have to run with that
      address masking mode disabled even when accessing userspace from
      the kernel.
      
      To "simulate" the address masking mode we clear the top-bits by
      hand for 32-bit processes in the fault handler.
      
      It is the responsibility of code in the compat layer to properly
      zero extend addresses used to access userspace.  If this isn't
      followed properly we can get into a fault loop.
      
      Say that the user address is 0xf0000000 but for whatever reason
      the kernel code sign extends this to 64-bit, and then the kernel
      tries to access the result.
      
      In such a case we'll fault on address 0xfffffffff0000000 but the fault
      handler will process that fault as if it were to address 0xf0000000.
      We'll loop faulting forever because the fault never gets satisfied.
      
      So add a check specifically for this case, when the kernel is faulting
      on a user address access and the addresses don't match up.
      
      This code path is a sufficiently slow path, and this bug is sufficiently
      painful to diagnose, that this kind of bug check is warranted.
      Signed-off-by: David S. Miller <davem@davemloft.net>
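The 0xf0000000 example from this commit message can be reproduced with plain integer arithmetic. The helper names are hypothetical; the sketch only shows how sign extension and zero extension of the same 32-bit user address differ, and how comparing the raw fault address with its masked form detects the mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correct widening of a 32-bit user address. */
static uint64_t zero_extend_sketch(uint32_t uaddr)
{
    return (uint64_t)uaddr;
}

/*
 * Buggy widening: going through a signed 32-bit type copies bit 31 into
 * the upper half (the unsigned-to-signed cast is implementation-defined
 * in C, but wraps on the usual two's complement compilers).
 */
static uint64_t sign_extend_sketch(uint32_t uaddr)
{
    return (uint64_t)(int64_t)(int32_t)uaddr;
}

/* True if masking to 32 bits would change the address that faulted. */
static bool masked_address_mismatch_sketch(uint64_t fault_addr)
{
    uint64_t masked = fault_addr & 0xffffffffULL;

    return masked != fault_addr;
}
```

With these helpers, the sign-extended form of 0xf0000000 is 0xfffffffff0000000, and the mismatch check flags it while accepting the zero-extended form.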
  13. 05 Dec, 2008 1 commit
  14. 12 Sep, 2008 1 commit
  15. 18 Jul, 2008 1 commit
  16. 20 May, 2008 1 commit
  17. 29 Feb, 2008 1 commit
  18. 27 Feb, 2008 1 commit
    • D
      [SPARC64]: Loosen checks in exception table handling. · 622eaec6
      David S. Miller committed
      Some parts of the kernel now do things like do *_user() accesses while
      set_fs(KERNEL_DS) that fault on purpose.
      
      See, for example, the code added by changeset
      a0c1e907 ("futex: runtime enable pi
      and robust functionality").
      
      That trips up the ASI sanity checking we make in do_kernel_fault().
      
      Just remove it for now.  Maybe we can add it back later with an added
      conditional which looks at the current get_fs() value.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 17 Oct, 2007 1 commit
    • W
      During VM oom condition, kill all threads in process group · dcca2bde
      Will Schmidt committed
      We have had complaints where a threaded application is left in a bad state
      after one of its threads is killed when we hit a VM: out_of_memory
      condition.
      
      Killing just one of the process threads can leave the application in a bad
      state, whereas killing the entire process group would allow for the
      application to restart, or be otherwise handled, and makes it very obvious
      that something has gone wrong.
      
      This change allows the entire process group to be taken down, rather
      than just the one thread.
      Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 30 Jul, 2007 1 commit
  21. 20 Jul, 2007 1 commit
    • N
      mm: fault feedback #2 · 83c54070
      Nick Piggin committed
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
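What "fault return codes as bit flags" means for a caller can be sketched like this. The flag names, their values, and the counter struct are hypothetical and chosen only for illustration; they are not claimed to match the kernel's definitions. The idea shown is that a single return word can carry an error condition and extra information (such as "this was a major fault") at the same time, so callers test bits instead of switching on one enumerated value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define FAULT_OOM_SKETCH     0x1u
#define FAULT_SIGBUS_SKETCH  0x2u
#define FAULT_MAJOR_SKETCH   0x4u
#define FAULT_ERROR_SKETCH   (FAULT_OOM_SKETCH | FAULT_SIGBUS_SKETCH)

struct fault_counters_sketch {
    unsigned long maj_flt;
    unsigned long min_flt;
};

/*
 * A caller of a handle_mm_fault-like function: fail on any error bit,
 * otherwise account the fault as major or minor.
 */
static int account_fault_sketch(unsigned int ret,
                                struct fault_counters_sketch *c)
{
    assert(c != NULL);

    if (ret & FAULT_ERROR_SKETCH)
        return -1;
    if (ret & FAULT_MAJOR_SKETCH)
        c->maj_flt++;
    else
        c->min_flt++;
    return 0;
}
```

Grouping the error bits under one mask lets every architecture's caller share the same first test, which is part of why all handle_mm_fault callers had to be touched.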
  22. 09 May, 2007 3 commits
  23. 25 Jul, 2006 1 commit
  24. 27 Jun, 2006 1 commit
  25. 01 Apr, 2006 1 commit
  26. 27 Mar, 2006 1 commit
  27. 22 Mar, 2006 1 commit
  28. 20 Mar, 2006 3 commits
    • D
      [SPARC64]: Fix and re-enable dynamic TSB sizing. · 7a1ac526
      David S. Miller committed
      This is good for up to a 50% performance improvement in some test cases.
      The problem has been the race conditions, and hopefully I've plugged
      them all up here.
      
      1) There was a serious race in switch_mm() wrt. lazy TLB
         switching to and from kernel threads.
      
         We could erroneously skip a tsb_context_switch() and thus
         use a stale TSB across a TSB grow event.
      
         There is a big comment now in that function describing
         exactly how it can happen.
      
      2) All code paths that do something with the TSB need to be
         guarded with the mm->context.lock spinlock.  This makes
         page table flushing paths properly synchronize with both
         TSB growing and TLB context changes.
      
      3) TSB growing events are moved to the end of successful fault
         processing.  Previously it was in update_mmu_cache() but
         that is deadlock prone.  At the end of do_sparc64_fault()
         we hold no spinlocks that could deadlock the TSB grow
         sequence.  We also have dropped the address space semaphore.
      
      While we're here, add prefetching to the copy_tsb() routine
      and put it in assembler into the tsb.S file.  This piece of
      code is quite time critical.
      
      There are some small negative side effects to this code which
      can be improved upon.  In particular we grab the mm->context.lock
      even for the tsb insert done by update_mmu_cache() now and that's
      a bit excessive.  We can get rid of that locking, and the same
      lock taking in flush_tsb_user(), by disabling PSTATE_IE around
      the whole operation including the capturing of the tsb pointer
      and tsb_nentries value.  That would work because anyone growing
      the TSB won't free up the old TSB until all cpus respond to the
      TSB change cross call.
      
      I'm not quite so confident in that optimization to put it in
      right now, but eventually we might be able to and the description
      is here for reference.
      
      This code seems very solid now.  It passes several parallel GCC
      bootstrap builds, and our favorite "nut cruncher" stress test which is
      a full "make -j8192" build of a "make allmodconfig" kernel.  That puts
      about 256 processes on each cpu's run queue, makes lots of process cpu
      migrations occur, causes lots of page table and TLB flushing activity,
      incurs many context version number changes, and it swaps the machine
      real far out to disk even though there is 16GB of ram on this test
      system. :-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
    • D
      [SPARC64]: Deal with PTE layout differences in SUN4V. · c4bce90e
      David S. Miller committed
      Yes, you heard it right, they changed the PTE layout for
      SUN4V.  Ho hum...
      
      This is the simple and inefficient way to support this.
      It'll get optimized, don't worry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 10 Nov, 2005 1 commit
  30. 09 Nov, 2005 1 commit
  31. 29 Sep, 2005 4 commits