1. 26 Jul, 2012 1 commit
  2. 06 Jul, 2012 5 commits
  3. 30 Jun, 2012 1 commit
  4. 26 Jun, 2012 1 commit
  5. 25 Jun, 2012 2 commits
    • x86/uv: Work around UV2 BAU hangs · 8b6e511e
      Committed by Cliff Wickman
      On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang
      under a heavy load. To cure this:
      
      - Disable the UV2 extended status mode (see UV2_EXT_SHFT), as
        this mode changes BAU behavior in more ways than just delivering
        an extra bit of status.  Revert status to just two meaningful bits,
        like UV1.
      
      - Use no IPI-style resets on UV2.  Just give up the request,
        whatever the reason it failed, and let it be completed with
        the legacy IPI method.
      
      - Use no alternate sending descriptor (the former UV2 workaround
        bcp->using_desc and handle_uv2_busy() stuff).  Just disable the
        use of the BAU for a period of time in favor of the legacy IPI
        method when the h/w bug leaves a descriptor busy.
      
        -- new tunable: giveup_limit determines the threshold at which a hub is
           so plugged that it should do all requests with the legacy IPI method for a
           period of time
        -- generalize disable_for_congestion() (renamed disable_for_period()) for
           use whenever a hub should avoid using the BAU for a period of time
      
      Also:
      
       - Fix find_another_by_swack(), which is part of the UV2 bug workaround
      
       - Correct and clarify the statistics (new stats s_overipilimit, s_giveuplimit,
         s_enters, s_ipifordisabled, s_plugged, s_congested)
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Link: http://lkml.kernel.org/r/20120622131459.GC31884@sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
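The giveup_limit / disable_for_period() idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the driver's code: struct hub_state, record_giveup() and bau_usable() are hypothetical names, time is an abstract tick counter, and comparing with ">=" against the limit is a guess at the threshold semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hub bookkeeping; field names are illustrative only. */
struct hub_state {
	unsigned int giveups;      /* requests abandoned since the last reset */
	unsigned int giveup_limit; /* tunable threshold from the commit message */
	uint64_t disabled_until;   /* tick before which the BAU is not used */
};

/*
 * Record one abandoned BAU request. Once the count reaches the limit
 * (assumed semantics), stop using the BAU for `period` ticks so that
 * requests fall back to the legacy IPI method.
 */
static void record_giveup(struct hub_state *hub, uint64_t now, uint64_t period)
{
	assert(hub != NULL);
	if (++hub->giveups >= hub->giveup_limit) {
		hub->disabled_until = now + period;
		hub->giveups = 0;
	}
}

/* True when the BAU may be used again; false means "use legacy IPIs". */
static bool bau_usable(const struct hub_state *hub, uint64_t now)
{
	assert(hub != NULL);
	return now >= hub->disabled_until;
}
```

A caller would check bau_usable() before each request and call record_giveup() whenever a request is abandoned, so a plugged hub backs off for a while instead of retrying on the BAU.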
    • x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/ · 26ef8577
      Committed by Cliff Wickman
      This patch enables the BAU to be turned on or off dynamically.
      
        echo "on"  > /proc/sgi_uv/ptc_statistics
        echo "off" > /proc/sgi_uv/ptc_statistics
      
      The system may be booted with or without the nobau option.
      
      Whether the system currently has the BAU off can be seen in
      the /proc file -- normally with the baustats script.
      Each cpu will have a 1 in the bauoff field if the BAU was turned
      off, so baustats will give a count of cpus that have it off.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Link: http://lkml.kernel.org/r/20120622131330.GB31884@sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 21 Jun, 2012 1 commit
    • thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE · e4eed03f
      Committed by Andrea Arcangeli
      In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
      mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
      Xen.
      
      So instead of dealing only with "consistent" pmdvals in
      pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
      simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
      where the low 32bit and high 32bit could be inconsistent (to avoid having
      to use cmpxchg8b).
      
      The only guarantee we get from pmd_read_atomic is that if the low part of
      the pmd was found null, the high part will be null too (so the pmd will be
      considered unstable).  And if the low part of the pmd is found "stable"
      later, then it means the whole pmd was read atomically (because after a
      pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
      and we read the high part after the low part).
      
      In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
      atomically to declare the pmd as "stable", and that's true both with and
      without THP. Furthermore, in the THP case we also have a barrier() that
      will prevent any inconsistent pmdvals from being cached by a later re-read
      of the *pmd.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
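The "low part first, high part only if the low part is non-null" read described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel's pmd_read_atomic(): struct pmd_halves and read_low_then_high() are hypothetical names, and a C11 acquire fence stands in for whatever read barrier the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 64-bit PAE pmd entry, stored as two 32-bit halves. */
struct pmd_halves {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/*
 * Read the low half first. If it is zero, report the whole entry as
 * zero without touching the high half, so the caller treats the pmd
 * as "none". Otherwise read the high half strictly afterwards.
 */
static uint64_t read_low_then_high(struct pmd_halves *p)
{
	assert(p != NULL);
	uint64_t val = atomic_load_explicit(&p->low, memory_order_relaxed);

	if (val != 0) {
		/* Keep the high-half load ordered after the low-half load. */
		atomic_thread_fence(memory_order_acquire);
		val |= (uint64_t)atomic_load_explicit(&p->high,
						      memory_order_relaxed) << 32;
	}
	return val;
}
```

The point of the design is that a zero low half never produces a half-populated value: the result is either all zero or a low half that was already non-null combined with a high half read later.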
  7. 20 Jun, 2012 1 commit
  8. 18 Jun, 2012 1 commit
  9. 14 Jun, 2012 1 commit
  10. 08 Jun, 2012 10 commits
  11. 06 Jun, 2012 7 commits
  12. 05 Jun, 2012 1 commit
  13. 02 Jun, 2012 3 commits
  14. 01 Jun, 2012 1 commit
    • ftrace: Synchronize variable setting with breakpoints · a192cd04
      Committed by Steven Rostedt
      When the function tracer starts modifying the code via breakpoints
      it sets a variable (modifying_ftrace_code) to inform the breakpoint
      handler to call the ftrace int3 code.
      
      But there's no synchronization between setting this variable and the
      handler, thus it is possible for the handler to be called on another
      CPU before it sees the variable. This will cause a kernel crash as
      the int3 handler will not know what to do with the breakpoint.
      
      I originally added smp_mb()'s to force the visibility of the variable
      but H. Peter Anvin suggested that I just make it atomic.
      
      [ Added comments as suggested by Peter Zijlstra ]
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
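The idea of making the flag atomic, so the breakpoint handler on another CPU reliably sees it, can be illustrated in user space with C11 atomics. This is a sketch only, not the kernel patch: modifying_code, patching_begin(), patching_end() and breakpoint_is_ours() are hypothetical names, and <stdatomic.h> stands in for the kernel's own atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Nonzero while code patching via breakpoints is in progress. */
static atomic_int modifying_code;

/* Called before the first breakpoint is written into the code. */
static void patching_begin(void)
{
	/* Sequentially consistent read-modify-write: visible before patching. */
	atomic_fetch_add(&modifying_code, 1);
}

/* Called after the last breakpoint has been removed. */
static void patching_end(void)
{
	int prev = atomic_fetch_sub(&modifying_code, 1);

	assert(prev > 0); /* catches an unbalanced patching_end() */
}

/* What a breakpoint handler would consult before dispatching. */
static bool breakpoint_is_ours(void)
{
	return atomic_load(&modifying_code) != 0;
}
```

Using a counter rather than a plain boolean is a choice made for this sketch; it lets begin/end pairs nest without one caller clearing the flag for another.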
  15. 31 May, 2012 2 commits
  16. 30 May, 2012 1 commit
    • mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 26c19178
      Committed by Andrea Arcangeli
      When holding the mmap_sem for reading, pmd_offset_map_lock should only
      run on a pmd_t that has been read atomically from the pmdp pointer,
      otherwise we may read only half of it leading to this crash.
      
      PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
       #0 [f06a9dd8] crash_kexec at c049b5ec
       #1 [f06a9e2c] oops_end at c083d1c2
       #2 [f06a9e40] no_context at c0433ded
       #3 [f06a9e64] bad_area_nosemaphore at c043401a
       #4 [f06a9e6c] __do_page_fault at c0434493
       #5 [f06a9eec] do_page_fault at c083eb45
       #6 [f06a9f04] error_code (via page_fault) at c083c5d5
          EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
          DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
          CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
       #7 [f06a9f38] _spin_lock at c083bc14
       #8 [f06a9f44] sys_mincore at c0507b7d
       #9 [f06a9fb0] system_call at c083becd
                               start           len
          EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
          DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
          SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
          CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286
      
      This should be a longstanding bug affecting x86 32bit PAE without THP.
      Only archs with 64bit large pmd_t and 32bit unsigned long should be
      affected.
      
      With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
      would partly hide the bug when the pmd transitions from none to stable,
      by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
      enabled a new set of problems arises from the fact that the pmd could
      then transition freely among the none, pmd_trans_huge and
      pmd_trans_stable states. So making the barrier in
      pmd_none_or_trans_huge_or_clear_bad() unconditional isn't a good idea
      and it would be a flaky solution.
      
      This should be fully fixed by introducing a pmd_read_atomic that reads
      the pmd in order with THP disabled, or by reading the pmd atomically
      with cmpxchg8b with THP enabled.
      
      Luckily this new race condition only triggers in the places that must
      already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
      is localized there but this bug is not related to THP.
      
      NOTE: this can trigger only on x86 32bit systems with PAE enabled and
      more than 4G of RAM. Otherwise the high part of the pmd is zero at all
      times, so it can never be truncated, which in turn hides the SMP race.
      
      This bug was discovered and fully debugged by Ulrich, quote:
      
      ----
      [..]
      pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
      eax.
      
          static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
          {
                  /* depend on compiler for an atomic pmd read */
                  pmd_t pmdval = *pmd;
      
                                      // edi = pmd pointer
      0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
      ...
                                      // edx = PTE page table high address
      0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
      ...
                                      // eax = PTE page table low address
      0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax
      
      [..]
      
      Please note that the PMD is not read atomically. These are two "mov"
      instructions where the high order bits of the PMD entry are fetched
      first. Hence, the above machine code is prone to the following race.
      
      -  The PMD entry {high|low} is 0x0000000000000000.
         The "mov" at 0xc0507a84 loads 0x00000000 into edx.
      
      -  A page fault (on another CPU) sneaks in between the two "mov"
         instructions and instantiates the PMD.
      
      -  The PMD entry {high|low} is now 0x00000003fda38067.
         The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
      ----
      Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
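The interleaving Ulrich describes can be replayed deterministically in plain C. This sketch does not run two CPUs; it only mimics the order of the two 32-bit loads around a concurrent store. torn_read_high_first() and replay_commit_example() are hypothetical names, and the values are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mimic the two "mov" instructions: the high half is taken from the
 * entry as it was before the concurrent update, the low half from the
 * entry as it is afterwards.
 */
static uint64_t torn_read_high_first(uint64_t before, uint64_t after)
{
	uint32_t high = (uint32_t)(before >> 32); /* first load: old high half */
	/* ... another CPU instantiates the PMD at this point ... */
	uint32_t low = (uint32_t)after;           /* second load: new low half */

	return ((uint64_t)high << 32) | low;
}

/* Replays the exact values from the commit message. */
static void replay_commit_example(void)
{
	uint64_t seen = torn_read_high_first(0x0000000000000000ULL,
					     0x00000003fda38067ULL);

	/* Neither the old value nor the new one: the high half is lost. */
	assert(seen == 0x00000000fda38067ULL);
}
```

The reader ends up with a value that was never stored in the PMD, which is why the truncated high part only matters when the real high half is nonzero.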
  17. 27 May, 2012 1 commit