1. 08 Apr 2014 (1 commit)
    • mm: per-thread vma caching · 615d6e87
      Committed by Davidlohr Bueso
      This patch is a continuation of efforts to optimize find_vma(),
      avoiding potentially expensive rbtree walks to locate a vma upon faults.
      The original approach (https://lkml.org/lkml/2013/11/1/410), where the
      largest vma was also cached, ended up being too specific and random, so
      further comparison with other approaches was needed.  There are two
      things to consider here: the cache hit rate and the latency of
      find_vma().  Improving the hit rate does not necessarily translate into
      finding the vma any faster, as the overhead of any fancy caching scheme
      can be too high to be worthwhile.
      
      We currently cache the last used vma for the whole address space, which
      provides a nice optimization, reducing the total cycles in find_vma() by
      up to 250%, for workloads with good locality.  On the other hand, this
      simple scheme is pretty much useless for workloads with poor locality.
      Analyzing ebizzy runs shows that, no matter how many threads are
      running, the mmap_cache hit rate is less than 2%, and in many situations
      below 1%.
      
      The proposed approach is to replace this scheme with a small per-thread
      cache, maximizing hit rates at a very low maintenance cost.
      Invalidations are performed by simply bumping up a 32-bit sequence
      number.  The only expensive operation is in the rare case of a seq
      number overflow, where all caches that share the same address space are
      flushed.  Upon a miss, the proposed replacement policy is based on the
      page number that contains the virtual address in question; a rough
      sketch of the lookup follows the results below.  Concretely, the
      following results are seen on an 80 core, 8 socket x86-64 box:
      
      1) System bootup: Most programs are single threaded, so the per-thread
         scheme improves on the ~50% baseline hit rate simply by adding a few
         more slots to the cache.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 50.61%   | 19.90            |
      | patched        | 73.45%   | 13.58            |
      +----------------+----------+------------------+
      
      2) Kernel build: This one is already pretty good with the current
         approach as we're dealing with good locality.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 75.28%   | 11.03            |
      | patched        | 88.09%   | 9.31             |
      +----------------+----------+------------------+
      
      3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 70.66%   | 17.14            |
      | patched        | 91.15%   | 12.57            |
      +----------------+----------+------------------+
      
      4) Ebizzy: There's a fair amount of variation from run to run, but this
         approach always shows nearly perfect hit rates, while the baseline is
         just about non-existent.  The cycle counts fluctuate anywhere from
         ~60 to ~116 billion for the baseline scheme, but this approach
         reduces them considerably.  For instance, with 80 threads:
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 1.06%    | 91.54            |
      | patched        | 99.97%   | 14.18            |
      +----------------+----------+------------------+
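
      A rough sketch of the per-thread lookup described above.  The constants
      and field names follow the vmacache naming in the patch title, but the
      details are illustrative rather than a copy of the patch:

      /*
       * Sketch only: a tiny per-task cache indexed by the page number of the
       * faulting address, invalidated by bumping a per-mm sequence number.
       */
      #define VMACACHE_BITS  2
      #define VMACACHE_SIZE  (1U << VMACACHE_BITS)
      #define VMACACHE_MASK  (VMACACHE_SIZE - 1)

      static inline int vmacache_hash(unsigned long addr)
      {
              /* replacement policy: the page number picks the slot */
              return (addr >> PAGE_SHIFT) & VMACACHE_MASK;
      }

      struct vm_area_struct *vmacache_find(struct mm_struct *mm, unsigned long addr)
      {
              int idx = vmacache_hash(addr);
              struct vm_area_struct *vma;

              /* a stale seqnum means another thread invalidated the cache */
              if (current->vmacache_seqnum != mm->vmacache_seqnum)
                      return NULL;

              vma = current->vmacache[idx];
              if (vma && vma->vm_start <= addr && vma->vm_end > addr)
                      return vma;
              return NULL;
      }

      static inline void vmacache_invalidate(struct mm_struct *mm)
      {
              /* cheap invalidation; a 32-bit overflow forces a full flush of
               * every cache sharing this address space */
              mm->vmacache_seqnum++;
      }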
      
      [akpm@linux-foundation.org: fix nommu build, per Davidlohr]
      [akpm@linux-foundation.org: document vmacache_valid() logic]
      [akpm@linux-foundation.org: attempt to untangle header files]
      [akpm@linux-foundation.org: add vmacache_find() BUG_ON]
      [hughd@google.com: add vmacache_valid_mm() (from Oleg)]
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: adjust and enhance comments]
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      615d6e87
  2. 26 Feb 2014 (1 commit)
  3. 25 Jan 2014 (1 commit)
  4. 04 Oct 2013 (1 commit)
  5. 01 May 2013 (1 commit)
  6. 05 Feb 2013 (1 commit)
  7. 12 Oct 2012 (1 commit)
  8. 27 Sep 2012 (1 commit)
  9. 30 Mar 2012 (1 commit)
    • kgdb,debug_core: pass the breakpoint struct instead of address and memory · 98b54aa1
      Committed by Jason Wessel
      There is extra state information that needs to be exposed in the
      kgdb_bkpt structure for tracking how a breakpoint was installed.  The
      debug_core only uses probe_kernel_write() to install breakpoints, but
      this is not enough for all archs.  Some archs, such as x86, need to use
      text_poke() in order to install a breakpoint into a read-only page.
      
      Passing the kgdb_bkpt structure to kgdb_arch_set_breakpoint() and
      kgdb_arch_remove_breakpoint() allows other archs to set the type
      variable which indicates how the breakpoint was installed.
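
      A hedged sketch of what an arch hook can now do with the passed-in
      struct.  The BP_POKE_BREAKPOINT value and the text_poke() fallback are
      illustrative of the x86 case and are assumptions here, not quotes from
      the patch; locking details (e.g. text_mutex) are elided:

      int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
      {
              char verify[BREAK_INSTR_SIZE];
              int err;

              bpt->type = BP_BREAKPOINT;
              err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
                                      BREAK_INSTR_SIZE);
              if (err)
                      return err;

              err = probe_kernel_write((char *)bpt->bpt_addr,
                                       arch_kgdb_ops.gdb_bpt_instr,
                                       BREAK_INSTR_SIZE);
              if (!err)
                      return 0;

              /* read-only text: fall back to text_poke() and remember that */
              text_poke((void *)bpt->bpt_addr, arch_kgdb_ops.gdb_bpt_instr,
                        BREAK_INSTR_SIZE);
              err = probe_kernel_read(verify, (char *)bpt->bpt_addr,
                                      BREAK_INSTR_SIZE);
              if (err || memcmp(verify, arch_kgdb_ops.gdb_bpt_instr,
                                BREAK_INSTR_SIZE))
                      return -EINVAL;

              bpt->type = BP_POKE_BREAKPOINT;
              return 0;
      }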
      
      Cc: stable@vger.kernel.org # >= 2.6.36
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      98b54aa1
  10. 29 Mar 2012 (1 commit)
  11. 23 Mar 2012 (2 commits)
  12. 27 Jul 2011 (1 commit)
  13. 31 Mar 2011 (1 commit)
  14. 30 Oct 2010 (1 commit)
  15. 23 Oct 2010 (5 commits)
    • kdb,debug_core: adjust master cpu switch logic against new debug_core locking · 495363d3
      Committed by Jason Wessel
      The kdb shell needs to enforce switching back to the original CPU that
      took the exception before restoring normal kernel execution.  Resuming
      from a different CPU than the one that took the original exception will
      cause problems with spin locks that are released from a different
      processor than the one that took the lock.

      The special logic in dbg_cpu_switch() can go away entirely because the
      set of cpus that want to be masters or slaves will remain unchanged
      between entry and exit of the debug_core exception context.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      495363d3
    • debug_core: refactor locking for master/slave cpus · dfee3a7b
      Committed by Jason Wessel
      For quite some time there have been problems with memory barriers and
      various races with NMI on multi processor systems using the kernel
      debugger.  The algorithm for entering the kernel debug core and
      resuming kernel execution was racy and had several known edge case
      problems with attempting to debug something on a heavily loaded system
      using breakpoints that are hit repeatedly and quickly.
      
      The prior "locking" design for entry worked as follows:
      
        * The atomic counter kgdb_active was used with atomic exchange in
          order to elect a master cpu out of all the cpus that may have
          taken a debug exception.
        * The master cpu increments all elements of passive_cpu_wait[].
        * The master cpu issues the round up cpus message.
        * Each "slave cpu" that enters the debug core increments its own
          element in cpu_in_kgdb[].
        * Each "slave cpu" spins on passive_cpu_wait[] until it becomes 0.
        * The master cpu debugs the system.
      
      The new scheme removes the two arrays of atomic counters and replaces
      them with two simple counters.  One counter is used to count the number
      of cpus waiting to become a master cpu (because one or more hit an
      exception).  The second counter is used to indicate how many cpus have
      entered as slave cpus.
      
      The new entry logic works as follows:
      
        * One or more cpus enters via kgdb_handle_exception() and increments
          the masters_in_kgdb. Each cpu attempts to get the spin lock called
          dbg_master_lock.
        * The master cpu sets kgdb_active to the current cpu.
        * The master cpu takes the spinlock dbg_slave_lock.
        * The master cpu asks to round up all the other cpus.
        * Each slave cpu that is not already in kgdb_handle_exception()
          will enter and increment slaves_in_kgdb.  Each slave then spins,
          try-locking dbg_slave_lock.
        * The master cpu waits for the sum of masters_in_kgdb and slaves_in_kgdb
          to equal the number of online cpus.
        * The master cpu debugs the system.
      
      In the new design the kgdb_active can only be changed while holding
      dbg_master_lock.  Stress testing has not turned up any further
      entry/exit races that existed in the prior locking design.  The prior
      locking design suffered from atomic variables not being truly atomic
      (in the capacity as used by kgdb) along with memory barrier races.
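
      A condensed sketch of that master-side entry path, built from the
      description above.  The lock and counter declarations mirror the names
      in the text; the helper function, the slave path, and error handling
      are illustrative assumptions:

      static atomic_t masters_in_kgdb;
      static atomic_t slaves_in_kgdb;
      static atomic_t kgdb_active = ATOMIC_INIT(-1);
      static DEFINE_RAW_SPINLOCK(dbg_master_lock);
      static DEFINE_RAW_SPINLOCK(dbg_slave_lock);

      static void dbg_become_master(int cpu, unsigned long flags)
      {
              atomic_inc(&masters_in_kgdb);           /* "I want to be master" */

              while (!raw_spin_trylock(&dbg_master_lock))
                      cpu_relax();                    /* losers wait, or drop to slave mode */

              atomic_set(&kgdb_active, cpu);          /* only changed under dbg_master_lock */
              raw_spin_lock(&dbg_slave_lock);         /* slaves will spin try-locking this */
              kgdb_roundup_cpus(flags);               /* ask the other cpus to come in */

              /* wait until every online cpu is accounted for as master or slave */
              while (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb) !=
                     num_online_cpus())
                      cpu_relax();

              /* ... the master cpu debugs the system ... */
      }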
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
      dfee3a7b
    • debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter() · c1bb9a9c
      Committed by Dongdong Deng
      The slave cpus do not have the hw breakpoints disabled upon entry to
      the debug_core and as a result could cause unrecoverable recursive
      faults on badly placed breakpoints, or get out of sync with the arch
      specific hw breakpoint operations.
      
      This patch addresses the problem by invoking kgdb_disable_hw_debug()
      earlier in kgdb_cpu_enter() for each cpu that enters the debug core
      (a minimal sketch follows the flow below).
      
      The hw breakpoint dis/enable flow should be:
      
      master_debug_cpu   slave_debug_cpu
               \              /
                kgdb_cpu_enter
                      |
              kgdb_disable_hw_debug --> uninstall pre-enabled hw_breakpoint
                      |
       do add/rm/disable/enable operations on hw_breakpoints on master_debug_cpu..
                      |
              correct_hw_break --> correct/install the enabled hw_breakpoint
                      |
                 leave_kgdb
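
      A minimal sketch of where that call lands, assuming the kgdb_cpu_enter()
      signature of this era; the surrounding bookkeeping is elided:

      static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs)
      {
              int error = 0;

              /* ... per-cpu entry bookkeeping ... */

              /* park hw breakpoints on every entering cpu, master or slave */
              kgdb_disable_hw_debug(ks->linux_regs);

              /* ... master/slave election, breakpoint handling, correct_hw_break() ... */
              return error;
      }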
      Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      c1bb9a9c
    • debug_core: stop rcu warnings on kernel resume · fb70b588
      Committed by Jason Wessel
      When returning from the kernel debugger reset the rcu jiffies_stall
      value to prevent the rcu stall detector from sending NMI events which
      invoke a stack dump for each cpu in the system.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      fb70b588
    • debug_core: move all watch dog syncs to a single function · 16cdc628
      Committed by Jason Wessel
      Move the various clock and watch dog syncs to a single function in
      advance of adding another sync for the rcu stall detector.
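
      A hedged sketch of such a helper.  The callees are existing kernel
      interfaces; the function name follows the commit subject, but its exact
      contents are an assumption:

      /* one place to quiet every watchdog that would misfire after a long stop */
      static void dbg_touch_watchdogs(void)
      {
              touch_softlockup_watchdog_sync();   /* softlockup: resync sched clock first */
              clocksource_touch_watchdog();       /* clocksource watchdog */
              rcu_cpu_stall_reset();              /* rcu stall sync added by the patch above */
      }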
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      16cdc628
  16. 20 Aug 2010 (1 commit)
  17. 05 Aug 2010 (1 commit)
  18. 22 Jul 2010 (1 commit)
  19. 19 Jul 2010 (1 commit)
  20. 21 May 2010 (11 commits)
    • x86, kgdb, init: Add early and late debug states · 0b4b3827
      Committed by Jason Wessel
      The kernel debugger can operate well before mm_init(), but the x86
      hardware breakpoint code which uses the perf api requires that the
      kernel allocators are initialized.
      
      This means the kernel debug core needs to provide an optional arch
      specific call back to allow the initialization functions to run after
      the kernel has been further initialized.
      
      The kdb shell already had a similar restriction, with an early
      initialization and a late initialization.  The kdb_init() call was moved
      into the debug core's version of the late init, which is called
      dbg_late_init().
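
      A hedged sketch of the resulting split; the helper names appear in the
      commit text or in kernel headers of this era, but the bodies shown here
      are assumptions:

      /* arch hook: a no-op unless the arch needs allocator-dependent setup */
      void __weak kgdb_arch_late(void)
      {
      }

      /* called from init/main.c once the allocators (and perf) are usable */
      void __init dbg_late_init(void)
      {
              if (kgdb_io_module_registered)
                      kgdb_arch_late();       /* x86: set up perf-based hw breakpoints */
              kdb_init(KDB_INIT_FULL);        /* kdb's own late initialization */
      }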
      
      CC: kgdb-bugreport@lists.sourceforge.net
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      0b4b3827
    • kdb,debug_core: Allow the debug core to receive a panic notification · 4402c153
      Committed by Jason Wessel
      It is highly desirable to trap into kdb on panic.  The debug core will
      attempt to register as the first in line for the panic notifier.
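
      A sketch of how that registration can look with the standard panic
      notifier chain; the handler body and the INT_MAX priority express the
      "first in line" intent and are assumptions, not quotes from the patch:

      static int kgdb_panic_event(struct notifier_block *self,
                                  unsigned long val, void *data)
      {
              if (dbg_kdb_mode)
                      kdb_printf("PANIC: %s\n", (char *)data);
              kgdb_breakpoint();              /* trap into the debugger */
              return NOTIFY_DONE;
      }

      static struct notifier_block kgdb_panic_event_nb = {
              .notifier_call  = kgdb_panic_event,
              .priority       = INT_MAX,      /* try to run before other panic handlers */
      };

      static void __init dbg_register_panic_notifier(void)
      {
              atomic_notifier_chain_register(&panic_notifier_list,
                                             &kgdb_panic_event_nb);
      }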
      
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      4402c153
    • debug_core,kdb: Allow the debug core to process a recursive debug entry · 6d906340
      Committed by Jason Wessel
      This allows kdb to debug a crash within the kms code with a
      single level of recursive re-entry.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      6d906340
    • kgdb: Add the ability to schedule a breakpoint via a tasklet · 1cee5e35
      Committed by Jason Wessel
      Some kgdb I/O modules require the ability to create a breakpoint
      tasklet, such as kgdboc and external modules such as kgdboe.  The
      breakpoint tasklet is used as an asynchronous entry point into the
      debugger which will have a different function scope than the current
      execution path where it might not be safe to have an inline
      breakpoint.  This is true of some of the kgdb I/O drivers which share
      code with kgdb and with the rest of the kernel.
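
      A hedged sketch of such an entry point; the guard variable and the
      exported function name illustrate the "only one pending break-in" idea
      and are assumptions rather than the exact patch:

      static atomic_t kgdb_break_tasklet_var;

      static void kgdb_tasklet_bpt(unsigned long ignored)
      {
              kgdb_breakpoint();              /* now in tasklet context, a safe scope */
              atomic_set(&kgdb_break_tasklet_var, 0);
      }

      static DECLARE_TASKLET(kgdb_tasklet_breakpoint, kgdb_tasklet_bpt, 0);

      /* I/O drivers call this when an asynchronous "break in" request arrives */
      void kgdb_schedule_breakpoint(void)
      {
              if (atomic_read(&kgdb_break_tasklet_var) ||
                  atomic_read(&kgdb_active) != -1)
                      return;                 /* already pending, or already in the debugger */
              atomic_inc(&kgdb_break_tasklet_var);
              tasklet_schedule(&kgdb_tasklet_breakpoint);
      }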
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      1cee5e35
    • x86,kgdb: Add low level debug hook · f503b5ae
      Committed by Jason Wessel
      The only way the debugger can handle a trap inside rcu_lock,
      notify_die, or atomic_notifier_call_chain without a triple fault is
      to have a low level "first opportunity handler" in the int3 exception
      handler.
      
      Generally this will be something the vast majority of folks will not
      need, but for those who need it, it is added as a kernel .config
      option called KGDB_LOW_LEVEL_TRAP.
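
      A sketch of what the hook's call site can look like in the x86 int3
      handler, assuming a kgdb_ll_trap() entry point guarded by the new
      option; the exact arguments are an assumption:

      /* in do_int3(), before notify_die()/rcu are touched */
      #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
              if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
                              == NOTIFY_STOP)
                      return;
      #endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */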
      
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: x86@kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      f503b5ae
    • kgdb: remove post_primary_code references · 98ec1878
      Committed by Jason Wessel
      Remove all the references to kgdb_post_primary_code().  This
      function serves no useful purpose because you can obtain the same
      information from the "struct kgdb_state *ks" from within the
      debugger, if for some reason you want the data.
      
      Also remove the unintentional duplicate assignment for ks->ex_vector.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      98ec1878
    • kgdb: gdb "monitor" -> kdb passthrough · a0de055c
      Committed by Jason Wessel
      One of the driving forces behind integrating another front end (kdb)
      to the debug core is to allow front end commands to be accessible via
      gdb's monitor command.  It is true that you could write gdb macros to
      get certain data, but you may want to just use gdb to access the
      commands that are available in the kdb front end.
      
      This patch implements the Rcmd gdb stub packet.  In gdb you access
      this with the "monitor" command.  For instance you could type "monitor
      help", "monitor lsmod" or "monitor ps A" etc...
      
      There is no error checking or command restrictions on what you can and
      cannot access at this point.  Doing something like trying to set
      breakpoints with the monitor command is going to cause nothing but
      problems.  Perhaps in the future only the commands that are actually
      known to work with the gdb monitor command will be available.
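
      A hedged sketch of the stub side: the "monitor <cmd>" text arrives
      hex-encoded in a qRcmd packet and is handed to the kdb command parser.
      The function and buffer names here are illustrative, not the gdbstub's
      actual variables:

      /* decode "qRcmd,<hex>" into a plain string and run it as a kdb command */
      static void gdb_cmd_monitor(char *hex, size_t hexlen)
      {
              char cmd[BUFMAX];
              size_t i, len = hexlen / 2;

              if (len >= sizeof(cmd))
                      len = sizeof(cmd) - 1;
              for (i = 0; i < len; i++)
                      cmd[i] = (hex_to_bin(hex[2 * i]) << 4) |
                                hex_to_bin(hex[2 * i + 1]);
              cmd[len] = '\0';

              kdb_parse(cmd);         /* kdb output travels back to gdb as O packets */
      }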
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      a0de055c
    • kgdb,8250,pl011: Return immediately from console poll · f5316b4a
      Committed by Jason Wessel
      The design of the kdb shell requires that every device that can
      provide input to kdb have a polling routine that exits immediately if
      there is no character available.  This is required in order to get the
      page scrolling mechanism working.
      
      Changing the kernel debugger I/O API to require all polling character
      routines to exit immediately if there is no data allows the kernel
      debugger to process multiple input channels.
      
      NO_POLL_CHAR will be the return code from the polling routine whenever
      there is no character available.
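
      A minimal sketch of the required semantics in a uart driver's
      poll_get_char() hook; the 8250-style register access shown here is only
      an illustration:

      /* never block: report NO_POLL_CHAR when the receiver is empty */
      static int my_uart_poll_get_char(struct uart_port *port)
      {
              if (!(readb(port->membase + UART_LSR) & UART_LSR_DR))
                      return NO_POLL_CHAR;    /* nothing pending: return at once */

              return readb(port->membase + UART_RX);
      }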
      
      CC: linux-serial@vger.kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      f5316b4a
    • kgdb: core changes to support kdb · dcc78711
      Committed by Jason Wessel
      These are the minimum changes to the kgdb core in order to enable an
      API to connect a new front end (kdb) to the debug core.
      
      This patch introduces the dbg_kdb_mode variable, which controls where
      the user level I/O is routed.  It will be routed to the gdbstub (kgdb)
      or to the kdb front end, which is a simple shell available over the
      kgdboc connection.
      
      You can switch back and forth between kdb and the gdb stub mode of
      operation dynamically.  From gdb stub mode you can blindly type
      "$3#33", or from kdb mode you can enter "kgdb" to switch to the
      gdb stub.
      
      The logic in the debug core depends on kdb to look for the typical gdb
      connection sequences and return immediately with KGDB_PASS_EVENT if a
      gdb serial command sequence is detected.  That should allow a
      reasonably seamless transition between kdb -> gdb without leaving the
      kernel exception state.  The two gdb serial queries that kdb is
      responsible for detecting are the "?" and "qSupported" packets.
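
      A condensed sketch of that routing; the pass-event constant name follows
      the text above, and the handling is simplified compared to the real
      exception loop:

      /* inside the debug core's exception handling: route I/O to kdb or gdb */
      if (dbg_kdb_mode) {
              error = kdb_stub(ks);           /* kdb shell over the polled console */
              if (error == KGDB_PASS_EVENT) {
                      /* kdb saw "?" or "qSupported": hand over to the gdbstub */
                      dbg_kdb_mode = 0;
                      error = gdb_serial_stub(ks);
              }
      } else {
              error = gdb_serial_stub(ks);    /* classic kgdb protocol */
      }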
      
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Martin Hicks <mort@sgi.com>
      dcc78711
    • Separate the gdbstub from the debug core · 53197fc4
      Committed by Jason Wessel
      Split the former kernel/kgdb.c into debug_core.c, which contains the
      kernel debugger exception logic, and gdbstub.c, which contains the
      logic for allowing gdb to talk to the debug core.

      This also created a private include file called debug_core.h which
      contains all the definitions to glue the debug_core to any other
      debugger connections.
      
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      53197fc4
    • Move kernel/kgdb.c to kernel/debug/debug_core.c · c4338209
      Committed by Jason Wessel
      Move kgdb.c in preparation to separate the gdbstub from the debug
      core and exception handling.
      
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      c4338209
  21. 03 Apr 2010 (4 commits)
  22. 01 Feb 2010 (1 commit)
    • softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume · d6ad3e28
      Committed by Jason Wessel
      When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets
      the time from hardware such as the TSC on x86. In this
      configuration kgdb will report a softlockup warning message on
      resuming or detaching from a debug session.
      
      Sequence of events in the problem case:
      
       1) "cpu sched clock" and "hardware time" are at 100 sec prior
          to a call to kgdb_handle_exception()
      
       2) Debugger waits in kgdb_handle_exception() for 80 sec and on
          exit the following is called ...  touch_softlockup_watchdog() -->
          __raw_get_cpu_var(touch_timestamp) = 0;
      
       3) "cpu sched clock" = 100s (it was not updated, because the
          interrupt was disabled in kgdb) but the "hardware time" = 180 sec
      
       4) The first timer interrupt after resuming from
          kgdb_handle_exception updates the watchdog from the "cpu sched clock"
      
      update_process_times() {
          ...
          run_local_timers()
            --> softlockup_tick()
                  --> check (touch_timestamp == 0)  (it is "YES" here, we have
                      set "touch_timestamp = 0" at kgdb)
                  --> __touch_softlockup_watchdog()
                        ***(A)--> reset "touch_timestamp" to "get_timestamp()"
                                  (Here, the "touch_timestamp" will still be
                                  set to 100s.)
          ...
          scheduler_tick()
            ***(B)--> sched_clock_tick()  (update "cpu sched clock" to
                      "hardware time" = 180s)
          ...
      }
      
       5) The second timer interrupt handler appears to have a large
          jump and trips the softlockup warning.
      
      update_process_times() {
          ...
          run_local_timers()
            --> softlockup_tick()
                  --> "cpu sched clock" - "touch_timestamp" = 180s - 100s > 60s
                  --> printk "soft lockup error messages"
          ...
      }
      
      note: ***(A) reset "touch_timestamp" to
      "get_timestamp(this_cpu)"
      
      Why is "touch_timestamp" 100 sec, instead of 180 sec?
      
      When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of
      get_timestamp() is:
      
      get_timestamp(this_cpu)
       -->cpu_clock(this_cpu)
       -->sched_clock_cpu(this_cpu)
       -->__update_sched_clock(sched_clock_data, now)
      
      The __update_sched_clock() function uses the GTOD tick value to
      create a window to normalize the "now" values.  So if the "now"
      value is too big for sched_clock_data, it will be ignored.
      
      The fix is to invoke sched_clock_tick() to update "cpu sched
      clock" in order to recover from this state.  This is done by
      introducing the function touch_softlockup_watchdog_sync().  This
      allows kgdb to request that the sched clock be updated when the
      watchdog thread runs for the first time after a resume from kgdb.
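
      A hedged sketch of that mechanism: a per-cpu flag set by the new helper
      and consumed by the softlockup tick.  The flag name and the exact hook
      point inside softlockup_tick() are assumptions:

      static DEFINE_PER_CPU(bool, softlock_touch_sync);

      /* called by kgdb on resume: the sched clock must be resynced, not trusted */
      void touch_softlockup_watchdog_sync(void)
      {
              __raw_get_cpu_var(softlock_touch_sync) = true;
              __raw_get_cpu_var(touch_timestamp) = 0;
      }

      /* in softlockup_tick(), before the stall comparison */
      if (per_cpu(softlock_touch_sync, this_cpu)) {
              per_cpu(softlock_touch_sync, this_cpu) = false;
              sched_clock_tick();     /* pull "cpu sched clock" up to hardware time */
      }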
      
      [yong.zhang0@gmail.com: Use per cpu instead of an array]
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Dongdong Deng <Dongdong.Deng@windriver.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: peterz@infradead.org
      LKML-Reference: <1264631124-4837-2-git-send-email-jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d6ad3e28