1. 25 5月, 2010 1 次提交
  2. 22 5月, 2010 2 次提交
  3. 17 5月, 2010 1 次提交
    • H
      [S390] debug: enable exception-trace debug facility · ab3c68ee
      Heiko Carstens 提交于
      The exception-trace facility on x86 and other architectures prints
      traces to dmesg whenever a user space application crashes.
      s390 has such a feature since ages however it is called
      userprocess_debug and is enabled differently.
      This patch makes sure that whenever one of the two procfs files
      
      /proc/sys/kernel/userprocess_debug
      /proc/sys/debug/exception-trace
      
      is modified the contents of the second one changes as well.
      That way we keep backwards compatibilty but also support the same
      interface like other architectures do.
      Besides that the output of the traces is improved since it will now
      also contain the corresponding filename of the vma (when available)
      where the process caused a fault or trap.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      ab3c68ee
  4. 16 5月, 2010 2 次提交
    • O
      sysctl: add proc_do_large_bitmap · 9f977fb7
      Octavian Purdila 提交于
      The new function can be used to read/write large bitmaps via /proc. A
      comma separated range format is used for compact output and input
      (e.g. 1,3-4,10-10).
      
      Writing into the file will first reset the bitmap then update it
      based on the given input.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f977fb7
    • A
      sysctl: refactor integer handling proc code · 00b7c339
      Amerigo Wang 提交于
      (Based on Octavian's work, and I modified a lot.)
      
      As we are about to add another integer handling proc function a little
      bit of cleanup is in order: add a few helper functions to improve code
      readability and decrease code duplication.
      
      In the process a bug is also fixed: if the user specifies a number
      with more then 20 digits it will be interpreted as two integers
      (e.g. 10000...13 will be interpreted as 100.... and 13).
      
      Behavior for EFAULT handling was changed as well. Previous to this
      patch, when an EFAULT error occurred in the middle of a write
      operation, although some of the elements were set, that was not
      acknowledged to the user (by shorting the write and returning the
      number of bytes accepted). EFAULT is now treated just like any other
      errors by acknowledging the amount of bytes accepted.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00b7c339
  5. 14 4月, 2010 1 次提交
    • D
      Input: implement SysRq as a separate input handler · 97f5f0cd
      Dmitry Torokhov 提交于
      Instead of keeping SysRq support inside of legacy keyboard driver split
      it out into a separate input handler (filter). This stops most SysRq input
      events from leaking into evdev clients (some events, such as first SysRq
      scancode - not keycode - event, are still leaked into both legacy keyboard
      and evdev).
      
      [martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
       not defined]
      Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
      97f5f0cd
  6. 13 3月, 2010 8 次提交
  7. 01 3月, 2010 1 次提交
  8. 26 2月, 2010 1 次提交
    • M
      kprobes: Jump optimization sysctl interface · b2be84df
      Masami Hiramatsu 提交于
      Add /proc/sys/debug/kprobes-optimization sysctl which enables
      and disables kprobes jump optimization on the fly for debugging.
      
      Changes in v7:
       - Remove ctl_name = CTL_UNNUMBERED for upstream compatibility.
      
      Changes in v6:
      - Update comments and coding style.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Anders Kaseorg <andersk@ksplice.com>
      Cc: Tim Abbott <tabbott@ksplice.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      LKML-Reference: <20100225133415.6725.8274.stgit@localhost6.localdomain6>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b2be84df
  9. 18 12月, 2009 1 次提交
  10. 17 12月, 2009 1 次提交
  11. 16 12月, 2009 2 次提交
    • A
      'sysctl_max_map_count' should be non-negative · 70da2340
      Amerigo Wang 提交于
      Jan Engelhardt reported we have this problem:
      
      setting max_map_count to a value large enough results in programs dying at
      first try.  This is on 2.6.31.6:
      
      15:59 borg:/proc/sys/vm # echo $[1<<31-1] >max_map_count
      15:59 borg:/proc/sys/vm # cat max_map_count
      1073741824
      15:59 borg:/proc/sys/vm # echo $[1<<31] >max_map_count
      15:59 borg:/proc/sys/vm # cat max_map_count
      Killed
      
      This is because we have a chance to make 'max_map_count' negative.  but
      it's meaningless.  Make it only accept non-negative values.
      Reported-by: NJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: James Morris <jmorris@namei.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70da2340
    • L
      hugetlb: derive huge pages nodes allowed from task mempolicy · 06808b08
      Lee Schermerhorn 提交于
      This patch derives a "nodes_allowed" node mask from the numa mempolicy of
      the task modifying the number of persistent huge pages to control the
      allocation, freeing and adjusting of surplus huge pages when the pool page
      count is modified via the new sysctl or sysfs attribute
      "nr_hugepages_mempolicy".  The nodes_allowed mask is derived as follows:
      
      * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
        is produced.  This will cause the hugetlb subsystem to use
        node_online_map as the "nodes_allowed".  This preserves the
        behavior before this patch.
      * For "preferred" mempolicy, including explicit local allocation,
        a nodemask with the single preferred node will be produced.
        "local" policy will NOT track any internode migrations of the
        task adjusting nr_hugepages.
      * For "bind" and "interleave" policy, the mempolicy's nodemask
        will be used.
      * Other than to inform the construction of the nodes_allowed node
        mask, the actual mempolicy mode is ignored.  That is, all modes
        behave like interleave over the resulting nodes_allowed mask
        with no "fallback".
      
      See the updated documentation [next patch] for more information
      about the implications of this patch.
      
      Examples:
      
      Starting with:
      
      	Node 0 HugePages_Total:     0
      	Node 1 HugePages_Total:     0
      	Node 2 HugePages_Total:     0
      	Node 3 HugePages_Total:     0
      
      Default behavior [with or without this patch] balances persistent
      hugepage allocation across nodes [with sufficient contiguous memory]:
      
      	sysctl vm.nr_hugepages[_mempolicy]=32
      
      yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:     8
      	Node 3 HugePages_Total:     8
      
      Of course, we only have nr_hugepages_mempolicy with the patch,
      but with default mempolicy, nr_hugepages_mempolicy behaves the
      same as nr_hugepages.
      
      Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
      '--membind' because it allows multiple nodes to be specified
      and it's easy to type]--we can allocate huge pages on
      individual nodes or sets of nodes.  So, starting from the
      condition above, with 8 huge pages per node, add 8 more to
      node 2 using:
      
      	numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
      
      This yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The incremental 8 huge pages were restricted to node 2 by the
      specified mempolicy.
      
      Similarly, we can use mempolicy to free persistent huge pages
      from specified nodes:
      
      	numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
      
      yields:
      
      	Node 0 HugePages_Total:     4
      	Node 1 HugePages_Total:     4
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The 8 huge pages freed were balanced over nodes 0 and 1.
      
      [rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      06808b08
  12. 09 12月, 2009 3 次提交
  13. 19 11月, 2009 1 次提交
  14. 12 11月, 2009 1 次提交
  15. 11 11月, 2009 3 次提交
  16. 06 11月, 2009 1 次提交
  17. 24 9月, 2009 3 次提交
  18. 23 9月, 2009 1 次提交
  19. 22 9月, 2009 1 次提交
    • I
      printk: Remove ratelimit.h from kernel.h · 3fff4c42
      Ingo Molnar 提交于
      Decouple kernel.h from ratelimit.h: the global declaration of
      printk's ratelimit_state is not needed, and it leads to messy
      circular dependencies due to ratelimit.h's (new) adding of a
      spinlock_types.h include.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3fff4c42
  20. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  21. 16 9月, 2009 2 次提交
    • A
      HWPOISON: The high level memory error handler in the VM v7 · 6a46079c
      Andi Kleen 提交于
      Add the high level memory handler that poisons pages
      that got corrupted by hardware (typically by a two bit flip in a DIMM
      or a cache) on the Linux level. The goal is to prevent everyone
      from accessing these pages in the future.
      
      This done at the VM level by marking a page hwpoisoned
      and doing the appropriate action based on the type of page
      it is.
      
      The code that does this is portable and lives in mm/memory-failure.c
      
      To quote the overview comment:
      
      High level machine check handler. Handles pages reported by the
      hardware as being corrupted usually due to a 2bit ECC memory or cache
      failure.
      
      This focuses on pages detected as corrupted in the background.
      When the current CPU tries to consume corruption the currently
      running process can just be killed directly instead. This implies
      that if the error cannot be handled for some reason it's safe to
      just ignore it because no corruption has been consumed yet. Instead
      when that happens another machine check will happen.
      
      Handles page cache pages in various states. The tricky part
      here is that we can access any page asynchronous to other VM
      users, because memory failures could happen anytime and anywhere,
      possibly violating some of their assumptions. This is why this code
      has to be extremely careful. Generally it tries to use normal locking
      rules, as in get the standard locks, even if that means the
      error handling takes potentially a long time.
      
      Some of the operations here are somewhat inefficient and have non
      linear algorithmic complexity, because the data structures have not
      been optimized for this case. This is in particular the case
      for the mapping from a vma to a process. Since this case is expected
      to be rare we hope we can get away with this.
      
      There are in principle two strategies to kill processes on poison:
      - just unmap the data and wait for an actual reference before
      killing
      - kill as soon as corruption is detected.
      Both have advantages and disadvantages and should be used
      in different situations. Right now both are implemented and can
      be switched with a new sysctl vm.memory_failure_early_kill
      The default is early kill.
      
      The patch does some rmap data structure walking on its own to collect
      processes to kill. This is unusual because normally all rmap data structure
      knowledge is in rmap.c only. I put it here for now to keep
      everything together and rmap knowledge has been seeping out anyways
      
      Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
      Nick Piggin (who did a lot of great work) and others.
      
      Cc: npiggin@suse.de
      Cc: riel@redhat.com
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      6a46079c
    • J
      block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK · cb684b5b
      Jens Axboe 提交于
       kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled'
      
      Since the extern declaration makes the compile work, but the actual
      symbol is missing when block/blk-iopoll.o isn't linked in.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      cb684b5b
  22. 11 9月, 2009 1 次提交
    • J
      block: add blk-iopoll, a NAPI like approach for block devices · 5e605b64
      Jens Axboe 提交于
      This borrows some code from NAPI and implements a polled completion
      mode for block devices. The idea is the same as NAPI - instead of
      doing the command completion when the irq occurs, schedule a dedicated
      softirq in the hopes that we will complete more IO when the iopoll
      handler is invoked. Devices have a budget of commands assigned, and will
      stay in polled mode as long as they continue to consume their budget
      from the iopoll softirq handler. If they do not, the device is set back
      to interrupt completion mode.
      
      This patch holds the core bits for blk-iopoll, device driver support
      sold separately.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      5e605b64
  23. 09 9月, 2009 1 次提交
    • M
      sched: Turn off child_runs_first · 2bba22c5
      Mike Galbraith 提交于
      Set child_runs_first default to off.
      
      It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
      get preempted by child tasks, reducing parallelism.
      
      Note, this patch might make existing races in user
      applications more prominent than before - so breakages
      might be bisected to this commit.
      
      Child-runs-first is broken on SMP to begin with, and we
      already had it off briefly in v2.6.23 so most of the
      offenders ought to be fixed. Would be nice not to revert
      this commit but fix those apps finally ...
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
      [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
        people want to work around broken apps. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2bba22c5