1. 26 6月, 2009 1 次提交
    • K
      x86: Add sysctl to allow panic on IOCK NMI error · 5211a242
      Kurt Garloff 提交于
      This patch introduces a new sysctl:
      
          /proc/sys/kernel/panic_on_io_nmi
      
      which defaults to 0 (off).
      
      When enabled, the kernel panics when the kernel receives an NMI
      caused by an IO error.
      
      The IO error triggered NMI indicates a serious system
      condition, which could result in IO data corruption. Rather
      than contiuing, panicing and dumping might be a better choice,
      so one can figure out what's causing the IO error.
      
      This could be especially important to companies running IO
      intensive applications where corruption must be avoided, e.g. a
      bank's databases.
      
      [ SuSE has been shipping it for a while, it was done at the
        request of a large database vendor, for their users. ]
      Signed-off-by: NKurt Garloff <garloff@suse.de>
      Signed-off-by: NRoberto Angelino <robertangelino@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      LKML-Reference: <20090624213211.GA11291@kroah.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5211a242
  2. 23 6月, 2009 1 次提交
  3. 19 6月, 2009 1 次提交
  4. 17 6月, 2009 1 次提交
  5. 13 6月, 2009 1 次提交
  6. 11 6月, 2009 2 次提交
  7. 04 6月, 2009 1 次提交
  8. 26 5月, 2009 1 次提交
    • P
      perf_counter: Generic per counter interrupt throttle · a78ac325
      Peter Zijlstra 提交于
      Introduce a generic per counter interrupt throttle.
      
      This uses the perf_counter_overflow() quick disable to throttle a specific
      counter when its going too fast when a pmu->unthrottle() method is provided
      which can undo the quick disable.
      
      Power needs to implement both the quick disable and the unthrottle method.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090525153931.703093461@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a78ac325
  9. 15 5月, 2009 1 次提交
    • J
      Revert "mm: add /proc controls for pdflush threads" · cd17cbfd
      Jens Axboe 提交于
      This reverts commit fafd688e.
      
      Work is progressing to switch away from pdflush as the process backing
      for flushing out dirty data. So it seems pointless to add more knobs
      to control pdflush threads. The original author of the patch did not
      have any specific use cases for adding the knobs, so we can easily
      revert this before 2.6.30 to avoid having to maintain this API
      forever.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      cd17cbfd
  10. 13 5月, 2009 1 次提交
  11. 12 5月, 2009 1 次提交
    • H
      x86: add extension fields for bootloader type and version · 5031296c
      H. Peter Anvin 提交于
      A long ago, in days of yore, it all began with a god named Thor.
      There were vikings and boats and some plans for a Linux kernel
      header.  Unfortunately, a single 8-bit field was used for bootloader
      type and version.  This has generally worked without *too* much pain,
      but we're getting close to flat running out of ID fields.
      
      Add extension fields for both type and version.  The type will be
      extended if it the old field is 0xE; the version is a simple MSB
      extension.
      
      Keep /proc/sys/kernel/bootloader_type containing
      (type << 4) + (ver & 0xf) for backwards compatiblity, but also add
      /proc/sys/kernel/bootloader_version which contains the full version
      number.
      
      [ Impact: new feature to support more bootloaders ]
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5031296c
  12. 06 5月, 2009 1 次提交
  13. 03 5月, 2009 1 次提交
    • A
      mm: prevent divide error for small values of vm_dirty_bytes · 9e4a5bda
      Andrea Righi 提交于
      Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
      avoid potential division by 0 (like the following) in get_dirty_limits().
      
      [   49.951610] divide error: 0000 [#1] PREEMPT SMP
      [   49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
      [   49.952195] CPU 1
      [   49.952195] Modules linked in: pcspkr
      [   49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
      [   49.952195] RIP: 0010:[<ffffffff802d39a9>]  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
      [   49.952195] RSP: 0018:ffff88001de03a98  EFLAGS: 00010202
      [   49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
      [   49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
      [   49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
      [   49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
      [   49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
      [   49.952195] FS:  00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
      [   49.952195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
      [   49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
      [   49.952195] Stack:
      [   49.952195]  ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
      [   49.952195]  00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
      [   49.952195]  0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
      [   49.952195] Call Trace:
      [   49.952195]  [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
      [   49.952195]  [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120
      [   49.952195]  [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330
      [   49.952195]  [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460
      [   49.952195]  [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0
      [   49.952195]  [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0
      [   49.952195]  [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0
      [   49.952195]  [<ffffffff803034d1>] do_sync_write+0xf1/0x140
      [   49.952195]  [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60
      [   49.952195]  [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40
      [   49.952195]  [<ffffffff8030411b>] vfs_write+0xcb/0x190
      [   49.952195]  [<ffffffff803042d0>] sys_write+0x50/0x90
      [   49.952195]  [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b
      [   49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
      [   49.952195] RIP  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
      [   49.952195]  RSP <ffff88001de03a98>
      [   50.096523] ---[ end trace 008d7aa02f244d7b ]---
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e4a5bda
  14. 14 4月, 2009 1 次提交
  15. 09 4月, 2009 1 次提交
  16. 07 4月, 2009 2 次提交
    • P
      mm: add /proc controls for pdflush threads · fafd688e
      Peter W Morreale 提交于
      Add /proc entries to give the admin the ability to control the minimum and
      maximum number of pdflush threads.  This allows finer control of pdflush
      on both large and small machines.
      
      The rationale is simply one size does not fit all.  Admins on large and/or
      small systems may want to tune the min/max pdflush thread count to best
      suit their needs.  Right now the min/max is hardcoded to 2/8.  While
      probably a fair estimate for smaller machines, large machines with large
      numbers of CPUs and large numbers of filesystems/block devices may benefit
      from larger numbers of threads working on different block devices.
      
      Even if the background flushing algorithm is radically changed, it is
      still likely that multiple threads will be involved and admins would still
      desire finer control on the min/max other than to have to recompile the
      kernel.
      
      The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
      '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.
      
      The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
      the current value of nr_pdflush_threads_max.  This minimum is required
      since additional thread creation is performed in a pdflush thread itself.
      
      The minimum value for nr_pdflush_threads_max is the current value of
      nr_pdflush_threads_min and the maximum value can be 1000.
      
      Documentation/sysctl/vm.txt is also updated.
      
      [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
      Signed-off-by: NPeter W Morreale <pmorreale@novell.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fafd688e
    • L
      kernel/sysctl.c: avoid annoying warnings · cd5f9a4c
      Linus Torvalds 提交于
      Some of the limit constants are used only depending on some complex
      configuration dependencies, yet it's not worth making the simple
      variables depend on those configuration details.  Just mark them as
      perhaps not being unused, and avoid the warning.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd5f9a4c
  17. 03 4月, 2009 3 次提交
  18. 01 4月, 2009 1 次提交
  19. 13 2月, 2009 1 次提交
  20. 12 2月, 2009 1 次提交
  21. 16 1月, 2009 2 次提交
  22. 14 1月, 2009 2 次提交
  23. 08 1月, 2009 1 次提交
    • P
      NOMMU: Make mmap allocation page trimming behaviour configurable. · dd8632a1
      Paul Mundt 提交于
      NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
      the nearest power-of-2 number of pages.  Currently it then discards the excess
      pages back to the page allocator, making that memory available for use by other
      things.  This can, however, cause greater amount of fragmentation.
      
      To counter this, a sysctl is added in order to fine-tune the trimming
      behaviour.  The default behaviour remains to trim pages aggressively, while
      this can either be disabled completely or set to a higher page-granular
      watermark in order to have finer-grained control.
      
      vm region vm_top bits taken from an earlier patch by David Howells.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      dd8632a1
  24. 07 1月, 2009 1 次提交
    • D
      mm: add dirty_background_bytes and dirty_bytes sysctls · 2da02997
      David Rientjes 提交于
      This change introduces two new sysctls to /proc/sys/vm:
      dirty_background_bytes and dirty_bytes.
      
      dirty_background_bytes is the counterpart to dirty_background_ratio and
      dirty_bytes is the counterpart to dirty_ratio.
      
      With growing memory capacities of individual machines, it's no longer
      sufficient to specify dirty thresholds as a percentage of the amount of
      dirtyable memory over the entire system.
      
      dirty_background_bytes and dirty_bytes specify quantities of memory, in
      bytes, that represent the dirty limits for the entire system.  If either
      of these values is set, its value represents the amount of dirty memory
      that is needed to commence either background or direct writeback.
      
      When a `bytes' or `ratio' file is written, its counterpart becomes a
      function of the written value.  For example, if dirty_bytes is written to
      be 8096, 8K of memory is required to commence direct writeback.
      dirty_ratio is then functionally equivalent to 8K / the amount of
      dirtyable memory:
      
      	dirtyable_memory = free pages + mapped pages + file cache
      
      	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
      		-or-
      	dirty_background_ratio = dirty_background_bytes / dirtyable_memory
      
      		AND
      
      	dirty_bytes = dirty_ratio * dirtyable_memory
      		-or-
      	dirty_ratio = dirty_bytes / dirtyable_memory
      
      Only one of dirty_background_bytes and dirty_background_ratio may be
      specified at a time, and only one of dirty_bytes and dirty_ratio may be
      specified.  When one sysctl is written, the other appears as 0 when read.
      
      The `bytes' files operate on a page size granularity since dirty limits
      are compared with ZVC values, which are in page units.
      
      Prior to this change, the minimum dirty_ratio was 5 as implemented by
      get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
      written value between 0 and 100.  This restriction is maintained, but
      dirty_bytes has a lower limit of only one page.
      
      Also prior to this change, the dirty_background_ratio could not equal or
      exceed dirty_ratio.  This restriction is maintained in addition to
      restricting dirty_background_bytes.  If either background threshold equals
      or exceeds that of the dirty threshold, it is implicitly set to half the
      dirty threshold.
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2da02997
  25. 18 12月, 2008 1 次提交
    • S
      trace: add a way to enable or disable the stack tracer · f38f1d2a
      Steven Rostedt 提交于
      Impact: enhancement to stack tracer
      
      The stack tracer currently is either on when configured in or
      off when it is not. It can not be disabled when it is configured on.
      (besides disabling the function tracer that it uses)
      
      This patch adds a way to enable or disable the stack tracer at
      run time. It defaults off on bootup, but a kernel parameter 'stacktrace'
      has been added to enable it on bootup.
      
      A new sysctl has been added "kernel.stack_tracer_enabled" to let
      the user enable or disable the stack tracer at run time.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f38f1d2a
  26. 05 12月, 2008 1 次提交
    • D
      sparc64: Add tsb-ratio sysctl. · 0871420f
      David S. Miller 提交于
      Add a sysctl to tweak the RSS limit used to decide when to grow
      the TSB for an address space.
      
      In order to avoid expensive divides and multiplies only simply
      positive and negative powers of two are supported.
      
      The function computed takes the number of TSB translations that will
      fit at one time in the TSB of a given size, and either adds or
      subtracts a percentage of entries.  This final value is the
      RSS limit.
      
      See tsb_size_to_rss_limit().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0871420f
  27. 02 12月, 2008 1 次提交
    • D
      epoll: introduce resource usage limits · 7ef9964e
      Davide Libenzi 提交于
      It has been thought that the per-user file descriptors limit would also
      limit the resources that a normal user can request via the epoll
      interface.  Vegard Nossum reported a very simple program (a modified
      version attached) that can make a normal user to request a pretty large
      amount of kernel memory, well within the its maximum number of fds.  To
      solve such problem, default limits are now imposed, and /proc based
      configuration has been introduced.  A new directory has been created,
      named /proc/sys/fs/epoll/ and inside there, there are two configuration
      points:
      
        max_user_instances = Maximum number of devices - per user
      
        max_user_watches   = Maximum number of "watched" fds - per user
      
      The current default for "max_user_watches" limits the memory used by epoll
      to store "watches", to 1/32 of the amount of the low RAM.  As example, a
      256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
      That should be enough to not break existing heavy epoll users.  The
      default value for "max_user_instances" is set to 128, that should be
      enough too.
      
      This also changes the userspace, because a new error code can now come out
      from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
      listed, so that should be ok.
      
      [akpm@linux-foundation.org: use get_current_user()]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: NVegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef9964e
  28. 14 11月, 2008 1 次提交
  29. 04 11月, 2008 1 次提交
  30. 27 10月, 2008 1 次提交
    • S
      ftrace: ftrace dump on oops control · 944ac425
      Steven Rostedt 提交于
      Impact: add (default-off) dump-trace-on-oops flag
      
      Currently, ftrace is set up to dump its contents to the console if the
      kernel panics or oops. This can be annoying if you have trace data in
      the buffers and you experience an oops, but the trace data is old or
      static.
      
      Usually when you want ftrace to dump its contents is when you are debugging
      your system and you have set up ftrace to trace the events leading to
      an oops.
      
      This patch adds a control variable called "ftrace_dump_on_oops" that will
      enable the ftrace dump to console on oops. This variable is default off
      but a developer can enable it either through the kernel command line
      by adding "ftrace_dump_on_oops" or at run time by setting (or disabling)
      /proc/sys/kernel/ftrace_dump_on_oops.
      
      v2:
      
         Replaced /** with /* as Randy explained that kernel-doc does
          not yet handle variables.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      944ac425
  31. 21 10月, 2008 1 次提交
  32. 20 10月, 2008 2 次提交
  33. 17 10月, 2008 1 次提交