1. 27 October 2010 (2 commits)
    • printk: declare printk_ratelimit_state in ratelimit.h · f5d87d85
      Committed by Namhyung Kim
      Adding a declaration of printk_ratelimit_state to ratelimit.h removes
      potential build breakage and the following sparse warning:
      
       kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?
      
      [akpm@linux-foundation.org: remove unneeded ifdef]
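      
      For reference, a minimal sketch of the declaration being added (assumed
      form; the exact placement and guards in include/linux/ratelimit.h may
      differ):
      
              /* include/linux/ratelimit.h (sketch) */
              extern struct ratelimit_state printk_ratelimit_state;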
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: allow for more than 2^31 files · 518de9b3
      Committed by Eric Dumazet
      Robin Holt tried to boot a 16TB system and found that af_unix was
      overflowing a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      The fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and to change af_unix to use an atomic_long_t instead of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
      not strictly needed to address Robin's problem.
      
      Before the patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After the patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
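      
      As a rough user-space illustration of the overflow described above (not
      kernel code; the value is the max_files computed on the 16TB machine,
      assuming an LP64 system):
      
              #include <stdio.h>
      
              int main(void)
              {
                      int  max_files      = 1503238553;   /* fits in a 32-bit int */
                      long max_files_long = 1503238553L;  /* the widened type     */
      
                      /* 2 * 1503238553 = 3006477106 > INT_MAX, so the int
                       * multiplication overflows (undefined behaviour, in
                       * practice wrapping negative), which breaks the
                       * "> 2 * get_max_files()" check in unix_create1(). */
                      printf("int:  %d\n",  2 * max_files);
                      printf("long: %ld\n", 2 * max_files_long);  /* 3006477106 */
                      return 0;
              }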
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 26 October 2010 (3 commits)
    • fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8
      Committed by Christoph Hellwig
      The nr_dentry stat is a globally touched cacheline and atomic operation
      twice over the lifetime of a dentry.  It is used for the benefit of
      userspace only.  Turn it into a per-cpu counter and always decrement it
      in d_free instead of doing various batching operations to reduce lock
      hold times in the callers.
      
      Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
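      
      A simplified sketch of the per-cpu counter approach (illustrative only;
      the init call signature and the exact hook points differ across kernel
      versions and from the final patch):
      
              static struct percpu_counter nr_dentry;
      
              /* at dcache initialisation */
              percpu_counter_init(&nr_dentry, 0);
      
              /* bump/drop without contending on a shared atomic cacheline */
              percpu_counter_inc(&nr_dentry);   /* when a dentry is allocated */
              percpu_counter_dec(&nr_dentry);   /* in d_free()                */
      
              /* only the userspace-visible read (e.g. /proc/sys/fs/dentry-state)
               * has to fold the per-cpu deltas together */
              dentry_stat.nr_dentry = percpu_counter_sum_positive(&nr_dentry);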
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa
      Committed by Dave Chinner
      The number of inodes allocated does not need to be tied to the
      addition or removal of an inode to/from a list. If we are not tied
      to a list lock, we could update the counters when inodes are
      initialised or destroyed, but to do that we need to convert the
      counters to be per-cpu (i.e. independent of a lock). This means that
      we have the freedom to change the list/locking implementation
      without needing to care about the counters.
      
      Based on a patch originally from Eric Dumazet.
      
      [AV: cleaned up a bit, fixed build breakage on weird configs]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: allow for more than 2^31 files · 7e360c38
      Committed by Eric Dumazet
      Andrew,
      
      Could you please review this patch?  You are probably the right person
      to take it, because it crosses the fs and net trees.
      
      Note: /proc/sys/fs/file-nr is a read-only file, so this patch doesn't
      depend on the previous patch (sysctl: fix min/max handling in
      __do_proc_doulongvec_minmax()).
      
      Thanks !
      
      [PATCH V4] fs: allow for more than 2^31 files
      
      Robin Holt tried to boot a 16TB system and found that af_unix was
      overflowing a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      The fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and to change af_unix to use an atomic_long_t instead of
      atomic_t.
      
      get_max_files() is changed to return an unsigned long.
      get_nr_files() is changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
      not strictly needed to address Robin's problem.
      
      Before the patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After the patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 08 October 2010 (1 commit)
  4. 05 September 2010 (1 commit)
  5. 10 August 2010 (1 commit)
  6. 06 August 2010 (1 commit)
  7. 28 July 2010 (2 commits)
  8. 23 July 2010 (1 commit)
  9. 03 June 2010 (1 commit)
  10. 26 May 2010 (1 commit)
  11. 25 May 2010 (2 commits)
  12. 22 May 2010 (2 commits)
  13. 17 May 2010 (1 commit)
    • [S390] debug: enable exception-trace debug facility · ab3c68ee
      Committed by Heiko Carstens
      The exception-trace facility on x86 and other architectures prints
      traces to dmesg whenever a user space application crashes.
      s390 has had such a feature for ages; however, it is called
      userprocess_debug and is enabled differently.
      This patch makes sure that whenever one of the two procfs files
      
      /proc/sys/kernel/userprocess_debug
      /proc/sys/debug/exception-trace
      
      is modified, the contents of the second one change as well.
      That way we keep backward compatibility while also supporting the same
      interface as other architectures do.
      Besides that, the output of the traces is improved, since it will now
      also contain the corresponding filename of the vma (when available)
      where the process caused a fault or trap.
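      
      A hedged sketch of one way to get this mirroring (table name illustrative;
      the actual patch may differ in detail): let both sysctl entries back the
      same integer, given that show_unhandled_signals already backs
      /proc/sys/debug/exception-trace:
      
              static struct ctl_table s390_userprocess_debug_table[] = {
                      {
                              .procname     = "userprocess_debug",
                              .data         = &show_unhandled_signals,
                              .maxlen       = sizeof(int),
                              .mode         = 0644,
                              .proc_handler = proc_dointvec,
                      },
                      { }
              };
              /* registered under /proc/sys/kernel/, so a write through either
               * procfs file is visible through the other one */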
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  14. 16 May 2010 (2 commits)
    • sysctl: add proc_do_large_bitmap · 9f977fb7
      Committed by Octavian Purdila
      The new function can be used to read/write large bitmaps via /proc.  A
      comma-separated range format is used for compact output and input
      (e.g. 1,3-4,10-10).
      
      Writing to the file first resets the bitmap and then updates it based
      on the given input.
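      
      A hypothetical table entry wiring a bitmap to /proc via the new handler
      (names are illustrative, not from the patch; .data points at the bitmap
      pointer and .maxlen is the number of bits):
      
              static unsigned long *example_bitmap;
      
              static struct ctl_table example_table[] = {
                      {
                              .procname     = "example_bitmap",
                              .data         = &example_bitmap,
                              .maxlen       = 65536,           /* bits */
                              .mode         = 0644,
                              .proc_handler = proc_do_large_bitmap,
                      },
                      { }
              };
      
              /* e.g.: echo 1,3-4,10 > /proc/sys/.../example_bitmap
               * first clears the bitmap, then sets bits 1, 3, 4 and 10 */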
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sysctl: refactor integer handling proc code · 00b7c339
      Committed by Amerigo Wang
      (Based on Octavian's work; I modified a lot.)
      
      As we are about to add another integer handling proc function, a little
      bit of cleanup is in order: add a few helper functions to improve code
      readability and decrease code duplication.
      
      In the process a bug is also fixed: if the user specifies a number
      with more than 20 digits, it will be interpreted as two integers
      (e.g. 10000...13 will be interpreted as 100.... and 13).
      
      Behavior for EFAULT handling was changed as well.  Prior to this
      patch, when an EFAULT error occurred in the middle of a write
      operation, although some of the elements were set, that was not
      acknowledged to the user (by shortening the write and returning the
      number of bytes accepted).  EFAULT is now treated just like any other
      error, by acknowledging the number of bytes accepted.
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 13 May 2010 (3 commits)
    • lockup_detector: Remove old softlockup code · 2508ce18
      Committed by Don Zickus
      Now that it is no longer compiled or used, just remove it.
      
      Also move some of the code wrapped with DETECT_SOFTLOCKUP to the
      LOCKUP_DETECTOR wrappers because that is the code that uses it now.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • lockup_detector: Touch_softlockup cleanups and softlockup_tick removal · 332fbdbc
      Committed by Don Zickus
      Just some code cleanup to make touch_softlockup clearer and remove the
      softlockup_tick function as it is no longer needed.
      
      Also remove the /proc softlockup_thres call as it has been changed to
      watchdog_thres.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • lockup_detector: Combine nmi_watchdog and softlockup detector · 58687acb
      Committed by Don Zickus
      The new nmi_watchdog (which uses the perf event subsystem) is very
      similar in structure to the softlockup detector.  Using Ingo's
      suggestion, I combined the two functionalities into one file:
      kernel/watchdog.c.
      
      Now both the nmi_watchdog (or hardlockup detector) and softlockup
      detector sit on top of the perf event subsystem, which is run every
      60 seconds or so to see if there are any lockups.
      
      To detect hardlockups, cpus not responding to interrupts, I
      implemented an hrtimer that runs 5 times for every perf event
      overflow event.  If that stops counting on a cpu, then the cpu is
      most likely in trouble.
      
      To detect softlockups, tasks not yielding to the scheduler, I used the
      previous kthread idea that now gets kicked every time the hrtimer fires.
      If the kthread isn't being scheduled, neither is anyone else, and the
      warning is printed to the console.
      
      I tested this on x86_64 and both the softlockup and hardlockup paths
      work.
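      
      A simplified sketch of the hardlockup side of the mechanism described
      above (per-cpu names follow kernel/watchdog.c, but treat this as
      illustrative rather than the exact patch):
      
              static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
              static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
      
              /* the watchdog hrtimer callback bumps this counter on every tick */
              static void watchdog_interrupt_count(void)
              {
                      __this_cpu_inc(hrtimer_interrupts);
              }
      
              /* the perf NMI callback asks: did the hrtimer make progress since
               * the last overflow?  If not, interrupts are stuck on this cpu. */
              static int is_hardlockup(void)
              {
                      unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
      
                      if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
                              return 1;
      
                      __this_cpu_write(hrtimer_interrupts_saved, hrint);
                      return 0;
              }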
      
      V2:
      - cleaned up the Kconfig and softlockup combination
      - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
      - separated out the softlockup case from the perf event subsystem
      - re-arranged the enabling/disabling nmi watchdog from proc space
      - added cpumasks for hardlockup failure cases
      - removed fallback to soft events if no PMU exists for hard events
      
      V3:
      - comment cleanups
      - drop support for older softlockup code
      - per_cpu cleanups
      - completely remove software clock base hardlockup detector
      - use per_cpu masking on hard/soft lockup detection
      - #ifdef cleanups
      - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
      - documentation additions
      
      V4:
      - documentation fixes
      - convert per_cpu to __get_cpu_var
      - powerpc compile fixes
      
      V5:
      - split apart warn flags for hard and soft lockups
      
      TODO:
      - figure out how to make an arch-agnostic clock2cycles call
        (if possible) to feed into perf events as a sample period
      
      [fweisbec: merged conflict patch]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  16. 14 April 2010 (1 commit)
    • Input: implement SysRq as a separate input handler · 97f5f0cd
      Committed by Dmitry Torokhov
      Instead of keeping SysRq support inside of the legacy keyboard driver,
      split it out into a separate input handler (filter).  This stops most
      SysRq input events from leaking into evdev clients (some events, such
      as the first SysRq scancode - not keycode - event, are still leaked
      into both the legacy keyboard driver and evdev).
      
      [martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
       not defined]
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
  17. 13 March 2010 (8 commits)
  18. 01 March 2010 (1 commit)
  19. 26 February 2010 (1 commit)
    • kprobes: Jump optimization sysctl interface · b2be84df
      Committed by Masami Hiramatsu
      Add /proc/sys/debug/kprobes-optimization sysctl which enables
      and disables kprobes jump optimization on the fly for debugging.
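      
      A hedged sketch of the sysctl wiring (handler and variable names as found
      in kernel/kprobes.c, but treat this as illustrative, not the patch itself):
      
              static struct ctl_table kprobe_sysctl_table[] = {
                      {
                              .procname     = "kprobes-optimization",
                              .data         = &sysctl_kprobes_optimization,
                              .maxlen       = sizeof(int),
                              .mode         = 0644,
                              .proc_handler = proc_kprobes_optimization_handler,
                      },
                      { }
              };
              /* echo 0 > /proc/sys/debug/kprobes-optimization  -> unoptimize
               * echo 1 > /proc/sys/debug/kprobes-optimization  -> re-optimize */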
      
      Changes in v7:
       - Remove ctl_name = CTL_UNNUMBERED for upstream compatibility.
      
      Changes in v6:
      - Update comments and coding style.
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Anders Kaseorg <andersk@ksplice.com>
      Cc: Tim Abbott <tabbott@ksplice.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      LKML-Reference: <20100225133415.6725.8274.stgit@localhost6.localdomain6>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 14 February 2010 (1 commit)
  21. 18 December 2009 (1 commit)
  22. 17 December 2009 (1 commit)
  23. 16 December 2009 (2 commits)
    • 'sysctl_max_map_count' should be non-negative · 70da2340
      Committed by Amerigo Wang
      Jan Engelhardt reported that we have this problem:
      
      setting max_map_count to a large enough value results in programs dying
      at the first try.  This is on 2.6.31.6:
      
      15:59 borg:/proc/sys/vm # echo $[1<<31-1] >max_map_count
      15:59 borg:/proc/sys/vm # cat max_map_count
      1073741824
      15:59 borg:/proc/sys/vm # echo $[1<<31] >max_map_count
      15:59 borg:/proc/sys/vm # cat max_map_count
      Killed
      
      This is because we have a chance to make 'max_map_count' negative, but
      that is meaningless.  Make it accept only non-negative values.
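      
      A sketch of the usual fix for this class of bug: register the sysctl with
      proc_dointvec_minmax and a zero lower bound so negative writes are
      rejected (entry shown in the style of kernel/sysctl.c; illustrative, not
      the exact patch):
      
              static int zero;
      
              static struct ctl_table vm_table_entry[] = {
                      {
                              .procname     = "max_map_count",
                              .data         = &sysctl_max_map_count,
                              .maxlen       = sizeof(sysctl_max_map_count),
                              .mode         = 0644,
                              .proc_handler = proc_dointvec_minmax,
                              .extra1       = &zero,   /* reject values < 0 */
                      },
                      { }
              };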
      Reported-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: James Morris <jmorris@namei.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: derive huge pages nodes allowed from task mempolicy · 06808b08
      Committed by Lee Schermerhorn
      This patch derives a "nodes_allowed" node mask from the numa mempolicy of
      the task modifying the number of persistent huge pages to control the
      allocation, freeing and adjusting of surplus huge pages when the pool page
      count is modified via the new sysctl or sysfs attribute
      "nr_hugepages_mempolicy".  The nodes_allowed mask is derived as follows:
      
      * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
        is produced.  This will cause the hugetlb subsystem to use
        node_online_map as the "nodes_allowed".  This preserves the
        behavior before this patch.
      * For "preferred" mempolicy, including explicit local allocation,
        a nodemask with the single preferred node will be produced.
        "local" policy will NOT track any internode migrations of the
        task adjusting nr_hugepages.
      * For "bind" and "interleave" policy, the mempolicy's nodemask
        will be used.
      * Other than to inform the construction of the nodes_allowed node
        mask, the actual mempolicy mode is ignored.  That is, all modes
        behave like interleave over the resulting nodes_allowed mask
        with no "fallback".
      
      See the updated documentation [next patch] for more information
      about the implications of this patch.
      
      Examples:
      
      Starting with:
      
      	Node 0 HugePages_Total:     0
      	Node 1 HugePages_Total:     0
      	Node 2 HugePages_Total:     0
      	Node 3 HugePages_Total:     0
      
      Default behavior [with or without this patch] balances persistent
      hugepage allocation across nodes [with sufficient contiguous memory]:
      
      	sysctl vm.nr_hugepages[_mempolicy]=32
      
      yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:     8
      	Node 3 HugePages_Total:     8
      
      Of course, we only have nr_hugepages_mempolicy with the patch,
      but with default mempolicy, nr_hugepages_mempolicy behaves the
      same as nr_hugepages.
      
      Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
      '--membind' because it allows multiple nodes to be specified
      and it's easy to type]--we can allocate huge pages on
      individual nodes or sets of nodes.  So, starting from the
      condition above, with 8 huge pages per node, add 8 more to
      node 2 using:
      
      	numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
      
      This yields:
      
      	Node 0 HugePages_Total:     8
      	Node 1 HugePages_Total:     8
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The incremental 8 huge pages were restricted to node 2 by the
      specified mempolicy.
      
      Similarly, we can use mempolicy to free persistent huge pages
      from specified nodes:
      
      	numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
      
      yields:
      
      	Node 0 HugePages_Total:     4
      	Node 1 HugePages_Total:     4
      	Node 2 HugePages_Total:    16
      	Node 3 HugePages_Total:     8
      
      The 8 huge pages freed were balanced over nodes 0 and 1.
      
      [rientjes@google.com: accommodate reworked NODEMASK_ALLOC]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>