1. 24 3月, 2011 2 次提交
  2. 08 3月, 2011 1 次提交
    • A
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro 提交于
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
  3. 23 2月, 2011 1 次提交
  4. 16 2月, 2011 1 次提交
  5. 03 2月, 2011 1 次提交
    • R
      sched: Use a buddy to implement yield_task_fair() · ac53db59
      Rik van Riel 提交于
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" task to be different processen in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ac53db59
  6. 02 2月, 2011 1 次提交
  7. 25 1月, 2011 1 次提交
  8. 14 1月, 2011 3 次提交
  9. 10 12月, 2010 1 次提交
  10. 30 11月, 2010 1 次提交
    • M
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith 提交于
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the preveious task group is dropped.  Child
      processes inherit this task group thereafter, and increase it's
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting it's nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5091faa4
  11. 18 11月, 2010 3 次提交
    • P
      sched: Add sysctl_sched_shares_window · a7a4f8a7
      Paul Turner 提交于
      Introduce a new sysctl for the shares window and disambiguate it from
      sched_time_avg.
      
      A 10ms window appears to be a good compromise between accuracy and performance.
      Signed-off-by: NPaul Turner <pjt@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234938.112173964@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a7a4f8a7
    • P
      sched: Rewrite tg_shares_up) · 2069dd75
      Peter Zijlstra 提交于
      By tracking a per-cpu load-avg for each cfs_rq and folding it into a
      global task_group load on each tick we can rework tg_shares_up to be
      strictly per-cpu.
      
      This should improve cpu-cgroup performance for smp systems
      significantly.
      
      [ Paul: changed to use queueing cfs_rq + bug fixes ]
      Signed-off-by: NPaul Turner <pjt@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.580480400@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2069dd75
    • D
      x86, nmi_watchdog: Remove the old nmi_watchdog · 5f2b0ba4
      Don Zickus 提交于
      Now that we have a new nmi_watchdog that is more generic and
      sits on top of the perf subsystem, we really do not need the old
      nmi_watchdog any more.
      
      In addition, the old nmi_watchdog doesn't really work if you are
      using the default clocksource, hpet.  The old nmi_watchdog code
      relied on local apic interrupts to determine if the cpu is still
      alive.  With hpet as the clocksource, these interrupts don't
      increment any more and the old nmi_watchdog triggers false
      postives.
      
      This piece removes the old nmi_watchdog code and stubs out any
      variables and functions calls.  The stubs are the same ones used
      by the new nmi_watchdog code, so it should be well tested.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5f2b0ba4
  12. 16 11月, 2010 1 次提交
  13. 12 11月, 2010 1 次提交
  14. 27 10月, 2010 2 次提交
    • N
      printk: declare printk_ratelimit_state in ratelimit.h · f5d87d85
      Namhyung Kim 提交于
      Adding declaration of printk_ratelimit_state in ratelimit.h removes
      potential build breakage and following sparse warning:
      
       kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?
      
      [akpm@linux-foundation.org: remove unneeded ifdef]
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5d87d85
    • E
      fs: allow for more than 2^31 files · 518de9b3
      Eric Dumazet 提交于
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, while not
      strictly needed to address Robin problem.
      
      Before patch (on a 64bit kernel) :
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Reviewed-by: NRobin Holt <holt@sgi.com>
      Tested-by: NRobin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      518de9b3
  15. 26 10月, 2010 3 次提交
    • C
      fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8
      Christoph Hellwig 提交于
      The nr_dentry stat is a globally touched cacheline and atomic operation
      twice over the lifetime of a dentry. It is used for the benfit of userspace
      only. Turn it into a per-cpu counter and always decrement it in d_free instead
      of doing various batching operations to reduce lock hold times in the callers.
      
      Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      312d3ca8
    • D
      fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa
      Dave Chinner 提交于
      The number of inodes allocated does not need to be tied to the
      addition or removal of an inode to/from a list. If we are not tied
      to a list lock, we could update the counters when inodes are
      initialised or destroyed, but to do that we need to convert the
      counters to be per-cpu (i.e. independent of a lock). This means that
      we have the freedom to change the list/locking implementation
      without needing to care about the counters.
      
      Based on a patch originally from Eric Dumazet.
      
      [AV: cleaned up a bit, fixed build breakage on weird configs
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cffbc8aa
    • E
      fs: allow for more than 2^31 files · 7e360c38
      Eric Dumazet 提交于
      Andrew,
      
      Could you please review this patch, you probably are the right guy to
      take it, because it crosses fs and net trees.
      
      Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt
      depend on previous patch (sysctl: fix min/max handling in
      __do_proc_doulongvec_minmax())
      
      Thanks !
      
      [PATCH V4] fs: allow for more than 2^31 files
      
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of
      atomic_t.
      
      get_max_files() is changed to return an unsigned long.
      get_nr_files() is changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, while not
      strictly needed to address Robin problem.
      
      Before patch (on a 64bit kernel) :
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Reviewed-by: NRobin Holt <holt@sgi.com>
      Tested-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7e360c38
  16. 08 10月, 2010 1 次提交
  17. 05 9月, 2010 1 次提交
  18. 10 8月, 2010 1 次提交
  19. 06 8月, 2010 1 次提交
  20. 28 7月, 2010 2 次提交
  21. 23 7月, 2010 1 次提交
  22. 03 6月, 2010 1 次提交
  23. 26 5月, 2010 1 次提交
  24. 25 5月, 2010 2 次提交
  25. 22 5月, 2010 2 次提交
  26. 17 5月, 2010 1 次提交
    • H
      [S390] debug: enable exception-trace debug facility · ab3c68ee
      Heiko Carstens 提交于
      The exception-trace facility on x86 and other architectures prints
      traces to dmesg whenever a user space application crashes.
      s390 has such a feature since ages however it is called
      userprocess_debug and is enabled differently.
      This patch makes sure that whenever one of the two procfs files
      
      /proc/sys/kernel/userprocess_debug
      /proc/sys/debug/exception-trace
      
      is modified the contents of the second one changes as well.
      That way we keep backwards compatibilty but also support the same
      interface like other architectures do.
      Besides that the output of the traces is improved since it will now
      also contain the corresponding filename of the vma (when available)
      where the process caused a fault or trap.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      ab3c68ee
  27. 16 5月, 2010 2 次提交
    • O
      sysctl: add proc_do_large_bitmap · 9f977fb7
      Octavian Purdila 提交于
      The new function can be used to read/write large bitmaps via /proc. A
      comma separated range format is used for compact output and input
      (e.g. 1,3-4,10-10).
      
      Writing into the file will first reset the bitmap then update it
      based on the given input.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f977fb7
    • A
      sysctl: refactor integer handling proc code · 00b7c339
      Amerigo Wang 提交于
      (Based on Octavian's work, and I modified a lot.)
      
      As we are about to add another integer handling proc function a little
      bit of cleanup is in order: add a few helper functions to improve code
      readability and decrease code duplication.
      
      In the process a bug is also fixed: if the user specifies a number
      with more then 20 digits it will be interpreted as two integers
      (e.g. 10000...13 will be interpreted as 100.... and 13).
      
      Behavior for EFAULT handling was changed as well. Previous to this
      patch, when an EFAULT error occurred in the middle of a write
      operation, although some of the elements were set, that was not
      acknowledged to the user (by shorting the write and returning the
      number of bytes accepted). EFAULT is now treated just like any other
      errors by acknowledging the amount of bytes accepted.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00b7c339
  28. 13 5月, 2010 1 次提交
    • D
      lockup_detector: Remove old softlockup code · 2508ce18
      Don Zickus 提交于
      Now that is no longer compiled or used, just remove it.
      
      Also move some of the code wrapped with DETECT_SOFTLOCKUP to the
      LOCKUP_DETECTOR wrappers because that is the code that uses it now.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      2508ce18