1. 08 Mar, 2011 1 commit
    • A
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro committed
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
  2. 25 Jan, 2011 1 commit
  3. 14 Jan, 2011 3 commits
  4. 10 Dec, 2010 1 commit
  5. 30 Nov, 2010 1 commit
    • M
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith committed
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the previous task group is dropped.  Child
      processes inherit this task group thereafter, and increase its
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting its nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5091faa4
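The reference counting described in the Implementation paragraph above can be pictured with a small user-space sketch. Every name here (`struct autogroup` and its fields, `process_fork()`, `process_setsid()`, `process_exit()`, the `live_groups` counter) is hypothetical and single-threaded; it is not the kernel code, which also has to deal with locking and with moving the task between groups.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the refcounted autogroup struct. */
struct autogroup {
    int refs;  /* number of processes pointing at this group */
    int nice;  /* group nice level, unused in this sketch */
};

/* Number of groups currently allocated, kept only for illustration. */
static int live_groups;

static struct autogroup *autogroup_create(void)
{
    struct autogroup *ag = malloc(sizeof(*ag));

    if (!ag)
        return NULL;
    ag->refs = 1;
    ag->nice = 0;
    live_groups++;
    return ag;
}

static struct autogroup *autogroup_get(struct autogroup *ag)
{
    ag->refs++;
    return ag;
}

/* Drop one reference; the group is destroyed with its last user. */
static void autogroup_put(struct autogroup *ag)
{
    if (--ag->refs == 0) {
        live_groups--;
        free(ag);
    }
}

/* Hypothetical stand-in for the per-process (signal struct) pointer. */
struct process {
    struct autogroup *ag;
};

/* A child inherits the parent's group and takes its own reference. */
static void process_fork(const struct process *parent, struct process *child)
{
    child->ag = autogroup_get(parent->ag);
}

/* setsid(): create a new group, move into it, drop the previous one. */
static int process_setsid(struct process *p)
{
    struct autogroup *fresh = autogroup_create();

    if (!fresh)
        return -1;
    autogroup_put(p->ag);
    p->ag = fresh;
    return 0;
}

/* Exit of the last thread drops the process's reference. */
static void process_exit(struct process *p)
{
    autogroup_put(p->ag);
    p->ag = NULL;
}
```

A caller would create one group for the first process, call `process_fork()` for each child, and rely on `process_exit()` to free a group once nobody references it.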
  6. 18 Nov, 2010 3 commits
    • P
      sched: Add sysctl_sched_shares_window · a7a4f8a7
      Paul Turner committed
      Introduce a new sysctl for the shares window and disambiguate it from
      sched_time_avg.
      
      A 10ms window appears to be a good compromise between accuracy and performance.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234938.112173964@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a7a4f8a7
    • P
      sched: Rewrite tg_shares_up) · 2069dd75
      Peter Zijlstra committed
      By tracking a per-cpu load-avg for each cfs_rq and folding it into a
      global task_group load on each tick we can rework tg_shares_up to be
      strictly per-cpu.
      
      This should improve cpu-cgroup performance for smp systems
      significantly.
      
      [ Paul: changed to use queueing cfs_rq + bug fixes ]
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.580480400@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2069dd75
    • D
      x86, nmi_watchdog: Remove the old nmi_watchdog · 5f2b0ba4
      Don Zickus committed
      Now that we have a new nmi_watchdog that is more generic and
      sits on top of the perf subsystem, we really do not need the old
      nmi_watchdog any more.
      
      In addition, the old nmi_watchdog doesn't really work if you are
      using the default clocksource, hpet.  The old nmi_watchdog code
      relied on local apic interrupts to determine if the cpu is still
      alive.  With hpet as the clocksource, these interrupts don't
      increment any more and the old nmi_watchdog triggers false
      positives.
      
      This piece removes the old nmi_watchdog code and stubs out any
      variables and function calls.  The stubs are the same ones used
      by the new nmi_watchdog code, so it should be well tested.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5f2b0ba4
  7. 16 Nov, 2010 1 commit
  8. 12 Nov, 2010 1 commit
  9. 27 Oct, 2010 2 commits
    • N
      printk: declare printk_ratelimit_state in ratelimit.h · f5d87d85
      Namhyung Kim committed
      Adding declaration of printk_ratelimit_state in ratelimit.h removes
      potential build breakage and following sparse warning:
      
       kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?
      
      [akpm@linux-foundation.org: remove unneeded ifdef]
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5d87d85
    • E
      fs: allow for more than 2^31 files · 518de9b3
      Eric Dumazet committed
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, while not
      strictly needed to address Robin's problem.
      
      Before patch (on a 64bit kernel) :
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      518de9b3
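The arithmetic in Robin Holt's quoted report can be checked with a short sketch. The helper names below are made up for illustration, and the 4096-byte page size is an assumption chosen because it reproduces the max_files figure in the report; the formula itself is the files_init() expression quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The files_init() formula from the report, evaluated in 64-bit
 * arithmetic so the intermediate product cannot wrap.
 */
static uint64_t max_files_for(uint64_t mempages, uint64_t page_size)
{
    return (mempages * (page_size / 1024)) / 10;
}

/*
 * True when 2 * max_files, the limit unix_create1() compares against,
 * no longer fits in a signed 32-bit integer.
 */
static bool doubled_limit_overflows_int32(uint64_t max_files)
{
    return 2 * max_files > (uint64_t)INT32_MAX;
}
```

With mempages = 3,758,096,384 and 4 KiB pages, `max_files_for()` gives 1,503,238,553, and doubling that is 3,006,477,106, which is above the signed 32-bit maximum of 2,147,483,647.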
  10. 26 Oct, 2010 3 commits
    • C
      fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8
      Christoph Hellwig committed
      The nr_dentry stat is a globally touched cacheline and atomic operation
      twice over the lifetime of a dentry. It is used for the benefit of userspace
      only. Turn it into a per-cpu counter and always decrement it in d_free instead
      of doing various batching operations to reduce lock hold times in the callers.
      
      Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      312d3ca8
    • D
      fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa
      Dave Chinner committed
      The number of inodes allocated does not need to be tied to the
      addition or removal of an inode to/from a list. If we are not tied
      to a list lock, we could update the counters when inodes are
      initialised or destroyed, but to do that we need to convert the
      counters to be per-cpu (i.e. independent of a lock). This means that
      we have the freedom to change the list/locking implementation
      without needing to care about the counters.
      
      Based on a patch originally from Eric Dumazet.
      
      [AV: cleaned up a bit, fixed build breakage on weird configs]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cffbc8aa
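The idea behind the two per-cpu counter conversions above (nr_dentry and nr_inodes) can be sketched in plain C: each CPU updates only its own slot, and a reader adds the slots up. The type and function names are hypothetical, the CPU count is fixed at four for the example, and this is not the kernel's per-cpu API, which also takes care of preemption and CPU hotplug.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4 /* assumption: small, fixed CPU count */

/* One slot per CPU; no lock and no shared cacheline on the update path. */
struct percpu_counter_sketch {
    long slot[SKETCH_NR_CPUS];
};

/* Called on the CPU doing the allocation or free; touches one slot only. */
static void counter_add(struct percpu_counter_sketch *c,
                        unsigned int cpu, long delta)
{
    c->slot[cpu % SKETCH_NR_CPUS] += delta;
}

/* The slow path: fold all slots into the value reported to userspace. */
static long counter_sum(const struct percpu_counter_sketch *c)
{
    long total = 0;

    for (size_t i = 0; i < SKETCH_NR_CPUS; i++)
        total += c->slot[i];
    return total;
}
```

An object allocated on one CPU and freed on another leaves a negative value in the second CPU's slot, which is fine because only the sum is meaningful.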
    • E
      fs: allow for more than 2^31 files · 7e360c38
      Eric Dumazet committed
      Andrew,
      
      Could you please review this patch, you probably are the right guy to
      take it, because it crosses fs and net trees.
      
      Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesn't
      depend on previous patch (sysctl: fix min/max handling in
      __do_proc_doulongvec_minmax())
      
      Thanks !
      
      [PATCH V4] fs: allow for more than 2^31 files
      
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of
      atomic_t.
      
      get_max_files() is changed to return an unsigned long.
      get_nr_files() is changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, while not
      strictly needed to address Robin's problem.
      
      Before patch (on a 64bit kernel) :
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7e360c38
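The odd "before patch" value -18446744071562067968 has digits that match a 32-bit wrap followed by sign extension to 64 bits. The sketch below only reproduces that arithmetic; the function names are hypothetical, and the reading that the old integer handler printed a minus sign followed by the widened value is an assumption, not something the commit message states.

```c
#include <stdint.h>

/*
 * Keep only the low 32 bits of the written value and interpret them as
 * a two's complement int, without relying on implementation-defined
 * narrowing conversions.
 */
static int32_t store_in_int32(uint64_t written)
{
    uint32_t low = (uint32_t)written;

    if (low >= 0x80000000u)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Widen a signed 32-bit value to 64 bits and view the bits as unsigned. */
static uint64_t widen_sign_extended(int32_t v)
{
    return (uint64_t)(int64_t)v;
}
```

Writing 2147483648 into a 32-bit int yields INT32_MIN, and sign-extending that to an unsigned 64-bit value gives 2^64 - 2^31 = 18446744071562067968.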
  11. 08 Oct, 2010 1 commit
  12. 05 Sep, 2010 1 commit
  13. 10 Aug, 2010 1 commit
  14. 06 Aug, 2010 1 commit
  15. 28 Jul, 2010 2 commits
  16. 23 Jul, 2010 1 commit
  17. 03 Jun, 2010 1 commit
  18. 26 May, 2010 1 commit
  19. 25 May, 2010 2 commits
  20. 22 May, 2010 2 commits
  21. 17 May, 2010 1 commit
    • H
      [S390] debug: enable exception-trace debug facility · ab3c68ee
      Heiko Carstens committed
      The exception-trace facility on x86 and other architectures prints
      traces to dmesg whenever a user space application crashes.
      s390 has had such a feature for ages; however, it is called
      userprocess_debug and is enabled differently.
      This patch makes sure that whenever one of the two procfs files
      
      /proc/sys/kernel/userprocess_debug
      /proc/sys/debug/exception-trace
      
      is modified, the contents of the second one change as well.
      That way we keep backwards compatibility but also support the same
      interface as other architectures do.
      Besides that the output of the traces is improved since it will now
      also contain the corresponding filename of the vma (when available)
      where the process caused a fault or trap.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ab3c68ee
  22. 16 May, 2010 2 commits
    • O
      sysctl: add proc_do_large_bitmap · 9f977fb7
      Octavian Purdila committed
      The new function can be used to read/write large bitmaps via /proc. A
      comma separated range format is used for compact output and input
      (e.g. 1,3-4,10-10).
      
      Writing into the file will first reset the bitmap then update it
      based on the given input.
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f977fb7
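The comma separated range format described above (for example 1,3-4,10-10) can be parsed along these lines. This is a user-space sketch, not proc_do_large_bitmap itself: `parse_bit_ranges()` and `BITS_PER_WORD` are hypothetical names, and details such as whitespace handling and error codes are assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Parse "a,b-c,..." into map, which must hold at least nbits bits.
 * The bitmap is reset first, as the commit message says a write does.
 * Returns 0 on success and -1 on malformed or out-of-range input; on
 * failure the map may hold the ranges accepted before the error.
 */
static int parse_bit_ranges(const char *s, unsigned long *map, size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    memset(map, 0, nwords * sizeof(*map));

    while (*s != '\0') {
        char *end;
        unsigned long lo, hi;

        if (*s < '0' || *s > '9')
            return -1;
        errno = 0;
        lo = strtoul(s, &end, 10);
        if (errno)
            return -1;
        hi = lo; /* a single number is a range of one bit */

        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
            if (errno)
                return -1;
        }
        if (hi < lo || hi >= nbits)
            return -1;

        for (unsigned long b = lo; b <= hi; b++)
            map[b / BITS_PER_WORD] |= 1UL << (b % BITS_PER_WORD);

        if (*end == ',')
            end++; /* a trailing comma is tolerated in this sketch */
        else if (*end != '\0')
            return -1;
        s = end;
    }
    return 0;
}
```

Parsing the example string from the commit message sets bits 1, 3, 4 and 10 and clears everything else.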
    • A
      sysctl: refactor integer handling proc code · 00b7c339
      Amerigo Wang committed
      (Based on Octavian's work, and I modified a lot.)
      
      As we are about to add another integer handling proc function a little
      bit of cleanup is in order: add a few helper functions to improve code
      readability and decrease code duplication.
      
      In the process a bug is also fixed: if the user specifies a number
      with more than 20 digits it will be interpreted as two integers
      (e.g. 10000...13 will be interpreted as 100.... and 13).
      
      Behavior for EFAULT handling was changed as well. Previous to this
      patch, when an EFAULT error occurred in the middle of a write
      operation, although some of the elements were set, that was not
      acknowledged to the user (by shorting the write and returning the
      number of bytes accepted). EFAULT is now treated just like any other
      errors by acknowledging the amount of bytes accepted.
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00b7c339
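Two behaviours from the message above lend themselves to a sketch: an over-long number is rejected as a whole instead of being split in two, and an error in the middle of a write is reported as a short write covering the bytes already accepted. `store_ints()` is a hypothetical user-space helper built on strtol(), not the kernel's proc handler, and the choice of -EINVAL for "nothing accepted" is an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse whitespace separated integers from buf into out (at most max).
 * Returns the number of bytes accepted. If a token is bad after some
 * values were stored, the bytes accepted so far are returned (a short
 * write); if the very first token is bad, -EINVAL is returned.
 * *stored receives the number of values written to out.
 */
static long store_ints(const char *buf, int *out, size_t max, size_t *stored)
{
    const char *p = buf;
    size_t n = 0;
    long accepted = 0;

    while (n < max) {
        char *end;
        long v;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        errno = 0;
        v = strtol(p, &end, 10);
        /* Reject the whole token on overflow rather than splitting it. */
        if (end == p || errno || v < INT_MIN || v > INT_MAX ||
            (*end != '\0' && !isspace((unsigned char)*end))) {
            *stored = n;
            return n ? accepted : -EINVAL;
        }

        out[n++] = (int)v;
        p = end;
        accepted = (long)(p - buf);
    }

    *stored = n;
    return accepted;
}
```

For the input "10 20 x30" the helper stores two values and reports 5 bytes accepted, so a caller can tell that the write stopped before "x30".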
  23. 13 May, 2010 3 commits
    • D
      lockup_detector: Remove old softlockup code · 2508ce18
      Don Zickus committed
      Now that it is no longer compiled or used, just remove it.
      
      Also move some of the code wrapped with DETECT_SOFTLOCKUP to the
      LOCKUP_DETECTOR wrappers because that is the code that uses it now.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      2508ce18
    • D
      lockup_detector: Touch_softlockup cleanups and softlockup_tick removal · 332fbdbc
      Don Zickus committed
      Just some code cleanup to make touch_softlockup clearer and remove the
      softlockup_tick function as it is no longer needed.
      
      Also remove the /proc softlockup_thres call as it has been changed to
      watchdog_thres.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      332fbdbc
    • D
      lockup_detector: Combine nmi_watchdog and softlockup detector · 58687acb
      Don Zickus committed
      The new nmi_watchdog (which uses the perf event subsystem) is very
      similar in structure to the softlockup detector.  Using Ingo's
      suggestion, I combined the two functionalities into one file:
      kernel/watchdog.c.
      
      Now both the nmi_watchdog (or hardlockup detector) and softlockup
      detector sit on top of the perf event subsystem, which is run every
      60 seconds or so to see if there are any lockups.
      
      To detect hardlockups, cpus not responding to interrupts, I
      implemented an hrtimer that runs 5 times for every perf event
      overflow event.  If that stops counting on a cpu, then the cpu is
      most likely in trouble.
      
      To detect softlockups, tasks not yielding to the scheduler, I used the
      previous kthread idea that now gets kicked every time the hrtimer fires.
      If the kthread isn't being scheduled neither is anyone else and the
      warning is printed to the console.
      
      I tested this on x86_64 and both the softlockup and hardlockup paths
      work.
      
      V2:
      - cleaned up the Kconfig and softlockup combination
      - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
      - separated out the softlockup case from perf event subsystem
      - re-arranged the enabling/disabling nmi watchdog from proc space
      - added cpumasks for hardlockup failure cases
      - removed fallback to soft events if no PMU exists for hard events
      
      V3:
      - comment cleanups
      - drop support for older softlockup code
      - per_cpu cleanups
      - completely remove software clock base hardlockup detector
      - use per_cpu masking on hard/soft lockup detection
      - #ifdef cleanups
      - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
      - documentation additions
      
      V4:
      - documentation fixes
      - convert per_cpu to __get_cpu_var
      - powerpc compile fixes
      
      V5:
      - split apart warn flags for hard and soft lockups
      
      TODO:
      - figure out how to make an arch-agnostic clock2cycles call
        (if possible) to feed into perf events as a sample period
      
      [fweisbec: merged conflict patch]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      58687acb
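The two detection paths described in the message above can be reduced to a pair of checks. This is a simplified sketch with hypothetical names (`struct cpu_watch`, `timer_fired()`, `nmi_sample_sees_hardlockup()`, `timer_sees_softlockup()`); the per-cpu storage, the perf event setup and the actual thresholds are left out, and timestamps are plain seconds passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-CPU bookkeeping, assumed here to be one plain struct per CPU. */
struct cpu_watch {
    uint64_t timer_ticks;       /* bumped every time the hrtimer fires */
    uint64_t timer_ticks_saved; /* value seen by the previous NMI sample */
    uint64_t touch_ts;          /* last time the watchdog kthread ran */
};

/* hrtimer callback: proves the CPU still takes interrupts. */
static void timer_fired(struct cpu_watch *w)
{
    w->timer_ticks++;
}

/* Watchdog kthread: proves the scheduler still runs tasks on this CPU. */
static void watchdog_thread_ran(struct cpu_watch *w, uint64_t now)
{
    w->touch_ts = now;
}

/*
 * Called from the perf overflow (NMI) path. If the hrtimer count has
 * not moved since the last sample, interrupts are not being serviced.
 */
static bool nmi_sample_sees_hardlockup(struct cpu_watch *w)
{
    if (w->timer_ticks == w->timer_ticks_saved)
        return true;
    w->timer_ticks_saved = w->timer_ticks;
    return false;
}

/*
 * Called from the hrtimer path. If the kthread has not run for longer
 * than thresh seconds, tasks are not yielding to the scheduler.
 */
static bool timer_sees_softlockup(const struct cpu_watch *w,
                                  uint64_t now, uint64_t thresh)
{
    return now - w->touch_ts > thresh;
}
```

The split mirrors the description: the NMI path watches the timer, and the timer watches the kthread, so each layer is checked by something that can still run when the layer below it is stuck.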
  24. 14 Apr, 2010 1 commit
    • D
      Input: implement SysRq as a separate input handler · 97f5f0cd
      Dmitry Torokhov committed
      Instead of keeping SysRq support inside of legacy keyboard driver split
      it out into a separate input handler (filter). This stops most SysRq input
      events from leaking into evdev clients (some events, such as first SysRq
      scancode - not keycode - event, are still leaked into both legacy keyboard
      and evdev).
      
      [martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
       not defined]
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
      97f5f0cd
  25. 13 Mar, 2010 3 commits