1. 26 7月, 2008 1 次提交
  2. 25 7月, 2008 1 次提交
    • M
      hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() · a1e78772
      Mel Gorman 提交于
      This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
      a similar manner to the reservations taken for MAP_SHARED mappings.  The
      reserve count is accounted both globally and on a per-VMA basis for
      private mappings.  This guarantees that a process that successfully calls
      mmap() will successfully fault all pages in the future unless fork() is
      called.
      
      The characteristics of private mappings of hugetlbfs files behaviour after
      this patch are;
      
      1. The process calling mmap() is guaranteed to succeed all future faults until
         it forks().
      2. On fork(), the parent may die due to SIGKILL on writes to the private
         mapping if enough pages are not available for the COW. For reasonably
         reliable behaviour in the face of a small huge page pool, children of
         hugepage-aware processes should not reference the mappings; such as
         might occur when fork()ing to exec().
      3. On fork(), the child VMAs inherit no reserves. Reads on pages already
         faulted by the parent will succeed. Successful writes will depend on enough
         huge pages being free in the pool.
      4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
         and at fault time otherwise.
      
      Before this patch, all reads or writes in the child potentially needs page
      allocations that can later lead to the death of the parent.  This applies
      to reads and writes of uninstantiated pages as well as COW.  After the
      patch it is only a write to an instantiated page that causes problems.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NAdam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1e78772
  3. 17 7月, 2008 1 次提交
    • R
      ptrace children revamp · f470021a
      Roland McGrath 提交于
      ptrace no longer fiddles with the children/sibling links, and the
      old ptrace_children list is gone.  Now ptrace, whether of one's own
      children or another's via PTRACE_ATTACH, just uses the new ptraced
      list instead.
      
      There should be no user-visible difference that matters.  The only
      change is the order in which do_wait() sees multiple stopped
      children and stopped ptrace attachees.  Since wait_task_stopped()
      was changed earlier so it no longer reorders the children list, we
      already know this won't cause any new problems.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      f470021a
  4. 14 7月, 2008 1 次提交
    • I
      lockdep: fix kernel/fork.c warning · d12c1a37
      Ingo Molnar 提交于
      fix:
      
      [    0.184011] ------------[ cut here ]------------
      [    0.188011] WARNING: at kernel/fork.c:918 copy_process+0x1c0/0x1084()
      [    0.192011] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00351-g01d4a50-dirty #14521
      [    0.196011]  [<c0135d48>] warn_on_slowpath+0x3c/0x60
      [    0.200012]  [<c016f805>] ? __alloc_pages_internal+0x92/0x36b
      [    0.208012]  [<c033de5e>] ? __spin_lock_init+0x24/0x4a
      [    0.212012]  [<c01347e3>] copy_process+0x1c0/0x1084
      [    0.216013]  [<c013575f>] do_fork+0xb8/0x1ad
      [    0.220013]  [<c034f75e>] ? acpi_os_release_lock+0x8/0xa
      [    0.228013]  [<c034ff7a>] ? acpi_os_vprintf+0x20/0x24
      [    0.232014]  [<c01129ee>] kernel_thread+0x75/0x7d
      [    0.236014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
      [    0.240014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
      [    0.244014]  [<c01151b0>] ? kernel_thread_helper+0x0/0x10
      [    0.252015]  [<c06c6ac0>] rest_init+0x14/0x50
      [    0.256015]  [<c0a498ce>] start_kernel+0x2b9/0x2c0
      [    0.260015]  [<c0a4904f>] __init_begin+0x4f/0x57
      [    0.264016]  =======================
      [    0.268016] ---[ end trace 4eaa2a86a8e2da22 ]---
      [    0.272016] enabled ExtINT on CPU#0
      
      which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
      but CONFIG_PROVE_LOCKING is disabled.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d12c1a37
  5. 03 7月, 2008 1 次提交
  6. 24 5月, 2008 1 次提交
    • S
      ftrace: trace irq disabled critical timings · 81d68a96
      Steven Rostedt 提交于
      This patch adds latency tracing for critical timings
      (how long interrupts are disabled for).
      
       "irqsoff" is added to /debugfs/tracing/available_tracers
      
      Note:
        tracing_max_latency
          also holds the max latency for irqsoff (in usecs).
         (default to large number so one must start latency tracing)
      
        tracing_thresh
          threshold (in usecs) to always print out if irqs off
          is detected to be longer than stated here.
          If irq_thresh is non-zero, then max_irq_latency
          is ignored.
      
      Here's an example of a trace with ftrace_enabled = 0
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      --------------------------------------------------------------------
       latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      
      And this is a trace with ftrace_enabled == 1
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      --------------------------------------------------------------------
       latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
       swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
       swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
       swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
       swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
       swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
       swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
       swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      81d68a96
  7. 17 5月, 2008 1 次提交
  8. 02 5月, 2008 1 次提交
  9. 30 4月, 2008 1 次提交
  10. 29 4月, 2008 4 次提交
    • M
      procfs task exe symlink · 925d1c40
      Matt Helsley 提交于
      The kernel implements readlink of /proc/pid/exe by getting the file from
      the first executable VMA.  Then the path to the file is reconstructed and
      reported as the result.
      
      Because of the VMA walk the code is slightly different on nommu systems.
      This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      walking the VMAs to find the first executable file-backed VMA we store a
      reference to the exec'd file in the mm_struct.
      
      That reference would prevent the filesystem holding the executable file
      from being unmounted even after unmapping the VMAs.  So we track the number
      of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      unmapped.  This avoids pinning the mounted filesystem.
      
      [akpm@linux-foundation.org: improve comments]
      [yamamoto@valinux.co.jp: fix dup_mmap]
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      Cc:"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      925d1c40
    • M
      ipc: sysvsem: force unshare(CLONE_SYSVSEM) when CLONE_NEWIPC · 6013f67f
      Manfred Spraul 提交于
      sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
      cause a kernel memory corruption.  CLONE_NEWIPC must detach from the existing
      undo lists.
      
      Fix, part 2: perform an implicit CLONE_SYSVSEM in CLONE_NEWIPC.  CLONE_NEWIPC
      creates a new IPC namespace, the task cannot access the existing semaphore
      arrays after the unshare syscall.  Thus the task can/must detach from the
      existing undo list entries, too.
      
      This fixes the kernel corruption, because it makes it impossible that
      undo records from two different namespaces are in sysvsem.undo_list.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6013f67f
    • M
      ipc: sysvsem: implement sys_unshare(CLONE_SYSVSEM) · 9edff4ab
      Manfred Spraul 提交于
      sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
      cause a kernel memory corruption.  CLONE_NEWIPC must detach from the existing
      undo lists.
      
      Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)
      
      The original reason to not support it was the potential (inevitable?)
      confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
      inverse meaning of clone(CLONE_SYSVSEM).
      
      Our two most reasonable options then appear to be (1) fully support
      CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
      but always do it anyway on unshare(CLONE_SYSVSEM).  This patch does
      (1).
      
      Changelog:
      	Apr 16: SEH: switch to Manfred's alternative patch which
      		removes the unshare_semundo() function which
      		always refused CLONE_SYSVSEM.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9edff4ab
    • B
      cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh 提交于
      Remove the mem_cgroup member from mm_struct and instead adds an owner.
      
      This approach was suggested by Paul Menage.  The advantage of this approach
      is that, once the mm->owner is known, using the subsystem id, the cgroup
      can be determined.  It also allows several control groups that are
      virtually grouped by mm_struct, to exist independent of the memory
      controller i.e., without adding mem_cgroup's for each controller, to
      mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box, it was compiled with both the
      MM_OWNER config turned on and off.
      
      After the thread group leader exits, it's moved to init_css_state by
      cgroup_exit(), thus all future charges from runnings threads would be
      redirected to the init_css_set's subsystem.
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>,
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: NPaul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf475ad2
  11. 28 4月, 2008 2 次提交
    • L
      mempolicy: rename mpol_copy to mpol_dup · 846a16bf
      Lee Schermerhorn 提交于
      This patch renames mpol_copy() to mpol_dup() because, well, that's what it
      does.  Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an
      existing mempolicy, allocates a new one and copies the contents.
      
      In a later patch, I want to use the name mpol_copy() to copy the contents from
      one mempolicy to another like, e.g., strcpy() does for strings.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      846a16bf
    • L
      mempolicy: rename mpol_free to mpol_put · f0be3d32
      Lee Schermerhorn 提交于
      This is a change that was requested some time ago by Mel Gorman.  Makes sense
      to me, so here it is.
      
      Note: I retain the name "mpol_free_shared_policy()" because it actually does
      free the shared_policy, which is NOT a reference counted object.  However, ...
      
      The mempolicy object[s] referenced by the shared_policy are reference counted,
      so mpol_put() is used to release the reference held by the shared_policy.  The
      mempolicy might not be freed at this time, because some task attached to the
      shared object associated with the shared policy may be in the process of
      allocating a page based on the mempolicy.  In that case, the task performing
      the allocation will hold a reference on the mempolicy, obtained via
      mpol_shared_policy_lookup().  The mempolicy will be freed when all tasks
      holding such a reference have called mpol_put() for the mempolicy.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0be3d32
  12. 27 4月, 2008 2 次提交
  13. 26 4月, 2008 1 次提交
  14. 25 4月, 2008 3 次提交
    • A
      [PATCH] sanitize unshare_files/reset_files_struct · 3b125388
      Al Viro 提交于
      * let unshare_files() give caller the displaced files_struct
      * don't bother with grabbing reference only to drop it in the
        caller if it hadn't been shared in the first place
      * in that form unshare_files() is trivially implemented via
        unshare_fd(), so we eliminate the duplicate logics in fork.c
      * reset_files_struct() is not just only called for current;
        it will break the system if somebody ever calls it for anything
        else (we can't modify ->files of somebody else).  Lose the
        task_struct * argument.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3b125388
    • A
      [PATCH] sanitize handling of shared descriptor tables in failing execve() · fd8328be
      Al Viro 提交于
      * unshare_files() can fail; doing it after irreversible actions is wrong
        and de_thread() is certainly irreversible.
      * since we do it unconditionally anyway, we might as well do it in do_execve()
        and save ourselves the PITA in binfmt handlers, etc.
      * while we are at it, binfmt_som actually leaked files_struct on failure.
      
      As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
      become unexported.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fd8328be
    • A
      [PATCH] close race in unshare_files() · 6b335d9c
      Al Viro 提交于
      updating current->files requires task_lock
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b335d9c
  15. 20 4月, 2008 2 次提交
  16. 29 3月, 2008 1 次提交
  17. 15 2月, 2008 1 次提交
  18. 09 2月, 2008 3 次提交
  19. 08 2月, 2008 1 次提交
  20. 07 2月, 2008 2 次提交
  21. 06 2月, 2008 3 次提交
    • S
      capabilities: introduce per-process capability bounding set · 3b7391de
      Serge E. Hallyn 提交于
      The capability bounding set is a set beyond which capabilities cannot grow.
       Currently cap_bset is per-system.  It can be manipulated through sysctl,
      but only init can add capabilities.  Root can remove capabilities.  By
      default it includes all caps except CAP_SETPCAP.
      
      This patch makes the bounding set per-process when file capabilities are
      enabled.  It is inherited at fork from parent.  Noone can add elements,
      CAP_SETPCAP is required to remove them.
      
      One example use of this is to start a safer container.  For instance, until
      device namespaces or per-container device whitelists are introduced, it is
      best to take CAP_MKNOD away from a container.
      
      The bounding set will not affect pP and pE immediately.  It will only
      affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
      and exec() does not constrain pI'.  So to really start a shell with no way
      of regain CAP_MKNOD, you would do
      
      	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
      	cap_t cap = cap_get_proc();
      	cap_value_t caparray[1];
      	caparray[0] = CAP_MKNOD;
      	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
      	cap_set_proc(cap);
      	cap_free(cap);
      
      The following test program will get and set the bounding
      set (but not pI).  For instance
      
      	./bset get
      		(lists capabilities in bset)
      	./bset drop cap_net_raw
      		(starts shell with new bset)
      		(use capset, setuid binary, or binary with
      		file capabilities to try to increase caps)
      
      ************************************************************
      cap_bound.c
      ************************************************************
       #include <sys/prctl.h>
       #include <linux/capability.h>
       #include <sys/types.h>
       #include <unistd.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
      
       #ifndef PR_CAPBSET_READ
       #define PR_CAPBSET_READ 23
       #endif
      
       #ifndef PR_CAPBSET_DROP
       #define PR_CAPBSET_DROP 24
       #endif
      
      int usage(char *me)
      {
      	printf("Usage: %s get\n", me);
      	printf("       %s drop <capability>\n", me);
      	return 1;
      }
      
       #define numcaps 32
      char *captable[numcaps] = {
      	"cap_chown",
      	"cap_dac_override",
      	"cap_dac_read_search",
      	"cap_fowner",
      	"cap_fsetid",
      	"cap_kill",
      	"cap_setgid",
      	"cap_setuid",
      	"cap_setpcap",
      	"cap_linux_immutable",
      	"cap_net_bind_service",
      	"cap_net_broadcast",
      	"cap_net_admin",
      	"cap_net_raw",
      	"cap_ipc_lock",
      	"cap_ipc_owner",
      	"cap_sys_module",
      	"cap_sys_rawio",
      	"cap_sys_chroot",
      	"cap_sys_ptrace",
      	"cap_sys_pacct",
      	"cap_sys_admin",
      	"cap_sys_boot",
      	"cap_sys_nice",
      	"cap_sys_resource",
      	"cap_sys_time",
      	"cap_sys_tty_config",
      	"cap_mknod",
      	"cap_lease",
      	"cap_audit_write",
      	"cap_audit_control",
      	"cap_setfcap"
      };
      
      int getbcap(void)
      {
      	int comma=0;
      	unsigned long i;
      	int ret;
      
      	printf("i know of %d capabilities\n", numcaps);
      	printf("capability bounding set:");
      	for (i=0; i<numcaps; i++) {
      		ret = prctl(PR_CAPBSET_READ, i);
      		if (ret < 0)
      			perror("prctl");
      		else if (ret==1)
      			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
      	}
      	printf("\n");
      	return 0;
      }
      
      int capdrop(char *str)
      {
      	unsigned long i;
      
      	int found=0;
      	for (i=0; i<numcaps; i++) {
      		if (strcmp(captable[i], str) == 0) {
      			found=1;
      			break;
      		}
      	}
      	if (!found)
      		return 1;
      	if (prctl(PR_CAPBSET_DROP, i)) {
      		perror("prctl");
      		return 1;
      	}
      	return 0;
      }
      
      int main(int argc, char *argv[])
      {
      	if (argc<2)
      		return usage(argv[0]);
      	if (strcmp(argv[1], "get")==0)
      		return getbcap();
      	if (strcmp(argv[1], "drop")!=0 || argc<3)
      		return usage(argv[0]);
      	if (capdrop(argv[2])) {
      		printf("unknown capability\n");
      		return 1;
      	}
      	return execl("/bin/bash", "/bin/bash", NULL);
      }
      ************************************************************
      
      [serue@us.ibm.com: fix typo]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>a
      Signed-off-by: N"Serge E. Hallyn" <serue@us.ibm.com>
      Tested-by: NJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b7391de
    • B
      add mm argument to pte/pmd/pud/pgd_free · 5e541973
      Benjamin Herrenschmidt 提交于
      (with Martin Schwidefsky <schwidefsky@de.ibm.com>)
      
      The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
      first argument.  The free functions do not get the mm_struct argument.  This
      is 1) asymmetrical and 2) to do mm related page table allocations the mm
      argument is needed on the free function as well.
      
      [kamalesh@linux.vnet.ibm.com: i386 fix]
      [akpm@linux-foundation.org: coding-syle fixes]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e541973
    • A
      clone: prepare to recycle CLONE_STOPPED · bdff746a
      Andrew Morton 提交于
      Ulrich says that we never used this clone flags and that nothing should be
      using it.
      
      As we're down to only a single bit left in clone's flags argument, let's add a
      warning to check that no userspace is actually using it.  Hopefully we will
      be able to recycle it.
      
      Roland said:
      
        CLONE_STOPPED was previously used by some NTPL versions when under
        thread_db (i.e.  only when being actively debugged by gdb), but not for a
        long time now, and it never worked reliably when it was used.  Removing it
        seems fine to me.
      
      [akpm@linux-foundation.org: it looks like CLONE_DETACHED is being used]
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdff746a
  22. 30 1月, 2008 1 次提交
  23. 28 1月, 2008 3 次提交
  24. 26 1月, 2008 2 次提交
    • A
      sched: latencytop support · 9745512c
      Arjan van de Ven 提交于
      LatencyTOP kernel infrastructure; it measures latencies in the
      scheduler and tracks it system wide and per process.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9745512c
    • P
      sched: rt group scheduling · 6f505b16
      Peter Zijlstra 提交于
      Extend group scheduling to also cover the realtime classes. It uses the time
      limiting introduced by the previous patch to allow multiple realtime groups.
      
      The hard time limit is required to keep behaviour deterministic.
      
      The algorithms used make the realtime scheduler O(tg), linear scaling wrt the
      number of task groups. This is the worst case behaviour I can't seem to get out
      of, the avg. case of the algorithms can be improved, I focused on correctness
      and worst case.
      
      [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6f505b16