1. 20 10月, 2007 12 次提交
    • E
      Fix tsk->exit_state usage · 270f722d
      Eugene Teo 提交于
      tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD.  A non-zero test
      is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
      tsk->exit_state is sufficient.
      Signed-off-by: NEugene Teo <eugeneteo@kernel.sg>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      270f722d
    • P
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov 提交于
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
    • P
      pid namespaces: initialize the namespace's proc_mnt · 6f4e6433
      Pavel Emelyanov 提交于
      The namespace's proc_mnt must be kern_mount-ed to make this pointer always
      valid, independently of whether the user space mounted the proc or not.  This
      solves raced in proc_flush_task, etc.  with the proc_mnt switching from NULL
      to not-NULL.
      
      The initialization is done after the init's pid is created and hashed to make
      proc_get_sb() finr it and get for root inode.
      
      Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
      superblock holds the namespace we must explicitly break this circle to destroy
      all the stuff.  This is done after the init of the namespace dies.  Running a
      few steps forward - when init exits it will kill all its children, so no
      proc_mnt will be needed after its death.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f4e6433
    • P
      pid namespaces: allow cloning of new namespace · 30e49c26
      Pavel Emelyanov 提交于
      When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
      create a new struct pid for the new process.  Allocate pid_t's for the new
      process in the new pid namespace and all ancestor pid namespaces.  Make the
      newly cloned process the session and process group leader.
      
      Since the active pid namespace is special and expected to be the first entry
      in pid->upid_list, preserve the order of pid namespaces.
      
      The size of 'struct pid' is dependent on the the number of pid namespaces the
      process exists in, so we use multiple pid-caches'.  Only one pid cache is
      created during system startup and this used by processes that exist only in
      init_pid_ns.
      
      When a process clones its pid namespace, we create additional pid caches as
      necessary and use the pid cache to allocate 'struct pids' for that depth.
      
      Note, that with this patch the newly created namespace won't work, since the
      rest of the kernel still uses global pids, but this is to be fixed soon.  Init
      pid namespace still works.
      
      [oleg@tv-sign.ru: merge fix]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30e49c26
    • P
      pid namespaces: move alloc_pid() lower in copy_process() · 425fb2b4
      Pavel Emelyanov 提交于
      When we create new namespace we will need to allocate the struct pid, that
      will have one extra struct upid in array, comparing to the parent.
      
      Thus we need to know the new namespace (if any) in alloc_pid() to init this
      struct upid properly, so move the alloc_pid() call lower in copy_process().
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      425fb2b4
    • P
      pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid · 8ef047aa
      Pavel Emelyanov 提交于
      Each struct upid element of struct pid has to be initialized properly, i.e.
      its nr mst be allocated from appropriate pidmap and ns set to appropriate
      namespace.
      
      When allocating a new pid, we need to know the namespace this pid will live
      in, so the additional argument is added to alloc_pid().
      
      On the other hand, the rest of the kernel still uses the pid->nr and
      pid->pid_chain fields, so these ones are still initialized, but this will be
      removed soon.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ef047aa
    • P
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov 提交于
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      
      The access to other task namespaces is proposed to be performed
      like this:
      
           rcu_read_lock();
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
           rcu_read_unlock();
      
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf7b708c
    • S
      pid namespaces: move alloc_pid() to copy_process() · a6f5e063
      Sukadev Bhattiprolu 提交于
      Move alloc_pid() into copy_process().  This will keep all pid and pid
      namespace code together and simplify error handling when we support multiple
      pid namespaces.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6f5e063
    • P
      pid namespaces: round up the API · a47afb0f
      Pavel Emelianov 提交于
      The set of functions process_session, task_session, process_group and
      task_pgrp is confusing, as the names can be mixed with each other when looking
      at the code for a long time.
      
      The proposals are to
      * equip the functions that return the integer with _nr suffix to
        represent that fact,
      * and to make all functions work with task (not process) by making
        the common prefix of the same name.
      
      For monotony the routines signal_session() and set_signal_session() are
      replaced with task_session_nr() and set_task_session(), especially since they
      are only used with the explicit task->signal dereference.
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a47afb0f
    • P
      Task Control Groups: make cpusets a client of cgroups · 8793d854
      Paul Menage 提交于
      Remove the filesystem support logic from the cpusets system and makes cpusets
      a cgroup subsystem
      
      The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
      passed through to the cgroup filesystem with the appropriate options to
      emulate the old cpuset filesystem behaviour.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8793d854
    • P
      Task Control Groups: shared cgroup subsystem group arrays · 817929ec
      Paul Menage 提交于
      Replace the struct css_set embedded in task_struct with a pointer; all tasks
      that have the same set of memberships across all hierarchies will share a
      css_set object, and will be linked via their css_sets field to the "tasks"
      list_head in the css_set.
      
      Assuming that many tasks share the same cgroup assignments, this reduces
      overall space usage and keeps the size of the task_struct down (three pointers
      added to task_struct compared to a non-cgroups kernel, no matter how many
      subsystems are registered).
      
      [akpm@linux-foundation.org: fix a printk]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      817929ec
    • P
      Task Control Groups: add fork()/exit() hooks · b4f48b63
      Paul Menage 提交于
      This adds the necessary hooks to the fork() and exit() paths to ensure
      that new children inherit their parent's cgroup assignments, and that
      exiting processes release reference counts on their cgroups.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4f48b63
  2. 19 10月, 2007 4 次提交
  3. 17 10月, 2007 4 次提交
  4. 15 10月, 2007 1 次提交
  5. 11 10月, 2007 1 次提交
    • E
      [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman 提交于
      This patch allows you to create a new network namespace
      using sys_clone, or sys_unshare.
      
      As the network namespace is still experimental and under development
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      
      As this patch introduces network namespace support into code paths
      that exist when the CONFIG_NET is not selected there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9dd776b6
  6. 21 9月, 2007 1 次提交
    • D
      signalfd simplification · b8fceee1
      Davide Libenzi 提交于
      This simplifies signalfd code, by avoiding it to remain attached to the
      sighand during its lifetime.
      
      In this way, the signalfd remain attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows to remove
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".
      
      I think this is also what Ben was suggesting time ago.
      
      The external effect of this, is that a thread can extract only its own
      private signals and the group ones.  I think this is an acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch w/out signalfd.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8fceee1
  7. 20 7月, 2007 4 次提交
  8. 18 7月, 2007 1 次提交
    • R
      Freezer: make kernel threads nonfreezable by default · 83144186
      Rafael J. Wysocki 提交于
      Currently, the freezer treats all tasks as freezable, except for the kernel
      threads that explicitly set the PF_NOFREEZE flag for themselves.  This
      approach is problematic, since it requires every kernel thread to either
      set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
      care for the freezing of tasks at all.
      
      It seems better to only require the kernel threads that want to or need to
      be frozen to use some freezer-related code and to remove any
      freezer-related code from the other (nonfreezable) kernel threads, which is
      done in this patch.
      
      The patch causes all kernel threads to be nonfreezable by default (ie.  to
      have PF_NOFREEZE set by default) and introduces the set_freezable()
      function that should be called by the freezable kernel threads in order to
      unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
      threads call set_freezable(), so it shouldn't cause any (intentional)
      change of behaviour to appear.  Additionally, it updates documentation to
      describe the freezing of tasks more accurately.
      
      [akpm@linux-foundation.org: build fixes]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NNigel Cunningham <nigel@nigel.suspend2.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83144186
  9. 17 7月, 2007 4 次提交
    • S
      user namespace: add unshare · 77ec739d
      Serge E. Hallyn 提交于
      This patch enables the unshare of user namespaces.
      
      It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
      resets the current user_struct and adds a new root user (uid == 0)
      
      For now, unsharing the user namespace allows a process to reset its
      user_struct accounting and uid 0 in the new user namespace should be contained
      using appropriate means, for instance selinux
      
      The plan, when the full support is complete (all uid checks covered), is to
      keep the original user's rights in the original namespace, and let a process
      become uid 0 in the new namespace, with full capabilities to the new
      namespace.
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Acked-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andrew Morgan <agm@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77ec739d
    • C
      user namespace: add the framework · acce292c
      Cedric Le Goater 提交于
      Basically, it will allow a process to unshare its user_struct table,
      resetting at the same time its own user_struct and all the associated
      accounting.
      
      A new root user (uid == 0) is added to the user namespace upon creation.
      Such root users have full privileges and it seems that theses privileges
      should be controlled through some means (process capabilities ?)
      
      The unshare is not included in this patch.
      
      Changes since [try #4]:
      	- Updated get_user_ns and put_user_ns to accept NULL, and
      	  get_user_ns to return the namespace.
      
      Changes since [try #3]:
      	- moved struct user_namespace to files user_namespace.{c,h}
      
      Changes since [try #2]:
      	- removed struct user_namespace* argument from find_user()
      
      Changes since [try #1]:
      	- removed struct user_namespace* argument from find_user()
      	- added a root_user per user namespace
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Acked-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andrew Morgan <agm@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      acce292c
    • M
      Audit: add TTY input auditing · 522ed776
      Miloslav Trmac 提交于
      Add TTY input auditing, used to audit system administrator's actions.  This is
      required by various security standards such as DCID 6/3 and PCI to provide
      non-repudiation of administrator's actions and to allow a review of past
      actions if the administrator seems to overstep their duties or if the system
      becomes misconfigured for unknown reasons.  These requirements do not make it
      necessary to audit TTY output as well.
      
      Compared to an user-space keylogger, this approach records TTY input using the
      audit subsystem, correlated with other audit events, and it is completely
      transparent to the user-space application (e.g.  the console ioctls still
      work).
      
      TTY input auditing works on a higher level than auditing all system calls
      within the session, which would produce an overwhelming amount of mostly
      useless audit events.
      
      Add an "audit_tty" attribute, inherited across fork ().  Data read from TTYs
      by process with the attribute is sent to the audit subsystem by the kernel.
      The audit netlink interface is extended to allow modifying the audit_tty
      attribute, and to allow sending explanatory audit events from user-space (for
      example, a shell might send an event containing the final command, after the
      interactive command-line editing and history expansion is performed, which
      might be difficult to decipher from the TTY input alone).
      
      Because the "audit_tty" attribute is inherited across fork (), it would be set
      e.g.  for sshd restarted within an audited session.  To prevent this, the
      audit_tty attribute is cleared when a process with no open TTY file
      descriptors (e.g.  after daemon startup) opens a TTY.
      
      See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
      more detailed rationale document for an older version of this patch.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NMiloslav Trmac <mitr@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Paul Fulghum <paulkf@microgate.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Steve Grubb <sgrubb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      522ed776
    • T
      Use boot based time for process start time and boot time in /proc · 924b42d5
      Tomas Janousek 提交于
      Commit 411187fb caused boot time to move and
      process start times to become invalid after suspend.  Using boot based time
      for those restores the old behaviour and fixes the issue.
      
      [akpm@linux-foundation.org: little cleanup]
      Signed-off-by: NTomas Janousek <tjanouse@redhat.com>
      Cc: Tomas Smetana <tsmetana@redhat.com>
      Acked-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      924b42d5
  10. 10 7月, 2007 1 次提交
  11. 24 5月, 2007 1 次提交
    • R
      freezer: fix vfork problem · ba96a0c8
      Rafael J. Wysocki 提交于
      Currently try_to_freeze_tasks() has to wait until all of the vforked processes
      exit and for this reason every user can make it fail.  To fix this problem we
      can introduce the additional process flag PF_FREEZER_SKIP to be used by tasks
      that do not want to be counted as freezable by the freezer and want to have
      TIF_FREEZE set nevertheless.  Then, this flag can be set by tasks using
      sys_vfork() before they call wait_for_completion(&vfork) and cleared after
      they have woken up.  After clearing it, the tasks should call try_to_freeze()
      as soon as possible.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba96a0c8
  12. 17 5月, 2007 1 次提交
    • C
      Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter 提交于
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a35afb83
  13. 11 5月, 2007 5 次提交
    • D
      signal/timer/event: signalfd core · fba2afaa
      Davide Libenzi 提交于
      This patch series implements the new signalfd() system call.
      
      I took part of the original Linus code (and you know how badly it can be
      broken :), and I added even more breakage ;) Signals are fetched from the same
      signal queue used by the process, so signalfd will compete with standard
      kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
      the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
      seems to be working fine on my Dual Opteron machine.  I made a quick test
      program for it:
      
      http://www.xmailserver.org/signafd-test.c
      
      The signalfd() system call implements signal delivery into a file descriptor
      receiver.  The signalfd file descriptor if created with the following API:
      
      int signalfd(int ufd, const sigset_t *mask, size_t masksize);
      
      The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
      to close/create cycle (Linus idea).  Use "ufd" == -1 if you want a brand new
      signalfd file.
      
      The "mask" allows to specify the signal mask of signals that we are interested
      in.  The "masksize" parameter is the size of "mask".
      
      The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
      will return POLLIN when signals are available to be dequeued.  As a direct
      consequence of supporting the Linux poll subsystem, the signalfd fd can use
      used together with epoll(2) too.
      
      The read(2) system call will return a "struct signalfd_siginfo" structure in
      the userspace supplied buffer.  The return value is the number of bytes copied
      in the supplied buffer, or -1 in case of error.  The read(2) call can also
      return 0, in case the sighand structure to which the signalfd was attached,
      has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
      return -EAGAIN in case no signal is available.
      
      If the size of the buffer passed to read(2) is lower than sizeof(struct
      signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
      return -ERESTARTSYS in case a signal hits the process.  The format of the
      struct signalfd_siginfo is, and the valid fields depends of the (->code &
      __SI_MASK) value, in the same way a struct siginfo would:
      
      struct signalfd_siginfo {
      	__u32 signo;	/* si_signo */
      	__s32 err;	/* si_errno */
      	__s32 code;	/* si_code */
      	__u32 pid;	/* si_pid */
      	__u32 uid;	/* si_uid */
      	__s32 fd;	/* si_fd */
      	__u32 tid;	/* si_fd */
      	__u32 band;	/* si_band */
      	__u32 overrun;	/* si_overrun */
      	__u32 trapno;	/* si_trapno */
      	__s32 status;	/* si_status */
      	__s32 svint;	/* si_int */
      	__u64 svptr;	/* si_ptr */
      	__u64 utime;	/* si_utime */
      	__u64 stime;	/* si_stime */
      	__u64 addr;	/* si_addr */
      };
      
      [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fba2afaa
    • S
      Use task_pgrp() task_session() in copy_process() · 0800d308
      Sukadev Bhattiprolu 提交于
      Use task_pgrp() and task_session() in copy_process(), and avoid find_pid()
      call when attaching the task to its process group and session.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0800d308
    • S
      Use struct pid parameter in copy_process() · 85868995
      Sukadev Bhattiprolu 提交于
      Modify copy_process() to take a struct pid * parameter instead of a pid_t.
      This simplifies the code a bit and also avoids having to call find_pid() to
      convert the pid_t to a struct pid.
      
      Changelog:
      	- Fixed Badari Pulavarty's comments and passed in &init_struct_pid
      	  from fork_idle().
      	- Fixed Eric Biederman's comments and simplified this patch and
      	  used a new patch to remove the likely(pid) check.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: <containers@lists.osdl.org>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85868995
    • S
      attach_pid() with struct pid parameter · e713d0da
      Sukadev Bhattiprolu 提交于
      attach_pid() currently takes a pid_t and then uses find_pid() to find the
      corresponding struct pid.  Sometimes we already have the struct pid.  We can
      then skip find_pid() if attach_pid() were to take a struct pid parameter.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e713d0da
    • E
      getrusage(): fill ru_inblock and ru_oublock fields if possible · 6eaeeaba
      Eric Dumazet 提交于
      If CONFIG_TASK_IO_ACCOUNTING is defined, we update io accounting counters for
      each task.
      
      This patch permits reporting of values using the well known getrusage()
      syscall, filling ru_inblock and ru_oublock instead of null values.
      
      As TASK_IO_ACCOUNTING currently counts bytes counts, we approximate blocks
      count doing : nr_blocks = nr_bytes / 512
      
      Example of use :
      ----------------------
      After patch is applied, /usr/bin/time command can now give a good
      approximation of IO that the process had to do.
      
      $ /usr/bin/time grep tototo /usr/include/*
      Command exited with non-zero status 1
      0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
      24288inputs+0outputs (0major+259minor)pagefaults 0swaps
      
      $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
      1000+0 enregistrements lus
      1000+0 enregistrements écrits
      512000 octets (512 kB) copiés, 0,00326601 seconde, 157 MB/s
      0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+3000outputs (0major+299minor)pagefaults 0swaps
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6eaeeaba