1. 11 Jan 2012 (3 commits)
  2. 09 Jan 2012 (1 commit)
  3. 07 Jan 2012 (2 commits)
  4. 06 Jan 2012 (1 commit)
    • cgroup: fix to allow mounting a hierarchy by name · 0d19ea86
      Committed by Li Zefan
      If we mount a hierarchy with a specified name, the name is unique,
      and we can use it to mount the hierarchy without specifying its
      set of subsystem names. This feature is documented in
      Documentation/cgroups/cgroups.txt, section 2.3.
      
      Here's an example:
      
      	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
      	# mount -t cgroup -o name=myhier xxx /cgroup2
      
      But it was broken by commit 32a8cf23
      ("cgroup: make the mount options parsing more accurate").
      
      This fixes the regression.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      0d19ea86
  5. 05 1月, 2012 3 次提交
    • PM / Hibernate: Implement compat_ioctl for /dev/snapshot · c336078b
      Committed by Ben Hutchings
      This allows uswsusp built for i386 to run on an x86_64 kernel (tested
      with Debian package version 1.0+20110509-2).
      
      References: http://bugs.debian.org/502816
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      c336078b
    • ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b
      Committed by Oleg Nesterov
      This is the temporary simple fix for 3.2, we need more changes in this
      area.
      
      1. do_signal_stop() assumes that a running, untraced thread in a
         stopped thread group is not possible. This was our goal but it is
         not yet achieved: a stopped-but-resumed tracee can clone a running
         thread which can initiate another group-stop.
      
         Remove WARN_ON_ONCE(!current->ptrace).
      
      2. A new thread always starts with ->jobctl = 0. If it is auto-attached
         and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
         but the JOBCTL_STOP_SIGMASK part is zero; this triggers WARN_ON(!signr)
         in do_jobctl_trap() if another debugger attaches.
      
         Change __ptrace_unlink() to set the artificial SIGSTOP for report.
      
         Alternatively we could change ptrace_init_task() to copy signr from
         current, but this means we can copy it for no reason and hide the
         possible similar problems.
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.1]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a88951b
    • ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race · 50b8d257
      Committed by Oleg Nesterov
      Test-case:

      	#include <assert.h>
      	#include <stdio.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      
      	int main(void)
      	{
      		int pid, status;
      
      		pid = fork();
      		if (!pid) {
      			for (;;) {
      				if (!fork())
      					return 0;
      				if (waitpid(-1, &status, 0) < 0) {
      					printf("ERR!! wait: %m\n");
      					return 0;
      				}
      			}
      		}
      
      		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
      		assert(waitpid(-1, NULL, 0) == pid);
      
      		assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
      					PTRACE_O_TRACEFORK) == 0);
      
      		do {
      			ptrace(PTRACE_CONT, pid, 0, 0);
      			pid = waitpid(-1, NULL, 0);
      		} while (pid > 0);
      
      		return 1;
      	}
      
      It fails because ->real_parent sees its child in EXIT_DEAD state
      while the tracer is going to change the state back to EXIT_ZOMBIE
      in wait_task_zombie().
      
      The offending commit is 823b018e which moved the EXIT_DEAD check,
      but in fact we should not blame it. The original code was not
      correct as well because it didn't take ptrace_reparented() into
      account and because we can't really trust ->ptrace.
      
      This patch adds the additional check to close this particular
      race but it doesn't solve the whole problem. We simply can't
      rely on ->ptrace in this case: it can be cleared by the exiting
      ->parent if the tracer is multithreaded.
      
      I think we should kill EXIT_DEAD altogether, we should always
      remove the soon-to-be-reaped child from ->children or at least
      we should never do the DEAD->ZOMBIE transition. But this is too
      complex for 3.2.
      Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
      Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.0+]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50b8d257
  6. 04 Jan 2012 (11 commits)
  7. 02 Jan 2012 (1 commit)
  8. 01 Jan 2012 (1 commit)
    • futex: Fix uninterruptible loop due to gate_area · e6780f72
      Committed by Hugh Dickins
      It was found (by Sasha) that if you use a futex located in the gate
      area we get stuck in an uninterruptible infinite loop, much like the
      ZERO_PAGE issue.
      
      While looking at this problem, PeterZ realized you'll get into similar
      trouble when hitting any install_special_mapping() mapping.  And are there
      still drivers setting up their own special mmaps without page->mapping,
      and without special VM or pte flags to make get_user_pages fail?
      
      In most cases, if page->mapping is NULL, we do not need to retry at all:
      Linus points out that even /proc/sys/vm/drop_caches poses no problem,
      because it ends up using remove_mapping(), which takes care not to
      interfere when the page reference count is raised.
      
      But there is still one case which does need a retry: if memory pressure
      called shmem_writepage in between get_user_pages_fast dropping page
      table lock and our acquiring page lock, then the page gets switched from
      filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
      Fault it back in to get the page->mapping needed for key->shared.inode.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6780f72
  9. 31 Dec 2011 (1 commit)
  10. 28 Dec 2011 (4 commits)
  11. 24 Dec 2011 (2 commits)
  12. 23 Dec 2011 (1 commit)
  13. 22 Dec 2011 (7 commits)
    • cgroup: only need to check oldcgrp==newcgrp once · 892a2b90
      Committed by Mandeep Singh Baines
      In cgroup_attach_proc it is now sufficient to check that
      oldcgrp==newcgrp only once. Now that we are using threadgroup_lock()
      during the migrations, oldcgrp will not change.
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      892a2b90
    • cgroup: remove redundant get/put of task struct · b07ef774
      Committed by Mandeep Singh Baines
      threadgroup_lock() guarantees that the target threadgroup will
      remain stable - no new task will be added, no new PF_EXITING
      will be set and exec won't happen.
      
      Changes in V2:
      * https://lkml.org/lkml/2011/12/20/369 (Tejun Heo)
        * Undo incorrect removal of get/put from attach_task_by_pid()
      * Author
        * Remove a comment which is made stale by this change
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      b07ef774
    • cgroup: remove redundant get/put of old css_set from migrate · 026085ef
      Committed by Mandeep Singh Baines
      We can now assume that the css_set reference held by the task
      will not go away for an exiting task. PF_EXITING state can be
      trusted throughout migration by checking it after locking
      threadgroup.
      
      Changes in V4:
      * https://lkml.org/lkml/2011/12/20/368 (Tejun Heo)
        * Fix typo in commit message
        * Undid the rename of css_set_check_fetched
      * https://lkml.org/lkml/2011/12/20/427 (Li Zefan)
        * Fix comment in cgroup_task_migrate()
      Changes in V3:
      * https://lkml.org/lkml/2011/12/20/255 (Frederic Weisbecker)
        * Fixed to put error in retval
      Changes in V2:
      * https://lkml.org/lkml/2011/12/19/289 (Tejun Heo)
        * Updated commit message
      
      -tj: removed stale patch description about dropped function rename.
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      026085ef
    • clockevents: remove sysdev.h · 7239f65c
      Committed by Kay Sievers
      This isn't needed in the clockevents.c file, and the header file is
      going away soon, so just remove the #include.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      7239f65c
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Committed by Kay Sievers
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      8a25a2fd
    • cgroup: Remove unnecessary task_lock before fetching css_set on migration · c84cdf75
      Committed by Frederic Weisbecker
      When we fetch the css_set of the tasks on cgroup migration, we no
      longer need to synchronize against cgroup_exit() swapping the old
      one with init_css_set: the threadgroup_lock() we now hold during
      the migrations protects us.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Containers <containers@lists.linux-foundation.org>
      Cc: Cgroups <cgroups@vger.kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      c84cdf75
    • cgroup: Drop task_lock(parent) on cgroup_fork() · 7e381b0e
      Committed by Frederic Weisbecker
      We don't need to hold task_lock() on the parent in cgroup_fork()
      because we are already synchronized against the two places that may
      change the parent's css_set concurrently:
      
      - cgroup_exit(), but the parent obviously can't exit concurrently
      - cgroup migration: we are synchronized against threadgroup_lock()
      
      So we can safely remove the task_lock() there.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Containers <containers@lists.linux-foundation.org>
      Cc: Cgroups <cgroups@vger.kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      7e381b0e
  14. 21 Dec 2011 (2 commits)
    • sched: Fix cgroup movement of waking process · 62af3783
      Committed by Daisuke Nishimura
      There is a small race between try_to_wake_up() and sched_move_task(),
      which is trying to move the process being woken up.
      
          try_to_wake_up() on CPU0       sched_move_task() on CPU1
      --------------------------------+---------------------------------
        raw_spin_lock_irqsave(p->pi_lock)
        task_waking_fair()
          ->p.se.vruntime -= cfs_rq->min_vruntime
        ttwu_queue()
          ->send reschedule IPI to CPU1
        raw_spin_unlock_irqrestore(p->pi_lock)
                                         task_rq_lock()
                                           -> trying to acquire both p->pi_lock and
                                              rq->lock with IRQ disabled
                                         task_move_group_fair()
                                           -> p.se.vruntime
                                                -= (old)cfs_rq->min_vruntime
                                                += (new)cfs_rq->min_vruntime
                                         task_rq_unlock()
      
                                         (via IPI)
                                         sched_ttwu_pending()
                                           raw_spin_lock(rq->lock)
                                           ttwu_do_activate()
                                             ...
                                             enqueue_entity()
                                               p.se.vruntime += cfs_rq->min_vruntime
                                           raw_spin_unlock(rq->lock)
      
      As a result, vruntime of the process becomes far bigger than min_vruntime,
      if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.
      
      This patch fixes this problem by just ignoring such process in
      task_move_group_fair(), because the vruntime has already been normalized in
      task_waking_fair().
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20111215143741.df82dd50.nishimura@mxp.nes.nec.co.jp
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      62af3783
    • sched: Fix cgroup movement of newly created process · 7ceff013
      Committed by Daisuke Nishimura
      There is a small race between do_fork() and sched_move_task(), which is
      trying to move the child.
      
                  do_fork()                 sched_move_task()
      --------------------------------+---------------------------------
        copy_process()
          sched_fork()
            task_fork_fair()
              -> vruntime of the child is initialized
                 based on that of the parent.
        -> we can see the child in "tasks" file now.
                                          task_rq_lock()
                                          task_move_group_fair()
                                            -> child.se.vruntime
                                                 -= (old)cfs_rq->min_vruntime
                                                 += (new)cfs_rq->min_vruntime
                                          task_rq_unlock()
        wake_up_new_task()
          ...
          enqueue_entity()
            child.se.vruntime += cfs_rq->min_vruntime
      
      As a result, vruntime of the child becomes far bigger than min_vruntime,
      if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.
      
      This patch fixes this problem by just ignoring such process in
      task_move_group_fair(), because the vruntime has already been normalized in
      task_fork_fair().
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20111215143607.2ee12c5d.nishimura@mxp.nes.nec.co.jp
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7ceff013