1. 04 Jan 2012: 6 commits
  2. 01 Jan 2012: 1 commit
    • futex: Fix uninterruptible loop due to gate_area · e6780f72
      Committed by Hugh Dickins
      It was found (by Sasha) that if you use a futex located in the gate
      area we get stuck in an uninterruptible infinite loop, much like the
      ZERO_PAGE issue.
      
      While looking at this problem, PeterZ realized you'll get into similar
      trouble when hitting any install_special_mapping() mapping.  And are there
      still drivers setting up their own special mmaps without page->mapping,
      and without special VM or pte flags to make get_user_pages fail?
      
      In most cases, if page->mapping is NULL, we do not need to retry at all:
      Linus points out that even /proc/sys/vm/drop_caches poses no problem,
      because it ends up using remove_mapping(), which takes care not to
      interfere when the page reference count is raised.
      
      But there is still one case which does need a retry: if memory pressure
      called shmem_writepage in between get_user_pages_fast dropping the page
      table lock and our acquiring the page lock, then the page gets switched
      from filecache to swapcache (and ->mapping set to NULL) whatever the
      refcount.  Fault it back in to get the page->mapping needed for
      key->shared.inode.  (A sketch of this check follows this entry.)
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
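      Below is a minimal sketch of the retry logic described above, assuming it
      sits in get_futex_key() after get_user_pages_fast() has pinned the page
      and the page lock has been taken; the structure follows the commit
      message, not necessarily the exact committed diff.

          if (!page->mapping) {
                  /*
                   * No mapping: ZERO_PAGE, gate area, a driver's special mmap,
                   * or a file page truncated/invalidated after we pinned it.
                   * All of these can simply fail -- except shmem, where memory
                   * pressure may have moved the page from filecache to
                   * swapcache beneath us, clearing ->mapping along the way.
                   */
                  int shmem_swizzled = PageSwapCache(page);

                  unlock_page(page);
                  put_page(page);

                  if (shmem_swizzled)
                          goto again;   /* fault it back in for key->shared.inode */

                  return -EFAULT;       /* fail instead of looping forever */
          }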
  3. 31 Dec 2011: 1 commit
  4. 21 Dec 2011: 2 commits
    • binary_sysctl(): fix memory leak · 3d3c8f93
      Committed by Michel Lespinasse
      binary_sysctl() calls sysctl_getname(), which allocates from the
      names_cache slab using __getname().
      
      The matching function to free the name is __putname(), not putname(),
      which should be used only to match getname() allocations.
      
      This is because when auditing is enabled, putname() calls audit_putname()
      *instead of* (not in addition to) __putname().  Then, if a syscall is in
      progress, audit_putname() does not release the name; instead, it expects
      the name to be released when the syscall completes, but that will happen
      only if audit_getname() was called previously, i.e. if the name was
      allocated with getname() rather than the naked __getname().  So,
      __getname() followed by putname() ends up leaking memory.  (A sketch of
      the correct pairing follows this entry.)
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
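      As a minimal illustration of the pairing rule described above (a
      simplified sketch with a made-up helper name, not the actual
      binary_sysctl() or sysctl_getname() code):

          /* Hypothetical helper, for illustration only. */
          static char *example_getname_raw(const char __user *uname)
          {
                  char *name = __getname();       /* raw names_cache allocation */

                  if (!name)
                          return ERR_PTR(-ENOMEM);
                  if (strncpy_from_user(name, uname, PATH_MAX) < 0) {
                          __putname(name);        /* correct: matches __getname() */
                          return ERR_PTR(-EFAULT);
                  }
                  return name;
          }

      Callers must likewise release the buffer with __putname().  Calling
      putname() here would, with auditing enabled, defer the release to audit
      bookkeeping that was never set up by getname(), so the buffer would leak.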
    • cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask · b246272e
      Committed by David Rientjes
      Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
      nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
      new set of allowed cpuset nodes where the two nodemasks, as a result of
      the remap, are now disjoint.
      
      c0ff7453 ("cpuset,mm: fix no node to alloc memory when changing
      cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
      nodes from changing for a thread.  This causes any update to a set of
      allowed nodes to stall until put_mems_allowed() is called.
      
      This stall is unnecessary, however, if at least one node remains unchanged
      in the update to the set of allowed nodes.  This was addressed by
      89e8a244 ("cpusets: avoid looping when storing to mems_allowed if one
      node remains set"), but it's still possible that an empty nodemask may be
      read from a mempolicy because the old nodemask may be remapped to the new
      nodemask during rebind.  To prevent this, only avoid the stall if there is
      no mempolicy for the thread being changed.
      
      This is a temporary solution until all reads from mempolicy nodemasks can
      be guaranteed to not be empty without the get_mems_allowed()
      synchronization.
      
      This patch also moves the check for nodemask intersection inside
      task_lock() so that tsk->mems_allowed cannot change.  This ensures that
      nothing can set this tsk's mems_allowed out from under us and also
      protects tsk->mempolicy.  (A sketch of the resulting ordering follows
      this entry.)
      Reported-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
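      A minimal sketch of the ordering described above (the function and
      variable names are illustrative, and the reader-waiting machinery is
      elided; the committed code may differ):

          static void example_change_task_nodemask(struct task_struct *tsk,
                                                   nodemask_t *newmems)
          {
                  bool need_stall;

                  task_lock(tsk);
                  /*
                   * The stall may be skipped only when it is provably safe:
                   * the task has no mempolicy to rebind and at least one
                   * currently allowed node survives the update, so a
                   * concurrent reader can never observe an empty nodemask
                   * even though the store is not atomic when
                   * MAX_NUMNODES > BITS_PER_LONG.
                   */
                  need_stall = tsk->mempolicy != NULL ||
                               !nodes_intersects(*newmems, tsk->mems_allowed);
                  if (need_stall) {
                          /* step 1: publish old|new so readers never see an empty mask */
                          nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
                          /* ... wait out readers inside get_mems_allowed() ... */
                  }
                  /* step 2 (or the only step): publish the final mask */
                  tsk->mems_allowed = *newmems;
                  task_unlock(tsk);
          }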
  5. 20 Dec 2011: 1 commit
    • cgroups: fix a css_set not found bug in cgroup_attach_proc · e0197aae
      Committed by Mandeep Singh Baines
      There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
      is not called for the PF_EXITING case, find_existing_css_set() will return
      NULL inside cgroup_task_migrate(), causing a BUG.
      
      This bug is easy to reproduce. Create a zombie and echo its pid to
      cgroup.procs.
      
      $ cat zombie.c
      #include <unistd.h>
      
      int main()
      {
        if (fork())
            pause();
        return 0;
      }
      $
      
      We are hitting this bug pretty regularly on ChromeOS.
      
      This bug is already fixed by Tejun Heo's cgroup patchset, which is
      targeted for the next merge window:
      
      https://lkml.org/lkml/2011/11/1/356
      
      I've created a smaller patch here which just fixes this bug so that a
      fix can be merged into the current release and stable.
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Downstream-Bug-Report: http://crosbug.com/23953
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: stable@kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Olof Johansson <olofj@chromium.org>
  6. 19 Dec 2011: 1 commit
  7. 16 Dec 2011: 1 commit
  8. 14 Dec 2011: 1 commit
  9. 09 Dec 2011: 2 commits
  10. 07 Dec 2011: 2 commits
  11. 06 Dec 2011: 6 commits
  12. 05 Dec 2011: 1 commit
    • perf: Fix loss of notification with multi-event · 10c6db11
      Committed by Peter Zijlstra
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10s at
      1000 samples/sec. However, this is not what's happening. You
      get far fewer samples, maybe 3700 samples/event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On an Intel Nehalem or even AMD64, there are 4 counters capable
      of measuring cycles, so there is plenty of space to measure those
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we'd expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straightforward solution of moving the waitq into the
      ringbuffer object doesn't work because of lifetime issues. One could call
      perf_event_set_output() on an fd that you're also blocking on and cause
      the old rb object to be freed while its waitq is still referenced by the
      blocked thread -> FAIL.
      
      Therefore, link all events to the ringbuffer and broadcast the wakeup
      from the ringbuffer object to all possible events that could be waited
      upon. This is rather ugly, and we're open to better solutions, but it
      works for now. (A sketch of the broadcast follows this entry.)
      Reported-by: Stephane Eranian <eranian@google.com>
      Finished-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
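      A minimal sketch of the broadcast described above (field and helper names
      are assumptions; the committed code may differ): every event attached to
      a ring buffer is kept on a list hanging off that buffer, and an overflow
      wakes every waiter on that list rather than only the overflowing event's
      own waitq.

          static void ring_buffer_wakeup(struct perf_event *event)
          {
                  struct ring_buffer *rb;
                  struct perf_event *iter;

                  rcu_read_lock();
                  rb = rcu_dereference(event->rb);
                  if (rb) {
                          /* wake everyone who could be poll()ing on this buffer */
                          list_for_each_entry_rcu(iter, &rb->event_list, rb_entry)
                                  wake_up_all(&iter->waitq);
                  }
                  rcu_read_unlock();
          }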
  13. 02 Dec 2011: 5 commits
  14. 29 Nov 2011: 1 commit
  15. 25 Nov 2011: 1 commit
    • cgroup_freezer: fix freezing groups with stopped tasks · 884a45d9
      Committed by Michal Hocko
      2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
      transitions) removed is_task_frozen_enough() and replaced it with a simple
      frozen() call. This, however, breaks freezing for a group with stopped tasks
      because those cannot be frozen and so the group remains in CGROUP_FREEZING
      state (update_if_frozen doesn't count stopped tasks) and never reaches
      CGROUP_FROZEN.
      
      Let's add is_task_frozen_enough() back and use it at the original
      locations (update_if_frozen and try_to_freeze_cgroup). Semantically,
      stopped tasks count as frozen enough, so both cases should be covered
      when testing whether tasks are frozen. (A sketch of the restored
      predicate follows this entry.)
      
      Testcase:
      mkdir /dev/freezer
      mount -t cgroup -o freezer none /dev/freezer
      mkdir /dev/freezer/foo
      sleep 1h &
      pid=$!
      kill -STOP $pid
      echo $pid > /dev/freezer/foo/tasks
      echo FROZEN > /dev/freezer/foo/freezer.state
      while true
      do
      	cat /dev/freezer/foo/freezer.state
      	[ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
      	sleep 1
      done
      echo OK
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tomasz Buchert <tomasz.buchert@inria.fr>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Tejun Heo <htejun@gmail.com>
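      A minimal sketch of the restored helper described above (the committed
      body may differ slightly): a task counts as frozen enough when it is
      truly frozen, or when it is stopped or traced while freezing is in
      progress, since such a task cannot enter the refrigerator but is not
      runnable either.

          /* Predicate used by update_if_frozen() and try_to_freeze_cgroup(). */
          static bool is_task_frozen_enough(struct task_struct *task)
          {
                  return frozen(task) ||
                         (task_is_stopped_or_traced(task) && freezing(task));
          }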
  16. 24 Nov 2011: 1 commit
  17. 19 Nov 2011: 3 commits
  18. 18 Nov 2011: 2 commits
  19. 17 Nov 2011: 1 commit
  20. 16 Nov 2011: 1 commit