1. 12 7月, 2012 1 次提交
  2. 08 7月, 2012 2 次提交
    • T
      cgroup: fix cgroup hierarchy umount race · 5db9a4d9
      Tejun Heo 提交于
      48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal
      optional" allowed a css to linger after the associated cgroup is
      removed.  As a css holds a reference on the cgroup's dentry, it means
      that cgroup dentries may linger for a while.
      
      Destroying a superblock which has dentries with positive refcnts is a
      critical bug and triggers BUG() in vfs code.  As each cgroup dentry
      holds an s_active reference, any lingering cgroup has both its dentry
      and the superblock pinned and thus preventing premature release of
      superblock.
      
      Unfortunately, after 48ddbe19, there's a small window while
      releasing a cgroup which is directly under the root of the hierarchy.
      When a cgroup directory is released, vfs layer first deletes the
      corresponding dentry and then invokes dput() on the parent, which may
      recurse further, so when a cgroup directly below root cgroup is
      released, the cgroup is first destroyed - which releases the s_active
      it was holding - and then the dentry for the root cgroup is dput().
      
      This creates a window where the root dentry's refcnt isn't zero but
      superblock's s_active is.  If umount happens before or during this
      window, vfs will see the root dentry with non-zero refcnt and trigger
      BUG().
      
      Before 48ddbe19, this problem didn't exist because the last dentry
      reference was guaranteed to be put synchronously from rmdir(2)
      invocation which holds s_active around the whole process.
      
      Fix it by holding an extra superblock->s_active reference across
      dput() from css release, which is the dput() path added by 48ddbe19
      and the only one which doesn't hold an extra s_active ref across the
      final cgroup dput().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: Nshyju pv <shyju.pv@huawei.com>
      Tested-by: Nshyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      5db9a4d9
    • T
      Revert "cgroup: superblock can't be released with active dentries" · 7db5b3ca
      Tejun Heo 提交于
      This reverts commit fa980ca8.  The
      commit was an attempt to fix a race condition where a cgroup hierarchy
      may be unmounted with positive dentry reference on root cgroup.  While
      the commit made the race condition slightly more difficult to trigger,
      the race was still there and could be reliably triggered using a
      different test case.
      
      Revert the incorrect fix.  The next commit will describe the race and
      fix it correctly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: Nshyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      7db5b3ca
  3. 01 7月, 2012 1 次提交
  4. 30 6月, 2012 1 次提交
  5. 29 6月, 2012 1 次提交
    • K
      printk: flush continuation lines immediately to console · 084681d1
      Kay Sievers 提交于
      Continuation lines are buffered internally, intended to merge the
      chunked printk()s into a single record, and to isolate potentially
      racy continuation users from usual terminated line users.
      
      This though, has the effect that partial lines are not printed to
      the console in the moment they are emitted. In case the kernel
      crashes in the meantime, the potentially interesting printed
      information would never reach the consoles.
      
      Here we share the continuation buffer with the console copy logic,
      and partial lines are always immediately flushed to the available
      consoles. They are still buffered internally to improve the
      readability and integrity of the messages and minimize the amount
      of needed record headers to store.
      Signed-off-by: NKay Sievers <kay@vrfy.org>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      084681d1
  6. 27 6月, 2012 2 次提交
  7. 26 6月, 2012 2 次提交
  8. 21 6月, 2012 4 次提交
  9. 19 6月, 2012 1 次提交
  10. 18 6月, 2012 1 次提交
    • S
      perf: Use css_tryget() to avoid propping up css refcount · 9c5da09d
      Salman Qazi 提交于
      An rmdir pushes css's ref count to zero.  However, if the associated
      directory is open at the time, the dentry ref count is non-zero.  If
      the fd for this directory is then passed into perf_event_open, it
      does a css_get().  This bounces the ref count back up from zero.  This
      is a problem by itself.  But what makes it turn into a crash is the
      fact that we end up doing an extra dput, since we perform a dput
      when css_put sees the ref count go down to zero.
      
      css_tryget() does not fall into that trap. So, we use that instead.
      
      Reproduction test-case for the bug:
      
       #include <unistd.h>
       #include <sys/types.h>
       #include <sys/stat.h>
       #include <fcntl.h>
       #include <linux/unistd.h>
       #include <linux/perf_event.h>
       #include <string.h>
       #include <errno.h>
       #include <stdio.h>
      
       #define PERF_FLAG_PID_CGROUP    (1U << 2)
      
       int perf_event_open(struct perf_event_attr *hw_event_uptr,
                           pid_t pid, int cpu, int group_fd, unsigned long flags) {
               return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
                       group_fd, flags);
       }
      
       /*
        * Directly poke at the perf_event bug, since it's proving hard to repro
        * depending on where in the kernel tree.  what moved?
        */
       int main(int argc, char **argv)
       {
              int fd;
              struct perf_event_attr attr;
              memset(&attr, 0, sizeof(attr));
              attr.exclude_kernel = 1;
              attr.size = sizeof(attr);
              mkdir("/dev/cgroup/perf_event/blah", 0777);
              fd = open("/dev/cgroup/perf_event/blah", O_RDONLY);
              perror("open");
              rmdir("/dev/cgroup/perf_event/blah");
              sleep(2);
              perf_event_open(&attr, fd, 0, -1,  PERF_FLAG_PID_CGROUP);
              perror("perf_event_open");
              close(fd);
              return 0;
       }
      Signed-off-by: NSalman Qazi <sqazi@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c5da09d
  11. 16 6月, 2012 3 次提交
  12. 14 6月, 2012 2 次提交
    • D
      watchdog: Quiet down the boot messages · a7027046
      Don Zickus 提交于
      A bunch of bugzillas have complained how noisy the nmi_watchdog
      is during boot-up especially with its expected failure cases
      (like virt and bios resource contention).
      
      This is my attempt to quiet them down and keep it less confusing
      for the end user.  What I did is print the message for cpu0 and
      save it for future comparisons.  If future cpus have an
      identical message as cpu0, then don't print the redundant info.
      However, if a future cpu has a different message, happily print
      that loudly.
      
      Before the change, you would see something like:
      
          ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
          CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
          Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
          ... version:                2
          ... bit width:              40
          ... generic registers:      2
          ... value mask:             000000ffffffffff
          ... max period:             000000007fffffff
          ... fixed-purpose events:   3
          ... event mask:             0000000700000003
          NMI watchdog enabled, takes one hw-pmu counter.
          Booting Node   0, Processors  #1
          NMI watchdog enabled, takes one hw-pmu counter.
           #2
          NMI watchdog enabled, takes one hw-pmu counter.
           #3 Ok.
          NMI watchdog enabled, takes one hw-pmu counter.
          Brought up 4 CPUs
          Total of 4 processors activated (22607.24 BogoMIPS).
      
      After the change, it is simplified to:
      
          ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
          CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
          Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
          ... version:                2
          ... bit width:              40
          ... generic registers:      2
          ... value mask:             000000ffffffffff
          ... max period:             000000007fffffff
          ... fixed-purpose events:   3
          ... event mask:             0000000700000003
          NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
          Booting Node   0, Processors  #1 #2 #3 Ok.
          Brought up 4 CPUs
      
      V2: little changes based on Joe Perches' feedback
      V3: printk cleanup based on Ingo's feedback; checkpatch fix
      V4: keep printk as one long line
      V5: Ingo fix ups
      Reported-and-tested-by: NNathan Zimmer <nzimmer@sgi.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: nzimmer@sgi.com
      Cc: joe@perches.com
      Link: http://lkml.kernel.org/r/1339594548-17227-1-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a7027046
    • E
      splice: fix racy pipe->buffers uses · 047fe360
      Eric Dumazet 提交于
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
      by splice_shrink_spd() called from vmsplice_to_pipe()
      
      commit 35f3d14d (pipe: add support for shrinking and growing pipes)
      added capability to adjust pipe->buffers.
      
      Problem is some paths don't hold pipe mutex and assume pipe->buffers
      doesn't change for their duration.
      
      Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
      use it in place of pipe->buffers where appropriate.
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      047fe360
  13. 13 6月, 2012 1 次提交
  14. 09 6月, 2012 1 次提交
    • R
      sched/fair: fix lots of kernel-doc warnings · cd96891d
      Randy Dunlap 提交于
      Fix lots of new kernel-doc warnings in kernel/sched/fair.c:
      
        Warning(kernel/sched/fair.c:3625): No description found for parameter 'env'
        Warning(kernel/sched/fair.c:3625): Excess function parameter 'sd' description in 'update_sg_lb_stats'
        Warning(kernel/sched/fair.c:3735): No description found for parameter 'env'
        Warning(kernel/sched/fair.c:3735): Excess function parameter 'sd' description in 'update_sd_pick_busiest'
        Warning(kernel/sched/fair.c:3735): Excess function parameter 'this_cpu' description in 'update_sd_pick_busiest'
        .. more warnings
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd96891d
  15. 08 6月, 2012 6 次提交
  16. 07 6月, 2012 6 次提交
  17. 06 6月, 2012 5 次提交