1. 09 12月, 2011 1 次提交
  2. 07 12月, 2011 1 次提交
    • A
      fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API · 02125a82
      Al Viro 提交于
      __d_path() API is asking for trouble and in case of apparmor d_namespace_path()
      getting just that.  The root cause is that when __d_path() misses the root
      it had been told to look for, it stores the location of the most remote ancestor
      in *root.  Without grabbing references.  Sure, at the moment of call it had
      been pinned down by what we have in *path.  And if we raced with umount -l, we
      could have very well stopped at vfsmount/dentry that got freed as soon as
      prepend_path() dropped vfsmount_lock.
      
      It is safe to compare these pointers with pre-existing (and known to be still
      alive) vfsmount and dentry, as long as all we are asking is "is it the same
      address?".  Dereferencing is not safe and apparmor ended up stepping into
      that.  d_namespace_path() really wants to examine the place where we stopped,
      even if it's not connected to our namespace.  As the result, it looked
      at ->d_sb->s_magic of a dentry that might've been already freed by that point.
      All other callers had been careful enough to avoid that, but it's really
      a bad interface - it invites that kind of trouble.
      
      The fix is fairly straightforward, even though it's bigger than I'd like:
      	* prepend_path() root argument becomes const.
      	* __d_path() is never called with NULL/NULL root.  It was a kludge
      to start with.  Instead, we have an explicit function - d_absolute_root().
      Same as __d_path(), except that it doesn't get root passed and stops where
      it stops.  apparmor and tomoyo are using it.
      	* __d_path() returns NULL on path outside of root.  The main
      caller is show_mountinfo() and that's precisely what we pass root for - to
      skip those outside chroot jail.  Those who don't want that can (and do)
      use d_path().
      	* __d_path() root argument becomes const.  Everyone agrees, I hope.
      	* apparmor does *NOT* try to use __d_path() or any of its variants
      when it sees that path->mnt is an internal vfsmount.  In that case it's
      definitely not mounted anywhere and dentry_path() is exactly what we want
      there.  Handling of sysctl()-triggered weirdness is moved to that place.
      	* if apparmor is asked to do pathname relative to chroot jail
      and __d_path() tells it we it's not in that jail, the sucker just calls
      d_absolute_path() instead.  That's the other remaining caller of __d_path(),
      BTW.
              * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
      the normal seq_file logics will take care of growing the buffer and redoing
      the call of ->show() just fine).  However, if it gets path not reachable
      from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
      ignoring the return value as it used to do).
      Reviewed-by: NJohn Johansen <john.johansen@canonical.com>
      ACKed-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      02125a82
  3. 06 12月, 2011 11 次提交
  4. 05 12月, 2011 1 次提交
    • P
      perf: Fix loss of notification with multi-event · 10c6db11
      Peter Zijlstra 提交于
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10s at
      1000samples/sec. However, this is not what's happening. You
      get much fewer samples, maybe 3700 samples/event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On a Intel Nehalem or even AMD64, there are 4 counters capable
      of measuring cycles, so there is plenty of space to measure those
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we'd expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straight forward solution of moving the waitq into the
      ringbuffer object doesn't work because of life-time issues. One could
      perf_event_set_output() on a fd that you're also blocking on and cause
      the old rb object to be freed while its waitq would still be
      referenced by the blocked thread -> FAIL.
      
      Therefore link all events to the ringbuffer and broadcast the wakeup
      from the ringbuffer object to all possible events that could be waited
      upon. This is rather ugly, and we're open to better solutions but it
      works for now.
      Reported-by: NStephane Eranian <eranian@google.com>
      Finished-by: NStephane Eranian <eranian@google.com>
      Reviewed-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      10c6db11
  5. 04 12月, 2011 1 次提交
  6. 02 12月, 2011 1 次提交
  7. 01 12月, 2011 1 次提交
  8. 29 11月, 2011 4 次提交
  9. 27 11月, 2011 4 次提交
  10. 24 11月, 2011 3 次提交
  11. 23 11月, 2011 4 次提交
  12. 22 11月, 2011 1 次提交
    • P
      netfilter: nf_conntrack: make event callback registration per-netns · 70e9942f
      Pablo Neira Ayuso 提交于
      This patch fixes an oops that can be triggered following this recipe:
      
      0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
      1) container is started.
      2) connect to it via lxc-console.
      3) generate some traffic with the container to create some conntrack
         entries in its table.
      4) stop the container: you hit one oops because the conntrack table
         cleanup tries to report the destroy event to user-space but the
         per-netns nfnetlink socket has already gone (as the nfnetlink
         socket is per-netns but event callback registration is global).
      
      To fix this situation, we make the ctnl_notifier per-netns so the
      callback is registered/unregistered if the container is
      created/destroyed.
      
      Alex Bligh and Alexey Dobriyan originally proposed one small patch to
      check if the nfnetlink socket is gone in nfnetlink_has_listeners,
      but this is a very visited path for events, thus, it may reduce
      performance and it looks a bit hackish to check for the nfnetlink
      socket only to workaround this situation. As a result, I decided
      to follow the bigger path choice, which seems to look nicer to me.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Reported-by: NAlex Bligh <alex@alex.org.uk>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      70e9942f
  13. 20 11月, 2011 1 次提交
    • M
      drm/radeon/kms: add a CS ioctl flag not to rewrite tiling flags in the CS · e70f224c
      Marek Olšák 提交于
      This adds a new optional chunk to the CS ioctl that specifies optional flags
      to the CS parser. Why this is useful is explained below. Note that some regs
      no longer need the NOP relocation packet if this feature is enabled.
      Tested on r300g and r600g with this flag disabled and enabled.
      
      Assume there are two contexts sharing the same mipmapped tiled texture.
      One context wants to render into the first mipmap and the other one
      wants to render into the last mipmap. As you probably know, the hardware
      has a MACRO_SWITCH feature, which turns off macro tiling for small mipmaps,
      but that only applies to samplers.
      (at least on r300-r500, though later hardware likely behaves the same)
      
      So we want to just re-set the tiling flags before rendering (writing
      packets), right? ... No. The contexts run in parallel, so they may
      set the tiling flags simultaneously and then fire their command streams
      also simultaneously. The last one setting the flags wins, the other one
      loses.
      
      Another problem is when one context wants to render into the first and
      the last mipmap in one CS. Impossible. It must flush before changing
      tiling flags and do the rendering into the smaller mipmaps in another CS.
      
      Yet another problem is that writing copy_blit in userspace would be a mess
      involving re-setting tiling flags to please the kernel, and causing races
      with other contexts at the same time.
      
      The only way out of this is to send tiling flags with each CS, ideally
      with each relocation. But we already do that through the registers.
      So let's just use what we have in the registers.
      Signed-off-by: NMarek Olšák <maraeo@gmail.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      e70f224c
  14. 19 11月, 2011 1 次提交
    • D
      hugetlb: remove dummy definitions of HPAGE_MASK and HPAGE_SIZE · a5c86e98
      David Rientjes 提交于
      Dummy, non-zero definitions for HPAGE_MASK and HPAGE_SIZE were added in
      51c6f666 ("mm: ZAP_BLOCK causes redundant work") to avoid a divide
      by zero in generic kernel code.
      
      That code has since been removed, but probably should never have been
      added in the first place: we don't want HPAGE_SIZE to act like PAGE_SIZE
      for code that is working with hugepages, for example, when the
      dependency on CONFIG_HUGETLB_PAGE has not been fulfilled.
      
      Because hugepage size can differ from architecture to architecture, each
      is required to have their own definitions for both HPAGE_MASK and
      HPAGE_SIZE.  This is always done in arch/*/include/asm/page.h.
      
      So, just remove the dummy and dangerous definitions since they are no
      longer needed and reveals the correct dependencies.  Tested on
      architectures using the definitions with allyesconfig: x86 (even with
      thp), hppa, mips, powerpc, s390, sh3, sh4, sparc, and sparc64, and with
      defconfig on ia64.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5c86e98
  15. 18 11月, 2011 2 次提交
    • K
      pstore: pass allocated memory region back to caller · f6f82851
      Kees Cook 提交于
      The buf_lock cannot be held while populating the inodes, so make the backend
      pass forward an allocated and filled buffer instead. This solves the following
      backtrace. The effect is that "buf" is only ever used to notify the backends
      that something was written to it, and shouldn't be used in the read path.
      
      To replace the buf_lock during the read path, isolate the open/read/close
      loop with a separate mutex to maintain serialized access to the backend.
      
      Note that is is up to the pstore backend to cope if the (*write)() path is
      called in the middle of the read path.
      
      [   59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
      [   59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
      [   59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
      [   59.691019] Call Trace:
      [   59.691019]  [<810252d5>] __might_sleep+0xc3/0xca
      [   59.691019]  [<810a26e6>] kmem_cache_alloc+0x32/0xf3
      [   59.691019]  [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
      [   59.691019]  [<810b68b1>] alloc_inode+0x2a/0x64
      [   59.691019]  [<810b6903>] new_inode+0x18/0x43
      [   59.691019]  [<81142447>] pstore_get_inode.isra.1+0x11/0x98
      [   59.691019]  [<81142623>] pstore_mkfile+0xae/0x26f
      [   59.691019]  [<810a2a66>] ? kmem_cache_free+0x19/0xb1
      [   59.691019]  [<8116c821>] ? ida_get_new_above+0x140/0x158
      [   59.691019]  [<811708ea>] ? __init_rwsem+0x1e/0x2c
      [   59.691019]  [<810b67e8>] ? inode_init_always+0x111/0x1b0
      [   59.691019]  [<8102127e>] ? should_resched+0xd/0x27
      [   59.691019]  [<8137977f>] ? _cond_resched+0xd/0x21
      [   59.691019]  [<81142abf>] pstore_get_records+0x52/0xa7
      [   59.691019]  [<8114254b>] pstore_fill_super+0x7d/0x91
      [   59.691019]  [<810a7ff5>] mount_single+0x46/0x82
      [   59.691019]  [<8114231a>] pstore_mount+0x15/0x17
      [   59.691019]  [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
      [   59.691019]  [<810a8199>] mount_fs+0x5a/0x12d
      [   59.691019]  [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
      [   59.691019]  [<810b9474>] vfs_kern_mount+0x4f/0x7d
      [   59.691019]  [<810b9d7e>] do_kern_mount+0x34/0xb2
      [   59.691019]  [<810bb15f>] do_mount+0x5fc/0x64a
      [   59.691019]  [<810912fb>] ? strndup_user+0x2e/0x3f
      [   59.691019]  [<810bb3cb>] sys_mount+0x66/0x99
      [   59.691019]  [<8137b537>] sysenter_do_call+0x12/0x26
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      f6f82851
    • R
      PM Sleep: Do not extend wakeup paths to devices with ignore_children set · 8b258cc8
      Rafael J. Wysocki 提交于
      Commit 4ca46ff3 (PM / Sleep: Mark
      devices involved in wakeup signaling during suspend) introduced
      the power.wakeup_path field in struct dev_pm_info to mark devices
      whose children are enabled to wake up the system from sleep states,
      so that power domains containing the parents that provide their
      children with wakeup power and/or relay their wakeup signals are not
      turned off.  Unfortunately, that introduced a PM regression on SH7372
      whose power consumption in the system "memory sleep" state increased
      as a result of it, because it prevented the power domain containing
      the I2C controller from being turned off when some children of that
      controller were enabled to wake up the system, although the
      controller was not necessary for them to signal wakeup.
      
      To fix this issue use the observation that devices whose
      power.ignore_children flag is set for runtime PM should be treated
      analogously during system suspend.  Namely, they shouldn't be
      included in wakeup paths going through their children.  Since the
      SH7372 I2C controller's power.ignore_children flag is set, doing so
      will restore the previous behavior of that SOC.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      8b258cc8
  16. 17 11月, 2011 3 次提交