1. 09 7月, 2012 3 次提交
  2. 02 7月, 2012 4 次提交
  3. 20 6月, 2012 1 次提交
  4. 14 6月, 2012 4 次提交
  5. 22 3月, 2012 1 次提交
    • M
      cpuset: mm: reduce large amounts of memory barrier related damage v3 · cc9a6c87
      Mel Gorman 提交于
      Commit c0ff7453 ("cpuset,mm: fix no node to alloc memory when
      changing cpuset's mems") wins a super prize for the largest number of
      memory barriers entered into fast paths for one commit.
      
      [get|put]_mems_allowed is incredibly heavy with pairs of full memory
      barriers inserted into a number of hot paths.  This was detected while
      investigating at large page allocator slowdown introduced some time
      after 2.6.32.  The largest portion of this overhead was shown by
      oprofile to be at an mfence introduced by this commit into the page
      allocator hot path.
      
      For extra style points, the commit introduced the use of yield() in an
      implementation of what looks like a spinning mutex.
      
      This patch replaces the full memory barriers on both read and write
      sides with a sequence counter with just read barriers on the fast path
      side.  This is much cheaper on some architectures, including x86.  The
      main bulk of the patch is the retry logic if the nodemask changes in a
      manner that can cause a false failure.
      
      While updating the nodemask, a check is made to see if a false failure
      is a risk.  If it is, the sequence number gets bumped and parallel
      allocators will briefly stall while the nodemask update takes place.
      
      In a page fault test microbenchmark, oprofile samples from
      __alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
      actual results were
      
                                   3.3.0-rc3          3.3.0-rc3
                                   rc3-vanilla        nobarrier-v2r1
          Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
          Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
          Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
          Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
          Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
          Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
          Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
          Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
          Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
          Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
          Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
          Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
          Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
          Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
          Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
          MMTests Statistics: duration
          Sys Time Running Test (seconds)             135.68    132.17
          User+Sys Time Running Test (seconds)         164.2    160.13
          Total Elapsed Time (seconds)                123.46    120.87
      
      The overall improvement is small but the System CPU time is much
      improved and roughly in correlation to what oprofile reported (these
      performance figures are without profiling so skew is expected).  The
      actual number of page faults is noticeably improved.
      
      For benchmarks like kernel builds, the overall benefit is marginal but
      the system CPU time is slightly reduced.
      
      To test the actual bug the commit fixed I opened two terminals.  The
      first ran within a cpuset and continually ran a small program that
      faulted 100M of anonymous data.  In a second window, the nodemask of the
      cpuset was continually randomised in a loop.
      
      Without the commit, the program would fail every so often (usually
      within 10 seconds) and obviously with the commit everything worked fine.
      With this patch applied, it also worked fine so the fix should be
      functionally equivalent.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc9a6c87
  6. 10 3月, 2012 1 次提交
  7. 23 1月, 2012 1 次提交
  8. 10 1月, 2012 1 次提交
  9. 05 12月, 2011 1 次提交
  10. 17 11月, 2011 1 次提交
  11. 11 11月, 2011 2 次提交
  12. 28 9月, 2011 1 次提交
    • V
      mm: restrict access to slab files under procfs and sysfs · ab067e99
      Vasiliy Kulikov 提交于
      Historically /proc/slabinfo and files under /sys/kernel/slab/* have
      world read permissions and are accessible to the world.  slabinfo
      contains rather private information related both to the kernel and
      userspace tasks.  Depending on the situation, it might reveal either
      private information per se or information useful to make another
      targeted attack.  Some examples of what can be learned by
      reading/watching for /proc/slabinfo entries:
      
      1) dentry (and different *inode*) number might reveal other processes fs
      activity.  The number of dentry "active objects" doesn't strictly show
      file count opened/touched by a process, however, there is a good
      correlation between them.  The patch "proc: force dcache drop on
      unauthorized access" relies on the privacy of dentry count.
      
      2) different inode entries might reveal the same information as (1), but
      these are more fine granted counters.  If a filesystem is mounted in a
      private mount point (or even a private namespace) and fs type differs from
      other mounted fs types, fs activity in this mount point/namespace is
      revealed.  If there is a single ecryptfs mount point, the whole fs
      activity of a single user is revealed.  Number of files in ecryptfs
      mount point is a private information per se.
      
      3) fuse_* reveals number of files / fs activity of a user in a user
      private mount point.  It is approx. the same severity as ecryptfs
      infoleak in (2).
      
      4) sysfs_dir_cache similar to (2) reveals devices' addition/removal,
      which can be otherwise hidden by "chmod 0700 /sys/".  With 0444 slabinfo
      the precise number of sysfs files is known to the world.
      
      5) buffer_head might reveal some kernel activity.  With other
      information leaks an attacker might identify what specific kernel
      routines generate buffer_head activity.
      
      6) *kmalloc* infoleaks are very situational.  Attacker should watch for
      the specific kmalloc size entry and filter the noise related to the unrelated
      kernel activity.  If an attacker has relatively silent victim system, he
      might get rather precise counters.
      
      Additional information sources might significantly increase the slabinfo
      infoleak benefits.  E.g. if an attacker knows that the processes
      activity on the system is very low (only core daemons like syslog and
      cron), he may run setxid binaries / trigger local daemon activity /
      trigger network services activity / await sporadic cron jobs activity
      / etc. and get rather precise counters for fs and network activity of
      these privileged tasks, which is unknown otherwise.
      
      Also hiding slabinfo and /sys/kernel/slab/* is a one step to complicate
      exploitation of kernel heap overflows (and possibly, other bugs).  The
      related discussion:
      
      http://thread.gmane.org/gmane.linux.kernel/1108378
      
      To keep compatibility with old permission model where non-root
      monitoring daemon could watch for kernel memleaks though slabinfo one
      should do:
      
          groupadd slabinfo
          usermod -a -G slabinfo $MONITOR_USER
      
      And add the following commands to init scripts (to mountall.conf in
      Ubuntu's upstart case):
      
          chmod g+r /proc/slabinfo /sys/kernel/slab/*/*
          chgrp slabinfo /proc/slabinfo /sys/kernel/slab/*/*
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Reviewed-by: NKees Cook <kees@ubuntu.com>
      Reviewed-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: NChristoph Lameter <cl@gentwo.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      CC: Valdis.Kletnieks@vt.edu
      CC: Linus Torvalds <torvalds@linux-foundation.org>
      CC: Alan Cox <alan@linux.intel.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      ab067e99
  13. 04 8月, 2011 2 次提交
  14. 01 8月, 2011 1 次提交
    • S
      slab: use print_hex_dump · fdde6abb
      Sebastian Andrzej Siewior 提交于
      Less code and the advantage of ascii dump.
      
      before:
      | Slab corruption: names_cache start=c5788000, len=4096
      | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00
      | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff
      | 030: ff ff ff ff e2 b4 17 18 c7 e4 08 06 00 01 08 00
      | 040: 06 04 00 01 e2 b4 17 18 c7 e4 0a 00 00 01 00 00
      | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b
      
      after:
      | Slab corruption: size-4096 start=c38a9000, len=4096
      | 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00  kk....V...$...*.
      | 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      | 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff  ................
      | 030: ff ff ff ff d2 56 5f aa db 9c 08 06 00 01 08 00  .....V_.........
      | 040: 06 04 00 01 d2 56 5f aa db 9c 0a 00 00 01 00 00  .....V_.........
      | 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b  ........kkkkkkkk
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      fdde6abb
  15. 31 7月, 2011 1 次提交
  16. 28 7月, 2011 1 次提交
  17. 22 7月, 2011 1 次提交
  18. 21 7月, 2011 1 次提交
  19. 18 7月, 2011 1 次提交
    • H
      slab: fix DEBUG_SLAB build · c225150b
      Hugh Dickins 提交于
      Fix CONFIG_SLAB=y CONFIG_DEBUG_SLAB=y build error and warnings.
      
      Now that ARCH_SLAB_MINALIGN defaults to __alignof__(unsigned long long),
      it is always defined (when slab.h included), but cannot be used in #if:
      mm/slab.c: In function `cache_alloc_debugcheck_after':
      mm/slab.c:3156:5: warning: "__alignof__" is not defined
      mm/slab.c:3156:5: error: missing binary operator before token "("
      make[1]: *** [mm/slab.o] Error 1
      
      So just remove the #if and #endif lines, but then 64-bit build warns:
      mm/slab.c: In function `cache_alloc_debugcheck_after':
      mm/slab.c:3156:6: warning: cast from pointer to integer of different size
      mm/slab.c:3158:10: warning: format `%d' expects type `int', but argument
                                  3 has type `long unsigned int'
      Fix those with casts, whatever the actual type of ARCH_SLAB_MINALIGN.
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      c225150b
  20. 04 6月, 2011 1 次提交
  21. 21 5月, 2011 1 次提交
    • L
      sanitize <linux/prefetch.h> usage · 268bb0ce
      Linus Torvalds 提交于
      Commit e66eed65 ("list: remove prefetching from regular list
      iterators") removed the include of prefetch.h from list.h, which
      uncovered several cases that had apparently relied on that rather
      obscure header file dependency.
      
      So this fixes things up a bit, using
      
         grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
         grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')
      
      to guide us in finding files that either need <linux/prefetch.h>
      inclusion, or have it despite not needing it.
      
      There are more of them around (mostly network drivers), but this gets
      many core ones.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      268bb0ce
  22. 31 3月, 2011 1 次提交
  23. 23 3月, 2011 1 次提交
  24. 12 3月, 2011 1 次提交
  25. 14 2月, 2011 1 次提交
    • P
      Revert "slab: Fix missing DEBUG_SLAB last user" · 3ff84a7f
      Pekka Enberg 提交于
      This reverts commit 5c5e3b33.
      
      The commit breaks ARM thusly:
      
      | Mount-cache hash table entries: 512
      | slab error in verify_redzone_free(): cache `idr_layer_cache': memory outside object was overwritten
      | Backtrace:
      | [<c0227088>] (dump_backtrace+0x0/0x110) from [<c0431afc>] (dump_stack+0x18/0x1c)
      | [<c0431ae4>] (dump_stack+0x0/0x1c) from [<c0293304>] (__slab_error+0x28/0x30)
      | [<c02932dc>] (__slab_error+0x0/0x30) from [<c0293a74>] (cache_free_debugcheck+0x1c0/0x2b8)
      | [<c02938b4>] (cache_free_debugcheck+0x0/0x2b8) from [<c0293f78>] (kmem_cache_free+0x3c/0xc0)
      | [<c0293f3c>] (kmem_cache_free+0x0/0xc0) from [<c032b1c8>] (ida_get_new_above+0x19c/0x1c0)
      | [<c032b02c>] (ida_get_new_above+0x0/0x1c0) from [<c02af7ec>] (alloc_vfsmnt+0x54/0x144)
      | [<c02af798>] (alloc_vfsmnt+0x0/0x144) from [<c0299830>] (vfs_kern_mount+0x30/0xec)
      | [<c0299800>] (vfs_kern_mount+0x0/0xec) from [<c0299908>] (kern_mount_data+0x1c/0x20)
      | [<c02998ec>] (kern_mount_data+0x0/0x20) from [<c02146c4>] (sysfs_init+0x68/0xc8)
      | [<c021465c>] (sysfs_init+0x0/0xc8) from [<c02137d4>] (mnt_init+0x90/0x1b0)
      | [<c0213744>] (mnt_init+0x0/0x1b0) from [<c0213388>] (vfs_caches_init+0x100/0x140)
      | [<c0213288>] (vfs_caches_init+0x0/0x140) from [<c0208c0c>] (start_kernel+0x2e8/0x368)
      | [<c0208924>] (start_kernel+0x0/0x368) from [<c0208034>] (__enable_mmu+0x0/0x2c)
      | c0113268: redzone 1:0xd84156c5c032b3ac, redzone 2:0xd84156c5635688c0.
      | slab error in cache_alloc_debugcheck_after(): cache `idr_layer_cache': double free, or memory outside object was overwritten
      | ...
      | c011307c: redzone 1:0x9f91102ffffffff, redzone 2:0x9f911029d74e35b
      | slab: Internal list corruption detected in cache 'idr_layer_cache'(24), slabp c0113000(16). Hexdump:
      |
      | 000: 20 4f 10 c0 20 4f 10 c0 7c 00 00 00 7c 30 11 c0
      | 010: 10 00 00 00 10 00 00 00 00 00 c9 17 fe ff ff ff
      | 020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
      | 030: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
      | 040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
      | 050: fe ff ff ff fe ff ff ff fe ff ff ff 11 00 00 00
      | 060: 12 00 00 00 13 00 00 00 14 00 00 00 15 00 00 00
      | 070: 16 00 00 00 17 00 00 00 c0 88 56 63
      | kernel BUG at /home/rmk/git/linux-2.6-rmk/mm/slab.c:2928!
      
      Reference: https://lkml.org/lkml/2011/2/7/238
      Cc: <stable@kernel.org> # 2.6.35.y and later
      Reported-and-analyzed-by: NRussell King <rmk@arm.linux.org.uk>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      3ff84a7f
  26. 24 1月, 2011 1 次提交
  27. 15 1月, 2011 1 次提交
  28. 07 1月, 2011 1 次提交
  29. 17 12月, 2010 1 次提交
  30. 15 12月, 2010 1 次提交
    • T
      workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync() · afe2c511
      Tejun Heo 提交于
      cancel_rearming_delayed_work[queue]() has been superceded by
      cancel_delayed_work_sync() quite some time ago.  Convert all the
      in-kernel users.  The conversions are completely equivalent and
      trivial.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: netdev@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: xfs-masters@oss.sgi.com
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: netfilter-devel@vger.kernel.org
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      afe2c511