1. 02 8月, 2010 18 次提交
  2. 31 7月, 2010 4 次提交
  3. 30 7月, 2010 1 次提交
    • D
      CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials · de09a977
      David Howells 提交于
      It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
      credentials by incrementing their usage count after their replacement by the
      task being accessed.
      
      What happens is that get_task_cred() can race with commit_creds():
      
      	TASK_1			TASK_2			RCU_CLEANER
      	-->get_task_cred(TASK_2)
      	rcu_read_lock()
      	__cred = __task_cred(TASK_2)
      				-->commit_creds()
      				old_cred = TASK_2->real_cred
      				TASK_2->real_cred = ...
      				put_cred(old_cred)
      				  call_rcu(old_cred)
      		[__cred->usage == 0]
      	get_cred(__cred)
      		[__cred->usage == 1]
      	rcu_read_unlock()
      							-->put_cred_rcu()
      							[__cred->usage == 1]
      							panic()
      
      However, since a tasks credentials are generally not changed very often, we can
      reasonably make use of a loop involving reading the creds pointer and using
      atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
      
      If successful, we can safely return the credentials in the knowledge that, even
      if the task we're accessing has released them, they haven't gone to the RCU
      cleanup code.
      
      We then change task_state() in procfs to use get_task_cred() rather than
      calling get_cred() on the result of __task_cred(), as that suffers from the
      same problem.
      
      Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
      tripped when it is noticed that the usage count is not zero as it ought to be,
      for example:
      
      kernel BUG at kernel/cred.c:168!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/kernel/mm/ksm/run
      CPU 0
      Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
      745
      RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
      RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
      RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
      RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
      R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
      FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
      Stack:
       ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
      <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
      <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
      Call Trace:
       [<ffffffff810698cd>] put_cred+0x13/0x15
       [<ffffffff81069b45>] commit_creds+0x16b/0x175
       [<ffffffff8106aace>] set_current_groups+0x47/0x4e
       [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
       [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
      Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
      48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
      04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
      RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
       RSP <ffff88019e7e9eb8>
      ---[ end trace df391256a100ebdd ]---
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de09a977
  4. 29 7月, 2010 2 次提交
  5. 28 7月, 2010 2 次提交
  6. 27 7月, 2010 3 次提交
  7. 25 7月, 2010 1 次提交
  8. 24 7月, 2010 4 次提交
  9. 23 7月, 2010 2 次提交
    • S
      ceph: avoid dcache readdir for snapdir · a0dff78d
      Sage Weil 提交于
      We should always go to the MDS for readdir on the hidden snapdir.  The
      set of snapshots can change at any time; the client can't trust its cache
      for that.
      Signed-off-by: NSage Weil <sage@newdream.net>
      a0dff78d
    • D
      CIFS: Fix a malicious redirect problem in the DNS lookup code · 4c0c03ca
      David Howells 提交于
      Fix the security problem in the CIFS filesystem DNS lookup code in which a
      malicious redirect could be installed by a random user by simply adding a
      result record into one of their keyrings with add_key() and then invoking a
      CIFS CFS lookup [CVE-2010-2524].
      
      This is done by creating an internal keyring specifically for the caching of
      DNS lookups.  To enforce the use of this keyring, the module init routine
      creates a set of override credentials with the keyring installed as the thread
      keyring and instructs request_key() to only install lookup result keys in that
      keyring.
      
      The override is then applied around the call to request_key().
      
      This has some additional benefits when a kernel service uses this module to
      request a key:
      
       (1) The result keys are owned by root, not the user that caused the lookup.
      
       (2) The result keys don't pop up in the user's keyrings.
      
       (3) The result keys don't come out of the quota of the user that caused the
           lookup.
      
      The keyring can be viewed as root by doing cat /proc/keys:
      
      2a0ca6c3 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: 1/4
      
      It can then be listed with 'keyctl list' by root.
      
      	# keyctl list 0x2a0ca6c3
      	1 key in keyring:
      	726766307: --alswrv     0     0 dns_resolver: foo.bar.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-and-Tested-by: NJeff Layton <jlayton@redhat.com>
      Acked-by: NSteve French <smfrench@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c0c03ca
  10. 22 7月, 2010 1 次提交
  11. 20 7月, 2010 2 次提交
    • D
      xfs: track AGs with reclaimable inodes in per-ag radix tree · 16fd5367
      Dave Chinner 提交于
      https://bugzilla.kernel.org/show_bug.cgi?id=16348
      
      When the filesystem grows to a large number of allocation groups,
      the summing of recalimable inodes gets expensive. In many cases,
      most AGs won't have any reclaimable inodes and so we are wasting CPU
      time aggregating over these AGs. This is particularly important for
      the inode shrinker that gets called frequently under memory
      pressure.
      
      To avoid the overhead, track AGs with reclaimable inodes in the
      per-ag radix tree so that we can find all the AGs with reclaimable
      inodes via a simple gang tag lookup. This involves setting the tag
      when the first reclaimable inode is tracked in the AG, and removing
      the tag when the last reclaimable inode is removed from the tree.
      Then the summation process becomes a loop walking the radix tree
      summing AGs with the reclaim tag set.
      
      This significantly reduces the overhead of scanning - a 6400 AG
      filesystea now only uses about 25% of a cpu in kswapd while slab
      reclaim progresses instead of being permanently stuck at 100% CPU
      and making little progress. Clean filesystems filesystems will see
      no overhead and the overhead only increases linearly with the number
      of dirty AGs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      16fd5367
    • D
      xfs: convert inode shrinker to per-filesystem contexts · 70e60ce7
      Dave Chinner 提交于
      Now the shrinker passes us a context, wire up a shrinker context per
      filesystem. This allows us to remove the global mount list and the
      locking problems that introduced. It also means that a shrinker call
      does not need to traverse clean filesystems before finding a
      filesystem with reclaimable inodes.  This significantly reduces
      scanning overhead when lots of filesystems are present.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      70e60ce7