1. 23 Aug, 2011 8 commits
    • D
      KEYS: Correctly destroy key payloads when their keytype is removed · 0c061b57
      David Howells committed
      unregister_key_type() has code to mark a key as dead and make it unavailable in
      one loop and then destroy all those unavailable key payloads in the next loop.
      However, the loop to mark keys dead renders the key undetectable to the second
      loop by changing the key type pointer also.
      
      Fix this by the following means:
      
       (1) The key code has two garbage collectors: one deletes unreferenced keys and
           the other alters keyrings to delete links to old dead, revoked and expired
           keys.  They can end up holding each other up as both want to scan the key
           serial tree under spinlock.  Combine these into a single routine.
      
       (2) Move the dead key marking, dead link removal and dead key removal into the
           garbage collector as a three phase process running over the three cycles
           of the normal garbage collection procedure.  This is tracked by the
           KEY_GC_REAPING_DEAD_1, _2 and _3 state flags.
      
           unregister_key_type() then just unlinks the key type from the list, wakes
           up the garbage collector and waits for the third phase to complete.
      
       (3) Downgrade the key types sem in unregister_key_type() once it has deleted
           the key type from the list so that it doesn't block the keyctl() syscall.
      
       (4) Dead keys that cannot be simply removed in the third phase have their
           payloads destroyed with the key's semaphore write-locked to prevent
           interference by the keyctl() syscall.  There should be no in-kernel users
           of dead keys of that type by the point of unregistration, though keyctl()
           may be holding a reference.
      
       (5) Only perform timer recalculation in the GC if the timer actually expired.
           If it didn't, we'll get another cycle when it goes off - and if the key
           that actually triggered it has been removed, it's not a problem.
      
       (6) Only garbage collect links if the timer expired or if we're doing dead key
           clean up phase 2.
      
       (7) As only key_garbage_collector() is permitted to use rb_erase() on the key
           serial tree, it doesn't need to revalidate its cursor after dropping the
           spinlock as the node the cursor points to must still exist in the tree.
      
       (8) Drop the spinlock in the GC if there is contention on it or if we need to
           reschedule.  After dealing with that, get the spinlock again and resume
           scanning.
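
The three-phase handshake described in point (2) can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the flag values, the `struct gc` type and the `gc_request_reap()` / `gc_cycle()` helpers are hypothetical and are not the kernel's definitions; only the names KEY_GC_REAPING_DEAD_1, _2 and _3 come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are from the commit text; the values are assumed for illustration. */
#define KEY_GC_REAPING_DEAD_1 0x10u /* phase 1: mark keys of the dead type as dead */
#define KEY_GC_REAPING_DEAD_2 0x20u /* phase 2: remove keyring links to dead keys */
#define KEY_GC_REAPING_DEAD_3 0x40u /* phase 3: destroy the dead keys themselves */

/* Hypothetical GC state: 'flags' holds at most one of the phase bits. */
struct gc {
    unsigned flags;
    bool reap_done; /* what the unregistering caller would wait on */
};

/* Hypothetical: what the unregistering side does after unlinking the key type. */
void gc_request_reap(struct gc *gc)
{
    assert(gc->flags == 0); /* no reap may already be in progress */
    gc->reap_done = false;
    gc->flags = KEY_GC_REAPING_DEAD_1;
}

/* Run one GC cycle, advancing the dead-key reaping by exactly one phase. */
void gc_cycle(struct gc *gc)
{
    if (gc->flags & KEY_GC_REAPING_DEAD_1) {
        gc->flags = KEY_GC_REAPING_DEAD_2;
    } else if (gc->flags & KEY_GC_REAPING_DEAD_2) {
        gc->flags = KEY_GC_REAPING_DEAD_3;
    } else if (gc->flags & KEY_GC_REAPING_DEAD_3) {
        gc->flags = 0;
        gc->reap_done = true; /* third phase complete: waiter may proceed */
    }
}
```

In this sketch the waiter is released only after three calls to `gc_cycle()`, which mirrors the "waits for the third phase to complete" step.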
      
      This has been tested in the following ways:
      
       (1) Run the keyutils testsuite against it.
      
       (2) Using the AF_RXRPC and RxKAD modules to test keytype removal:
      
           Load the rxrpc_s key type:
      
      	# insmod /tmp/af-rxrpc.ko
      	# insmod /tmp/rxkad.ko
      
           Create a key (http://people.redhat.com/~dhowells/rxrpc/listen.c):
      
      	# /tmp/listen &
      	[1] 8173
      
           Find the key:
      
      	# grep rxrpc_s /proc/keys
      	091086e1 I--Q--     1 perm 39390000     0     0 rxrpc_s   52:2
      
           Link it to a session keyring, preferably one with a higher serial number:
      
      	# keyctl link 0x091086e1 @s
      
           Kill the process (the key should remain as it's linked to another place):
      
      	# fg
      	/tmp/listen
      	^C
      
           Remove the key type:
      
      	# rmmod rxkad
      	# rmmod af-rxrpc
      
           This can be made a more effective test by altering the following part of
           the patch:
      
      	if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
      		/* Make sure everyone revalidates their keys if we marked a
      		 * bunch as being dead and make sure all keyring ex-payloads
      		 * are destroyed.
      		 */
      		kdebug("dead sync");
      		synchronize_rcu();
      
           To call synchronize_rcu() in GC phase 1 instead.  That causes the
           keyring's old payload content to hang around longer until it's RCU
           destroyed - which usually happens after GC phase 3 is complete.  This
           allows the destroy_dead_key branch to be tested.
      Reported-by: Benjamin Coddington <bcodding@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      0c061b57
    • D
      KEYS: The dead key link reaper should be non-reentrant · d199798b
      David Howells committed
      The dead key link reaper should be non-reentrant as it relies on global state
      to keep track of where it's got to when it returns to the work queue manager to
      give it some air.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      d199798b
    • D
      KEYS: Make the key reaper non-reentrant · b072e9bc
      David Howells committed
      Make the key reaper non-reentrant by sticking it on the appropriate system work
      queue when we queue it.  This will allow it to have global state and drop
      locks.  It should probably be non-reentrant already as it may spend a long time
      holding the key serial spinlock, and so multiple entrants can spend long
      periods of time just sitting there spinning, waiting to get the lock.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      b072e9bc
    • D
      KEYS: Move the unreferenced key reaper to the keys garbage collector file · 8bc16dea
      David Howells committed
      Move the unreferenced key reaper function to the keys garbage collector file
      as that's a more appropriate place with the dead key link reaper.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      8bc16dea
    • D
      CRED: Fix prepare_kernel_cred() to provide a new thread_group_cred struct · 012146d0
      David Howells committed
      Fix prepare_kernel_cred() to provide a new, separate thread_group_cred struct;
      otherwise, when using request_key(), ____call_usermodehelper() calls
      umh_keys_init() with the new creds pointing to init_tgcred, which
      umh_keys_init() then blithely alters.
      
      The problem can be demonstrated by:
      
      	# keyctl request2 user a debug:a @s
      	249681132
      	# grep req /proc/keys
      	079906a5 I--Q--     1 perm 1f3f0000     0     0 keyring   _req.249681132: 1/4
      	38ef1626 IR----     1 expd 0b010000     0     0 .request_ key:ee1d4ec pid:4371 ci:1
      
      The keyring _req.XXXX should have gone away, but something (init_tgcred) is
      pinning it.
      
      The key actually requested can then be removed and a new one created:
      
      	# keyctl unlink 249681132
      	1 links removed
      	# grep req /proc/keys
      	116cecac IR----     1 expd 0b010000     0     0 .request_ key:eeb4911 pid:4379 ci:1
      	36d1cbf8 I--Q--     1 perm 1f3f0000     0     0 keyring   _req.250300689: 1/4
      
      which causes the old _req keyring to go away and a new one to take its place.
      
      This is a consequence of the changes in:
      
      	commit 87966996
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Fri Jun 17 11:25:59 2011 +0100
      	KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring
      
      and:
      
      	commit 17f60a7d
      	Author: Eric Paris <eparis@redhat.com>
      	Date:   Fri Apr 1 17:07:50 2011 -0400
      	capabilites: allow the application of capability limits to usermode helpers
      
      After this patch is applied, the _req keyring and the .request_key key are
      cleaned up.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      012146d0
    • D
      KEYS: __key_link() should use the RCU deref wrapper for keyring payloads · 6d528b08
      David Howells committed
      __key_link() should use the RCU deref wrapper rcu_dereference_locked_keyring()
      for accessing keyring payloads rather than calling rcu_dereference_protected()
      directly.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      6d528b08
    • D
      KEYS: keyctl_get_keyring_ID() should create a session keyring if create flag set · 3ecf1b4f
      David Howells committed
      The keyctl call:
      
      	keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1)
      
      should create a session keyring if the process doesn't have one of its own
      because the create flag argument is set - rather than subscribing to and
      returning the user-session keyring as:
      
      	keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)
      
      will do.
      
      This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
      running the following program a couple of times in a row:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      	int main(int argc, char *argv[])
      	{
      		key_serial_t uk, usk, sk, nsk;
      		uk  = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
      		usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
      		sk  = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
      		nsk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);
      		printf("keys: %08x %08x %08x %08x\n", uk, usk, sk, nsk);
      		return 0;
      	}
      
      Without this patch, I see:
      
      	keys: 3975ddc7 119c0c66 119c0c66 119c0c66
      	keys: 3975ddc7 119c0c66 119c0c66 119c0c66
      
      With this patch, I see:
      
      	keys: 2cb4997b 34112878 34112878 17db2ce3
      	keys: 2cb4997b 34112878 34112878 39f3c73e
      
      As can be seen, the session keyring starts off the same as the user-session
      keyring each time, but with the patch a new session keyring is created when
      the create flag is set.
      Reported-by: Greg Wettstein <greg@enjellic.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Greg Wettstein <greg@enjellic.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      3ecf1b4f
    • D
      KEYS: If install_session_keyring() is given a keyring, it should install it · 99599537
      David Howells committed
      If install_session_keyring() is given a keyring, it should install it rather
      than just creating a new one anyway.  This was accidentally broken in:
      
      	commit d84f4f99
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Fri Nov 14 10:39:23 2008 +1100
      	Subject: CRED: Inaugurate COW credentials
      
      The impact of that commit is that pam_keyinit no longer works correctly if
      'force' isn't specified against a login process. This is because:
      
      	keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)
      
      now always creates a new session keyring and thus the check whether the session
      keyring and the user-session keyring are the same is always false.  This leads
      pam_keyinit to conclude that a session keyring is installed and it shouldn't be
      revoked by pam_keyinit here if 'revoke' is specified.
      
      Any system that specifies 'force' against pam_keyinit in the PAM configuration
      files for login methods (login, ssh, su -l, kdm, etc.) is not affected since
      that bypasses the broken check and forces the creation of a new session keyring
      anyway (for which the revoke flag is not cleared) - and any subsequent call to
      pam_keyinit really does have a session keyring already installed, and so the
      check works correctly there.
      
      Reverting to the previous behaviour will cause the kernel to subscribe the
      process to the user-session keyring as its session keyring if it doesn't have a
      session keyring of its own.  pam_keyinit will detect this and install a new
      session keyring anyway (and won't clear the revoke flag).
      
      This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
      running the following program a couple of times in a row:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      	int main(int argc, char *argv[])
      	{
      		key_serial_t uk, usk, sk;
      		uk = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
      		usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
      		sk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
      		printf("keys: %08x %08x %08x\n", uk, usk, sk);
      		return 0;
      	}
      
      Without the patch, I see:
      
      	keys: 3884e281 24c4dfcf 22825f8e
      	keys: 3884e281 24c4dfcf 068772be
      
      With the patch, I see:
      
      	keys: 26be9c83 0e755ce0 0e755ce0
      	keys: 26be9c83 0e755ce0 0e755ce0
      
      As can be seen, with the patch, the session keyring is the same as the
      user-session keyring each time; without the patch a new session keyring is
      generated each time.
      Reported-by: Greg Wettstein <greg@enjellic.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Greg Wettstein <greg@enjellic.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      99599537
  2. 18 Aug, 2011 2 commits
  3. 17 Aug, 2011 1 commit
  4. 16 Aug, 2011 2 commits
  5. 12 Aug, 2011 2 commits
    • Z
      capabilities: do not grant full privs for setuid w/ file caps + no effective caps · 4d49f671
      Zhi Li committed
      A task (when !SECURE_NOROOT) which executes a setuid-root binary will
      obtain root privileges while executing that binary.  If the binary also
      has effective capabilities set, then only those capabilities will be
      granted.  The rationale is that the same binary can carry both setuid-root
      and the minimal file capability set, so that on a filesystem not
      supporting file caps the binary can still be executed with privilege,
      while on a filesystem supporting file caps it will run with minimal
      privilege.
      
      This special case currently does NOT happen if there are file capabilities
      but no effective capabilities.  That is a problem, since capability-aware
      programs can very well start with an empty pE but a populated pP and move
      those caps to pE when needed.  In other words, if the file has file
      capabilities but NOT effective capabilities, then we should do the same
      thing as if there were effective capabilities, and not grant full root
      privileges.
      
      This patchset does that.
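
The rule change described above can be sketched as a small userspace predicate. This is a simplification for illustration only, not the kernel's code: `struct exec_info` and both function names are hypothetical, and the sketch ignores everything about capability computation except the "grant full root or not" decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what exec knows about the binary being run. */
struct exec_info {
    bool setuid_root;   /* setuid-root binary, task is !SECURE_NOROOT */
    bool has_file_caps; /* the file carries file capabilities */
    bool effective_bit; /* the file also has effective capabilities set */
};

/* Old behaviour: only file caps WITH effective caps suppressed full root. */
bool grants_full_root_old(const struct exec_info *e)
{
    assert(e != NULL);
    return e->setuid_root && !(e->has_file_caps && e->effective_bit);
}

/* New behaviour: any file caps suppress full root, effective or not. */
bool grants_full_root_new(const struct exec_info *e)
{
    assert(e != NULL);
    return e->setuid_root && !e->has_file_caps;
}
```

The two predicates differ only for a setuid-root binary that has file capabilities but no effective capabilities, which is the case this patch addresses.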
      
      (Changelog by Serge Hallyn).
      Signed-off-by: Zhi Li <lizhi1215@gmail.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      4d49f671
    • M
      CIFS: remove local xattr definitions · f995e740
      Mimi Zohar committed
      Local XATTR_TRUSTED_PREFIX_LEN and XATTR_SECURITY_PREFIX_LEN definitions
      redefined ones in 'linux/xattr.h'. This was caused by commit 9d8f13ba
      ("security: new security_inode_init_security API adds function callback")
      including 'linux/xattr.h' in 'linux/security.h'.
      
      In file included from include/linux/security.h:39,
                       from include/net/sock.h:54,
                       from fs/cifs/cifspdu.h:25,
                       from fs/cifs/xattr.c:26:
      
      This patch removes the local definitions.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      f995e740
  6. 11 Aug, 2011 2 commits
  7. 09 Aug, 2011 2 commits
  8. 08 Aug, 2011 9 commits
  9. 07 Aug, 2011 12 commits
    • A
      Fix POSIX ACL permission check · 206b1d09
      Ari Savolainen committed
      After commit 3567866b: "RCUify freeing acls, let check_acl() go ahead in
      RCU mode if acl is cached" posix_acl_permission is being called with an
      unsupported flag and the permission check fails. This patch fixes the issue.
      Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      206b1d09
    • L
      Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · c2f340a6
      Linus Torvalds committed
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        ore: Make ore its own module
        exofs: Rename raid engine from exofs/ios.c => ore
        exofs: ios: Move to a per inode components & device-table
        exofs: Move exofs specific osd operations out of ios.c
        exofs: Add offset/length to exofs_get_io_state
        exofs: Fix truncate for the raid-groups case
        exofs: Small cleanup of exofs_fill_super
        exofs: BUG: Avoid sbi realloc
        exofs: Remove pnfs-osd private definitions
        nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
      c2f340a6
    • L
      vfs: optimize inode cache access patterns · 3ddcd056
      Linus Torvalds committed
      The inode structure layout is largely random, and some of the vfs paths
      really do care.  The path lookup in particular is already quite D$
      intensive, and profiles show that accessing the 'inode->i_op->xyz'
      fields is quite costly.
      
      We already optimized the dcache to not unnecessarily load the d_op
      structure for members that are often NULL using the DCACHE_OP_xyz bits
      in dentry->d_flags, and this does something very similar for the inode
      ops that are used during pathname lookup.
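
The idea of caching "this operation is NULL" in a flags word next to the hot fields can be sketched in userspace C. Everything here is a cut-down stand-in: the struct layouts, the `opflags` and `ops_loads` members, the `IOP_FASTPERM` value and `check_permission()` are assumptions made for the example, and the sketch returns 0 where real code would go on to a generic check.

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Hypothetical, cut-down operations table. */
struct inode_ops {
    int (*permission)(struct inode *inode, int mask);
};

#define IOP_FASTPERM 0x1u /* assumed flag: "->permission is known to be NULL" */

struct inode {
    unsigned opflags;            /* cached facts about *ops, kept with hot fields */
    const struct inode_ops *ops; /* separate object, likely another cache line */
    int ops_loads;               /* counts ops-table dereferences, demo only */
};

int check_permission(struct inode *inode, int mask)
{
    assert(inode != NULL);

    /* Fast path: the flag already says there is no ->permission hook. */
    if (inode->opflags & IOP_FASTPERM)
        return 0;

    /* Slow path: touch the ops table once and remember a NULL hook. */
    inode->ops_loads++;
    if (inode->ops->permission == NULL) {
        inode->opflags |= IOP_FASTPERM;
        return 0;
    }
    return inode->ops->permission(inode, mask);
}
```

After the first call on an inode whose hook is NULL, later calls never dereference `ops`, which is the cache-footprint saving the commit message is after.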
      
      It also re-orders the fields so that the fields accessed by 'stat' are
      together at the beginning of the inode structure, and roughly in the
      order accessed.
      
      The effect of this seems to be in the 1-2% range for an empty kernel
      "make -j" run (which is fairly kernel-intensive, mostly in filename
      lookup), so it's visible.  The numbers are fairly noisy, though, and
      likely depend a lot on exact microarchitecture.  So there's more tuning
      to be done.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ddcd056
    • L
      vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e
      Linus Torvalds committed
      Gcc tends to generate better code with small integers, including the
      DCACHE_xyz flag tests - so move the common ones to be first in the list.
      Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
      DCACHE_AUTOFS_PENDING values, their users no longer exist in the source
      tree.
      
      And add an "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
      common case to be a nice straight-line fall-through.
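
A branch hint of this kind can be sketched in userspace C with the GCC/Clang builtin `__builtin_expect`. The `unlikely()` definition below is the commonly used form; `struct dentry_sketch`, `names_match()` and the flag value are hypothetical and only illustrate where the hint goes.

```c
#include <assert.h>
#include <string.h>

/* Common definition on GCC and Clang: tell the compiler x is usually false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define DCACHE_OP_COMPARE 0x0002u /* name from the text; value assumed */

/* Hypothetical stand-in for a dentry with an optional compare hook. */
struct dentry_sketch {
    unsigned flags;
    int (*compare)(const char *a, const char *b);
};

/* Returns 1 if the names match, 0 otherwise. */
int names_match(const struct dentry_sketch *d, const char *a, const char *b)
{
    assert(d != NULL);

    /* Rare case: a filesystem-specific comparison hook is installed. */
    if (unlikely(d->flags & DCACHE_OP_COMPARE))
        return d->compare(a, b) == 0;

    /* Common case: laid out as the straight-line fall-through path. */
    return strcmp(a, b) == 0;
}
```

The hint does not change the result of either branch; it only influences which path the compiler arranges as the fall-through.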
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      830c0f0e
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7cd4767e
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: Compute protocol sequence numbers and fragment IDs using MD5.
        crypto: Move md5_transform to lib/md5.c
      7cd4767e
    • B
      ore: Make ore its own module · cf283ade
      Boaz Harrosh committed
      Export everything from ore that needs exporting. Change Kbuild and Kconfig
      to build ore.ko as an independent module. Import ore from exofs.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      cf283ade
    • B
      exofs: Rename raid engine from exofs/ios.c => ore · 8ff660ab
      Boaz Harrosh committed
      ORE stands for "Objects Raid Engine"
      
      This patch is a mechanical rename of everything that was in ios.c
      and its API declaration to an ore.c and an osd_ore.h header. The ore
      engine will later be used by the pnfs objects layout driver.
      
      * File ios.c => ore.c
      
      * Declaration of types and API are moved from exofs.h to a new
        osd_ore.h
      
      * All used types are prefixed by ore_ from their exofs_ name.
      
      * Shift includes from exofs.h to osd_ore.h so osd_ore.h is
        independent, include it from exofs.h.
      
      Other than a pure rename there are no other changes. The next patch
      will move the ore into its own module and will export the API
      to be used by exofs and later the layout driver.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      8ff660ab
    • B
      exofs: ios: Move to a per inode components & device-table · 9e9db456
      Boaz Harrosh committed
      Exofs raid engine was saving on memory space by having a single layout-info,
      single pid, and a single device-table, global to the filesystem. Then passing
      a credential and object_id info at the io_state level, private for each
      inode. It would also devise this contraption of rotating the device table
      view for each inode->ino to spread out the device usage.
      
      This is not compatible with the pnfs-objects standard, which demands that
      each inode can have its own layout-info and device-table, and each object
      component its own pid, oid and creds.
      
      So: Bring exofs raid engine to be usable for generic pnfs-objects use by:
      
      * Define an exofs_comp structure that holds obj_id and credential info.
      
      * Break up exofs_layout struct to an exofs_components structure that holds a
        possible array of exofs_comp and the array of devices + the size of the
        arrays.
      
      * Add a "comps" parameter to get_io_state() that specifies the ids creds
        and device array to use for each IO.
      
        This makes it possible to keep the layout global, but the device-table
        view, creds and IDs at the inode level. It only adds two 64-bit words to
        each inode, since some of these members already existed in another form.
      
      * The ios raid engine now accesses layout-info and comps-info through the passed
        pointers. Everything is pre-prepared by caller for generic access of
        these structures and arrays.
      
      At the exofs Level:
      
      * Super block holds an exofs_components struct that holds the device
        array, previously in layout. The devices there are in device-table
        order. The device-array is twice as big and repeats the device-table
        twice, so now each inode's device array can point to a random device
        and have a round-robin view of the table, making it compatible with
        previous exofs versions.
      
      * Each inode has an exofs_components struct that is initialized at
        load time, with its own view of the device table IDs and creds.
        When doing IO this gets passed to the io_state together with the
        layout.
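
The doubled device array can be sketched in userspace C as follows. The function names and the use of plain `int` device IDs are hypothetical, and the `ino % n` starting offset is an assumption taken from the "rotating the device table view for each inode->ino" description above; the point is only that repeating the table twice lets every rotated view be a plain pointer into it, with no copy and no wrap-around logic.

```c
#include <assert.h>
#include <stddef.h>

/* Fill 'doubled' (2 * n entries) with the n-entry device table repeated twice. */
void build_doubled_table(const int *table, size_t n, int *doubled)
{
    assert(n > 0);
    for (size_t i = 0; i < 2 * n; i++)
        doubled[i] = table[i % n];
}

/*
 * Return an inode's view of the table: n consecutive entries starting at
 * (ino % n).  Because the table is repeated, view[0..n-1] never runs off
 * the end of 'doubled'.
 */
const int *inode_device_view(const int *doubled, size_t n, unsigned long ino)
{
    assert(n > 0);
    return doubled + (ino % n);
}
```

For a three-device table {10, 20, 30}, an inode with `ino % 3 == 1` would see the devices in the order 20, 30, 10.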
      
      While performing this change, bugs were found where credentials with the
      wrong IDs were used to access the different SB objects (super.c), as well
      as some dead code. It was never noticed because the target we use does not
      check the credentials.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      9e9db456
    • B
      exofs: Move exofs specific osd operations out of ios.c · 85e44df4
      Boaz Harrosh committed
      ios.c will be moving to an external library, for use by the
      objects-layout-driver. Remove from it some exofs specific functions.
      
      Also, g_attr_logical_length is used by both inode.c and ios.c;
      move its definition to the latter, to keep it independent.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      85e44df4
    • B
      exofs: Add offset/length to exofs_get_io_state · e1042ba0
      Boaz Harrosh committed
      In future raid code we will need to know the IO offset/length
      and if it's a read or write to determine some of the array
      sizes we'll need.
      
      So add a new exofs_get_rw_state() API for use when
      writing/reading. All other simple cases are left using the
      old way.
      
      The major change to this is that now we need to call
      exofs_get_io_state later at inode.c::read_exec and
      inode.c::write_exec when we actually know these things. So this
      patch is kept separate so I can test things apart from other
      changes.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      e1042ba0
    • D
      net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      David S. Miller committed
      Computers have become a lot faster since we compromised on the
      partial MD4 hash which we use currently for performance reasons.
      
      MD5 is a much safer choice, and is in line with both RFC1948 and
      other ISS generators (OpenBSD, Solaris, etc.)
      
      Furthermore, only having 24-bits of the sequence number be truly
      unpredictable is a very serious limitation.  So the periodic
      regeneration and 8-bit counter have been removed.  We compute and
      use a full 32-bit sequence number.
      
      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 48 bits) and that is fixed here as well.
      Reported-by: Dan Kaminsky <dan@doxpara.com>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e5714ea
    • D
      crypto: Move md5_transform to lib/md5.c · bc0b96b5
      David S. Miller committed
      We are going to use this for TCP/IP sequence number and fragment ID
      generation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc0b96b5