1. 24 4月, 2009 2 次提交
    • O
      check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ · 437f7fdb
      Oleg Nesterov 提交于
      write_lock(&current->fs->lock) guarantees we can't wrongly miss
      LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
      instead of ->siglock to iterate over the sub-threads. We must see
      all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
      takes fs->lock too.
      
      With or without this patch we can miss the freshly cloned thread
      and set LSM_UNSAFE_SHARE, we don't care.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      [ Fixed lock/unlock typo  - Hugh ]
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      437f7fdb
    • O
      do_execve() must not clear fs->in_exec if it was set by another thread · 8c652f96
      Oleg Nesterov 提交于
      If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
      unconditionally. This is wrong if we race with our sub-thread which
      also does do_execve:
      
      	Two threads T1 and T2 and another process P, all share the same
      	->fs.
      
      	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
      	->fs is shared, we set LSM_UNSAFE but not ->in_exec.
      
      	P exits and decrements fs->users.
      
      	T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
      	shared, we set fs->in_exec.
      
      	T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
      	return to the user-space.
      
      	T1 does clone(CLONE_FS /* without CLONE_THREAD */).
      
      	T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
      	another process.
      
      Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
      do_execve() to clear ->in_exec depending on res.
      
      When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
      It can be set only if we don't share ->fs with another process, and since
      we already killed all sub-threads either ->in_exec == 0 or we are the
      only user of this ->fs.
      
      Also, we do not need fs->lock to clear fs->in_exec.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c652f96
  2. 23 4月, 2009 7 次提交
    • M
      [S390] /proc/stat idle field for idle cpus · e1c80530
      Martin Schwidefsky 提交于
      The cpu idle field in the output of /proc/stat is too small for cpus
      that have been idle for more than a tick. Add the architecture hook
      arch_idle_time that allows to add the not accounted idle time of a
      sleeping cpu without waking the cpu.
      
      The s390 implementation of arch_idle_time uses the already existing
      s390_idle_data per_cpu variable to find the sleep time of a neighboring
      idle cpu.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      e1c80530
    • S
      GFS2: Ensure that the inode goal block settings are updated · d9ba7615
      Steven Whitehouse 提交于
      GFS2 has a goal block associated with each inode indicating the
      search start position for future block allocations (in fact there
      are two, but thats for backward compatibility with GFS1 as they
      are set to identical locations in GFS2).
      
      In some circumstances, depending on the ordering of updates to
      the inode it was possible for the goal block settings to not
      be updated on disk. This patch ensures that the goal block will
      always get updated, thus reducing the potential for searching
      the same (already allocated) blocks again when looking for free
      space during block allocation.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d9ba7615
    • S
      GFS2: Fix bug in block allocation · d8bd504a
      Steven Whitehouse 提交于
      The new bitfit algorithm was counting from the wrong end of
      64 bit words in the bitfield. This fixes it by using __ffs64
      instead of fls64
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d8bd504a
    • T
      ext4: Fix potential inode allocation soft lockup in Orlov allocator · b5451f7b
      Theodore Ts'o 提交于
      If the Orlov allocator is having trouble finding an appropriate block
      group, the fallback code could loop forever, causing a soft lockup
      warning in find_group_orlov():
      
      BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728]
           ...
      Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo          
      EIP: 0060:[<c021650e>] EFLAGS: 00000246 CPU: 0
      EIP is at ext4_get_group_desc+0x54/0x9d
          ...
      Call Trace:
       [<c0218021>] find_group_orlov+0x2ee/0x334
       [<c0120a5f>] ? sched_clock+0x8/0xb
       [<c02188e3>] ext4_new_inode+0x2cf/0xb1a
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b5451f7b
    • T
      ext4: Make the extent validity check more paranoid · e84a26ce
      Theodore Ts'o 提交于
      Instead of just checking that the extent block number is greater or
      equal than s_first_data_block, make sure it it is not pointing into
      the block group descriptors, since that is clearly wrong.  This helps
      prevent filesystem from getting very badly corrupted in case an extent
      block is corrupted.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e84a26ce
    • T
      eCryptfs: Larger buffer for encrypted symlink targets · 3a6b42ca
      Tyler Hicks 提交于
      When using filename encryption with eCryptfs, the value of the symlink
      in the lower filesystem is encrypted and stored as a Tag 70 packet.
      This results in a longer symlink target than if the target value wasn't
      encrypted.
      
      Users were reporting these messages in their syslog:
      
      [ 45.653441] ecryptfs_parse_tag_70_packet: max_packet_size is [56]; real
      packet size is [51]
      [ 45.653444] ecryptfs_decode_and_decrypt_filename: Could not parse tag
      70 packet from filename; copying through filename as-is
      
      This was due to bufsiz, one the arguments in readlink(), being used to
      when allocating the buffer passed to the lower inode's readlink().
      That symlink target may be very large, but when decoded and decrypted,
      could end up being smaller than bufsize.
      
      To fix this, the buffer passed to the lower inode's readlink() will
      always be PATH_MAX in size when filename encryption is enabled.  Any
      necessary truncation occurs after the decoding and decrypting.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      3a6b42ca
    • T
      eCryptfs: Lock lower directory inode mutex during lookup · ca8e34f2
      Tyler Hicks 提交于
      This patch locks the lower directory inode's i_mutex before calling
      lookup_one_len() to find the appropriate dentry in the lower filesystem.
      This bug was found thanks to the warning set in commit 2f9092e1.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      ca8e34f2
  3. 22 4月, 2009 9 次提交
    • T
      eCryptfs: Remove ecryptfs_unlink_sigs warnings · e77cc8d2
      Tyler Hicks 提交于
      A feature was added to the eCryptfs umount helper to automatically
      unlink the keys used for an eCryptfs mount from the kernel keyring upon
      umount.  This patch keeps the unrecognized mount option warnings for
      ecryptfs_unlink_sigs out of the logs.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      e77cc8d2
    • T
      eCryptfs: Fix data corruption when using ecryptfs_passthrough · 13a791b4
      Tyler Hicks 提交于
      ecryptfs_passthrough is a mount option that allows eCryptfs to allow
      data to be written to non-eCryptfs files in the lower filesystem.  The
      passthrough option was causing data corruption due to it not always
      being treated as a non-eCryptfs file.
      
      The first 8 bytes of an eCryptfs file contains the decrypted file size.
      This value was being written to the non-eCryptfs files, too.  Also,
      extra 0x00 characters were being written to make the file size a
      multiple of PAGE_CACHE_SIZE.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      13a791b4
    • T
      eCryptfs: Print FNEK sig properly in /proc/mounts · 3a5203ab
      Tyler Hicks 提交于
      The filename encryption key signature is not properly displayed in
      /proc/mounts.  The "ecryptfs_sig=" mount option name is displayed for
      all global authentication tokens, included those for filename keys.
      
      This patch checks the global authentication token flags to determine if
      the key is a FEKEK or FNEK and prints the appropriate mount option name
      before the signature.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      3a5203ab
    • T
      eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev() · 57ea34d1
      Tyler Hicks 提交于
      If data is NULL, msg_ctx->msg is set to NULL and then dereferenced
      afterwards.  ecryptfs_send_raw_message() is the only place that
      ecryptfs_send_miscdev() is called with data being NULL, but the only
      caller of that function (ecryptfs_process_helo()) is never called.  In
      short, there is currently no way to trigger the NULL pointer
      dereference.
      
      This patch removes the two unused functions and modifies
      ecryptfs_send_miscdev() to remove the NULL dereferences.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      57ea34d1
    • T
      eCryptfs: Copy lower inode attrs before dentry instantiation · ae6e8459
      Tyler Hicks 提交于
      Copies the lower inode attributes to the upper inode before passing the
      upper inode to d_instantiate().  This is important for
      security_d_instantiate().
      
      The problem was discovered by a user seeing SELinux denials like so:
      
      type=AVC msg=audit(1236812817.898:47): avc:  denied  { 0x100000 } for
      pid=3584 comm="httpd" name="testdir" dev=ecryptfs ino=943872
      scontext=root:system_r:httpd_t:s0
      tcontext=root:object_r:httpd_sys_content_t:s0 tclass=file
      
      Notice target class is file while testdir is really a directory,
      confusing the permission translation (0x100000) due to the wrong i_mode.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      ae6e8459
    • T
      bio: use bio_kmalloc() in copy/map functions · a9e9dc24
      Tejun Heo 提交于
      Impact: remove possible deadlock condition
      
      There is no reason to use mempool backed allocation for map functions.
      Also, because kern mapping is used inside LLDs (e.g. for EH), using
      mempool backed allocation can lead to deadlock under extreme
      conditions (mempool already consumed by the time a request reached EH
      and requests are blocked on EH).
      
      Switch copy/map functions to bio_kmalloc().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a9e9dc24
    • T
      bio: fix bio_kmalloc() · 451a9ebf
      Tejun Heo 提交于
      Impact: fix bio_kmalloc() and its destruction path
      
      bio_kmalloc() was broken in two ways.
      
      * bvec_alloc_bs() first allocates bvec using kmalloc() and then
        ignores it and allocates again like non-kmalloc bvecs.
      
      * bio_kmalloc_destructor() didn't check for and free bio integrity
        data.
      
      This patch fixes the above problems.  kmalloc patch is separated out
      from bio_alloc_bioset() and allocates the requested number of bvecs as
      inline bvecs.
      
      * bio_alloc_bioset() no longer takes NULL @bs.  None other than
        bio_kmalloc() used it and outside users can't know how it was
        allocated anyway.
      
      * Define and use BIO_POOL_NONE so that pool index check in
        bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
        there.
      
      * Relocate destructors on top of each allocation function so that how
        they're used is more clear.
      
      Jens Axboe suggested allocating bvecs inline.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      451a9ebf
    • A
      hugetlbfs: return negative error code for bad mount option · c12ddba0
      Akinobu Mita 提交于
      This fixes the following BUG:
      
        # mount -o size=MM -t hugetlbfs none /huge
        hugetlbfs: Bad value 'MM' for mount option 'size=MM'
        ------------[ cut here ]------------
        kernel BUG at fs/super.c:996!
      
      Due to
      
      	BUG_ON(!mnt->mnt_sb);
      
      in vfs_kern_mount().
      
      Also, remove unused #include <linux/quotaops.h>
      
      Cc: William Irwin <wli@holomorphy.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c12ddba0
    • C
      Btrfs: fix btrfs fallocate oops and deadlock · 546888da
      Chris Mason 提交于
      Btrfs fallocate was incorrectly starting a transaction with a lock held
      on the extent_io tree for the file, which could deadlock.  Strictly
      speaking it was using join_transaction which would be safe, but it is better
      to move the transaction outside of the lock.
      
      When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
      being called on an unlocked buffer.  This was triggering an assertion and
      oops because the lock is supposed to be held.
      
      The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
      been run.  btrfs_del_item takes care of dirtying things, so the solution is a
      to skip the btrfs_mark_buffer_dirty call in this case.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      546888da
  4. 21 4月, 2009 22 次提交