1. 14 Dec, 2016 1 commit
    •
      block_dev: don't test bdev->bd_contains when it is not stable · bcc7f5b4
      NeilBrown committed
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a partition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Cc: stable@vger.kernel.org (v2.6.35+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      bcc7f5b4
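The claim check described above can be sketched as a small userspace model. This is an illustration only, not the kernel source: `bdev_model`, `may_claim_model()` and `CLAIM_PTR` are hypothetical names, and the model assumes the check compares the target against the `whole` argument instead of reading `bd_contains`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct block_device; only the fields used here. */
struct bdev_model {
    void *bd_holder;                /* current exclusive holder, or NULL */
    struct bdev_model *bd_contains; /* unstable before the first open completes */
};

/* Hypothetical marker: whole device is held only on behalf of its partitions. */
static int claim_ptr_token;
#define CLAIM_PTR ((void *)&claim_ptr_token)

/* Decide whether 'holder' may claim 'bdev'; never reads bdev->bd_contains. */
static bool may_claim_model(const struct bdev_model *bdev,
                            const struct bdev_model *whole,
                            const void *holder)
{
    if (bdev->bd_holder == holder)
        return true;    /* already held by this holder */
    if (bdev->bd_holder != NULL)
        return false;   /* held by someone else */
    if (whole == bdev)
        return true;    /* target is the whole device and it is not held */
    if (whole->bd_holder == CLAIM_PTR)
        return true;    /* whole device is held only for its partitions */
    if (whole->bd_holder != NULL)
        return false;   /* partition of a device that has a real holder */
    return true;        /* partition of an unheld device */
}

/* Usage: the race window from the commit message, modelled statically. */
static void example_usage(void)
{
    int dm, opener;
    struct bdev_model whole = { .bd_holder = &dm, .bd_contains = &whole };
    struct bdev_model part = { .bd_holder = NULL, .bd_contains = &part };

    /* part.bd_contains transiently points at itself, yet the claim is refused. */
    assert(!may_claim_model(&part, &whole, &opener));
}
```

Because the decision depends only on the two pointers passed in, the transient `bd_contains = bdev` assignment in the racing thread cannot change the outcome in this model.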
  2. 13 Dec, 2016 24 commits
  3. 08 Dec, 2016 1 commit
    •
      ceph: don't set req->r_locked_dir in ceph_d_revalidate · c3f4688a
      Jeff Layton committed
      This function sets req->r_locked_dir which is supposed to indicate to
      ceph_fill_trace that the parent's i_rwsem is locked for write.
      Unfortunately, there is no guarantee that the dir will be locked when
      d_revalidate is called, so we really don't want ceph_fill_trace to do
      any dcache manipulation from this context. Clear req->r_locked_dir since
      it's clearly not safe to do that.
      
      What we really want to know with d_revalidate is whether the dentry
      still points to the same inode. ceph_fill_trace installs a pointer to
      the inode in req->r_target_inode, so we can just compare that to
      d_inode(dentry) to see if it's the same one after the lookup.
      
      Also, since we aren't generally interested in the parent here, we can
      switch to using a GETATTR to hint that to the MDS, which also means that
      we only need to reserve one cap.
      
      Finally, just remove the d_unhashed check. That's really outside the
      purview of a filesystem's d_revalidate. If the thing became unhashed
      while we're checking it, then that's up to the VFS to handle anyway.
      
      Fixes: 200fd27c ("ceph: use lookup request to revalidate dentry")
      Link: http://tracker.ceph.com/issues/18041
      Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      c3f4688a
  4. 06 Dec, 2016 1 commit
    •
      fuse: fix clearing suid, sgid for chown() · c01638f5
      Miklos Szeredi committed
      Basically, the pjdfstests set the mode of a file to 06555, and then
      chowns it (as root) to a new uid/gid. Prior to commit a09f99ed ("fuse:
      fix killing s[ug]id in setattr"), fuse would send down a setattr with both
      the uid/gid change and a new mode.  Now, it just sends down the uid/gid
      change.
      
      Technically this is NOTABUG, since POSIX doesn't _require_ that we clear
      these bits for a privileged process, but Linux (wisely) has done that and I
      think we don't want to change that behavior here.
      
      This is caused by the use of should_remove_suid(), which will always return
      0 when the process has CAP_FSETID.
      
      In fact we really don't need to be calling should_remove_suid() at all,
      since it has already been indicated to us that we should remove the suid;
      we just don't want to use a (very) stale mode for that.
      
      This patch should fix the above as well as simplify the logic.
      
      Reported-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: a09f99ed ("fuse: fix killing s[ug]id in setattr")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      c01638f5
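The mode handling described above can be sketched in userspace as follows. This is an assumption-laden illustration rather than the fuse code: `kill_suid_sgid()` is a hypothetical helper, and it assumes the caller passes in a freshly fetched mode and that the decision does not depend on the caller's capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: given an up-to-date mode, return the mode with the
 * setuid bit cleared, and the setgid bit cleared when group-execute is set.
 * '*changed' reports whether a new mode needs to be sent at all.
 */
static mode_t kill_suid_sgid(mode_t fresh_mode, bool *changed)
{
    mode_t mode = fresh_mode;

    *changed = false;
    if (mode & S_ISUID) {
        mode &= ~(mode_t)S_ISUID;
        *changed = true;
    }
    /* setgid without group-execute is left alone */
    if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
        mode &= ~(mode_t)S_ISGID;
        *changed = true;
    }
    return mode;
}

/* Usage: the 06555 case from the commit message ends up as 0555. */
static void example_usage(void)
{
    bool changed;
    mode_t mode = kill_suid_sgid(06555, &changed);

    assert(mode == 0555);
    assert(changed);
}
```

The sketch deliberately has no capability check, which mirrors the point of the commit message: once the request to drop the bits has been made, only the current mode matters.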
  5. 01 Dec, 2016 2 commits
    •
      block: protect iterate_bdevs() against concurrent close · af309226
      Rabin Vincent committed
      If a block device is closed while iterate_bdevs() is handling it, the
      following NULL pointer dereference occurs because bdev->bd_disk is NULL
      in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
      turn called by the mapping_cap_writeback_dirty() call in
      __filemap_fdatawrite_range()):
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
       IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       PGD 9e62067 PUD 9ee8067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
       task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
       RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
       RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
       RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
       R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
       FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
       Stack:
        ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
        0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
        ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
       Call Trace:
        [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
        [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
        [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
        [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
        [<ffffffff811b2763>] sys_sync+0x63/0x90
        [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
       Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
       RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
        RSP <ffff880009f5fe68>
       CR2: 0000000000000508
       ---[ end trace 2487336ceb3de62d ]---
      
      The crash is easily reproducible by running the following command, if an
      msleep(100) is inserted before the call to func() in iterate_bdevs():
      
       while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
      
      Fix it by holding the bd_mutex across the func() call and only calling
      func() if the bdev is opened.
      
      Cc: stable@vger.kernel.org
      Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices")
      Reported-and-tested-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Rabin Vincent <rabinv@axis.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      af309226
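The locking pattern described above (hold the per-device mutex across the callback and skip devices that are not open) can be modelled in userspace with pthreads. This is a sketch under stated assumptions, not the kernel function: `bdev_model`, `iterate_model()` and `count_visit()` are hypothetical names, and a plain array stands in for the kernel's list of block device inodes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct block_device. */
struct bdev_model {
    pthread_mutex_t bd_mutex; /* serialises open/close against the iterator */
    int bd_openers;           /* 0 means closed: per-open state may be gone */
};

typedef void (*bdev_fn)(struct bdev_model *bdev, void *arg);

/* Call func() on every device that is open, with its mutex held throughout. */
static void iterate_model(struct bdev_model *devs, size_t n,
                          bdev_fn func, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_lock(&devs[i].bd_mutex);
        if (devs[i].bd_openers > 0)
            func(&devs[i], arg);
        pthread_mutex_unlock(&devs[i].bd_mutex);
    }
}

/* Example callback: count how many devices were visited. */
static void count_visit(struct bdev_model *bdev, void *arg)
{
    (void)bdev;
    ++*(int *)arg;
}

/* Usage: one open and one closed device; only the open one is visited. */
static void example_usage(void)
{
    struct bdev_model devs[2] = {
        { PTHREAD_MUTEX_INITIALIZER, 1 },
        { PTHREAD_MUTEX_INITIALIZER, 0 },
    };
    int visited = 0;

    iterate_model(devs, 2, count_visit, &visited);
    assert(visited == 1);
}
```

Since a close path in this model would also have to take `bd_mutex` before dropping `bd_openers` to zero, the callback can never observe a device that is torn down midway.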
    •
      isofs: add KERN_CONT to printing of ER records · a107bf8b
      Mike Rapoport committed
      The ER records are printed without explicit log level presuming line
      continuation until "\n".  After the commit 4bcc595c (printk:
      reinstate KERN_CONT for printing continuation lines), the ER records are
      printed a character per line.
      
      Adding KERN_CONT to appropriate printk statements restores the printout
      behavior.
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a107bf8b
  6. 30 Nov, 2016 1 commit
  7. 29 Nov, 2016 4 commits
    •
      ovl: fix d_real() for stacked fs · c4fcfc16
      Miklos Szeredi committed
      Handling of recursion in d_real() is completely broken.  Recursion is only
      done in the 'inode != NULL' case.  But when opening the file we have
      'inode == NULL' hence d_real() will return an overlay dentry.  This won't
      work since overlayfs doesn't define its own file operations, so all file
      ops will fail.
      
      Fix by doing the recursion first and the check against the inode second.
      
      Bash script to reproduce the issue written by Quentin:
      
       - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
      tmpdir=$(mktemp -d)
      pushd ${tmpdir}
      
      mkdir -p {upper,lower,work}
      echo -n 'rocks' > lower/ksplice
      mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work
      cat upper/ksplice
      
      tmpdir2=$(mktemp -d)
      pushd ${tmpdir2}
      
      mkdir -p {upper,work}
      mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work
      ls -l upper/ksplice
      cat upper/ksplice
       - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - 
      Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 2d902671 ("vfs: merge .d_select_inode() into .d_real()")
      Cc: <stable@vger.kernel.org> # v4.8+
      c4fcfc16
    •
      CIFS: iterate over posix acl xattr entry correctly in ACL_to_cifs_posix() · ae9ebe7c
      Eryu Guan committed
      Commit 2211d5ba ("posix_acl: xattr representation cleanups")
      removes the typedefs and the zero-length a_entries array in struct
      posix_acl_xattr_header, and uses bare struct posix_acl_xattr_header
      and struct posix_acl_xattr_entry directly.
      
      But it failed to iterate over posix acl slots when converting posix
      acls to CIFS format, which results in several test failures in
      xfstests (generic/053 generic/105) when testing against a samba v1
      server, starting from v4.9-rc1 kernel. e.g.
      
        [root@localhost xfstests]# diff -u tests/generic/105.out /root/xfstests/results//generic/105.out.bad
        --- tests/generic/105.out       2016-09-19 16:33:28.577962575 +0800
        +++ /root/xfstests/results//generic/105.out.bad 2016-10-22 15:41:15.201931110 +0800
        @@ -1,3 +1,4 @@
         QA output created by 105
         -rw-r--r-- root
        +setfacl: subdir: Invalid argument
         -rw-r--r-- root
      
      Fix it by introducing a new "ace" var, like what
      cifs_copy_posix_acl() does, and iterating posix acl xattr entries
      over it in the for loop.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      ae9ebe7c
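The iteration pattern described above, where the entries start immediately after a fixed header and are walked through a separate entry pointer, can be sketched like this. The struct layouts and `collect_perms()` are hypothetical simplifications for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts: a header followed directly by entries. */
struct acl_header_model {
    uint32_t a_version;
};

struct acl_entry_model {
    uint16_t e_tag;
    uint16_t e_perm;
    uint32_t e_id;
};

/*
 * Copy the permission field of every entry in 'buf' into 'out'.
 * Returns the number of entries copied, or 0 if the size is malformed.
 */
static size_t collect_perms(const void *buf, size_t size,
                            uint16_t *out, size_t out_len)
{
    const struct acl_header_model *hdr = buf;
    const struct acl_entry_model *ace;
    size_t count, i;

    if (size < sizeof(*hdr))
        return 0;
    if ((size - sizeof(*hdr)) % sizeof(*ace) != 0)
        return 0;
    count = (size - sizeof(*hdr)) / sizeof(*ace);
    if (count > out_len)
        count = out_len;

    /* The first entry lives right behind the header. */
    ace = (const struct acl_entry_model *)(hdr + 1);
    for (i = 0; i < count; i++)
        out[i] = ace[i].e_perm; /* index the entries, not the header */
    return count;
}

/* Usage: a header plus two entries laid out back to back. */
static void example_usage(void)
{
    struct {
        struct acl_header_model h;
        struct acl_entry_model e[2];
    } buf = { { 2 }, { { 1, 6, 100 }, { 4, 4, 200 } } };
    uint16_t perms[2] = { 0, 0 };

    assert(collect_perms(&buf, sizeof(buf), perms, 2) == 2);
    assert(perms[0] == 6 && perms[1] == 4);
}
```

Keeping a dedicated `ace` pointer makes the per-entry stride explicit, which is the property the fix relies on once the zero-length array is gone from the header struct.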
    •
      Call echo service immediately after socket reconnect · b8c60012
      Sachin Prabhu committed
      Commit 4fcd1813 ("Fix reconnect to not defer smb3 session reconnect
      long after socket reconnect") changes the behaviour of the SMB2 echo
      service and causes it to renegotiate after a socket reconnect. However
      under default settings, the echo service could take up to 120 seconds to
      be scheduled.
      
      The patch forces the echo service to be called immediately, resulting in
      a negotiate call being made immediately on reconnect.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      b8c60012
    •
      CIFS: Fix BUG() in calc_seckey() · 5f4b5569
      Sachin Prabhu committed
      Andy Lutomirski's new virtually mapped kernel stack allocations move
      kernel stacks to the vmalloc area. This triggers the bug
       kernel BUG at ./include/linux/scatterlist.h:140!
      at calc_seckey()->sg_init()
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      5f4b5569
  8. 27 Nov, 2016 1 commit
  9. 23 Nov, 2016 1 commit
    •
      NFSv4.x: hide array-bounds warning · d55b352b
      Arnd Bergmann committed
      A correct bugfix introduced a harmless warning that shows up with gcc-7:
      
      fs/nfs/callback.c: In function 'nfs_callback_up':
      fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
      
      What happens here is that the 'minorversion == 0' check tells the
      compiler that we assume minorversion can be something other than 0,
      but when CONFIG_NFS_V4_1 is disabled that would be invalid and
      result in an out-of-bounds access.
      
      The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
      really can't happen, which makes the code slightly smaller and also
      avoids the warning.
      
      The bugfix that introduced the warning is marked for stable backports,
      so we want this one backported to the same releases.
      
      Fixes: 98b0f80c ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
      Cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      d55b352b
  10. 22 Nov, 2016 4 commits