1. 27 Jul, 2018: 7 commits
  2. 25 Jul, 2018: 5 commits
    • cachefiles: Wait rather than BUG'ing on "Unexpected object collision" · c2412ac4
      Kiran Kumar Modukuri committed
      If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
      the active object tree, we have been emitting a BUG after logging
      information about it and the new object.
      
      Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
      on the old object (or return an error).  The ACTIVE flag should be cleared
      after it has been removed from the active object tree.  A timeout of 60s is
      used in the wait, so we shouldn't be able to get stuck there.
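A minimal userspace sketch of the wait-with-timeout idea, assuming a simulated clock: `struct obj`, `tick()` and `wait_for_old_object()` are hypothetical names, not the cachefiles API, and the real code sleeps on the flag bit rather than polling. It only shows the shape of the change: a bounded wait for the old object's ACTIVE flag to clear, returning an error instead of BUG()ing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: names are illustrative, not the cachefiles API. */
struct obj {
    bool active;        /* stands in for CACHEFILES_OBJECT_ACTIVE */
    int  clears_after;  /* simulated ticks until the owner clears the flag */
};

/* One simulated tick: the old object's owner makes progress. */
static void tick(struct obj *old)
{
    if (old->active && --old->clears_after <= 0)
        old->active = false;
}

/*
 * Instead of BUG()ing on a collision, wait up to timeout_ticks for the
 * old object's ACTIVE flag to clear; give up with -ETIMEDOUT otherwise.
 */
static int wait_for_old_object(struct obj *old, int timeout_ticks)
{
    for (int t = 0; t < timeout_ticks; t++) {
        if (!old->active)
            return 0;
        tick(old);
    }
    return old->active ? -ETIMEDOUT : 0;
}
```

With a 60-tick budget, an old object that goes away after 3 ticks lets the new object proceed, while one that lingers produces an error rather than an oops.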
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag · 5ce83d4b
      Kiran Kumar Modukuri committed
      In cachefiles_mark_object_active(), the new object is marked active and
      then we try to add it to the active object tree.  If a conflicting object
      is already present, we want to wait for that to go away.  After the wait,
      we go round again and try to re-mark the object as being active - but it's
      already marked active from the first time we went through and a BUG is
      issued.
      
      Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
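The retry loop can be sketched with C11 atomics. This is an illustration under stated assumptions: `OBJECT_ACTIVE`, `struct object` and `mark_object_active()` are hypothetical stand-ins, and a collision counter replaces the real active-tree insertion. The `atomic_fetch_and()` that clears the bit before going round again is the fix; without it the second pass finds the bit already set, which is the "Object already active" BUG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define OBJECT_ACTIVE 0x1u  /* hypothetical bit for CACHEFILES_OBJECT_ACTIVE */

struct object { atomic_uint flags; };

/* Returns true if the bit was already set (the case that used to BUG). */
static bool test_and_set_active(struct object *o)
{
    return atomic_fetch_or(&o->flags, OBJECT_ACTIVE) & OBJECT_ACTIVE;
}

/*
 * Mark active, then try to insert; on a (simulated) collision, wait,
 * clear our own ACTIVE bit and go round again.
 * Returns the number of attempts, or -1 where the old code would BUG.
 */
static int mark_object_active(struct object *o, int collisions)
{
    int attempts = 0;

    for (;;) {
        attempts++;
        if (test_and_set_active(o))
            return -1;                      /* "Object already active" */
        if (collisions-- > 0) {
            /* ... wait for the conflicting object to go away ... */
            atomic_fetch_and(&o->flags, ~OBJECT_ACTIVE); /* the fix */
            continue;
        }
        return attempts;                    /* inserted into the tree */
    }
}
```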
      
      Analysis from Kiran Kumar Modukuri:
      
      [Impact]
      Oops during heavy NFS + FSCache + Cachefiles
      
      CacheFiles: Error: Overlong wait for old active object to go away.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      
      CacheFiles: Error: Object already active kernel BUG at
      fs/cachefiles/namei.c:163!
      
      [Cause]
      In a heavily loaded system with big files being read and truncated, an
      fscache object for a cookie is being dropped and a new object is being
      looked up. The new object being looked up has to wait for the old object
      to go away before it is moved to the active state.
      
      [Fix]
      Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
      retrying the object lookup.
      
      [Testcase]
      Have run ~100 hours of NFS stress tests and have not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri committed
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      
      
      Analysis from Kiran Kumar:
      
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      
      The fscache cookie refcount is updated incorrectly during fscache object
      allocation, resulting in the following Oops.
      
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      
      [Cause]
      Two threads are trying to operate on a cookie and two objects.
      
      (1) One thread tries to unmount the filesystem and in the process goes over a
          huge list of objects, marking them dead and deleting the objects.
          cookie->usage is also decremented in the following path:
      
            nfs_fscache_release_super_cookie
             -> __fscache_relinquish_cookie
              ->__fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      (2) A second thread tries to look up an object for reading data in the
          following path:
      
          fscache_alloc_object
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                ->fscache_object_destroy
                  ->fscache_cookie_put
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      [NOTE from dhowells] It's unclear under what circumstances (2) can take
      place, given that thread (1) is in nfs_kill_super(); however, a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
      
          kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          ...
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
           deactivate_locked_super+0x48/0x80
           deactivate_super+0x5c/0x60
           cleanup_mnt+0x3f/0x90
           __cleanup_mnt+0x12/0x20
           task_work_run+0x86/0xb0
           exit_to_usermode_loop+0xc2/0xd0
           syscall_return_slowpath+0x4e/0x60
           int_ret_from_sys_call+0x25/0x9f
      
      [Fix] Bump up the cookie usage in fscache_object_init() when the object is
      first assigned a cookie, doing so atomically such that the cookie is only
      assigned and bumped up if its refcount is not zero.  Remove the assignment
      in fscache_attach_object().
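The fix relies on an "increment unless zero" primitive, which the kernel provides as atomic_inc_not_zero(). A minimal C11 sketch of the resulting balanced refcounting, assuming hypothetical `struct cookie`/`struct object` types and helper names (not the fscache API): the ref is taken once at init, so the unconditional put on destroy is always matched, even when attachment later fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct cookie { atomic_int usage; };
struct object { struct cookie *cookie; };

/* Like atomic_inc_not_zero(): take a ref only if the cookie is still live. */
static bool get_ref_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
        /* on failure, old now holds the current value; retry */
    }
    return false;
}

/* Take the cookie ref at init time, not at attach time. */
static bool object_init(struct object *obj, struct cookie *c)
{
    if (!get_ref_not_zero(&c->usage))
        return false;               /* cookie already dying */
    obj->cookie = c;
    return true;
}

/* Destroy always puts the ref; with the ref taken at init it balances. */
static void object_destroy(struct object *obj)
{
    int prev = atomic_fetch_sub(&obj->cookie->usage, 1);

    assert(prev > 0);               /* mirrors BUG_ON(usage <= 0) */
    obj->cookie = NULL;
}
```

Usage: initialise an object against a live cookie, let attachment "fail", and destroy it; the cookie's usage returns to its original value instead of underflowing.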
      
      [Testcase]
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • cachefiles: Fix refcounting bug in backing-file read monitoring · 934140ab
      Kiran Kumar Modukuri committed
      cachefiles_read_waiter() has the right to access a 'monitor' object by
      virtue of being called under the waitqueue lock for one of the pages in its
      purview.  However, it has no ref on that monitor object or on the
      associated operation.
      
      What it is allowed to do is to move the monitor object to the operation's
      to_do list, but once it drops the work_lock, it's actually no longer
      permitted to access that object.  However, it is trying to enqueue the
      retrieval operation for processing - but it can only do this via a pointer
      in the monitor object, something it shouldn't be doing.
      
      If it doesn't enqueue the operation, the operation may not get processed.
      If the order is flipped so that the enqueue is first, then it's possible
      for the work processor to look at the to_do list before the monitor is
      enqueued upon it.
      
      Fix this by getting a ref on the operation so that we can trust that it
      will still be there once we've added the monitor to the to_do list and
      dropped the work_lock.  The op can then be enqueued after the lock is
      dropped.
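A single-threaded sketch of the reordered waiter, assuming a simplified refcount model: `struct op`, `get_op()`, `put_op()` and `enqueue_op()` are hypothetical, the locking is only indicated in comments, and it is assumed that enqueueing takes its own reference for the work processor. What it illustrates is the ordering: pin the operation while the monitor is still safe to touch, publish the monitor, drop the lock, then enqueue and unpin.

```c
#include <assert.h>

/* Hypothetical model of the ordering; not the cachefiles code. */
struct op {
    int usage;   /* reference count on the retrieval operation */
    int todo;    /* monitors on the op's to_do list */
    int queued;  /* times the op was handed to the work processor */
};

static void get_op(struct op *op) { op->usage++; }

static void put_op(struct op *op)
{
    assert(op->usage > 0);  /* putting an already-dead op is the bug */
    op->usage--;
}

/* Assumption: enqueueing takes a ref that the worker drops later. */
static void enqueue_op(struct op *op)
{
    assert(op->usage > 0);
    get_op(op);
    op->queued++;
}

static void read_waiter(struct op *op)
{
    get_op(op);   /* pin the op while the monitor is still ours to touch */
    /* spin_lock(&work_lock); */
    op->todo++;   /* move the monitor onto op->to_do */
    /* spin_unlock(&work_lock): the monitor may be freed from here on,
     * but our own ref keeps the op alive. */
    enqueue_op(op);
    put_op(op);   /* drop the pin */
}
```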
      
      The bug can manifest in one of a couple of ways.  The first manifestation
      looks like:
      
       FS-Cache:
       FS-Cache: Assertion failed
       FS-Cache: 6 == 5 is false
       ------------[ cut here ]------------
       kernel BUG at fs/fscache/operation.c:494!
       RIP: 0010:fscache_put_operation+0x1e3/0x1f0
       ...
       fscache_op_work_func+0x26/0x50
       process_one_work+0x131/0x290
       worker_thread+0x45/0x360
       kthread+0xf8/0x130
       ? create_worker+0x190/0x190
       ? kthread_cancel_work_sync+0x10/0x10
       ret_from_fork+0x1f/0x30
      
      This is due to the operation being in the DEAD state (6) rather than
      INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
      fscache_put_operation().
      
      The bug can also manifest like the following:
      
       kernel BUG at fs/fscache/operation.c:69!
       ...
          [exception RIP: fscache_enqueue_operation+246]
       ...
       #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
       #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
       #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
      
      I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
      entirely clear which assertion failed.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: Lei Xue <carmark.dlut@gmail.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
      Reported-by: NeilBrown <neilb@suse.com>
      Reported-by: Daniel Axtens <dja@axtens.net>
      Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
    • fscache: Allow cancelled operations to be enqueued · d0eb06af
      Kiran Kumar Modukuri committed
      Alter the state-check assertion in fscache_enqueue_operation() to allow
      cancelled operations to be given processing time so they can be cleaned up.
      
      Also fix a debugging statement that was requiring such operations to have
      an object assigned.
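A sketch of the relaxed state check, assuming the previous assertion accepted only in-progress operations; the enum is an illustrative subset with hypothetical names, not the FSCACHE_OP_ST_* constants themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of operation states (hypothetical names). */
enum op_state {
    OP_INITIALISED,
    OP_PENDING,
    OP_IN_PROGRESS,
    OP_COMPLETE,
    OP_CANCELLED,
    OP_DEAD,
};

/* Before (assumed): only in-progress operations could be enqueued. */
static bool can_enqueue_old(enum op_state s)
{
    return s == OP_IN_PROGRESS;
}

/* After: cancelled operations may also be enqueued, so the work
 * processor gets a chance to clean them up. */
static bool can_enqueue_new(enum op_state s)
{
    return s == OP_IN_PROGRESS || s == OP_CANCELLED;
}
```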
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  3. 22 Jul, 2018: 3 commits
  4. 19 Jul, 2018: 1 commit
    • Btrfs: fix file data corruption after cloning a range and fsync · bd3599a0
      Filipe Manana committed
      When we clone a range into a file we can end up dropping existing
      extent maps (or trimming them) and replacing them with new ones if the
      range to be cloned overlaps with a range in the destination inode.
      When that happens we add the new extent maps to the list of modified
      extents in the inode's extent map tree, so that a "fast" fsync (the flag
      BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
      and log corresponding extent items. However, at the end of range cloning
      operation we do truncate all the pages in the affected range (in order to
      ensure future reads will not get stale data). Sometimes this truncation
      will release the corresponding extent maps besides the pages from the page
      cache. If this happens, then a "fast" fsync operation will miss logging
      some extent items, because it relies exclusively on the extent maps being
      present in the inode's extent map tree, leading to data loss/corruption if
      the fsync ends up using the same transaction used by the clone operation
      (that transaction was not committed in the meantime). An extent map is
      released through the callback btrfs_invalidatepage(), which gets called by
      truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
      latter ends up calling try_release_extent_mapping(), which will release
      the extent map if some conditions are met: the file size is greater than
      16Mb, the gfp flags allow blocking, the range is not locked (which is the
      case during the clone operation) and the extent map is not flagged as
      pinned (also the case for cloning).
      
      The following example, turned into a test for fstests, reproduces the
      issue:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
        $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
      
        $ xfs_io -c "fsync" /mnt/bar
        # reflink destination offset corresponds to the size of file bar,
        # 2728Kb minus 4Kb.
        $ xfs_io -c "reflink /mnt/foo 0 2724K 15908K" /mnt/bar
        $ xfs_io -c "fsync" /mnt/bar
      
        $ md5sum /mnt/bar
        95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar
      
        <power fail>
      
        $ mount /dev/sdb /mnt
        $ md5sum /mnt/bar
        207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
        # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
        # power failure
      
      In the above example, the destination offset of the clone operation
      corresponds to the size of the "bar" file minus 4Kb. So during the clone
      operation, the extent map covering the range from 2572Kb to 2728Kb gets
      trimmed so that it ends at offset 2724Kb, and a new extent map covering
      the range from 2724Kb to 11724Kb is created. So at the end of the clone
      operation when we ask to truncate the pages in the range from 2724Kb to
      2724Kb + 15908Kb, the page invalidation callback ends up removing the new
      extent map (through try_release_extent_mapping()) when the page at offset
      2724Kb is passed to that callback.
      
      Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
      map is removed at try_release_extent_mapping(), forcing the next fsync to
      search for modified extents in the fs/subvolume tree instead of relying on
      the presence of extent maps in memory. This way we can continue doing a
      "fast" fsync if the destination range of a clone operation does not
      overlap with an existing range or if any of the criteria necessary to
      remove an extent map at try_release_extent_mapping() is not met (file
      size not bigger than 16Mb or gfp flags do not allow blocking).
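A minimal model of the fix, assuming only the state the fsync path looks at (`struct inode_model` and its helpers are hypothetical names, not btrfs functions). Dropping an extent map sets the full-sync flag, so the next fsync no longer trusts the in-memory maps; when no map is released, the fast path stays available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the state the fsync decision depends on. */
struct inode_model {
    bool needs_full_sync;  /* stands in for BTRFS_INODE_NEEDS_FULL_SYNC */
    int  extent_maps;      /* extent maps currently cached in memory */
};

/* The fix in try_release_extent_mapping(): whenever an extent map is
 * dropped, force the next fsync to take the slow path. */
static void release_extent_map(struct inode_model *inode)
{
    assert(inode->extent_maps > 0);
    inode->extent_maps--;
    inode->needs_full_sync = true;
}

/* A "fast" fsync trusts the in-memory extent maps; a full one searches
 * the fs/subvolume tree for modified extents instead. */
static bool fsync_is_fast(const struct inode_model *inode)
{
    return !inode->needs_full_sync;
}
```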
      
      CC: stable@vger.kernel.org # 3.16+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 18 Jul, 2018: 1 commit
  6. 17 Jul, 2018: 1 commit
    • btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
      Qu Wenruo committed
      In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
      replace") we removed the branch of copy_nocow_pages() to avoid
      corruption for compressed nodatasum extents.
      
      However, the above commit only solves the problem in scrub_extent(). If
      during scrub_pages() we fail to read some pages,
      sctx->no_io_error_seen will be zero and we go to the fixup function
      scrub_handle_errored_block().
      
      In scrub_handle_errored_block(), for an sctx without csum (no matter
      whether we're doing replace or scrub) we go to the scrub_fixup_nodatasum()
      routine, which does a similar thing to copy_nocow_pages(), but without
      the extra check done in copy_nocow_pages().
      
      So for test cases like btrfs/100, where we emulate read errors during
      replace/scrub, we could corrupt compressed extent data again.
      
      This patch fixes it by avoiding any "optimization" for nodatasum and
      simply falling back to the normal fixup routine, which tries to read from
      any good copy.
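The "normal fixup routine" fallback amounts to trying the other copies in turn. A hedged sketch, assuming a precomputed per-mirror read result: `pick_good_copy()` and `read_ok` are hypothetical and ignore the real bio plumbing. For nodatasum data there is no checksum, so a successful read is the only test available.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Try every other mirror and use the first copy that reads back without
 * an I/O error. Returns the mirror to repair from, or -1 if none is good.
 */
static int pick_good_copy(const bool read_ok[], int nmirrors, int failed)
{
    for (int i = 0; i < nmirrors; i++) {
        if (i == failed)
            continue;       /* the copy that just failed */
        if (read_ok[i])
            return i;
    }
    return -1;
}
```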
      
      This also solves the WARN_ON() or deadlock caused by the lame backref
      iteration in the scrub_fixup_nodatasum() routine.
      
      The deadlock or WARN_ON() could not be triggered before commit ac0b4145
      ("btrfs: scrub: Don't use inode pages for device replace"), since
      copy_nocow_pages() has better locking and an extra check for data
      extents, and it already does the fixup work by trying to read data from
      any good copy, so it never reaches scrub_fixup_nodatasum() anyway.
      
      This patch disables the faulty code, which will be removed completely in
      a followup patch.
      
      Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 15 Jul, 2018: 4 commits
  8. 13 Jul, 2018: 2 commits
  9. 11 Jul, 2018: 1 commit
  10. 06 Jul, 2018: 10 commits
    • Fix up non-directory creation in SGID directories · 0fa3ecd8
      Linus Torvalds committed
      sgid directories have special semantics, making newly created files in
      the directory belong to the group of the directory, and newly created
      subdirectories will also become sgid.  This is historically used for
      group-shared directories.
      
      But group directories writable by non-group members should not imply
      that such non-group members can magically join the group, so make sure
      to clear the sgid bit on non-directories for non-members (but remember
      that sgid without group execute means "mandatory locking", just to
      confuse things even more).
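The rule above can be written as a small userspace function. This is a sketch, not the kernel code: `new_inode_mode()`, `in_group` and `has_fsetid` are hypothetical, the last two standing in for the group-membership and CAP_FSETID checks, and only the mode is computed (the new inode's group is simply the directory's group whenever the directory is sgid).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

static mode_t new_inode_mode(mode_t dir_mode, mode_t mode,
                             bool in_group, bool has_fsetid)
{
    if (!(dir_mode & S_ISGID))
        return mode;                    /* ordinary directory */
    if (S_ISDIR(mode))
        return mode | S_ISGID;          /* subdirectories inherit sgid */
    /* sgid + group-exec on a file created by a non-member: strip sgid.
     * sgid without group-exec means mandatory locking and is kept. */
    if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
        !in_group && !has_fsetid)
        mode &= ~S_ISGID;
    return mode;
}
```

For example, a non-member creating a 02755 file in a 02775 directory gets 0755, while a member keeps 02755, and a 02644 file (mandatory locking) keeps its sgid bit.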

      Reported-by: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf() · 729c0c9d
      Stefano Brivio committed
      smb{2,3}_create_lease_buf() store a lease key in the lease
      context for later usage on a lease break.
      
      In most paths, the key is currently sourced from data that
      happens to be on the stack near local variables for oplock in
      SMB2_open() callers, e.g. from open_shroot(), whereas
      smb2_open_file() properly allocates space on its stack for it.
      
      The address of those local variables holding the oplock is then
      passed to create_lease_buf handlers via SMB2_open(), and 16
      bytes near oplock are used. This causes a stack out-of-bounds
      access as reported by KASAN on SMB2.1 and SMB3 mounts (first
      out-of-bounds access is shown here):
      
      [  111.528823] BUG: KASAN: stack-out-of-bounds in smb3_create_lease_buf+0x399/0x3b0 [cifs]
      [  111.530815] Read of size 8 at addr ffff88010829f249 by task mount.cifs/985
      [  111.532838] CPU: 3 PID: 985 Comm: mount.cifs Not tainted 4.18.0-rc3+ #91
      [  111.534656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  111.536838] Call Trace:
      [  111.537528]  dump_stack+0xc2/0x16b
      [  111.540890]  print_address_description+0x6a/0x270
      [  111.542185]  kasan_report+0x258/0x380
      [  111.544701]  smb3_create_lease_buf+0x399/0x3b0 [cifs]
      [  111.546134]  SMB2_open+0x1ef8/0x4b70 [cifs]
      [  111.575883]  open_shroot+0x339/0x550 [cifs]
      [  111.591969]  smb3_qfs_tcon+0x32c/0x1e60 [cifs]
      [  111.617405]  cifs_mount+0x4f3/0x2fc0 [cifs]
      [  111.674332]  cifs_smb3_do_mount+0x263/0xf10 [cifs]
      [  111.677915]  mount_fs+0x55/0x2b0
      [  111.679504]  vfs_kern_mount.part.22+0xaa/0x430
      [  111.684511]  do_mount+0xc40/0x2660
      [  111.698301]  ksys_mount+0x80/0xd0
      [  111.701541]  do_syscall_64+0x14e/0x4b0
      [  111.711807]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  111.713665] RIP: 0033:0x7f372385b5fa
      [  111.715311] Code: 48 8b 0d 99 78 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 78 2c 00 f7 d8 64 89 01 48
      [  111.720330] RSP: 002b:00007ffff27049d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  111.722601] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f372385b5fa
      [  111.724842] RDX: 000055c2ecdc73b2 RSI: 000055c2ecdc73f9 RDI: 00007ffff270580f
      [  111.727083] RBP: 00007ffff2705804 R08: 000055c2ee976060 R09: 0000000000001000
      [  111.729319] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f3723f4d000
      [  111.731615] R13: 000055c2ee976060 R14: 00007f3723f4f90f R15: 0000000000000000
      
      [  111.735448] The buggy address belongs to the page:
      [  111.737420] page:ffffea000420a7c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  111.739890] flags: 0x17ffffc0000000()
      [  111.741750] raw: 0017ffffc0000000 0000000000000000 dead000000000200 0000000000000000
      [  111.744216] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [  111.746679] page dumped because: kasan: bad access detected
      
      [  111.750482] Memory state around the buggy address:
      [  111.752562]  ffff88010829f100: 00 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
      [  111.754991]  ffff88010829f180: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
      [  111.757401] >ffff88010829f200: 00 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 f2 f2 f2
      [  111.759801]                                               ^
      [  111.762034]  ffff88010829f280: f2 02 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
      [  111.764486]  ffff88010829f300: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  111.766913] ==================================================================
      
      Lease keys are however already generated and stored in fid data
      on open and create paths: pass them down to the lease context
      creation handlers and use them.
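A sketch of the corrected data flow, assuming simplified hypothetical structs (`struct lease_ctx`, `struct fid_model`) rather than the real cifs types: the handler copies the 16-byte SMB2 lease key from storage that actually holds 16 bytes, instead of reading 16 bytes starting at the address of a small oplock variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LEASE_KEY_SIZE 16  /* SMB2 lease keys are 16 bytes */

/* Hypothetical, simplified types; not the real cifs structures. */
struct lease_ctx { uint8_t key[LEASE_KEY_SIZE]; uint8_t oplock; };
struct fid_model { uint8_t lease_key[LEASE_KEY_SIZE]; };

/* The key now comes from the fid, which really is 16 bytes long. */
static void create_lease_buf(struct lease_ctx *ctx,
                             const uint8_t lease_key[LEASE_KEY_SIZE],
                             uint8_t oplock)
{
    memcpy(ctx->key, lease_key, LEASE_KEY_SIZE);
    ctx->oplock = oplock;
}
```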

      Suggested-by: Aurélien Aptel <aaptel@suse.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Fixes: b8c32dbb ("CIFS: Request SMB2.1 leases")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: Fix infinite loop when using hard mount option · 7ffbe655
      Paulo Alcantara committed
      For every request we send, whether it is SMB1 or SMB2+, we attempt to
      reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying
      out the request.
      
      So, while server->tcpStatus == CifsNeedReconnect, we wait for the
      reconnection to succeed on wait_event_interruptible_timeout(). If it
      returns, that means that either the condition evaluated to true, the
      timeout elapsed, or it was interrupted by a signal.
      
      Since we're not handling the case where the process woke up due to a
      received signal (-ERESTARTSYS), the next call to
      wait_event_interruptible_timeout() will _always_ fail and we end up
      looping forever inside either cifs_reconnect_tcon() or smb2_reconnect().
      
      Here's an example of how to trigger that:
      
      $ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard
      
      (break the connection to the server before executing the command below)
      $ stat -f /mnt/test & sleep 140
      [1] 2511
      
      $ ps -aux -q 2511
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root      2511  0.0  0.0  12892  1008 pts/0    S    12:24   0:00 stat -f /mnt/test
      
      $ kill -9 2511
      
      (wait for a while; process is stuck in the kernel)
      $ ps -aux -q 2511
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root      2511 83.2  0.0  12892  1008 pts/0    R    12:24  30:01 stat -f /mnt/test
      
      Using the 'hard' mount option means that cifs.ko will keep retrying
      indefinitely; however, we must allow the process to be killed, otherwise
      it would hang the system.
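The fixed wait loop can be sketched as follows. Assumptions: `wait_for_reconnect()` and the scripted `struct wait_script` are hypothetical stand-ins for the real loop and for wait_event_interruptible_timeout() (positive: condition met, 0: timeout, -ERESTARTSYS: signal), and the iteration bound exists only so the sketch terminates.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512  /* kernel-internal errno, not exported to userspace */
#endif

/* Scripted stand-in for wait_event_interruptible_timeout(). */
struct wait_script { const int *rcs; int n; int pos; };

static int scripted_wait(struct wait_script *s)
{
    return s->pos < s->n ? s->rcs[s->pos++] : 0;
}

/* Reconnect wait loop for a 'hard' mount, with signal handling added. */
static int wait_for_reconnect(struct wait_script *s, int max_iters)
{
    for (int i = 0; i < max_iters; i++) {
        int rc = scripted_wait(s);

        if (rc < 0)
            return -ERESTARTSYS;  /* the fix: a signal ends the wait */
        if (rc > 0)
            return 0;             /* tcon reconnected */
        /* rc == 0: timed out; 'hard' means keep retrying */
    }
    return -EAGAIN;               /* bound exists only for the sketch */
}
```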

      Signed-off-by: Paulo Alcantara <palcantara@suse.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting · f46ecbd9
      Stefano Brivio committed
      A "small" CIFS buffer is not big enough in general to hold a
      setacl request for SMB2, and we end up overflowing the buffer in
      send_set_info(). For instance:
      
       # mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl
       # touch /mnt/test/acltest
       # getcifsacl /mnt/test/acltest
       REVISION:0x1
       CONTROL:0x9004
       OWNER:S-1-5-21-2926364953-924364008-418108241-1000
       GROUP:S-1-22-2-1001
       ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
       ACL:S-1-22-2-1001:ALLOWED/0x0/R
       ACL:S-1-22-2-1001:ALLOWED/0x0/R
       ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
       ACL:S-1-1-0:ALLOWED/0x0/R
       # setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest
      
      this setacl will cause the following KASAN splat:
      
      [  330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs]
      [  330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012
      
      [  330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2
      [  330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  330.784395] Call Trace:
      [  330.784789]  dump_stack+0xc2/0x16b
      [  330.786777]  print_address_description+0x6a/0x270
      [  330.787520]  kasan_report+0x258/0x380
      [  330.788845]  memcpy+0x34/0x50
      [  330.789369]  send_set_info+0x4dd/0xc20 [cifs]
      [  330.799511]  SMB2_set_acl+0x76/0xa0 [cifs]
      [  330.801395]  set_smb2_acl+0x7ac/0xf30 [cifs]
      [  330.830888]  cifs_xattr_set+0x963/0xe40 [cifs]
      [  330.840367]  __vfs_setxattr+0x84/0xb0
      [  330.842060]  __vfs_setxattr_noperm+0xe6/0x370
      [  330.843848]  vfs_setxattr+0xc2/0xd0
      [  330.845519]  setxattr+0x258/0x320
      [  330.859211]  path_setxattr+0x15b/0x1b0
      [  330.864392]  __x64_sys_setxattr+0xc0/0x160
      [  330.866133]  do_syscall_64+0x14e/0x4b0
      [  330.876631]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  330.878503] RIP: 0033:0x7ff2e507db0a
      [  330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48
      [  330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
      [  330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a
      [  330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818
      [  330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4
      [  330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007
      [  330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550
      
      [  330.901057] Allocated by task 1012:
      [  330.902888]  kasan_kmalloc+0xa0/0xd0
      [  330.904714]  kmem_cache_alloc+0xc8/0x1d0
      [  330.906615]  mempool_alloc+0x11e/0x380
      [  330.908496]  cifs_small_buf_get+0x35/0x60 [cifs]
      [  330.910510]  smb2_plain_req_init+0x4a/0xd60 [cifs]
      [  330.912551]  send_set_info+0x198/0xc20 [cifs]
      [  330.914535]  SMB2_set_acl+0x76/0xa0 [cifs]
      [  330.916465]  set_smb2_acl+0x7ac/0xf30 [cifs]
      [  330.918453]  cifs_xattr_set+0x963/0xe40 [cifs]
      [  330.920426]  __vfs_setxattr+0x84/0xb0
      [  330.922284]  __vfs_setxattr_noperm+0xe6/0x370
      [  330.924213]  vfs_setxattr+0xc2/0xd0
      [  330.926008]  setxattr+0x258/0x320
      [  330.927762]  path_setxattr+0x15b/0x1b0
      [  330.929592]  __x64_sys_setxattr+0xc0/0x160
      [  330.931459]  do_syscall_64+0x14e/0x4b0
      [  330.933314]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  330.936843] Freed by task 0:
      [  330.938588] (stack is not available)
      
      [  330.941886] The buggy address belongs to the object at ffff88010d5e2800
       which belongs to the cache cifs_small_rq of size 448
      [  330.946362] The buggy address is located 96 bytes inside of
       448-byte region [ffff88010d5e2800, ffff88010d5e29c0)
      [  330.950722] The buggy address belongs to the page:
      [  330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0
      [  330.955665] flags: 0x17ffffc0008100(slab|head)
      [  330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80
      [  330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      [  330.963005] page dumped because: kasan: bad access detected
      
      [  330.967039] Memory state around the buggy address:
      [  330.969255]  ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  330.971833]  ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      [  330.976956]                                            ^
      [  330.979226]  ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  330.981755]  ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  330.984225] ==================================================================
      
      Fix this by allocating a regular CIFS buffer in
      smb2_plain_req_init() if the request command is SMB2_SET_INFO.
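A sketch of the buffer choice, assuming a hypothetical regular-buffer size: the 448-byte small-buffer size and the 696-byte write at offset 96 come from the KASAN report above, and 0x0011 is the SET_INFO command code from the SMB2 protocol specification. `req_buf_size()` and `write_fits()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMB2_SET_INFO_CMD 0x0011u  /* SET_INFO command code */
#define SMALL_BUF_SIZE    448u     /* cifs_small_rq size in the log above */
#define LARGE_BUF_SIZE    16384u   /* hypothetical "regular" buffer size */

/* SET_INFO requests get a regular-size buffer; others stay small. */
static size_t req_buf_size(uint16_t command)
{
    return command == SMB2_SET_INFO_CMD ? LARGE_BUF_SIZE : SMALL_BUF_SIZE;
}

/* Would a write of len bytes at offset stay inside the buffer? */
static bool write_fits(uint16_t command, size_t offset, size_t len)
{
    return offset + len <= req_buf_size(command);
}
```

The write from the report (96 + 696 = 792 bytes) overflows a 448-byte buffer but fits comfortably in the larger one.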

      Reported-by: Jianhong Yin <jiyin@redhat.com>
      Fixes: 366ed846 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-and-tested-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: Fix memory leak in smb2_set_ea() · 6aa0c114
      Paulo Alcantara committed
      This patch fixes a memory leak when doing a setxattr(2) in SMB2+.

      Signed-off-by: Paulo Alcantara <palcantara@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
    • cifs: fix SMB1 breakage · 81f39f95
      Ronnie Sahlberg committed
      SMB1 mounting broke in commit 35e2cc1b
      ("cifs: Use correct packet length in SMB2_TRANSFORM header").
      Fix it, and also rename smb2_rqst_len to smb_rqst_len to make it more
      obvious that the function is also called from CIFS/SMB1.
      
      Good job by Paulo reviewing and cleaning up Ronnie's original patch.

      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Paulo Alcantara <palcantara@suse.de>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: Fix validation of signed data in smb2 · 8de8c460
      Paulo Alcantara committed
      Fixes: c713c877 ("cifs: push rfc1002 generation down the stack")
      
      We failed to validate signed data returned by the server because
      __cifs_calc_signature() now expects to sign the actual data in iov but
      we were also passing down the rfc1002 length.
      
      Fix smb2_calc_signature() to calculate signature of rfc1002 length prior
      to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
      addition, there are a few cases where no rfc1002 length is passed so we
      make sure there's one (iov_len == 4).
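      
      The reordering described above can be illustrated with a minimal
      user-space sketch. It is a toy model under stated assumptions: struct
      vec is a hypothetical stand-in for the kernel's kvec, FNV-1a stands in
      for the real incremental MAC update (it is not a signing algorithm), and
      calc_signature_data() is a hypothetical model of the expectation that,
      for SMB2+, the first vector handed to __cifs_calc_signature() is actual
      data rather than the 4-byte rfc1002 length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvec: one buffer of a request. */
struct vec {
	const void *base;
	size_t len;
};

#define HASH_INIT 2166136261u

/* FNV-1a: a stand-in for the real incremental MAC update, NOT a signature. */
static uint32_t hash_update(uint32_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Hypothetical model of the SMB2+ expectation: vector 0 must be real
 * protocol data, so a 4-byte rfc1002 length in that slot is an error.
 */
static int calc_signature_data(const struct vec *v, int n, uint32_t *h)
{
	if (n < 1 || v[0].len <= 4)
		return -1;
	for (int i = 0; i < n; i++)
		*h = hash_update(*h, v[i].base, v[i].len);
	return 0;
}

/* Fixed flow: hash a leading 4-byte length vector, then pass only the rest. */
static int calc_signature(const struct vec *v, int n, uint32_t *out)
{
	uint32_t h = HASH_INIT;
	int rc;

	assert(v != NULL && out != NULL);
	if (n >= 2 && v[0].len == 4) {
		h = hash_update(h, v[0].base, v[0].len);
		v++;	/* skip the length vector ... */
		n--;	/* ... so the helper sees data first */
	}
	rc = calc_signature_data(v, n, &h);
	if (rc == 0)
		*out = h;
	return rc;
}
```

      Because the update is incremental, hashing the length first and the
      data vectors afterwards gives the same result as hashing everything in
      order; only the helper's view of "vector 0" changes.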
      Signed-off-by: Paulo Alcantara <palcantara@suse.de>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • P
      cifs: Fix validation of signed data in smb3+ · 27c32b49
      Paulo Alcantara committed
      Fixes: c713c877 ("cifs: push rfc1002 generation down the stack")
      
      We failed to validate signed data returned by the server because
      __cifs_calc_signature() now expects to sign the actual data in iov but
      we were also passing down the rfc1002 length.
      
      Fix smb3_calc_signature() to calculate signature of rfc1002 length prior
      to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
      addition, there are a few cases where no rfc1002 length is passed so we
      make sure there's one (iov_len == 4).
      Signed-off-by: Paulo Alcantara <palcantara@suse.de>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • L
      cifs: Fix use after free of a mid_q_entry · 696e420b
      Lars Persson committed
      With protocol version 2.0 mounts we have seen crashes with corrupt mid
      entries. Either the server->pending_mid_q list becomes corrupt with a
      cyclic reference in one element or a mid object fetched by the
      demultiplexer thread becomes overwritten during use.
      
      Code review identified a race between the demultiplexer thread and the
      request issuing thread. The demultiplexer thread seems to be written
      with the assumption that it is the sole user of the mid object until
      it calls the mid callback which either wakes the issuer task or
      deletes the mid.
      
      This assumption is not true because the issuer task can be woken up
      earlier by a signal. If the demultiplexer thread has proceeded as far
      as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
      thread will happily end up calling cifs_delete_mid while the
      demultiplexer thread still is using the mid object.
      
      Inserting a delay in the cifs demultiplexer thread widens the race
      window and makes reproduction of the race very easy:
      
      		if (server->large_buf)
      			buf = server->bigbuf;
      
      +		usleep_range(500, 4000);
      
      		server->lstrp = jiffies;
      
      To resolve this I think the proper solution involves putting a
      reference count on the mid object. This patch makes sure that the
      demultiplexer thread holds a reference until it has finished
      processing the transaction.
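      
      The reference-counting idea can be sketched in user space with C11
      atomics. This is an illustration under assumptions, not the patch
      itself: struct mid and the mid_alloc()/mid_get()/mid_put() helpers are
      hypothetical, and in-kernel code would normally use struct kref for
      this. The object is freed only when the last holder drops its
      reference, so an issuer woken early by a signal can no longer free it
      underneath the demultiplexer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mid_q_entry. */
struct mid {
	atomic_int refcount;
	int state;
};

static int mids_freed;	/* counts frees, for the example only */

/* The new object starts with one reference, owned by the issuer. */
static struct mid *mid_alloc(void)
{
	struct mid *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refcount, 1);
	return m;
}

/* Take an extra reference (e.g. the demultiplexer while it works on m). */
static void mid_get(struct mid *m)
{
	int old = atomic_fetch_add(&m->refcount, 1);

	assert(old > 0);	/* never revive an object already being freed */
}

/* Drop a reference; whoever drops the last one frees the object. */
static void mid_put(struct mid *m)
{
	int old = atomic_fetch_sub(&m->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		free(m);
		mids_freed++;
	}
}
```

      In the race described above, the issuer's early put now only lowers
      the count; the memory stays valid until the demultiplexer drops its
      own reference after it has finished processing.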
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Lars Persson <larper@axis.com>
      Acked-by: Paulo Alcantara <palcantara@suse.de>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • L
      autofs: rename 'autofs' module back to 'autofs4' · d02d21ea
      Linus Torvalds committed
      It turns out that systemd has a bug: it wants to load the autofs module
      early because of some initialization ordering with udev, and it doesn't
      do that correctly.  Everywhere else it does the proper "look up module
      name" that does the proper alias resolution, but in that early code, it
      just uses a hardcoded "autofs4" for the module name.
      
      The result of that is that as of commit a2225d93 ("autofs: remove
      left-over autofs4 stubs"), you get
      
          systemd[1]: Failed to insert module 'autofs4': No such file or directory
      
      in the system logs, and a lack of module loading.  All this despite the
      fact that we had very clearly marked 'autofs4' as an alias for this
      module.
      
      What's so ridiculous about this is that literally everything else does
      the module alias handling correctly, including really old versions of
      systemd (that just used 'modprobe' to do this), and even all the other
      systemd module loading code.
      
      Only that special systemd early module load code is broken, hardcoding
      the module names for not just 'autofs4', but also "ipv6", "unix",
      "ip_tables" and "virtio_rng".  Very annoying.
      
      Instead of creating an _additional_ separate compatibility 'autofs4'
      module, just rely on the fact that everybody else gets this right, and
      just call the module 'autofs4' for compatibility reasons, with 'autofs'
      as the alias name.
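      
      Why swapping the real name and the alias helps can be shown with a toy
      lookup. This is a hypothetical model, not kmod's or systemd's code: the
      module table, resolve_module() (an alias-aware loader) and
      lookup_exact() (a loader with a hardcoded name and no alias handling)
      are all illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module table: one real name plus its aliases. */
struct module_entry {
	const char *name;
	const char *aliases[4];	/* NULL-terminated */
};

/* State after this commit: real name 'autofs4', alias 'autofs'. */
static const struct module_entry modules[] = {
	{ "autofs4", { "autofs", NULL } },
};

#define N_MODULES (sizeof(modules) / sizeof(modules[0]))

/* Alias-aware lookup: accepts the real name or any alias. */
static const char *resolve_module(const char *req)
{
	assert(req != NULL);
	for (size_t i = 0; i < N_MODULES; i++) {
		if (strcmp(modules[i].name, req) == 0)
			return modules[i].name;
		for (size_t j = 0; modules[i].aliases[j] != NULL; j++)
			if (strcmp(modules[i].aliases[j], req) == 0)
				return modules[i].name;
	}
	return NULL;
}

/* A loader that skips alias resolution only finds the real name. */
static const char *lookup_exact(const char *req)
{
	for (size_t i = 0; i < N_MODULES; i++)
		if (strcmp(modules[i].name, req) == 0)
			return modules[i].name;
	return NULL;
}
```

      With the real name set to 'autofs4', the hardcoded early-load path and
      alias-aware loaders both succeed; with the names the other way round,
      only the alias-aware path would find the module.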
      
      That will allow the systemd people to fix their bugs, adding the proper
      alias handling, and maybe even fix the name of the module to be just
      "autofs" (so that they can _test_ the alias handling).  And eventually,
      we can revert this silly compatibility hack.
      
      See also
      
          https://github.com/systemd/systemd/issues/9501
          https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902946
      
      for the systemd bug reports upstream and in the Debian bug tracker
      respectively.
      
      Fixes: a2225d93 ("autofs: remove left-over autofs4 stubs")
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Reported-by: Michael Biebl <biebl@debian.org>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 04 Jul 2018, 1 commit
  12. 03 Jul 2018, 1 commit
  13. 29 Jun 2018, 1 commit
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds committed
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
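      
      The cost argument is about how many calls go through function
      pointers. The following toy model only counts those calls; it does not
      model wait-queue registration, __poll_t, or the real file_operations
      layout, and every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified types; not the kernel's definitions. */
struct wait_head {
	int waiters;
};

struct file;

struct file_ops {
	/* Classic style: one hook returns the mask and the wait head. */
	unsigned (*poll)(struct file *f, struct wait_head **head);
	/* Split style from the reverted series: two separate hooks. */
	struct wait_head *(*get_poll_head)(struct file *f);
	unsigned (*poll_mask)(struct file *f);
};

struct file {
	const struct file_ops *ops;
	struct wait_head head;
	struct wait_head *poll_head;	/* per-file pointer (the mitigation) */
	unsigned ready;			/* readiness mask reported by the driver */
};

static int indirect_calls;	/* bumped by every hook reached via a pointer */

static unsigned demo_poll(struct file *f, struct wait_head **head)
{
	indirect_calls++;
	*head = &f->head;
	return f->ready;
}

static struct wait_head *demo_get_poll_head(struct file *f)
{
	indirect_calls++;
	return &f->head;
}

static unsigned demo_poll_mask(struct file *f)
{
	indirect_calls++;
	return f->ready;
}

static const struct file_ops demo_ops = {
	.poll = demo_poll,
	.get_poll_head = demo_get_poll_head,
	.poll_mask = demo_poll_mask,
};

/* Classic ->poll(): one indirect call. */
static unsigned poll_classic(struct file *f, struct wait_head **head)
{
	return f->ops->poll(f, head);
}

/* Split hooks: two indirect calls. */
static unsigned poll_split(struct file *f, struct wait_head **head)
{
	*head = f->ops->get_poll_head(f);
	return f->ops->poll_mask(f);
}

/* Mitigated split: head read from the file itself, one indirect call. */
static unsigned poll_split_cached(struct file *f, struct wait_head **head)
{
	assert(f->poll_head != NULL);	/* must be set up when the file is opened */
	*head = f->poll_head;
	return f->ops->poll_mask(f);
}
```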
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 28 Jun 2018, 2 commits
    • F
      Btrfs: fix mount failure when qgroup rescan is in progress · e4e7ede7
      Filipe Manana committed
      If a power failure happens while the qgroup rescan kthread is running,
      the next mount operation will always fail. This is because of a recent
      regression that makes qgroup_rescan_init() incorrectly return -EINVAL
      when we are mounting the filesystem (through btrfs_read_qgroup_config()).
      This causes the -EINVAL error to be returned regardless of any qgroup
      flags being set instead of returning the error only when neither of
      the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
      are set.
      
      A test case for fstests follows up soon.
      
      Fixes: 9593bf49 ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • C
      Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion · 717beb96
      Chris Mason committed
      The vm_fault_t conversion commit introduced a ret2 variable for tracking
      the integer return values from internal btrfs functions.  It was
      sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
      and had been removed from the radix.  Something like this:
      
          ret2 = btrfs_delalloc_reserve_space(); // returns zero on success
      
          lock_page(page);
          if (page->mapping != inode->i_mapping)
              goto out_unlock;
      
      ...
      
      out_unlock:
          if (!ret2) {
              ...
              return VM_FAULT_LOCKED;
          }
      
      This ends up triggering this WARNING in btrfs_destroy_inode():
          WARN_ON(BTRFS_I(inode)->block_rsv.size);
      
      xfstests generic/095 was able to reliably reproduce the errors.
      
      Since out_unlock: is only used for errors, this fix moves it below the
      if (!ret2) check we use to return VM_FAULT_LOCKED for success.
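      
      The label-ordering bug and its fix can be reproduced with a small
      control-flow sketch. The enum values, page_locked flag and both
      functions are hypothetical stand-ins (the real code returns vm_fault_t
      values and releases reservations); the sketch also assumes a retry
      code is what the error path should return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for vm_fault_t results. */
enum fault { FAULT_LOCKED = 1, FAULT_RETRY = 2 };

static bool page_locked;	/* models lock_page()/unlock_page() */

/* Regressed ordering: out_unlock sits above the success return. */
static enum fault mkwrite_buggy(bool mapping_ok)
{
	enum fault ret = FAULT_RETRY;
	int ret2 = 0;		/* the reservation succeeded */

	page_locked = true;
	if (!mapping_ok)	/* page was truncated/invalidated */
		goto out_unlock;

	/* ... dirty the page ... */

out_unlock:
	if (!ret2)
		return FAULT_LOCKED;	/* also reached from the error path */
	page_locked = false;
	return ret;
}

/* Fixed ordering: success check first, out_unlock is for errors only. */
static enum fault mkwrite_fixed(bool mapping_ok)
{
	enum fault ret = FAULT_RETRY;
	int ret2 = 0;

	page_locked = true;
	if (!mapping_ok)
		goto out_unlock;

	/* ... dirty the page ... */
	if (!ret2)
		return FAULT_LOCKED;

out_unlock:
	assert(ret != FAULT_LOCKED);	/* the error path never reports success */
	page_locked = false;
	return ret;
}
```

      With the invalid-page path, the regressed version reports success with
      the page still locked, while the fixed version unlocks and returns the
      error code.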
      
      Fixes: a528a241 ("btrfs: change return type of btrfs_page_mkwrite to vm_fault_t")
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>