1. 27 7月, 2021 1 次提交
    • S
      SMB3: fix readpage for large swap cache · f2a26a3c
      Steve French 提交于
      readpage was calculating the offset of the page incorrectly
      for the case of large swapcaches.
      
          loff_t offset = (loff_t)page->index << PAGE_SHIFT;
      
      As pointed out by Matthew Wilcox, this needs to use
      page_file_offset() to calculate the offset instead.
      Pages coming from the swap cache have page->index set
      to their index within the swapcache, not within the backing
      file.  For a sufficiently large swapcache, we could have
      overlapping values of page->index within the same backing file.
      
      Suggested by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org> # v5.7+
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f2a26a3c
  2. 21 6月, 2021 1 次提交
  3. 21 5月, 2021 1 次提交
    • R
      Fix KASAN identified use-after-free issue. · 9687c85d
      Rohith Surabattula 提交于
      [  612.157429] ==================================================================
      [  612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0
      [  612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382
      
      [  612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G
      OE     5.13.0-rc2+ #98
      [  612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-1.fc33 04/01/2014
      [  612.159640] Workqueue:  0x0 (deferredclose)
      [  612.159669] Call Trace:
      [  612.159685]  dump_stack+0xbb/0x107
      [  612.159711]  print_address_description.constprop.0+0x18/0x140
      [  612.159733]  ? process_one_work+0x90/0x9b0
      [  612.159743]  ? process_one_work+0x90/0x9b0
      [  612.159754]  kasan_report.cold+0x7c/0xd8
      [  612.159778]  ? lock_is_held_type+0x80/0x130
      [  612.159789]  ? process_one_work+0x90/0x9b0
      [  612.159812]  kasan_check_range+0x145/0x1a0
      [  612.159834]  process_one_work+0x90/0x9b0
      [  612.159877]  ? pwq_dec_nr_in_flight+0x110/0x110
      [  612.159914]  ? spin_bug+0x90/0x90
      [  612.159967]  worker_thread+0x3b6/0x6c0
      [  612.160023]  ? process_one_work+0x9b0/0x9b0
      [  612.160038]  kthread+0x1dc/0x200
      [  612.160051]  ? kthread_create_worker_on_cpu+0xd0/0xd0
      [  612.160092]  ret_from_fork+0x1f/0x30
      
      [  612.160399] Allocated by task 2358:
      [  612.160757]  kasan_save_stack+0x1b/0x40
      [  612.160768]  __kasan_kmalloc+0x9b/0xd0
      [  612.160778]  cifs_new_fileinfo+0xb0/0x960 [cifs]
      [  612.161170]  cifs_open+0xadf/0xf20 [cifs]
      [  612.161421]  do_dentry_open+0x2aa/0x6b0
      [  612.161432]  path_openat+0xbd9/0xfa0
      [  612.161441]  do_filp_open+0x11d/0x230
      [  612.161450]  do_sys_openat2+0x115/0x240
      [  612.161460]  __x64_sys_openat+0xce/0x140
      
      When mod_delayed_work is called to modify the delay of pending work,
      it might return false and queue a new work when pending work is
      already scheduled or when try to grab pending work failed.
      
      So, Increase the reference count when new work is scheduled to
      avoid use-after-free.
      Signed-off-by: NRohith Surabattula <rohiths@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      9687c85d
  4. 20 5月, 2021 2 次提交
  5. 05 5月, 2021 1 次提交
  6. 04 5月, 2021 2 次提交
    • S
      cifs: add shutdown support · 087f757b
      Steve French 提交于
      Various filesystem support the shutdown ioctl which is used by various
      xfstests. The shutdown ioctl sets a flag on the superblock which
      prevents open, unlink, symlink, hardlink, rmdir, create etc.
      on the file system until unmount and remounted. The two flags supported
      in this patch are:
      
        FSOP_GOING_FLAGS_LOGFLUSH and FSOP_GOING_FLAGS_NOLOGFLUSH
      
      which require very little other than blocking new operations (since
      we do not cache writes to metadata on the client with cifs.ko).
      FSOP_GOING_FLAGS_DEFAULT is not supported yet, but could be added in
      the future but would need to call syncfs or equivalent to write out
      pending data on the mount.
      
      With this patch various xfstests now work including tests 043 through
      046 for example.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      087f757b
    • R
      cifs: Deferred close for files · c3f207ab
      Rohith Surabattula 提交于
      When file is closed, SMB2 close request is not sent to server
      immediately and is deferred for acregmax defined interval. When file is
      reopened by same process for read or write, the file handle
      is reused if an oplock is held.
      
      When client receives a oplock/lease break, file is closed immediately
      if reference count is zero, else oplock is downgraded.
      Signed-off-by: NRohith Surabattula <rohiths@microsoft.com>
      Reviewed-by: NShyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      c3f207ab
  7. 26 4月, 2021 3 次提交
    • A
      cifs: allocate buffer in the caller of build_path_from_dentry() · f6a9bc33
      Al Viro 提交于
      build_path_from_dentry() open-codes dentry_path_raw().  The reason
      we can't use dentry_path_raw() in there (and postprocess the
      result as needed) is that the callers of build_path_from_dentry()
      expect that the object to be freed on cleanup and the string to
      be used are at the same address.  That's painful, since the path
      is naturally built end-to-beginning - we start at the leaf and
      go through the ancestors, accumulating the pathname.
      
      Life would be easier if we left the buffer allocation to callers.
      It wouldn't be exact-sized buffer, but none of the callers keep
      the result for long - it's always freed before the caller returns.
      So there's no need to do exact-sized allocation; better use
      __getname()/__putname(), same as we do for pathname arguments
      of syscalls.  What's more, there's no need to do allocation under
      spinlocks, so GFP_ATOMIC is not needed.
      
      Next patch will replace the open-coded dentry_path_raw() (in
      build_path_from_dentry_optional_prefix()) with calling the real
      thing.  This patch only introduces wrappers for allocating/freeing
      the buffers and switches to new calling conventions:
      	build_path_from_dentry(dentry, buf)
      expects buf to be address of a page-sized object or NULL,
      return value is a pathname built inside that buffer on success,
      ERR_PTR(-ENOMEM) if buf is NULL and ERR_PTR(-ENAMETOOLONG) if
      the pathname won't fit into page.  Note that we don't need to
      check for failure when allocating the buffer in the caller -
      build_path_from_dentry() will do the right thing.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f6a9bc33
    • A
      cifs: make build_path_from_dentry() return const char * · 8e33cf20
      Al Viro 提交于
      ... and adjust the callers.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      8e33cf20
    • A
      f6f1f179
  8. 27 3月, 2021 1 次提交
  9. 13 3月, 2021 1 次提交
  10. 23 2月, 2021 1 次提交
  11. 14 12月, 2020 3 次提交
  12. 08 7月, 2020 1 次提交
  13. 24 6月, 2020 1 次提交
    • Z
      cifs: Fix double add page to memcg when cifs_readpages · 95a3d8f3
      Zhang Xiaoxu 提交于
      When xfstests generic/451, there is an BUG at mm/memcontrol.c:
        page:ffffea000560f2c0 refcount:2 mapcount:0 mapping:000000008544e0ea
             index:0xf
        mapping->aops:cifs_addr_ops dentry name:"tst-aio-dio-cycle-write.451"
        flags: 0x2fffff80000001(locked)
        raw: 002fffff80000001 ffffc90002023c50 ffffea0005280088 ffff88815cda0210
        raw: 000000000000000f 0000000000000000 00000002ffffffff ffff88817287d000
        page dumped because: VM_BUG_ON_PAGE(page->mem_cgroup)
        page->mem_cgroup:ffff88817287d000
        ------------[ cut here ]------------
        kernel BUG at mm/memcontrol.c:2659!
        invalid opcode: 0000 [#1] SMP
        CPU: 2 PID: 2038 Comm: xfs_io Not tainted 5.8.0-rc1 #44
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_
          073836-buildvm-ppc64le-16.ppc.4
        RIP: 0010:commit_charge+0x35/0x50
        Code: 0d 48 83 05 54 b2 02 05 01 48 89 77 38 c3 48 c7
              c6 78 4a ea ba 48 83 05 38 b2 02 05 01 e8 63 0d9
        RSP: 0018:ffffc90002023a50 EFLAGS: 00010202
        RAX: 0000000000000000 RBX: ffff88817287d000 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffff88817ac97ea0 RDI: ffff88817ac97ea0
        RBP: ffffea000560f2c0 R08: 0000000000000203 R09: 0000000000000005
        R10: 0000000000000030 R11: ffffc900020237a8 R12: 0000000000000000
        R13: 0000000000000001 R14: 0000000000000001 R15: ffff88815a1272c0
        FS:  00007f5071ab0800(0000) GS:ffff88817ac80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000055efcd5ca000 CR3: 000000015d312000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         mem_cgroup_charge+0x166/0x4f0
         __add_to_page_cache_locked+0x4a9/0x710
         add_to_page_cache_locked+0x15/0x20
         cifs_readpages+0x217/0x1270
         read_pages+0x29a/0x670
         page_cache_readahead_unbounded+0x24f/0x390
         __do_page_cache_readahead+0x3f/0x60
         ondemand_readahead+0x1f1/0x470
         page_cache_async_readahead+0x14c/0x170
         generic_file_buffered_read+0x5df/0x1100
         generic_file_read_iter+0x10c/0x1d0
         cifs_strict_readv+0x139/0x170
         new_sync_read+0x164/0x250
         __vfs_read+0x39/0x60
         vfs_read+0xb5/0x1e0
         ksys_pread64+0x85/0xf0
         __x64_sys_pread64+0x22/0x30
         do_syscall_64+0x69/0x150
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f5071fcb1af
        Code: Bad RIP value.
        RSP: 002b:00007ffde2cdb8e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
        RAX: ffffffffffffffda RBX: 00007ffde2cdb990 RCX: 00007f5071fcb1af
        RDX: 0000000000001000 RSI: 000055efcd5ca000 RDI: 0000000000000003
        RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000001
        R13: 000000000009f000 R14: 0000000000000000 R15: 0000000000001000
        Modules linked in:
        ---[ end trace 725fa14a3e1af65c ]---
      
      Since commit 3fea5a49 ("mm: memcontrol: convert page cache to a new
      mem_cgroup_charge() API") not cancel the page charge, the pages maybe
      double add to pagecache:
      thread1                       | thread2
      cifs_readpages
      readpages_get_pages
       add_to_page_cache_locked(head,index=n)=0
                                    | readpages_get_pages
                                    | add_to_page_cache_locked(head,index=n+1)=0
       add_to_page_cache_locked(head, index=n+1)=-EEXIST
       then, will next loop with list head page's
       index=n+1 and the page->mapping not NULL
      readpages_get_pages
      add_to_page_cache_locked(head, index=n+1)
       commit_charge
        VM_BUG_ON_PAGE
      
      So, we should not do the next loop when any page add to page cache
      failed.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
      95a3d8f3
  14. 12 6月, 2020 1 次提交
  15. 05 6月, 2020 2 次提交
    • A
      cifs: multichannel: move channel selection above transport layer · 352d96f3
      Aurelien Aptel 提交于
      Move the channel (TCP_Server_Info*) selection from the tranport
      layer to higher in the call stack so that:
      
      - credit handling is done with the server that will actually be used
        to send.
        * ->wait_mtu_credit
        * ->set_credits / set_credits
        * ->add_credits / add_credits
        * add_credits_and_wake_if
      
      - potential reconnection (smb2_reconnect) done when initializing a
        request is checked and done with the server that will actually be
        used to send.
      
      To do this:
      
      - remove the cifs_pick_channel() call out of compound_send_recv()
      
      - select channel and pass it down by adding a cifs_pick_channel(ses)
        call in:
        - smb311_posix_mkdir
        - SMB2_open
        - SMB2_ioctl
        - __SMB2_close
        - query_info
        - SMB2_change_notify
        - SMB2_flush
        - smb2_async_readv  (if none provided in context param)
        - SMB2_read         (if none provided in context param)
        - smb2_async_writev (if none provided in context param)
        - SMB2_write        (if none provided in context param)
        - SMB2_query_directory
        - send_set_info
        - SMB2_oplock_break
        - SMB311_posix_qfs_info
        - SMB2_QFS_info
        - SMB2_QFS_attr
        - smb2_lockv
        - SMB2_lease_break
          - smb2_compound_op
        - smb2_set_ea
        - smb2_ioctl_query_info
        - smb2_query_dir_first
        - smb2_query_info_comound
        - smb2_query_symlink
        - cifs_writepages
        - cifs_write_from_iter
        - cifs_send_async_read
        - cifs_read
        - cifs_readpages
      
      - add TCP_Server_Info *server param argument to:
        - cifs_send_recv
        - compound_send_recv
        - SMB2_open_init
        - SMB2_query_info_init
        - SMB2_set_info_init
        - SMB2_close_init
        - SMB2_ioctl_init
        - smb2_iotcl_req_init
        - SMB2_query_directory_init
        - SMB2_notify_init
        - SMB2_flush_init
        - build_qfs_info_req
        - smb2_hdr_assemble
        - smb2_reconnect
        - fill_small_buf
        - smb2_plain_req_init
        - __smb2_plain_req_init
      
      The read/write codepath is different than the rest as it is using
      pages, io iterators and async calls. To deal with those we add a
      server pointer in the cifs_writedata/cifs_readdata/cifs_io_parms
      context struct and set it in:
      
      - cifs_writepages      (wdata)
      - cifs_write_from_iter (wdata)
      - cifs_readpages       (rdata)
      - cifs_send_async_read (rdata)
      
      The [rw]data->server pointer is eventually copied to
      cifs_io_parms->server to pass it down to SMB2_read/SMB2_write.
      If SMB2_read/SMB2_write is called from a different place that doesn't
      set the server field it will pick a channel.
      
      Some places do not pick a channel and just use ses->server or
      cifs_ses_server(ses). All cifs_ses_server(ses) calls are in codepaths
      involving negprot/sess.setup.
      
      - SMB2_negotiate         (binding channel)
      - SMB2_sess_alloc_buffer (binding channel)
      - SMB2_echo              (uses provided one)
      - SMB2_logoff            (uses master)
      - SMB2_tdis              (uses master)
      
      (list not exhaustive)
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      352d96f3
    • A
      cifs: multichannel: always zero struct cifs_io_parms · 7c06514a
      Aurelien Aptel 提交于
      SMB2_read/SMB2_write check and use cifs_io_parms->server, which might
      be uninitialized memory.
      
      This change makes all callers zero-initialize the struct.
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      7c06514a
  16. 04 6月, 2020 1 次提交
  17. 01 6月, 2020 1 次提交
    • J
      cifs: Standardize logging output · a0a3036b
      Joe Perches 提交于
      Use pr_fmt to standardize all logging for fs/cifs.
      
      Some logging output had no CIFS: specific prefix.
      
      Now all output has one of three prefixes:
      
      o CIFS:
      o CIFS: VFS:
      o Root-CIFS:
      
      Miscellanea:
      
      o Convert printks to pr_<level>
      o Neaten macro definitions
      o Remove embedded CIFS: prefixes from formats
      o Convert "illegal" to "invalid"
      o Coalesce formats
      o Add missing '\n' format terminations
      o Consolidate multiple cifs_dbg continuations into single calls
      o More consistent use of upper case first word output logging
      o Multiline statement argument alignment and wrapping
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      a0a3036b
  18. 14 5月, 2020 1 次提交
  19. 11 4月, 2020 1 次提交
    • S
      smb3: enable swap on SMB3 mounts · 4e8aea30
      Steve French 提交于
      Add experimental support for allowing a swap file to be on an SMB3
      mount.  There are use cases where swapping over a secure network
      filesystem is preferable. In some cases there are no local
      block devices large enough, and network block devices can be
      hard to setup and secure.  And in some cases there are no
      local block devices at all (e.g. with the recent addition of
      remote boot over SMB3 mounts).
      
      There are various enhancements that can be added later e.g.:
      - doing a mandatory byte range lock over the swapfile (until
      the Linux VFS is modified to notify the file system that an open
      is for a swapfile, when the file can be opened "DENY_ALL" to prevent
      others from opening it).
      - pinning more buffers in the underlying transport to minimize memory
      allocations in the TCP stack under the fs
      - documenting how to create ACLs (on the server) to secure the
      swapfile (or adding additional tools to cifs-utils to make it easier)
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      4e8aea30
  20. 23 3月, 2020 1 次提交
    • Y
      CIFS: Fix bug which the return value by asynchronous read is error · 97adda8b
      Yilu Lin 提交于
      This patch is used to fix the bug in collect_uncached_read_data()
      that rc is automatically converted from a signed number to an
      unsigned number when the CIFS asynchronous read fails.
      It will cause ctx->rc is error.
      
      Example:
      Share a directory and create a file on the Windows OS.
      Mount the directory to the Linux OS using CIFS.
      On the CIFS client of the Linux OS, invoke the pread interface to
      deliver the read request.
      
      The size of the read length plus offset of the read request is greater
      than the maximum file size.
      
      In this case, the CIFS server on the Windows OS returns a failure
      message (for example, the return value of
      smb2.nt_status is STATUS_INVALID_PARAMETER).
      
      After receiving the response message, the CIFS client parses
      smb2.nt_status to STATUS_INVALID_PARAMETER
      and converts it to the Linux error code (rdata->result=-22).
      
      Then the CIFS client invokes the collect_uncached_read_data function to
      assign the value of rdata->result to rc, that is, rc=rdata->result=-22.
      
      The type of the ctx->total_len variable is unsigned integer,
      the type of the rc variable is integer, and the type of
      the ctx->rc variable is ssize_t.
      
      Therefore, during the ternary operation, the value of rc is
      automatically converted to an unsigned number. The final result is
      ctx->rc=4294967274. However, the expected result is ctx->rc=-22.
      Signed-off-by: NYilu Lin <linyilu@huawei.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
      97adda8b
  21. 19 3月, 2020 1 次提交
  22. 25 2月, 2020 1 次提交
    • A
      cifs: fix rename() by ensuring source handle opened with DELETE bit · 86f740f2
      Aurelien Aptel 提交于
      To rename a file in SMB2 we open it with the DELETE access and do a
      special SetInfo on it. If the handle is missing the DELETE bit the
      server will fail the SetInfo with STATUS_ACCESS_DENIED.
      
      We currently try to reuse any existing opened handle we have with
      cifs_get_writable_path(). That function looks for handles with WRITE
      access but doesn't check for DELETE, making rename() fail if it finds
      a handle to reuse. Simple reproducer below.
      
      To select handles with the DELETE bit, this patch adds a flag argument
      to cifs_get_writable_path() and find_writable_file() and the existing
      'bool fsuid_only' argument is converted to a flag.
      
      The cifsFileInfo struct only stores the UNIX open mode but not the
      original SMB access flags. Since the DELETE bit is not mapped in that
      mode, this patch stores the access mask in cifs_fid on file open,
      which is accessible from cifsFileInfo.
      
      Simple reproducer:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <fcntl.h>
      	#include <unistd.h>
      	#define E(s) perror(s), exit(1)
      
      	int main(int argc, char *argv[])
      	{
      		int fd, ret;
      		if (argc != 3) {
      			fprintf(stderr, "Usage: %s A B\n"
      			"create&open A in write mode, "
      			"rename A to B, close A\n", argv[0]);
      			return 0;
      		}
      
      		fd = openat(AT_FDCWD, argv[1], O_WRONLY|O_CREAT|O_SYNC, 0666);
      		if (fd == -1) E("openat()");
      
      		ret = rename(argv[1], argv[2]);
      		if (ret) E("rename()");
      
      		ret = close(fd);
      		if (ret) E("close()");
      
      		return ret;
      	}
      
      $ gcc -o bugrename bugrename.c
      $ ./bugrename /mnt/a /mnt/b
      rename(): Permission denied
      
      Fixes: 8de9e86c ("cifs: create a helper to find a writeable handle by path name")
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
      86f740f2
  23. 07 2月, 2020 1 次提交
  24. 06 2月, 2020 1 次提交
  25. 04 2月, 2020 1 次提交
  26. 27 1月, 2020 1 次提交
  27. 04 12月, 2019 1 次提交
    • S
      smb3: query attributes on file close · 43f8a6a7
      Steve French 提交于
      Since timestamps on files on most servers can be updated at
      close, and since timestamps on our dentries default to one
      second we can have stale timestamps in some common cases
      (e.g. open, write, close, stat, wait one second, stat - will
      show different mtime for the first and second stat).
      
      The SMB2/SMB3 protocol allows querying timestamps at close
      so add the code to request timestamp and attr information
      (which is cheap for the server to provide) to be returned
      when a file is closed (it is not needed for the many
      paths that call SMB2_close that are from compounded
      query infos and close nor is it needed for some of
      the cases where a directory close immediately follows a
      directory open.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      43f8a6a7
  28. 03 12月, 2019 1 次提交
    • P
      CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks · 6f582b27
      Pavel Shilovsky 提交于
      Currently when the client creates a cifsFileInfo structure for
      a newly opened file, it allocates a list of byte-range locks
      with a pointer to the new cfile and attaches this list to the
      inode's lock list. The latter happens before initializing all
      other fields, e.g. cfile->tlink. Thus a partially initialized
      cifsFileInfo structure becomes available to other threads that
      walk through the inode's lock list. One example of such a thread
      may be an oplock break worker thread that tries to push all
      cached byte-range locks. This causes NULL-pointer dereference
      in smb2_push_mandatory_locks() when accessing cfile->tlink:
      
      [598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
      ...
      [598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
      [598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
      ...
      [598428.945834] Call Trace:
      [598428.945870]  ? cifs_revalidate_mapping+0x45/0x90 [cifs]
      [598428.945901]  cifs_oplock_break+0x13d/0x450 [cifs]
      [598428.945909]  process_one_work+0x1db/0x380
      [598428.945914]  worker_thread+0x4d/0x400
      [598428.945921]  kthread+0x104/0x140
      [598428.945925]  ? process_one_work+0x380/0x380
      [598428.945931]  ? kthread_park+0x80/0x80
      [598428.945937]  ret_from_fork+0x35/0x40
      
      Fix this by reordering initialization steps of the cifsFileInfo
      structure: initialize all the fields first and then add the new
      byte-range lock list to the inode's lock list.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      6f582b27
  29. 25 11月, 2019 5 次提交
    • P
      CIFS: Properly process SMB3 lease breaks · 9bd45408
      Pavel Shilovsky 提交于
      Currenly we doesn't assume that a server may break a lease
      from RWH to RW which causes us setting a wrong lease state
      on a file and thus mistakenly flushing data and byte-range
      locks and purging cached data on the client. This leads to
      performance degradation because subsequent IOs go directly
      to the server.
      
      Fix this by propagating new lease state and epoch values
      to the oplock break handler through cifsFileInfo structure
      and removing the use of cifsInodeInfo flags for that. It
      allows to avoid some races of several lease/oplock breaks
      using those flags in parallel.
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      9bd45408
    • R
      cifs: move cifsFileInfo_put logic into a work-queue · 32546a95
      Ronnie Sahlberg 提交于
      This patch moves the final part of the cifsFileInfo_put() logic where we
      need a write lock on lock_sem to be processed in a separate thread that
      holds no other locks.
      This is to prevent deadlocks like the one below:
      
      > there are 6 processes looping to while trying to down_write
      > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
      > cifs_new_fileinfo
      >
      > and there are 5 other processes which are blocked, several of them
      > waiting on either PG_writeback or PG_locked (which are both set), all
      > for the same page of the file
      >
      > 2 inode_lock() (inode->i_rwsem) for the file
      > 1 wait_on_page_writeback() for the page
      > 1 down_read(inode->i_rwsem) for the inode of the directory
      > 1 inode_lock()(inode->i_rwsem) for the inode of the directory
      > 1 __lock_page
      >
      >
      > so processes are blocked waiting on:
      >   page flags PG_locked and PG_writeback for one specific page
      >   inode->i_rwsem for the directory
      >   inode->i_rwsem for the file
      >   cifsInodeInflock_sem
      >
      >
      >
      > here are the more gory details (let me know if I need to provide
      > anything more/better):
      >
      > [0 00:48:22.765] [UN]  PID: 8863   TASK: ffff8c691547c5c0  CPU: 3
      > COMMAND: "reopen_file"
      >  #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
      >  #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
      >  #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
      >  #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
      >  #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
      >  #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
      >  #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
      >  #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
      > * (I think legitimize_path is bogus)
      >
      > in path_openat
      >         } else {
      >                 const char *s = path_init(nd, flags);
      >                 while (!(error = link_path_walk(s, nd)) &&
      >                         (error = do_last(nd, file, op)) > 0) {  <<<<
      >
      > do_last:
      >         if (open_flag & O_CREAT)
      >                 inode_lock(dir->d_inode);  <<<<
      >         else
      > so it's trying to take inode->i_rwsem for the directory
      >
      >      DENTRY           INODE           SUPERBLK     TYPE PATH
      > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR  /mnt/vm1_smb/
      > inode.i_rwsem is ffff8c691158efc0
      >
      > <struct rw_semaphore 0xffff8c691158efc0>:
      >         owner: <struct task_struct 0xffff8c6914275d00> (UN -   8856 -
      > reopen_file), counter: 0x0000000000000003
      >         waitlist: 2
      >         0xffff9965007e3c90     8863   reopen_file      UN 0  1:29:22.926
      >   RWSEM_WAITING_FOR_WRITE
      >         0xffff996500393e00     9802   ls               UN 0  1:17:26.700
      >   RWSEM_WAITING_FOR_READ
      >
      >
      > the owner of the inode.i_rwsem of the directory is:
      >
      > [0 00:00:00.109] [UN]  PID: 8856   TASK: ffff8c6914275d00  CPU: 3
      > COMMAND: "reopen_file"
      >  #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
      >  #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
      >  #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
      >  #3 [ffff99650065b940] msleep at ffffffff9af573a9
      >  #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
      >  #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
      >  #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
      >  #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
      >  #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
      >  #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
      > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
      > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
      > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
      > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
      > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
      >
      > cifs_launder_page is for page 0xffffd1e2c07d2480
      >
      > crash> page.index,mapping,flags 0xffffd1e2c07d2480
      >       index = 0x8
      >       mapping = 0xffff8c68f3cd0db0
      >   flags = 0xfffffc0008095
      >
      >   PAGE-FLAG       BIT  VALUE
      >   PG_locked         0  0000001
      >   PG_uptodate       2  0000004
      >   PG_lru            4  0000010
      >   PG_waiters        7  0000080
      >   PG_writeback     15  0008000
      >
      >
      > inode is ffff8c68f3cd0c40
      > inode.i_rwsem is ffff8c68f3cd0ce0
      >      DENTRY           INODE           SUPERBLK     TYPE PATH
      > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
      > /mnt/vm1_smb/testfile.8853
      >
      >
      > this process holds the inode->i_rwsem for the parent directory, is
      > laundering a page attached to the inode of the file it's opening, and in
      > _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
      > for the file itself.
      >
      >
      > <struct rw_semaphore 0xffff8c68f3cd0ce0>:
      >         owner: <struct task_struct 0xffff8c6914272e80> (UN -   8854 -
      > reopen_file), counter: 0x0000000000000003
      >         waitlist: 1
      >         0xffff9965005dfd80     8855   reopen_file      UN 0  1:29:22.912
      >   RWSEM_WAITING_FOR_WRITE
      >
      > this is the inode.i_rwsem for the file
      >
      > the owner:
      >
      > [0 00:48:22.739] [UN]  PID: 8854   TASK: ffff8c6914272e80  CPU: 2
      > COMMAND: "reopen_file"
      >  #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
      >  #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
      >  #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
      >  #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
      >  #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
      >  #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
      >  #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
      >  #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
      >  #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
      >  #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
      > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
      > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
      > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
      > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
      >
      > the process holds the inode->i_rwsem for the file to which it's writing,
      > and is trying to __lock_page for the same page as in the other processes
      >
      >
      > the other tasks:
      > [0 00:00:00.028] [UN]  PID: 8859   TASK: ffff8c6915479740  CPU: 2
      > COMMAND: "reopen_file"
      >  #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
      >  #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
      >  #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
      >  #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
      >  #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
      >  #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
      >  #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
      >  #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
      >  #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
      >  #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
      > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
      >
      > this is opening the file, and is trying to down_write cinode->lock_sem
      >
      >
      > [0 00:00:00.041] [UN]  PID: 8860   TASK: ffff8c691547ae80  CPU: 2
      > COMMAND: "reopen_file"
      > [0 00:00:00.057] [UN]  PID: 8861   TASK: ffff8c6915478000  CPU: 3
      > COMMAND: "reopen_file"
      > [0 00:00:00.059] [UN]  PID: 8858   TASK: ffff8c6914271740  CPU: 2
      > COMMAND: "reopen_file"
      > [0 00:00:00.109] [UN]  PID: 8862   TASK: ffff8c691547dd00  CPU: 6
      > COMMAND: "reopen_file"
      >  #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
      >  #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
      >  #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
      >  #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
      >  #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
      >  #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
      >  #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
      >  #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
      >  #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
      >  #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
      >
      > closing the file, and trying to down_write cifsi->lock_sem
      >
      >
      > [0 00:48:22.839] [UN]  PID: 8857   TASK: ffff8c6914270000  CPU: 7
      > COMMAND: "reopen_file"
      >  #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
      >  #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
      >  #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
      >  #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
      >  #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
      >  #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
      >  #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
      >  #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
      >  #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
      >  #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
      >
      > in __filemap_fdatawait_range
      >                         wait_on_page_writeback(page);
      > for the same page of the file
      >
      >
      >
      > [0 00:48:22.718] [UN]  PID: 8855   TASK: ffff8c69142745c0  CPU: 7
      > COMMAND: "reopen_file"
      >  #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
      >  #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
      >  #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
      >  #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
      >  #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
      >  #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
      >  #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
      >  #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
      >
      >         inode_lock(inode);
      >
      >
      > and one 'ls' later on, to see whether the rest of the mount is available
      > (the test file is in the root, so we get blocked up on the directory
      > ->i_rwsem), so the entire mount is unavailable
      >
      > [0 00:36:26.473] [UN]  PID: 9802   TASK: ffff8c691436ae80  CPU: 4
      > COMMAND: "ls"
      >  #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
      >  #1 [ffff996500393db8] schedule at ffffffff9b6e64df
      >  #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
      >  #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
      >  #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
      >  #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
      >  #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
      >  #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
      >
      > in iterate_dir:
      >         if (shared)
      >                 res = down_read_killable(&inode->i_rwsem);  <<<<
      >         else
      >                 res = down_write_killable(&inode->i_rwsem);
      >
      Reported-by: NFrank Sorenson <sorenson@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      32546a95
    • P
      CIFS: Respect O_SYNC and O_DIRECT flags during reconnect · 44805b0e
      Pavel Shilovsky 提交于
      Currently the client translates O_SYNC and O_DIRECT flags
      into corresponding SMB create options when openning a file.
      The problem is that on reconnect when the file is being
      re-opened the client doesn't set those flags and it causes
      a server to reject re-open requests because create options
      don't match. The latter means that any subsequent system
      call against that open file fail until a share is re-mounted.
      
      Fix this by properly setting SMB create options when
      re-openning files after reconnects.
      
      Fixes: 1013e760: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      44805b0e
    • L
      cifs: smbd: Invalidate and deregister memory registration on re-send for direct I/O · b7a55bbd
      Long Li 提交于
      On re-send, there might be a reconnect and all prevoius memory registrations
      need to be invalidated and deregistered.
      Signed-off-by: NLong Li <longli@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      b7a55bbd
    • Y
      CIFS: remove set but not used variables 'cinode' and 'netfid' · f28a2e5e
      YueHaibing 提交于
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      fs/cifs/file.c: In function 'cifs_flock':
      fs/cifs/file.c:1704:8: warning:
       variable 'netfid' set but not used [-Wunused-but-set-variable]
      
      fs/cifs/file.c:1702:24: warning:
       variable 'cinode' set but not used [-Wunused-but-set-variable]
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f28a2e5e