1. 24 3月, 2019 1 次提交
  2. 23 3月, 2019 11 次提交
    • Z
      ext4: cleanup bh release code in ext4_ind_remove_space() · 5e86bdda
      zhangyi (F) 提交于
      Currently, we are releasing the indirect buffer where we are done with
      it in ext4_ind_remove_space(), so we can see the brelse() and
      BUFFER_TRACE() everywhere.  It seems fragile and hard to read, and we
      may probably forget to release the buffer some day.  This patch cleans
      up the code by putting of the code which releases the buffers to the
      end of the function.
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      5e86bdda
    • Z
      ext4: brelse all indirect buffer in ext4_ind_remove_space() · 674a2b27
      zhangyi (F) 提交于
      All indirect buffers get by ext4_find_shared() should be released no
      mater the branch should be freed or not. But now, we forget to release
      the lower depth indirect buffers when removing space from the same
      higher depth indirect block. It will lead to buffer leak and futher
      more, it may lead to quota information corruption when using old quota,
      consider the following case.
      
       - Create and mount an empty ext4 filesystem without extent and quota
         features,
       - quotacheck and enable the user & group quota,
       - Create some files and write some data to them, and then punch hole
         to some files of them, it may trigger the buffer leak problem
         mentioned above.
       - Disable quota and run quotacheck again, it will create two new
         aquota files and write the checked quota information to them, which
         probably may reuse the freed indirect block(the buffer and page
         cache was not freed) as data block.
       - Enable quota again, it will invoke
         vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
         buffers and pagecache. Unfortunately, because of the buffer of quota
         data block is still referenced, quota code cannot read the up to date
         quota info from the device and lead to quota information corruption.
      
      This problem can be reproduced by xfstests generic/231 on ext3 file
      system or ext4 file system without extent and quota features.
      
      This patch fix this problem by releasing the missing indirect buffers,
      in ext4_ind_remove_space().
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      674a2b27
    • K
      x86/gart: Exclude GART aperture from kcore · ffc8599a
      Kairui Song 提交于
      On machines where the GART aperture is mapped over physical RAM,
      /proc/kcore contains the GART aperture range. Accessing the GART range via
      /proc/kcore results in a kernel crash.
      
      vmcore used to have the same issue, until it was fixed with commit
      2a3e83c6 ("x86/gart: Exclude GART aperture from vmcore")', leveraging
      existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
      when attempting to read the aperture region, and so it won't read from the
      actual memory.
      
      Apply the same workaround for kcore. First implement the same hook
      infrastructure for kcore, then reuse the hook functions introduced in the
      previous vmcore fix. Just with some minor adjustment, rename some functions
      for more general usage, and simplify the hook infrastructure a bit as there
      is no module usage yet.
      Suggested-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NKairui Song <kasong@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJiri Bohac <jbohac@suse.cz>
      Acked-by: NBaoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Dave Young <dyoung@redhat.com>
      Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
      
      ffc8599a
    • S
      cifs: update internal module version number · cf7d624f
      Steve French 提交于
      To 2.19
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      cf7d624f
    • S
      SMB3: Fix SMB3.1.1 guest mounts to Samba · 8c11a607
      Steve French 提交于
      Workaround problem with Samba responses to SMB3.1.1
      null user (guest) mounts.  The server doesn't set the
      expected flag in the session setup response so we have
      to do a similar check to what is done in smb3_validate_negotiate
      where we also check if the user is a null user (but not sec=krb5
      since username might not be passed in on mount for Kerberos case).
      
      Note that the commit below tightened the conditions and forced signing
      for the SMB2-TreeConnect commands as per MS-SMB2.
      However, this should only apply to normal user sessions and not for
      cases where there is no user (even if server forgets to set the flag
      in the response) since we don't have anything useful to sign with.
      This is especially important now that the more secure SMB3.1.1 protocol
      is in the default dialect list.
      
      An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
      the guest mounts to Windows.
      
          Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPaulo Alcantara <palcantara@suse.de>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      8c11a607
    • P
      cifs: Fix slab-out-of-bounds when tracing SMB tcon · 68ddb496
      Paulo Alcantara (SUSE) 提交于
      This patch fixes the following KASAN report:
      
      [  779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180
      [  779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812
      
      [  779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62
      [  779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
      [  779.044761] Call Trace:
      [  779.044769]  dump_stack+0x5b/0x90
      [  779.044775]  ? string+0xab/0x180
      [  779.044781]  print_address_description+0x6c/0x23c
      [  779.044787]  ? string+0xab/0x180
      [  779.044792]  ? string+0xab/0x180
      [  779.044797]  kasan_report.cold.3+0x1a/0x32
      [  779.044803]  ? string+0xab/0x180
      [  779.044809]  string+0xab/0x180
      [  779.044816]  ? widen_string+0x160/0x160
      [  779.044822]  ? vsnprintf+0x5bf/0x7f0
      [  779.044829]  vsnprintf+0x4e7/0x7f0
      [  779.044836]  ? pointer+0x4a0/0x4a0
      [  779.044841]  ? seq_buf_vprintf+0x79/0xc0
      [  779.044848]  seq_buf_vprintf+0x62/0xc0
      [  779.044855]  trace_seq_printf+0x113/0x210
      [  779.044861]  ? trace_seq_puts+0x110/0x110
      [  779.044867]  ? trace_raw_output_prep+0xd8/0x110
      [  779.044876]  trace_raw_output_smb3_tcon_class+0x9f/0xc0
      [  779.044882]  print_trace_line+0x377/0x890
      [  779.044888]  ? tracing_buffers_read+0x300/0x300
      [  779.044893]  ? ring_buffer_read+0x58/0x70
      [  779.044899]  s_show+0x6e/0x140
      [  779.044906]  seq_read+0x505/0x6a0
      [  779.044913]  vfs_read+0xaf/0x1b0
      [  779.044919]  ksys_read+0xa1/0x130
      [  779.044925]  ? kernel_write+0xa0/0xa0
      [  779.044931]  ? __do_page_fault+0x3d5/0x620
      [  779.044938]  do_syscall_64+0x63/0x150
      [  779.044944]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  779.044949] RIP: 0033:0x7f62c2c2db31
      [ 779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02
      02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0
      0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48
      89
      [  779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31
      [  779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003
      [  779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003
      [  779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003
      [  779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710
      [  779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000
      
      [  779.044981] Allocated by task 1257:
      [  779.044987]  __kasan_kmalloc.constprop.5+0xc1/0xd0
      [  779.044992]  kmem_cache_alloc+0xad/0x1a0
      [  779.044997]  getname_flags+0x6c/0x2a0
      [  779.045003]  user_path_at_empty+0x1d/0x40
      [  779.045008]  do_faccessat+0x12a/0x330
      [  779.045012]  do_syscall_64+0x63/0x150
      [  779.045017]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045019] Freed by task 1257:
      [  779.045023]  __kasan_slab_free+0x12e/0x180
      [  779.045029]  kmem_cache_free+0x85/0x1b0
      [  779.045034]  filename_lookup.part.70+0x176/0x250
      [  779.045039]  do_faccessat+0x12a/0x330
      [  779.045043]  do_syscall_64+0x63/0x150
      [  779.045048]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045052] The buggy address belongs to the object at ffff88814f326600
      which belongs to the cache names_cache of size 4096
      [  779.045057] The buggy address is located 872 bytes to the right of
      4096-byte region [ffff88814f326600, ffff88814f327600)
      [  779.045058] The buggy address belongs to the page:
      [  779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0
      [  779.045067] flags: 0x200000000010200(slab|head)
      [  779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40
      [  779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  779.045083] page dumped because: kasan: bad access detected
      
      [  779.045085] Memory state around the buggy address:
      [  779.045089]  ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045093]  ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045099]                                                           ^
      [  779.045103]  ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045107]  ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045109] ==================================================================
      [  779.045110] Disabling lock debugging due to kernel taint
      
      Correctly assign tree name str for smb3_tcon event.
      Signed-off-by: NPaulo Alcantara (SUSE) <paulo@paulo.ac>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      68ddb496
    • R
      cifs: allow guest mounts to work for smb3.11 · e71ab2aa
      Ronnie Sahlberg 提交于
      Fix Guest/Anonymous sessions so that they work with SMB 3.11.
      
      The commit noted below tightened the conditions and forced signing for
      the SMB2-TreeConnect commands as per MS-SMB2.
      However, this should only apply to normal user sessions and not for
      Guest/Anonumous sessions.
      
      Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      e71ab2aa
    • S
      fix incorrect error code mapping for OBJECTID_NOT_FOUND · 85f9987b
      Steve French 提交于
      It was mapped to EIO which can be confusing when user space
      queries for an object GUID for an object for which the server
      file system doesn't support (or hasn't saved one).
      
      As Amir Goldstein suggested this is similar to ENOATTR
      (equivalently ENODATA in Linux errno definitions) so
      changing NT STATUS code mapping for OBJECTID_NOT_FOUND
      to ENODATA.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Amir Goldstein <amir73il@gmail.com>
      85f9987b
    • X
      cifs: fix that return -EINVAL when do dedupe operation · b073a080
      Xiaoli Feng 提交于
      dedupe_file_range operations is combiled into remap_file_range.
      But it's always skipped for dedupe operations in function
      cifs_remap_file_range.
      
      Example to test:
      Before this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Invalid argument
      
      After this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Operation not supported
      
      Influence for xfstests:
      generic/091
      generic/112
      generic/127
      generic/263
      These tests report this error "do_copy_range:: Invalid
      argument" instead of "FIDEDUPERANGE: Invalid argument".
      Because there are still two bugs cause these test failed.
      https://bugzilla.kernel.org/show_bug.cgi?id=202935
      https://bugzilla.kernel.org/show_bug.cgi?id=202785Signed-off-by: NXiaoli Feng <fengxiaoli0714@gmail.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      b073a080
    • L
      CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN · 0b0dfd59
      Long Li 提交于
      When sending a rdata, transport may return -EAGAIN. In this case
      we should re-obtain credits because the session may have been
      reconnected.
      
      Change in v2: adjust_credits before re-sending
      Signed-off-by: NLong Li <longli@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      0b0dfd59
    • L
      CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN · d53e292f
      Long Li 提交于
      When sending a wdata, transport may return -EAGAIN. In this case
      we should re-obtain credits because the session may have been
      reconnected.
      
      Change in v2: adjust_credits before re-sending
      Signed-off-by: NLong Li <longli@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      d53e292f
  3. 19 3月, 2019 4 次提交
    • J
      fanotify: Allow copying of file handle to userspace · b2d22b6b
      Jan Kara 提交于
      When file handle is embedded inside fanotify_event and usercopy checks
      are enabled, we get a warning like:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
      from SLAB object 'fanotify_event' (offset 40, size 8)!
      WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110
      mm/usercopy.c:78
      
      Annotate handling in fanotify_event properly to mark copying it to
      userspace is fine.
      
      Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com
      Fixes: a8b13aa2 ("fanotify: enable FAN_REPORT_FID init flag")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b2d22b6b
    • J
      block: add BIO_NO_PAGE_REF flag · 399254aa
      Jens Axboe 提交于
      If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
      with NO_REF, then we don't need to add a page reference for the pages
      that we add.
      
      Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
      not to drop a reference to these pages.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      399254aa
    • J
      iov_iter: add ITER_BVEC_FLAG_NO_REF flag · 875f1d07
      Jens Axboe 提交于
      For ITER_BVEC, if we're holding on to kernel pages, the caller
      doesn't need to grab a reference to the bvec pages, and drop that
      same reference on IO completion. This is essentially safe for any
      ITER_BVEC, but some use cases end up reusing pages and uncondtionally
      dropping a page reference on completion. And example of that is
      sendfile(2), that ends up being a splice_in + splice_out on the
      pipe pages.
      
      Add a flag that tells us it's fine to not grab a page reference
      to the bvec pages, since that caller knows not to drop a reference
      when it's done with the pages.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      875f1d07
    • J
      io_uring: retry bulk slab allocs as single allocs · fd6fab2c
      Jens Axboe 提交于
      I've seen cases where bulk alloc fails, since the bulk alloc API
      is all-or-nothing - either we get the number we ask for, or it
      returns 0 as number of entries.
      
      If we fail a batch bulk alloc, retry a "normal" kmem_cache_alloc()
      and just use that instead of failing with -EAGAIN.
      
      While in there, ensure we use GFP_KERNEL. That was an oversight in
      the original code, when we switched away from GFP_ATOMIC.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fd6fab2c
  4. 18 3月, 2019 2 次提交
  5. 16 3月, 2019 3 次提交
    • A
      fix sysfs_init_fs_context() in !CONFIG_NET_NS case · ab81dabd
      Al Viro 提交于
      Permission checks on current's netns should be done only when
      netns are enabled.
      Reported-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Fixes: 23bf1b6bSigned-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ab81dabd
    • J
      io_uring: fix poll races · 8c838788
      Jens Axboe 提交于
      This is a straight port of Al's fix for the aio poll implementation,
      since the io_uring version is heavily based on that. The below
      description is almost straight from that patch, just modified to
      fit the io_uring situation.
      
      io_poll() has to cope with several unpleasant problems:
      	* requests that might stay around indefinitely need to
      be made visible for io_cancel(2); that must not be done to
      a request already completed, though.
      	* in cases when ->poll() has placed us on a waitqueue,
      wakeup might have happened (and request completed) before ->poll()
      returns.
      	* worse, in some early wakeup cases request might end
      up re-added into the queue later - we can't treat "woken up and
      currently not in the queue" as "it's not going to stick around
      indefinitely"
      	* ... moreover, ->poll() might have decided not to
      put it on any queues to start with, and that needs to be distinguished
      from the previous case
      	* ->poll() might have tried to put us on more than one queue.
      Only the first will succeed for io poll, so we might end up missing
      wakeups.  OTOH, we might very well notice that only after the
      wakeup hits and request gets completed (all before ->poll() gets
      around to the second poll_wait()).  In that case it's too late to
      decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's io_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      Fixes: 221c5eb2 ("io_uring: add support for IORING_OP_POLL")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8c838788
    • J
      io_uring: fix fget/fput handling · 09bb8394
      Jens Axboe 提交于
      This isn't a straight port of commit 84c4e1f8 for aio.c, since
      io_uring doesn't use files in exactly the same way. But it's pretty
      close. See the commit message for that commit.
      
      This essentially fixes a use-after-free with the poll command
      handling, but it takes cue from Linus's approach to just simplifying
      the file handling. We move the setup of the file into a higher level
      location, so the individual commands don't have to deal with it. And
      then we release the reference when we free the associated io_kiocb.
      
      Fixes: 221c5eb2 ("io_uring: add support for IORING_OP_POLL")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09bb8394
  6. 15 3月, 2019 19 次提交
    • J
      io_uring: add prepped flag · d530a402
      Jens Axboe 提交于
      We currently use the fact that if ->ki_filp is already set, then we've
      done the prep. In preparation for moving the file assignment earlier,
      use a separate flag to tell whether the request has been prepped for
      IO or not.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d530a402
    • J
      io_uring: make io_read/write return an integer · e0c5c576
      Jens Axboe 提交于
      The callers all convert to an integer, and we only return 0/-ERROR
      anyway.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e0c5c576
    • L
      ext4: report real fs size after failed resize · 6c732840
      Lukas Czerner 提交于
      Currently when the file system resize using ext4_resize_fs() fails it
      will report into log that "resized filesystem to <requested block
      count>".  However this may not be true in the case of failure.  Use the
      current block count as returned by ext4_blocks_count() to report the
      block count.
      
      Additionally, report a warning that "error occurred during file system
      resize"
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      6c732840
    • L
      ext4: add missing brelse() in add_new_gdb_meta_bg() · d64264d6
      Lukas Czerner 提交于
      Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh
      in case ext4_journal_get_write_access() fails.
      Additionally kvfree() is missing in the same error path. Fix it by
      moving the ext4_journal_get_write_access() before the ext4 sb update as
      Ted suggested and release n_group_desc and gdb_bh in case it fails.
      
      Fixes: 61a9c11e ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path")
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d64264d6
    • J
      io_uring: use regular request ref counts · e65ef56d
      Jens Axboe 提交于
      Get rid of the special casing of "normal" requests not having
      any references to the io_kiocb. We initialize the ref count to 2,
      one for the submission side, and one or the completion side.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e65ef56d
    • J
      ext4: remove useless ext4_pin_inode() · 7cf77140
      Jason Yan 提交于
      This function is never used from the beginning (and is commented out);
      let's remove it.
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7cf77140
    • J
      ext4: avoid panic during forced reboot · 1dc1097f
      Jan Kara 提交于
      When admin calls "reboot -f" - i.e., does a hard system reboot by
      directly calling reboot(2) - ext4 filesystem mounted with errors=panic
      can panic the system. This happens because the underlying device gets
      disabled without unmounting the filesystem and thus some syscall running
      in parallel to reboot(2) can result in the filesystem getting IO errors.
      
      This is somewhat surprising to the users so try improve the behavior by
      switching to errors=remount-ro behavior when the system is running
      reboot(2).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1dc1097f
    • L
      ext4: fix data corruption caused by unaligned direct AIO · 372a03e0
      Lukas Czerner 提交于
      Ext4 needs to serialize unaligned direct AIO because the zeroing of
      partial blocks of two competing unaligned AIOs can result in data
      corruption.
      
      However it decides not to serialize if the potentially unaligned aio is
      past i_size with the rationale that no pending writes are possible past
      i_size. Unfortunately if the i_size is not block aligned and the second
      unaligned write lands past i_size, but still into the same block, it has
      the potential of corrupting the previous unaligned write to the same
      block.
      
      This is (very simplified) reproducer from Frank
      
          // 41472 = (10 * 4096) + 512
          // 37376 = 41472 - 4096
      
          ftruncate(fd, 41472);
          io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
          io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
      
          io_submit(io_ctx, 1, &iocbs[1]);
          io_submit(io_ctx, 1, &iocbs[2]);
      
          io_getevents(io_ctx, 2, 2, events, NULL);
      
      Without this patch the 512B range from 40960 up to the start of the
      second unaligned write (41472) is going to be zeroed overwriting the data
      written by the first write. This is a data corruption.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      
      With this patch the data corruption is avoided because we will recognize
      the unaligned_aio and wait for the unwritten extent conversion.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      *
      0000b200
      Reported-by: NFrank Sorenson <fsorenso@redhat.com>
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
      Cc: stable@vger.kernel.org
      372a03e0
    • J
      ext4: fix NULL pointer dereference while journal is aborted · fa30dde3
      Jiufei Xue 提交于
      We see the following NULL pointer dereference while running xfstests
      generic/475:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
      RIP: 0010:ext4_do_update_inode+0x4ec/0x760
      ...
      Call Trace:
      ? jbd2_journal_get_write_access+0x42/0x50
      ? __ext4_journal_get_write_access+0x2c/0x70
      ? ext4_truncate+0x186/0x3f0
      ext4_mark_iloc_dirty+0x61/0x80
      ext4_mark_inode_dirty+0x62/0x1b0
      ext4_truncate+0x186/0x3f0
      ? unmap_mapping_pages+0x56/0x100
      ext4_setattr+0x817/0x8b0
      notify_change+0x1df/0x430
      do_truncate+0x5e/0x90
      ? generic_permission+0x12b/0x1a0
      
      This is triggered because the NULL pointer handle->h_transaction was
      dereferenced in function ext4_update_inode_fsync_trans().
      I found that the h_transaction was set to NULL in jbd2__journal_restart
      but failed to attached to a new transaction while the journal is aborted.
      
      Fix this by checking the handle before updating the inode.
      
      Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: stable@kernel.org
      fa30dde3
    • A
      CIFS: fix POSIX lock leak and invalid ptr deref · bc31d0cd
      Aurelien Aptel 提交于
      We have a customer reporting crashes in lock_get_status() with many
      "Leaked POSIX lock" messages preceeding the crash.
      
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709
       Leaked POSIX lock on dev=0x0:0x4b ino...
       Leaked locks on dev=0x0:0x4b ino=0xf911400000029:
       POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592
       stack segment: 0000 [#1] SMP
       Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
      
       Supported: Yes
       CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1
       Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
       task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000
       RIP: 0010:[<ffffffff8125dcab>]  [<ffffffff8125dcab>] lock_get_status+0x9b/0x3b0
       RSP: 0018:ffff88c74005fd90  EFLAGS: 00010202
       RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50
       RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae
       RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff
       R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340
       R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340
       FS:  00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18
        ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340
        ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28
       Call Trace:
        [<ffffffff8125e08f>] locks_show+0x2f/0x70
        [<ffffffff81230ad1>] seq_read+0x251/0x3a0
        [<ffffffff81275bbc>] proc_reg_read+0x3c/0x70
        [<ffffffff8120e456>] __vfs_read+0x26/0x140
        [<ffffffff8120e9da>] vfs_read+0x7a/0x120
        [<ffffffff8120faf2>] SyS_read+0x42/0xa0
        [<ffffffff8161cbc3>] entry_SYSCALL_64_fastpath+0x1e/0xb7
      
      When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls
      filp_close() which also removes all posix locks.
      
      The lock struct is initialized like so in filp_close() and passed
      down to cifs
      
      	...
              lock.fl_type = F_UNLCK;
              lock.fl_flags = FL_POSIX | FL_CLOSE;
              lock.fl_start = 0;
              lock.fl_end = OFFSET_MAX;
      	...
      
      Note the FL_CLOSE flag, which hints the VFS code that this unlocking
      is done for closing the fd.
      
      filp_close()
        locks_remove_posix(filp, id);
          vfs_lock_file(filp, F_SETLK, &lock, NULL);
            return filp->f_op->lock(filp, cmd, fl) => cifs_lock()
              rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid);
                rc = server->ops->mand_unlock_range(cfile, flock, xid);
                if (flock->fl_flags & FL_POSIX && !rc)
                        rc = locks_lock_file_wait(file, flock)
      
      Notice how we don't call locks_lock_file_wait() which does the
      generic VFS lock/unlock/wait work on the inode if rc != 0.
      
      If we are closing the handle, the SMB server is supposed to remove any
      locks associated with it. Similarly, cifs.ko frees and wakes up any
      lock and lock waiter when closing the file:
      
      cifs_close()
        cifsFileInfo_put(file->private_data)
      	/*
      	 * Delete any outstanding lock records. We'll lose them when the file
      	 * is closed anyway.
      	 */
      	down_write(&cifsi->lock_sem);
      	list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
      		list_del(&li->llist);
      		cifs_del_lock_waiters(li);
      		kfree(li);
      	}
      	list_del(&cifs_file->llist->llist);
      	kfree(cifs_file->llist);
      	up_write(&cifsi->lock_sem);
      
      So we can safely ignore unlocking failures in cifs_lock() if they
      happen with the FL_CLOSE flag hint set as both the server and the
      client take care of it during the actual closing.
      
      This is not a proper fix for the unlocking failure but it's safe and
      it seems to prevent the lock leakages and crashes the customer
      experiences.
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NNeilBrown <neil@brown.name>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
      bc31d0cd
    • R
      SMB3: Allow SMB3 FSCTL queries to be sent to server from tools · f5778c39
      Ronnie Sahlberg 提交于
      For debugging purposes we often have to be able to query
      additional information only available via SMB3 FSCTL
      from the server from user space tools (e.g. like
      cifs-utils's smbinfo).  See MS-FSCC and MS-SMB2 protocol
      specifications for more details.
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f5778c39
    • R
      cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_falloc · f1699479
      Ronnie Sahlberg 提交于
      smb2_set_sparse does not return -errno, it returns a boolean where
      true means success.
      Change this to just ignore the return value just like the other callsites.
      
      Additionally add code to handle the case where we must set the file sparse
      and possibly also extending it.
      
      Fixes xfstests: generic/236 generic/350 generic/420
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f1699479
    • S
      smb2: fix typo in definition of a few error flags · dd0ac2d2
      Steve French 提交于
      As Sergey Senozhatsky pointed out __constant_cpu_to_le32()
      is misspelled in a few definitions in the list of status
      codes smb2status.h as __constanst_cpu_to_le32()
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      dd0ac2d2
    • A
      CIFS: make mknod() an smb_version_op · c847dccf
      Aurelien Aptel 提交于
      This cleanup removes cifs specific code from SMB2/SMB3 code paths
      which is cleaner and easier to maintain as the code to handle
      special files is improved.  Below is an example creating special files
      using 'sfu' mount option over SMB3 to Windows (with this patch)
      (Note that to Samba server, support for saving dos attributes
      has to be enabled for the SFU mount option to work).
      
      In the future this will also make implementation of creating
      special files as reparse points easier (as Windows NFS server does
      for example).
      
         root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/char
         character special file
      
         root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/block
         block special file
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      c847dccf
    • S
      cifs: minor documentation updates · 65525802
      Steve French 提交于
      Also updated a comment describing use of the GlobalMid_Lock
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      65525802
    • S
      cifs: remove unused value pointed out by Coverity · d44d1372
      Steve French 提交于
      Detected by CoverityScan CID#1438719 ("Unused Value")
      
      buf is reset again before being used so these two lines of code
      are useless.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      d44d1372
    • S
      SMB3: passthru query info doesn't check for SMB3 FSCTL passthru · 31ba4331
      Steve French 提交于
      The passthrough queries from user space tools like smbinfo can be either
      SMB3 QUERY_INFO or SMB3 FSCTL, but we are not checking for the latter.
      Temporarily we return EOPNOTSUPP for SMB3 FSCTL passthrough requests
      but once compounding fsctls is fixed can enable.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      31ba4331
    • S
      smb3: add dynamic tracepoints for simple fallocate and zero range · 779ede04
      Steve French 提交于
      Can be helpful in debugging various xfstests that are currently
      skipped or failing due to missing features in our current
      implementation of fallocate.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      779ede04
    • R
      cifs: fix smb3_zero_range so it can expand the file-size when required · 72c419d9
      Ronnie Sahlberg 提交于
      This allows fallocate -z to work against a Windows2016 share.
      
      This is due to the SMB3 ZERO_RANGE command does not modify the filesize.
      To address this we will now append a compounded SET-INFO to update the
      end-of-file information.
      
      This brings xfstests generic/469 closer to working against a windows share.
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      72c419d9