1. 13 12月, 2019 27 次提交
    • D
      iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) · 4c67dbea
      Darrick J. Wong 提交于
      [ Upstream commit 8f67b5adc030553fbc877124306f3f3bdab89aa8 ]
      
      In commit 4721a601099, we tried to fix a problem wherein directio reads
      into a splice pipe will bounce EFAULT/EAGAIN all the way out to
      userspace by simulating a zero-byte short read.  This happens because
      some directio read implementations (xfs) will call
      bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
      reads, but as soon as we run out of pipe buffers that _get_pages call
      returns EFAULT, which the splice code translates to EAGAIN and bounces
      out to userspace.
      
      In that commit, the iomap code catches the EFAULT and simulates a
      zero-byte read, but that causes assertion errors on regular splice reads
      because xfs doesn't allow short directio reads.  This causes infinite
      splice() loops and assertion failures on generic/095 on overlayfs
      because xfs only permit total success or total failure of a directio
      operation.  The underlying issue in the pipe splice code has now been
      fixed by changing the pipe splice loop to avoid avoid reading more data
      than there is space in the pipe.
      
      Therefore, it's no longer necessary to simulate the short directio, so
      remove the hack from iomap.
      
      Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
      Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
      Ranted-by: NAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4c67dbea
    • D
      splice: don't read more than available pipe space · 019b6325
      Darrick J. Wong 提交于
      [ Upstream commit 17614445576b6af24e9cf36607c6448164719c96 ]
      
      In commit 4721a601099, we tried to fix a problem wherein directio reads
      into a splice pipe will bounce EFAULT/EAGAIN all the way out to
      userspace by simulating a zero-byte short read.  This happens because
      some directio read implementations (xfs) will call
      bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
      reads, but as soon as we run out of pipe buffers that _get_pages call
      returns EFAULT, which the splice code translates to EAGAIN and bounces
      out to userspace.
      
      In that commit, the iomap code catches the EFAULT and simulates a
      zero-byte read, but that causes assertion errors on regular splice reads
      because xfs doesn't allow short directio reads.
      
      The brokenness is compounded by splice_direct_to_actor immediately
      bailing on do_splice_to returning <= 0 without ever calling ->actor
      (which empties out the pipe), so if userspace calls back we'll EFAULT
      again on the full pipe, and nothing ever gets copied.
      
      Therefore, teach splice_direct_to_actor to clamp its requests to the
      amount of free space in the pipe and remove the simulated short read
      from the iomap directio code.
      
      Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
      Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
      Ranted-by: NAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      019b6325
    • J
      iomap: Fix pipe page leakage during splicing · b59116ff
      Jan Kara 提交于
      commit 419e9c38aa075ed0cd3c13d47e15954b686bcdb6 upstream.
      
      When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages
      because bio_iov_iter_get_pages() records that the pipe will have full
      extent worth of data however if file size is not block size aligned
      iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up
      and splice code gets confused leaking a pipe page with the file tail.
      
      Handle the situation similarly to the old direct IO implementation and
      revert iter to actually returned read amount which makes iter consistent
      with value returned from iomap_dio_rw() and thus the splice code is
      happy.
      
      Fixes: ff6a9292 ("iomap: implement direct I/O")
      CC: stable@vger.kernel.org
      Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b59116ff
    • T
      kernfs: fix ino wrap-around detection · 18493bac
      Tejun Heo 提交于
      commit e23f568aa63f64cd6b355094224cc9356c0f696b upstream.
      
      When the 32bit ino wraps around, kernfs increments the generation
      number to distinguish reused ino instances.  The wrap-around detection
      tests whether the allocated ino is lower than what the cursor but the
      cursor is pointing to the next ino to allocate so the condition never
      triggers.
      
      Fix it by remembering the last ino and comparing against that.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Fixes: 4a3ef68a ("kernfs: implement i_generation")
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18493bac
    • P
      CIFS: Fix SMB2 oplock break processing · d4785d88
      Pavel Shilovsky 提交于
      commit fa9c2362497fbd64788063288dc4e74daf977ebb upstream.
      
      Even when mounting modern protocol version the server may be
      configured without supporting SMB2.1 leases and the client
      uses SMB2 oplock to optimize IO performance through local caching.
      
      However there is a problem in oplock break handling that leads
      to missing a break notification on the client who has a file
      opened. It latter causes big latencies to other clients that
      are trying to open the same file.
      
      The problem reproduces when there are multiple shares from the
      same server mounted on the client. The processing code tries to
      match persistent and volatile file ids from the break notification
      with an open file but it skips all share besides the first one.
      Fix this by looking up in all shares belonging to the server that
      issued the oplock break.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4785d88
    • P
      CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks · df871e55
      Pavel Shilovsky 提交于
      commit 6f582b273ec23332074d970a7fb25bef835df71f upstream.
      
      Currently when the client creates a cifsFileInfo structure for
      a newly opened file, it allocates a list of byte-range locks
      with a pointer to the new cfile and attaches this list to the
      inode's lock list. The latter happens before initializing all
      other fields, e.g. cfile->tlink. Thus a partially initialized
      cifsFileInfo structure becomes available to other threads that
      walk through the inode's lock list. One example of such a thread
      may be an oplock break worker thread that tries to push all
      cached byte-range locks. This causes NULL-pointer dereference
      in smb2_push_mandatory_locks() when accessing cfile->tlink:
      
      [598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
      ...
      [598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
      [598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
      ...
      [598428.945834] Call Trace:
      [598428.945870]  ? cifs_revalidate_mapping+0x45/0x90 [cifs]
      [598428.945901]  cifs_oplock_break+0x13d/0x450 [cifs]
      [598428.945909]  process_one_work+0x1db/0x380
      [598428.945914]  worker_thread+0x4d/0x400
      [598428.945921]  kthread+0x104/0x140
      [598428.945925]  ? process_one_work+0x380/0x380
      [598428.945931]  ? kthread_park+0x80/0x80
      [598428.945937]  ret_from_fork+0x35/0x40
      
      Fix this by reordering initialization steps of the cifsFileInfo
      structure: initialize all the fields first and then add the new
      byte-range lock list to the inode's lock list.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df871e55
    • M
      fuse: verify attributes · 710c33ad
      Miklos Szeredi 提交于
      commit eb59bd17d2fa6e5e84fba61a5ebdea984222e6d5 upstream.
      
      If a filesystem returns negative inode sizes, future reads on the file were
      causing the cpu to spin on truncate_pagecache.
      
      Create a helper to validate the attributes.  This now does two things:
      
       - check the file mode
       - check if the file size fits in i_size without overflowing
      Reported-by: NArijit Banerjee <arijit@rubrik.com>
      Fixes: d8a5ba45 ("[PATCH] FUSE - core")
      Cc: <stable@vger.kernel.org> # v2.6.14
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      710c33ad
    • M
      fuse: verify nlink · 9f435a5e
      Miklos Szeredi 提交于
      commit c634da718db9b2fac201df2ae1b1b095344ce5eb upstream.
      
      When adding a new hard link, make sure that i_nlink doesn't overflow.
      
      Fixes: ac45d613 ("fuse: fix nlink after unlink")
      Cc: <stable@vger.kernel.org> # v3.4
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f435a5e
    • Z
      nfsd: Return EPERM, not EACCES, in some SETATTR cases · 1ff89e6d
      zhengbin 提交于
      [ Upstream commit 255fbca65137e25b12bced18ec9a014dc77ecda0 ]
      
      As the man(2) page for utime/utimes states, EPERM is returned when the
      second parameter of utime or utimes is not NULL, the caller's effective UID
      does not match the owner of the file, and the caller is not privileged.
      
      However, in a NFS directory mounted from knfsd, it will return EACCES
      (from nfsd_setattr-> fh_verify->nfsd_permission).  This patch fixes
      that.
      Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1ff89e6d
    • K
      pstore/ram: Avoid NULL deref in ftrace merging failure path · 6b6f6030
      Kees Cook 提交于
      [ Upstream commit 8665569e97dd52920713b95675409648986b5b0d ]
      
      Given corruption in the ftrace records, it might be possible to allocate
      tmp_prz without assigning prz to it, but still marking it as needing to
      be freed, which would cause at least a NULL dereference.
      
      smatch warnings:
      fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' could be null (see line 255)
      
      https://lists.01.org/pipermail/kbuild-all/2018-December/055528.htmlReported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Fixes: 2fbea82b ("pstore: Merge per-CPU ftrace records into one")
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6b6f6030
    • D
      dlm: fix invalid cluster name warning · 446a04d8
      David Teigland 提交于
      [ Upstream commit 3595c559326d0b660bb088a88e22e0ca630a0e35 ]
      
      The warning added in commit 3b0e761b
        "dlm: print log message when cluster name is not set"
      
      did not account for the fact that lockspaces created
      from userland do not supply a cluster name, so bogus
      warnings are printed every time a userland lockspace
      is created.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      446a04d8
    • S
      nfsd: fix a warning in __cld_pipe_upcall() · 3ff6af8e
      Scott Mayhew 提交于
      [ Upstream commit b493fd31c0b89d9453917e977002de58bebc3802 ]
      
      __cld_pipe_upcall() emits a "do not call blocking ops when
      !TASK_RUNNING" warning due to the dput() call in rpc_queue_upcall().
      Fix it by using a completion instead of hand coding the wait.
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3ff6af8e
    • W
      dlm: NULL check before kmem_cache_destroy is not needed · 3b0107ca
      Wen Yang 提交于
      [ Upstream commit f31a89692830061bceba8469607e4e4b0f900159 ]
      
      kmem_cache_destroy(NULL) is safe, so removes NULL check before
      freeing the mem. This patch also fix ifnullfree.cocci warnings.
      Signed-off-by: NWen Yang <wen.yang99@zte.com.cn>
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3b0107ca
    • J
      lockd: fix decoding of TEST results · a2b7010f
      J. Bruce Fields 提交于
      [ Upstream commit b8db159239b3f51e2b909859935cc25cb3ff3eed ]
      
      We fail to advance the read pointer when reading the stat.oh field that
      identifies the lock-holder in a TEST result.
      
      This turns out not to matter if the server is knfsd, which always
      returns a zero-length field.  But other servers (Ganesha is an example)
      may not do this.  The result is bad values in fcntl F_GETLK results.
      
      Fix this.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a2b7010f
    • S
      f2fs: fix to allow node segment for GC by ioctl path · c8aa27cf
      Sahitya Tummala 提交于
      [ Upstream commit 08ac9a3870f6babb2b1fff46118536ca8a71ef19 ]
      
      Allow node type segments also to be GC'd via f2fs ioctl
      F2FS_IOC_GARBAGE_COLLECT_RANGE.
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c8aa27cf
    • Y
      f2fs: change segment to section in f2fs_ioc_gc_range · 313f1fef
      Yunlong Song 提交于
      [ Upstream commit 67b0e42b768c9ddc3fd5ca1aee3db815cfaa635c ]
      
      f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves
      blocks of section each time, so fix it from segment to section.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      313f1fef
    • Y
      f2fs: fix count of seg_freed to make sec_freed correct · 859c93a0
      Yunlong Song 提交于
      [ Upstream commit d6c66cd19ef322fe0d51ba09ce1b7f386acab04a ]
      
      When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before
      gc starts, do_garbage_collect will skip counting seg_freed++, and this
      will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      859c93a0
    • C
      f2fs: fix to account preflush command for noflush_merge mode · c1054aeb
      Chao Yu 提交于
      [ Upstream commit a8075dc484cf10ebdb07bee2b17322fb0a846309 ]
      
      Previously, we only account preflush command for flush_merge mode,
      so for noflush_merge mode, we can not know in-flight preflush
      command count, fix it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c1054aeb
    • D
      iomap: readpages doesn't zero page tail beyond EOF · c3a62b65
      Dave Chinner 提交于
      [ Upstream commit 8c110d43c6bca4b24dd13272a9d4e0ba6f2ec957 ]
      
      When we read the EOF page of the file via readpages, we need
      to zero the region beyond EOF that we either do not read or
      should not contain data so that mmap does not expose stale data to
      user applications.
      
      However, iomap_adjust_read_range() fails to detect EOF correctly,
      and so fsx on 1k block size filesystems fails very quickly with
      mapreads exposing data beyond EOF. There are two problems here.
      
      Firstly, when calculating the end block of the EOF byte, we have
      to round the size by one to avoid a block aligned EOF from reporting
      a block too large. i.e. a size of 1024 bytes is 1 block, which in
      index terms is block 0. Therefore we have to calculate the end block
      from (isize - 1), not isize.
      
      The second bug is determining if the current page spans EOF, and so
      whether we need split it into two half, one for the IO, and the
      other for zeroing. Unfortunately, the code that checks whether
      we should split the block doesn't actually check if we span EOF, it
      just checks if the read spans the /offset in the page/ that EOF
      sits on. So it splits every read into two if EOF is not page
      aligned, regardless of whether we are reading the EOF block or not.
      
      Hence we need to restrict the "does the read span EOF" check to
      just the page that spans EOF, not every page we read.
      
      This patch results in correct EOF detection through readpages:
      
      xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
      xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
      iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
      xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
      iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
      iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864
      
      As you can see, it now does full page reads until the last one which
      is split correctly at the block aligned EOF, reading 3072 bytes and
      zeroing the last 1024 bytes. The original version of the patch got
      this right, but it got another case wrong.
      
      The EOF detection crossing really needs to the the original length
      as plen, while it starts at the end of the block, will be shortened
      as up-to-date blocks are found on the page. This means "orig_pos +
      plen" no longer points to the end of the page, and so will not
      correctly detect EOF crossing. Hence we have to use the length
      passed in to detect this partial page case:
      
      xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
      xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
      xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
      iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
      xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
      iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296
      
      Heere we see a trace where the first block on the EOF page is up to
      date, hence poff = 1024 bytes. The offset into the page of EOF is
      3072, so the range we want to read is 1024 - 3071, and the range we
      want to zero is 3072 - 4095. You can see this is split correctly
      now.
      
      This fixes the stale data beyond EOF problem that fsx quickly
      uncovers on 1k block size filesystems.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c3a62b65
    • D
      iomap: dio data corruption and spurious errors when pipes fill · 807a5972
      Dave Chinner 提交于
      [ Upstream commit 4721a6010990971440b4ffefbdf014976b8eda2f ]
      
      When doing direct IO to a pipe for do_splice_direct(), then pipe is
      trivial to fill up and overflow as it can only hold 16 pages. At
      this point bio_iov_iter_get_pages() then returns -EFAULT, and we
      abort the IO submission process. Unfortunately, iomap_dio_rw()
      propagates the error back up the stack.
      
      The error is converted from the EFAULT to EAGAIN in
      generic_file_splice_read() to tell the splice layers that the pipe
      is full. do_splice_direct() completely fails to handle EAGAIN errors
      (it aborts on error) and returns EAGAIN to the caller.
      
      copy_file_write() then completely fails to handle EAGAIN as well,
      and so returns EAGAIN to userspace, having failed to copy the data
      it was asked to.
      
      Avoid this whole steaming pile of fail by having iomap_dio_rw()
      silently swallow EFAULT errors and so do short reads.
      
      To make matters worse, iomap_dio_actor() has a stale data exposure
      bug bio_iov_iter_get_pages() fails - it does not zero the tail block
      that it may have been left uncovered by partial IO. Fix the error
      handling case to drop to the sub-block zeroing rather than
      immmediately returning the -EFAULT error.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      807a5972
    • D
      iomap: sub-block dio needs to zeroout beyond EOF · b61fdcdb
      Dave Chinner 提交于
      [ Upstream commit b450672fb66b4a991a5b55ee24209ac7ae7690ce ]
      
      If we are doing sub-block dio that extends EOF, we need to zero
      the unused tail of the block to initialise the data in it it. If we
      do not zero the tail of the block, then an immediate mmap read of
      the EOF block will expose stale data beyond EOF to userspace. Found
      with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.
      
      Fix this by detecting if the end of the DIO write is beyond EOF
      and zeroing the tail if necessary.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b61fdcdb
    • D
      iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents · ac3ec5a4
      Dave Chinner 提交于
      [ Upstream commit 0929d8580071c6a1cec1a7916a8f674c243ceee1 ]
      
      When we write into an unwritten extent via direct IO, we dirty
      metadata on IO completion to convert the unwritten extent to
      written. However, when we do the FUA optimisation checks, the inode
      may be clean and so we issue a FUA write into the unwritten extent.
      This means we then bypass the generic_write_sync() call after
      unwritten extent conversion has ben done and we don't force the
      modified metadata to stable storage.
      
      This violates O_DSYNC semantics. The window of exposure is a single
      IO, as the next DIO write will see the inode has dirty metadata and
      hence will not use the FUA optimisation. Calling
      generic_write_sync() after completion of the second IO will also
      sync the first write and it's metadata.
      
      Fix this by avoiding the FUA optimisation when writing to unwritten
      extents.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ac3ec5a4
    • D
      xfs: extent shifting doesn't fully invalidate page cache · 249279c6
      Dave Chinner 提交于
      [ Upstream commit 7f9f71be84bcab368e58020a42f6d0dd97adf0ce ]
      
      The extent shifting code uses a flush and invalidate mechainsm prior
      to shifting extents around. This is similar to what
      xfs_free_file_space() does, but it doesn't take into account things
      like page cache vs block size differences, and it will fail if there
      is a page that it currently busy.
      
      xfs_flush_unmap_range() handles all of these cases, so just convert
      xfs_prepare_shift() to us that mechanism rather than having it's own
      special sauce.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      249279c6
    • D
      dlm: fix missing idr_destroy for recover_idr · 1bcac298
      David Teigland 提交于
      [ Upstream commit 8fc6ed9a3508a0435b9270c313600799d210d319 ]
      
      Which would leak memory for the idr internals.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1bcac298
    • D
      dlm: fix possible call to kfree() for non-initialized pointer · ed224150
      Denis V. Lunev 提交于
      [ Upstream commit 58a923adf4d9aca8bf7205985c9c8fc531c65d72 ]
      
      Technically dlm_config_nodes() could return error and keep nodes
      uninitialized. After that on the fail path of we'll call kfree()
      for that uninitialized value.
      
      The patch is simple - we should just initialize nodes with NULL.
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ed224150
    • A
      exportfs_decode_fh(): negative pinned may become positive without the parent locked · af17e1fc
      Al Viro 提交于
      [ Upstream commit a2ece088882666e1dc7113744ac912eb161e3f87 ]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      af17e1fc
    • A
      autofs: fix a leak in autofs_expire_indirect() · d4bc855a
      Al Viro 提交于
      [ Upstream commit 03ad0d703df75c43f78bd72e16124b5b94a95188 ]
      
      if the second call of should_expire() in there ends up
      grabbing and returning a new reference to dentry, we need
      to drop it before continuing.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d4bc855a
  2. 05 12月, 2019 13 次提交