1. 18 1月, 2020 6 次提交
    • E
      f2fs: remove unneeded check for error allocating bio_post_read_ctx · e8ce5749
      Eric Biggers 提交于
      Since allocating an object from a mempool never fails when
      __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check
      for failure to allocate a bio_post_read_ctx is unnecessary.  Remove it.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e8ce5749
    • J
      f2fs: convert inline_dir early before starting rename · b06af2af
      Jaegeuk Kim 提交于
      If we hit an error during rename, we'll get two dentries in different
      directories.
      
      Chao adds to check the room in inline_dir which can avoid needless
      inversion. This should be done by inode_lock(&old_dir).
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b06af2af
    • C
      f2fs: fix memleak of kobject · fe396ad8
      Chao Yu 提交于
      If kobject_init_and_add() failed, caller needs to invoke kobject_put()
      to release kobject explicitly.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe396ad8
    • C
      f2fs: fix to add swap extent correctly · 3e5e479a
      Chao Yu 提交于
      As Youling reported in mailing list:
      
      https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/
      
      https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/
      
      There is a test case can corrupt f2fs image:
      - dd if=/dev/zero of=/swapfile bs=1M count=4096
      - chmod 600 /swapfile
      - mkswap /swapfile
      - swapon --discard /swapfile
      
      The root cause is f2fs_swap_activate() intends to return zero value
      to setup_swap_extents() to enable SWP_FS mode (swap file goes through
      fs), in this flow, setup_swap_extents() setups swap extent with wrong
      block address range, result in discard_swap() erasing incorrect address.
      
      Because f2fs_swap_activate() has pinned swapfile, its data block
      address will not change, it's safe to let swap to handle IO through
      raw device, so we can get rid of SWAP_FS mode and initial swap extents
      inside f2fs_swap_activate(), by this way, later discard_swap() can trim
      in right address range.
      
      Fixes: 4969c06a ("f2fs: support swap file w/ DIO")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3e5e479a
    • J
      f2fs: run fsck when getting bad inode during GC · 4eea93e3
      Jaegeuk Kim 提交于
      This is to avoid inifinite GC when trying to disable checkpoint.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4eea93e3
    • C
      f2fs: support data compression · 4c8ff709
      Chao Yu 提交于
      This patch tries to support compression in f2fs.
      
      - New term named cluster is defined as basic unit of compression, file can
      be divided into multiple clusters logically. One cluster includes 4 << n
      (n >= 0) logical pages, compression size is also cluster size, each of
      cluster can be compressed or not.
      
      - In cluster metadata layout, one special flag is used to indicate cluster
      is compressed one or normal one, for compressed cluster, following metadata
      maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
      data including compress header and compressed data.
      
      - In order to eliminate write amplification during overwrite, F2FS only
      support compression on write-once file, data can be compressed only when
      all logical blocks in file are valid and cluster compress ratio is lower
      than specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Now f2fs supports processing post read work in multiple workqueue,
        it shows low performance due to schedule overhead of multiple
        workqueue executing orderly.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contain 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
      note that: will remove lzo backend if Jaegeuk agreed that too.
      - update codes according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4c8ff709
  2. 16 1月, 2020 9 次提交
    • J
      f2fs: free sysfs kobject · 820d3667
      Jaegeuk Kim 提交于
      Detected kmemleak.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      820d3667
    • J
      f2fs: declare nested quota_sem and remove unnecessary sems · 2c4e0c52
      Jaegeuk Kim 提交于
      1.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
       -> dquot_writeback_dquots
        -> f2fs_dquot_commit
         -> down_read(&sbi->quota_sem)
      
      2.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
        -> f2fs_write_data_pages
         -> f2fs_write_single_data_page
          -> down_write(&F2FS_I(inode)->i_sem)
      
      f2fs_mkdir
       -> f2fs_do_add_link
         -> down_write(&F2FS_I(inode)->i_sem)
         -> f2fs_init_inode_metadata
          -> f2fs_new_node_page
           -> dquot_alloc_inode
            -> f2fs_dquot_mark_dquot_dirty
             -> down_read(&sbi->quota_sem)
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2c4e0c52
    • J
      f2fs: don't put new_page twice in f2fs_rename · 762e4db5
      Jaegeuk Kim 提交于
      In f2fs_rename(), new_page is gone after f2fs_set_link(), but it tries
      to put again when whiteout is failed and jumped to put_out_dir.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      762e4db5
    • J
      f2fs: set I_LINKABLE early to avoid wrong access by vfs · 5b1dbb08
      Jaegeuk Kim 提交于
      This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the
      below warning.
      
      [ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
      [ 3189.246979] Call Trace:
      [ 3189.248707]  f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
      [ 3189.251399]  f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
      [ 3189.254010]  f2fs_add_dentry+0x69/0xe0 [f2fs]
      [ 3189.256353]  f2fs_do_add_link+0xc5/0x100 [f2fs]
      [ 3189.258774]  f2fs_rename2+0xabf/0x1010 [f2fs]
      [ 3189.261079]  vfs_rename+0x3f8/0xaa0
      [ 3189.263056]  ? tomoyo_path_rename+0x44/0x60
      [ 3189.265283]  ? do_renameat2+0x49b/0x550
      [ 3189.267324]  do_renameat2+0x49b/0x550
      [ 3189.269316]  __x64_sys_renameat2+0x20/0x30
      [ 3189.271441]  do_syscall_64+0x5a/0x230
      [ 3189.273410]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 3189.275848] RIP: 0033:0x7f270b4d9a49
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5b1dbb08
    • E
      f2fs: don't keep META_MAPPING pages used for moving verity file blocks · 542989b6
      Eric Biggers 提交于
      META_MAPPING is used to move blocks for both encrypted and verity files.
      So the META_MAPPING invalidation condition in do_checkpoint() should
      consider verity too, not just encrypt.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      542989b6
    • C
      f2fs: introduce private bioset · f543805f
      Chao Yu 提交于
      In low memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, be stuck in below call path
         - bio_alloc()
          - mempool_alloc()
           - loop to wait mempool element release, as we only
             reserved memory for two bio allocation, however above
             allocated two bios may never be submitted.
      
      So we need avoid using default bioset, in this patch we introduce a
      private bioset, in where we enlarg mempool element count to total
      number of log header, so that we can make sure we have enough
      backuped memory pool in scenario of allocating/holding multiple
      bios.
      Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f543805f
    • S
      f2fs: cleanup duplicate stats for atomic files · 0e6d0164
      Sahitya Tummala 提交于
      Remove duplicate sbi->aw_cnt stats counter that tracks
      the number of atomic files currently opened (it also shows
      incorrect value sometimes). Use more relit lable sbi->atomic_files
      to show in the stats.
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0e6d0164
    • S
      f2fs: Check write pointer consistency of non-open zones · d508c94e
      Shin'ichiro Kawasaki 提交于
      To catch f2fs bugs in write pointer handling code for zoned block
      devices, check write pointers of non-open zones that current segments do
      not point to. Do this check at mount time, after the fsync data recovery
      and current segments' write pointer consistency fix. Or when fsync data
      recovery is disabled by mount option, do the check when there is no fsync
      data.
      
      Check two items comparing write pointers with valid block maps in SIT.
      The first item is check for zones with no valid blocks. When there is no
      valid blocks in a zone, the write pointer should be at the start of the
      zone. If not, next write operation to the zone will cause unaligned write
      error. If write pointer is not at the zone start, reset the write pointer
      to place at the zone start.
      
      The second item is check between the write pointer position and the last
      valid block in the zone. It is unexpected that the last valid block
      position is beyond the write pointer. In such a case, report as a bug.
      Fix is not required for such zone, because the zone is not selected for
      next write operation until the zone get discarded.
      Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d508c94e
    • S
      f2fs: Check write pointer consistency of open zones · c426d991
      Shin'ichiro Kawasaki 提交于
      On sudden f2fs shutdown, write pointers of zoned block devices can go
      further but f2fs meta data keeps current segments at positions before the
      write operations. After remounting the f2fs, this inconsistency causes
      write operations not at write pointers and "Unaligned write command"
      error is reported.
      
      To avoid the error, compare current segments with write pointers of open
      zones the current segments point to, during mount operation. If the write
      pointer position is not aligned with the current segment position, assign
      a new zone to the current segment. Also check the newly assigned zone has
      write pointer at zone start. If not, reset write pointer of the zone.
      
      Perform the consistency check during fsync recovery. Not to lose the
      fsync data, do the check after fsync data gets restored and before
      checkpoint commit which flushes data at current segment positions. Not to
      cause conflict with kworker's dirfy data/node flush, do the fix within
      SBI_POR_DOING protection.
      Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c426d991
  3. 13 12月, 2019 3 次提交
  4. 11 12月, 2019 1 次提交
  5. 10 12月, 2019 2 次提交
  6. 08 12月, 2019 8 次提交
    • S
      smb3: improve check for when we send the security descriptor context on create · 231e2a0b
      Steve French 提交于
      We had cases in the previous patch where we were sending the security
      descriptor context on SMB3 open (file create) in cases when we hadn't
      mounted with with "modefromsid" mount option.
      
      Add check for that mount flag before calling ad_sd_context in
      open init.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      231e2a0b
    • L
      pipe: don't use 'pipe_wait() for basic pipe IO · 85190d15
      Linus Torvalds 提交于
      pipe_wait() may be simple, but since it relies on the pipe lock, it
      means that we have to do the wakeup while holding the lock.  That's
      unfortunate, because the very first thing the waked entity will want to
      do is to get the pipe lock for itself.
      
      So get rid of the pipe_wait() usage by simply releasing the pipe lock,
      doing the wakeup (if required) and then using wait_event_interruptible()
      to wait on the right condition instead.
      
      wait_event_interruptible() handles races on its own by comparing the
      wakeup condition before and after adding itself to the wait queue, so
      you can use an optimistic unlocked condition for it.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85190d15
    • L
      pipe: remove 'waiting_writers' merging logic · a28c8b9d
      Linus Torvalds 提交于
      This code is ancient, and goes back to when we only had a single page
      for the pipe buffers.  The exact history is hidden in the mists of time
      (ie "before git", and in fact predates the BK repository too).
      
      At that long-ago point in time, it actually helped to try to merge big
      back-and-forth pipe reads and writes, and not limit pipe reads to the
      single pipe buffer in length just because that was all we had at a time.
      
      However, since then we've expanded the pipe buffers to multiple pages,
      and this logic really doesn't seem to make sense.  And a lot of it is
      somewhat questionable (ie "hmm, the user asked for a non-blocking read,
      but we see that there's a writer pending, so let's wait anyway to get
      the extra data that the writer will have").
      
      But more importantly, it makes the "go to sleep" logic much less
      obvious, and considering the wakeup issues we've had, I want to make for
      less of those kinds of things.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a28c8b9d
    • L
      pipe: fix and clarify pipe read wakeup logic · f467a6a6
      Linus Torvalds 提交于
      This is the read side version of the previous commit: it simplifies the
      logic to only wake up waiting writers when necessary, and makes sure to
      use a synchronous wakeup.  This time not so much for GNU make jobserver
      reasons (that pipe never fills up), but simply to get the writer going
      quickly again.
      
      A bit less verbose commentary this time, if only because I assume that
      the write side commentary isn't going to be ignored if you touch this
      code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f467a6a6
    • L
      pipe: fix and clarify pipe write wakeup logic · 1b6b26ae
      Linus Torvalds 提交于
      The pipe rework ends up having been extra painful, partly becaused of
      actual bugs with ordering and caching of the pipe state, but also
      because of subtle performance issues.
      
      In particular, the pipe rework caused the kernel build to inexplicably
      slow down.
      
      The reason turns out to be that the GNU make jobserver (which limits the
      parallelism of the build) uses a pipe to implement a "token" system: a
      parallel submake will read a character from the pipe to get the job
      token before starting a new job, and will write a character back to the
      pipe when it is done.  The overall job limit is thus easily controlled
      by just writing the appropriate number of initial token characters into
      the pipe.
      
      But to work well, that really means that the old behavior of write
      wakeups being synchronous (WF_SYNC) is very important - when the pipe
      writer wakes up a reader, we want the reader to actually get scheduled
      immediately.  Otherwise you lose the parallelism of the build.
      
      The pipe rework lost that synchronous wakeup on write, and we had
      clearly all forgotten the reasons and rules for it.
      
      This rewrites the pipe write wakeup logic to do the required Wsync
      wakeups, but also clarifies the logic and avoids extraneous wakeups.
      
      It also ends up addign a number of comments about what oit does and why,
      so that we hopefully don't end up forgetting about this next time we
      change this code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b6b26ae
    • L
      pipe: fix poll/select race introduced by the pipe rework · ad910e36
      Linus Torvalds 提交于
      The kernel wait queues have a basic rule to them: you add yourself to
      the wait-queue first, and then you check the things that you're going to
      wait on.  That avoids the races with the event you're waiting for.
      
      The same goes for poll/select logic: the "poll_wait()" goes first, and
      then you check the things you're polling for.
      
      Of course, if you use locking, the ordering doesn't matter since the
      lock will serialize with anything that changes the state you're looking
      at. That's not the case here, though.
      
      So move the poll_wait() first in pipe_poll(), before you start looking
      at the pipe state.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad910e36
    • P
      nfsd: depend on CRYPTO_MD5 for legacy client tracking · 38a2204f
      Patrick Steinhardt 提交于
      The legacy client tracking infrastructure of nfsd makes use of MD5 to
      derive a client's recovery directory name. As the nfsd module doesn't
      declare any dependency on CRYPTO_MD5, though, it may fail to allocate
      the hash if the kernel was compiled without it. As a result, generation
      of client recovery directories will fail with the following error:
      
          NFSD: unable to generate recoverydir name
      
      The explicit dependency on CRYPTO_MD5 was removed as redundant back in
      6aaa67b5 (NFSD: Remove redundant "select" clauses in fs/Kconfig
      2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
      This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
      df486a25 (NFS: Fix the selection of security flavours in Kconfig) at
      a later point.
      
      Fix the issue by adding back an explicit dependency on CRYPTO_MD5.
      
      Fixes: df486a25 (NFS: Fix the selection of security flavours in Kconfig)
      Signed-off-by: NPatrick Steinhardt <ps@pks.im>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      38a2204f
    • O
      NFSD fixing possible null pointer derefering in copy offload · 18f428d4
      Olga Kornievskaia 提交于
      Static checker revealed possible error path leading to possible
      NULL pointer dereferencing.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Fixes: e0639dc5: ("NFSD introduce async copy feature")
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      18f428d4
  7. 07 12月, 2019 3 次提交
  8. 06 12月, 2019 2 次提交
    • D
      pipe: Fix missing mask update after pipe_wait() · 8f868d68
      David Howells 提交于
      Fix pipe_write() to not cache the ring index mask and max_usage as their
      values are invalidated by calling pipe_wait() because the latter
      function drops the pipe lock, thereby allowing F_SETPIPE_SZ change them.
      Without this, pipe_write() may subsequently miscalculate the array
      indices and pipe fullness, leading to an oops like the following:
      
        BUG: KASAN: slab-out-of-bounds in pipe_write+0xc25/0xe10 fs/pipe.c:481
        Write of size 8 at addr ffff8880771167a8 by task syz-executor.3/7987
        ...
        CPU: 1 PID: 7987 Comm: syz-executor.3 Not tainted 5.4.0-rc2-syzkaller #0
        ...
        Call Trace:
          pipe_write+0xc25/0xe10 fs/pipe.c:481
          call_write_iter include/linux/fs.h:1895 [inline]
          new_sync_write+0x3fd/0x7e0 fs/read_write.c:483
          __vfs_write+0x94/0x110 fs/read_write.c:496
          vfs_write+0x18a/0x520 fs/read_write.c:558
          ksys_write+0x105/0x220 fs/read_write.c:611
          __do_sys_write fs/read_write.c:623 [inline]
          __se_sys_write fs/read_write.c:620 [inline]
          __x64_sys_write+0x6e/0xb0 fs/read_write.c:620
          do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is not a problem for pipe_read() as the mask is recalculated on
      each pass of the loop, after pipe_wait() has been called.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Reported-by: syzbot+838eb0878ffd51f27c41@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      [ Changed it to use a temporary variable 'mask' to avoid long lines -Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f868d68
    • D
      pipe: Remove assertion from pipe_poll() · 8c7b8c34
      David Howells 提交于
      An assertion check was added to pipe_poll() to make sure that the ring
      occupancy isn't seen to overflow the ring size.  However, since no locks
      are held when the three values are read, it is possible for F_SETPIPE_SZ
      to intervene and muck up the calculation, thereby causing the oops.
      
      Fix this by simply removing the assertion and accepting that the
      calculation might be approximate.
      
      Note that the previous code also had a similar issue, though there was
      no assertion check, since the occupancy counter and the ring size were
      not read with a lock held, so it's possible that the poll check might
      have malfunctioned then too.
      
      Also wake up all the waiters so that they can reissue their checks if
      there was a competing read or write.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Reported-by: syzbot+d37abaade33a934f16f2@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c7b8c34
  9. 05 12月, 2019 6 次提交
    • Z
      iomap: stop using ioend after it's been freed in iomap_finish_ioend() · c275779f
      Zorro Lang 提交于
      This patch fixes the following KASAN report. The @ioend has been
      freed by dio_put(), but the iomap_finish_ioend() still trys to access
      its data.
      
      [20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0
      [20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184
      
      [20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1
      [20563.653887] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
      [20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs]
      [20563.669547] Call trace:
      [20563.671993]  dump_backtrace+0x0/0x370
      [20563.675648]  show_stack+0x1c/0x28
      [20563.678958]  dump_stack+0x138/0x1b0
      [20563.682455]  print_address_description.isra.9+0x60/0x378
      [20563.687759]  __kasan_report+0x1a4/0x2a8
      [20563.691587]  kasan_report+0xc/0x18
      [20563.694985]  __asan_report_load8_noabort+0x18/0x20
      [20563.699769]  iomap_finish_ioend+0x58c/0x5c0
      [20563.703944]  iomap_finish_ioends+0x110/0x270
      [20563.708396]  xfs_end_ioend+0x168/0x598 [xfs]
      [20563.712823]  xfs_end_io+0x1e0/0x2d0 [xfs]
      [20563.716834]  process_one_work+0x7f0/0x1ac8
      [20563.720922]  worker_thread+0x334/0xae0
      [20563.724664]  kthread+0x2c4/0x348
      [20563.727889]  ret_from_fork+0x10/0x18
      
      [20563.732941] Allocated by task 83403:
      [20563.736512]  save_stack+0x24/0xb0
      [20563.739820]  __kasan_kmalloc.isra.9+0xc4/0xe0
      [20563.744169]  kasan_slab_alloc+0x14/0x20
      [20563.747998]  slab_post_alloc_hook+0x50/0xa8
      [20563.752173]  kmem_cache_alloc+0x154/0x330
      [20563.756185]  mempool_alloc_slab+0x20/0x28
      [20563.760186]  mempool_alloc+0xf4/0x2a8
      [20563.763845]  bio_alloc_bioset+0x2d0/0x448
      [20563.767849]  iomap_writepage_map+0x4b8/0x1740
      [20563.772198]  iomap_do_writepage+0x200/0x8d0
      [20563.776380]  write_cache_pages+0x8a4/0xed8
      [20563.780469]  iomap_writepages+0x4c/0xb0
      [20563.784463]  xfs_vm_writepages+0xf8/0x148 [xfs]
      [20563.788989]  do_writepages+0xc8/0x218
      [20563.792658]  __writeback_single_inode+0x168/0x18f8
      [20563.797441]  writeback_sb_inodes+0x370/0xd30
      [20563.801703]  wb_writeback+0x2d4/0x1270
      [20563.805446]  wb_workfn+0x344/0x1178
      [20563.808928]  process_one_work+0x7f0/0x1ac8
      [20563.813016]  worker_thread+0x334/0xae0
      [20563.816757]  kthread+0x2c4/0x348
      [20563.819979]  ret_from_fork+0x10/0x18
      
      [20563.825028] Freed by task 22184:
      [20563.828251]  save_stack+0x24/0xb0
      [20563.831559]  __kasan_slab_free+0x10c/0x180
      [20563.835648]  kasan_slab_free+0x10/0x18
      [20563.839389]  slab_free_freelist_hook+0xb4/0x1c0
      [20563.843912]  kmem_cache_free+0x8c/0x3e8
      [20563.847745]  mempool_free_slab+0x20/0x28
      [20563.851660]  mempool_free+0xd4/0x2f8
      [20563.855231]  bio_free+0x33c/0x518
      [20563.858537]  bio_put+0xb8/0x100
      [20563.861672]  iomap_finish_ioend+0x168/0x5c0
      [20563.865847]  iomap_finish_ioends+0x110/0x270
      [20563.870328]  xfs_end_ioend+0x168/0x598 [xfs]
      [20563.874751]  xfs_end_io+0x1e0/0x2d0 [xfs]
      [20563.878755]  process_one_work+0x7f0/0x1ac8
      [20563.882844]  worker_thread+0x334/0xae0
      [20563.886584]  kthread+0x2c4/0x348
      [20563.889804]  ret_from_fork+0x10/0x18
      
      [20563.894855] The buggy address belongs to the object at fffffc0c54a36900
                      which belongs to the cache bio-1 of size 248
      [20563.906844] The buggy address is located 40 bytes inside of
                      248-byte region [fffffc0c54a36900, fffffc0c54a369f8)
      [20563.918485] The buggy address belongs to the page:
      [20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300
      [20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900
      [20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000
      [20563.948300] page dumped because: kasan: bad access detected
      
      [20563.955345] Memory state around the buggy address:
      [20563.960129]  fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
      [20563.967342]  fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [20563.981766]                                   ^
      [20563.986288]  fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
      [20563.993501]  fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [20564.000713] ==================================================================
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703Signed-off-by: NZorro Lang <zlang@redhat.com>
      Fixes: 9cd0ed63 ("iomap: enhance writeback error message")
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c275779f
    • L
      io_uring: fix a typo in a comment · 0b4295b5
      LimingWu 提交于
      thatn -> than.
      Signed-off-by: NLiming Wu <19092205@suning.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0b4295b5
    • P
      io_uring: hook all linked requests via link_list · 4493233e
      Pavel Begunkov 提交于
      Links are created by chaining requests through req->list with an
      exception that head uses req->link_list. (e.g. link_list->list->list)
      Because of that, io_req_link_next() needs complex splicing to advance.
      
      Link them all through list_list. Also, it seems to be simpler and more
      consistent IMHO.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4493233e
    • P
      io_uring: fix error handling in io_queue_link_head · 2e6e1fde
      Pavel Begunkov 提交于
      In case of an error io_submit_sqe() drops a request and continues
      without it, even if the request was a part of a link. Not only it
      doesn't cancel links, but also may execute wrong sequence of actions.
      
      Stop consuming sqes, and let the user handle errors.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2e6e1fde
    • A
      fs/binfmt_elf.c: extract elf_read() function · 658c0335
      Alexey Dobriyan 提交于
      ELF reads done by the kernel have very complicated error detection code
      which better live in one place.
      
      Link: http://lkml.kernel.org/r/20191005165215.GB26927@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      658c0335
    • A