1. 07 9月, 2019 2 次提交
  2. 23 8月, 2019 3 次提交
  3. 17 8月, 2019 1 次提交
    • J
      f2fs: fix livelock in swapfile writes · 75a037f3
      Jaegeuk Kim 提交于
      This patch fixes livelock in the below call path when writing swap pages.
      
      [46374.617256] c2    701  __switch_to+0xe4/0x100
      [46374.617265] c2    701  __schedule+0x80c/0xbc4
      [46374.617273] c2    701  schedule+0x74/0x98
      [46374.617281] c2    701  rwsem_down_read_failed+0x190/0x234
      [46374.617291] c2    701  down_read+0x58/0x5c
      [46374.617300] c2    701  f2fs_map_blocks+0x138/0x9a8
      [46374.617310] c2    701  get_data_block_dio_write+0x74/0x104
      [46374.617320] c2    701  __blockdev_direct_IO+0x1350/0x3930
      [46374.617331] c2    701  f2fs_direct_IO+0x55c/0x8bc
      [46374.617341] c2    701  __swap_writepage+0x1d0/0x3e8
      [46374.617351] c2    701  swap_writepage+0x44/0x54
      [46374.617360] c2    701  shrink_page_list+0x140/0xe80
      [46374.617371] c2    701  shrink_inactive_list+0x510/0x918
      [46374.617381] c2    701  shrink_node_memcg+0x2d4/0x804
      [46374.617391] c2    701  shrink_node+0x10c/0x2f8
      [46374.617400] c2    701  do_try_to_free_pages+0x178/0x38c
      [46374.617410] c2    701  try_to_free_pages+0x348/0x4b8
      [46374.617419] c2    701  __alloc_pages_nodemask+0x7f8/0x1014
      [46374.617429] c2    701  pagecache_get_page+0x184/0x2cc
      [46374.617438] c2    701  f2fs_new_node_page+0x60/0x41c
      [46374.617449] c2    701  f2fs_new_inode_page+0x50/0x7c
      [46374.617460] c2    701  f2fs_init_inode_metadata+0x128/0x530
      [46374.617472] c2    701  f2fs_add_inline_entry+0x138/0xd64
      [46374.617480] c2    701  f2fs_do_add_link+0xf4/0x178
      [46374.617488] c2    701  f2fs_create+0x1e4/0x3ac
      [46374.617497] c2    701  path_openat+0xdc0/0x1308
      [46374.617507] c2    701  do_filp_open+0x78/0x124
      [46374.617516] c2    701  do_sys_open+0x134/0x248
      [46374.617525] c2    701  SyS_openat+0x14/0x20
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      75a037f3
  4. 19 7月, 2019 1 次提交
  5. 10 7月, 2019 1 次提交
  6. 03 7月, 2019 2 次提交
  7. 29 5月, 2019 2 次提交
    • E
      fscrypt: support encrypting multiple filesystem blocks per page · 53bc1d85
      Eric Biggers 提交于
      Rename fscrypt_encrypt_page() to fscrypt_encrypt_pagecache_blocks() and
      redefine its behavior to encrypt all filesystem blocks from the given
      region of the given page, rather than assuming that the region consists
      of just one filesystem block.  Also remove the 'inode' and 'lblk_num'
      parameters, since they can be retrieved from the page as it's already
      assumed to be a pagecache page.
      
      This is in preparation for allowing encryption on ext4 filesystems with
      blocksize != PAGE_SIZE.
      
      This is based on work by Chandan Rajendra.
      Reviewed-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      53bc1d85
    • E
      fscrypt: simplify bounce page handling · d2d0727b
      Eric Biggers 提交于
      Currently, bounce page handling for writes to encrypted files is
      unnecessarily complicated.  A fscrypt_ctx is allocated along with each
      bounce page, page_private(bounce_page) points to this fscrypt_ctx, and
      fscrypt_ctx::w::control_page points to the original pagecache page.
      
      However, because writes don't use the fscrypt_ctx for anything else,
      there's no reason why page_private(bounce_page) can't just point to the
      original pagecache page directly.
      
      Therefore, this patch makes this change.  In the process, it also cleans
      up the API exposed to filesystems that allows testing whether a page is
      a bounce page, getting the pagecache page from a bounce page, and
      freeing a bounce page.
      Reviewed-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      d2d0727b
  8. 23 5月, 2019 2 次提交
    • C
      f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Chao Yu 提交于
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still deadloop in below condition:
      
      d A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add condition to avoid holding .writepages mutex lock in path
      of data flush
      - introduce mutex lock sbi.flush_lock to exclude concurrent data
      flush in background.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      040d2bb3
    • C
      f2fs: add bio cache for IPU · 8648de2c
      Chao Yu 提交于
      SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
      commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance of merging page in inner managed bio cache, result in
      submitting more small-sized IO.
      
      So let's add temporary bio in writepages() to cache mergeable write IO as
      much as possible.
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8648de2c
  9. 09 5月, 2019 4 次提交
  10. 30 4月, 2019 1 次提交
  11. 17 4月, 2019 1 次提交
  12. 06 4月, 2019 2 次提交
    • C
      f2fs: fix potential recursive call when enabling data_flush · 186857c5
      Chao Yu 提交于
      As Hagbard Celine reported:
      
      Hi, this is a long standing bug that I've hit before on older kernels,
      but I was not able to get the syslog saved because of the nature of
      the bug. This time I had booted form a pen-drive, and was able to save
      the log to it's efi-partition.
      What i did to trigger it was to create a partition and format it f2fs,
      then mount it with options:
      "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
      Then I unpacked a big .tar.xz to the partition (I used a
      gentoo-stage3-tarball as I was in process of installing Gentoo).
      
      Same options just without data_flush gives no problems.
      
      Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
      properly unmounted. Some data may be corrupt. Please run fsck.
      Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
      stack depth: 12064 bytes left
      Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
      00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
      Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel: Call Trace:
      Mar 20 21:06:40 usbgentoo kernel:  read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  ? xas_load+0x8/0x50
      Mar 20 21:06:40 usbgentoo kernel:  __get_node_page+0x73/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_get_dnode_of_data+0x34e/0x580
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_inline_data+0x5e/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x421/0x690
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_cache_pages+0x1cf/0x460
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_data_pages+0x2b3/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_chksum_verify+0x1d/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  ? read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  do_writepages+0x3c/0xd0
      Mar 20 21:06:40 usbgentoo kernel:  __filemap_fdatawrite_range+0x7c/0xb0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_sync_dirty_inodes+0xf2/0x200
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs_bg+0x2a3/0x2c0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_dirtied+0x21/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs+0xd6/0x2b0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x4fb/0x690
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_single_inode+0x2a1/0x340
      Mar 20 21:06:40 usbgentoo kernel:  ? soft_cursor+0x1b4/0x220
      Mar 20 21:06:40 usbgentoo kernel:  writeback_sb_inodes+0x1d5/0x3e0
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_inodes_wb+0x58/0xa0
      Mar 20 21:06:40 usbgentoo kernel:  wb_writeback+0x250/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? 0xffffffff8c000000
      Mar 20 21:06:40 usbgentoo kernel:  ? cpumask_next+0x16/0x20
      Mar 20 21:06:40 usbgentoo kernel:  wb_workfn+0x2f6/0x3b0
      Mar 20 21:06:40 usbgentoo kernel:  ? __switch_to_asm+0x40/0x70
      Mar 20 21:06:40 usbgentoo kernel:  process_one_work+0x1f5/0x3f0
      Mar 20 21:06:40 usbgentoo kernel:  worker_thread+0x28/0x3c0
      Mar 20 21:06:40 usbgentoo kernel:  ? rescuer_thread+0x330/0x330
      Mar 20 21:06:40 usbgentoo kernel:  kthread+0x10e/0x130
      Mar 20 21:06:40 usbgentoo kernel:  ? kthread_create_on_node+0x60/0x60
      Mar 20 21:06:40 usbgentoo kernel:  ret_from_fork+0x35/0x40
      
      The root cause is that we run into an infinite recursive calling in
      between f2fs_balance_fs_bg and writepage() as described below:
      
      - f2fs_write_data_pages		--- A
       - __write_data_page
        - f2fs_balance_fs
         - f2fs_balance_fs_bg		--- B
          - f2fs_sync_dirty_inodes
           - filemap_fdatawrite
            - f2fs_write_data_pages	--- A
      ...
                - f2fs_balance_fs_bg	--- B
      ...
      
      In order to fix this issue, let's detect such condition in __write_data_page()
      and just skip calling f2fs_balance_fs() recursively.
      Reported-by: NHagbard Celine <hagbardcelin@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      186857c5
    • D
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal 提交于
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0916878d
  13. 13 3月, 2019 5 次提交
  14. 06 3月, 2019 1 次提交
    • C
      f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu 提交于
      Previously, we changed lock from cp_rwsem to node_change, it solved
      the deadlock issue which was caused by below race condition:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the sematics of cp_rwsem, in other callers like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
      bitmap update, which can cause further data corruption.
      
      So this patch reverts previous fix implementation, and try to fix
      deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
      only for quota file, and keep the preallocated data/node in the tail of
      quota file, we can expecte that the preallocated space can be used to
      store quota info latter soon.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c42d28ce
  15. 15 2月, 2019 1 次提交
  16. 24 1月, 2019 1 次提交
  17. 09 1月, 2019 1 次提交
  18. 29 12月, 2018 1 次提交
  19. 27 12月, 2018 2 次提交
    • C
      f2fs: check PageWriteback flag for ordered case · bae0ee7a
      Chao Yu 提交于
      For all ordered cases in f2fs_wait_on_page_writeback(), we need to
      check PageWriteback status, so let's clean up to relocate the check
      into f2fs_wait_on_page_writeback().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bae0ee7a
    • J
      f2fs: use kvmalloc, if kmalloc is failed · 5222595d
      Jaegeuk Kim 提交于
      One report says memalloc failure during mount.
      
       (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
       (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
       (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
       (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
       (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
       (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
       (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
       (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
       (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
       (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
       (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
       (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
       (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
       (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5222595d
  20. 14 12月, 2018 1 次提交
  21. 27 11月, 2018 5 次提交
    • J
      f2fs: fix m_may_create to make OPU DIO write correctly · f4f0b677
      Jia Zhu 提交于
      Previously, we added a parameter @map.m_may_create to trigger OPU
      allocation and call f2fs_balance_fs() correctly.
      
      But in get_more_blocks(), @create has been overwritten by below code.
      So the function f2fs_map_blocks() will not allocate new block address
      but directly go out. Meanwile,there are several functions calling
      f2fs_map_blocks() directly and @map.m_may_create not initialized.
      CODE:
      create = dio->op == REQ_OP_WRITE;
      	if (dio->flags & DIO_SKIP_HOLES) {
      		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
      						i_blkbits))
      			create = 0;
      	}
      
      This patch fixes it.
      Signed-off-by: NJia Zhu <zhujia13@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f4f0b677
    • J
      f2fs: fix to update new block address correctly for OPU · 73c0a927
      Jia Zhu 提交于
      Previously, we allocated a new block address for OPU mode in direct_IO.
      
      But the new address couldn't be assigned to @map->m_pblk correctly.
      
      This patch fix it.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode")
      Signed-off-by: NJia Zhu <zhujia13@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      73c0a927
    • S
      f2fs: fix race between write_checkpoint and write_begin · 2866fb16
      Sheng Yong 提交于
      The following race could lead to inconsistent SIT bitmap:
      
      Task A                          Task B
      ======                          ======
      f2fs_write_checkpoint
        block_operations
          f2fs_lock_all
            down_write(node_change)
            down_write(node_write)
            ... sync ...
            up_write(node_change)
                                      f2fs_file_write_iter
                                        set_inode_flag(FI_NO_PREALLOC)
                                        ......
                                        f2fs_write_begin(index=0, has inline data)
                                          prepare_write_begin
                                            __do_map_lock(AIO) => down_read(node_change)
                                            f2fs_convert_inline_page => update SIT
                                            __do_map_lock(AIO) => up_read(node_change)
        f2fs_flush_sit_entries <= inconsistent SIT
        finish write checkpoint
        sudden-power-off
      
      If SPO occurs after checkpoint is finished, SIT bitmap will be set
      incorrectly.
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2866fb16
    • Y
      f2fs: only flush the single temp bio cache which owns the target page · 1e771e83
      Yunlong Song 提交于
      Previously, when f2fs finds which temp bio cache owns the target page,
      it will flush all the three temp bio caches, but we only need to flush
      one single bio cache indeed, which can help to keep bio merged.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1e771e83
    • C
      f2fs: fix out-place-update DIO write · f9d6d059
      Chao Yu 提交于
      In get_more_blocks(), we may override @create as below code:
      
      	create = dio->op == REQ_OP_WRITE;
      	if (dio->flags & DIO_SKIP_HOLES) {
      		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
      						i_blkbits))
      			create = 0;
      	}
      
      But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create
      is 1, so in LFS mode, dio overwrite under LFS mode can easily run out
      of free segments, result in below panic.
      
       Call Trace:
        allocate_segment_by_default+0xa8/0x270 [f2fs]
        f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs]
        __allocate_data_block+0x306/0x480 [f2fs]
        f2fs_map_blocks+0x6f6/0x920 [f2fs]
        __get_data_block+0x4f/0xb0 [f2fs]
        get_data_block_dio_write+0x50/0x60 [f2fs]
        do_blockdev_direct_IO+0xcd5/0x21e0
        __blockdev_direct_IO+0x3a/0x3c
        f2fs_direct_IO+0x1ff/0x4a0 [f2fs]
        generic_file_direct_write+0xd9/0x160
        __generic_file_write_iter+0xbb/0x1e0
        f2fs_file_write_iter+0xaf/0x220 [f2fs]
        __vfs_write+0xd0/0x130
        vfs_write+0xb2/0x1b0
        SyS_pwrite64+0x69/0xa0
        ? vtime_user_exit+0x29/0x70
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
       RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8
      
      So this patch introduces a parameter map.m_may_create to indicate that
      f2fs_map_blocks() is called from write or read path, which can give the
      right hint to let f2fs_map_blocks() trigger OPU allocation and call
      f2fs_balanc_fs() correctly.
      
      BTW, it disables physical address preallocation for direct IO in
      f2fs_preallocate_blocks, which is redundant to OPU allocation of
      f2fs_map_blocks.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d6d059