1. 03 7月, 2019 5 次提交
    • J
      f2fs: add a rw_sem to cover quota flag changes · db6ec53b
      Jaegeuk Kim 提交于
      Two paths to update quota and f2fs_lock_op:
      
      1.
       - lock_op
       |  - quota_update
       `- unlock_op
      
      2.
       - quota_update
       - lock_op
       `- unlock_op
      
      But, we need to make a transaction on quota_update + lock_op in #2 case.
      So, this patch introduces:
      1. lock_op
      2. down_write
      3. check __need_flush
      4. up_write
      5. if there is dirty quota entries, flush them
      6. otherwise, good to go
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      db6ec53b
    • C
      f2fs: use generic EFSBADCRC/EFSCORRUPTED · 10f966bb
      Chao Yu 提交于
      f2fs uses EFAULT as error number to indicate filesystem is corrupted
      all the time, but generic filesystems use EUCLEAN for such condition,
      we need to change to follow others.
      
      This patch adds two new macros as below to wrap more generic error
      code macros, and spread them in code.
      
      EFSBADCRC	EBADMSG		/* Bad CRC detected */
      EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      10f966bb
    • G
      f2fs: Use DIV_ROUND_UP() instead of open-coding · f91108b8
      Geert Uytterhoeven 提交于
      Replace the open-coded divisions with round-up by calls to the
      DIV_ROUND_UP() helper macro.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f91108b8
    • J
      f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() · dcbb4c10
      Joe Perches 提交于
      - Add and use f2fs_<level> macros
      - Convert f2fs_msg to f2fs_printk
      - Remove level from f2fs_printk and embed the level in the format
      - Coalesce formats and align multi-line arguments
      - Remove unnecessary duplicate extern f2fs_msg f2fs.h
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      dcbb4c10
    • Q
      f2fs: ioctl for removing a range from F2FS · 04f0b2ea
      Qiuyang Sun 提交于
      This ioctl shrinks a given length (aligned to sections) from end of the
      main area. Any cursegs and valid blocks will be moved out before
      invalidating the range.
      
      This feature can be used for adjusting partition sizes online.
      
      History of the patch:
      
      Sahitya Tummala:
       - Add this ioctl for f2fs_compat_ioctl() as well.
       - Fix debugfs status to reflect the online resize changes.
       - Fix potential race between online resize path and allocate new data
         block path or gc path.
      
      Others:
       - Rename some identifiers.
       - Add some error handling branches.
       - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
       - Implement this interface as ext4's, and change the parameter from shrunk
      bytes to new block count of F2FS.
       - During resizing, force to empty sit_journal and forbid adding new
         entries to it, in order to avoid invalid segno in journal after resize.
       - Reduce sbi->user_block_count before resize starts.
       - Commit the updated superblock first, and then update in-memory metadata
         only when the former succeeds.
       - Target block count must align to sections.
       - Write checkpoint before and after committing the new superblock, w/o
      CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
      resize fails after the new superblock is committed.
       - In free_segment_range(), reduce granularity of gc_mutex.
       - Add protection on curseg migration.
       - Add freeze_bdev() and thaw_bdev() for resize fs.
       - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
       - Recover super_block and FS metadata when resize fails.
       - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
       - Clean up the sb and fs metadata update functions for resize_fs.
      
      Geert Uytterhoeven:
       - Use div_u64*() for 64-bit divisions
      
      Arnd Bergmann:
       - Not all architectures support get_user() with a 64-bit argument:
          ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
          Use copy_from_user() here, this will always work.
      Signed-off-by: NQiuyang Sun <sunqiuyang@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      04f0b2ea
  2. 22 6月, 2019 2 次提交
    • W
      f2fs: only set project inherit bit for directory · 5043a964
      Wang Shilong 提交于
      It doesn't make any sense to have project inherit bits
      for regular files, even though this won't cause any
      problem, but it is better fix this.
      
      Cc: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: NWang Shilong <wshilong@ddn.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5043a964
    • E
      f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags · 36098557
      Eric Biggers 提交于
      f2fs copied all the on-disk i_flags from ext4, and along with it the
      assumption that the on-disk i_flags are the same as the bits used by
      FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.  This is problematic because
      reserving an on-disk inode flag in either filesystem's i_flags or in
      these ioctls effectively reserves it in all the other places too.  In
      fact, most of the "f2fs i_flags" are not used by f2fs at all.
      
      Fix this by separating f2fs's i_flags from the ioctl bits and ext4's
      i_flags.
      
      In the process, un-reserve all "f2fs i_flags" that aren't actually
      supported by f2fs.  This included various flags that were not settable
      at all, as well as various flags that were settable by FS_IOC_SETFLAGS
      but didn't actually do anything.
      
      There's a slight chance we'll need to add some flag(s) back to
      FS_IOC_SETFLAGS in order to avoid breaking users who expect f2fs to
      accept some random flag(s).  But hopefully such users don't exist.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      36098557
  3. 04 6月, 2019 2 次提交
  4. 23 5月, 2019 3 次提交
    • C
      f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Chao Yu 提交于
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still deadloop in below condition:
      
      d A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add condition to avoid holding .writepages mutex lock in path
      of data flush
      - introduce mutex lock sbi.flush_lock to exclude concurrent data
      flush in background.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      040d2bb3
    • P
      f2fs: always assume that the device is idle under gc_urgent · f7dfd9f3
      Park Ju Hyung 提交于
      This allows more aggressive discards and balancing job to be done
      under gc_urgent.
      Signed-off-by: NPark Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f7dfd9f3
    • C
      f2fs: add bio cache for IPU · 8648de2c
      Chao Yu 提交于
      SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
      commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance of merging page in inner managed bio cache, result in
      submitting more small-sized IO.
      
      So let's add temporary bio in writepages() to cache mergeable write IO as
      much as possible.
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8648de2c
  5. 09 5月, 2019 8 次提交
    • C
      f2fs: introduce DATA_GENERIC_ENHANCE · 93770ab7
      Chao Yu 提交于
      Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
      whether @blkaddr locates in main area or not.
      
      That check is weak, since the block address in range of main area can
      point to the address which is not valid in segment info table, and we
      can not detect such condition, we may suffer worse corruption as system
      continues running.
      
      So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
      which trigger SIT bitmap check rather than only range check.
      
      This patch did below changes as wel:
      - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
      - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
      - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
      - spread blkaddr check in:
       * f2fs_get_node_info()
       * __read_out_blkaddrs()
       * f2fs_submit_page_read()
       * ra_data_block()
       * do_recover_data()
      
      This patch can fix bug reported from bugzilla below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203215
      https://bugzilla.kernel.org/show_bug.cgi?id=203223
      https://bugzilla.kernel.org/show_bug.cgi?id=203231
      https://bugzilla.kernel.org/show_bug.cgi?id=203235
      https://bugzilla.kernel.org/show_bug.cgi?id=203241
      
      = Update by Jaegeuk Kim =
      
      DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
      But, xfstest/generic/446 compalins some generated kernel messages saying invalid
      bitmap was detected when reading a block. The reaons is, when we get the
      block addresses from extent_cache, there is no lock to synchronize it from
      truncating the blocks in parallel.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      93770ab7
    • C
      f2fs: fix to consider multiple device for readonly check · f824deb5
      Chao Yu 提交于
      This patch introduce f2fs_hw_is_readonly() to check whether lower
      device is readonly or not, it adapts multiple device scenario.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f824deb5
    • C
      f2fs: relocate chksum_offset for large_nat_bitmap feature · b471eb99
      Chao Yu 提交于
      For large_nat_bitmap feature, there is a design flaw:
      
      Previous:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----|-------+
      | ......                   |      |       |
      | checksum_value           |<-----+       |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Obviously, if nat_bitmap size + sit_bitmap size is larger than
      MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap
      checkpoint checksum's position, once checkpoint() is triggered
      from kernel, nat or sit bitmap will be damaged by checksum field.
      
      In order to fix this, let's relocate checksum_value's position
      to the head of sit_nat_version_bitmap as below, then nat/sit
      bitmap and chksum value update will become safe.
      
      After:
      
      struct f2fs_checkpoint layout:
      +--------------------------+  0x0000
      | checkpoint_ver           |
      | ......                   |
      | checksum_offset          |------+
      | ......                   |      |
      | sit_nat_version_bitmap[] |<-----+
      | ......                   |<-------------+
      |                          |              |
      +--------------------------+  0x1000      |
      |                          |      nat_bitmap + sit_bitmap
      | payload blocks           |              |
      |                          |              |
      +--------------------------|<-------------+
      
      Related report and discussion:
      
      https://sourceforge.net/p/linux-f2fs/mailman/message/36642346/Reported-by: NPark Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b471eb99
    • C
      f2fs: allow address pointer number of dnode aligning to specified size · d02a6e61
      Chao Yu 提交于
      This patch expands scalability of dnode layout, it allows address pointer
      number of dnode aligning to specified size (now, the size is one byte by
      default), and later the number can align to compress cluster size
      (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making
      design of compress meta layout simple.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d02a6e61
    • C
      f2fs: fix wrong __is_meta_io() macro · 6dc3a126
      Chao Yu 提交于
      This patch changes codes as below:
      - don't use is_read_io() as a condition to judge the meta IO.
      - use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6dc3a126
    • C
      f2fs: fix to avoid panic in dec_valid_node_count() · ea6d7e72
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203213
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:2012!
       RIP: 0010:truncate_node+0x2c9/0x2e0
       Call Trace:
        f2fs_truncate_xattr_node+0xa1/0x130
        f2fs_remove_inode_page+0x82/0x2d0
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_node_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ea6d7e72
    • C
      f2fs: fix to avoid panic in dec_valid_block_count() · 5e159cd3
      Chao Yu 提交于
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5e159cd3
    • C
      f2fs: fix to use inline space only if inline_xattr is enable · 622927f3
      Chao Yu 提交于
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      622927f3
  6. 25 4月, 2019 1 次提交
    • E
      crypto: shash - remove shash_desc::flags · 877b5691
      Eric Biggers 提交于
      The flags field in 'struct shash_desc' never actually does anything.
      The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
      However, no shash algorithm ever sleeps, making this flag a no-op.
      
      With this being the case, inevitably some users who can't sleep wrongly
      pass MAY_SLEEP.  These would all need to be fixed if any shash algorithm
      actually started sleeping.  For example, the shash_ahash_*() functions,
      which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
      from the ahash API to the shash API.  However, the shash functions are
      called under kmap_atomic(), so actually they're assumed to never sleep.
      
      Even if it turns out that some users do need preemption points while
      hashing large buffers, we could easily provide a helper function
      crypto_shash_update_large() which divides the data into smaller chunks
      and calls crypto_shash_update() and cond_resched() for each chunk.  It's
      not necessary to have a flag in 'struct shash_desc', nor is it necessary
      to make individual shash algorithms aware of this at all.
      
      Therefore, remove shash_desc::flags, and document that the
      crypto_shash_*() functions can be called from any context.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      877b5691
  7. 09 4月, 2019 1 次提交
    • S
      treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively · d75f773c
      Sakari Ailus 提交于
      %pF and %pf are functionally equivalent to %pS and %ps conversion
      specifiers. The former are deprecated, therefore switch the current users
      to use the preferred variant.
      
      The changes have been produced by the following command:
      
      	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
      	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
      
      And verifying the result.
      
      Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: linux-mm@kvack.org
      Cc: ceph-devel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
      Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      d75f773c
  8. 06 4月, 2019 4 次提交
    • C
      f2fs: add comment for conditional compilation statement · e1074d4b
      Chao Yu 提交于
      Commit af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      added function is_journalled_quota() in f2fs.h, but it located outside of
      _LINUX_F2FS_H macro coverage, it has been fixed with commit 0af725fc
      ("f2fs: fix wrong #endif").
      
      But anyway, in order to avoid making same mistake latter, let's add single
      line comment to notice which #if the last #endif is corresponding to.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Remove unnecessary empty EOL]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1074d4b
    • D
      f2fs: improve discard handling with multi-device volumes · 7f3d7719
      Damien Le Moal 提交于
      f2fs_hw_support_discard() only tests if the super block device supports
      discard. However, for a multi-device volume, not all disks used may
      support discard. Improve the check performed to test all devices of
      the volume and report discard as supported if at least one device of
      the volume supports discard. To implement this, introduce the helper
      function f2fs_bdev_support_discard(), which returns true for zoned block
      devices (where discard is processed as a zone reset) and for regular
      disks supporting the discard command.
      
      f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
      handle discard command issuing for a particular device of the volume.
      That is, prevent issuing a discard command for block devices that do
      not support it.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7f3d7719
    • D
      f2fs: Reduce zoned block device memory usage · 95175daf
      Damien Le Moal 提交于
      For zoned block devices, an array of zone types for each device is
      allocated and initialized in order to determine if a section is stored
      on a sequential zone (zone reset needed) or a conventional zone (no
      zone reset needed and regular discard applies). Considering this usage,
      the zone types stored in memory can be replaced with a bitmap to
      indicate an equivalent information, that is, if a zone is sequential or
      not. This reduces the memory usage for each zoned device by roughly 8:
      on a 14TB disk with zones of 256 MB, the zone type array consumes
      13x4KB pages while the bitmap uses only 2x4KB pages.
      
      This patch changes the f2fs_dev_info structure blkz_type field to the
      bitmap blkz_seq. Access to this bitmap is done using the helper
      function f2fs_blkz_is_seq(), which is a rewrite of the function
      get_blkz_type().
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      95175daf
    • D
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal 提交于
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0916878d
  9. 13 3月, 2019 3 次提交
  10. 06 3月, 2019 2 次提交
    • C
      f2fs: fix to check inline_xattr_size boundary correctly · 500e0b28
      Chao Yu 提交于
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      500e0b28
    • C
      f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu 提交于
      Previously, we changed lock from cp_rwsem to node_change, it solved
      the deadlock issue which was caused by below race condition:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the sematics of cp_rwsem, in other callers like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
      bitmap update, which can cause further data corruption.
      
      So this patch reverts previous fix implementation, and try to fix
      deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
      only for quota file, and keep the preallocated data/node in the tail of
      quota file, we can expecte that the preallocated space can be used to
      store quota info latter soon.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c42d28ce
  11. 16 2月, 2019 2 次提交
  12. 05 2月, 2019 1 次提交
  13. 24 1月, 2019 2 次提交
  14. 23 1月, 2019 2 次提交
  15. 22 1月, 2019 1 次提交
  16. 09 1月, 2019 1 次提交