1. 20 Nov, 2019 2 commits
  2. 08 Nov, 2019 2 commits
  3. 18 Sep, 2019 1 commit
  4. 16 Sep, 2019 2 commits
    • f2fs: fix inode rwsem regression · cb8434f1
      Goldwyn Rodrigues committed
      This is similar to 942491c9 ("xfs: fix AIM7 regression").
      Apparently our current rwsem code doesn't cope well with the "trylock
      first, then lock for real" scheme.  So change our read/write methods to
      do the trylock only in the RWF_NOWAIT case.
      
      We don't need a check for IOCB_NOWAIT and !direct-IO because it
      is checked in generic_write_checks().
      
      Fixes: b91050a8 ("f2fs: add nowait aio support")
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
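The locking rule described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct model_lock`, `model_lock_write()` and `model_unlock()` are hypothetical names, and a C11 `atomic_flag` with a busy-wait stands in for the kernel's inode rwsem and its blocking lock call. The point is the control flow: a NOWAIT caller makes one attempt and gets `-EAGAIN`, while every other caller goes straight to the blocking path.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode rwsem: a single exclusive flag. */
struct model_lock {
	atomic_flag held;
};

/* Take the lock for writing; never wait when nowait is set. */
static int model_lock_write(struct model_lock *l, bool nowait)
{
	assert(l != NULL);
	if (nowait) {
		/* One attempt only; report -EAGAIN instead of waiting. */
		return atomic_flag_test_and_set(&l->held) ? -EAGAIN : 0;
	}
	/* Blocking path: the busy-wait models a blocking lock call.
	 * There is no separate trylock step before it. */
	while (atomic_flag_test_and_set(&l->held))
		;
	return 0;
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}
```

A caller would pass `nowait = true` only when the request carries the NOWAIT flag, and treat `-EAGAIN` as "retry later".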
    • f2fs: avoid infinite GC loop due to stale atomic files · 743b620c
      Jaegeuk Kim committed
      If committing atomic pages fails in f2fs_do_sync_file(), we can end up
      with committed pages while atomic_file is still set, like:
      
      - inmem:    0, atomic IO:    4 (Max.   10), volatile IO:    0 (Max.    0)
      
      If GC selects this block, we can get an infinite loop like this:
      
      f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
      f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
      f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
      f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
      f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
      f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
      f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
      f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
      
      At that moment, we can observe:
      
      [Before]
      Try to move 5084219 blocks (BG: 384508)
        - data blocks : 4962373 (274483)
        - node blocks : 121846 (110025)
      Skipped : atomic write 4534686 (10)
      
      [After]
      Try to move 5088973 blocks (BG: 384508)
        - data blocks : 4967127 (274483)
        - node blocks : 121846 (110025)
      Skipped : atomic write 4539440 (10)
      
      So, refactor atomic_write flow like this:
      1. start_atomic_write
       - add inmem_list and set atomic_file
      
      2. write()
       - register it in inmem_pages
      
      3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - f2fs_do_sync_file failed
         : abort_atomic_write later
      
      4. abort_atomic_write
       - f2fs_drop_inmem_pages
      
      5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove inmem_list
      
      Based on this change, when GC fails to move a block in an atomic_file,
      f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
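The refactored flow listed in this commit message can be modelled as a small state machine. This is a hedged sketch, not the kernel code: `struct atomic_model` and all `model_*` functions are hypothetical, failures are injected through parameters, and the assumption that a failed commit leaves the flags set until a later abort is mine. The property it illustrates is that only the drop step clears `atomic_file` and removes the inode from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode state for the atomic-write flow. */
struct atomic_model {
	bool atomic_file;   /* set by start, cleared only by drop */
	bool on_inmem_list; /* inode is registered on the inmem list */
	int inmem_pages;    /* pages registered by write() */
};

static void model_start_atomic_write(struct atomic_model *s)
{
	assert(s != NULL);
	s->on_inmem_list = true;
	s->atomic_file = true;
}

static void model_write(struct atomic_model *s)
{
	s->inmem_pages++; /* register the page in inmem_pages */
}

/* Step 5: the single place that clears the flag and leaves the list. */
static void model_drop_inmem_pages(struct atomic_model *s)
{
	s->inmem_pages = 0;
	s->atomic_file = false;
	s->on_inmem_list = false;
}

/* Step 3: commit_err and sync_err are injected failures (0 = success). */
static int model_commit_atomic_write(struct atomic_model *s,
				     int commit_err, int sync_err)
{
	if (commit_err)
		return commit_err; /* pages were revoked by the commit path */
	if (sync_err)
		return sync_err;   /* assumption: state stays set until abort */
	model_drop_inmem_pages(s);
	return 0;
}

/* Step 4: abort simply drops. */
static void model_abort_atomic_write(struct atomic_model *s)
{
	model_drop_inmem_pages(s);
}
```

Because the flag is cleared in one place, a GC path that cannot move a block can call the drop function directly and the stale `atomic_file` state goes away with it.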
  5. 07 Sep, 2019 2 commits
    • f2fs: convert inline_data in prior to i_size_write · cfb9a34d
      Jaegeuk Kim committed
      In the call path below, we change i_size before the inline conversion.
      However, if we fail to convert the inline inode, the inode may be left
      with a wrong i_size that is larger than the max inline size, resulting
      in inline inode corruption.
      
      - f2fs_setattr
       - truncate_setsize
       - f2fs_convert_inline_inode
      
      This patch reorders truncate_setsize() and f2fs_convert_inline_inode()
      to guarantee inline_data has valid i_size.
      
      Fixes: 0cab80ee ("f2fs: fix to convert inline inode in ->setattr")
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
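The reordering described in this commit message can be illustrated with a small model: convert first, and only update the size once the conversion has succeeded. All names here (`struct inode_model`, `model_convert_inline()`, `model_setattr_size()`) are hypothetical, and the conversion failure is injected through a parameter. It is a sketch of the ordering only, not of the real f2fs_setattr() logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode state: a size and an "inline data" marker. */
struct inode_model {
	long i_size;
	bool inline_data;
};

/* Stand-in for the inline conversion; inject_failure simulates an error. */
static int model_convert_inline(struct inode_model *inode, bool inject_failure)
{
	if (inject_failure)
		return -ENOMEM;
	inode->inline_data = false;
	return 0;
}

/* Convert before touching i_size, so a failure leaves the size valid. */
static int model_setattr_size(struct inode_model *inode, long new_size,
			      bool inject_failure)
{
	assert(inode != NULL);
	if (inode->inline_data) {
		int err = model_convert_inline(inode, inject_failure);

		if (err)
			return err; /* i_size is still the old, valid value */
	}
	inode->i_size = new_size;
	return 0;
}
```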
    • f2fs: enhance f2fs_is_checkpoint_ready()'s readability · 00e09c0b
      Chao Yu committed
      This patch changes the semantics of f2fs_is_checkpoint_ready()'s return
      value: return true when the checkpoint is ready, otherwise return false.
      This improves the readability of the conditions below.
      
      f2fs_submit_page_write()
      ...
      	if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
      				!f2fs_is_checkpoint_ready(sbi))
      		__submit_merged_bio(io);
      
      f2fs_balance_fs()
      ...
      	if (!f2fs_is_checkpoint_ready(sbi))
      		return;
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  6. 30 Aug, 2019 1 commit
    • timestamp_truncate: Replace users of timespec64_trunc · 3818c190
      Deepa Dinamani committed
      Update the inode timestamp updates to use timestamp_truncate()
      instead of timespec64_trunc().
      
      The change was mostly generated by the following coccinelle
      script.
      
      virtual context
      virtual patch
      
      @r1 depends on patch forall@
      struct inode *inode;
      identifier i_xtime =~ "^i_[acm]time$";
      expression e;
      @@
      
      inode->i_xtime =
      - timespec64_trunc(
      + timestamp_truncate(
      ...,
      - e);
      + inode);
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Cc: adrian.hunter@intel.com
      Cc: dedekind1@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: hch@lst.de
      Cc: jaegeuk@kernel.org
      Cc: jlbec@evilplan.org
      Cc: richard@nod.at
      Cc: tj@kernel.org
      Cc: yuchao0@huawei.com
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: linux-mtd@lists.infradead.org
  7. 23 Aug, 2019 7 commits
    • f2fs: support FS_IOC_{GET,SET}FSLABEL · 4507847c
      Chao Yu committed
      Support two generic fs ioctls FS_IOC_{GET,SET}FSLABEL, letting
      f2fs pass generic/492 testcase.
      
      Fixes were made by Eric where:
       - f2fs: fix buffer overruns in FS_IOC_{GET, SET}FSLABEL
         utf16s_to_utf8s() and utf8s_to_utf16s() take the number of characters,
         not the number of bytes.
      
       - f2fs: fix copying too many bytes in FS_IOC_SETFSLABEL
         Userspace provides a null-terminated string, so don't assume that the
         full FSLABEL_MAX bytes can always be copied.
      
       - f2fs: add missing authorization check in FS_IOC_SETFSLABEL
         FS_IOC_SETFSLABEL modifies the filesystem superblock, so regular users
         shouldn't be allowed to call it.  Require CAP_SYS_ADMIN, like xfs and btrfs do.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove duplicate code in f2fs_file_write_iter · 0b86f789
      Lihong Kou committed
      We will do the same check in generic_write_checks():
      if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
              return -EINVAL;
      So just remove the duplicate check in f2fs_file_write_iter().
      Signed-off-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to migrate blocks correctly during defragment · d3a1a0e1
      Chao Yu committed
      During defragment, we failed to trigger migration of fragmented blocks
      under the condition below:
      
      In the defragment region:
      - the total number of valid blocks is smaller than 512;
      - the tail part of the region is all holes.
      
      In addition, return zero to the user via range->len if there are no
      fragmented blocks.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to use more generic EOPNOTSUPP · fd114ab2
      Chao Yu committed
      EOPNOTSUPP is widely used as the error number indicating that an
      operation is not supported in a syscall, whereas ENOTSUPP was defined
      and only used for the NFSv3 protocol, so use EOPNOTSUPP instead.
      
      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Support case-insensitive file name lookups · 2c2eb7a3
      Daniel Rosenberg committed
      Modeled after commit b886ee3e ("ext4: Support case-insensitive file
      name lookups")
      
      """
      This patch implements the actual support for case-insensitive file name
      lookups in f2fs, based on the feature bit and the encoding stored in the
      superblock.
      
      A filesystem that has the casefold feature set is able to configure
      directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
      to succeed in that directory in a case-insensitive fashion, i.e: match
      a directory entry even if the name used by userspace is not a byte per
      byte match with the disk name, but is an equivalent case-insensitive
      version of the Unicode string.  This operation is called a
      case-insensitive file name lookup.
      
      The feature is configured as an inode attribute applied to directories
      and inherited by its children.  This attribute can only be enabled on
      empty directories for filesystems that support the encoding feature,
      thus preventing collision of file names that only differ by case.
      
      * dcache handling:
      
      For a +F directory, F2FS only stores the first equivalent name dentry
      used in the dcache. This is done to prevent unintentional duplication of
      dentries in the dcache, while also allowing the VFS code to quickly find
      the right entry in the cache despite which equivalent string was used in
      a previous lookup, without having to resort to ->lookup().
      
      d_hash() of casefolded directories is implemented as the hash of the
      casefolded string, such that we always have a well-known bucket for all
      the equivalencies of the same string. d_compare() uses the
      utf8_strncasecmp() infrastructure, which handles the comparison of
      equivalent, same case, names as well.
      
      For now, negative lookups are not inserted in the dcache, since they
      would need to be invalidated anyway, because we can't trust missing file
      dentries.  This is bad for performance but requires some leveraging of
      the vfs layer to fix.  We can live without that for now, and so does
      everyone else.
      
      * on-disk data:
      
      Despite using a specific version of the name as the internal
      representation within the dcache, the name stored and fetched from the
      disk is a byte-per-byte match with what the user requested, making this
      implementation 'name-preserving'. i.e. no actual information is lost
      when writing to storage.
      
      DX is supported by modifying the hashes used in +F directories to make
      them case/encoding-aware.  The new disk hashes are calculated as the
      hash of the full casefolded string, instead of the string directly.
      This allows us to efficiently search for file names in the htree without
      requiring the user to provide an exact name.
      
      * Dealing with invalid sequences:
      
      By default, when an invalid UTF-8 sequence is identified, ext4 will treat
      it as an opaque byte sequence, ignoring the encoding and reverting to
      the old behavior for that unique file.  This means that case-insensitive
      file name lookup will not work only for that file.  An optional bit can
      be set in the superblock telling the filesystem code and userspace tools
      to enforce the encoding.  When that optional bit is set, any attempt to
      create a file name using an invalid UTF-8 sequence will fail and return
      an error to userspace.
      
      * Normalization algorithm:
      
      The UTF-8 algorithms used to compare strings in f2fs are implemented
      in fs/unicode, and is based on a previous version developed by
      SGI.  It implements the Canonical decomposition (NFD) algorithm
      described by the Unicode specification 12.1, or higher, combined with
      the elimination of ignorable code points (NFDi) and full
      case-folding (CF) as documented in fs/unicode/utf8_norm.c.
      
      NFD seems to be the best normalization method for F2FS because:
      
        - It has a lower cost than NFC/NFKC (which requires
          decomposing to NFD as an intermediary step)
        - It doesn't eliminate important semantic meaning like
          compatibility decompositions.
      
      Although:
      
      - This implementation is not completely linguistically accurate, because
      different languages have conflicting rules, which would require the
      specialization of the filesystem to a given locale, which brings all
      sorts of problems for removable media and for users who use more than
      one language.
      """
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
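The "hash of the casefolded string" idea from the dcache section of this commit message can be sketched as follows. This is a simplified illustration under loud assumptions: ASCII `tolower()` stands in for Unicode normalization and full case-folding, FNV-1a stands in for whatever hash the filesystem uses, and `casefold_hash()` and `casefold_equal()` are hypothetical names. It shows only why all equivalent spellings of a name land in the same bucket and compare equal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only stand-in for Unicode case folding (assumption). */
static unsigned char fold_byte(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* FNV-1a over the folded bytes: equivalent names hash identically. */
static uint32_t casefold_hash(const char *name)
{
	uint32_t h = 2166136261u;

	assert(name != NULL);
	for (; *name; name++) {
		h ^= fold_byte((unsigned char)*name);
		h *= 16777619u;
	}
	return h;
}

/* Compare two names after folding each byte. */
static bool casefold_equal(const char *a, const char *b)
{
	while (*a && *b) {
		if (fold_byte((unsigned char)*a) != fold_byte((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	return *a == *b; /* both must end at the same position */
}
```

The names passed in are never modified, which mirrors the "name-preserving" property: folding is applied only for hashing and comparison.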
    • f2fs: disallow direct IO in atomic write · 038d0698
      Chao Yu committed
      Atomic write needs the page cache to hold the data of a transaction, so
      direct IO should never be allowed in atomic write. Detect and deny it
      when opening an atomic write file.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to spread f2fs_is_checkpoint_ready() · 955ebcd3
      Chao Yu committed
      We failed to call f2fs_is_checkpoint_ready() in several places, which may
      allow space allocation even when free space is exhausted while
      checkpoint is disabled. Fix this by adding the missing calls.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. 13 Aug, 2019 2 commits
    • f2fs: add fs-verity support · 95ae251f
      Eric Biggers committed
      Add fs-verity support to f2fs.  fs-verity is a filesystem feature that
      enables transparent integrity protection and authentication of read-only
      files.  It uses a dm-verity like mechanism at the file level: a Merkle
      tree is used to verify any block in the file in log(filesize) time.  It
      is implemented mainly by helper functions in fs/verity/.  See
      Documentation/filesystems/fsverity.rst for the full documentation.
      
      The f2fs support for fs-verity consists of:
      
      - Adding a filesystem feature flag and an inode flag for fs-verity.
      
      - Implementing the fsverity_operations to support enabling verity on an
        inode and reading/writing the verity metadata.
      
      - Updating ->readpages() to verify data as it's read from verity files
        and to support reading verity metadata pages.
      
      - Updating ->write_begin(), ->write_end(), and ->writepages() to support
        writing verity metadata pages.
      
      - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
      
      Like ext4, f2fs stores the verity metadata (Merkle tree and
      fsverity_descriptor) past the end of the file, starting at the first 64K
      boundary beyond i_size.  This approach works because (a) verity files
      are readonly, and (b) pages fully beyond i_size aren't visible to
      userspace but can be read/written internally by f2fs with only some
      relatively small changes to f2fs.  Extended attributes cannot be used
      because (a) f2fs limits the total size of an inode's xattr entries to
      4096 bytes, which wouldn't be enough for even a single Merkle tree
      block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
      metadata *must* be encrypted when the file is because it contains hashes
      of the plaintext data.
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
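The placement rule for verity metadata ("starting at the first 64K boundary beyond i_size") comes down to rounding the file size up to a multiple of 64K. The helper below is a minimal sketch of that arithmetic; `verity_metadata_pos()` and `VERITY_ALIGN` are hypothetical names, and treating a size that already sits on a boundary as its own start offset is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define VERITY_ALIGN 65536 /* the 64K boundary named in the commit message */

/* Hypothetical helper: file offset where the verity metadata would start. */
static int64_t verity_metadata_pos(int64_t i_size)
{
	assert(i_size >= 0);
	/* Round up to a multiple of 64K; valid because 64K is a power of two. */
	return (i_size + VERITY_ALIGN - 1) & ~((int64_t)VERITY_ALIGN - 1);
}
```

For example, a 319488-byte file ends inside its fifth 64K unit, so the metadata would start at offset 327680.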
    • f2fs: wire up new fscrypt ioctls · 8ce589c7
      Eric Biggers committed
      Wire up the new ioctls for adding and removing fscrypt keys to/from the
      filesystem, and the new ioctl for retrieving v2 encryption policies.
      
      The key removal ioctls also required making f2fs_drop_inode() call
      fscrypt_drop_inode().
      
      For more details see Documentation/filesystems/fscrypt.rst and the
      fscrypt patches that added the implementation of these ioctls.
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  9. 13 Jul, 2019 3 commits
  10. 11 Jul, 2019 1 commit
  11. 03 Jul, 2019 5 commits
  12. 22 Jun, 2019 1 commit
    • f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags · 36098557
      Eric Biggers committed
      f2fs copied all the on-disk i_flags from ext4, and along with it the
      assumption that the on-disk i_flags are the same as the bits used by
      FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.  This is problematic because
      reserving an on-disk inode flag in either filesystem's i_flags or in
      these ioctls effectively reserves it in all the other places too.  In
      fact, most of the "f2fs i_flags" are not used by f2fs at all.
      
      Fix this by separating f2fs's i_flags from the ioctl bits and ext4's
      i_flags.
      
      In the process, un-reserve all "f2fs i_flags" that aren't actually
      supported by f2fs.  This included various flags that were not settable
      at all, as well as various flags that were settable by FS_IOC_SETFLAGS
      but didn't actually do anything.
      
      There's a slight chance we'll need to add some flag(s) back to
      FS_IOC_SETFLAGS in order to avoid breaking users who expect f2fs to
      accept some random flag(s).  But hopefully such users don't exist.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  13. 09 May, 2019 5 commits
  14. 17 Apr, 2019 1 commit
  15. 06 Apr, 2019 1 commit
    • f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal committed
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, the sbi->s_ndevs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problems observed is that zone discards are skipped, resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
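The helper described in this commit message treats a device count of 0 and a device count of 1 the same way. A minimal userspace sketch, assuming a hypothetical `struct sbi_model` that carries only the device count, might look like this; the real helper operates on the f2fs superblock info structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock info: only the device count. */
struct sbi_model {
	int s_ndevs; /* 0: single regular device, 1: single zoned device */
};

/* True only for genuine multi-device mounts (more than one device). */
static bool model_is_multi_device(const struct sbi_model *sbi)
{
	assert(sbi != NULL);
	return sbi->s_ndevs > 1;
}
```

Call sites that previously tested the count against zero would test this predicate instead, so a single zoned device no longer takes the multi-device path.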
  16. 15 Mar, 2019 1 commit
  17. 13 Mar, 2019 2 commits
  18. 06 Mar, 2019 1 commit
    • f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu committed
      Previously, we changed the lock from cp_rwsem to node_change, which solved
      the deadlock issue caused by the race condition below:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the semantics of cp_rwsem in other callers, like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      There, the dnode can be truncated without cp_rwsem held, resulting in an
      incorrect SIT bitmap update, which can cause further data corruption.
      
      So this patch reverts the previous fix and instead tries to fix the
      deadlock by skipping the call to f2fs_truncate_blocks() in
      f2fs_write_failed() only for quota files, keeping the preallocated
      data/node blocks at the tail of the quota file. We can expect that the
      preallocated space will soon be used to store quota info.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>