1. 23 Aug, 2019 19 commits
    • C
      f2fs: use wrapped f2fs_cp_error() · 33ac18a1
      Chao Yu committed
      Just cleanup, no logic change.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to use more generic EOPNOTSUPP · fd114ab2
      Chao Yu committed
      EOPNOTSUPP is widely used as the error number indicating that an
      operation is not supported in a syscall, whereas ENOTSUPP was defined
      for, and is only used by, the NFSv3 protocol, so use EOPNOTSUPP instead.

      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
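The difference between the two error numbers can be illustrated from userspace. This is a minimal sketch, not the f2fs change itself: `unsupported_op()` and `caller_understands()` are hypothetical helpers, and `KERNEL_ENOTSUPP` mirrors the kernel-internal value 524, which libc does not expose under any name.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal value of ENOTSUPP; userspace libc has no name for it. */
#define KERNEL_ENOTSUPP 524

/* The two codes are distinct numbers, so callers cannot treat them alike. */
static_assert(EOPNOTSUPP != KERNEL_ENOTSUPP, "error codes must differ");

/* Hypothetical stand-in for an operation the filesystem cannot perform. */
static int unsupported_op(int use_generic_code)
{
	return use_generic_code ? -EOPNOTSUPP : -KERNEL_ENOTSUPP;
}

/* Returns 1 if a userspace caller can recognise "operation not supported". */
static int caller_understands(int ret)
{
	return ret == -EOPNOTSUPP;
}
```

A caller that checks for `EOPNOTSUPP` handles the first variant cleanly, while the second reaches it as an error number it has no symbolic name for.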
    • C
      f2fs: use wrapped IS_SWAPFILE() · 3ee0c5d3
      Chao Yu committed
      Just cleanup, no logic change.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • D
      f2fs: Support case-insensitive file name lookups · 2c2eb7a3
      Daniel Rosenberg committed
      Modeled after commit b886ee3e ("ext4: Support case-insensitive file
      name lookups")
      
      """
      This patch implements the actual support for case-insensitive file name
      lookups in f2fs, based on the feature bit and the encoding stored in the
      superblock.
      
      A filesystem that has the casefold feature set is able to configure
      directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
      to succeed in that directory in a case-insensitive fashion, i.e: match
      a directory entry even if the name used by userspace is not a byte per
      byte match with the disk name, but is an equivalent case-insensitive
      version of the Unicode string.  This operation is called a
      case-insensitive file name lookup.
      
      The feature is configured as an inode attribute applied to directories
      and inherited by its children.  This attribute can only be enabled on
      empty directories for filesystems that support the encoding feature,
      thus preventing collision of file names that only differ by case.
      
      * dcache handling:
      
      For a +F directory, F2FS only stores the first equivalent name dentry
      used in the dcache. This is done to prevent unintentional duplication of
      dentries in the dcache, while also allowing the VFS code to quickly find
      the right entry in the cache despite which equivalent string was used in
      a previous lookup, without having to resort to ->lookup().
      
      d_hash() of casefolded directories is implemented as the hash of the
      casefolded string, such that we always have a well-known bucket for all
      the equivalencies of the same string. d_compare() uses the
      utf8_strncasecmp() infrastructure, which handles the comparison of
      equivalent, same case, names as well.
      
      For now, negative lookups are not inserted in the dcache, since they
      would need to be invalidated anyway, because we can't trust missing file
      dentries.  This is bad for performance but requires some leveraging of
      the vfs layer to fix.  We can live without that for now, and so does
      everyone else.
      
      * on-disk data:
      
      Despite using a specific version of the name as the internal
      representation within the dcache, the name stored and fetched from the
      disk is a byte-per-byte match with what the user requested, making this
      implementation 'name-preserving'. i.e. no actual information is lost
      when writing to storage.
      
      DX is supported by modifying the hashes used in +F directories to make
      them case/encoding-aware.  The new disk hashes are calculated as the
      hash of the full casefolded string, instead of the string directly.
      This allows us to efficiently search for file names in the htree without
      requiring the user to provide an exact name.
      
      * Dealing with invalid sequences:
      
      By default, when an invalid UTF-8 sequence is identified, ext4 will treat
      it as an opaque byte sequence, ignoring the encoding and reverting to
      the old behavior for that unique file.  This means that case-insensitive
      file name lookup will not work only for that file.  An optional bit can
      be set in the superblock telling the filesystem code and userspace tools
      to enforce the encoding.  When that optional bit is set, any attempt to
      create a file name using an invalid UTF-8 sequence will fail and return
      an error to userspace.
      
      * Normalization algorithm:
      
      The UTF-8 algorithm used to compare strings in f2fs is implemented
      in fs/unicode, and is based on a previous version developed by
      SGI.  It implements the Canonical decomposition (NFD) algorithm
      described by the Unicode specification 12.1, or higher, combined with
      the elimination of ignorable code points (NFDi) and full
      case-folding (CF) as documented in fs/unicode/utf8_norm.c.
      
      NFD seems to be the best normalization method for F2FS because:
      
        - It has a lower cost than NFC/NFKC (which requires
          decomposing to NFD as an intermediary step)
        - It doesn't eliminate important semantic meaning like
          compatibility decompositions.
      
      Although:
      
      - This implementation is not completely linguistically accurate, because
      different languages have conflicting rules, which would require the
      specialization of the filesystem to a given locale, which brings all
      sorts of problems for removable media and for users who use more than
      one language.
      """
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
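The dcache scheme described in the commit message above (hash the casefolded name, compare names after folding) can be sketched in plain C. This is an illustration under loud assumptions: `fold_byte()` uses ASCII `tolower()` as a stand-in for the Unicode casefolding in fs/unicode, FNV-1a is swapped in for the kernel's hash, and the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only stand-in for Unicode casefolding (assumption, not the kernel code). */
static unsigned char fold_byte(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* FNV-1a over the folded bytes: names that fold alike share one bucket. */
static uint32_t casefold_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;

	assert(name != NULL);
	for (size_t i = 0; i < len; i++) {
		h ^= fold_byte((unsigned char)name[i]);
		h *= 16777619u;
	}
	return h;
}

/* Returns 0 when both names are equal after folding, non-zero otherwise. */
static int casefold_compare(const char *a, size_t alen,
			    const char *b, size_t blen)
{
	assert(a != NULL && b != NULL);
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++)
		if (fold_byte((unsigned char)a[i]) != fold_byte((unsigned char)b[i]))
			return 1;
	return 0;
}
```

With real Unicode normalization, equivalent names can differ in byte length, so the early length check is only valid for this ASCII simplification.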
    • D
      f2fs: include charset encoding information in the superblock · 5aba5430
      Daniel Rosenberg committed
      Add charset encoding to f2fs to support casefolding. It is modeled after
      the same feature introduced in commit c83ad55e ("ext4: include charset
      encoding information in the superblock")
      
      Currently this is not compatible with encryption, similar to the current
      ext4 implementation. This will change in the future.

      From the ext4 patch:
      """
      The s_encoding field stores a magic number indicating the encoding
      format and version used globally by file and directory names in the
      filesystem.  The s_encoding_flags defines policies for using the charset
      encoding, like how to handle invalid sequences.  The magic number is
      mapped to the exact charset table, but the mapping is specific to ext4.
      Since we don't have any commitment to support old encodings, the only
      encoding I am supporting right now is utf8-12.1.0.
      
      The current implementation prevents the user from enabling encoding and
      per-directory encryption on the same filesystem at the same time.  The
      incompatibility between these features lies in how we do efficient
      directory searches when we cannot be sure the encryption of the user
      provided fname will match the actual hash stored in the disk without
      decrypting every directory entry, because of normalization cases.  My
      quickest solution is to simply block the concurrent use of these
      features for now, and enable it later, once we have a better solution.
      """
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
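The roles of the two superblock fields described above can be sketched as a lookup table plus a policy check. Everything named `hypo_*` or `HYPO_*` is hypothetical, and the numeric values are illustrative rather than the on-disk constants; the sketch only shows a magic number mapping to one encoding table and a flag deciding how invalid sequences are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the kernel headers. */
#define HYPO_ENC_UTF8_12_1 1u
#define HYPO_ENC_STRICT_FL 0x1u

struct hypo_encoding {
	uint16_t magic;      /* value stored in s_encoding */
	const char *name;    /* charset name */
	const char *version; /* charset version */
};

static const struct hypo_encoding hypo_encoding_map[] = {
	{ HYPO_ENC_UTF8_12_1, "utf8", "12.1.0" },
};

/* Map the magic number to its encoding entry, or NULL if unknown. */
static const struct hypo_encoding *hypo_lookup_encoding(uint16_t magic)
{
	size_t n = sizeof(hypo_encoding_map) / sizeof(hypo_encoding_map[0]);

	for (size_t i = 0; i < n; i++)
		if (hypo_encoding_map[i].magic == magic)
			return &hypo_encoding_map[i];
	return NULL;
}

/* Strict mode rejects invalid UTF-8 names; otherwise they are kept as opaque bytes. */
static bool hypo_may_create(uint16_t flags, bool name_is_valid_utf8)
{
	assert((flags & ~HYPO_ENC_STRICT_FL) == 0); /* no unknown flag bits */
	return name_is_valid_utf8 || !(flags & HYPO_ENC_STRICT_FL);
}
```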
    • D
      fs: Reserve flag for casefolding · 71e90b46
      Daniel Rosenberg committed
      In preparation for including the casefold feature within f2fs, elevate
      the EXT4_CASEFOLD_FL flag to FS_CASEFOLD_FL.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to avoid call kvfree under spinlock · 0921835c
      Chao Yu committed
      vfree() should not be called from interrupt context, so move it
      out of spin_lock_irqsave() coverage.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
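The general pattern behind this kind of fix is to detach the pointer while holding the lock and free it only after the lock is released. The sketch below is a userspace model, not the f2fs code: `fake_lock`, `cache`, and `drop_buffer()` are hypothetical, `free()` stands in for kvfree(), and a counter records whether a free ever happened with the lock held.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock that only tracks whether it is held (stand-in for a spinlock). */
struct fake_lock { int held; };

static void fake_lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = 1; }
static void fake_lock_release(struct fake_lock *l) { assert(l->held); l->held = 0; }

struct cache {
	struct fake_lock lock;
	void *buf; /* protected by lock */
};

static int freed_while_locked; /* counts frees that happened under the lock */

/* free() stands in for kvfree(); record if the lock was held at that point. */
static void release_buffer(struct cache *c, void *p)
{
	if (c->lock.held)
		freed_while_locked++;
	free(p);
}

/* Detach the buffer under the lock, then free it outside the critical section. */
static void drop_buffer(struct cache *c)
{
	void *victim;

	fake_lock_acquire(&c->lock);
	victim = c->buf;
	c->buf = NULL;
	fake_lock_release(&c->lock);

	release_buffer(c, victim);
}
```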
    • J
      fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status() · 280fd422
      Jia-Ju Bai committed
      In fill_super() and put_super(), f2fs_destroy_stats() is called
      prior to f2fs_destroy_segment_manager(), so if the current sbi can
      still be visited in the global stat list, SM_I(sbi) should not have
      been released yet.
      For this reason, SM_I(sbi) does not need to be checked in
      update_general_status().
      Thanks to Chao Yu for the advice.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: disallow direct IO in atomic write · 038d0698
      Chao Yu committed
      Atomic write needs the page cache to cache the data of a transaction, so
      direct IO should never be allowed in atomic write. Detect and deny it
      when opening an atomic write file.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to handle quota_{on,off} correctly · fe973b06
      Chao Yu committed
      With the quota_ino feature on, generic/232 reports an inconsistency
      issue on the image.

      The root cause is that the testcase tries to:
      - use quotactl to shut down journalled quota based on sysfile;
      - and then use quotactl to enable/turn on quota based on a specific file
      (aquota.user or aquota.group).

      Eventually, the quota sysfile will be out of date because of the
      subsequent specific file creation.

      Change as below to fix this issue:
      - deny enabling quota based on a specific file if the quota sysfile exists.
      - set SBI_QUOTA_NEED_REPAIR once sysfile based quota is shut down via
      ioctl.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to detect cp error in f2fs_setxattr() · a25c2cdc
      Chao Yu committed
      The operation needs to return -EIO if the filesystem has been shut down;
      fix the missing case in f2fs_setxattr().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to spread f2fs_is_checkpoint_ready() · 955ebcd3
      Chao Yu committed
      We failed to call f2fs_is_checkpoint_ready() in several places, which may
      allow space allocation even when free space has been exhausted while
      checkpoint is disabled. Add the missing calls.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: support fiemap() for directory inode · 7975f349
      Chao Yu committed
      Adjust f2fs_fiemap() to support fiemap() on directory inode.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to avoid discard command leak · 04f9287a
      Chao Yu committed
       =============================================================================
       BUG discard_cmd (Tainted: G    B      OE  ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
       Call Trace:
        dump_stack+0x63/0x87
        slab_err+0xa1/0xb0
        __kmem_cache_shutdown+0x183/0x390
        shutdown_cache+0x14/0x110
        kmem_cache_destroy+0x195/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        ? vtime_user_exit+0x29/0x70
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
       INFO: Object 0xffff936b4748b000 @offset=0
       INFO: Object 0xffff936b4748b070 @offset=112
       kmem_cache_destroy discard_cmd: Slab cache still has objects
       Call Trace:
        dump_stack+0x63/0x87
        kmem_cache_destroy+0x1b4/0x1c0
        f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
        exit_f2fs_fs+0x35/0x641 [f2fs]
        SyS_delete_module+0x155/0x230
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
      
      Recovery can cache discard commands, so in the error path of fill_super()
      we need to give them a chance to be handled; otherwise the discard_cmd
      slab cache will leak.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly · 0f1898f9
      Chao Yu committed
      On a quota-disabled image, with fault injection, SBI_QUOTA_NEED_REPAIR
      will be set incorrectly in the error path of f2fs_evict_inode(); fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to drop meta/node pages during umount · a8933b6b
      Chao Yu committed
      As reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=204193
      
      A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.
      
       kasan_report.cold+0x5/0x32
       f2fs_write_end_io+0x215/0x650
       bio_endio+0x26e/0x320
       blk_update_request+0x209/0x5d0
       blk_mq_end_request+0x2e/0x230
       lo_complete_rq+0x12c/0x190
       blk_done_softirq+0x14a/0x1a0
       __do_softirq+0x119/0x3e5
       irq_exit+0x94/0xe0
       call_function_single_interrupt+0xf/0x20
      
      During umount, we will access NULL sbi->node_inode pointer in
      f2fs_write_end_io():
      
      	f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
      				page->index != nid_of_node(page));
      
      The reason is that if the disable_checkpoint mount option is on, dirty
      meta pages can remain during umount and then be flushed by iput() of
      meta_inode; however, node_inode has already been iput()ed before
      meta_inode's iput().

      Since checkpoint is disabled, all meta/node data is useless and
      should be dropped at the next mount, so during umount, let's adjust
      drop_inode() to give a hint to iput_final() to drop all that dirty
      data correctly.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: disallow switching io_bits option during remount · 1f78adfa
      Chao Yu committed
      If the IO alignment feature is turned on by a remount, its mempool is
      not initialized, so we will hit a panic during IO submission due to
      accessing a NULL mempool pointer.

      This feature should be set only at mount time, so simply deny
      configuring it during remount.

      This fixes a bug reported in bugzilla:

      https://bugzilla.kernel.org/show_bug.cgi?id=204135

      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
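The "deny at remount" approach can be sketched as a comparison between the options in effect and the options requested. This is a simplified model under stated assumptions: `hypo_mount_opts` and `hypo_check_remount()` are hypothetical, and returning `-EINVAL` is an assumption about how such a rejection would be reported, not a quote of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of mount options; io_bits == 0 means alignment is off. */
struct hypo_mount_opts {
	int io_bits;
};

/*
 * Reject any remount that changes io_bits, because the resources backing the
 * feature are only set up at mount time.
 */
static int hypo_check_remount(const struct hypo_mount_opts *cur,
			      const struct hypo_mount_opts *req)
{
	assert(cur != NULL && req != NULL);
	if (cur->io_bits != req->io_bits)
		return -EINVAL;
	return 0;
}
```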
    • C
      f2fs: fix panic of IO alignment feature · c72db71e
      Chao Yu committed
      Since 07173c3e ("block: enable multipage bvecs"), one bio vector
      can store multiple pages, so we can no longer calculate the max IO size
      of a bio as PAGE_SIZE * bio->bi_max_vecs. However, the IO alignment
      feature of f2fs always relies on that assumption, so it may cause a panic
      during IO submission, as in the stack below.
      
       kernel BUG at fs/f2fs/data.c:317!
       RIP: 0010:__submit_merged_bio+0x8b0/0x8c0
       Call Trace:
        f2fs_submit_page_write+0x3cd/0xdd0
        do_write_page+0x15d/0x360
        f2fs_outplace_write_data+0xd7/0x210
        f2fs_do_write_data_page+0x43b/0xf30
        __write_data_page+0xcf6/0x1140
        f2fs_write_cache_pages+0x3ba/0xb40
        f2fs_write_data_pages+0x3dd/0x8b0
        do_writepages+0xbb/0x1e0
        __writeback_single_inode+0xb6/0x800
        writeback_sb_inodes+0x441/0x910
        wb_writeback+0x261/0x650
        wb_workfn+0x1f9/0x7a0
        process_one_work+0x503/0x970
        worker_thread+0x7d/0x820
        kthread+0x1ad/0x210
        ret_from_fork+0x35/0x40
      
      This patch adds one extra condition to check the space left in the bio
      when trying to merge a page into it, to avoid the panic.

      This bug was reported in bugzilla:

      https://bugzilla.kernel.org/show_bug.cgi?id=204043

      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
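The idea of checking the space left in a bio before merging can be sketched with a small model. The struct and function below are hypothetical and only approximate the kind of condition the commit message describes: when the bio currently ends on an aligned IO boundary and fewer vectors remain than one aligned IO needs, the next page should start a new bio instead of being merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bio fields relevant to the check. */
struct bio_model {
	unsigned int filled_blocks; /* blocks already queued in the bio */
	unsigned int max_vecs;      /* vector capacity of the bio */
	unsigned int used_vecs;     /* vectors already in use */
};

/*
 * Returns false when the bio sits on an aligned boundary but cannot hold
 * another full aligned IO of io_size blocks.
 */
static bool can_merge_aligned(const struct bio_model *bio, unsigned int io_size)
{
	unsigned int left_vecs;

	assert(io_size > 0 && bio->used_vecs <= bio->max_vecs);
	left_vecs = bio->max_vecs - bio->used_vecs;

	if (bio->filled_blocks % io_size == 0 && left_vecs < io_size)
		return false;
	return true;
}
```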
    • C
      f2fs: introduce {page,io}_is_mergeable() for readability · 8896cbdf
      Chao Yu committed
      Wrap merge condition into function for readability, no logic change.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 17 Aug, 2019 4 commits
    • J
      f2fs: fix livelock in swapfile writes · 75a037f3
      Jaegeuk Kim committed
      This patch fixes livelock in the below call path when writing swap pages.
      
      [46374.617256] c2    701  __switch_to+0xe4/0x100
      [46374.617265] c2    701  __schedule+0x80c/0xbc4
      [46374.617273] c2    701  schedule+0x74/0x98
      [46374.617281] c2    701  rwsem_down_read_failed+0x190/0x234
      [46374.617291] c2    701  down_read+0x58/0x5c
      [46374.617300] c2    701  f2fs_map_blocks+0x138/0x9a8
      [46374.617310] c2    701  get_data_block_dio_write+0x74/0x104
      [46374.617320] c2    701  __blockdev_direct_IO+0x1350/0x3930
      [46374.617331] c2    701  f2fs_direct_IO+0x55c/0x8bc
      [46374.617341] c2    701  __swap_writepage+0x1d0/0x3e8
      [46374.617351] c2    701  swap_writepage+0x44/0x54
      [46374.617360] c2    701  shrink_page_list+0x140/0xe80
      [46374.617371] c2    701  shrink_inactive_list+0x510/0x918
      [46374.617381] c2    701  shrink_node_memcg+0x2d4/0x804
      [46374.617391] c2    701  shrink_node+0x10c/0x2f8
      [46374.617400] c2    701  do_try_to_free_pages+0x178/0x38c
      [46374.617410] c2    701  try_to_free_pages+0x348/0x4b8
      [46374.617419] c2    701  __alloc_pages_nodemask+0x7f8/0x1014
      [46374.617429] c2    701  pagecache_get_page+0x184/0x2cc
      [46374.617438] c2    701  f2fs_new_node_page+0x60/0x41c
      [46374.617449] c2    701  f2fs_new_inode_page+0x50/0x7c
      [46374.617460] c2    701  f2fs_init_inode_metadata+0x128/0x530
      [46374.617472] c2    701  f2fs_add_inline_entry+0x138/0xd64
      [46374.617480] c2    701  f2fs_do_add_link+0xf4/0x178
      [46374.617488] c2    701  f2fs_create+0x1e4/0x3ac
      [46374.617497] c2    701  path_openat+0xdc0/0x1308
      [46374.617507] c2    701  do_filp_open+0x78/0x124
      [46374.617516] c2    701  do_sys_open+0x134/0x248
      [46374.617525] c2    701  SyS_openat+0x14/0x20
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b7e7c85d
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - Don't taint the kernel if CPUs have different sets of page sizes
         supported (other than the one in use).
      
       - Issue I-cache maintenance for module ftrace trampoline.
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
        arm64: cpufeature: Don't treat granule sizes as strict
    • W
      arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side · b6143d10
      Will Deacon committed
      The initial support for dynamic ftrace trampolines in modules made use
      of an indirect branch which loaded its target from the beginning of
      a special section (e71a4e1b ("arm64: ftrace: add support for far
      branches to dynamic ftrace")). Since no instructions were being patched,
      no cache maintenance was needed. However, later in be0f272b ("arm64:
      ftrace: emit ftrace-mod.o contents through code") this code was reworked
      to output the trampoline instructions directly into the PLT entry but,
      unfortunately, the necessary cache maintenance was overlooked.
      
      Add a call to __flush_icache_range() after writing the new trampoline
      instructions but before patching in the branch to the trampoline.
      
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: <stable@vger.kernel.org>
      Fixes: be0f272b ("arm64: ftrace: emit ftrace-mod.o contents through code")
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • L
      Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2d63ba3e
      Linus Torvalds committed
      Pull power management fixes from Rafael Wysocki:
       "These add a check to avoid recent suspend-to-idle power regression on
        systems with NVMe drives where the PCIe ASPM policy is "performance"
        (or when the kernel is built without ASPM support), fix an issue
        related to frequency limits in the schedutil cpufreq governor and fix
        a mistake related to the PM QoS usage in the cpufreq core introduced
        recently.
      
        Specifics:
      
         - Disable NVMe power optimization related to suspend-to-idle added
           recently on systems where PCIe ASPM is not able to put PCIe links
           into low-power states to prevent excess power from being drawn by
           the system while suspended (Rafael Wysocki).
      
         - Make the schedutil governor handle frequency limits changes
           properly in all cases (Viresh Kumar).
      
         - Prevent the cpufreq core from treating positive values returned by
           dev_pm_qos_update_request() as errors (Viresh Kumar)"
      
      * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
        PCI/ASPM: Add pcie_aspm_enabled()
        cpufreq: schedutil: Don't skip freq update when limits change
        cpufreq: dev_pm_qos_update_request() can return 1 on success
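The cpufreq item in this pull concerns a return convention where a positive value means success. A minimal sketch of handling such a convention follows; `fake_qos_update()` is a hypothetical model (negative on error, 0 when nothing changed, 1 when the value changed) and `set_limit()` is a hypothetical caller, not the cpufreq core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the convention: negative = error, 0 = unchanged, 1 = value changed. */
static int fake_qos_update(int *current, int new_value)
{
	assert(current != NULL);
	if (new_value < 0)
		return -EINVAL;
	if (*current == new_value)
		return 0;
	*current = new_value;
	return 1;
}

/* Treat only negative results as failures; 0 and 1 both mean success. */
static int set_limit(int *current, int new_value)
{
	int ret = fake_qos_update(current, new_value);

	return ret < 0 ? ret : 0;
}
```

A caller written as `if (ret) return ret;` would wrongly report the "value changed" case as an error, which is the class of mistake the pull message describes.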
  3. 16 Aug, 2019 10 commits
  4. 15 Aug, 2019 7 commits