1. 20 Aug, 2020 1 commit
    • B
      ext4: limit the length of per-inode prealloc list · 27bc446e
      brookxu committed
      In the scenario of writing sparse files, the per-inode prealloc list may
      be very long, resulting in high overhead for ext4_mb_use_preallocated().
      To circumvent this problem, we limit the maximum length of per-inode
      prealloc list to 512 and allow users to modify it.
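
The idea of capping a list can be illustrated with a small user-space sketch. This is not the kernel's implementation: `struct pa_node`, `struct pa_list`, `pa_list_add()` and the policy of dropping the oldest entry are hypothetical stand-ins chosen for illustration. Only the default limit of 512 comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one preallocated range. */
struct pa_node {
	unsigned long start;
	struct pa_node *next;
};

/* Singly linked list with a length cap; the newest entry sits at the head. */
struct pa_list {
	struct pa_node *head;
	size_t len;
	size_t max_len;
};

void pa_list_init(struct pa_list *l, size_t max_len)
{
	assert(max_len > 0);
	l->head = NULL;
	l->len = 0;
	l->max_len = max_len;
}

/* Remove and free the tail, i.e. the oldest entry. */
void pa_list_drop_oldest(struct pa_list *l)
{
	struct pa_node **pp = &l->head;

	if (!*pp)
		return;
	while ((*pp)->next)
		pp = &(*pp)->next;
	free(*pp);
	*pp = NULL;
	l->len--;
}

/* Add a new entry at the head, then trim the list back to max_len. */
int pa_list_add(struct pa_list *l, unsigned long start)
{
	struct pa_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->start = start;
	n->next = l->head;
	l->head = n;
	l->len++;
	while (l->len > l->max_len)
		pa_list_drop_oldest(l);
	return 0;
}

/* Free every entry. */
void pa_list_free(struct pa_list *l)
{
	while (l->head) {
		struct pa_node *n = l->head;

		l->head = n->next;
		free(n);
		l->len--;
	}
}
```

With the cap in place, any walk over the list visits at most `max_len` entries, no matter how many ranges a sparse-file writer creates.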
      
      After patching, we observed that the share of CPU time spent in sys
      dropped and system throughput increased significantly. We created a
      process that writes a sparse file; its running time on the fixed
      kernel was significantly lower, as follows:
      
      Running time on unfixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m2.051s
      user    0m0.008s
      sys     0m2.026s
      
      Running time on fixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m0.471s
      user    0m0.004s
      sys     0m0.395s
      Signed-off-by: Chunguang Xu <brookxu@tencent.com>
      Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  2. 06 Aug, 2020 1 commit
  3. 04 Jun, 2020 3 commits
  4. 29 May, 2020 4 commits
  5. 20 May, 2020 1 commit
  6. 20 Mar, 2020 1 commit
  7. 06 Mar, 2020 1 commit
    • E
      ext4: remove EXT4_EOFBLOCKS_FL and associated code · 4337ecd1
      Eric Whitney committed
      The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file
      contains unwritten blocks past i_size.  It's set when ext4_fallocate
      is called with the KEEP_SIZE flag to extend a file with an unwritten
      extent.  However, this flag hasn't been useful functionally since
      March, 2012, when a decision was made to remove it from ext4.
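
As a rough illustration of the operation described above, the following sketch preallocates space with `fallocate(2)` and `FALLOC_FL_KEEP_SIZE`, then reports the file size, which stays unchanged. The helper name `size_after_keep_size_prealloc()` and the `/tmp` path are assumptions made for this example. On a filesystem without fallocate support the call simply fails and the size is still unchanged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create an empty temporary file, ask the filesystem to preallocate
 * `len` bytes starting at offset 0 without changing the file size, and
 * return the size reported afterwards (or -1 if setup fails).
 */
long long size_after_keep_size_prealloc(long long len)
{
	char path[] = "/tmp/keepsizeXXXXXX";	/* hypothetical location */
	struct stat st;
	long long size = -1;
	int fd;

	assert(len > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	/*
	 * With KEEP_SIZE, blocks may be allocated past EOF but i_size is
	 * left alone. The result is ignored here: the call can fail (for
	 * example with EOPNOTSUPP) on filesystems that lack fallocate.
	 */
	(void)fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, (off_t)len);

	if (fstat(fd, &st) == 0)
		size = (long long)st.st_size;
	close(fd);
	return size;
}
```

Calling `size_after_keep_size_prealloc(1 << 20)` is expected to report a size of 0 even though up to 1 MiB may have been reserved for the file.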
      
      All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version
      1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag
      handling") at that time.  Now that enough time has passed to make
      e2fsprogs versions containing this modification common, this patch now
      removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as
      well.
      
      This change has two implications.  First, because pre-1.42.2 e2fsck
      versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and
      because that bit will never be set by newer kernels containing this
      patch, old versions of e2fsck won't have a compatibility problem with
      files created by newer kernels.
      
      Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits
      belonging to a file written by an older kernel.  If set, it will remain
      in that state until the file is deleted.  Because e2fsck versions since
      1.42.2 don't check the flag at all, no adverse effect is expected.
      However, pre-1.42.2 e2fsck versions that do check the flag may report
      that it is set when it ought not to be after a file has been truncated
      or had its unwritten blocks written.  In this case, the old version of
      e2fsck will offer to clear the flag.  No adverse effect would then
      occur whether the user chooses to clear the flag or not.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  8. 18 Jan, 2020 1 commit
  9. 23 Oct, 2019 1 commit
  10. 31 Aug, 2019 1 commit
    • C
      ext4 crypto: fix to check feature status before get policy · 0642ea24
      Chao Yu committed
      When getting fscrypt policy via EXT4_IOC_GET_ENCRYPTION_POLICY, if
      encryption feature is off, it's better to return EOPNOTSUPP instead of
      ENODATA, so let's add ext4_has_feature_encrypt() to do the check for
      that.
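
From userspace, the distinction between the two error codes might be handled as in the sketch below. `FS_IOC_GET_ENCRYPTION_POLICY` and `struct fscrypt_policy` come from the kernel UAPI headers. The enum and both helper functions are hypothetical names for this example. The sketch assumes that ENOTTY means the filesystem type has no encryption support at all.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>
#include <sys/ioctl.h>

/* Hypothetical classification of the ioctl outcome. */
enum policy_status {
	POLICY_PRESENT,		/* file is encrypted, policy was returned */
	POLICY_UNENCRYPTED,	/* ENODATA: feature is on, file has no policy */
	POLICY_FEATURE_OFF,	/* EOPNOTSUPP: encryption feature not enabled */
	POLICY_NO_FS_SUPPORT,	/* ENOTTY: assumed to mean no fscrypt support */
	POLICY_OTHER_ERROR
};

/* Map an errno value (0 for success) to a status. */
enum policy_status classify_policy_errno(int err)
{
	switch (err) {
	case 0:
		return POLICY_PRESENT;
	case ENODATA:
		return POLICY_UNENCRYPTED;
	case EOPNOTSUPP:
		return POLICY_FEATURE_OFF;
	case ENOTTY:
		return POLICY_NO_FS_SUPPORT;
	default:
		return POLICY_OTHER_ERROR;
	}
}

/* Query the (v1) policy of an open file and classify the result. */
enum policy_status query_policy_status(int fd)
{
	struct fscrypt_policy policy;

	assert(fd >= 0);
	if (ioctl(fd, FS_IOC_GET_ENCRYPTION_POLICY, &policy) == 0)
		return classify_policy_errno(0);
	return classify_policy_errno(errno);
}
```

With the fix, a caller can tell "this file is not encrypted" apart from "this filesystem has encryption turned off" without inspecting the superblock.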
      
      This makes it so that all fscrypt ioctls consistently check for the
      encryption feature, and makes ext4 consistent with f2fs in this regard.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [EB - removed unneeded braces, updated the documentation, and
            added more explanation to commit message]
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  11. 13 Aug, 2019 2 commits
    • E
      ext4: add basic fs-verity support · c93d8f88
      Eric Biggers committed
      Add most of fs-verity support to ext4.  fs-verity is a filesystem
      feature that enables transparent integrity protection and authentication
      of read-only files.  It uses a dm-verity like mechanism at the file
      level: a Merkle tree is used to verify any block in the file in
      log(filesize) time.  It is implemented mainly by helper functions in
      fs/verity/.  See Documentation/filesystems/fsverity.rst for the full
      documentation.
      
      This commit adds all of ext4 fs-verity support except for the actual
      data verification, including:
      
      - Adding a filesystem feature flag and an inode flag for fs-verity.
      
      - Implementing the fsverity_operations to support enabling verity on an
        inode and reading/writing the verity metadata.
      
      - Updating ->write_begin(), ->write_end(), and ->writepages() to support
        writing verity metadata pages.
      
      - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
      
      ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
      past the end of the file, starting at the first 64K boundary beyond
      i_size.  This approach works because (a) verity files are readonly, and
      (b) pages fully beyond i_size aren't visible to userspace but can be
      read/written internally by ext4 with only some relatively small changes
      to ext4.  This approach avoids having to depend on the EA_INODE feature
      and on rearchitecturing ext4's xattr support to support paging
      multi-gigabyte xattrs into memory, and to support encrypting xattrs.
      Note that the verity metadata *must* be encrypted when the file is,
      since it contains hashes of the plaintext data.
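
The placement rule for the verity metadata comes down to rounding up to a 64K boundary. The sketch below shows that arithmetic. The function and macro names are hypothetical, and it assumes plain round-up semantics, so a size that already sits on a 64K boundary maps to itself. The commit message does not spell out that edge case.

```c
#include <assert.h>
#include <stdint.h>

/* 64K alignment taken from the commit message. */
#define VERITY_METADATA_ALIGN 65536ULL

/*
 * Offset at which the verity metadata would start for a file of
 * i_size bytes, assuming plain round-up to the next 64K boundary.
 */
uint64_t verity_metadata_pos(uint64_t i_size)
{
	/* The mask trick below requires a power-of-two alignment. */
	assert((VERITY_METADATA_ALIGN & (VERITY_METADATA_ALIGN - 1)) == 0);
	return (i_size + VERITY_METADATA_ALIGN - 1) &
	       ~(VERITY_METADATA_ALIGN - 1);
}
```

For example, a 1-byte file would place its metadata at offset 65536, and a 65537-byte file at offset 131072.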
      
      This patch incorporates work by Theodore Ts'o and Chandan Rajendra.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
    • E
      ext4: wire up new fscrypt ioctls · 29b3692e
      Eric Biggers committed
      Wire up the new ioctls for adding and removing fscrypt keys to/from the
      filesystem, and the new ioctl for retrieving v2 encryption policies.
      
      The key removal ioctls also required making ext4_drop_inode() call
      fscrypt_drop_inode().
      
      For more details see Documentation/filesystems/fscrypt.rst and the
      fscrypt patches that added the implementation of these ioctls.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  12. 12 Aug, 2019 3 commits
  13. 01 Jul, 2019 3 commits
  14. 10 Jun, 2019 2 commits
  15. 12 May, 2019 1 commit
  16. 26 Apr, 2019 2 commits
    • G
      ext4: Support case-insensitive file name lookups · b886ee3e
      Gabriel Krisman Bertazi committed
      This patch implements the actual support for case-insensitive file name
      lookups in ext4, based on the feature bit and the encoding stored in the
      superblock.
      
      A filesystem that has the casefold feature set is able to configure
      directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
      to succeed in that directory in a case-insensitive fashion, i.e., to match
      a directory entry even if the name used by userspace is not a byte per
      byte match with the disk name, but is an equivalent case-insensitive
      version of the Unicode string.  This operation is called a
      case-insensitive file name lookup.
      
      The feature is configured as an inode attribute applied to directories
      and inherited by its children.  This attribute can only be enabled on
      empty directories for filesystems that support the encoding feature,
      thus preventing collision of file names that only differ by case.
      
      * dcache handling:
      
      For a +F directory, Ext4 only stores the first equivalent name dentry
      used in the dcache. This is done to prevent unintentional duplication of
      dentries in the dcache, while also allowing the VFS code to quickly find
      the right entry in the cache despite which equivalent string was used in
      a previous lookup, without having to resort to ->lookup().
      
      d_hash() of casefolded directories is implemented as the hash of the
      casefolded string, such that we always have a well-known bucket for all
      the equivalencies of the same string. d_compare() uses the
      utf8_strncasecmp() infrastructure, which handles the comparison of
      equivalent, same case, names as well.
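
The hash-then-compare scheme can be sketched in user space as follows. This is only an illustration: it folds ASCII letters, whereas the kernel folds full Unicode through the fs/unicode code. The FNV-1a hash and all function names here are assumptions, not what ext4 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only stand-in for Unicode casefolding. */
static unsigned char fold_byte(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return (unsigned char)(c - 'A' + 'a');
	return c;
}

/*
 * Hash of the folded name (FNV-1a, chosen only for brevity).
 * Names that differ only by case land in the same bucket.
 */
uint32_t casefold_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;

	assert(name != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		h ^= fold_byte((unsigned char)name[i]);
		h *= 16777619u;
	}
	return h;
}

/*
 * Case-insensitive comparison. The length check is valid for ASCII
 * only; real Unicode folding can change the byte length of a name.
 */
int casefold_equal(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 0;
	for (size_t i = 0; i < alen; i++)
		if (fold_byte((unsigned char)a[i]) !=
		    fold_byte((unsigned char)b[i]))
			return 0;
	return 1;
}
```

Because the hash is computed over the folded string, a lookup for `MAKEFILE` probes the same bucket as the cached `Makefile`, and the comparison step then confirms the match.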
      
      For now, negative lookups are not inserted in the dcache, since they
      would need to be invalidated anyway, because we can't trust missing file
      dentries.  This is bad for performance but requires some leveraging of
      the vfs layer to fix.  We can live without that for now, and so does
      everyone else.
      
      * on-disk data:
      
      Despite using a specific version of the name as the internal
      representation within the dcache, the name stored and fetched from the
      disk is a byte-per-byte match with what the user requested, making this
      implementation 'name-preserving'. i.e. no actual information is lost
      when writing to storage.
      
      DX is supported by modifying the hashes used in +F directories to make
      them case/encoding-aware.  The new disk hashes are calculated as the
      hash of the full casefolded string, instead of the string directly.
      This allows us to efficiently search for file names in the htree without
      requiring the user to provide an exact name.
      
      * Dealing with invalid sequences:
      
      By default, when an invalid UTF-8 sequence is identified, ext4 will treat
      it as an opaque byte sequence, ignoring the encoding and reverting to
      the old behavior for that unique file.  This means that case-insensitive
      file name lookup will not work only for that file.  An optional bit can
      be set in the superblock telling the filesystem code and userspace tools
      to enforce the encoding.  When that optional bit is set, any attempt to
      create a file name using an invalid UTF-8 sequence will fail and return
      an error to userspace.
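
The two policies for invalid names can be sketched like this. The validator checks only UTF-8 well-formedness (no overlong forms, no surrogates, nothing above U+10FFFF). The kernel's notion of validity is tied to its Unicode tables and may be stricter. The `strict` parameter stands in for the optional superblock bit, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the byte string is well-formed UTF-8, 0 otherwise. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	assert(s != NULL || len == 0);
	while (i < len) {
		unsigned char c = s[i];
		unsigned long cp, min;
		size_t need;

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}

		if (len - i <= need)
			return 0;	/* sequence is truncated */
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		/* Reject overlong encodings, surrogates and out-of-range values. */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;
		i += need + 1;
	}
	return 1;
}

enum name_action {
	NAME_CASEFOLD,	/* valid: case-insensitive lookup applies */
	NAME_OPAQUE,	/* invalid, default mode: treat as raw bytes */
	NAME_REJECT	/* invalid, strict mode: fail the operation */
};

/* Decide how a name is handled; `strict` models the superblock bit. */
enum name_action classify_name(const unsigned char *name, size_t len,
			       int strict)
{
	if (utf8_is_valid(name, len))
		return NAME_CASEFOLD;
	return strict ? NAME_REJECT : NAME_OPAQUE;
}
```

In the default mode an invalid name still works, but only as an exact byte match. In strict mode the same name would be refused at creation time.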
      
      * Normalization algorithm:
      
      The UTF-8 algorithm used to compare strings in ext4 lives in
      fs/unicode, and is based on a previous version developed by
      SGI.  It implements the Canonical decomposition (NFD) algorithm
      described by the Unicode specification 12.1, or higher, combined with
      the elimination of ignorable code points (NFDi) and full
      case-folding (CF) as documented in fs/unicode/utf8_norm.c.
      
      NFD seems to be the best normalization method for EXT4 because:
      
        - It has a lower cost than NFC/NFKC (which requires
          decomposing to NFD as an intermediary step)
        - It doesn't eliminate important semantic meaning like
          compatibility decompositions.
      
      Although:
      
        - This implementation is not completely linguistically accurate,
          because different languages have conflicting rules, which would
          require the specialization of the filesystem to a given locale,
          which brings all sorts of problems for removable media and for
          users who use more than one language.
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • K
      ext4: actually request zeroing of inode table after grow · 310a997f
      Kirill Tkhai committed
      The number of block groups can never decrease, since only online
      grow is supported.
      
      But after a grow has occurred, we have to zero the inode tables
      of the newly created block groups.
      
      Fixes: 19c5246d ("ext4: add new online resize interface")
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
  17. 24 Mar, 2019 1 commit
  18. 11 Feb, 2019 5 commits
  19. 24 Jan, 2019 1 commit
  20. 20 Dec, 2018 1 commit
    • T
      ext4: avoid declaring fs inconsistent due to invalid file handles · 8a363970
      Theodore Ts'o committed
      If we receive a file handle, either from NFS or open_by_handle_at(2),
      and it points at an inode which has not been initialized, and the file
      system has metadata checksums enabled, we shouldn't try to get the
      inode, discover the checksum is invalid, and then declare the file
      system as being inconsistent.
      
      This can be reproduced by creating a test file system via "mke2fs -t
      ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
      directory, and then running the following program.
      
      #define _GNU_SOURCE
      #include <fcntl.h>
      
      struct handle {
      	struct file_handle fh;
      	unsigned char fid[MAX_HANDLE_SZ];
      };
      
      int main(int argc, char **argv)
      {
      	struct handle h = {{8, 1 }, { 12, }};
      
      	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
      	return 0;
      }
      
      Google-Bug-Id: 120690101
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
  21. 04 Oct, 2018 1 commit
  22. 03 Oct, 2018 2 commits
    • W
      ext4: fix setattr project check in fssetxattr ioctl · dc7ac6c4
      Wang Shilong committed
      Currently, the project quota ID can be changed through the fssetxattr
      ioctl, and the existing permission check, inode_owner_or_capable(),
      is clearly not enough: an ordinary user could change the project ID
      of a file and thereby easily circumvent project quota.
      
      This patch follows the same rule as xfs project quota:
      
      "Project Quota ID state is only allowed to change from
      within the init namespace. Enforce that restriction only
      if we are trying to change the quota ID state.
      Everything else is allowed in user namespaces."
      
      Besides that, checking and setting the project ID state should
      be an atomic operation, so protect the whole operation with the
      inode lock. ext4_ioctl_setproject() is only used for the
      EXT4_IOC_FSSETXATTR ioctl, where mnt_want_write_file() is already
      held before ext4_ioctl_setflags(), and ext4_ioctl_setproject() is
      called after ext4_ioctl_setflags(). The two can share that code, so
      remove the mnt_want_write_file() call inside ext4_ioctl_setproject().
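
The rule quoted above, combined with the requirement that check and update happen atomically, can be modelled in user space as follows. Everything here is a hypothetical toy: `struct toy_inode`, the pthread mutex standing in for the inode lock, the `in_init_userns` flag, and the choice of `-EINVAL` as the error code are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy inode: a lock plus the project ID it protects. */
struct toy_inode {
	pthread_mutex_t lock;
	uint32_t projid;
};

/*
 * Set the project ID. A real change is allowed only from the init
 * user namespace; a request that leaves the ID unchanged is always
 * fine. The check and the update happen under one lock so that no
 * other thread can slip in between them.
 */
int toy_set_projid(struct toy_inode *inode, bool in_init_userns,
		   uint32_t new_projid)
{
	int err = 0;

	assert(inode != NULL);
	pthread_mutex_lock(&inode->lock);
	if (inode->projid != new_projid) {
		if (!in_init_userns)
			err = -EINVAL;	/* assumed error code */
		else
			inode->projid = new_projid;
	}
	pthread_mutex_unlock(&inode->lock);
	return err;
}
```

A caller outside the init namespace can still pass the current project ID back unchanged, which keeps other attribute updates working.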
      Signed-off-by: Wang Shilong <wshilong@ddn.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: stable@kernel.org
    • T
      ext4: fix EXT4_IOC_SWAP_BOOT · 18aded17
      Theodore Ts'o committed
      The code for the EXT4_IOC_SWAP_BOOT ioctl hasn't been updated in a
      while, and it's a bit broken with respect to more modern ext4
      kernels, especially metadata checksums.
      
      Other problems fixed with this commit:
      
      * Don't allow installing a DAX, swap file, or an encrypted file as a
        boot loader.
      
      * Respect the immutable and append-only flags.
      
      * Wait until any DIO operations are finished *before* calling
        truncate_inode_pages().
      
      * Don't swap inode->i_flags, since these flags have nothing to do with
        the inode blocks --- and it will give the IMA/audit code heartburn
        when the inode is evicted.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reported-by: syzbot+e81ccd4744c6c4f71354@syzkaller.appspotmail.com
  23. 22 Mar, 2018 1 commit