1. 20 Dec, 2018 1 commit
    • ext4: avoid declaring fs inconsistent due to invalid file handles · 8a363970
      Theodore Ts'o committed
      If we receive a file handle, either from NFS or open_by_handle_at(2),
      and it points at an inode which has not been initialized, and the file
      system has metadata checksums enabled, we shouldn't try to get the
      inode, discover the checksum is invalid, and then declare the file
      system as being inconsistent.
      
      This can be reproduced by creating a test file system via "mke2fs -t
      ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
      directory, and then running the following program.
      
      #define _GNU_SOURCE
      #include <fcntl.h>
      
      struct handle {
      	struct file_handle fh;
      	unsigned char fid[MAX_HANDLE_SZ];
      };
      
      int main(int argc, char **argv)
      {
      	struct handle h = {{8, 1 }, { 12, }};
      
      	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
      	return 0;
      }
      
      Google-Bug-Id: 120690101
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
  2. 07 Nov, 2018 1 commit
  3. 21 Oct, 2018 1 commit
  4. 03 Oct, 2018 1 commit
  5. 02 Oct, 2018 3 commits
    • ext4: fix reserved cluster accounting at page invalidation time · f456767d
      Eric Whitney committed
      Add new code to count canceled pending cluster reservations on bigalloc
      file systems and to reduce the cluster reservation count on all file
      systems using delayed allocation.  This replaces old code in
      ext4_da_page_release_reservations that was incorrect.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix reserved cluster accounting at delayed write time · 0b02f4c0
      Eric Whitney committed
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
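The check described in the commit above can be illustrated with a small userspace model. This is only a sketch: the block-state array, the `toy_` names, and the `cluster_bits` parameter are hypothetical stand-ins and do not reproduce the kernel's extents status tree or the actual ext4_da_map_blocks code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block states; the kernel tracks these as extents. */
enum toy_blk_state { TOY_HOLE, TOY_DELAYED, TOY_WRITTEN, TOY_UNWRITTEN };

/*
 * Return true if any block sharing a cluster with lblk already has
 * physical space (written or unwritten).
 */
static bool toy_cluster_has_allocated_block(const enum toy_blk_state *blocks,
                                            size_t nblocks, uint32_t lblk,
                                            unsigned cluster_bits)
{
    uint32_t first = (lblk >> cluster_bits) << cluster_bits;
    uint32_t count = 1u << cluster_bits;

    for (uint32_t i = 0; i < count && (size_t)first + i < nblocks; i++) {
        enum toy_blk_state s = blocks[first + i];
        if (s == TOY_WRITTEN || s == TOY_UNWRITTEN)
            return true;
    }
    return false;
}

/*
 * A newly delayed block needs a cluster reservation only when its
 * cluster has no physical space allocated yet.
 */
static bool toy_needs_cluster_reservation(const enum toy_blk_state *blocks,
                                          size_t nblocks, uint32_t lblk,
                                          unsigned cluster_bits)
{
    return !toy_cluster_has_allocated_block(blocks, nblocks, lblk,
                                            cluster_bits);
}
```

With four blocks per cluster (`cluster_bits` of 2), a delayed write to block 2 needs no reservation if block 1 is already written, while a write into an empty cluster does.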
    • ext4: generalize extents status tree search functions · ad431025
      Eric Whitney committed
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
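The idea of searching for caller-specified status values can be sketched with a predicate callback. The following is a toy model over a sorted array, not the kernel's rb-tree code; every `toy_` name and the status bit values are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits, loosely modelled on extent status values. */
enum {
    TOY_ES_WRITTEN   = 1u << 0,
    TOY_ES_UNWRITTEN = 1u << 1,
    TOY_ES_DELAYED   = 1u << 2,
};

struct toy_extent {
    uint32_t lblk;   /* first logical block */
    uint32_t len;    /* number of blocks, at least 1 */
    unsigned status; /* TOY_ES_* bits */
};

/* The caller decides which status values count as a match. */
typedef int (*toy_match_fn)(const struct toy_extent *es);

static int toy_is_delayed(const struct toy_extent *es)
{
    return (es->status & TOY_ES_DELAYED) != 0;
}

static int toy_is_mapped(const struct toy_extent *es)
{
    return (es->status & (TOY_ES_WRITTEN | TOY_ES_UNWRITTEN)) != 0;
}

/*
 * Return the first extent overlapping [start, end] that satisfies
 * match, or NULL. The array is assumed to be sorted by lblk.
 */
static const struct toy_extent *
toy_find_extent_range(const struct toy_extent *tree, size_t n,
                      toy_match_fn match, uint32_t start, uint32_t end)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_extent *es = &tree[i];
        uint32_t es_end = es->lblk + es->len - 1;

        if (es_end < start)
            continue;   /* entirely before the range */
        if (es->lblk > end)
            break;      /* sorted: nothing later can overlap */
        if (match(es))
            return es;
    }
    return NULL;
}
```

A single search routine then serves both "find a delayed extent" and "find a written or unwritten extent" by swapping the predicate, which is the duplication the commit aims to avoid.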
  6. 16 Sep, 2018 2 commits
    • ext4, dax: set ext4_dax_aops for dax files · cce6c9f7
      Toshi Kani committed
      A sync syscall on a DAX file needs to flush the processor cache, but
      it currently does not do so for existing DAX files.  This is because
      'ext4_da_aops' is set as the address_space_operations of existing DAX
      files, instead of 'ext4_dax_aops', since the S_DAX flag is set after
      ext4_set_aops() in the open path.
      
        New file
        --------
        lookup_open
          ext4_create
            __ext4_new_inode
              ext4_set_inode_flags   // Set S_DAX flag
            ext4_set_aops            // Set aops to ext4_dax_aops
      
        Existing file
        -------------
        lookup_open
          ext4_lookup
            ext4_iget
              ext4_set_aops          // Set aops to ext4_da_aops
              ext4_set_inode_flags   // Set S_DAX flag
      
      Change ext4_iget() to initialize i_flags before ext4_set_aops().
      
      Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Suggested-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
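The ordering problem in the call chains above can be modelled in a few lines. This is a hedged sketch with hypothetical `toy_` names and a made-up `dax_requested` field; it only shows why choosing the aops before setting the flag picks the wrong table, and it is not the ext4_iget() code.

```c
#include <stdbool.h>

#define TOY_S_DAX 0x1u

enum toy_aops { TOY_AOPS_UNSET, TOY_DA_AOPS, TOY_DAX_AOPS };

struct toy_inode {
    unsigned i_flags;
    bool dax_requested;  /* stand-in for whatever makes the inode DAX */
    enum toy_aops aops;
};

static void toy_set_inode_flags(struct toy_inode *inode)
{
    if (inode->dax_requested)
        inode->i_flags |= TOY_S_DAX;
}

/* The aops choice depends on the S_DAX flag at the time of the call. */
static void toy_set_aops(struct toy_inode *inode)
{
    inode->aops = (inode->i_flags & TOY_S_DAX) ? TOY_DAX_AOPS : TOY_DA_AOPS;
}

/* Old order for existing files: aops chosen before the flag is set. */
static enum toy_aops toy_iget_old(bool dax)
{
    struct toy_inode inode = { 0, dax, TOY_AOPS_UNSET };

    toy_set_aops(&inode);
    toy_set_inode_flags(&inode);
    return inode.aops;
}

/* Fixed order: initialize i_flags first, then choose the aops. */
static enum toy_aops toy_iget_fixed(bool dax)
{
    struct toy_inode inode = { 0, dax, TOY_AOPS_UNSET };

    toy_set_inode_flags(&inode);
    toy_set_aops(&inode);
    return inode.aops;
}
```

In this model `toy_iget_old(true)` ends up with the delayed-allocation table even though the inode is DAX, while `toy_iget_fixed(true)` selects the DAX table.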
    • ext4, dax: add ext4_bmap to ext4_dax_aops · 94dbb631
      Toshi Kani committed
      The ext4 mount path calls .bmap on the journal inode. This currently
      works for the DAX mount case because ext4_iget() always sets
      'ext4_da_aops' for regular files.
      
      In preparation to fix ext4_iget() to set 'ext4_dax_aops' for ext4
      DAX files, add ext4_bmap() to 'ext4_dax_aops', since bmap works for
      DAX inodes.
      
      Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Suggested-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
  7. 12 Sep, 2018 1 commit
  8. 02 Sep, 2018 1 commit
  9. 18 Aug, 2018 1 commit
  10. 02 Aug, 2018 1 commit
  11. 30 Jul, 2018 2 commits
  12. 10 Jul, 2018 1 commit
  13. 17 Jun, 2018 1 commit
  14. 16 Jun, 2018 1 commit
  15. 23 May, 2018 1 commit
  16. 14 May, 2018 2 commits
  17. 10 May, 2018 1 commit
    • ext4: use raw i_version value for ea_inode · e254d1af
      Eryu Guan committed
      Currently, creating large xattr (e.g. 2k) in ea_inode would cause
      ea_inode refcount corruption, e.g.
      
        Pass 4: Checking reference counts
        Extended attribute inode 13 ref count is 0, should be 1. Fix? no
      
      This is because we save the lower 32 bits of the refcount in
      inode->i_version and store it in raw_inode->i_disk_version on disk.
      But since commit ee73f9a5 ("ext4: convert to new i_version
      API"), we load/store modified i_disk_version from/to disk instead of
      raw value, which causes on-disk ea_inode refcount corruption.
      
      Fix it by loading/storing raw i_version/i_disk_version, because it's
      a self-managed value in this case.
      
      Fixes: ee73f9a5 ("ext4: convert to new i_version API")
      Cc: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
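One way to picture the corruption described above is a counter API that keeps a flag in the low bit of the in-memory value and strips it when storing to disk. The sketch below assumes a one-bit shift, which is an assumption made here for illustration and not something the commit message states. All `toy_` names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit 0 is a flag, the counter lives in the bits above. */
#define TOY_QUERIED_SHIFT 1
#define TOY_QUERIED_FLAG  1ULL

/* API-style read used when writing the inode to disk: drops the flag bit. */
static uint64_t toy_peek_iversion(uint64_t in_memory)
{
    return in_memory >> TOY_QUERIED_SHIFT;
}

/* API-style write used when loading from disk: restores the layout. */
static uint64_t toy_set_iversion_queried(uint64_t on_disk)
{
    return (on_disk << TOY_QUERIED_SHIFT) | TOY_QUERIED_FLAG;
}

/*
 * A self-managed refcount is placed in the field as-is. Storing it
 * through the API shifts it, so a refcount of 1 reaches the disk as 0.
 */
static uint32_t toy_refcount_on_disk_via_api(uint32_t refcount)
{
    uint64_t in_memory = refcount;  /* raw value, no flag layout */

    return (uint32_t)toy_peek_iversion(in_memory);
}

/* Storing the raw value keeps the refcount intact. */
static uint32_t toy_refcount_on_disk_raw(uint32_t refcount)
{
    return refcount;
}
```

Under that assumption a refcount of 1 is written out as 0, which lines up with the e2fsck complaint quoted in the commit message.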
  18. 31 Mar, 2018 1 commit
    • ext4, dax: introduce ext4_dax_aops · 5f0663bb
      Dan Williams committed
      In preparation for the dax implementation to start associating dax pages
      to inodes via page->mapping, we need to provide a 'struct
      address_space_operations' instance for dax. Otherwise, direct-I/O
      triggers incorrect page cache assumptions and warnings.
      
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: linux-ext4@vger.kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  19. 30 Mar, 2018 1 commit
  20. 28 Mar, 2018 1 commit
  21. 26 Mar, 2018 1 commit
  22. 22 Mar, 2018 4 commits
    • ext4: remove EXT4_STATE_DIOREAD_LOCK flag · 1d39834f
      Nikolay Borisov committed
      Commit 16c54688 ("ext4: Allow parallel DIO reads") reworked the way
      locking happens around parallel dio reads. This resulted in obviating
      the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic.
      Currently this amounts to dead code, so let's remove it. No functional
      changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin() · fe23cb65
      Jiri Slaby committed
      ext4_iomap_begin() has a bug where offset returned in the iomap
      structure will be truncated to unsigned long size. On 64-bit
      architectures this is fine but on 32-bit architectures obviously not.
      Not many places actually use the offset stored in the iomap structure
      but one of visible failures is in SEEK_HOLE / SEEK_DATA implementation.
      If we create a file like:
      
      dd if=/dev/urandom of=file bs=1k seek=8m count=1
      
      then
      
      lseek64("file", 0x100000000ULL, SEEK_DATA)
      
      wrongly returns 0x100000000 on unfixed kernel while it should return
      0x200000000. Avoid the overflow by proper type cast.
      
      Fixes: 545052e9 ("ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org # v4.15
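The truncation described in the commit above can be reproduced in userspace by doing the block-to-byte shift in 32 bits versus 64 bits. This is a minimal sketch: `uint32_t` stands in for a 32-bit unsigned long, the function names are hypothetical, and the 4 KiB block size used in the example is an assumption.

```c
#include <stdint.h>

/*
 * The shift is performed in 32 bits, as it would be with a 32-bit
 * unsigned long, so the high bits are lost before widening.
 */
static uint64_t toy_offset_truncated(uint32_t lblk, unsigned blkbits)
{
    uint32_t off = lblk << blkbits;

    return off;
}

/* Casting to a 64-bit type before shifting keeps the full offset. */
static uint64_t toy_offset_widened(uint32_t lblk, unsigned blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```

For the file created by the dd command above, the data sits at byte offset 0x200000000. With 4 KiB blocks that is logical block 0x200000: the 32-bit shift wraps to 0, while the widened shift yields the correct 0x200000000.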
    • ext4: update i_disksize if direct write past ondisk size · 45d8ec4d
      Eryu Guan committed
      Currently in ext4 direct write path, we update i_disksize only when
      new eof is greater than i_size, and don't update it even when new
      eof is greater than i_disksize but less than i_size. This doesn't
      work well with delalloc buffer write, which updates i_size and
      i_disksize only when delalloc blocks are resolved (at writeback
      time), the i_disksize from direct write can be lost if a previous
      buffer write succeeded at write time but failed at writeback time,
      then results in corrupted ondisk inode size.
      
      Consider this case, first buffer write 4k data to a new file at
      offset 16k with delayed allocation, then direct write 4k data to the
      same file at offset 4k before delalloc blocks are resolved, which
      doesn't update i_disksize because it writes within i_size(20k), but
      the extent tree metadata has been committed in journal. Then
      writeback of the delalloc blocks fails (due to device error etc.),
      and i_size/i_disksize from buffer write can't be written to disk
      (still zero). A subsequent umount/mount cycle recovers journal and
      writes extent tree metadata from direct write to disk, but with
      i_disksize being zero.
      
      Fix it by updating i_disksize too in direct write path when new eof
      is greater than i_disksize but less than i_size, so i_disksize is
      always consistent with direct write.
      
      This fixes occasional i_size corruption in fstests generic/475.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
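The scenario in the commit above can be walked through with a two-field model of the inode sizes. This is a sketch under stated assumptions: the struct and function names are hypothetical, and the two helpers only mirror the old and new update conditions described in the message, not the actual direct write path.

```c
#include <stdint.h>

struct toy_inode {
    uint64_t i_size;     /* in-memory file size */
    uint64_t i_disksize; /* size recorded for the on-disk inode */
};

/* Old behaviour: i_disksize moves only when the write extends i_size. */
static void toy_dio_end_old(struct toy_inode *inode, uint64_t new_eof)
{
    if (new_eof > inode->i_size) {
        inode->i_size = new_eof;
        inode->i_disksize = new_eof;
    }
}

/* Fixed behaviour: i_disksize also moves when new_eof exceeds it. */
static void toy_dio_end_fixed(struct toy_inode *inode, uint64_t new_eof)
{
    if (new_eof > inode->i_size)
        inode->i_size = new_eof;
    if (new_eof > inode->i_disksize)
        inode->i_disksize = new_eof;
}
```

Starting from `i_size` of 20k and `i_disksize` of 0 (the delalloc buffered write has not been written back yet), a direct write ending at 8k leaves `i_disksize` at 0 with the old condition and raises it to 8k with the fixed one.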
    • ext4: protect i_disksize update by i_data_sem in direct write path · 73fdad00
      Eryu Guan committed
      i_disksize updates should be protected by i_data_sem, by either taking
      the lock explicitly or by using the ext4_update_i_disksize() helper. But
      the i_disksize updates in ext4_direct_IO_write() are not protected at
      all, so they may race with i_disksize updates in the writeback path of
      delalloc buffered writes.
      
      This is found by code inspection, and I didn't hit any i_disksize
      corruption due to this bug. Thanks to Jan Kara for catching this bug and
      suggesting the fix!
      Reported-by: Jan Kara <jack@suse.cz>
      Suggested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  23. 29 Jan, 2018 2 commits
  24. 10 Jan, 2018 1 commit
    • ext4: fix a race in the ext4 shutdown path · abbc3f93
      Harshad Shirwadkar committed
      This patch fixes a race between the shutdown path and bio completion
      handling. In the ext4 direct io path with async io, after submitting a
      bio to the block layer, if journal starting fails,
      ext4_direct_IO_write() would bail out pretending that the IO
      failed. The caller would have had no way of knowing whether or not the
      IO was successfully submitted. So instead, we return -EIOCBQUEUED in
      this case. Now, the caller knows that the IO was submitted.  The bio
      completion handler takes care of the error.
      
      Tested: Ran the shutdown xfstest test 461 in loop for over 2 hours across
      4 machines resulting in over 400 runs. Verified that the race didn't
      occur. Usually the race was seen in about 20-30 iterations.
      Signed-off-by: Harshad Shirwadkar <harshads@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  25. 04 Dec, 2017 1 commit
    • ext4: support fast symlinks from ext3 file systems · fc82228a
      Andi Kleen committed
      407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
      broke ~10 years old ext3 file systems created by 2.6.17. Any ELF
      executable fails because the /lib/ld-linux.so.2 fast symlink
      cannot be read anymore.
      
      The patch assumed fast symlinks were created in a specific way,
      but that's not true on these really old file systems.
      
      The new behavior is apparently needed only with the large EA inode
      feature.
      
      Revert to the old behavior if the large EA inode feature is not set.
      
      This makes my old VM boot again.
      
      Fixes: 407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: stable@vger.kernel.org
  26. 28 Nov, 2017 1 commit
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds committed
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 16 Nov, 2017 3 commits
  28. 14 Nov, 2017 1 commit
  29. 03 Nov, 2017 1 commit
    • ext4: Support for synchronous DAX faults · b8a6176c
      Jan Kara committed
      We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
      prepare blocks for writing and the inode has some uncommitted metadata
      changes. In the fault handler ext4_dax_fault() we then detect this case
      (through VM_FAULT_NEEDDSYNC return value) and call helper
      dax_finish_sync_fault() to flush metadata changes and insert page table
      entry. Note that this will also dirty corresponding radix tree entry
      which is what we want - fsync(2) will still provide data integrity
      guarantees for applications not using userspace flushing. And
      applications using userspace flushing can avoid calling fsync(2) and
      thus avoid the performance overhead.
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>