1. 22 Mar, 2018: 2 commits
    • ext4: update i_disksize if direct write past ondisk size · 45d8ec4d
      Eryu Guan committed
      Currently, in the ext4 direct write path, we update i_disksize only when
      the new EOF is greater than i_size, and don't update it even when the new
      EOF is greater than i_disksize but less than i_size. This doesn't work
      well with delalloc buffered writes, which update i_size and i_disksize
      only when delalloc blocks are resolved (at writeback time). The
      i_disksize from a direct write can be lost if a previous buffered write
      succeeded at write time but failed at writeback time, which results in
      a corrupted on-disk inode size.
      
      Consider this case: first, a buffered write of 4k of data to a new file
      at offset 16k with delayed allocation; then a direct write of 4k of data
      to the same file at offset 4k before the delalloc blocks are resolved.
      The direct write doesn't update i_disksize because it writes within
      i_size (20k), but the extent tree metadata has been committed in the
      journal. Then writeback of the delalloc blocks fails (due to a device
      error etc.), and i_size/i_disksize from the buffered write can't be
      written to disk (still zero). A subsequent umount/mount cycle recovers
      the journal and writes the extent tree metadata from the direct write
      to disk, but with i_disksize being zero.
      
      Fix it by also updating i_disksize in the direct write path when the new
      EOF is greater than i_disksize but less than i_size, so that i_disksize
      is always consistent with direct writes.
      
      This fixes occasional i_size corruption in fstests generic/475.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
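The size-update rule that this commit changes can be sketched in userspace C. This is a simplified model, not the ext4 code: `struct sketch_inode`, `dio_end_old()` and `dio_end_new()` are hypothetical names, and only the two size fields are modeled (no journaling, locking or extent handling).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two size fields of an ext4 inode. */
struct sketch_inode {
	int64_t i_size;     /* in-memory size, updated at write time */
	int64_t i_disksize; /* size that reaches the on-disk inode */
};

/* Old rule: i_disksize only moves when the direct write extends i_size. */
static void dio_end_old(struct sketch_inode *inode, int64_t new_eof)
{
	if (new_eof > inode->i_size) {
		inode->i_size = new_eof;
		inode->i_disksize = new_eof;
	}
}

/* New rule: move i_disksize whenever the new EOF is past it,
 * even if the new EOF is still within i_size. */
static void dio_end_new(struct sketch_inode *inode, int64_t new_eof)
{
	if (new_eof > inode->i_size)
		inode->i_size = new_eof;
	if (new_eof > inode->i_disksize)
		inode->i_disksize = new_eof;
	/* The on-disk size never runs ahead of the in-memory size. */
	assert(inode->i_disksize <= inode->i_size);
}
```

Replaying the example from the commit message, an inode with `i_size = 20480` and `i_disksize = 0` (delalloc data not yet written back) keeps `i_disksize` at 0 under the old rule after a direct write ending at 8192, and reaches 8192 under the new one.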
    • ext4: protect i_disksize update by i_data_sem in direct write path · 73fdad00
      Eryu Guan committed
      i_disksize updates should be protected by i_data_sem, either by taking
      the lock explicitly or by using the ext4_update_i_disksize() helper. But
      the i_disksize updates in ext4_direct_IO_write() are not protected at
      all, so they may race with the i_disksize updates done in the writeback
      path of delalloc buffered writes.
      
      This was found by code inspection; I didn't hit any i_disksize
      corruption due to this bug. Thanks to Jan Kara for catching this bug and
      suggesting the fix!
      Reported-by: Jan Kara <jack@suse.cz>
      Suggested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  2. 29 Jan, 2018: 2 commits
  3. 10 Jan, 2018: 1 commit
    • ext4: fix a race in the ext4 shutdown path · abbc3f93
      Harshad Shirwadkar committed
      This patch fixes a race between the shutdown path and bio completion
      handling. In the ext4 direct io path with async io, after submitting a
      bio to the block layer, if journal starting fails,
      ext4_direct_IO_write() would bail out pretending that the IO
      failed. The caller would have had no way of knowing whether or not the
      IO was successfully submitted. So instead, we return -EIOCBQUEUED in
      this case. Now, the caller knows that the IO was submitted.  The bio
      completion handler takes care of the error.
      
      Tested: Ran the shutdown xfstests test 461 in a loop for over 2 hours
      across 4 machines, for over 400 runs in total. Verified that the race
      didn't occur. Usually the race was seen within about 20-30 iterations.
      Signed-off-by: Harshad Shirwadkar <harshads@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
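The return-value decision can be sketched as a tiny function. It is a hypothetical model rather than `ext4_direct_IO_write()`: the boolean inputs and the `SKETCH_*` error values are made up (in the kernel, `EIOCBQUEUED` is an internal code meaning an async I/O was queued and will complete later).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error values for this sketch only. */
enum { SKETCH_EIO = 1, SKETCH_EIOCBQUEUED = 2 };

/*
 * What the direct-write path reports when starting the journal handle
 * fails. If an async bio has already been submitted, its completion
 * handler owns the error, so report "queued" instead of a failure.
 */
static int sketch_dio_status_on_journal_failure(bool is_async, bool bio_submitted)
{
	if (is_async && bio_submitted)
		return -SKETCH_EIOCBQUEUED;
	return -SKETCH_EIO;
}
```

Once an async bio is in flight, its completion handler will run regardless, so returning a plain failure would leave the caller unable to tell whether the I/O happened.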
  4. 04 Dec, 2017: 1 commit
    • ext4: support fast symlinks from ext3 file systems · fc82228a
      Andi Kleen committed
      407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
      broke ~10-year-old ext3 file systems created by 2.6.17. Any ELF
      executable fails because the /lib/ld-linux.so.2 fast symlink
      cannot be read anymore.
      
      The patch assumed fast symlinks were created in a specific way,
      but that's not true on these really old file systems.
      
      The new behavior is apparently needed only with the large EA inode
      feature.
      
      Revert to the old behavior if the large EA inode feature is not set.
      
      This makes my old VM boot again.
      
      Fixes: 407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: stable@vger.kernel.org
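The reverted-to logic can be sketched as a two-branch test. This follows the commit message's description (the old `i_blocks`-based rule unless the large EA inode feature is in use) rather than the exact kernel source; the struct fields, the `ea_inode_feature` parameter and the cluster-size handling are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of inode state relevant to the fast-symlink test. */
struct sketch_inode {
	bool is_symlink;
	bool has_inline_data;
	bool has_xattr_block; /* an external xattr block is attached */
	uint64_t i_size;      /* length of the symlink target */
	uint64_t i_blocks;    /* allocated space in 512-byte sectors */
};

static bool sketch_is_fast_symlink(const struct sketch_inode *inode,
				   bool ea_inode_feature, uint32_t cluster_size)
{
	if (!ea_inode_feature) {
		/* Old rule: if the only allocated space is the xattr block
		 * (if any), the target must live in i_data. */
		uint64_t ea_blocks = inode->has_xattr_block ? cluster_size >> 9 : 0;

		if (inode->has_inline_data)
			return false;
		return inode->is_symlink && inode->i_blocks == ea_blocks;
	}
	/* Rule from 407cd7fb: a non-empty target shorter than 60 bytes. */
	return inode->is_symlink && inode->i_size != 0 && inode->i_size < 60;
}
```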
  5. 28 Nov, 2017: 1 commit
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds committed
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 16 Nov, 2017: 3 commits
  7. 14 Nov, 2017: 1 commit
  8. 03 Nov, 2017: 1 commit
    • ext4: Support for synchronous DAX faults · b8a6176c
      Jan Kara committed
      We return the IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
      prepare blocks for writing and the inode has some uncommitted metadata
      changes. In the fault handler ext4_dax_fault() we then detect this case
      (through the VM_FAULT_NEEDDSYNC return value) and call the helper
      dax_finish_sync_fault() to flush metadata changes and insert the page
      table entry. Note that this will also dirty the corresponding radix tree
      entry, which is what we want: fsync(2) will still provide data integrity
      guarantees for applications not using userspace flushing, and
      applications using userspace flushing can avoid calling fsync(2) and
      thus avoid the performance overhead.
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
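The fault-time handshake can be modeled as a small state function. Everything here is hypothetical: the `SKETCH_*` flag values, the `map_sync` input (the commit message does not mention it; this assumes synchronous faults apply only to mappings that asked for them, as with `MAP_SYNC`), and the counter that stands in for `dax_finish_sync_fault()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only the names follow the commit message. */
#define SKETCH_IOMAP_F_DIRTY      0x1u
#define SKETCH_VM_FAULT_NEEDDSYNC 0x2u

/* Like ext4_iomap_begin(): report uncommitted metadata for a write mapping. */
static unsigned sketch_iomap_begin(bool write, bool metadata_uncommitted)
{
	return (write && metadata_uncommitted) ? SKETCH_IOMAP_F_DIRTY : 0;
}

/* Like the fault path: request a sync step when the mapping needs it. */
static unsigned sketch_dax_fault(bool write, bool map_sync,
				 bool metadata_uncommitted, int *sync_flushes)
{
	unsigned iomap_flags = sketch_iomap_begin(write, metadata_uncommitted);

	if (map_sync && (iomap_flags & SKETCH_IOMAP_F_DIRTY)) {
		/* Stands in for dax_finish_sync_fault(): flush the metadata
		 * before the page table entry is installed. */
		(*sync_flushes)++;
		return SKETCH_VM_FAULT_NEEDDSYNC;
	}
	return 0;
}
```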
  9. 02 Nov, 2017: 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to apply to
      a file was done in a spreadsheet of side-by-side results of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      The criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source.
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note"; otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based in part on an older version of FOSSology, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with the SPDX license identifiers in the
      files he inspected. For the non-uapi files, Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
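The point that C sources and headers need different comment styles can be made concrete with a small helper. `spdx_tag_line()` is hypothetical and is not the script described above; the comment forms follow the convention later documented in the kernel's Documentation/process/license-rules.rst (`//` in .c files, `/* */` in headers). Other file types such as Makefiles and Kconfig files use `#` comments and are left out of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the SPDX tag line for a C source or header file.
 * Returns 0 on success, -1 for unsupported file types or a short buffer.
 */
static int spdx_tag_line(const char *path, const char *license_id,
			 char *buf, size_t len)
{
	const char *dot = strrchr(path, '.');
	int n;

	if (dot && strcmp(dot, ".c") == 0)
		n = snprintf(buf, len, "// SPDX-License-Identifier: %s", license_id);
	else if (dot && strcmp(dot, ".h") == 0)
		n = snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license_id);
	else
		return -1; /* not handled in this sketch */

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, `spdx_tag_line("include/uapi/linux/fs.h", "GPL-2.0 WITH Linux-syscall-note", buf, sizeof buf)` produces a block-comment tag, while a .c path produces a `//` tag.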
  10. 19 Oct, 2017: 2 commits
    • ext4: switch to fscrypt_prepare_setattr() · 3ce2b8dd
      Eric Biggers committed
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • fs, fscrypt: add an S_ENCRYPTED inode flag · 2ee6a576
      Eric Biggers committed
      Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate
      that the inode is encrypted using the fscrypt (fs/crypto/) mechanism.
      
      Checking this flag will give the same information that
      inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more
      efficient.  This will be useful for adding higher-level helper functions
      for filesystems to use.  For example we'll be able to replace this:
      
      	if (ext4_encrypted_inode(inode)) {
      		ret = fscrypt_get_encryption_info(inode);
      		if (ret)
      			return ret;
      		if (!fscrypt_has_encryption_key(inode))
      			return -ENOKEY;
      	}
      
      with this:
      
      	ret = fscrypt_require_key(inode);
      	if (ret)
      		return ret;
      
      ... since we'll be able to retain the fast path for unencrypted files as
      a single flag check, using an inline function.  This wasn't possible
      before because we'd have had to frequently call through the
      ->i_sb->s_cop->is_encrypted function pointer, even when the encryption
      support was disabled or not being used.
      
      Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is
      disabled because we want to continue to return an error if an encrypted
      file is accessed without encryption support, rather than pretending that
      it is unencrypted.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
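The fast-path argument can be illustrated with a userspace model. `SKETCH_S_ENCRYPTED`, the `has_key` field and both helpers are hypothetical stand-ins for `S_ENCRYPTED`, `fscrypt_has_encryption_key()` and `fscrypt_require_key()`; the flag's bit value is arbitrary. For an unencrypted inode the whole check is one inline bit test, with no call through a function pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real S_ENCRYPTED value lives in the kernel headers. */
#define SKETCH_S_ENCRYPTED (1u << 14)

struct sketch_inode {
	unsigned int i_flags;
	bool has_key; /* stands in for fscrypt_has_encryption_key() */
};

static inline bool sketch_is_encrypted(const struct sketch_inode *inode)
{
	return inode->i_flags & SKETCH_S_ENCRYPTED;
}

/* Shaped like the fscrypt_require_key() call shown above. */
static inline int sketch_require_key(const struct sketch_inode *inode)
{
	if (!sketch_is_encrypted(inode))
		return 0; /* fast path: unencrypted, no key needed */
	return inode->has_key ? 0 : -ENOKEY;
}
```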
  11. 13 Oct, 2017: 1 commit
  12. 12 Oct, 2017: 1 commit
  13. 02 Oct, 2017: 3 commits
  14. 07 Sep, 2017: 3 commits
  15. 01 Sep, 2017: 1 commit
  16. 25 Aug, 2017: 1 commit
    • ext4: backward compatibility support for Lustre ea_inode implementation · a6d05676
      Tahsin Erdogan committed
      The original Lustre ea_inode feature did not have ref counts on xattr
      inodes because there was always one parent that referenced each one. The
      new implementation expects the ref count to be initialized, which is not
      true in the Lustre case. Handle this by detecting a Lustre-created xattr
      inode and setting its ref count to 1.
      
      The quota handling of xattr inodes has also changed with deduplication
      support. The new implementation manually manages quotas to support
      sharing across multiple users. A consequence is that a referencing inode
      incorporates the blocks of the xattr inode into its own i_blocks field.
      
      We need to know how an xattr inode was created so that we can reverse the
      block charges during reference removal. This is handled by introducing an
      EXT4_STATE_LUSTRE_EA_INODE flag. The flag is set on an xattr inode if the
      inode appears to have been created by Lustre. During xattr inode reference
      removal, the manual quota uncharge is skipped if the flag is set.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
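The reference-count and quota bookkeeping described above can be modeled as follows. All names are hypothetical; `lustre_ea_inode` stands in for the `EXT4_STATE_LUSTRE_EA_INODE` state flag, and how ext4 actually recognizes a Lustre-created xattr inode is not shown (the caller passes that decision in).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified xattr inode. */
struct sketch_ea_inode {
	uint64_t ref_count;
	uint64_t blocks;      /* blocks holding the xattr value */
	bool lustre_ea_inode; /* stands in for EXT4_STATE_LUSTRE_EA_INODE */
};

/* Quota-visible block count of the referencing (parent) inode. */
struct sketch_parent {
	uint64_t i_blocks;
};

/*
 * On load: a Lustre-created xattr inode has no ref count but always exactly
 * one parent, so give it a count of 1 and remember where it came from.
 */
static void sketch_ea_inode_loaded(struct sketch_ea_inode *ea, bool looks_lustre_created)
{
	if (looks_lustre_created) {
		ea->ref_count = 1;
		ea->lustre_ea_inode = true;
	}
}

/*
 * Drop one reference. Only xattr inodes created by the new code had their
 * blocks charged to the parent, so only those are uncharged.
 */
static void sketch_drop_ref(struct sketch_ea_inode *ea, struct sketch_parent *parent)
{
	assert(ea->ref_count > 0);
	if (!ea->lustre_ea_inode) {
		assert(parent->i_blocks >= ea->blocks);
		parent->i_blocks -= ea->blocks;
	}
	ea->ref_count--;
}
```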
  17. 06 Aug, 2017: 4 commits
  18. 31 Jul, 2017: 1 commit
  19. 04 Jul, 2017: 1 commit
    • ext4: change fast symlink test to not rely on i_blocks · 407cd7fb
      Tahsin Erdogan committed
      ext4_inode_info->i_data is the storage area for 4 types of data:
      
        a) Extents data
        b) Inline data
        c) Block map
        d) Fast symlink data (symlink length < 60)
      
      Extents data case is positively identified by EXT4_INODE_EXTENTS flag.
      Inline data case is also obvious because of EXT4_INODE_INLINE_DATA
      flag.
      
      Distinguishing c) and d) however requires additional logic. This
      currently relies on i_blocks count. After subtracting external xattr
      block from i_blocks, if it is greater than 0 then we know that some
      data blocks exist, so there must be a block map.
      
      This logic got broken after ea_inode feature was added. That feature
      charges the data blocks of external xattr inodes to the referencing
      inode and so adds them to the i_blocks. To fix this, we could subtract
      ea_inode blocks by iterating through all xattr entries and then check
      whether remaining i_blocks count is zero. Besides being complicated,
      this won't change the fact that the current way of distinguishing
      between c) and d) is fragile.
      
      The alternative solution is to test whether i_size is less than 60 to
      determine fast symlink case. ext4_symlink() uses the same test to decide
      whether to store the symlink in i_data. There is one caveat to address
      before this can work though.
      
      If an inode's i_nlink is zero during eviction, its i_size is set to
      zero and its data is truncated. If the system crashes before the inode
      is removed from the orphan list, orphan cleanup on the next boot may find
      the inode with zero i_size. So, a symlink that had its data stored in a
      block may now appear to be a fast symlink. The solution used in this
      patch is to treat i_size = 0 as a non-fast symlink case. A zero-sized
      symlink is not legal, so the only time this can happen is the mentioned
      scenario. This is also logically correct because an i_size = 0 symlink
      has no data stored in i_data.
      Suggested-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
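The classification of `i_data` contents described above can be sketched as one function. The struct fields and enum are hypothetical stand-ins for the inode flags named in the commit message, and the order of the checks is an assumption; `SKETCH_N_BLOCKS * 4` gives the 60-byte size of `i_data` (15 32-bit slots).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_N_BLOCKS 15 /* i_data is 15 32-bit slots, i.e. 60 bytes */

enum sketch_idata_kind {
	IDATA_EXTENTS,
	IDATA_INLINE,
	IDATA_FAST_SYMLINK,
	IDATA_BLOCK_MAP,
};

/* Hypothetical subset of inode state. */
struct sketch_inode {
	bool extents_fl;     /* stands in for EXT4_INODE_EXTENTS */
	bool inline_data_fl; /* stands in for EXT4_INODE_INLINE_DATA */
	bool is_symlink;
	uint64_t i_size;
};

static enum sketch_idata_kind sketch_classify_idata(const struct sketch_inode *inode)
{
	if (inode->extents_fl)
		return IDATA_EXTENTS;
	if (inode->inline_data_fl)
		return IDATA_INLINE;
	/* i_size == 0 only happens for a truncated orphan, never a fast symlink. */
	if (inode->is_symlink && inode->i_size != 0 &&
	    inode->i_size < SKETCH_N_BLOCKS * 4)
		return IDATA_FAST_SYMLINK;
	return IDATA_BLOCK_MAP;
}
```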
  20. 24 Jun, 2017: 1 commit
    • ext4: require key for truncate(2) of encrypted file · 63136858
      Eric Biggers committed
      Currently, filesystems allow truncate(2) on an encrypted file without
      the encryption key.  However, it's impossible to correctly handle the
      case where the size being truncated to is not a multiple of the
      filesystem block size, because that would require decrypting the final
      block, zeroing the part beyond i_size, then encrypting the block.
      
      As other modifications to encrypted file contents are prohibited without
      the key, just prohibit truncate(2) as well, making it fail with ENOKEY.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
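The reasoning about unaligned truncation can be made concrete with two hypothetical helpers, neither of which is the kernel's code: one computes how much of the new final block would survive (the part that would need decrypting, zeroing past i_size and re-encrypting), the other applies the commit's simpler policy of refusing every truncate without the key.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes of the new final block that would survive a truncate to new_size.
 * Non-zero means that block must be decrypted, zeroed past new_size and
 * re-encrypted, which is impossible without the key.
 */
static uint32_t sketch_partial_tail(uint64_t new_size, uint32_t blocksize)
{
	return (uint32_t)(new_size % blocksize);
}

/*
 * Policy adopted by the commit: without the key, refuse every truncate of an
 * encrypted file, even block-aligned ones, like other content changes.
 */
static int sketch_may_truncate(bool encrypted, bool has_key)
{
	return (encrypted && !has_key) ? -ENOKEY : 0;
}
```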
  21. 23 Jun, 2017: 1 commit
  22. 22 Jun, 2017: 7 commits
    • quota: add get_inode_usage callback to transfer multi-inode charges · 7a9ca53a
      Tahsin Erdogan committed
      The ext4 ea_inode feature allows storing xattr values in external inodes,
      so that values bigger than a block can be stored. Ext4 also has
      deduplication support for this type of inode. With deduplication, the
      actual storage waste is eliminated, but the users of such inodes are
      still charged full quota for the inodes as if there was no sharing
      happening in the background.
      
      This design requires ext4 to manually charge the users because the
      inodes are shared.
      
      An implication of this is that if someone calls chown on a file that
      has such references, we need to transfer the quota for the file and its
      xattr inodes. The current dquot_transfer() function implicitly
      transfers one inode charge. With the ea_inode feature, we would like to
      transfer multiple inode charges.
      
      Add get_inode_usage callback which can interrogate the total number of
      inodes that were charged for a given inode.
      
      [ Applied fix from Colin King to make sure the 'ret' variable is
        initialized on the successful return path.  Detected by
        CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso]
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Jan Kara <jack@suse.cz>
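A sketch of how an optional `get_inode_usage`-style callback changes the number of inode charges moved by a transfer. The ops table, struct fields and helper names are hypothetical and much simpler than the kernel's `struct dquot_operations`; only the fallback of one inode charge comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_inode;

/* Hypothetical ops table with an optional callback. */
struct sketch_dquot_ops {
	int (*get_inode_usage)(struct sketch_inode *inode, int64_t *usage);
};

struct sketch_inode {
	int64_t xattr_inode_refs; /* ea_inodes this inode references */
	const struct sketch_dquot_ops *ops;
};

/* Filesystem side: the inode itself plus each referenced xattr inode. */
static int sketch_ext4_get_inode_usage(struct sketch_inode *inode, int64_t *usage)
{
	*usage = 1 + inode->xattr_inode_refs;
	return 0;
}

/* How many inode charges a chown-style transfer should move. */
static int sketch_transfer_inode_count(struct sketch_inode *inode, int64_t *count)
{
	if (inode->ops && inode->ops->get_inode_usage)
		return inode->ops->get_inode_usage(inode, count);
	*count = 1; /* previous behaviour: exactly one inode charge */
	return 0;
}
```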
    • ext4: xattr inode deduplication · dec214d0
      Tahsin Erdogan committed
      Ext4 now supports xattr values that are up to 64k in size (vfs limit).
      Large xattr values are stored in external inodes each one holding a
      single value. Once written the data blocks of these inodes are immutable.
      
      The real world use cases are expected to have a lot of value duplication
      such as inherited acls etc. To reduce data duplication on disk, this patch
      implements a deduplicator that allows sharing of xattr inodes.
      
      The deduplication is based on an in-memory hash lookup that is a best
      effort sharing scheme. When a xattr inode is read from disk (i.e.
      getxattr() call), its crc32c hash is added to a hash table. Before
      creating a new xattr inode for a value being set, the hash table is
      checked to see if an existing inode holds an identical value. If such an
      inode is found, the ref count on that inode is incremented. On value
      removal the ref count is decremented and if it reaches zero the inode is
      deleted.
      
      The quota charging for such inodes is manually managed. Every reference
      holder is charged the full size as if there was no sharing happening.
      This is consistent with how xattr blocks are also charged.
      
      [ Fixed up journal credits calculation to handle inline data and the
        rare case where a shared xattr block can get freed when two threads
        race on breaking the xattr block sharing. --tytso ]
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
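The share-or-create logic can be sketched in userspace C. Assumptions: a plain unseeded CRC-32C stands in for the hash ext4 uses, a fixed-size array scanned linearly stands in for the in-memory hash table, values are cached when created (the commit message describes adding them when read from disk), and quota charging is left out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Plain bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

#define SKETCH_SLOTS 64

/* Hypothetical in-memory record for one shareable xattr inode. */
struct sketch_ea_inode {
	uint32_t hash;
	size_t len;
	char *value;       /* immutable once written */
	unsigned refcount;
};

static struct sketch_ea_inode *cache[SKETCH_SLOTS];

/* Get an xattr inode holding `value`, sharing a cached identical one if any. */
static struct sketch_ea_inode *sketch_get_ea_inode(const char *value, size_t len)
{
	uint32_t h = crc32c(value, len);
	int free_slot = -1;

	for (int i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_ea_inode *ea = cache[i];

		if (!ea) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		/* A hash match is only a hint; confirm by comparing the bytes. */
		if (ea->hash == h && ea->len == len &&
		    memcmp(ea->value, value, len) == 0) {
			ea->refcount++;
			return ea;
		}
	}
	if (free_slot < 0)
		return NULL; /* cache full; a best-effort scheme could fall back to an unshared inode */

	struct sketch_ea_inode *ea = malloc(sizeof(*ea));
	if (!ea)
		return NULL;
	ea->value = malloc(len ? len : 1);
	if (!ea->value) {
		free(ea);
		return NULL;
	}
	memcpy(ea->value, value, len);
	ea->hash = h;
	ea->len = len;
	ea->refcount = 1;
	cache[free_slot] = ea;
	return ea;
}

/* Drop one reference; delete the inode when the last one goes away. */
static void sketch_put_ea_inode(struct sketch_ea_inode *ea)
{
	if (--ea->refcount > 0)
		return;
	for (int i = 0; i < SKETCH_SLOTS; i++)
		if (cache[i] == ea)
			cache[i] = NULL;
	free(ea->value);
	free(ea);
}
```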
    • ext4: cleanup transaction restarts during inode deletion · 30a7eb97
      Tahsin Erdogan committed
      During inode deletion, the number of journal credits that will be
      needed is hard to determine.  For that reason we have journal
      extend/restart calls in several places.  Whenever a transaction is
      restarted, filesystem must be in a consistent state because there is
      no atomicity guarantee beyond a restart call.
      
      Add ext4_xattr_ensure_credits() helper function which takes care of
      journal extend/restart logic.  It also handles getting jbd2 write
      access and dirty metadata calls.  This function is called at every
      iteration of handling an ea_inode reference.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
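The extend-or-restart decision can be modeled with a toy journal handle. Everything here is hypothetical: the credit numbers, the `sketch_*` names, and the omission of the jbd2 write-access and dirty-metadata calls that the real helper also performs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical journal handle: credits left in the running transaction and
 * how many more the journal would grant by extending it.
 */
struct sketch_handle {
	int credits;
	int extendable;
	int restarts;
};

static bool sketch_extend(struct sketch_handle *h, int want)
{
	if (h->extendable < want)
		return false;
	h->extendable -= want;
	h->credits += want;
	return true;
}

/*
 * Ensure `needed` credits before handling one ea_inode reference: extend the
 * transaction if possible, otherwise restart it. A restart commits what was
 * done so far, so the caller must leave the filesystem consistent first.
 */
static void sketch_ensure_credits(struct sketch_handle *h, int needed,
				  int restart_credits)
{
	if (h->credits >= needed)
		return;
	if (sketch_extend(h, needed - h->credits))
		return;
	h->restarts++;
	h->credits = restart_credits;
	assert(h->credits >= needed);
}
```

Calling `sketch_ensure_credits()` once per ea_inode reference, as the commit message describes, keeps each step's journal work bounded: with 4 starting credits, 2 extendable credits and 3 credits per reference, five references need exactly one restart.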
    • ext4: add ext4_is_quota_file() · 02749a4c
      Tahsin Erdogan committed
      IS_NOQUOTA() indicates whether quota is disabled for an inode. Ext4
      also uses it to check whether an inode is for a quota file. The
      distinction currently doesn't matter because quota is disabled only
      for the quota files. When we start disabling quota for other inodes
      in the future, we will want to make the distinction clear.
      
      Replace IS_NOQUOTA() call with ext4_is_quota_file() at places where
      we are checking for quota files.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: modify ext4_xattr_ino_array to hold struct inode * · 0421a189
      Tahsin Erdogan committed
      Tracking struct inode * rather than the inode number eliminates the
      repeated ext4_xattr_inode_iget() call later. The second call cannot
      fail in practice but still requires explanation when it wants to ignore
      the return value. Avoid the trouble and make things simple.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix lockdep warning about recursive inode locking · 33d201e0
      Tahsin Erdogan committed
      Setting a large xattr value may require writing the attribute contents
      to an external inode. In this case we may need to lock the xattr inode
      along with the parent inode. This doesn't pose a deadlock risk because
      xattr inodes are not directly visible to the user and their access is
      restricted.
      
      Assign a lockdep subclass to xattr inode's lock.
      
       ============================================
       WARNING: possible recursive locking detected
       4.12.0-rc1+ #740 Not tainted
       --------------------------------------------
       python/1822 is trying to acquire lock:
        (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff804912ca>] ext4_xattr_set_entry+0x65a/0x7b0
      
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&sb->s_type->i_mutex_key#15);
         lock(&sb->s_type->i_mutex_key#15);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       4 locks held by python/1822:
        #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff803d0eef>] mnt_want_write+0x1f/0x50
        #1:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0
        #2:  (jbd2_handle){.+.+..}, at: [<ffffffff80493f40>] start_this_handle+0xf0/0x420
        #3:  (&ei->xattr_sem){++++..}, at: [<ffffffff804920ba>] ext4_xattr_set_handle+0x9a/0x4f0
      
       stack backtrace:
       CPU: 0 PID: 1822 Comm: python Not tainted 4.12.0-rc1+ #740
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       Call Trace:
        dump_stack+0x67/0x9e
        __lock_acquire+0x5f3/0x1750
        lock_acquire+0xb5/0x1d0
        down_write+0x2c/0x60
        ext4_xattr_set_entry+0x65a/0x7b0
        ext4_xattr_block_set+0x1b2/0x9b0
        ext4_xattr_set_handle+0x322/0x4f0
        ext4_xattr_set+0x144/0x1a0
        ext4_xattr_user_set+0x34/0x40
        __vfs_setxattr+0x66/0x80
        __vfs_setxattr_noperm+0x69/0x1c0
        vfs_setxattr+0xa2/0xb0
        setxattr+0x12e/0x150
        path_setxattr+0x87/0xb0
        SyS_setxattr+0xf/0x20
        entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: xattr-in-inode support · e50e5129
      Andreas Dilger committed
      Large xattr support is implemented for EXT4_FEATURE_INCOMPAT_EA_INODE.
      
      If the size of an xattr value is larger than will fit in a single
      external block, then the xattr value will be saved into the body
      of an external xattr inode.
      
      This also helps support a larger number of xattrs, since only the
      headers will be stored in the in-inode space or the single external block.
      
      The inode is referenced from the xattr header via "e_value_inum",
      which was formerly "e_value_block", but that field was never used.
      The e_value_size still contains the xattr size so that listing
      xattrs does not need to look up the inode if the data is not accessed.
      
      struct ext4_xattr_entry {
              __u8    e_name_len;     /* length of name */
              __u8    e_name_index;   /* attribute name index */
              __le16  e_value_offs;   /* offset in disk block of value */
              __le32  e_value_inum;   /* inode in which value is stored */
              __le32  e_value_size;   /* size of attribute value */
              __le32  e_hash;         /* hash value of name and value */
              char    e_name[0];      /* attribute name */
      };
      
      The xattr inode is marked with the EXT4_EA_INODE_FL flag and also
      holds a back-reference to the owning inode in its i_mtime field,
      allowing ext4/e2fsck to verify that the correct inode is accessed.
      
      [ Applied fix by Dan Carpenter to avoid freeing an ERR_PTR. ]
      
      Lustre-Jira: https://jira.hpdd.intel.com/browse/LU-80
      Lustre-bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=4424
      Signed-off-by: Kalpak Shah <kalpak.shah@sun.com>
      Signed-off-by: James Simmons <uja.ornl@gmail.com>
      Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
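The header layout quoted in the xattr-in-inode commit can be mirrored in userspace to show how a reader decides where a value lives. `struct sketch_xattr_entry` uses host-order fixed-width integers in place of the on-disk little-endian types, and a C99 flexible array member in place of `e_name[0]`; the location struct and helper are hypothetical. The fixed part of the entry is 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of struct ext4_xattr_entry (host byte order). */
struct sketch_xattr_entry {
	uint8_t  e_name_len;   /* length of name */
	uint8_t  e_name_index; /* attribute name index */
	uint16_t e_value_offs; /* offset of value when stored in the block */
	uint32_t e_value_inum; /* inode holding the value, or 0 */
	uint32_t e_value_size; /* size of attribute value */
	uint32_t e_hash;       /* hash of name and value */
	char     e_name[];     /* attribute name */
};

/* Hypothetical description of where the value bytes live. */
struct sketch_value_loc {
	bool in_ea_inode; /* true: value is the body of inode `inum` */
	uint32_t inum;
	uint16_t offset;  /* otherwise: offset in the block or in-inode area */
	uint32_t size;    /* known without reading the value either way */
};

static struct sketch_value_loc sketch_locate_value(const struct sketch_xattr_entry *e)
{
	struct sketch_value_loc loc = {0};

	loc.size = e->e_value_size;
	if (e->e_value_inum != 0) {
		loc.in_ea_inode = true;
		loc.inum = e->e_value_inum;
	} else {
		loc.offset = e->e_value_offs;
	}
	return loc;
}
```

Because `e_value_size` is kept in the entry even for values stored in an xattr inode, listing xattrs and reporting their sizes does not require reading that inode.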