- 10 11月, 2018 1 次提交
-
-
由 Vasily Averin 提交于
According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write() should not return bh = NULL The only time that bh could be NULL, then, would be in the case of something really going wrong; a programming error elsewhere (perhaps a wild pointer dereference) or I/O error causing on-disk file system corruption (although that would be highly unlikely given that we had *just* allocated the blocks and so the metadata blocks in question probably would still be in the cache). Fixes: e50e5129 ("ext4: xattr-in-inode support") Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13
-
- 08 11月, 2018 4 次提交
-
-
由 Vasily Averin 提交于
Fixes: de05ca85 ("ext4: move call to ext4_error() into ...") Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.17
-
由 Vasily Averin 提交于
Fixes: 3f2571c1 ("ext4: factor out xattr moving") Fixes: 6dd4ee7c ("ext4: Expand extra_inodes space per ...") Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.23
-
由 Vasily Averin 提交于
bs.bh was taken in previous ext4_xattr_block_find() call, it should be released before re-using Fixes: 7e01c8e5 ("ext3/4: fix uninitialized bs in ...") Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.26
-
由 Vasily Averin 提交于
Fixes: dec214d0 ("ext4: xattr inode deduplication") Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13
-
- 07 11月, 2018 1 次提交
-
-
由 Vasily Averin 提交于
Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 8月, 2018 1 次提交
-
-
由 Theodore Ts'o 提交于
Extended attribute names are defined to be NUL-terminated, so the name must not contain a NUL character. This is important because there are places when remove extended attribute, the code uses strlen to determine the length of the entry. That should probably be fixed at some point, but code is currently really messy, so the simplest fix for now is to simply validate that the extended attributes are sane. https://bugzilla.kernel.org/show_bug.cgi?id=200401Reported-by: NWen Xu <wen.xu@gatech.edu> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 17 6月, 2018 2 次提交
-
-
由 Theodore Ts'o 提交于
Use a separate journal transaction if it turns out that we need to convert an inline file to use an data block. Otherwise we could end up failing due to not having journal credits. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
由 Theodore Ts'o 提交于
When expanding the extra isize space, we must never move the system.data xattr out of the inode body. For performance reasons, it doesn't make any sense, and the inline data implementation assumes that system.data xattr is never in the external xattr block. This addresses CVE-2018-10880 https://bugzilla.kernel.org/show_bug.cgi?id=200005Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 13 6月, 2018 2 次提交
-
-
由 Theodore Ts'o 提交于
If there an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org
-
由 Theodore Ts'o 提交于
In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add check to make sure we don't overrun the xattr buffer. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org
-
- 23 5月, 2018 1 次提交
-
-
由 Theodore Ts'o 提交于
Ext4 will always create ext4 extended attributes which do not have a value (where e_value_size is zero) with e_value_offs set to zero. In most places e_value_offs will not be used in a substantive way if e_value_size is zero. There was one exception to this, which is in ext4_xattr_set_entry(), where if there is a maliciously crafted file system where there is an extended attribute with e_value_offs is non-zero and e_value_size is 0, the attempt to remove this xattr will result in a negative value getting passed to memmove, leading to the following sadness: [ 41.225365] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) [ 44.538641] BUG: unable to handle kernel paging request at ffff9ec9a3000000 [ 44.538733] IP: __memmove+0x81/0x1a0 [ 44.538755] PGD 1249bd067 P4D 1249bd067 PUD 1249c1067 PMD 80000001230000e1 [ 44.538793] Oops: 0003 [#1] SMP PTI [ 44.539074] CPU: 0 PID: 1470 Comm: poc Not tainted 4.16.0-rc1+ #1 ... [ 44.539475] Call Trace: [ 44.539832] ext4_xattr_set_entry+0x9e7/0xf80 ... [ 44.539972] ext4_xattr_block_set+0x212/0xea0 ... [ 44.540041] ext4_xattr_set_handle+0x514/0x610 [ 44.540065] ext4_xattr_set+0x7f/0x120 [ 44.540090] __vfs_removexattr+0x4d/0x60 [ 44.540112] vfs_removexattr+0x75/0xe0 [ 44.540132] removexattr+0x4d/0x80 ... [ 44.540279] path_removexattr+0x91/0xb0 [ 44.540300] SyS_removexattr+0xf/0x20 [ 44.540322] do_syscall_64+0x71/0x120 [ 44.540344] entry_SYSCALL_64_after_hwframe+0x21/0x86 https://bugzilla.kernel.org/show_bug.cgi?id=199347 This addresses CVE-2018-10840. Reported-by: N"Xu, Wen" <wen.xu@gatech.edu> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Fixes: dec214d0 ("ext4: xattr inode deduplication")
-
- 31 3月, 2018 3 次提交
-
-
由 Theodore Ts'o 提交于
Add explicit checks in ext4_xattr_block_get() just in case the e_value_offs and e_value_size fields in the the xattr block are corrupted in memory after the buffer_verified bit is set on the xattr block. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
由 Theodore Ts'o 提交于
Add some paranoia checks to make sure we don't stray beyond the end of the valid memory region containing ext4 xattr entries while we are scanning for a match. Also rename the function to xattr_find_entry() since it is static and thus only used in fs/ext4/xattr.c Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
由 Theodore Ts'o 提交于
Refactor the call to EXT4_ERROR_INODE() into ext4_xattr_check_block(). This simplifies the code, and fixes a problem where not all callers of ext4_xattr_check_block() were not resulting in ext4_error() getting called when the xattr block is corrupted. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 30 3月, 2018 1 次提交
-
-
由 Eric Biggers 提交于
ext4 isn't validating the sizes of xattrs where the value of the xattr is stored in an external inode. This is problematic because ->e_value_size is a u32, but ext4_xattr_get() returns an int. A very large size is misinterpreted as an error code, which ext4_get_acl() translates into a bogus ERR_PTR() for which IS_ERR() returns false, causing a crash. Fix this by validating that all xattrs are <= INT_MAX bytes. This issue has been assigned CVE-2018-1095. https://bugzilla.kernel.org/show_bug.cgi?id=199185 https://bugzilla.redhat.com/show_bug.cgi?id=1560793Reported-by: NWen Xu <wen.xu@gatech.edu> Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Fixes: e50e5129 ("ext4: xattr-in-inode support")
-
- 29 1月, 2018 1 次提交
-
-
由 Jeff Layton 提交于
Signed-off-by: NJeff Layton <jlayton@redhat.com> Acked-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 11月, 2017 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 8月, 2017 1 次提交
-
-
由 Tahsin Erdogan 提交于
Original Lustre ea_inode feature did not have ref counts on xattr inodes because there was always one parent that referenced it. New implementation expects ref count to be initialized which is not true for Lustre case. Handle this by detecting Lustre created xattr inode and set its ref count to 1. The quota handling of xattr inodes have also changed with deduplication support. New implementation manually manages quotas to support sharing across multiple users. A consequence is that, a referencing inode incorporates the blocks of xattr inode into its own i_block field. We need to know how a xattr inode was created so that we can reverse the block charges during reference removal. This is handled by introducing a EXT4_STATE_LUSTRE_EA_INODE flag. The flag is set on a xattr inode if inode appears to have been created by Lustre. During xattr inode reference removal, the manual quota uncharge is skipped if the flag is set. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 14 8月, 2017 1 次提交
-
-
由 Tahsin Erdogan 提交于
When updating an extended attribute, if the padded value sizes are the same, a shortcut is taken to avoid the bulk of the work. This was fine until the xattr hash update was moved inside ext4_xattr_set_entry(). With that change, the hash update got missed in the shortcut case. Thanks to ZhangYi (yizhang089@gmail.com) for root causing the problem. Fixes: daf83281 ("ext4: eliminate xattr entry e_hash recalculation for removes") Reported-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 06 8月, 2017 5 次提交
-
-
由 Miao Xie 提交于
Clean up some goto statement, make ext4_expand_extra_isize_ea() clearer. Signed-off-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NWang Shilong <wshilong@ddn.com>
-
由 Miao Xie 提交于
Current ext4_expand_extra_isize just tries to expand extra isize, if someone is holding xattr lock or some check fails, it will give up. So rename its name to ext4_try_to_expand_extra_isize. Besides that, we clean up unnecessary check and move some relative checks into it. Signed-off-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NWang Shilong <wshilong@ddn.com>
-
由 Miao Xie 提交于
We should avoid the contention between the i_extra_isize update and the inline data insertion, so move the xattr trylock in front of i_extra_isize update. Signed-off-by: NMiao Xie <miaoxie@huawei.com> Reviewed-by: NWang Shilong <wshilong@ddn.com>
-
由 Tahsin Erdogan 提交于
ext4_xattr_inode_read() currently reads each block sequentially while waiting for io operation to complete before moving on to the next block. This prevents request merging in block layer. Add a ext4_bread_batch() function that starts reads for all blocks then optionally waits for them to complete. A similar logic is used in ext4_find_entry(), so update that code to use the new function. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
When an xattr block has a single reference, block is updated inplace and it is reinserted to the cache. Later, a cache lookup is performed to see whether an existing block has the same contents. This cache lookup will most of the time return the just inserted entry so deduplication is not achieved. Running the following test script will produce two xattr blocks which can be observed in "File ACL: " line of debugfs output: mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G mount /dev/sdb /mnt/sdb touch /mnt/sdb/{x,y} setfattr -n user.1 -v aaa /mnt/sdb/x setfattr -n user.2 -v bbb /mnt/sdb/x setfattr -n user.1 -v aaa /mnt/sdb/y setfattr -n user.2 -v bbb /mnt/sdb/y debugfs -R 'stat x' /dev/sdb | cat debugfs -R 'stat y' /dev/sdb | cat This patch defers the reinsertion to the cache so that we can locate other blocks with the same contents. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
-
- 31 7月, 2017 1 次提交
-
-
由 Emoly Liu 提交于
For Lustre, if ea_inode fails in hash validation but passes parent inode and generation checks, it won't be added to the cache as well as the error "-EFSCORRUPTED" should be cleared, otherwise it will cause "Structure needs cleaning" when running getfattr command. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723 Cc: stable@vger.kernel.org Fixes: dec214d0Signed-off-by: NEmoly Liu <emoly.liu@intel.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Reviewed-by: tahsin@google.com
-
- 06 7月, 2017 1 次提交
-
-
由 Tahsin Erdogan 提交于
ea_inode feature allows creating extended attributes that are up to 64k in size. Update __ext4_new_inode() to pick increased credit limits. To avoid overallocating too many journal credits, update __ext4_xattr_set_credits() to make a distinction between xattr create vs update. This helps __ext4_new_inode() because all attributes are known to be new, so we can save credits that are normally needed to delete old values. Also, have fscrypt specify its maximum context size so that we don't end up allocating credits for 64k size. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 6月, 2017 13 次提交
-
-
由 Tahsin Erdogan 提交于
The main purpose of mb cache is to achieve deduplication in extended attributes. In use cases where opportunity for deduplication is unlikely, it only adds overhead. Add a mount option to explicitly turn off mb cache. Suggested-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
To verify that a xattr entry is not pointing to the wrong xattr inode, we currently check that the target inode has EXT4_EA_INODE_FL flag set and also the entry size matches the target inode size. For stronger validation, also incorporate crc32c hash of the value into the e_hash field. This is done regardless of whether the entry lives in the inode body or external attribute block. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
When an extended attribute block is modified, ext4_xattr_hash_entry() recalculates e_hash for the entry that is pointed by s->here. This is unnecessary if the modification is to remove an entry. Currently, if the removed entry is the last one and there are other entries remaining, hash calculation targets the just erased entry which has been filled with zeroes and effectively does nothing. If the removed entry is not the last one and there are more entries, this time it will recalculate hash on the next entry which is totally unnecessary. Fix these by moving the decision on when to recalculate hash to ext4_xattr_set_entry(). Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
New ea_inode feature allows putting large xattr values into external inodes. struct ext4_xattr_entry and the attribute name however have to remain in the inode extra space or external attribute block. Once that space is exhausted, no further entries can be added. Some of that space could also be used by values that fit in there at the time of addition. So, a single xattr entry whose value barely fits in the external block could prevent further entries being added. To mitigate the problem, this patch introduces a notion of reserved space in the external attribute block that cannot be used by value data. This reserve is enforced when ea_inode feature is enabled. The amount of reserve is arbitrarily chosen to be min(block_size/8, 1024). The table below shows how much space is reserved for each block size and the guaranteed mininum number of entries that can be placed in the external attribute block. block size reserved bytes entries (name length = 16) 1k 128 3 2k 256 7 4k 512 15 8k 1024 31 16k 1024 31 32k 1024 31 64k 1024 31 Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
Ext4 ea_inode feature allows storing xattr values in external inodes to be able to store values that are bigger than a block in size. Ext4 also has deduplication support for these type of inodes. With deduplication, the actual storage waste is eliminated but the users of such inodes are still charged full quota for the inodes as if there was no sharing happening in the background. This design requires ext4 to manually charge the users because the inodes are shared. An implication of this is that, if someone calls chown on a file that has such references we need to transfer the quota for the file and xattr inodes. Current dquot_transfer() function implicitly transfers one inode charge. With ea_inode feature, we would like to transfer multiple inode charges. Add get_inode_usage callback which can interrogate the total number of inodes that were charged for a given inode. [ Applied fix from Colin King to make sure the 'ret' variable is initialized on the successful return path. Detected by CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso] Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NJan Kara <jack@suse.cz>
-
由 Tahsin Erdogan 提交于
Ext4 now supports xattr values that are up to 64k in size (vfs limit). Large xattr values are stored in external inodes each one holding a single value. Once written the data blocks of these inodes are immutable. The real world use cases are expected to have a lot of value duplication such as inherited acls etc. To reduce data duplication on disk, this patch implements a deduplicator that allows sharing of xattr inodes. The deduplication is based on an in-memory hash lookup that is a best effort sharing scheme. When a xattr inode is read from disk (i.e. getxattr() call), its crc32c hash is added to a hash table. Before creating a new xattr inode for a value being set, the hash table is checked to see if an existing inode holds an identical value. If such an inode is found, the ref count on that inode is incremented. On value removal the ref count is decremented and if it reaches zero the inode is deleted. The quota charging for such inodes is manually managed. Every reference holder is charged the full size as if there was no sharing happening. This is consistent with how xattr blocks are also charged. [ Fixed up journal credits calculation to handle inline data and the rare case where an shared xattr block can get freed when two thread race on breaking the xattr block sharing. --tytso ] Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
During inode deletion, the number of journal credits that will be needed is hard to determine. For that reason we have journal extend/restart calls in several places. Whenever a transaction is restarted, filesystem must be in a consistent state because there is no atomicity guarantee beyond a restart call. Add ext4_xattr_ensure_credits() helper function which takes care of journal extend/restart logic. It also handles getting jbd2 write access and dirty metadata calls. This function is called at every iteration of handling an ea_inode reference. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
There will be a second mb_cache instance that tracks ea_inodes. Make existing names more explicit so that it is clear that they refer to xattr block cache. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
Make names more generic so that mbcache usage is not limited to block sharing. In a subsequent patch in the series ("ext4: xattr inode deduplication"), we start using the mbcache code for sharing xattr inodes. With that patch, old mb_cache_entry.e_block field could be holding either a block number or an inode number. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
Tracking struct inode * rather than the inode number eliminates the repeated ext4_xattr_inode_iget() call later. The second call cannot fail in practice but still requires explanation when it wants to ignore the return value. Avoid the trouble and make things simple. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
Both ext4_set_acl() and ext4_set_context() need to be made aware of ea_inode feature when it comes to credits calculation. Also add a sufficient credits check in ext4_xattr_set_handle() right after xattr write lock is grabbed. Original credits calculation is done outside the lock so there is a possiblity that the initially calculated credits are not sufficient anymore. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
In a few places the function returns without trying to pass the actual error code to the caller. Fix those. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
When value size is <= EXT4_XATTR_MIN_LARGE_EA_SIZE(), and it doesn't fit in either inline or xattr block, a second try is made to store it in an external inode while storing the entry itself in inline area. There should also be an attempt to store the entry in xattr block. This patch adds a retry loop to do that. It also makes the caller the sole decider on whether to store a value in an external inode. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-