- 21 2月, 2019 3 次提交
-
-
由 Eric Whitney 提交于
commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream. Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree. Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied. Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it. Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
-
由 Eric Whitney 提交于
commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream. The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
-
由 Eric Whitney 提交于
commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream. Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock. Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
-
- 10 1月, 2019 1 次提交
-
-
由 Theodore Ts'o 提交于
commit fde872682e175743e0c3ef939c89e3c6008a1529 upstream. Some time back, nfsd switched from calling vfs_fsync() to using a new commit_metadata() hook in export_operations(). If the file system did not provide a commit_metadata() hook, it fell back to using sync_inode_metadata(). Unfortunately doesn't work on all file systems. In particular, it doesn't work on ext4 due to how the inode gets journalled --- the VFS writeback code will not always call ext4_write_inode(). So we need to provide our own ext4_nfs_commit_metdata() method which calls ext4_write_inode() directly. Google-Bug-Id: 121195940 Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 19 2月, 2018 1 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 11月, 2017 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 31 7月, 2017 1 次提交
-
-
由 Eric Whitney 提交于
Two variables in ext4_inode_info, i_reserved_meta_blocks and i_allocated_meta_blocks, are unused. Removing them saves a little memory per in-memory inode and cleans up clutter in several tracepoints. Adjust tracepoint output from ext4_alloc_da_blocks() for consistency and fix a typo and whitespace near these changes. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 30 4月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Support the GETFSMAP ioctls so that we can use the xfs free space management tools to probe ext4 as well. Note that this is a partial implementation -- we only report fixed-location metadata and free space; everything else is reported as "unknown". Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 11 4月, 2016 1 次提交
-
-
由 Al Viro 提交于
... and neither can ever be NULL Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 12月, 2015 2 次提交
-
-
由 Jan Kara 提交于
DAX page fault path needs to get blocks that are pre-zeroed to avoid races when two concurrent page faults happen in the same block of a file. Implement support for this in ext4_map_blocks(). Signed-off-by: NJan Kara <jack@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
When dioread_nolock mode is enabled, we grab i_data_sem in ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block() not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However holding i_data_sem over overwrite direct IO isn't needed these days. We have exclusion against truncate / hole punching because we increase i_dio_count under i_mutex in ext4_ext_direct_IO() so once ext4_file_write_iter() verifies blocks are allocated & written, they are guaranteed to stay so during the whole direct IO even after we drop i_mutex. So we can just remove this locking abuse and the no longer necessary EXT4_GET_BLOCKS_NO_LOCK flag. Signed-off-by: NJan Kara <jack@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 6月, 2015 1 次提交
-
-
由 Eric Whitney 提交于
Remove outdated comments and dead code from ext4_da_reserve_space. Clean up its trace point, and relocate it to make it more useful. While we're at it, fix a nearby conditional used to determine if we have a non-bigalloc file system. It doesn't match usage elsewhere in the code, and misleadingly suggests that an s_cluster_ratio value of 0 would be legal. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 09 6月, 2015 1 次提交
-
-
由 Namjae Jeon 提交于
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying between [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
-
- 16 4月, 2015 1 次提交
-
-
由 David Howells 提交于
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 25 3月, 2015 1 次提交
-
-
由 Scott Wood 提交于
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Link: http://lkml.kernel.org/r/1426130037-17956-22-git-send-email-scottwood@freescale.comSigned-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 05 2月, 2015 1 次提交
-
-
由 Theodore Ts'o 提交于
Add an optimization for the MS_LAZYTIME mount option so that we will opportunistically write out any inodes with the I_DIRTY_TIME flag set in a particular inode table block when we need to update some inode in that inode table block anyway. Also add some temporary code so that we can set the lazytime mount option without needing a modified /sbin/mount program which can set MS_LAZYTIME. We can eventually make this go away once util-linux has added support. Google-Bug-Id: 18297052 Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 11月, 2014 3 次提交
-
-
由 Zheng Liu 提交于
In this commit we discard the lru algorithm for inodes with extent status tree because it takes significant effort to maintain a lru list in extent status tree shrinker and the shrinker can take a long time to scan this lru list in order to reclaim some objects. We replace the lru ordering with a simple round-robin. After that we never need to keep a lru list. That means that the list needn't be sorted if the shrinker can not reclaim any objects in the first round. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Zheng Liu 提交于
Currently extent status tree doesn't cache extent hole when a write looks up in extent tree to make sure whether a block has been allocated or not. In this case, we don't put extent hole in extent cache because later this extent might be removed and a new delayed extent might be added back. But it will cause a defect when we do a lot of writes. If we don't put extent hole in extent cache, the following writes also need to access extent tree to look at whether or not a block has been allocated. It brings a cache miss. This commit fixes this defect. Also if the inode doesn't have any extent, this extent hole will be cached as well. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
For bigalloc filesystems we have to check whether newly requested inode block isn't already part of a cluster for which we already have delayed allocation reservation. This check happens in ext4_ext_map_blocks() and that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if ext4_da_map_blocks() finds in extent cache information about the block, we don't call into ext4_ext_map_blocks() and thus we always end up getting new reservation even if the space for cluster is already reserved. This results in overreservation and premature ENOSPC reports. Fix the problem by checking for existing cluster reservation already in ext4_da_map_blocks(). That simplifies the logic and actually allows us to get rid of the EXT4_MAP_FROM_CLUSTER flag completely. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 9月, 2014 2 次提交
-
-
由 Zheng Liu 提交于
This commit adds some statictics in extent status tree shrinker. The purpose to add these is that we want to collect more details when we encounter a stall caused by extent status tree shrinker. Here we count the following statictics: stats: the number of all objects on all extent status trees the number of reclaimable objects on lru list cache hits/misses the last sorted interval the number of inodes on lru list average: scan time for shrinking some objects the number of shrunk objects maximum: the inode that has max nr. of objects on lru list the maximum scan time for shrinking some objects The output looks like below: $ cat /proc/fs/ext4/sda1/es_shrinker_info stats: 28228 objects 6341 reclaimable objects 5281/631 cache hits/misses 586 ms last sorted interval 250 inodes on lru list average: 153 us scan time 128 shrunk objects maximum: 255 inode (255 objects, 198 reclaimable) 125723 us max scan time If the lru list has never been sorted, the following line will not be printed: 586ms last sorted interval If there is an empty lru list, the following lines also will not be printed: 250 inodes on lru list ... maximum: 255 inode (255 objects, 198 reclaimable) 0 us max scan time Meanwhile in this commit a new trace point is defined to print some details in __ext4_es_shrink(). Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Zheng Liu 提交于
This commit improves the trace point of extents status tree. We rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also used in ext4_es_scan() and we can not identify them from the result. Further this commit fixes a variable name in trace point in order to keep consistency with others. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 21 4月, 2014 2 次提交
-
-
由 Lukas Czerner 提交于
Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the cases where we're using dioread_nolock and we're writing into either unallocated, or unwritten extent, because we need to make sure that any DIO write into that inode will wait for the extent conversion. However EXT4_MAP_UNINIT is not only entirely misleading name but also unnecessary because we can check for EXT4_MAP_UNWRITTEN in the dioread_nolock case instead. This commit removes EXT4_MAP_UNINIT flag. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 15 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
In retrospect, this was a bad way to handle things, since it limited testing of these patches. We should just get the VFS level changes merged in first. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 19 3月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Also add appropriate tracepoints. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 24 2月, 2014 1 次提交
-
-
由 Namjae Jeon 提交于
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Tested-by: NDongsu Park <dongsu.park@profitbricks.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 22 2月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 29 8月, 2013 1 次提交
-
-
由 Zheng Liu 提交于
After applied the commit (4a092d73), we have reduced the number of source files that need to #include ext4_extents.h. But we can do better. This commit defines ext4_zeroout_es() in extents.c and move EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in indirect.c and ioctl.c. Meanwhile we just need to include this file in extent_status.c when ES_AGGRESSIVE_TEST is defined. Otherwise, this commit removes a duplicated declaration in trace/events/ext4.h. After applied this patch, we just need to include ext4_extents.h file in {super,migrate,move_extents,extents}.c, and it is easy for us to define a new extent disk layout. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 8月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
When we read in an extent tree leaf block from disk, arrange to have all of its entries cached. In nearly all cases the in-memory representation will be more compact than the on-disk representation in the buffer cache, and it allows us to get the information without having to traverse the extent tree for successive extents. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Theodore Ts'o 提交于
Don't use an unsigned long long for the es_status flags; this requires that we pass 64-bit values around which is painful on 32-bit systems. Instead pass the extent status flags around using the low 4 bits of an unsigned int, and shift them into place when we are reading or writing es_pblk. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
- 01 7月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Translate the bitfields used in various flags argument to strings to make the tracepoint output more human-readable. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 6月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Rename ext4_da_writepages() to ext4_writepages() and use it for all modes. We still need to iterate over all the pages in the case of data=journalling, but in the case of nodelalloc/data=ordered (which is what file systems mounted using ext3 backwards compatibility will use) this will allow us to use a much more efficient I/O submission path. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 6月, 2013 2 次提交
-
-
由 Jan Kara 提交于
There are two issues with current writeback path in ext4. For one we don't necessarily map complete pages when blocksize < pagesize and thus needn't do any writeback in one iteration. We always map some blocks though so we will eventually finish mapping the page. Just if writeback races with other operations on the file, forward progress is not really guaranteed. The second problem is that current code structure makes it hard to associate all the bios to some range of pages with one io_end structure so that unwritten extents can be converted after all the bios are finished. This will be especially difficult later when io_end will be associated with reserved transaction handle. We restructure the writeback path to a relatively simple loop which first prepares extent of pages, then maps one or more extents so that no page is partially mapped, and once page is fully mapped it is submitted for IO. We keep all the mapping and IO submission information in mpage_da_data structure to somewhat reduce stack usage. Resulting code is somewhat shorter than the old one and hopefully also easier to read. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 28 5月, 2013 2 次提交
-
-
由 Lukas Czerner 提交于
Currently punch hole is disabled in file systems with bigalloc feature enabled. However the recent changes in punch hole patch should make it easier to support punching holes on bigalloc enabled file systems. This commit changes partial_cluster handling in ext4_remove_blocks(), ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently partial_cluster is unsigned long long type and it makes sure that we will free the partial cluster if all extents has been released from that cluster. However it has been specifically designed only for truncate. With punch hole we can be freeing just some extents in the cluster leaving the rest untouched. So we have to make sure that we will notice cluster which still has some extents. To do this I've changed partial_cluster to be signed long long type. The only scenario where this could be a problem is when cluster_size == block size, however in that case there would not be any partial clusters so we're safe. For bigger clusters the signed type is enough. Now we use the negative value in partial_cluster to mark such cluster used, hence we know that we must not free it even if all other extents has been freed from such cluster. This scenario can be described in simple diagram: |FFF...FF..FF.UUU| ^----------^ punch hole . - free space | - cluster boundary F - freed extent U - used extent Also update respective tracepoints to use signed long long type for partial_cluster. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Add "end" variable. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 5月, 2013 1 次提交
-
-
由 Lukas Czerner 提交于
->invalidatepage() aop now accepts range to invalidate so we can make use of it in all ext4 invalidatepage routines. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 03 5月, 2013 1 次提交
-
-
由 Yan, Zheng 提交于
We (Linux Kernel Performance project) found a regression introduced by commit: f7fec032 ext4: track all extent status in extent status tree The commit causes about 20% performance decrease in fio random write test. Profiler shows that rb_next() uses a lot of CPU time. The call stack is: rb_next ext4_es_find_delayed_extent ext4_map_blocks _ext4_get_block ext4_get_block_write __blockdev_direct_IO ext4_direct_IO generic_file_direct_write __generic_file_aio_write ext4_file_write aio_rw_vect_retry aio_run_iocb do_io_submit sys_io_submit system_call_fastpath io_submit td_io_getevents io_u_queued_complete thread_main main __libc_start_main The cause is that ext4_es_find_delayed_extent() doesn't have an upper bound, it keeps searching until a delayed extent is found. When there are a lots of non-delayed entries in the extent state tree, ext4_es_find_delayed_extent() may uses a lot of CPU time. Reported-by: NLKP project <lkp@linux.intel.com> Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Cc: "Theodore Ts'o" <tytso@mit.edu>
-
- 10 4月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
None of these result in any bug, but they makes sparse complain. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 4月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
The only difference between how we handle data=ordered and data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate code duplication by factoring out redundant the code paths. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-