- 10 Sep, 2011, 13 commits
-
-
Committed by Theodore Ts'o
Rename the function so it is clearer what is going on, and rename the various variables for the same reason. Also fix a missing blocks-to-clusters conversion when reading the number of blocks reserved for root. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
This function really claims a number of free clusters, not blocks, so rename it to make clear what is going on. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
This function really returns the number of clusters after an uninitialized block bitmap has been initialized. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
This function really counts the free clusters reported in the block group descriptors, so rename it to reduce confusion. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
The field bg_free_blocks_count_{lo,high} in the block group descriptor has been repurposed to hold the number of free clusters for bigalloc functions. So rename the functions to make the block allocation and block freeing code easier to read and audit. Note: at this point in bigalloc development we don't support online resize, so this also makes it really obvious which places we need to fix up to add support for online resize. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
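The block/cluster relationship behind these renames can be sketched in user space. This is only an illustration, assuming the cluster size is a power-of-two multiple of the block size; `demo_sb` and the `demo_*` helpers are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields that describe clusters. */
struct demo_sb {
    unsigned int cluster_bits;  /* log2(blocks per cluster); assumed power of two */
};

/* Number of blocks in one cluster. */
static inline uint64_t demo_cluster_ratio(const struct demo_sb *sb)
{
    assert(sb->cluster_bits < 64);
    return (uint64_t)1 << sb->cluster_bits;
}

/* Cluster that contains block number blk (rounds down). */
static inline uint64_t demo_block_to_cluster(const struct demo_sb *sb, uint64_t blk)
{
    return blk >> sb->cluster_bits;
}

/* Clusters needed to hold nblocks blocks (rounds up), e.g. for a reserved count. */
static inline uint64_t demo_num_blocks_to_clusters(const struct demo_sb *sb,
                                                   uint64_t nblocks)
{
    return (nblocks + demo_cluster_ratio(sb) - 1) >> sb->cluster_bits;
}

/* First block of cluster number cl. */
static inline uint64_t demo_cluster_to_block(const struct demo_sb *sb, uint64_t cl)
{
    return cl << sb->cluster_bits;
}
```

A counter that is documented as "free clusters" but fed a block count would be off by the cluster ratio, which is the kind of mistake the renames are meant to make visible during review.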
-
Committed by Theodore Ts'o
Now that we have implemented all of the changes needed for bigalloc, we can finally enable it! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Aditya Kali
With the bigalloc changes, the i_blocks value was not correctly set (it was still set to the number of blocks being used, but in the case of bigalloc, we want i_blocks to represent the number of clusters being used). Since the quota subsystem sets the i_blocks value, this patch fixes the quota accounting and makes sure that the i_blocks value is set correctly. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
Convert the free_blocks to be free_clusters to make the final revised bigalloc changes easier to read/understand. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
Convert the percpu counters s_dirtyblocks_counter and s_freeblocks_counter in struct ext4_sb_info to be s_dirtyclusters_counter and s_freeclusters_counter. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
The ext4_free_blocks() function now has two new flags that indicate whether a partial cluster at the beginning or the end of the block extent should be freed or not. That will be up to the caller (i.e., truncate), who can figure out whether partial clusters at the beginning or the end of a block range can be freed. We also have to update the ext4_mb_free_metadata() and release_blocks_on_commit() machinery to be cluster-based, since it is used by ext4_free_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
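How two such flags might shape the freed range can be sketched as follows. This is a user-space illustration under stated assumptions: the flag names, `demo_range`, and `demo_clusters_to_free()` are hypothetical, and the cluster ratio is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags: keep (do not free) a partial cluster at either end. */
#define DEMO_NOFREE_FIRST_CLUSTER 0x1u
#define DEMO_NOFREE_LAST_CLUSTER  0x2u

struct demo_range {
    uint64_t first_cluster;
    uint64_t nr_clusters;
};

/*
 * Turn the block range [start, start + count) into a whole-cluster range.
 * A partial cluster at either end is either dropped (flag set) or freed
 * in full (flag clear). Returns false when nothing is left to free.
 */
static bool demo_clusters_to_free(uint64_t start, uint64_t count,
                                  unsigned int cluster_bits,
                                  unsigned int flags, struct demo_range *out)
{
    uint64_t ratio = (uint64_t)1 << cluster_bits;
    uint64_t end = start + count;            /* one past the last block */
    uint64_t head = start & (ratio - 1);     /* offset into first cluster */
    uint64_t tail = end & (ratio - 1);       /* blocks used in last cluster */

    assert(out != NULL);
    if (count == 0)
        return false;

    if (head) {
        if (flags & DEMO_NOFREE_FIRST_CLUSTER)
            start += ratio - head;           /* skip the partial first cluster */
        else
            start -= head;                   /* free the whole first cluster */
    }
    if (tail) {
        if (flags & DEMO_NOFREE_LAST_CLUSTER)
            end -= tail;                     /* skip the partial last cluster */
        else
            end += ratio - tail;             /* free the whole last cluster */
    }
    if (end <= start)
        return false;

    out->first_cluster = start >> cluster_bits;
    out->nr_clusters = (end - start) >> cluster_bits;
    return true;
}
```

With 16 blocks per cluster, freeing blocks 5..34 covers clusters 0..2 when neither flag is set, and only cluster 1 when both are set.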
-
Committed by Theodore Ts'o
Add bigalloc support to ext4_init_block_bitmap() and ext4_free_blocks_after_init(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
The function ext4_free_blocks_after_init() used to be a #define of ext4_init_block_bitmap(). This actually made it difficult to understand how the function worked, and made it hard to make changes to support clusters. So as an initial cleanup, I've separated out the functionality of initializing the block bitmap from calculating the number of free blocks in the new block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
This adds support for bigalloc file systems. It teaches the mount code just enough about bigalloc superblock fields that it will mount the file system without freaking out that the number of blocks per group is too big. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 04 Sep, 2011, 1 commit
-
-
Committed by Theodore Ts'o
If the user explicitly specifies conflicting mount options for delalloc or dioread_nolock and data=journal, fail the mount, instead of printing a warning and continuing (since many users won't look at dmesg and notice the warning). Also, print a single warning that data=journal implies that delayed allocation is not on by default (since it's not supported), and furthermore that O_DIRECT is not supported. Improve the text in Documentation/filesystems/ext4.txt so this is clear there as well. Similarly, if the dioread_nolock mount option is specified when the file system block size != PAGE_SIZE, fail the mount instead of printing a warning message and ignoring the mount option. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
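The "fail instead of warn" policy described here can be sketched as a small validation step. Everything below is an assumption made for illustration: the `demo_mount_opts` fields, the function name, and the choice of `-EINVAL` as the error are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the parsed mount options. */
struct demo_mount_opts {
    bool data_journal;        /* data=journal was requested */
    bool explicit_delalloc;   /* the user typed "delalloc" explicitly */
    bool dioread_nolock;      /* the user asked for dioread_nolock */
    size_t block_size;        /* file system block size in bytes */
};

/*
 * Return 0 if the combination is acceptable, or a negative errno
 * (assumed -EINVAL here) so the caller can fail the mount outright.
 */
static int demo_check_mount_opts(const struct demo_mount_opts *o, size_t page_size)
{
    assert(o != NULL);

    if (o->data_journal && o->explicit_delalloc)
        return -EINVAL;   /* delalloc is not supported with data=journal */
    if (o->data_journal && o->dioread_nolock)
        return -EINVAL;   /* dioread_nolock conflicts with data=journal */
    if (o->dioread_nolock && o->block_size != page_size)
        return -EINVAL;   /* dioread_nolock needs block size == page size */
    return 0;
}
```

Note that only an explicitly requested delalloc is treated as a conflict; a default that data=journal silently turns off would merit the single warning the message mentions, not a failed mount.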
-
- 03 Sep, 2011, 1 commit
-
-
Committed by Allison Henderson
This patch adds two new routines: ext4_discard_partial_page_buffers and ext4_discard_partial_page_buffers_no_lock.

The ext4_discard_partial_page_buffers routine is a wrapper function to ext4_discard_partial_page_buffers_no_lock. The wrapper function locks the page and passes it to ext4_discard_partial_page_buffers_no_lock. Calling functions that already have the page locked can call ext4_discard_partial_page_buffers_no_lock directly.

The ext4_discard_partial_page_buffers_no_lock function zeros a specified range in a page, and unmaps the corresponding buffer heads. Only block-aligned regions of the page will have their buffer heads unmapped. Regions that are not block aligned will be mapped if needed so that they can be updated with the partial zero out. This function is meant to be used to update a page and its buffer heads to be zeroed and unmapped when the corresponding blocks have been released or will be released.

This routine is used in the following scenarios:
* A hole is punched and the non-page-aligned regions of the head and tail of the hole need to be discarded
* The file is truncated and the partial page beyond EOF needs to be discarded
* The end of a hole is in the same page as EOF. After the page is flushed, the partial page beyond EOF needs to be discarded.
* A write operation begins or ends inside a hole and the partial page appearing before or after the write needs to be discarded
* A write operation extends EOF and the partial page beyond EOF needs to be discarded

This function takes a flag EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED which is used when a write operation begins or ends in a hole. When the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED flag is used, only buffer heads that are already unmapped will have the corresponding regions of the page zeroed.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 31 Aug, 2011, 2 commits
-
-
Committed by Theodore Ts'o
This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Committed by Jiaying Zhang
The i_mutex lock and flush_completed_IO() added by commit 2581fdc8 in ext4_evict_inode() cause lockdep to complain about potential deadlocks in several places. In most/all of these lockdep complaints it looks like a false positive, since many of the potential circular locking cases can't take place by the time ext4_evict_inode() is called; but since at the very least it may mask real problems, we need to address this. This change removes the flush_completed_IO() and i_mutex lock in ext4_evict_inode(). Instead, we take a different approach to resolve the software lockup that commit 2581fdc8 intends to fix. Rather than having the ext4-dio-unwritten thread wait to grab the i_mutex lock of an inode, we use mutex_trylock() instead, and simply requeue the work item if we fail to grab the inode's i_mutex lock. This should speed up work queue processing in general and also prevent the following deadlock scenario: During a page fault, shrink_icache_memory is called, which in turn evicts another inode B. Inode B has some pending io_end work, so it calls ext4_ioend_wait(), which waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A has been held since before the page fault happened, we enter a deadlock. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
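The try-lock-and-requeue idea can be sketched in user space. This is not the kernel code: a C11 `atomic_flag` stands in for the inode's i_mutex, and `demo_inode`, `demo_trylock()`, and `demo_end_io_work()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode: `locked` stands in for i_mutex, used only via try-lock. */
struct demo_inode {
    atomic_flag locked;
    int pending_io;      /* completed I/O items still waiting to be processed */
};

enum demo_result { DEMO_DONE, DEMO_REQUEUE };

/* Non-blocking lock attempt; true means the caller now owns the lock. */
static bool demo_trylock(struct demo_inode *inode)
{
    /* test_and_set returns the previous state: false means it was free. */
    return !atomic_flag_test_and_set(&inode->locked);
}

static void demo_unlock(struct demo_inode *inode)
{
    atomic_flag_clear(&inode->locked);
}

/*
 * Worker body: never sleeps on the lock. If the lock is busy, report
 * DEMO_REQUEUE so the caller can put the item back on the queue and move
 * on to work belonging to other inodes.
 */
static enum demo_result demo_end_io_work(struct demo_inode *inode)
{
    assert(inode != NULL);
    if (!demo_trylock(inode))
        return DEMO_REQUEUE;
    inode->pending_io = 0;   /* placeholder for the real end-of-I/O processing */
    demo_unlock(inode);
    return DEMO_DONE;
}
```

Because the worker never blocks on inode A's lock, work queued behind it for inode B still gets a turn, which is what breaks the cycle in the scenario above.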
-
- 01 Aug, 2011, 1 commit
-
-
Committed by Theodore Ts'o
Introduce new helper functions which try kmalloc, and then fall back to vmalloc if necessary, and use them for allocating and deallocating s_flex_groups. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 27 Jul, 2011, 4 commits
-
-
Committed by Yongqiang Yang
Rename mb_set_bits() to ext4_set_bits() and make it a global function so that setup_new_group_blocks() can use it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Yongqiang Yang
This patch lets ext4_group_add_blocks() return an error code if it fails, so that upper functions can handle the error correctly. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Yongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Yongqiang Yang
Before this patch, parallel resizers were allowed and protected by a mutex lock. There is actually no need to support parallel resizers, so this patch prevents them using atomic bit ops, like lock_page() and unlock_page() do. To do this, the patch removes the mutex lock s_resize_lock from struct ext4_sb_info and adds an unsigned long field named s_resize_flags which indicates whether there is a resizer. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
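The single-resizer guard can be sketched with C11 atomics in user space. The bit number, the struct, the function names, and the `-EBUSY` return value are assumptions made for this illustration; only the field name s_resize_flags comes from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical bit number inside s_resize_flags marking "resize in progress". */
#define DEMO_FLAGS_RESIZING 0

struct demo_sb_info {
    atomic_ulong s_resize_flags;
};

/* Try to become the one and only resizer; fails if the bit was already set. */
static int demo_resize_begin(struct demo_sb_info *sbi)
{
    unsigned long bit = 1UL << DEMO_FLAGS_RESIZING;
    /* Atomic test-and-set: fetch_or returns the flags as they were before. */
    unsigned long old = atomic_fetch_or(&sbi->s_resize_flags, bit);

    return (old & bit) ? -EBUSY : 0;   /* -EBUSY is an assumed error code */
}

/* Release the guard; only the current resizer should call this. */
static void demo_resize_end(struct demo_sb_info *sbi)
{
    unsigned long bit = 1UL << DEMO_FLAGS_RESIZING;
    unsigned long old = atomic_fetch_and(&sbi->s_resize_flags, ~bit);

    assert(old & bit);                 /* the bit must have been set */
    (void)old;
}
```

Unlike a mutex, a second caller is turned away immediately rather than queued, which matches the stated goal of not supporting parallel resizers at all.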
-
- 21 Jul, 2011, 1 commit
-
-
Committed by Josef Bacik
Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push the taking of i_mutex and the call to filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness' sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 11 Jul, 2011, 2 commits
-
-
Committed by Tao Ma
In ext4, every time FITRIM is called, we iterate over all the groups and trim them one by one. This wastes time if a group has already been trimmed and has not changed since the last trim. So this patch adds a new flag in ext4_group_info->bb_state to indicate that the group has been trimmed; it is cleared when some blocks are freed (in release_blocks_on_commit). A trim_minlen field is also added to ext4_sb_info to record the last minlen used to trim the volume, so that if the caller provides a smaller one, we go ahead with the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1 40G 21G 17G 56% /mnt/ext4
Block size: 4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.505s
user 0m0.000s
sys 0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.359s
user 0m0.000s
sys 0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.228s
user 0m0.000s
sys 0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.625s
user 0m0.000s
sys 0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m0.002s
user 0m0.000s
sys 0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m0.002s
user 0m0.000s
sys 0m0.001s

A big improvement for the 2nd and 3rd run. Even after I delete some big image files, it is still much faster than iterating over the whole disk.
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m1.217s
user 0m0.000s
sys 0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
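The skip logic described in this message can be sketched as below. It is a user-space model under stated assumptions: `demo_group`, `demo_fs`, and the function names are hypothetical, and the actual discard work is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-group state: set after a trim, cleared when blocks are freed. */
struct demo_group {
    bool was_trimmed;
};

struct demo_fs {
    struct demo_group *groups;
    unsigned int ngroups;
    uint64_t last_trim_minlen;   /* minlen used by the previous trim pass */
};

/* Trim the whole file system; returns how many groups were actually scanned. */
static unsigned int demo_trim_fs(struct demo_fs *fs, uint64_t minlen)
{
    unsigned int scanned = 0;

    for (unsigned int g = 0; g < fs->ngroups; g++) {
        struct demo_group *grp = &fs->groups[g];

        /*
         * Skip a group only if it is unchanged since the last trim AND the
         * caller is not asking for a smaller minlen than was used before.
         */
        if (grp->was_trimmed && minlen >= fs->last_trim_minlen)
            continue;

        /* ... discard the free extents of at least minlen bytes here ... */
        grp->was_trimmed = true;
        scanned++;
    }
    fs->last_trim_minlen = minlen;
    return scanned;
}

/* Called when blocks in group g are freed: the group needs trimming again. */
static void demo_blocks_freed(struct demo_fs *fs, unsigned int g)
{
    assert(g < fs->ngroups);
    fs->groups[g].was_trimmed = false;
}
```

In this model a repeated trim with the same minlen scans nothing, a trim after a deletion scans only the affected groups, and a trim with a smaller minlen rescans everything.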
-
Committed by Maxim Patlasov
The current implementation of ext4_free_blocks() always calls dquot_free_block(). This looks quite sensible in most cases: blocks to be freed are associated with an inode and were accounted in quota and i_blocks some time ago. However, there is a case where the blocks to free have not yet been accounted by the time ext4_free_blocks() is called:
1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from ext4_ext_insert_extent() and calls ext4_free_blocks().

In this scenario, ext4_free_blocks() calls dquot_free_block(), which, in turn, decrements i_blocks for blocks which were not accounted yet (due to delalloc). After a clean umount, e2fsck reports something like:
> Inode 21, i_blocks is 5080, should be 5128. Fix<y>?
because i_blocks was erroneously decremented as explained above.

The patch fixes the problem by passing the new flag EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request that the dquot_free_block() call be skipped.

Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
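The effect of such a flag can be sketched with a toy accounting model. The names below (`demo_inode`, `DEMO_FREE_NO_QUOTA_UPDATE`, `demo_free_blocks()`) are hypothetical and the allocator side is omitted; only the "skip the quota decrement when asked" step is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: the blocks being freed were never charged to quota. */
#define DEMO_FREE_NO_QUOTA_UPDATE 0x1u

struct demo_inode {
    uint64_t i_blocks;   /* blocks currently accounted to this inode */
};

/* Stand-in for the quota layer: releasing blocks lowers i_blocks. */
static void demo_quota_free(struct demo_inode *inode, uint64_t count)
{
    assert(inode->i_blocks >= count);
    inode->i_blocks -= count;
}

/* Free `count` blocks; touch the accounting only if they were accounted. */
static void demo_free_blocks(struct demo_inode *inode, uint64_t count,
                             unsigned int flags)
{
    /* ... return the blocks to the allocator here ... */
    if (!(flags & DEMO_FREE_NO_QUOTA_UPDATE))
        demo_quota_free(inode, count);
}
```

An error path that frees blocks allocated during delayed-allocation write-back would pass the flag, so i_blocks stays at the value the file system checker expects.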
-
- 28 Jun, 2011, 4 commits
-
-
Committed by Eric Sandeen
I found that ext4_ext_find_goal() and ext4_find_near() share the same code for returning a coloured start block based on i_block_group. We can refactor this into a common function so that they don't diverge in the future. Thanks to adilger for suggesting the new function name. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Amir Goldstein
This patch moves functions from inode.c to indirect.c. The moved functions are ext4_ind_* functions and their helpers. Functions called from inode.c are declared extern. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Theodore Ts'o
In preparation for moving the indirect functions to a separate file, move __ext4_check_blockref() to block_validity.c and rename it to ext4_check_blockref(), which is exported as a globally visible function. Also, rename the cpp macro ext4_check_inode_blockref() to ext4_ind_check_inode(), to make it clear that it is only valid for use with non-extent mapped inodes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Amir Goldstein
We are about to move all indirect inode functions to a new file. Before we do that, let's split ext4_ind_truncate() out of ext4_truncate(), leaving only generic code in the latter, so we will be able to move ext4_ind_truncate() to the new file. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 27 May, 2011, 1 commit
-
-
Committed by Christoph Hellwig
Tell the filesystem if we just updated the timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 25 May, 2011, 6 commits
-
-
Committed by Vivek Haldar
Currently, an fallocate request of size slightly larger than a power of 2 is turned into two block requests, each a power of 2, with the extra blocks pre-allocated for future use. When an application calls fallocate, it already has an idea of how large the file may grow, so there is usually little benefit in reserving extra blocks on the preallocation list. Not reserving them reduces disk fragmentation. Tested: fsstress. Also verified manually that fallocate'd files are contiguously laid out with this change (whereas without it they begin at power-of-2 boundaries, leaving blocks in between). CPU usage of fallocate is not appreciably higher. In a tight fallocate loop, CPU usage hovers between 5%-8% with this change, and 5%-7% without it. Using a simulated file system aging program which fills the file system to 70%, the percentage of free extents larger than 8MB (as measured by e2freefrag) increased from 38.8% without this change, to 69.4% with this change. Signed-off-by: Vivek Haldar <haldar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Allison Henderson
This patch adds new routines: "ext4_punch_hole", "ext4_ext_punch_hole" and "ext4_ext_check_cache". fallocate has been modified to call ext4_punch_hole when the punch hole flag is passed. At the moment, we only support punching holes in extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole routine. The ext4_ext_punch_hole routine first completes all outstanding writes with the associated pages, and then releases them. The data that is not block aligned is zeroed, and all blocks in between are punched out. The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache except it accepts an ext4_ext_cache parameter instead of an ext4_extent parameter. This routine is used by ext4_ext_punch_hole to check whether a block in a hole has been cached. The ext4_ext_cache parameter is necessary because the members of the ext4_extent structure are not large enough to hold a 32 bit value. The existing ext4_ext_in_cache routine has become a wrapper to this new function. [ext4 punch hole patch series 5/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
-
Committed by Allison Henderson
This patch modifies the existing ext4_block_truncate_page() function, which was used by the truncate code path and which zeroes out data that is not block aligned, by adding a new length parameter, and renames it to ext4_block_zero_page_range(). This function can now be used to zero out the head of a block, the tail of a block, or the middle of a block. The ext4_block_truncate_page() function is now a wrapper to ext4_block_zero_page_range(). [ext4 punch hole patch series 2/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
-
Committed by Allison Henderson
This patch adds an allocation request flag to the ext4_has_free_blocks function which enables the use of reserved blocks. This will allow a punch hole to proceed even if the disk is full. Punching a hole may require additional blocks to first split the extents. Because ext4_has_free_blocks is a low level function, the flag needs to be passed down through several functions listed below:
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
ext4_ext_split
ext4_ext_new_meta_block
ext4_mb_new_blocks
ext4_claim_free_blocks
ext4_has_free_blocks
[ext4 punch hole patch series 1/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
-
Committed by Aditya Kali
I am working on a patch to add quota as a built-in feature for the ext4 filesystem. The implementation is based on the design given at https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4. This patch reserves the inode numbers 3 and 4 for quota purposes and also reserves the EXT4_FEATURE_RO_COMPAT_QUOTA feature code. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Johann Lombardi
Prevent an ext4 filesystem from being mounted multiple times. A sequence number is stored on disk and is periodically updated (every 5 seconds by default) by a mounted filesystem. At mount time, we now wait for s_mmp_update_interval seconds to make sure that the MMP sequence does not change. In case of failure, the nodename, bdevname and the time at which the MMP block was last updated are displayed. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 23 May, 2011, 1 commit
-
-
Committed by Vivek Haldar
The number of hits and misses for each filesystem is exposed in /sys/fs/ext4/<dev>/extent_cache_{hits, misses}. Tested: fsstress, manual checks. Signed-off-by: Vivek Haldar <haldar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 21 May, 2011, 2 commits
-
-
Committed by Lukas Czerner
For some reason we have been waiting for the lazyinit thread to start in ext4_run_lazyinit_thread(), but this is not needed; it was just unnecessary complexity, so get rid of it. We can also remove li_task and li_wait_task since they are not used anymore. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
-
Committed by Lukas Czerner
In order to make lazyinit eat approx. 10% of io bandwidth at max, we sleep between zeroing each single inode table. For that purpose we use a timer which wakes up the thread when it expires. It is set via add_timer(), and this may cause trouble if the thread has been woken up earlier and in the next iteration we call add_timer() on a still-running timer, hence hitting the BUG_ON in add_timer(). We could fix that by using mod_timer() instead; however, we can use schedule_timeout_interruptible() for waiting and hence simplify things a lot. This commit replaces the old "waiting mechanism" with a simple schedule_timeout_interruptible(), setting the time to sleep. Hence we no longer need the li_wait_daemon wait queue and others, so get rid of them. Addresses-Red-Hat-Bugzilla: #699708 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
-
- 09 May, 2011, 1 commit
-
-
Committed by Amir Goldstein
In preparation for the next patch, the function ext4_add_groupblocks() is moved to mballoc.c, where it can use some static functions. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-