- 15 December 2015, 2 commits
-
-
Committed by Benjamin Marzinski

gfs2 currently returns 31 bits of filename hash as a cookie that readdir uses for an offset into the directory. When there are a large number of directory entries, the likelihood of a collision goes up far too quickly. GFS2 will now return cookies that are guaranteed unique for a while, and then fall back to using 30 bits of filename hash.

Specifically, the directory leaf blocks are divided up into chunks based on the minimum size of a gfs2 directory entry (48 bytes). Each entry's cookie is based on the chunk where it starts, in the linked list of leaf blocks that it hashes to (there are 131072 hash buckets). Directory entries will have unique cookies until they reach chunk 8192. Assuming the largest filenames possible and the least efficient spacing possible, this new method will still be able to return unique cookies when the previous method has statistically more than a 99% chance of a collision. The non-unique cookies it falls back to are guaranteed not to collide with the unique ones.

Unique cookies will be in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode, "0"
- 13 bits for the offset

Non-unique cookies will be in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode, "1"
- 13 more bits of the name hash

Another benefit of location-based cookies is that once a directory's exhash table is fully extended (so that multiple hash table indexes do not use the same leaf blocks), gfs2 can skip sorting the directory entries until it reaches the non-unique ones, and then it only needs to sort those. This provides a significant speedup for directory reads of very large directories.

The only issue is that, for these cookies to continue to point to the correct entry as files are added and removed from the directory, gfs2 must keep the entries at the same offset in the leaf block when the block is split (see my previous patch). This means that until all the nodes in a cluster are running code that splits the directory leaf blocks this way, none of the nodes can use the new cookie code. To deal with this, gfs2 now has the mount option loccookie which, if set, will make it return these new location-based cookies. This option must not be set until all nodes in the cluster are running at least this version of the kernel code, and you have guaranteed that there are no outstanding cookies required by other software, such as NFS.

gfs2 uses some of the extra space at the end of the gfs2_dirent structure to store the calculated readdir cookies. This keeps us from needing to allocate a separate array to hold these values. gfs2 recomputes the cookie stored in de_cookie for every readdir call. The time it takes to do so is small, and if gfs2 expected this value to be saved on disk, the new code wouldn't work correctly on filesystems created with an earlier version of gfs2.

One issue with adding de_cookie to the union in the gfs2_dirent structure is that it caused the union to align itself to a 4-byte boundary instead of its previous 2-byte boundary, which changed the offset of de_rahead. To solve that, I pulled de_rahead out of the union, since it does not need to be there.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
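As a rough, standalone illustration of the bit layout described above (the function name and exact packing are assumptions for demonstration, not the actual gfs2 code), a unique location cookie could be assembled like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the unique-cookie layout described above:
 * bit 31 is left zero so the cookie is positive, then 17 bits of hash
 * table index, 1 mode bit (0 = unique), and 13 bits of chunk offset. */
static uint32_t pack_loc_cookie(uint32_t hash_index, uint32_t chunk)
{
    hash_index &= (1u << 17) - 1;   /* 17-bit hash table index */
    chunk      &= (1u << 13) - 1;   /* 13-bit chunk offset     */
    return (hash_index << 14) | (0u << 13) | chunk;
}

int main(void)
{
    /* Entry in hash bucket 131071 (the last one), chunk 5. */
    printf("cookie = 0x%08x\n", pack_loc_cookie(131071, 5));
    return 0;
}
```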
-
Committed by Benjamin Marzinski

Currently, when gfs2 splits a directory leaf block, the dirents that need to be copied to the new leaf block are packed into the start of it. This is good for space efficiency. However, if gfs2 were to copy those dirents into the exact same offset in the new leaf block as they had in the old block, it would be able to generate a readdir cookie based on the dirent location that would be guaranteed to be unique well past the point where the current code is statistically almost guaranteed to have collisions. So gfs2 now keeps a dirent's offset in the block the same when it copies it to the new leaf block.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
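A minimal sketch of the behavioural difference, with made-up structure and helper names (not the gfs2 implementation):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a dirent to be moved: 'offset' is its
 * byte position in the old leaf block, 'len' its size in bytes. */
struct demo_dirent {
    uint32_t offset;
    uint32_t len;
};

/* Old behaviour: pack the moved entries at the front of the new block,
 * which changes their offsets. */
void split_packed(uint8_t *new_blk, const uint8_t *old_blk,
                  const struct demo_dirent *de, size_t n)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        memcpy(new_blk + pos, old_blk + de[i].offset, de[i].len);
        pos += de[i].len;
    }
}

/* New behaviour: copy each entry to the same offset it had before, so
 * a readdir cookie derived from that offset keeps pointing at it. */
void split_same_offset(uint8_t *new_blk, const uint8_t *old_blk,
                       const struct demo_dirent *de, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(new_blk + de[i].offset, old_blk + de[i].offset, de[i].len);
}
```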
-
- 17 November 2015, 1 commit
-
-
Committed by Andreas Gruenbacher

When gfs2 allocates an inode and its extended attribute block next to each other at inode create time, the inode's directory entry indicates that in de_rahead. In that case, we can issue readahead for the extended attribute block when we read in the inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
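A hedged sketch of what such metadata readahead can look like at the buffer-head level (illustrative only: the helper name is made up, de_rahead is treated as a plain count of adjacent blocks, and READA/ll_rw_block() are the interfaces of kernels from that era):

```c
#include <linux/buffer_head.h>
#include <linux/fs.h>

/* Start asynchronous reads for the 'rahead' metadata blocks recorded
 * as sitting right after the inode block on disk, so they are likely
 * to be in the page cache by the time the xattrs are needed. */
static void demo_xattr_readahead(struct super_block *sb,
                                 sector_t inode_blkno, unsigned int rahead)
{
    unsigned int i;

    for (i = 1; i <= rahead; i++) {
        struct buffer_head *bh = sb_getblk(sb, inode_blkno + i);

        if (!bh)
            continue;
        if (!buffer_uptodate(bh))
            ll_rw_block(READA, 1, &bh);   /* async read-ahead */
        brelse(bh);
    }
}
```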
-
- 05 November 2015, 1 commit
-
-
Committed by Bob Peterson

This patch changes function gfs2_dir_hash_inval so that it uses the i_lock spinlock to protect the in-core hash table, i_hash_cache. This prevents double-frees due to a race between gfs2_evict_inode and inode invalidation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
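A minimal sketch of the locking pattern being described (demo types and names, not the gfs2 code): the cached pointer is detached under the spinlock, so two racing callers cannot both see it non-NULL and free it twice.

```c
#include <linux/spinlock.h>
#include <linux/mm.h>

struct demo_inode {
    spinlock_t i_lock;
    void *i_hash_cache;   /* in-core copy of the directory hash table */
};

static void demo_dir_hash_inval(struct demo_inode *ip)
{
    void *hc;

    spin_lock(&ip->i_lock);
    hc = ip->i_hash_cache;   /* take ownership of the cached table */
    ip->i_hash_cache = NULL;
    spin_unlock(&ip->i_lock);

    kvfree(hc);              /* kvfree(NULL) is a no-op */
}
```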
-
- 23 February 2015, 1 commit
-
-
Committed by David Howells

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).

This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)'
);

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file,
           "--in-place", "--no-show-diff") == 0 ||
        die "spatch failed";
}

[AV: overlayfs parts skipped]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 04 February 2015, 1 commit
-
-
Committed by Oleg Drokin

leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so it clearly does not want any shrinker activity within the fs itself. Convert the vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve this goal.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
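A minimal sketch of the allocation pattern in question (illustrative only; the three-argument __vmalloc() shown is the form used by kernels of that era, and the function name is made up):

```c
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Try a physically contiguous, zeroed allocation first; fall back to
 * vmalloc space if that fails.  GFP_NOFS in both paths keeps memory
 * reclaim from re-entering the filesystem while fs locks are held. */
static void *demo_alloc_nofs(size_t size)
{
    void *p = kzalloc(size, GFP_NOFS);

    if (!p)
        p = __vmalloc(size, GFP_NOFS | __GFP_ZERO, PAGE_KERNEL);
    return p;
}
```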
-
- 20 November 2014, 2 commits
-
-
Committed by Al Viro

vfree() is allowed under spinlock these days, but it's cheaper when it doesn't step into the deferred case, and here it's very easy to avoid.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Al Viro

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 01 October 2014, 1 commit
-
-
Committed by Bob Peterson

This patch fixes a regression introduced by the patch "GFS2: Remember directory insert point", commit 2b47dad8. The problem had to do with the rename function: the function found space for the new dirent and remembered that location. But then the old dirent was removed, which often moved the eligible location for the renamed dirent. Putting the new dirent at the saved location caused file system corruption.

This patch adds a new "save_loc" variable to struct gfs2_diradd. If 1, the dirent location is saved. If 0, the dirent location is not saved and the buffer_head is released, as per the previous behavior.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
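A hedged sketch of what such a structure could look like (the field names besides save_loc are illustrative assumptions, not the actual gfs2_diradd definition):

```c
#include <linux/types.h>

struct buffer_head;
struct gfs2_dirent;

/* Illustrative state carried from "find space for the new entry" to
 * "insert it".  When save_loc is 1 the located dirent and its buffer
 * are kept; when 0 the buffer is released and the search is repeated
 * at insert time, as before. */
struct demo_diradd {
    int nr_blocks;              /* blocks needed to link the entry */
    struct gfs2_dirent *dent;   /* remembered insert position */
    struct buffer_head *bh;     /* buffer holding that position */
    int save_loc;               /* 1 = keep dent/bh, 0 = drop them */
};
```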
-
- 07 March 2014, 2 commits
-
-
Committed by Joe Perches

Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:
 o Add missing newlines
 o Coalesce formats
 o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
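For illustration, the usual kernel pattern this refers to looks roughly like the sketch below (a generic example, not the actual gfs2 hunks):

```c
/* Must be defined before any #include so the pr_*() helpers pick it up. */
#define pr_fmt(fmt) "gfs2: " fmt

#include <linux/printk.h>

static void demo_report(int error)
{
    /* Before: printk(KERN_WARNING "GFS2: error %d\n", error);
     * After:  the "gfs2: " prefix is supplied by pr_fmt() above. */
    pr_warn("error %d\n", error);
}
```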
-
Committed by Fabian Frederick

- All printk(KERN_foo ...) calls converted to pr_foo().
- Messages updated to fit in 80 columns.
- fs_macros converted as well.
- fs_printk removed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 07 February 2014, 1 commit
-
-
Committed by Steven Whitehouse

The intent of this new field in the directory entry is to allow a subsequent lookup to know how many blocks, which are contiguous with the inode, contain metadata which relates to the inode. This then allows a single read to be issued for these blocks, rather than reading the inode first and then issuing a second read for the metadata.

This only works under some fairly strict conditions: since we do not have back pointers from inodes to directory entries, we must ensure that the blocks referenced in this way will always belong to the inode. This rules out being able to use this system for indirect blocks, as these can change as a result of truncate/rewrite.

So the idea here is to restrict this to xattr blocks only for the time being. For most inodes, that means only a single block. Also, when using ACLs and/or SELinux or other LSMs, these will be added at inode creation time so that they will be contiguous with the inode on disk and will almost always be needed when we read the inode in for permission checks. Once an xattr block for an inode is allocated, it will never change until the inode is deallocated.

This patch adds the new field; a further patch will add the readahead in due course.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 08 January 2014, 2 commits
-
-
Committed by Steven Whitehouse

This patch adds four new fields to directory leaf blocks. The intent is not to use them in the kernel itself, although perhaps we may be able to use them as hints at some later date, but instead to provide more information for debug/fsck use.

One new field adds a pointer to the inode to which the leaf belongs. This can be useful if the pointer to the leaf block has become corrupt, as it will allow us to know which inode this block should be associated with. This field is set when the leaf is created and never changed over its lifetime.

The second field is a "distance from the hash table" field. Its meaning is as follows:
 0  = an old leaf in which this value has not been set
 1  = this leaf is pointed to directly from the hash table
 2+ = this leaf is part of a chain, pointed to by another leaf block; the value gives the position in the chain

The third and fourth fields combine to give a time stamp of the most recent directory insertion or deletion from this leaf block. The time stamp is not updated when a new leaf block is chained from the current one. The code is currently written such that the timestamp on the dir inode will match that of the leaf block for the most recent insertion/deletion.

For backwards compatibility, any of these new fields which is zero should be considered to be "unknown".

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
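To make the description concrete, here is a hedged sketch of such an extension; the field names and types below are hypothetical illustrations, not the actual on-disk gfs2_leaf layout:

```c
#include <linux/types.h>

/* Hypothetical big-endian on-disk fields matching the four additions
 * described above; a value of zero in any of them means "unknown". */
struct demo_leaf_extra {
    __be64 owner_inode;   /* dinode this leaf block belongs to */
    __be32 dist;          /* 0 = unset, 1 = pointed to by the hash
                           * table, 2+ = position in the leaf chain */
    __be32 nsec;          /* nanoseconds of the last insert/delete */
    __be64 sec;           /* seconds of the last insert/delete */
};
```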
-
Committed by Steven Whitehouse

For most cases, only a single new block is needed when we reach the point of converting from a stuffed to an exhash directory. The exception is when the file name is so long that it will not fit within the new leaf block. So this patch adds a simple test for that situation, so that we do not need to request the full reservation size in this case.

Potentially we could calculate the value to use more accurately in other cases too, but that is much more complicated to do, and it is doubtful that the benefit would outweigh the extra cost in code complexity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 06 January 2014, 2 commits
-
-
Committed by Steven Whitehouse

When we look to see if there is enough space to add a dir entry without allocation, we have then been repeating the same search later when we do the actual insertion. This patch caches the details of the location in the gfs2_diradd structure, so that we do not have to repeat the search. This provides a performance improvement which grows with the size of the directory.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse

The intent is that this structure will hold the information required when adding entries to a directory (linking). To start with, it will contain only the number of blocks which are required to link the new entry into the directory. The current calculation returns either 0 or the maximum number of blocks that can ever be requested by such a transaction.

The intent is that in a later patch, we can update the dir code to calculate this value more accurately. In addition, further patches will also add further fields to the new structure to increase its utility.

This patch also fixes a bug where the link used during inode creation was requesting too many blocks in some cases. This is harmless unless the fs is close to being full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 20 August 2013, 1 commit
-
-
Committed by Joe Perches

Don't emit OOM warnings when k.alloc calls fail when there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
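The pattern being described looks roughly like the sketch below (a generic example, not the actual gfs2 hunk): the kmalloc attempt carries __GFP_NOWARN because its failure is expected and handled by the vmalloc fallback.

```c
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *demo_table_alloc(size_t size)
{
    /* Suppress the allocation-failure warning here: falling back
     * to vzalloc() below is the normal, handled path. */
    void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);

    if (!p)
        p = vzalloc(size);   /* zeroed, like kzalloc */
    return p;
}
```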
-
- 29 June 2013, 1 commit
-
-
Committed by Al Viro

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 14 June 2013, 1 commit
-
-
Committed by Bob Peterson

Recent commit e8830d88 introduced a bug in function dir_double_exhash: it was failing to set h in the fall-back case. This patch corrects it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 11 June 2013, 1 commit
-
-
Committed by Steven Whitehouse

Creation of a new inode requires a directory search in order to ensure that we are not trying to create an inode with the same name as an existing one. This was hidden away inside the create_ok() function. In the case that there was an existing inode, and a lookup could be substituted for a create (which is the case with regular files when the O_EXCL flag is not in use), then we were doing a second lookup in order to return the inode.

This patch merges these two lookups into one. This can be done by passing a flag to gfs2_dir_search() to tell it to just return -EEXIST in the cases where we don't actually want to look up the inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 June 2013, 1 commit
-
-
Committed by Bob Peterson

This version has one more correction: the vmalloc calls are replaced by __vmalloc calls to preserve the GFP_NOFS flag.

When GFS2's directory management code allocates buffers for a directory hash table and it can't get the memory it needs, it currently gives a bad return code. Rather than giving an error, this patch allows it to use virtual memory rather than kernel memory for the hash table. This should make it possible for directories to function properly even when kernel memory becomes very fragmented.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 13 February 2013, 1 commit
-
-
Committed by Eric W. Biederman

Split NO_QUOTA_CHANGE into NO_UID_QUOTA_CHANGE and NO_GID_QUOTA_CHANGE so the constants may be well typed.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
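For illustration, such well-typed sentinels can be expressed in terms of the invalid kuid_t/kgid_t values (a sketch of the idea only; check the gfs2 headers for the real definitions):

```c
#include <linux/types.h>
#include <linux/uidgid.h>

/* Sketch: with distinct kuid_t/kgid_t types, each "no change" sentinel
 * gets its own properly typed value instead of one shared integer. */
#define NO_UID_QUOTA_CHANGE INVALID_UID
#define NO_GID_QUOTA_CHANGE INVALID_GID

static inline bool demo_uid_unchanged(kuid_t uid)
{
    return uid_eq(uid, NO_UID_QUOTA_CHANGE);
}
```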
-
- 29 January 2013, 1 commit
-
-
Committed by Steven Whitehouse

There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and to update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 13 November 2012, 1 commit
-
-
Committed by Bob Peterson

This patch changes the gfs2_dir_add function so that it uses the dirty_inode function (via mark_inode_dirty) rather than manually updating the dinode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 06 June 2012, 1 commit
-
-
Committed by Bob Peterson

This patch moves the ancillary quota data structures into the block reservations structure. This saves GFS2 some time and effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 11 May 2012, 1 commit
-
-
Committed by Linus Torvalds

This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
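A standalone illustration of the idea (not the kernel's actual struct qstr or QSTR_INIT definition): overlaying a 64-bit word on two adjacent 32-bit fields lets hash and length be compared with a single 64-bit comparison.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct name_key {
    union {
        struct {
            uint32_t hash;
            uint32_t len;
        };
        uint64_t hash_len;   /* hash and len viewed as one word */
    };
    const char *name;
};

/* Static-initializer helper in the spirit of QSTR_INIT(). */
#define NAME_KEY_INIT(n, l) { .hash = 0, .len = (l), .name = (n) }

int main(void)
{
    struct name_key a = NAME_KEY_INIT("foo", 3);
    struct name_key b = NAME_KEY_INIT("foo", 3);

    a.hash = b.hash = 42;   /* pretend both were hashed identically */

    /* One 64-bit compare replaces separate hash and length checks. */
    if (a.hash_len == b.hash_len && memcmp(a.name, b.name, a.len) == 0)
        printf("same name\n");
    return 0;
}
```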
-
- 05 April 2012, 1 commit
-
-
Committed by Bob Peterson

This patch removes the call to function gfs2_rindex_update from gfs2_blk2rgrpd and replaces it with individual calls. The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 22 November 2011, 1 commit
-
-
Committed by Bob Peterson

This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. It also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 21 November 2011, 1 commit
-
-
Committed by Bob Peterson

This patch is a revision of the one I previously posted; I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (which allocates either a dinode block or an extent of data blocks) into a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 15 November 2011, 1 commit
-
-
Committed by Bob Peterson

GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 09 November 2011, 1 commit
-
-
Committed by Steven Whitehouse

As a result, we don't need to test it each time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
-
- 08 November 2011, 1 commit
-
-
Committed by Bob Peterson

This patch adds read-ahead capability to GFS2's directory hash table management. It greatly improves performance for some directory operations. For example: in one of my file systems that has 1000 directories, each of which has 1000 files, the time to execute a recursive ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced from 2m2.814s on a stock kernel to 0m45.938s.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 21 October 2011, 4 commits
-
-
Committed by Steven Whitehouse

Each block which is deallocated requires a call to gfs2_rlist_add(), and each of those calls was calling gfs2_blk2rgrpd() in order to figure out which rgrp the block belonged in. This can be sped up by making use of the rgrp cached in the inode. We also reset this cached rgrp in case the block has changed rgrp. This should provide a big reduction in gfs2_blk2rgrpd() calls during deallocation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse

Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append-only during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added, rather than needing to reread the whole rindex file each time one element is added.

Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place.

Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse

The aim of this patch is to use the newly enhanced ->dirty_inode() super block operation to deal with atime updates, rather than piggybacking that code into ->write_inode() as is currently done.

The net result is a simplification of the code in various places and a reduction in the number of gfs2_dinode_out() calls, since this is now implied by ->dirty_inode(). Some of the mark_inode_dirty() calls have been moved under glocks in order to take advantage of then being able to avoid locking in ->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals with file size updates, so that we do not need a separate check for that afterwards. This also, indirectly, means that fdatasync should work correctly on GFS2; the current code always syncs the metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and also fsx.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse

Since there is now only a single caller of gfs2_dir_read_data(), and it has a number of constant arguments, we can factor those out. Also, some tests relating to the inode size were being done twice.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 15 July 2011, 1 commit
-
-
Committed by Steven Whitehouse

This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code.

There are two follow-ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code, now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table.

The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 09 May 2011, 2 commits
-
-
Committed by Steven Whitehouse

This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically update the parent directory's link count if required, the code is shorter and simpler than before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse

When we remove an entry from a directory, we can save ourselves some trouble if we know the type of the entry in question, since if it is itself a directory, we can update the link count of the parent at the same time as removing the directory entry.

In addition, this patch also merges the rmdir and unlink code, which was almost identical anyway. This eliminates the calls to remove the . and .. directory entries on each rmdir (not needed since the directory will be deallocated anyway), which was the only thing preventing passing the dentry to gfs2_dir_del(). Passing the dentry rather than just the name allows us to figure out the type of the entry which is being removed, and thus adjust the link count when required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 20 April 2011, 1 commit
-
-
Committed by Bob Peterson

The previous patches made function gfs2_dir_exhash_dealloc do nothing but call function foreach_leaf. This patch simplifies the code by moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-