- 25 Aug 2017 (2 commits)
-
By Andreas Gruenbacher
Enlarge sd_fsname to be big enough for the longest lock table name and an arbitrary journal number. This silences two -Wformat-truncation warnings with gcc 7.1.1. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
By Bob Peterson
Before this patch, if GFS2 encountered IO errors while writing to the journal, it did not report the problem, so the errors could go unnoticed, sometimes for many hours. Often the problem was only noticed later, when recovery tried to replay the journal and failed due to invalid metadata at the blocks that had the IO errors. This patch makes GFS2's log daemon check for IO errors. If it encounters one, it withdraws from the file system and reports why in dmesg. The same action is taken when IO errors occur while writing to the system statfs file. These errors are also reported back to any callers of fsync, since fsync requires the journal to be flushed. Therefore, IO errors that would previously go unnoticed are now detected and the file system is withdrawn as early as possible, preventing further file system damage. Also note that this reintroduces the superblock variable sd_log_error, which Christoph removed with commit f729b66f. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
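As a rough illustration of the kind of check this patch describes, the sketch below waits for a journal buffer write and withdraws on error. All names here (example_withdraw, example_check_journal_write) are illustrative stand-ins, not the actual gfs2 functions.

```c
#include <linux/buffer_head.h>
#include <linux/printk.h>

/* Hypothetical stand-in for the real withdraw logic. */
static void example_withdraw(const char *why)
{
	pr_err("example-fs: withdrawing from file system: %s\n", why);
}

/* After submitting a journal buffer, wait for it and check the result. */
static void example_check_journal_write(struct buffer_head *bh)
{
	wait_on_buffer(bh);
	if (!buffer_uptodate(bh) || buffer_write_io_error(bh))
		example_withdraw("journal write I/O error");
}
```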
-
- 10 Aug 2017 (1 commit)
-
By Abhi Das
On systems with low memory, it is possible for gfs2 to loop indefinitely in balance_dirty_pages() under heavy IO (creating sparse files). balance_dirty_pages() attempts to write out the dirty pages via gfs2_writepages(), but none are found because these dirty pages are being used by the journaling code in the ail. Normally, the journal has an upper threshold which, when hit, triggers an automatic flush of the ail. But this threshold can be higher than the number of allowable dirty pages, so the ail is never flushed. This patch forces an ail flush when gfs2_writepages() fails to write anything, which is a good indication that the ail might be holding dirty pages. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 08 Jul 2017 (1 commit)
-
By Andreas Gruenbacher
Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks", glocks were freed via call_rcu to allow reading the glock hashtable locklessly using rcu. That commit changed the code to free glocks immediately, which made reading the glock hashtable unsafe. Bring back the original code for freeing glocks via call_rcu. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # 4.3+
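For reference, this is the general call_rcu pattern the commit restores: the object embeds a struct rcu_head, and the actual kfree() is deferred until a grace period has elapsed so lockless readers never see freed memory. A minimal sketch with illustrative names:

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct example_glock {
	/* ... lock state ... */
	struct rcu_head rcu;
};

static void example_glock_free_rcu(struct rcu_head *rcu)
{
	struct example_glock *gl = container_of(rcu, struct example_glock, rcu);

	kfree(gl);
}

static void example_glock_put(struct example_glock *gl)
{
	/* readers may still be walking the hash table under rcu_read_lock(),
	 * so defer the free until after a grace period */
	call_rcu(&gl->rcu, example_glock_free_rcu);
}
```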
-
- 05 Jul 2017 (2 commits)
-
By Andreas Gruenbacher
Put all remaining accesses to gl->gl_object under the gl->gl_lockref.lock spinlock to prevent races. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
By Andreas Gruenbacher
So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock work queue to make sure that inode glops which dereference gl->gl_object have finished running before the inode is destroyed. However, flushing the work queue may do more work than needed, and in particular it may call into DLM, which we want to avoid here. Use a bit lock (GIF_GLOP_PENDING) to synchronize between the inode glops and gfs2_evict_inode instead, so the flushing is no longer needed.

In addition, flush the work queues of existing glocks before reusing them for new inodes to get those glocks into a known state: the glock state engine currently doesn't handle glock re-appropriation correctly. (We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
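A minimal sketch of the bit-lock handshake described above, using the standard set_bit / clear_bit_unlock / wake_up_bit / wait_on_bit primitives. The flag name and helper functions are illustrative, not the actual gfs2 code:

```c
#include <linux/bitops.h>
#include <linux/sched.h>
#include <linux/wait.h>

#define EXAMPLE_GLOP_PENDING	0	/* illustrative bit number */

/* glops side: mark work as pending before it is queued */
static void example_queue_glop_work(unsigned long *flags)
{
	set_bit(EXAMPLE_GLOP_PENDING, flags);
	/* ... queue_work(...) ... */
}

/* work completion: release the bit and wake any waiter */
static void example_glop_work_done(unsigned long *flags)
{
	clear_bit_unlock(EXAMPLE_GLOP_PENDING, flags);
	smp_mb__after_atomic();
	wake_up_bit(flags, EXAMPLE_GLOP_PENDING);
}

/* evict side: wait only for pending glop work, not for the whole queue */
static void example_evict_wait(unsigned long *flags)
{
	wait_on_bit(flags, EXAMPLE_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
}
```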
-
- 20 Jun 2017 (1 commit)
-
By Bob Peterson
The superblock variable sd_log_flush_wrapped is set but never referenced, so this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 13 Jun 2017 (1 commit)
-
By Bob Peterson
The gl_list field is no longer used or needed in the glock structure, so this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 09 Jun 2017 (1 commit)
-
By Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 16 Mar 2017 (1 commit)
-
By Andreas Gruenbacher
As per a suggestion by Linus, don't pack struct lm_lockname: we did that because the struct is used as a rhashtable key, but packing tells the compiler that the 64-bit fields in the struct may be unaligned, causing it to generate worse code on some architectures. Instead, rearrange the fields in the struct so that there is no padding between fields, and exclude any tail padding from the hash key size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
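A sketch of the idea: place the 64-bit members first so there are no internal holes, and compute the rhashtable key length with offsetofend() so the tail padding is not hashed. The struct and parameter values here are illustrative, not the real lm_lockname definition:

```c
#include <linux/rhashtable.h>
#include <linux/stddef.h>
#include <linux/types.h>

/* Illustrative key: 64-bit members first, smaller members last, so the
 * struct has no internal holes and needs no packing attribute. */
struct example_lockname {
	u64		ln_number;
	void		*ln_sbd;	/* superblock pointer, part of the key */
	unsigned int	ln_type;
	/* tail padding (on 64-bit) is NOT part of the hash key, see below */
};

struct example_glock {
	struct example_lockname	gl_name;
	struct rhash_head	gl_node;
	/* ... */
};

static const struct rhashtable_params example_ht_params = {
	.key_len	= offsetofend(struct example_lockname, ln_type),
	.key_offset	= offsetof(struct example_glock, gl_name),
	.head_offset	= offsetof(struct example_glock, gl_node),
};
```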
-
- 15 Mar 2017 (1 commit)
-
By Andreas Gruenbacher
Commit 88ffbf3e switched to using rhashtables for glocks, hashing over the entire struct lm_lockname instead of its individual fields. On some architectures, struct lm_lockname contains a hole of uninitialized memory due to alignment rules, which now leads to incorrect hash values. Get rid of that hole. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> CC: <stable@vger.kernel.org> #v4.3+
-
- 27 Jan 2017 (1 commit)
-
By Bob Peterson
This patch eliminates the int variable tr_touched in favor of a new flag in the transaction. This is a step toward reducing contention on the gfs2_log_lock spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 06 Jan 2017 (1 commit)
-
By Bob Peterson
Before this patch, the logd daemon only tried to flush things when the number of pinned log blocks exceeded a certain threshold. But when we're deleting very large files, a huge number of journal blocks may be required, which in turn may exceed the threshold. This patch takes that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 15 Mar 2016 (1 commit)
-
By Bob Peterson
This patch tries to prevent delete work (queued via the iopen callback) from executing if the glock is currently being used to create a new inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 15 Dec 2015 (3 commits)
-
By Benjamin Marzinski
gfs2 currently returns 31 bits of filename hash as a cookie that readdir uses as an offset into the directory. When there is a large number of directory entries, the likelihood of a collision goes up way too quickly. GFS2 will now return cookies that are guaranteed unique for a while, and then fall back to using 30 bits of filename hash.

Specifically, the directory leaf blocks are divided up into chunks based on the minimum size of a gfs2 directory entry (48 bytes). Each entry's cookie is based on the chunk where it starts, in the linked list of leaf blocks that it hashes to (there are 131072 hash buckets). Directory entries will have unique cookies until they reach chunk 8192. Assuming the largest filenames possible, and the least efficient spacing possible, this new method will still be able to return unique cookies when the previous method has statistically more than a 99% chance of a collision. The non-unique cookies it falls back to are guaranteed not to collide with the unique ones.

Unique cookies are in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode, "0"
- 13 bits for the offset

Non-unique cookies are in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode, "1"
- 13 more bits of the name hash

Another benefit of location-based cookies is that once a directory's exhash table is fully extended (so that multiple hash table indexes do not use the same leaf blocks), gfs2 can skip sorting the directory entries until it reaches the non-unique ones, and then it only needs to sort these. This provides a significant speedup for reads of very large directories.

The only issue is that for these cookies to continue to point to the correct entry as files are added and removed from the directory, gfs2 must keep the entries at the same offset in the leaf block when they are split (see my previous patch). This means that until all the nodes in a cluster are running with code that splits the directory leaf blocks this way, none of the nodes can use the new cookie code. To deal with this, gfs2 now has the mount option loccookie, which, if set, makes it return these new location-based cookies. This option must not be set until all nodes in the cluster are running at least this version of the kernel code, and you have guaranteed that there are no outstanding cookies required by other software, such as NFS.

gfs2 uses some of the extra space at the end of the gfs2_dirent structure to store the calculated readdir cookies. This keeps us from needing to allocate a separate array to hold these values. gfs2 recomputes the cookie stored in de_cookie for every readdir call. The time it takes to do so is small, and if gfs2 expected this value to be saved on disk, the new code wouldn't work correctly on filesystems created with an earlier version of gfs2.

One issue with adding de_cookie to the union in the gfs2_dirent structure is that it caused the union to align itself to a 4 byte boundary, instead of its previous 2 byte boundary, which changed the offset of de_rahead. To solve that, I pulled de_rahead out of the union, since it does not need to be there. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
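A sketch of how the two cookie layouts above could be packed into a 31-bit value; the macro names, shifts, and helper functions are illustrative, not the actual gfs2 implementation:

```c
#include <linux/types.h>

/* Illustrative packing of the layouts described above:
 * [ 0 | 17-bit hash table index | mode bit | 13 low bits ] */
#define EX_COOKIE_MODE_SHIFT	13
#define EX_COOKIE_INDEX_SHIFT	14

/* unique cookie: mode bit 0, low 13 bits are the leaf-chunk offset */
static inline u32 ex_unique_cookie(u32 hash_index, u32 chunk_offset)
{
	return ((hash_index & 0x1ffff) << EX_COOKIE_INDEX_SHIFT) |
	       (0u << EX_COOKIE_MODE_SHIFT) |
	       (chunk_offset & 0x1fff);
}

/* non-unique cookie: mode bit 1, low 13 bits are more of the name hash */
static inline u32 ex_hash_cookie(u32 hash_index, u32 name_hash)
{
	return ((hash_index & 0x1ffff) << EX_COOKIE_INDEX_SHIFT) |
	       (1u << EX_COOKIE_MODE_SHIFT) |
	       (name_hash & 0x1fff);
}
```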
-
By Bob Peterson
This patch makes no functional changes. Its goal is to reduce the size of the gfs2 inode in memory by rearranging structures and changing the size of some variables within the structure. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
By Bob Peterson
Before this patch, multi-block reservation structures were allocated from a special slab. This patch folds the structure into the gfs2_inode structure. The disadvantage is that the gfs2_inode needs more memory, even when a file is opened read-only. The advantages are: (a) we don't need the special slab and the extra time it takes to allocate and deallocate from it; (b) we no longer need to worry about whether the structure exists for things like quota management; (c) it also allows us to remove the calls to get_write_access and put_write_access, since we know the structure will exist. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 24 Nov 2015 (1 commit)
-
By Bob Peterson
This patch basically reverts the majority of patch 5407e242. That patch eliminated the gfs2_qadata structure in favor of just using the reservations structure. The problem with doing that is that it increases the size of the reservations structure, which is not an issue until it comes time to fold the reservations structure into the inode in memory so we know it's always there. By separating out the quota structure again, we aren't punishing the non-quota users by making all the inodes bigger and requiring more slab space. This patch creates a new slab area to allocate the quota stuff from, so it's managed a little more sanely. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
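For reference, a minimal sketch of the dedicated-slab pattern this patch returns to, with illustrative names (the real cache and structure differ):

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/slab.h>

/* Illustrative per-inode quota bookkeeping, allocated only when needed. */
struct example_qadata {
	void *qa_qd[4];
	unsigned int qa_qd_num;
	unsigned int qa_ref;
};

static struct kmem_cache *example_qadata_cachep;

static int __init example_qadata_slab_init(void)
{
	example_qadata_cachep = kmem_cache_create("example_qadata",
						  sizeof(struct example_qadata),
						  0, 0, NULL);
	return example_qadata_cachep ? 0 : -ENOMEM;
}

/* Allocate quota state only when an operation actually needs it, so
 * inodes with no quota activity stay small. */
static struct example_qadata *example_qadata_alloc(gfp_t gfp)
{
	return kmem_cache_zalloc(example_qadata_cachep, gfp);
}
```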
-
- 17 Nov 2015 (1 commit)
-
By Andreas Gruenbacher
When gfs2 allocates an inode and its extended attribute block next to each other at inode create time, the inode's directory entry indicates that in de_rahead. In that case, we can read ahead the extended attribute block when we read in the inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 30 Oct 2015 (1 commit)
-
By Andreas Gruenbacher
Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in gl_lockref). Remove that define to make the references to gl_lockref.lock more obvious. Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 04 Sep 2015 (3 commits)
-
By Ben Hutchings
None of these statistics can meaningfully be negative, and the numerator for do_div() must have the type u64. The generic implementation of do_div() used on some 32-bit architectures asserts that, resulting in a compiler error in gfs2_rgrp_congested(). Fixes: 0166b197 ("GFS2: Average in only non-zero round-trip times ...") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
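As a reminder of the constraint being fixed: do_div() divides a u64 in place by a 32-bit divisor and returns the remainder, so the numerator must be an unsigned 64-bit lvalue. A hedged example with made-up names:

```c
#include <asm/div64.h>
#include <linux/types.h>

/* Keep the running total unsigned 64-bit: do_div() modifies its first
 * argument in place (it becomes the quotient) and returns the remainder. */
static u64 example_average_rtt(u64 total_rtt_ns, u32 nr_samples)
{
	if (!nr_samples)
		return 0;
	do_div(total_rtt_ns, nr_samples);
	return total_rtt_ns;
}
```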
-
By Bob Peterson
This patch changes the glock hash table from a normal hash table to a resizable hash table, which scales better. This also simplifies a lot of code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Bob Peterson
What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 19 Jun 2015 (1 commit)
-
By Bob Peterson
The glocks used for resource groups often come and go hundreds of thousands of times per second. Adding them to the lru list just adds unnecessary contention for the lru_lock spin_lock, especially considering we're almost certainly going to re-use the glock and take it back off the lru microseconds later. We never want the glock shrinker to cull them anyway. This patch adds a new bit in the glops that determines which glock types get put onto the lru list and which ones don't. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 Jun 2015 (1 commit)
-
By Abhi Das
This patch makes the quota subsystem report only once that a particular user/group has exceeded its allotted quota. Previously, it was possible for a program to keep trying to exceed quota (despite receiving EDQUOT) and in turn trigger gfs2 to issue a kernel log message about the quota excess each time. In theory, this could get out of hand and flood the log and the filesystem hosting the log files. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
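A small sketch of the usual "warn once" pattern such a change can use; the flag name and helper are illustrative, not necessarily what gfs2 does internally:

```c
#include <linux/bitops.h>
#include <linux/printk.h>

#define EXAMPLE_QDF_EXCEEDED_WARNED	0	/* illustrative flag bit */

/* Log the "quota exceeded" message only the first time for this
 * user/group; later violations still return EDQUOT silently. */
static void example_quota_warn_once(unsigned long *qd_flags, unsigned int id)
{
	if (!test_and_set_bit(EXAMPLE_QDF_EXCEEDED_WARNED, qd_flags))
		pr_info("quota exceeded for id %u\n", id);
}
```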
-
- 19 Mar 2015 (2 commits)
-
By Abhi Das
struct gfs2_alloc_parms is passed to gfs2_quota_check() and gfs2_inplace_reserve() with ap->target containing the number of blocks being requested for allocation in the current operation. We add a new field to struct gfs2_alloc_parms called 'allowed': gfs2_quota_check() and gfs2_inplace_reserve() return the maximum blocks allowed by quota and the maximum blocks allowed by the chosen rgrp, respectively, in 'allowed'. A new field 'min_target', when non-zero, tells gfs2_quota_check() and gfs2_inplace_reserve() not to return -EDQUOT/-ENOSPC when there are at least 'min_target' blocks allowable/available. The assumption is that the caller is ok with just 'min_target' blocks and will likely proceed with allocating them. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
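An illustrative sketch of the parameter block and the min_target semantics described above; the field names follow the description, but the check logic is hypothetical:

```c
#include <linux/errno.h>
#include <linux/types.h>

/* Illustrative version of the parameter block described above. */
struct example_alloc_parms {
	u64 target;	/* blocks the operation would like		*/
	u32 min_target;	/* if nonzero, smallest acceptable allocation	*/
	u64 allowed;	/* filled in: max blocks quota/rgrp will permit	*/
};

/* Hypothetical quota check: succeed if at least min_target is allowed. */
static int example_quota_check(struct example_alloc_parms *ap, u64 quota_room)
{
	ap->allowed = quota_room;
	if (quota_room >= ap->target)
		return 0;
	if (ap->min_target && quota_room >= ap->min_target)
		return 0;	/* caller is happy with a partial allocation */
	return -EDQUOT;
}
```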
-
By Abhi Das
Use struct gfs2_alloc_parms as an argument to gfs2_quota_check() and gfs2_quota_lock_check() to check for quota violations while accounting for the new blocks requested by the current operation in ap->target. Previously, the number of new blocks requested during an operation was not accounted for during quota_check, which allowed these operations to exceed quota. This was not very apparent, since most operations allocated only 1 block at a time and quotas would get violated in the next operation; i.e. the quota excess would only be by 1 block or so. With fallocate (where we allocate a bunch of blocks at once) the quota excess is non-trivial and is addressed by this patch. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 17 Nov 2014 (1 commit)
-
By Benjamin Marzinski
The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding a lock it needed to do the unfreeze ioctl.

This patch makes use of the new hooks to simplify the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
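For orientation, a heavily simplified sketch of how a clustered filesystem could use the freeze_super/thaw_super hooks (with the hook signatures as they were in the kernels this log covers): take a cluster-wide lock first, then call the generic VFS helpers. The cluster-lock helpers below are hypothetical stubs, and this is not the actual gfs2 logic:

```c
#include <linux/fs.h>

/* Hypothetical stand-ins for taking/dropping a cluster-wide freeze lock. */
static int example_cluster_lock_exclusive(struct super_block *sb)
{
	return 0;	/* pretend the cluster lock was granted */
}

static void example_cluster_unlock(struct super_block *sb)
{
}

/* Take the cluster lock first, then let the generic VFS code freeze this
 * node; other nodes would react to the lock transition and freeze too. */
static int example_freeze_super(struct super_block *sb)
{
	int error = example_cluster_lock_exclusive(sb);

	if (error)
		return error;
	return freeze_super(sb);
}

static int example_thaw_super(struct super_block *sb)
{
	int error = thaw_super(sb);

	example_cluster_unlock(sb);
	return error;
}

static const struct super_operations example_sops = {
	.freeze_super	= example_freeze_super,
	.thaw_super	= example_thaw_super,
};
```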
-
- 04 Nov 2014 (1 commit)
-
By Bob Peterson
This patch tries to use the journal numbers to evenly distribute which node prefers which resource group for block allocations. This is to help performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 11 Sep 2014 (1 commit)
-
By Jan Kara
The MAXQUOTAS value defines the maximum number of quota types the VFS supports. This isn't necessarily the number of types gfs2 supports, and with the addition of project quotas these two numbers stop matching. So make gfs2 use its own private definition. CC: cluster-devel@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 Jun 2014 (1 commit)
-
By Bob Peterson
This patch uses a completion to prevent dlm's recovery process from referencing and trying to recover a journal before that journal has been opened. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
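A minimal sketch of the completion pattern described here, with illustrative names: the mount path completes the completion once the journal is open, and recovery waits on it before touching the journal:

```c
#include <linux/completion.h>

struct example_journal {
	struct completion jd_open_done;	/* completed once the journal is usable */
	/* ... */
};

static void example_journal_init(struct example_journal *jd)
{
	init_completion(&jd->jd_open_done);
}

/* mount path: signal that the journal may now be recovered */
static void example_journal_opened(struct example_journal *jd)
{
	complete_all(&jd->jd_open_done);
}

/* recovery path (triggered by dlm): don't touch the journal too early */
static void example_recover_journal(struct example_journal *jd)
{
	wait_for_completion(&jd->jd_open_done);
	/* ... replay the journal ... */
}
```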
-
- 14 May 2014 (1 commit)
-
By Benjamin Marzinski
GFS2 has a transaction glock, which must be grabbed for every transaction, and whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In its place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and for actions which need to be safe from freezing, like recovery.

When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work of flushing out and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesystem simply involves dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time.

However, in order for the unfreeze ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 31 Mar 2014 (1 commit)
-
By Abhi Das
When gfs2_create_inode() fails due to a quota violation, the VFS inode is not completely uninitialized, which can cause a list corruption error. This patch correctly uninitializes the VFS inode when a quota violation occurs in the gfs2_create_inode codepath. Resolves: rhbz#1059808 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 07 Mar 2014 (1 commit)
-
By Bob Peterson
If multiple nodes fail and their recovery work runs simultaneously, they would use the same unprotected variables in the superblock. For example, they would stomp on each other's revoked blocks lists, which resulted in file system metadata corruption. This patch moves the necessary variables so that each journal has its own separate area for tracking its journal replay. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 Mar 2014 (1 commit)
-
By Steven Whitehouse
This patch fixes a long-standing issue in mapping the journal extents. Most journals will consist of only a single extent, and although the cache took account of that by merging extents, it did not actually map large extents, but instead was doing a block-by-block mapping. Since the journal was only being mapped on mount, this was not normally noticeable. With the updated code, it is now possible to use the same extent mapping system during journal recovery (which will be added in a later patch). This will allow checking of the integrity of the journal before any replay of the journal content is attempted. For this reason the code is moving to bmap.c, since it will be used more widely in due course. An exercise left for the reader is to compare the new function gfs2_map_journal_extents() with gfs2_write_alloc_required(). Additionally, should there be a failure, the error reporting is also updated to show more detail about what went wrong. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 25 Feb 2014 (2 commits)
-
By Steven Whitehouse
Now that we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function. In addition, this allows for a clean-up in calc_reserved(), making it rather easier to understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
By Steven Whitehouse
Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction is then passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 21 Feb 2014 (1 commit)
-
By Steven Whitehouse
A couple of "int" fields were being used as boolean values, so we can make them bitfields of one bit and put them in what might otherwise be a hole in the structure, given its 64-bit alignment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 16 Jan 2014 (1 commit)
-
By Steven Whitehouse
Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping.

> * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
-
- 15 Jan 2014 (1 commit)
-
By Steven Whitehouse
Gradually, the global qd_lock is being used for less and less. After this patch it will only be used for the per-superblock list whose purpose is to allow syncing of changes back to the master quota file from the local quota changes file. Fixing up that process to make it more efficient will be the subject of a later patch; however, this patch removes another barrier to doing that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-