- 12 Oct, 2018 4 commits
-
-
Committed by Andreas Gruenbacher
This definition is only used to define RGRP_RSRV_MINBLKS, with no benefit over defining RGRP_RSRV_MINBLKS directly. In addition, instead of forcing RGRP_RSRV_MINBLKS to be of type u32, cast it to that type where that type is required. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Andreas Gruenbacher
Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv into the inode: they are more closely related to the inode than to a particular reservation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Andreas Gruenbacher
We already have a function that checks if a block is within a resource group, so use that in gfs2_rbm_from_block as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Andreas Gruenbacher
When gfs2_rbm_from_block fails, the rbm it returns is undefined, so we always want to make sure gfs2_rbm_from_block has succeeded before looking at the rbm. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 06 Oct, 2018 1 commit
-
-
Committed by Bob Peterson
Before this patch, various errors and messages were reported using the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does not tell you which gfs2 mount had the problem, which is often vital to debugging. This patch changes the calls from pr_* to fs_* in most of the messages so that the file system id is printed along with the message. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 29 Aug, 2018 2 commits
-
-
Committed by Bob Peterson
The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when a rgrp buffer is valid. It's cleared when the glock is invalidated, signifying that the buffer data is now invalid. But before this patch, function update_rgrp_lvb was setting the flag when it determined it had a valid lvb. That's an invalid assumption: just because you have a valid lvb doesn't mean you have valid buffers. After all, another node may have made the lvb valid, and this node just fetched it from the glock via dlm. Consider this scenario:
1. The file system is mounted with RGRPLVB option.
2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks to GL_SKIP, it skips the gfs2_rgrp_bh_get.
3. Since loops == 0 and the allocation target (ap->target) is bigger than the largest known chunk of blocks in the rgrp (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses the call to gfs2_rgrp_bh_get there as well.
4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses gfs2_rgrp_bh_get, but it still sets GFS2_RDF_UPTODATE due to this invalid assumption.
5. The next time update_rgrp_lvb is called, it sees the bit is set and just returns 0, assuming the lvb and rgrp are both up to date. But since this is a smaller allocation, or space has been freed by another node, thus adjusting the lvb values, it decides to use the rgrp for allocations, with invalid rd_free due to the fact it was never updated.
This patch changes update_rgrp_lvb so it doesn't set the UPTODATE flag anymore. That way, it has no choice but to fetch the latest values. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Committed by Bob Peterson
Before this patch, gfs2_rgrp_bh_get would check for lvb mismatches, but it wouldn't tell you what was actually wrong. This patch adds more information to help us debug it. It also makes rgrp consistency checks dump any bad rgrps, and makes the rgrp dump code dump any lvbs as well as the rgrp itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 08 Aug, 2018 1 commit
-
-
Committed by Bob Peterson
Function update_rgrp_lvb_unlinked used to do the same thing as be32_add_cpu. This patch removes it in favor of using be32_add_cpu directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
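For readers unfamiliar with the helper, the following is a minimal user-space sketch of what "add to a big-endian 32-bit field in place" means. The names `be32_load`, `be32_store`, and `be32_add` are hypothetical stand-ins written for this illustration; they are not the kernel's implementation of be32_add_cpu.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from its in-memory bytes. */
static uint32_t be32_load(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value back in big-endian byte order. */
static void be32_store(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/*
 * Hypothetical analogue of an in-place big-endian add: convert to host
 * order, add the delta (wrapping modulo 2^32), convert back.
 */
static void be32_add(unsigned char *field, uint32_t delta)
{
	be32_store(field, be32_load(field) + delta);
}
```

Passing `(uint32_t)-1` as the delta decrements the field by one, which is how a single helper can cover both increments and decrements of an on-disk counter.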
-
- 07 Aug, 2018 1 commit
-
-
Committed by Bob Peterson
Function gfs2_testbit is called in three places. Two of those places, gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used by the block reservations scheme to determine the length of an extent of free blocks. Before this patch, it wasn't using the clone bitmap, which means recently-freed blocks were treated as free blocks for the purposes of an allocation. This patch adds a new parameter to gfs2_testbit to indicate whether or not the clone bitmaps should be used (if available). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
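The idea of a "use the clone if available" parameter can be sketched in user space as follows. This is an illustration only: `struct bitmap_sketch` and `testbit_sketch` are hypothetical names, and the layout (a 2-bit state per block, four blocks per byte, lowest bits first) is an assumption made for the sketch rather than a copy of the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BLOCK  2	/* assumed: a 2-bit state per block */
#define BLOCKS_PER_BYTE 4
#define STATE_MASK      0x3u

/* Hypothetical bitmap descriptor: clone is NULL when no clone exists. */
struct bitmap_sketch {
	const uint8_t *real;
	const uint8_t *clone;
};

/*
 * Return the 2-bit state of a block. When use_clone is set and a clone
 * bitmap exists, read the clone, so that recently freed blocks (still
 * marked in-use in the clone) are not mistaken for free ones.
 */
static unsigned int testbit_sketch(const struct bitmap_sketch *bi,
				   uint32_t block, bool use_clone)
{
	const uint8_t *buf = (use_clone && bi->clone) ? bi->clone : bi->real;
	unsigned int shift = (block % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;

	return (buf[block / BLOCKS_PER_BYTE] >> shift) & STATE_MASK;
}
```

Falling back to the real bitmap when no clone exists keeps callers simple: they state their intent once and do not need to check for a clone themselves.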
-
- 27 Jul, 2018 1 commit
-
-
Committed by Bob Peterson
Before this patch gfs2_rgrp_ondisk2lvb was called after every call to gfs2_rgrp_out. This patch just calls it directly from within gfs2_rgrp_out, and moves the function to be before it so we don't need a function prototype. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 25 Jul, 2018 2 commits
-
-
Committed by Bob Peterson
Before this patch, several functions in rgrp.c checked the value of rgd->rd_free_clone. That does not take into account blocks that were reserved by a multi-block reservation. This causes a problem when space gets tight in the file system. For example, when function gfs2_inplace_reserve checks to see if a rgrp has enough blocks to satisfy the request, it can accept a rgrp that it should reject because, although there are enough blocks to satisfy the request _now_, those blocks may be reserved for another running process. A second problem with this occurs when we've reserved the remaining blocks in an rgrp: function rg_mblk_search() can reject an rgrp improperly because it calculates: u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved; But rd_reserved includes blocks that the current process just reserved in its own call to inplace_reserve. For example, it can reserve the last 128 blocks of an rgrp, then reject that same rgrp because the above calculates out to free_blocks = 0; Consequences include, but are not limited to, (1) leaving holes, and thus increasing file system fragmentation, and (2) reporting that the file system is full long before it actually is. This patch introduces a new function, rgd_free, which returns the number of clone-free blocks (blocks that are truly free as opposed to blocks that are still being used because an unlinked file is still open) minus the number of blocks reserved by processes, but not counting the blocks we ourselves reserved (because obviously we need to allocate them). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
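The accounting described above can be sketched as a small user-space function. The struct and function names (`rgrp_sketch`, `rgd_free_sketch`) are hypothetical, and returning 0 on inconsistent counters is a conservative choice made for this sketch; the kernel function may handle those cases differently.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the per-rgrp counters. */
struct rgrp_sketch {
	uint32_t free_clone;	/* clone-free blocks in the rgrp */
	uint32_t reserved;	/* blocks reserved by all processes */
};

/*
 * Blocks usable by the caller: clone-free blocks minus blocks reserved
 * by others. own_reserved is the part of rgd->reserved that belongs to
 * the caller's own reservation, so it is not subtracted.
 */
static uint32_t rgd_free_sketch(const struct rgrp_sketch *rgd,
				uint32_t own_reserved)
{
	uint32_t others;

	if (rgd->reserved < own_reserved)
		return 0;	/* inconsistent counters: be conservative */
	others = rgd->reserved - own_reserved;
	if (rgd->free_clone < others)
		return 0;	/* assumption: treat as nothing available */
	return rgd->free_clone - others;
}
```

With the example from the message (128 free blocks, all 128 reserved by the caller), the old formula `free_clone - reserved` yields 0, while this calculation yields 128, so the caller is no longer turned away from its own reservation.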
-
Committed by Bob Peterson
Before this patch, you could get into situations like this:
1. Process 1 searches for X free blocks, finds them, makes a reservation.
2. Process 2 searches for free blocks in the same rgrp, but now the bitmap is full because process 1's reservation is skipped over. So it marks the bitmap as GBF_FULL.
3. Process 1 tries to allocate blocks from its own reservation, but since the GBF_FULL bit is set, it skips over the rgrp and searches elsewhere, thus not using its own reservation.
This patch adds an additional check to allow processes to use their own reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
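The additional check amounts to one more condition in the "skip this bitmap" decision. A minimal sketch, assuming three boolean inputs that the real code would derive from the bitmap flags, the requested block state, and the inode's reservation; `skip_full_bitmap` is a hypothetical name:

```c
#include <stdbool.h>

/*
 * Decide whether a bitmap flagged as full should be skipped when
 * searching for free blocks. A process holding its own active
 * reservation must not skip, because the blocks it reserved are
 * exactly the ones that made the bitmap look full to others.
 */
static bool skip_full_bitmap(bool bitmap_full, bool searching_free,
			     bool has_own_reservation)
{
	return bitmap_full && searching_free && !has_own_reservation;
}
```

In the three-step scenario, process 1 has an active reservation, so it keeps searching the rgrp in step 3 instead of moving elsewhere.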
-
- 05 Jul, 2018 2 commits
-
-
Committed by Andreas Gruenbacher
GFS2 remembers the last rgrp used for allocations in ip->i_rgd. However, block allocations are made by way of a reservations structure, ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res is kept in sync with ip->i_res.rs_rgd, so it's redundant. Get rid of ip->i_rgd and just use ip->i_res.rs_rgd in its place. Based on patches by Robert Peterson. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Committed by Andreas Gruenbacher
In the resource group list code, keep the last resource group added in the last position in the array. Check against that instead of messing with ip->i_rgd. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 21 Jun, 2018 1 commit
-
-
Committed by Bob Peterson
Before this patch, block reservations kept track of the inode number. At one point, that was a valid thing to do. However, since we made the reservation a part of the inode (rather than a pointer to a separate allocated object) the reservation can determine the inode number by using container_of. This saves us a little memory in our inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
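The container_of technique mentioned here recovers a pointer to an enclosing structure from a pointer to one of its members. Below is a user-space sketch; the macro `container_of_sketch` and the structs `inode_sketch` and `blkreserv_sketch` are simplified, hypothetical stand-ins, not the GFS2 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of the container_of idea, built on offsetof. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reservation: note that it stores no inode number. */
struct blkreserv_sketch {
	uint32_t rs_free;
};

/* Hypothetical inode with the reservation embedded, not pointed to. */
struct inode_sketch {
	uint64_t no_addr;
	struct blkreserv_sketch res;
};

/* Derive the inode number from a pointer to the embedded reservation. */
static uint64_t rs_inum(struct blkreserv_sketch *rs)
{
	return container_of_sketch(rs, struct inode_sketch, res)->no_addr;
}
```

This only works because the reservation is embedded in the inode; with a separately allocated reservation, the pointer arithmetic would land on unrelated memory.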
-
- 13 Jun, 2018 1 commit
-
-
Committed by Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a, b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was:
// Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) 
| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) 
| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) 
| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
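The point of the 2-factor form is that the multiplication can be checked for overflow before memory is allocated. A minimal user-space sketch of that idea follows; `alloc_array_sketch` is a hypothetical helper built on malloc(), not the kernel's kmalloc_array().

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size, refusing the request
 * when n * size would overflow size_t. An open-coded malloc(n * size)
 * would silently wrap around and return a buffer that is too small.
 */
static void *alloc_array_sketch(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* product does not fit in size_t */
	return malloc(n * size);
}
```

Keeping the count and the element size as separate arguments is what makes the check possible; once the caller has multiplied them, the overflow is no longer detectable.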
-
- 04 Jun, 2018 1 commit
-
-
Committed by Bob Peterson
Function gfs2_free_extlen calculates the length of an extent of free blocks that may be reserved. The end pointer was calculated as end = start + bh->b_size but b_size is incorrect because the bitmap usually stops prior to the end of the buffer data on the last bitmap. What this means is that when you do a write, you can reserve a chunk of blocks that runs off the end of the last bitmap. For example, I've got a file system where there is only one bitmap for each rgrp, so ri_length==1. I saw cases in which iozone tried to do a big write, grabbed a large block reservation, chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188. So 5464153 + 8188 = 5472341 which is the end of the rgrp. When it grabbed a reservation it got back: 5470936, length 7229. But 5470936 + 7229 = 5478165. So the reservation starts inside the rgrp but runs 5824 blocks past the end of the bitmap. This patch fixes the calculation so it won't exceed the last bitmap. It also adds a BUG_ON to guard against overflows in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
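The arithmetic in the example can be checked with a short user-space sketch that clamps an extent to the end of its resource group. Note that this is not how the patch itself works (the patch corrects the end pointer inside the bitmap buffer); `clamp_extent_len` is a hypothetical helper that only illustrates the bound being violated.

```c
#include <stdint.h>

/*
 * Clamp an extent [start, start + len) so that it does not run past the
 * end of a resource group whose data blocks are
 * [rgrp_data0, rgrp_data0 + rgrp_data).
 */
static uint32_t clamp_extent_len(uint64_t rgrp_data0, uint32_t rgrp_data,
				 uint64_t start, uint32_t len)
{
	uint64_t end = rgrp_data0 + rgrp_data;	/* first block past the rgrp */

	if (start >= end)
		return 0;
	if (start + len > end)
		return (uint32_t)(end - start);
	return len;
}
```

With the numbers from the message, the rgrp ends at 5464153 + 8188 = 5472341, so a reservation starting at 5470936 can cover at most 5472341 - 5470936 = 1405 blocks; the reported length of 7229 exceeds that by 5824 blocks.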
-
- 31 Jan, 2018 1 commit
-
-
Committed by Andreas Gruenbacher
Some of the info, warning, and error messages are missing their trailing newline. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 23 Jan, 2018 2 commits
-
-
Committed by Bob Peterson
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Committed by Bob Peterson
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 17 Jan, 2018 1 commit
-
-
Committed by Steven Whitehouse
Document when to use gfs2_blk2rgrpd for "inexact" resource group matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in sweep_bh_for_rgrps. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 13 Dec, 2017 3 commits
-
-
Committed by Andrew Price
Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This allows us to check resource group headers' integrity and removes the requirement to check them against the rindex entries in fsck. If this field is found to be zero, it should be ignored (or updated with an accurate value). Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Committed by Andrew Price
Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields are identical to their counterparts in struct gfs2_rindex and are intended to reduce the use of the rindex. For now the fields are only written back as the in-memory equivalents in struct gfs2_rgrpd are set using values from the rindex. However, they are needed at this point so that userspace can make use of them, allowing a migration away from the rindex over time. The new fields take up previously reserved space which was explicitly zeroed on write so, in clusters with mixed kernels, these fields could get zeroed after being set and this should not be treated as an error. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Committed by Andrew Price
Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The rg_skip field has the following meaning: - If rg_skip is zero, it is considered unset and not useful. - If rg_skip is non-zero, its value will be the number of blocks between this rgrp's address and the next rgrp's address. This can be used as a hint by fsck.gfs2 when rebuilding a bad rindex, for example. This will provide less dependency on the rindex in future, and allow tools such as fsck.gfs2 to iterate the resource groups without keeping the rindex around. The field is updated in gfs2_rgrp_out() so that existing file systems will have it set. This means that any resource groups that aren't ever written will not be updated. The final rgrp is a special case as there is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is extended). Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so the rg_skip field can get set back to 0 in cases where nodes with and without this patch are mixed in a cluster. In some cases, the field may bounce between being set by one node and then zeroed by another which may harm performance slightly, e.g. when two nodes create many small files. In testing this situation is rare but it becomes more likely as the filesystem fills up and there are fewer resource groups to choose from. The problem goes away when all nodes are running with this patch. Dipping into the space currently occupied by the rg_reserved field would have resulted in the same problem as it is also explicitly zeroed, so unfortunately there is no other way around it. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
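A tool that walks resource groups without the rindex could use the field roughly as sketched below. `next_rgrp_addr` is a hypothetical helper written for this illustration; it only encodes the two rules stated in the message (zero means unset, non-zero is the distance to the next rgrp).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the address of the next resource group from the current
 * rgrp's address and its rg_skip value. Returns false when rg_skip is
 * zero, meaning the hint is unset (or this is the final rgrp) and the
 * caller has to fall back to another source such as the rindex.
 */
static bool next_rgrp_addr(uint64_t addr, uint32_t rg_skip, uint64_t *next)
{
	if (rg_skip == 0)
		return false;
	*next = addr + rg_skip;
	return true;
}
```

Because a mixed-version cluster can zero the field again, a caller should treat a false return as "no hint available" rather than as corruption.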
-
- 28 Nov, 2017 1 commit
-
-
Committed by Bob Peterson
Before this patch, function gfs2_free_di was 4 lines of code, and one of those lines was to call gfs2_free_uninit_di. Although unlikely, if function gfs2_free_uninit_di encountered an error finding the block to be freed, the error was silently ignored by the caller, which went ahead and improperly did a quota-change operation and meta_wipe despite the error. This patch combines the two functions into one to make the code more readable and fixes the bug by returning from the combined function before it takes those next incorrect steps. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 30 Aug, 2017 1 commit
-
-
Committed by Andreas Gruenbacher
The following cleanup is needed to avoid spilling the syslog with false warnings. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 09 Aug, 2017 1 commit
-
-
Committed by Bob Peterson
This patch removes a call to gfs2_glock_add_to_lru from function gfs2_clear_rgrpd. The call is just a waste of time because as soon as it adds it to the lru_list, the call to gfs2_glock_put takes it back off again. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 05 Jul, 2017 1 commit
-
-
Committed by Andreas Gruenbacher
Put all remaining accesses to gl->gl_object under the gl->gl_lockref.lock spinlock to prevent races. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 19 Apr, 2017 1 commit
-
-
Committed by Bob Peterson
Implement truncate/delete as a non-recursive algorithm. The older algorithm was implemented with recursion to strip off each layer at a time (going by height, starting with the maximum height). This version tries to do the same thing but without recursion, and without needing to allocate new structures or lists in memory. For example, say you want to truncate a very large file to 1 byte, and its end-of-file metapath is: 0.505.463.428. The starting metapath would be 0.0.0.0. Since it's a truncate to non-zero, it needs to preserve that byte, and all metadata pointing to it. So it would start at 0.0.0.0, look up all its metadata buffers, then free all data blocks pointed to at the highest level. After that buffer is "swept", it moves on to 0.0.0.1, then 0.0.0.2, etc., reading in buffers and sweeping them clean. When it gets to the end of the 0.0.0 metadata buffer (for 4K blocks the last valid one is 0.0.0.508), it backs up to the previous height and starts working on 0.0.1.0, then 0.0.1.1, and so forth. After it reaches the end and sweeps 0.0.1.508, it continues with 0.0.2.0, and so on. When that height is exhausted, and it reaches 0.0.508.508 it backs up another level, to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep marching backwards and forwards through the metadata until it's all swept clean. Once it has all the data blocks freed, it lowers the strip height, and begins the process all over again, but with one less height. This time it sweeps 0.0.0 through 0.505.463. When that's clean, it lowers the strip height again and works to free 0.505. Eventually it strips the lowest height, 0. For a delete or truncate to 0, all metadata for all heights of 0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would be preserved. This isn't much different from normal integer incrementing, where an integer gets incremented from 0000 (0.0.0.0) to 3021 (3.0.2.1). 
So 0000 gets incremented to 0001, 0002, up to 0009, then on to 0010, 0011 up to 0099, then 0100 and so forth. It's just that each "digit" goes from 0 to 508 (for a total of 509 pointers) rather than from 0 to 9. Note that the dinode will only have 483 pointers due to the dinode structure itself. Also note: this is just an example. These numbers (509 and 483) are based on a standard 4K block size. Smaller block sizes will yield smaller numbers of indirect pointers accordingly. The truncation process is accomplished with the help of two major functions and a few helper functions. Functions do_strip and recursive_scan are obsolete, so removed. New function sweep_bh_for_rgrps cleans a buffer_head pointed to by the given metapath and height. By cleaning, I mean it frees all blocks starting at the offset passed in metapath. It starts at the first block in the buffer pointed to by the metapath and identifies its resource group (rgrp). From there it frees all subsequent block pointers that lie within that rgrp. If it's already inside a transaction, it stays within it as long as it can. In other words, it doesn't close a transaction until it knows it's freed what it can from the resource group. In this way, multiple buffers may be cleaned in a single transaction, as long as those blocks in the buffer all lie within the same rgrp. If it's not in a transaction, it starts one. If the buffer_head has references to blocks within multiple rgrps, it frees all the blocks inside the first rgrp it finds, then closes the transaction. Then it repeats the cycle: identifies the next unfreed block, uses it to find its rgrp, then starts a new transaction for that set. It repeats this process until the buffer_head contains no more references to any blocks past the given metapath. Function trunc_dealloc has been reworked into a finite state automaton. 
It has basically 3 active states: DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP: The DEALLOC_MP_FULL state implies the metapath has a full set of buffers out to the "shrink height", and therefore, it can call function sweep_bh_for_rgrps to free the blocks within the highest height of the metapath. If it's just swept the lowest level (or an error has occurred) the state machine is ended. Otherwise it proceeds to the DEALLOC_MP_LOWER state. The DEALLOC_MP_LOWER state implies we are finished with a given buffer_head, which may now be released, and therefore we are then missing some buffer information from the metapath. So we need to find more buffers to read in. In most cases, this is just a matter of releasing the buffer_head and moving to the next pointer from the previous height, so it may be read in and swept as well. If it can't find another non-null pointer to process, it checks whether it's reached the end of a height and needs to lower the strip height, or whether it still needs to move forward through the previous height's metadata. In this state, all zero-pointers are skipped. From this state, it can only loop around (once more backing up another height) or, once a valid metapath is found (one that has non-zero pointers), proceed to state DEALLOC_FILL_MP. The DEALLOC_FILL_MP state implies that we have a metapath but not all its buffers are read in. So we must proceed to read in buffer_heads until the metapath has a valid buffer for every height. If the previous state backed us up 3 heights, we may need to read in a buffer, increment the height, then repeat the process until buffers have been read in for all required heights. If it's successful reading a buffer, and it's at the highest height we need, it proceeds back to the DEALLOC_MP_FULL state. If it's unable to fill in a buffer, (encounters a hole, etc.) it tries to find another non-zero block pointer. If they're all zero, it lowers the height and returns to the DEALLOC_MP_LOWER state. 
If it finds a good non-null pointer, it loops around and reads it in, while keeping the metapath in lock-step with the pointers it examines. The state machine runs until the truncation request is satisfied. Then any transactions are ended, the quota and statfs data are updated, and the function is complete. Helper function metaptr1 was introduced to be an easy way to determine the start of a buffer_head's indirect pointers. Helper function lookup_mp_height was introduced to find a metapath index and read in the buffer that corresponds to it. In this way, function lookup_metapath becomes a simple loop to call it for every height. Helper function fillup_metapath is similar to lookup_metapath except it can do partial lookups. If the state machine backed up multiple levels (like 2999 wrapping to 3000) it needs to find out the next starting point and start issuing metadata reads at that point. Helper function hptrs is a shortcut to determine how many pointers should be expected in a buffer. Height 0 is the dinode which has fewer pointers than the others. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
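The "integer incrementing" analogy in this message can be made concrete with a small user-space sketch that advances a metapath like an odometer. The function names `hptrs_sketch` and `metapath_next` are hypothetical, and the pointer counts (483 for the dinode at height 0, 509 for indirect blocks) are the 4K-block example values from the message, not general constants. The real code also skips zero pointers and reads buffers, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DINODE_PTRS   483u	/* example value for 4K blocks, height 0 */
#define INDIRECT_PTRS 509u	/* example value for 4K blocks, heights > 0 */

/* Number of pointers expected at a given height (0 is the dinode). */
static unsigned int hptrs_sketch(unsigned int height)
{
	return height == 0 ? DINODE_PTRS : INDIRECT_PTRS;
}

/*
 * Advance mp[0..heights-1] by one position at the deepest height,
 * carrying into shallower heights like an odometer. Returns false when
 * the whole metapath wraps around, i.e. there is nothing left to visit.
 */
static bool metapath_next(uint16_t *mp, unsigned int heights)
{
	unsigned int h = heights;

	while (h-- > 0) {
		if (mp[h] + 1u < hptrs_sketch(h)) {
			mp[h]++;
			return true;
		}
		mp[h] = 0;	/* this "digit" overflowed: carry upwards */
	}
	return false;
}
```

Starting from 0.0.0.508, one step yields 0.0.1.0, and from 0.0.508.508 one step yields 0.1.0.0, matching the walk described in the message.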
-
- 13 Jul, 2016 1 commit
-
-
Committed by Bob Peterson
For the last process to close a file opened for write, function gfs2_rsqa_delete was deleting the file's inode's block reservation out of the rgrp reservations tree. Then it was checking to make sure rs_free was 0, but it was performing the check outside the protection of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed to prevent a race between the process freeing the reservation and another process that is allocating a new set of blocks inside the same rgrp for the same inode, thus changing its value. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 27 Jun, 2016 1 commit
-
-
Committed by Andreas Gruenbacher
Make the code more readable by cleaning up the different ways of initializing lock holders and checking for initialized lock holders: mark lock holders as uninitialized by setting the holder's glock to NULL (gfs2_holder_mark_uninitialized) instead of zeroing out the entire object or using a separate flag. Recognize initialized holders by their non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects which are immediately initialized via gfs2_holder_init or gfs2_glock_nq_init. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
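The convention described here (a NULL glock pointer means "uninitialized") can be sketched in user space as below. The structs and the `_sketch` function names are simplified, hypothetical stand-ins for the holder helpers named in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for a glock and a lock holder. */
struct glock_sketch {
	int id;
};

struct holder_sketch {
	struct glock_sketch *gh_gl;	/* NULL while the holder is unused */
};

/* Mark a holder as uninitialized without zeroing the whole object. */
static void holder_mark_uninitialized_sketch(struct holder_sketch *gh)
{
	gh->gh_gl = NULL;
}

/* A holder counts as initialized exactly when it points at a glock. */
static bool holder_initialized_sketch(const struct holder_sketch *gh)
{
	return gh->gh_gl != NULL;
}
```

Using the pointer itself as the marker avoids a separate flag that could drift out of sync, and avoids clearing fields that are about to be overwritten anyway.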
-
- 10 Jun, 2016 1 commit
-
-
Committed by Bob Peterson
Before this patch, function read_rindex_entry would set a rgrp glock's gl_object pointer to itself before inserting the rgrp into the rgrp rbtree. The problem is: if another process was also reading the rgrp in, and had already inserted its newly created rgrp, then the second call to read_rindex_entry would overwrite that value, then return a bad return code to the caller. Later, other functions would reference the now-freed rgrp memory by way of gl_object. In some cases, that could result in gfs2_rgrp_brelse being called twice for the same rgrp: once for the failed attempt and once for the "real" rgrp release. Eventually the kernel would panic. There are also a number of other things that could go wrong when a kernel module is accessing freed storage. For example, this could result in rgrp corruption because the fake rgrp would point to a fake bitmap in memory too, causing gfs2_inplace_reserve to search some random memory for free blocks, and find some, since we were never setting rgd->rd_bits to NULL before freeing it. This patch fixes the problem by not setting gl_object until we have successfully inserted the rgrp into the rbtree. Also, it sets rd_bits to NULL as it frees them, which will ensure any accidental access to the wrong rgrp will result in a kernel panic rather than file system corruption, which is preferred. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 02 May, 2016 1 commit
-
-
Committed by Bob Peterson
Struct gfs2_alloc_parms ap is never referenced in function gfs2_rbm_find, so this patch removes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 05 Apr, 2016 1 commit
-
-
Committed by Kirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time ago with the promise that one day it would be possible to implement the page cache with bigger chunks than PAGE_SIZE. This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to PAGE_SIZE. It is a constant source of confusion whether a PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much breakage to be doable.

Let's stop pretending that pages in the page cache are special. They are not.

The changes are pretty straightforward:
  - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
  - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
  - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
  - page_cache_get() -> get_page();
  - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using the script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is a revert of the changes to the PAGE_CACHE_ALIGN definition: we are going to drop it later. There are a few places in the code that coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation will also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
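Why the first two rewrites in this commit are safe can be checked with a few lines of user-space C. The sketch assumes 4 KiB pages and defines the legacy names as plain aliases of the PAGE_* constants, which is the situation the commit message describes. The helper names `index_old` and `index_new` are hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT		12			/* assumed: 4 KiB pages */
#define PAGE_SIZE		(1UL << PAGE_SHIFT)

/* Legacy names, assumed here to be plain aliases of the PAGE_* ones. */
#define PAGE_CACHE_SHIFT	PAGE_SHIFT
#define PAGE_CACHE_SIZE		PAGE_SIZE

/* The shift distance is zero whenever the two constants are equal. */
static_assert(PAGE_CACHE_SHIFT - PAGE_SHIFT == 0,
	      "page cache pages and pages have the same size");

/* Old spelling: convert a page cache index to a page index. */
static unsigned long index_old(unsigned long idx)
{
	return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New spelling after the semantic patch: the shift is dropped. */
static unsigned long index_new(unsigned long idx)
{
	return idx;
}
```

Because the shift distance is a compile-time zero, both helpers return the same value for every input, which is what lets the semantic patch replace the expression with its operand.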
-
- 19 Dec, 2015 1 commit
-
-
Committed by Bob Peterson
Before this patch, when function try_rgrp_unlink queued a glock for delete_work to reclaim the space, it used the inode glock to do so. That's different from the iopen callback which uses the iopen glock for the same purpose. We should be consistent and always use the iopen glock. This may also save us reference counting problems with the inode glock, since clear_glock does an extra glock_put() for the inode glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 15 Dec, 2015 1 commit
-
-
Committed by Bob Peterson
Before this patch, multi-block reservation structures were allocated from a special slab. This patch folds the structure into the gfs2_inode structure. The disadvantage is that the gfs2_inode needs more memory, even when a file is opened read-only. The advantages are: (a) we don't need the special slab and the extra time it takes to allocate and deallocate from it; (b) we no longer need to worry about whether the structure exists for things like quota management; (c) we can remove the calls to get_write_access and put_write_access, since we know the structure will exist. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 24 Nov, 2015 1 commit
-
-
Committed by Bob Peterson
This patch basically reverts the majority of patch 5407e242. That patch eliminated the gfs2_qadata structure in favor of just using the reservations structure. The problem with doing that is that it increases the size of the reservations structure. That is not an issue until it comes time to fold the reservations structure into the inode in memory so we know it's always there. By separating out the quota structure again, we aren't punishing the non-quota users by making all the inodes bigger, requiring more slab space. This patch creates a new slab area to allocate the quota stuff so it's managed a little more sanely. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 17 Nov, 2015 1 commit
-
-
Committed by Andreas Gruenbacher
When gfs2 allocates an inode and its extended attribute block next to each other at inode create time, the inode's directory entry indicates that in de_rahead. In that case, we can read ahead the extended attribute block when we read in the inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 09 Nov, 2015 1 commit
-
-
Committed by Bob Peterson
This patch fixes a bug introduced by commit 7005c3e4. That patch tries to map a vm range for resource groups, but the calculation breaks down when the block size is less than the page size. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 30 Oct, 2015 1 commit
-
-
Committed by Andreas Gruenbacher
Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in gl_lockref). Remove that define to make the references to gl_lockref.lock more obvious. Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
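The effect of dropping the alias can be shown with a user-space mock. The types below are hypothetical stand-ins for the kernel's spinlock and lockref, and `glock_get_sketch` is an invented helper. Only the names gl_lockref and gl_spin come from the commit message.

```c
#include <assert.h>

/* Mock spinlock: just tracks whether it is held (assumption). */
typedef struct {
	int locked;
} spinlock_sketch_t;

/* Mock lockref: a lock and a reference count side by side. */
struct lockref_sketch {
	spinlock_sketch_t lock;
	int count;
};

struct glock_alias_sketch {
	struct lockref_sketch gl_lockref;
};

/*
 * Before the cleanup, an alias of this shape hid the real member:
 *     #define gl_spin gl_lockref.lock
 * so gl->gl_spin silently expanded to gl->gl_lockref.lock.
 */

static void lock_sketch(spinlock_sketch_t *l)
{
	assert(!l->locked);
	l->locked = 1;
}

static void unlock_sketch(spinlock_sketch_t *l)
{
	assert(l->locked);
	l->locked = 0;
}

/* After the cleanup, callers name the embedded lock explicitly. */
static int glock_get_sketch(struct glock_alias_sketch *gl)
{
	int count;

	lock_sketch(&gl->gl_lockref.lock);
	count = ++gl->gl_lockref.count;
	unlock_sketch(&gl->gl_lockref.lock);
	return count;
}
```

Spelling out gl_lockref.lock makes it visible at each call site that the lock protecting the glock is the same one that guards the reference count.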
-