- 12 6月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
For whatever reason we never actually update pagi_count (the in-core perag inode count) when we allocate or free inode chunks. Online scrub is going to use it, so we need to fix the accounting. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 09 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
New verification functions like xfs_verify_fsbno() and xfs_verify_agino() are spread across multiple files and different header files. They really don't fit cleanly into the places they've been put, and have wider scope than the current header includes. Move the type verifiers to a new file in libxfs (xfs-types.c) and the prototypes to xfs_types.h where they will be visible to all the code that uses the types. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 07 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 06 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 04 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
The heads of tha AGI unlinked list are only scanned on debug kernels when the verifier runs. Change that to always scan the heads and validate that the inode numbers are valid. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 16 5月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Expose various helpers that the repair code will want to use. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 10 4月, 2018 1 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 29 1月, 2018 1 次提交
-
-
由 Carlos Maiolino 提交于
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 18 1月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Add a couple of functions to the inode btrees that will be used to cross-reference metadata against the inobt. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 09 1月, 2018 4 次提交
-
-
由 Darrick J. Wong 提交于
Expose all metadata structure buffer verifier functions via buf_ops. These will be used by the online scrub mechanism to look for problems with buffers that are already sitting around in memory. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Refactor the callers of verifiers to print the instruction address of a failing check. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Modify each function that checks the contents of a metadata buffer to return the instruction address of the failing test so that we can report more precise failure errors to the log. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 09 12月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
If we create a new file we will need an inode, and usually some metadata in the parent direction. Aiming for everything to go well despite the lack of a reservation leads to dirty transactions cancelled under a heavy create/delete load. This patch removes those nospace transactions, which will lead to slightly earlier ENOSPC on some workloads, but instead prevent file system shutdowns due to cancelling dirty transactions for others. A customer could observe assertations failures and shutdowns due to cancelation of dirty transactions during heavy NFS workloads as shown below: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]--- 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s) Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 02 11月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 27 10月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Create some helper functions to check that inode pointers point to somewhere within the filesystem and not at the static AG metadata. Move xfs_internal_inum and create a directory inode check function. We will use these functions in scrub and elsewhere. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 12 10月, 2017 1 次提交
-
-
由 Thomas Meyer 提交于
Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: NThomas Meyer <thomas@m3y3r.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 02 9月, 2017 1 次提交
-
-
由 Brian Foster 提交于
Ordered buffers are attached to transactions and pushed through the logging infrastructure just like normal buffers with the exception that they are not actually written to the log. Therefore, we don't need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is called on ordered buffers to set up all of the dirty state on the transaction, buffer and log item and prepare the buffer for I/O. Now that xfs_trans_dirty_buf() is available, call it from xfs_trans_ordered_buf() so the latter is now mutually exclusive with xfs_trans_log_buf(). This reflects the implementation of ordered buffers and helps eliminate confusion over the need to log ranges of ordered buffers just to set up internal log state. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 23 8月, 2017 1 次提交
-
-
由 Carlos Maiolino 提交于
In a filesystem without finobt, the Space manager selects an AG to alloc a new inode, where xfs_dialloc_ag_inobt() will search the AG for the free slot chunk. When the new inode is in the same AG as its parent, the btree will be searched starting on the parent's record, and then retried from the top if no slot is available beyond the parent's record. To exit this loop though, xfs_dialloc_ag_inobt() relies on the fact that the btree must have a free slot available, once its callers relied on the agi->freecount when deciding how/where to allocate this new inode. In the case when the agi->freecount is corrupted, showing available inodes in an AG, when in fact there is none, this becomes an infinite loop. Add a way to stop the loop when a free slot is not found in the btree, making the function to fall into the whole AG scan which will then, be able to detect the corruption and shut the filesystem down. As pointed by Brian, this might impact performance, giving the fact we don't reset the search distance anymore when we reach the end of the tree, giving it fewer tries before falling back to the whole AG search, but it will only affect searches that start within 10 records to the end of the tree. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 8月, 2017 1 次提交
-
-
由 Omar Sandoval 提交于
When we try to allocate a free inode by searching the inobt, we try to find the inode nearest the parent inode by searching chunks both left and right of the chunk containing the parent. As an optimization, we cache the leftmost and rightmost records that we previously searched; if we do another allocation with the same parent inode, we'll pick up the search where it last left off. There's a bug in the case where we found a free inode to the left of the parent's chunk: we need to update the cached left and right records, but because we already reassigned the right record to point to the left, we end up assigning the left record to both the cached left and right records. This isn't a correctness problem strictly, but it can result in the next allocation rechecking chunks unnecessarily or allocating inodes further away from the parent than it needs to. Fix it by swapping the record pointer after we update the cached left and right records. Fixes: bd169565 ("xfs: speed up free inode search") Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 28 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
- 20 6月, 2017 2 次提交
-
-
由 Darrick J. Wong 提交于
Create a function to extract an in-core inobt record from a generic btree_rec union so that scrub will be able to check inobt records and check inode block alignment. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 17 2月, 2017 1 次提交
-
-
由 Chandan Rajendra 提交于
On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace, XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026 kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries Modules linked in: CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58 task: c000000102606d80 task.stack: c0000001026b8000 NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290 REGS: c0000001026bb090 TRAP: 0700 Not tainted (4.10.0-rc5) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28004428 XER: 00000000 CFAR: c0000000004ef180 SOFTE: 1 GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0 GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990 GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000 GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0 GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001 GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000 GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0 NIP [c0000000004ef798] .assfail+0x28/0x30 LR [c0000000004ef798] .assfail+0x28/0x30 Call Trace: [c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable) [c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0 [c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480 [c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90 [c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820 [c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320 [c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610 [c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0 [c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790 [c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410 [c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230 [c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120 [c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc Instruction dump: 4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78 38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378 When block size is larger than inode cluster size, the call to XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs would have set xfs_sb->sb_inoalignmt to 0. This causes xfs_ialloc_cluster_alignment() to return 0. Due to this args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned equivalent of -1 assigned to it. This later causes alloc_len in xfs_alloc_space_available() to have a value of 0. In such a scenario when args.total is also 0, the assert statement "ASSERT(args->maxlen > 0);" fails. This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb(). Suggested-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 05 12月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
There is no such thing as a zero-level AG btree since even a single-node zero-records btree has one level. Btree cursor constructors read cur_nlevels straight from disk and then access things like cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero! Therefore, strengthen the verifiers to prevent this possibility. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
We've missed properly setting the buffer type for an AGI transaction in 3 spots now, so just move it into xfs_read_agi() and set it if we are in a transaction to avoid the problem in the future. This is similar to how it is done in i.e. the dir3 and attr3 read functions. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 08 11月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Since xfsprogs dropped ushort in favor of unsigned short, do that here too. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 03 8月, 2016 4 次提交
-
-
由 Darrick J. Wong 提交于
For the rmap btree to work, we have to feed the extent owner information to the the allocation and freeing functions. This information is what will end up in the rmap btree that tracks allocated extents. While we technically don't need the owner information when freeing extents, passing it allows us to validate that the extent we are removing from the rmap btree actually belonged to the owner we expected it to belong to. We also define a special set of owner values for internal metadata that would otherwise have no owner. This allows us to tell the difference between metadata owned by different per-ag btrees, as well as static fs metadata (e.g. AG headers) and internal journal blocks. There are also a couple of special cases we need to take care of - during EFI recovery, we don't actually know who the original owner was, so we need to pass a wildcard to indicate that we aren't checking the owner for validity. We also need special handling in growfs, as we "free" the space in the last AG when extending it, but because it's new space it has no actual owner... While touching the xfs_bmap_add_free() function, re-order the parameters to put the struct xfs_mount first. Extend the owner field to include both the owner type and some sort of index within the owner. The index field will be used to support reverse mappings when reflink is enabled. When we're freeing extents from an EFI, we don't have the owner information available (rmap updates have their own redo items). xfs_free_extent therefore doesn't need to do an rmap update. Make sure that the log replay code signals this correctly. This is based upon a patch originally from Dave Chinner. It has been extended to add more owner information with the intent of helping recovery operations when things go wrong (e.g. offset of user data block in a file). [dchinner: de-shout the xfs_rmap_*_owner helpers] [darrick: minor style fixes suggested by Christoph Hellwig] Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Mechanical change of flist/free_list to dfops, since they're now deferred ops, not just a freeing list. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Restructure everything that used xfs_bmap_free to use xfs_defer_ops instead. For now we'll just remove the old symbols and play some cpp magic to make it work; in the next patch we'll actually rename everything. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 21 6月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
Create a common function to calculate the maximum height of a per-AG btree. This will eventually be used by the rmapbt and refcountbt code to calculate appropriate maxlevels values for each. This is important because the verifiers and the transaction block reservations depend on accurate estimates of how many blocks are needed to satisfy a btree split. We were mistakenly using the max bnobt height for all the btrees, which creates a dangerous situation since the larger records and keys in an rmapbt make it very possible that the rmapbt will be taller than the bnobt and so we can run out of transaction block reservation. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
This is already in xfsprogs' libxfs, so port it to the kernel. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 07 3月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Commit 88740da18[1] introduced a function to compute the maximum height of the inode btree back in 1994. Back then, apparently, the freespace and inode btrees shared the same geometry; however, it has long since been the case that the inode and freespace btrees have different record and key sizes. Therefore, we must use m_inobt_mnr if we want a correct calculation/log reservation/etc. (Yes, this bug has been around for 21 years and ten months.) (Yes, I was in middle school when this bug was committed.) [1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=88740da18ddd9d7ba3ebaa9502fefc6ef2fd19cdHistorical-research-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 04 1月, 2016 1 次提交
-
-
由 Eric Sandeen 提交于
This adds a name to each buf_ops structure, so that if a verifier fails we can print the type of verifier that failed it. Should be a slight debugging aid, I hope. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 12 10月, 2015 1 次提交
-
-
由 Brian Foster 提交于
Since the onset of v5 superblocks, the LSN of the last modification has been included in a variety of on-disk data structures. This LSN is used to provide log recovery ordering guarantees (e.g., to ensure an older log recovery item is not replayed over a newer target data structure). While this works correctly from the point a filesystem is formatted and mounted, userspace tools have some problematic behaviors that defeat this mechanism. For example, xfs_repair historically zeroes out the log unconditionally (regardless of whether corruption is detected). If this occurs, the LSN of the filesystem is reset and the log is now in a problematic state with respect to on-disk metadata structures that might have a larger LSN. Until either the log catches up to the highest previously used metadata LSN or each affected data structure is modified and written out without incident (which resets the metadata LSN), log recovery is susceptible to filesystem corruption. This problem is ultimately addressed and repaired in the associated userspace tools. The kernel is still responsible to detect the problem and notify the user that something is wrong. Check the superblock LSN at mount time and fail the mount if it is invalid. From that point on, trigger verifier failure on any metadata I/O where an invalid LSN is detected. This results in a filesystem shutdown and guarantees that we do not log metadata changes with invalid LSNs on disk. Since this is a known issue with a known recovery path, present a warning to instruct the user how to recover. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 19 8月, 2015 1 次提交
-
-
由 Brian Foster 提交于
The btree cursor cleanup function takes an error parameter that affects how buffers are released from the cursor. All buffers are released in the event of error. Several callers do not specify the XFS_BTREE_ERROR flag in the event of error, however. This can cause buffers to hang around locked or with an elevated hold count and thus lead to umount hangs in the event of errors. Fix up the xfs_btree_del_cursor() callers to pass XFS_BTREE_ERROR if the cursor is being torn down due to error. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 29 7月, 2015 1 次提交
-
-
由 Eric Sandeen 提交于
This adds a new superblock field, sb_meta_uuid. If set, along with a new incompat flag, the code will use that field on a V5 filesystem to compare to metadata UUIDs, which allows us to change the user- visible UUID at will. Userspace handles the setting and clearing of the incompat flag as appropriate, as the UUID gets changed; i.e. setting the user-visible UUID back to the original UUID (as stored in the new field) will remove the incompatible feature flag. If the incompat flag is not set, this copies the user-visible UUID into into the meta_uuid slot in memory when the superblock is read from disk; the meta_uuid field is not written back to disk in this case. The remainder of this patch simply switches verifiers, initializers, etc to use the new sb_meta_uuid field. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 04 6月, 2015 1 次提交
-
-
由 Brian Foster 提交于
The inode allocator enables random sparse inode chunk allocations in DEBUG mode to facilitate testing. Sparse inode allocations are not always possible, however, depending on the fs geometry. For example, there is no possibility for a sparse inode allocation on filesystems where the block size is large enough to fit one or more inode chunks within a single block. Fix up the DEBUG mode sparse inode allocation logic to trigger random sparse allocations only when the geometry of the fs allows it. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 29 5月, 2015 1 次提交
-
-
由 Brian Foster 提交于
xfs_ifree_cluster() is called to mark all in-memory inodes and inode buffers as stale. This occurs after we've removed the inobt records and dropped any references of inobt data. xfs_ifree_cluster() uses the starting inode number to walk the namespace of inodes expected for a single chunk a cluster buffer at a time. The cluster buffer disk addresses are calculated by decoding the sequential inode numbers expected from the chunk. The problem with this approach is that if the inode chunk being removed is a sparse chunk, not all of the buffer addresses that are calculated as part of this sequence may be inode clusters. Attempting to acquire the buffer based on expected inode characterstics (i.e., cluster length) can lead to errors and is generally incorrect. We already use a couple variables to carry requisite state from xfs_difree() to xfs_ifree_cluster(). Rather than add a third, define a new internal structure to carry the existing parameters through these functions. Add an alloc field that represents the physical allocation bitmap of inodes in the chunk being removed. Modify xfs_ifree_cluster() to check each inode against the bitmap and skip the clusters that were never allocated as real inodes on disk. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-