提交 · 52addcf9d6669fa439387610bc65c92fa0980cef · openeuler / Kernel

04 8月, 2014 2 次提交

xfs: dquot recovery needs verifiers · ad3714b8

由 Dave Chinner 提交于 8月 04, 2014

dquot recovery should add verifiers to the dquot buffers that it
recovers changes into. Unfortunately, it doesn't attached the
verifiers to the buffers in a consistent manner. For example,
xlog_recover_dquot_pass2() reads dquot buffers without a verifier
and then writes it without ever having attached a verifier to the
buffer.

Further, dquot buffer recovery may write a dquot buffer that has not
been modified, or indeed, shoul dbe written because quotas are not
enabled and hence changes to the buffer were not replayed. In this
case, we again write buffers without verifiers attached because that
doesn't happen until after the buffer changes have been replayed.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

ad3714b8

xfs: ensure verifiers are attached to recovered buffers · 67dc288c

由 Dave Chinner 提交于 8月 04, 2014

Crash testing of CRC enabled filesystems has resulted in a number of
reports of bad CRCs being detected after the filesystem was mounted.
Errors such as the following were being seen:

XFS (sdb3): Mounting V5 Filesystem
XFS (sdb3): Starting recovery (logdev: internal)
XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
XFS (sdb3): Unmount and run xfs_repair
XFS (sdb3): First 64 bytes of corrupted metadata buffer:
ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1

The errors were typically being seen in AGF, AGI and their related
btree block buffers some time after log recovery had run. Often it
wasn't until later subsequent mounts that the problem was
discovered. The common symptom was a buffer with the correct
contents, but a CRC and an LSN that matched an older version of the
contents.

Some debug added to _xfs_buf_ioapply() indicated that buffers were
being written without verifiers attached to them from log recovery,
and Jan Kara isolated the cause to log recovery readahead an dit's
interactions with buffers that had a more recent LSN on disk than
the transaction being recovered. In this case, the buffer did not
get a verifier attached, and os when the second phase of log
recovery ran and recovered EFIs and unlinked inodes, the buffers
were modified and written without the verifier running. Hence they
had up to date contents, but stale LSNs and CRCs.

Fix it by attaching verifiers to buffers we skip due to future LSN
values so they don't escape into the buffer cache without the
correct verifier attached.

This patch is based on analysis and a patch from Jan Kara.

cc: <stable@vger.kernel.org>
Reported-by: NJan Kara <jack@suse.cz>
Reported-by: NFanael Linithien <fanael4@gmail.com>
Reported-by: NGrozdan <neutrino8@gmail.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

67dc288c

25 6月, 2014 1 次提交

xfs: global error sign conversion · 2451337d

由 Dave Chinner 提交于 6月 25, 2014

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

2451337d

22 6月, 2014 2 次提交

xfs: Nuke XFS_ERROR macro · b474c7ae

由 Eric Sandeen 提交于 6月 22, 2014

XFS_ERROR was designed long ago to trap return values, but it's not
runtime configurable, it's not consistently used, and we can do
similar error trapping with ftrace scripts and triggers from
userspace.

Just nuke XFS_ERROR and associated bits.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

b474c7ae

xfs: return is not a function · d99831ff

由 Eric Sandeen 提交于 6月 22, 2014

return is not a function.  "return(EIO);" is silly;
"return (EIO);" moreso.  return is not a function.
Nuke the pointless parens.

[dchinner: catch a couple of extra cases in xfs_attr_list.c,
xfs_acl.c and xfs_linux.h.]
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

d99831ff

24 4月, 2014 1 次提交

xfs: support the XFS_BTNUM_FINOBT free inode btree type · aafc3c24

由 Brian Foster 提交于 4月 24, 2014

Define the AGI fields for the finobt root/level and add magic
numbers. Update the btree code to add support for the new
XFS_BTNUM_FINOBT inode btree.

The finobt root block is reserved immediately following the inobt
root block in the AG. Update XFS_PREALLOC_BLOCKS() to determine the
starting AG data block based on whether finobt support is enabled.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

aafc3c24

14 4月, 2014 2 次提交

xfs: remove unused trans pointer arg from xlog_recover_unmount_trans() · bbe4c668

由 Eric Sandeen 提交于 4月 14, 2014

Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

bbe4c668

xfs: remove unused ail pointer arg from xfs_trans_ail_cursor_done() · e4a1e29c

由 Eric Sandeen 提交于 4月 14, 2014

Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

e4a1e29c

17 12月, 2013 1 次提交

xfs: remove xfsbdstrat error · 83a0adc3

由 Christoph Hellwig 提交于 12月 17, 2013

The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem.  Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.

Remove xfsbdstrat and opencode the error handling in the callers.  All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

83a0adc3

13 12月, 2013 3 次提交

xfs: get rid of XFS_IALLOC_BLOCKS macros · 126cd105

由 Jie Liu 提交于 12月 13, 2013

Get rid of XFS_IALLOC_BLOCKS() marcos, use mp->m_ialloc_blks directly.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

126cd105

xfs: get rid of XFS_INODE_CLUSTER_SIZE macros · 0f49efd8

由 Jie Liu 提交于 12月 13, 2013

Get rid of XFS_INODE_CLUSTER_SIZE() macros, use mp->m_inode_cluster_size
directly.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

0f49efd8

xfs: get rid of XFS_IALLOC_INODES macros · 71783438

由 Jie Liu 提交于 12月 13, 2013

Get rid of XFS_IALLOC_INODES() marcos, use mp->m_ialloc_inos directly.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

71783438

06 12月, 2013 1 次提交

xfs: free the list of recovery items on error · 2a84108f

由 Mark Tinguely 提交于 10月 02, 2013

Recovery builds a list of items on the transaction's
r_itemq head. Normally these items are committed and freed.
But in the event of a recovery error, these allocations
are leaked.

If the error occurs during item reordering, then reconstruct
the r_itemq list before deleting the list to avoid leaking
the entries that were on one of the temporary lists.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

2a84108f

24 10月, 2013 5 次提交

xfs: decouple inode and bmap btree header files · a4fbe6ab

由 Dave Chinner 提交于 10月 23, 2013

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

a4fbe6ab

xfs: decouple log and transaction headers · 239880ef

由 Dave Chinner 提交于 10月 23, 2013

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

239880ef

xfs: split dquot buffer operations out · 9aede1d8

由 Dave Chinner 提交于 10月 15, 2013

Parts of userspace want to be able to read and modify dquot buffers
(e.g. xfs_db) so we need to split out the reading and writing of
these buffers so it is easy to shared code with libxfs in userspace.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

9aede1d8

xfs: unify directory/attribute format definitions · 57062787

由 Dave Chinner 提交于 10月 15, 2013

The on-disk format definitions for the directory and attribute
structures are spread across 3 header files right now, only one of
which is dedicated to defining on-disk structures and their
manipulation (xfs_dir2_format.h). Pull all the format definitions
into a single header file - xfs_da_format.h - and switch all the
code over to point at that.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

57062787

xfs: create a shared header file for format-related information · 70a9883c

由 Dave Chinner 提交于 10月 23, 2013

All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file that we centralise these
related definitions. Start by moving the bffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

70a9883c

18 10月, 2013 1 次提交

xfs: remove newlines from strings passed to __xfs_printk · 08e96e1a

由 Eric Sandeen 提交于 10月 11, 2013

__xfs_printk adds its own "\n".  Having it in the original string
leads to unintentional blank lines from these messages.

Most format strings have no newline, but a few do, leading to
i.e.:

[ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119911] 
[ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119919] 

Fix them all.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

08e96e1a

05 10月, 2013 2 次提交

xfs: Use kmem_free() instead of free() · b2a42f78

由 Thierry Reding 提交于 10月 01, 2013

This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit aaaae980)

b2a42f78

xfs: fix memory leak in xlog_recover_add_to_trans · 9b3b77fe

由 tinguely@sgi.com 提交于 9月 27, 2013

Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit 519ccb81)

9b3b77fe

01 10月, 2013 2 次提交

xfs: Use kmem_free() instead of free() · aaaae980

由 Thierry Reding 提交于 10月 01, 2013

This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

aaaae980

xfs: fix memory leak in xlog_recover_add_to_trans · 519ccb81

由 tinguely@sgi.com 提交于 9月 27, 2013

Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

519ccb81

25 9月, 2013 1 次提交

xfs: log recovery lsn ordering needs uuid check · 566055d3

由 Dave Chinner 提交于 9月 24, 2013

After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.

The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.

Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.

More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.

Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.

This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

566055d3

12 9月, 2013 1 次提交

xfs: remove dead code from xlog_recover_inode_pass2 · 08474ed6

由 Mark Tinguely 提交于 9月 12, 2013

Additional code in the error handler of xlog_recover_inode_pass2()
results in the following error:

static checker warning: "fs/xfs/xfs_log_recover.c:2999
xlog_recover_inode_pass2()
	 info: ignoring unreachable code."
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com
Signed-off-by: NBen Myers <bpm@sgi.com>

08474ed6

11 9月, 2013 1 次提交

xfs: recovery of swap extents operations for CRC filesystems · 638f4416

由 Dave Chinner 提交于 8月 30, 2013

This is the recovery side of the btree block owner change operation
performed by swapext on CRC enabled filesystems. We detect that an
owner change is needed by the flag that has been placed on the inode
log format flag field. Because the inode recovery is being replayed
after the buffers that make up the BMBT in the given checkpoint, we
can walk all the buffers and directly modify them when we see the
flag set on an inode.

Because the inode can be relogged and hence present in multiple
chekpoints with the "change owner" flag set, we could do multiple
passes across the inode to do this change. While this isn't optimal,
we can't directly ignore the flag as there may be multiple
independent swap extent operations being replayed on the same inode
in different checkpoints so we can't ignore them.

Further, because the owner change operation uses ordered buffers, we
might have buffers that are newer on disk than the current
checkpoint and so already have the owner changed in them. Hence we
cannot just peek at a buffer in the tree and check that it has the
correct owner and assume that the change was completed.

So, for the moment just brute force the owner change every time we
see an inode with the flag set. Note that we have to be careful here
because the owner of the buffers may point to either the old owner
or the new owner. Currently the verifier can't verify the owner
directly, so there is no failure case here right now. If we verify
the owner exactly in future, then we'll have to take this into
account.

This was tested in terms of normal operation via xfstests - all of
the fsr tests now pass without failure. however, we really need to
modify xfs/227 to stress v3 inodes correctly to ensure we fully
cover this case for v5 filesystems.

In terms of recovery testing, I used a hacked version of xfs_fsr
that held the temp inode open for a few seconds before exiting so
that the filesystem could be shut down with an open owner change
recovery flags set on at least the temp inode. fsr leaves the temp
inode unlinked and in btree format, so this was necessary for the
owner change to be reliably replayed.

logprint confirmed the tmp inode in the log had the correct flag set:

INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
        INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
	                                 ^^^^^

0x200 is set, indicating a data fork owner change needed to be
replayed on inode 0x44.  A printk in the revoery code confirmed that
the inode change was recovered:

XFS (vdc): Mounting Filesystem
XFS (vdc): Starting recovery (logdev: internal)
recovering owner change ino 0x44
XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
Use of these features in this kernel is at your own risk!
XFS (vdc): Ending recovery (logdev: internal)

The script used to test this was:

$ cat ./recovery-fsr.sh
#!/bin/bash

dev=/dev/vdc
mntpt=/mnt/scratch
testfile=$mntpt/testfile

umount $mntpt
mkfs.xfs -f -m crc=1 $dev
mount $dev $mntpt
chmod 777 $mntpt

for i in `seq 10000 -1 0`; do
        xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
done
xfs_bmap -vp $testfile |head -20

xfs_fsr -d -v $testfile &
sleep 10
/home/dave/src/xfstests-dev/src/godown -f $mntpt
wait
umount $mntpt

xfs_logprint -t $dev |tail -20
time mount $dev $mntpt
xfs_bmap -vp $testfile
umount $mntpt
$
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

638f4416

10 9月, 2013 1 次提交

xfs: fix endian warning in xlog_recover_get_buf_lsn() · e9fbbad8

由 Dave Chinner 提交于 9月 02, 2013

sparse reports:

fs/xfs/xfs_log_recover.c:2017:24: sparse: cast to restricted __be64

Because I used the wrong structure for the on-disk superblock cast
in 50d5c8d8 ("xfs: check LSN ordering for v5 superblocks during
recovery"). Fix it.

Reported-by: kbuild test robot
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

e9fbbad8

31 8月, 2013 2 次提交

xfs: inode buffers may not be valid during recovery readahead · d8914002

由 Dave Chinner 提交于 8月 27, 2013

CRC enabled filesystems fail log recovery with 100% reliability on
xfstests xfs/085 with the following failure:

XFS (vdb): Mounting Filesystem
XFS (vdb): Starting recovery (logdev: internal)
XFS (vdb): Corruption detected. Unmount and run xfs_repair
XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95

The problem is that the inode buffer has not been recovered before
the readahead on the inode buffer is issued. The checkpoint being
recovered actually allocates the inode chunk we are doing readahead
from, so what comes from disk during readahead is essentially
random and the verifier barfs on it.

This inode buffer readahead problem affects non-crc filesystems,
too, but xfstests does not trigger it at all on such
configurations....
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

d8914002

xfs: check LSN ordering for v5 superblocks during recovery · 50d5c8d8

由 Dave Chinner 提交于 8月 28, 2013

Log recovery has some strict ordering requirements which unordered
or reordered metadata writeback can defeat. This can occur when an
item is logged in a transaction, written back to disk, and then
logged in a new transaction before the tail of the log is moved past
the original modification.

The result of this is that when we read an object off disk for
recovery purposes, the buffer that we read may not contain the
object type that recovery is expecting and hence at the end of the
checkpoint being recovered we have an invalid object in memory.

This isn't usually a problem, as recovery will then replay all the
other checkpoints and that brings the object back to a valid and
correct state, but the issue is that while the object is in the
invalid state it can be flushed to disk. This results in the object
verifier failing and triggering a corruption shutdown of log
recover. This is correct behaviour for the verifiers - the problem
is that we are not detecting that the object we've read off disk is
newer than the transaction we are replaying.

All metadata in v5 filesystems has the LSN of it's last modification
stamped in it. This enabled log recover to read that field and
determine the age of the object on disk correctly. If the LSN of the
object on disk is older than the transaction being replayed, then we
replay the modification. If the LSN of the object matches or is more
recent than the transaction's LSN, then we should avoid overwriting
the object as that is what leads to the transient corrupt state.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

50d5c8d8

29 8月, 2013 2 次提交

xfs: fix bad dquot buffer size in log recovery readahead · 0f0d3345

由 Dave Chinner 提交于 8月 27, 2013

xfstests xfs/087 fails 100% reliably with this assert:

XFS (vdb): Mounting Filesystem
XFS (vdb): Starting recovery (logdev: internal)
XFS: Assertion failed: bp->b_flags & XBF_STALE, file: fs/xfs/xfs_buf.c, line: 548

while trying to read a dquot buffer in xlog_recover_dquot_ra_pass2().

The issue is that the buffer length to read that is passed to
xfs_buf_readahead is in units of filesystem blocks, not disk blocks.
(i.e. FSB, not daddr). Fix it but putting the correct conversion in
place.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

0f0d3345

xfs: don't account buffer cancellation during log recovery readahead · 84a5b730

由 Dave Chinner 提交于 8月 27, 2013

When doing readhaead in log recovery, we check to see if buffers are
cancelled before doing readahead. If we find a cancelled buffer,
however, we always decrement the reference count we have on it, and
that means that readahead is causing a double decrement of the
cancelled buffer reference count.

This results in log recovery *replaying cancelled buffers* as the
actual recovery pass does not find the cancelled buffer entry in the
commit phase of the second pass across a transaction. On debug
kernels, this results in an ASSERT failure like so:

XFS: Assertion failed: !(flags & XFS_BLF_CANCEL), file: fs/xfs/xfs_log_recover.c, line: 1815

xfstests generic/311 reproduces this ASSERT failure with 100%
reproducability.

Fix it by making readahead only peek at the buffer cancelled state
rather than the full accounting that xlog_check_buffer_cancelled()
does.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

84a5b730

24 8月, 2013 1 次提交

xfs: introduce object readahead to log recovery · 00574da1

由 Zhi Yong Wu 提交于 8月 14, 2013

  It can take a long time to run log recovery operation because it is
single threaded and is bound by read latency. We can find that it took
most of the time to wait for the read IO to occur, so if one object
readahead is introduced to log recovery, it will obviously reduce the
log recovery time.

Log recovery time stat:

          w/o this patch        w/ this patch

real:        0m15.023s             0m7.802s
user:        0m0.001s              0m0.001s
sys:         0m0.246s              0m0.107s
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

00574da1

21 8月, 2013 3 次提交

xfs: fix the comment of xlog_recover_do_dquot_buffer() · 8ba701ee

由 Zhi Yong Wu 提交于 8月 12, 2013

Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

8ba701ee

xfs: fix the comment of xlog_find_head() · 0a94da24

由 Zhi Yong Wu 提交于 8月 07, 2013

Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

0a94da24

xfs: fix the comment of xlog_recover_buffer_pass2() · 34be5ff3

由 Zhi Yong Wu 提交于 8月 07, 2013

Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

34be5ff3

14 8月, 2013 2 次提交

xfs:free bp in xlog_find_tail() error path · 050a1952

由 Eric Sandeen 提交于 7月 31, 2013

xlog_find_tail() currently leaks a bp on one error path.

There is no error target, so manually free the bp before
returning the error.

Found by Coverity.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

050a1952

xfs: free bp in xlog_find_zeroed() error path · 5d0a6549

由 Eric Sandeen 提交于 7月 31, 2013

xlog_find_zeroed() currently leaks a bp on one error path.

Using the bp_err: target resolves this.

Found by Coverity.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

5d0a6549

13 8月, 2013 3 次提交

xfs: refactor xfs_trans_reserve() interface · 3d3c8b52

由 Jie Liu 提交于 8月 12, 2013

With the new xfs_trans_res structure has been introduced, the log
reservation size, log count as well as log flags are pre-initialized
at mount time.  So it's time to refine xfs_trans_reserve() interface
to be more neat.

Also, introduce a new helper M_RES() to return a pointer to the
mp->m_resv structure to simplify the input.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

3d3c8b52

xfs: consolidate xfs_utils.c · e546cb79

由 Dave Chinner 提交于 8月 12, 2013

There are a few small helper functions in xfs_util, all related to
xfs_inode modifications. Move them all to xfs_inode.c so all
xfs_inode operations are consiolidated in the one place.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

e546cb79

xfs: reshuffle dir2 definitions around for userspace · 2b9ab5ab

由 Dave Chinner 提交于 8月 12, 2013

Many of the definitions within xfs_dir2_priv.h are needed in
userspace outside libxfs. Definitions within xfs_dir2_priv.h are
wholly contained within libxfs, so we need to shuffle some of the
definitions around to keep consistency across files shared between
user and kernel space.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

2b9ab5ab

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功