Commits · 84d42ea6b6269aee7eb3d91a4425a08b8965fd4a · openanolis / cloud-kernel

16 May, 2018 1 commit

xfs: implement the metadata repair ioctl flag · 84d42ea6

Committed by Darrick J. Wong on May 14, 2018

Plumb in the pieces necessary to make the "scrub" subfunction of
the scrub ioctl actually work.  This means that we make the IFLAG_REPAIR
flag to the scrub ioctl actually do something, and we add an errortag
knob so that xfstests can force the kernel to rebuild a metadata
structure even if there's nothing wrong with it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

84d42ea6

07 Nov, 2017 1 commit

xfs: use a b+tree for the in-core extent list · 6bdcf26a

Committed by Christoph Hellwig on Nov 03, 2017

Replace the current linear list and the indirection array for the in-core
extent list with a b+tree to avoid the need for larger memory allocations
for the indirection array when lots of extents are present. The current
extent list implementation leads to heavy pressure on the memory
allocator when modifying files with a high extent count, and can lead
to high latencies because of that.

The replacement is a b+tree with a few quirks. The leaf nodes directly
store the extent record in two u64 values. The encoding is a little bit
different from the existing in-core extent records so that the start
offset and length which are required for lookups can be retrieved with
simple mask operations. The inner nodes store a 64-bit key containing
the start offset in the first half of the node, and the pointers to the
next lower level in the second half. In either case we walk the node
from the beginning to the end and do a linear search, as that is more
efficient for the low number of cache lines touched during a search
(2 for the inner nodes, 4 for the leaf nodes) than a binary search.
We store termination markers (zero length for the leaf nodes, an
otherwise impossible high bit for the inner nodes) to terminate the key
list / records instead of storing a count to use the available cache
lines as efficiently as possible.
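
The leaf lookup described above can be sketched roughly as follows. This is a minimal userspace illustration, not the kernel code: the bit layout (start offset in the low 54 bits of the first word, length in the low 21 bits of the second), the leaf capacity of 15 records, and every name in the block are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define SKETCH_STARTOFF_MASK ((UINT64_C(1) << 54) - 1)
#define SKETCH_LEN_MASK      ((UINT64_C(1) << 21) - 1)
#define SKETCH_RECS_PER_LEAF 15 /* hypothetical leaf capacity */

/* One extent record packed into two 64-bit words. */
struct sketch_leaf_rec {
    uint64_t lo;
    uint64_t hi;
};

/* Both lookup fields come out with a single mask each. */
static uint64_t sketch_rec_startoff(const struct sketch_leaf_rec *r)
{
    return r->lo & SKETCH_STARTOFF_MASK;
}

static uint64_t sketch_rec_len(const struct sketch_leaf_rec *r)
{
    return r->hi & SKETCH_LEN_MASK;
}

static struct sketch_leaf_rec sketch_rec_make(uint64_t startoff, uint64_t len)
{
    struct sketch_leaf_rec r = {
        startoff & SKETCH_STARTOFF_MASK,
        len & SKETCH_LEN_MASK,
    };
    return r;
}

/*
 * Linear search from the start of the leaf. A zero-length record acts as
 * the termination marker, so no separate record count is stored.
 * Returns the index of the record covering 'offset', or -1 for a hole.
 */
static int sketch_leaf_lookup(const struct sketch_leaf_rec *recs,
                              uint64_t offset)
{
    for (int i = 0; i < SKETCH_RECS_PER_LEAF; i++) {
        uint64_t len = sketch_rec_len(&recs[i]);
        uint64_t start = sketch_rec_startoff(&recs[i]);

        if (len == 0)           /* terminator: no more records */
            break;
        if (offset < start)     /* records are sorted: we hit a hole */
            break;
        if (offset - start < len)
            return i;
    }
    return -1;
}
```

A zero-initialized leaf is already "empty" under this scheme, because the first record has length zero and terminates the walk immediately.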

One quirk of the algorithm is that while we normally split a node half and
half like usual btree implementations we just spill over entries added at
the very end of the list to a new node on its own. This means we get a
100% fill grade for the common cases of bulk insertion when reading an
inode into memory, and when only sequentially appending to a file. The
downside is a slightly higher chance of splits on the first random
insertions.

Both insert and removal manually recurse into the lower levels, but
the bulk deletion of the whole tree is still implemented as a recursive
function call, although one limited by the overall depth and with very
little stack usage in every iteration.

For the first few extents we dynamically grow the list from a single
extent to the next powers of two until we have a first full leaf block
and only then start building the actual tree.
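
One way to picture the growth policy for the first few extents is the small helper below. It is only a sketch under assumptions: the function name and the idea of passing the leaf capacity as a parameter are invented for the example, and the real code may size its allocations differently.

```c
#include <stddef.h>

/*
 * Return the next capacity (in records) for the initial extent list:
 * 1, 2, 4, 8, ... capped at the capacity of one full leaf block.
 * Once the cap is reached the caller would switch to a real tree.
 */
static size_t sketch_next_capacity(size_t cur, size_t leaf_max)
{
    size_t next = cur ? cur * 2 : 1;

    return next > leaf_max ? leaf_max : next;
}
```

With a hypothetical leaf capacity of 15 records the sequence is 1, 2, 4, 8, 15, so a file with a single extent pays for one record rather than a whole leaf.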

The code started out based on the generic lib/btree.c code from Joern
Engel based on earlier work from Peter Zijlstra, but has since been
rewritten beyond recognition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

6bdcf26a

27 Oct, 2017 17 commits

xfs: scrub quota information · c2fc338c

Committed by Darrick J. Wong on Oct 17, 2017

Perform some quick sanity testing of the disk quota information.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

c2fc338c

xfs: scrub realtime bitmap/summary · 29b0767b

Committed by Darrick J. Wong on Oct 17, 2017

Perform simple tests of the realtime bitmap and summary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

29b0767b

xfs: scrub directory parent pointers · 0f28b257

Committed by Darrick J. Wong on Oct 17, 2017

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's at most one
dentry that points back to this directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

0f28b257

xfs: scrub symbolic links · 2a721dbb

Committed by Darrick J. Wong on Oct 17, 2017

Create the infrastructure to scrub symbolic link data.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

2a721dbb

xfs: scrub extended attributes · eec0482e

Committed by Darrick J. Wong on Oct 17, 2017

Scrub the hash tree, keys, and values in an extended attribute structure.
Refactor the attribute code to use the transaction if the caller supplied
one to avoid buffer deadlocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

eec0482e

xfs: scrub directory metadata · a5c46e5e

Committed by Darrick J. Wong on Oct 17, 2017

Scrub the hash tree and all the entries in a directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

a5c46e5e

xfs: scrub directory/attribute btrees · 7c4a07a4

Committed by Darrick J. Wong on Oct 17, 2017

Provide a way to check the shape and scrub the hashes and records
in a directory or extended attribute btree.  These are helper functions
for the directory & attribute scrubbers in subsequent patches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fengguang: remove unneeded variable to store return value]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

7c4a07a4

xfs: scrub inode block mappings · 99d9d8d0

Committed by Darrick J. Wong on Oct 17, 2017

Scrub an individual inode's block mappings to make sure they make sense.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

99d9d8d0

xfs: scrub inodes · 80e4e126

Committed by Darrick J. Wong on Oct 17, 2017

Scrub the fields within an inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

80e4e126

xfs: scrub refcount btrees · edc09b52

Committed by Darrick J. Wong on Oct 17, 2017

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

edc09b52
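
The cross-check mentioned in this commit, comparing a reference count with what the reverse mappings say, can be illustrated with a toy model. The sketch below replaces the btree interval query with a plain array scan and checks one block at a time; the structure, the function name, and the per-block approach are assumptions for the example and not how the kernel implements it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy reverse mapping: 'owner' uses blocks [start, start + len). */
struct sketch_rmap {
    uint64_t start;
    uint64_t len;
    uint64_t owner;
};

/*
 * Check a refcount record (start, len, refcount): every block in the
 * extent must be claimed by exactly 'refcount' reverse mappings.
 */
static bool sketch_refcount_matches(const struct sketch_rmap *rmaps,
                                    size_t nr, uint64_t start,
                                    uint64_t len, uint32_t refcount)
{
    for (uint64_t b = start; b < start + len; b++) {
        uint32_t owners = 0;

        for (size_t i = 0; i < nr; i++) {
            if (b >= rmaps[i].start && b - rmaps[i].start < rmaps[i].len)
                owners++;
        }
        if (owners != refcount)
            return false;
    }
    return true;
}
```

A real implementation would query only the overlapping range rather than scanning everything, but the invariant being tested is the same.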

xfs: scrub rmap btrees · c7e693d9

Committed by Darrick J. Wong on Oct 17, 2017

Check the reverse mapping records to make sure that the contents
make sense.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

c7e693d9

xfs: scrub inode btrees · 3daa6641

Committed by Darrick J. Wong on Oct 17, 2017

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

3daa6641

xfs: scrub free space btrees · efa7a99c

Committed by Darrick J. Wong on Oct 17, 2017

Check the extent records of the free space btrees to ensure that the
values look sane.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

efa7a99c

xfs: scrub the secondary superblocks · 21fb4cb1

Committed by Darrick J. Wong on Oct 17, 2017

Ensure that the geometry presented in the backup superblocks matches
the primary superblock so that repair can recover the filesystem if
that primary gets corrupted.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

21fb4cb1

xfs: create helpers to scrub a metadata btree · 537964bc

Committed by Darrick J. Wong on Oct 17, 2017

Create helper functions and tracepoints to deal with errors while
scrubbing a metadata btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

537964bc

xfs: probe the scrub ioctl · dcb660f9

Committed by Darrick J. Wong on Oct 17, 2017

Create a probe scrubber with id 0.  This will be used by xfs_scrub to
probe the kernel's abilities to scrub (and repair) the metadata.  We do
this by validating the ioctl inputs from userspace, preparing the
filesystem for a scrub (or a repair) operation, and immediately
returning to userspace.  Userspace can use the returned errno and
structure state to decide (in broad terms) if scrub/repair are
supported by the running kernel.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

dcb660f9
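
The probe idea, validate the request and return straight away so that the errno tells userspace what is supported, might look like the sketch below. Everything here is hypothetical: the request structure, the flag bit, the number of scrub types, and the choice of error codes are assumptions for illustration and do not reproduce the real ioctl interface. Only "probe has id 0" comes from the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_TYPE_PROBE  0u          /* the probe scrubber has id 0 */
#define SKETCH_TYPE_NR     2u          /* hypothetical number of types */
#define SKETCH_FLAG_REPAIR (1u << 0)   /* hypothetical flag bit */
#define SKETCH_FLAGS_ALL   SKETCH_FLAG_REPAIR

/* Hypothetical, simplified request as passed in from userspace. */
struct sketch_scrub_req {
    uint32_t type;
    uint32_t flags;
};

/*
 * Validate the inputs and return without touching any metadata.
 * 0 means "this kind of request is supported"; a negative errno
 * tells the caller which part of the request was rejected.
 */
static int sketch_scrub_probe(const struct sketch_scrub_req *req,
                              int kernel_can_repair)
{
    if (req->flags & ~SKETCH_FLAGS_ALL)
        return -EINVAL;         /* unknown flag bits */
    if (req->type >= SKETCH_TYPE_NR)
        return -ENOENT;         /* unknown scrub type */
    if ((req->flags & SKETCH_FLAG_REPAIR) && !kernel_can_repair)
        return -EOPNOTSUPP;     /* repair requested but not available */
    return 0;
}
```

A caller would issue one probe without the repair flag and one with it, and enable the corresponding features only when each call returns 0.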

xfs: create an ioctl to scrub AG metadata · 36fd6e86

Committed by Darrick J. Wong on Oct 17, 2017

Create an ioctl that can be used to scrub internal filesystem metadata.
The new ioctl takes the metadata type, an (optional) AG number, an
(optional) inode number and generation, and a flags argument.  This will
be used by the upcoming XFS online scrub tool.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

36fd6e86

05 Jun, 2017 1 commit

xfs: use the common helper uuid_is_null() · d905fdaa

Committed by Amir Goldstein on May 04, 2017

Use the common helper uuid_is_null() and remove the xfs specific
helper uuid_is_nil().

The common helper does not check for the NULL pointer value as
xfs helper did, but xfs code never calls the helper with a pointer
that can be NULL.

Conform comments and warning strings to use the term 'null uuid'
instead of 'nil uuid', because this is the terminology used by
lib/uuid.c and its users. It is also the terminology used in
userspace by libuuid and xfsprogs.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
[hch: remove now unused uuid.[ch]]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

d905fdaa
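
For readers unfamiliar with the helper, a null UUID is simply one whose 16 bytes are all zero. The userspace sketch below shows the shape of such a check; the type and function names are made up for the example and this is not the kernel's definition. As the commit message notes for the common helper, the pointer itself is not checked, so the caller must pass a valid address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit UUID. */
typedef struct {
    uint8_t b[16];
} sketch_uuid_t;

/* True when every byte of the UUID is zero. 'u' must not be NULL. */
static bool sketch_uuid_is_null(const sketch_uuid_t *u)
{
    static const sketch_uuid_t null_uuid; /* static storage: all zero */

    return memcmp(u, &null_uuid, sizeof(*u)) == 0;
}
```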

04 Apr, 2017 1 commit

xfs: implement the GETFSMAP ioctl · e89c0413

Committed by Darrick J. Wong on Mar 28, 2017

Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

e89c0413

05 Oct, 2016 3 commits

xfs: introduce the CoW fork · 3993baeb

Committed by Darrick J. Wong on Oct 03, 2016

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

3993baeb

xfs: log bmap intent items · 77d61fe4

Committed by Darrick J. Wong on Oct 03, 2016

Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

77d61fe4

xfs: create bmbt update intent log items · 6413a014

Committed by Darrick J. Wong on Oct 03, 2016

Create bmbt update intent/done log items to record redo information in
the log.  Because we roll transactions multiple times for reflink
operations, we also have to track the status of the metadata updates
that will be recorded in the post-roll transactions in case we crash
before committing the final transaction.  This mechanism enables log
recovery to finish what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

6413a014
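
The intent/done pairing used by this commit and the related refcount and rmap commits can be pictured with a toy recovery pass: any intent in the log that has no later matching done item describes work that was started but not finished, so recovery has to redo it. The sketch below is a simplified model with invented types and names, not the XFS log format.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_item_type { SKETCH_INTENT, SKETCH_DONE };

/* Toy log item: an intent and its done item share the same id. */
struct sketch_log_item {
    enum sketch_item_type type;
    uint64_t id;
};

/*
 * Scan a recovered log and collect the ids of intents that were never
 * followed by a matching done item. Returns how many ids were written
 * to 'out' (at most 'out_max').
 */
static size_t sketch_find_unfinished(const struct sketch_log_item *log,
                                     size_t nr, uint64_t *out,
                                     size_t out_max)
{
    size_t found = 0;

    for (size_t i = 0; i < nr; i++) {
        int done = 0;

        if (log[i].type != SKETCH_INTENT)
            continue;
        /* Look for a done item logged after this intent. */
        for (size_t j = i + 1; j < nr; j++) {
            if (log[j].type == SKETCH_DONE && log[j].id == log[i].id) {
                done = 1;
                break;
            }
        }
        if (!done && found < out_max)
            out[found++] = log[i].id;
    }
    return found;
}
```

In this model a crash between "intent 2 logged" and "done 2 logged" leaves id 2 in the result, which is the work recovery would replay.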

04 Oct, 2016 4 commits

xfs: log refcount intent items · f997ee21

Committed by Darrick J. Wong on Oct 03, 2016

Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

f997ee21

xfs: create refcount update intent log items · baf4bcac

Committed by Darrick J. Wong on Oct 03, 2016

Create refcount update intent/done log items to record redo
information in the log.  Because we need to roll transactions between
updating the bmbt mapping and updating the reverse mapping, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction.  This mechanism enables log recovery to finish
what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

baf4bcac

xfs: add refcount btree operations · bdf28630

Committed by Darrick J. Wong on Oct 03, 2016

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>

bdf28630

xfs: define the on-disk refcount btree format · 1946b91c

Committed by Darrick J. Wong on Oct 03, 2016

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

1946b91c

19 Sep, 2016 1 commit

xfs: set up per-AG free space reservations · 3fd129b6

Committed by Darrick J. Wong on Sep 19, 2016

One unfortunate quirk of the reference count and reverse mapping
btrees -- they can expand in size when blocks are written to *other*
allocation groups if, say, one large extent becomes a lot of tiny
extents. Since we don't want to start throwing errors in the middle
of CoWing, we need to reserve some blocks to handle future expansion.
The transaction block reservation counters aren't sufficient here
because we have to have a reserve of blocks in every AG, not just
somewhere in the filesystem.

Therefore, create two per-AG block reservation pools. One feeds the
AGFL so that rmapbt expansion always succeeds, and the other feeds all
other metadata so that refcountbt expansion never fails.

Use the count of how many reserved blocks we need to have on hand to
create a virtual reservation in the AG. Through selective clamping of
the maximum length of allocation requests and of the length of the
longest free extent, we can make it look like there's less free space
in the AG unless the reservation owner is asking for blocks.

In other words, play some accounting tricks in-core to make sure that
we always have blocks available. On the plus side, there's nothing to
clean up if we crash, which is in contrast to the strategy that the rough
draft used (actually removing extents from the freespace btrees).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

3fd129b6
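
The "accounting trick" in this commit can be modelled in a few lines: the blocks stay in the free space, but anyone other than the reservation owner sees a clamped view. The sketch below uses a single pool and invented names to keep it short (the commit describes two pools per AG), so treat it as an illustration of the clamping idea only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy per-AG state: real free space plus an in-core reservation. */
struct sketch_ag {
    uint64_t free_blocks;   /* blocks actually free in the AG */
    uint64_t longest_free;  /* length of the longest free extent */
    uint64_t reserved;      /* blocks held back for the owner */
};

/* Free space as seen by the caller; only the owner sees all of it. */
static uint64_t sketch_ag_visible_free(const struct sketch_ag *ag,
                                       bool is_owner)
{
    if (is_owner)
        return ag->free_blocks;
    return ag->free_blocks > ag->reserved ?
           ag->free_blocks - ag->reserved : 0;
}

/* Clamp an allocation request to what the caller is allowed to take. */
static uint64_t sketch_ag_clamp_len(const struct sketch_ag *ag,
                                    uint64_t want, bool is_owner)
{
    uint64_t avail = sketch_ag_visible_free(ag, is_owner);
    uint64_t longest = ag->longest_free;

    if (longest > avail)    /* hide the reserved part of the extent */
        longest = avail;
    return want < longest ? want : longest;
}
```

Because nothing is written to disk to create the reservation, a crash leaves no state to undo, which matches the advantage the commit message points out.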

03 Aug, 2016 5 commits

xfs: log rmap intent items · 9e88b5d8

Committed by Darrick J. Wong on Aug 03, 2016

Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

9e88b5d8

xfs: create rmap update intent log items · 5880f2d7

Committed by Darrick J. Wong on Aug 03, 2016

Create rmap update intent/done log items to record redo information in
the log.  Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction.  This mechanism enables log recovery to finish what
was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

5880f2d7

xfs: define the on-disk rmap btree format · 035e00ac

Committed by Darrick J. Wong on Aug 03, 2016

Originally-From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

035e00ac

xfs: introduce rmap extent operation stubs · 673930c3

Committed by Darrick J. Wong on Aug 03, 2016

Originally-From: Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

[darrick.wong@oracle.com: Extend the stubs to take full owner info.]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

673930c3

xfs: move deferred operations into a separate file · 4e0cc29b

Committed by Darrick J. Wong on Aug 03, 2016

All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items.  Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.

With that in mind, xfs_bmap_free really becomes a deferred ops control
structure.  Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

4e0cc29b

16 Jul, 2016 1 commit

xfs: abstract block export operations from nfsd layouts · 15d66ac2

Committed by Benjamin Coddington on Jul 08, 2016

Instead of creeping pnfs layout configuration into filesystems, move the
definition of block-based export operations under a more abstract
configuration.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

15d66ac2

18 Mar, 2016 2 commits

nfsd: add SCSI layout support · f99d4fbd

Committed by Christoph Hellwig on Mar 04, 2016

This is a simple extension to the block layout driver to use SCSI
persistent reservations for access control and fencing, as well as
SCSI VPD pages for device identification.

For this we need to pass the nfs4_client to the proc_getdeviceinfo method
to generate the reservation key, and add a new fence_client method
to allow for fence actions in the layout driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f99d4fbd

nfsd: add a new config option for the block layout driver · 81c39329

Committed by Christoph Hellwig on Mar 04, 2016

Split the config symbols into a generic pNFS one, which is invisible
and gets selected by the layout drivers, and one for the block layout
driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

81c39329

19 Oct, 2015 1 commit

xfs: stats are no longer dependent on CONFIG_PROC_FS · 985ef4dc

Committed by Dave Chinner on Oct 19, 2015

So we need to fix the makefile to understand this, otherwise build
errors with CONFIG_PROC_FS=n occur.
Reported-and-tested-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

985ef4dc

29 Jul, 2015 1 commit

libxfs: add xfs_bit.c · 1cfc4a9c

Committed by Dave Chinner on Jul 29, 2015

The header side of xfs_bit.c is already in libxfs, and the sparse
inode code requires the xfs_next_bit() function so pull in the
xfs_bit.c file so that a sparse inode enabled libxfs compiles
cleanly in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

1cfc4a9c

16 Feb, 2015 1 commit

xfs: implement pNFS export operations · 52785112

Committed by Christoph Hellwig on Feb 16, 2015

Add operations to export pNFS block layouts from an XFS filesystem.  See
the previous commit adding the operations for an explanation of them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

52785112
