Commits · be51f8119c2f5e27437d2c4271f6419f3b8e609f · openeuler / raspberrypi-kernel

05 Oct, 2016 · 12 commits

xfs: support bmapping delalloc extents in the CoW fork · be51f811

Committed by Darrick J. Wong on Oct 03, 2016

Allow the creation of delayed allocation extents in the CoW fork.  In
a subsequent patch we'll wire up iomap_begin to actually do this via
reflink helper functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce the CoW fork · 3993baeb

Committed by Darrick J. Wong on Oct 03, 2016

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: don't allow reflinked dir/dev/fifo/socket/pipe files · 11715a21

Committed by Darrick J. Wong on Oct 03, 2016

Only regular, non-realtime files can be reflinked, so check that when we load an
inode.  Also, don't leak the attr fork if there's a failure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


xfs: add reflink feature flag to geometry · f0ec1b8e

Committed by Darrick J. Wong on Oct 03, 2016

Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: define tracepoints for reflink activities · 53aa1c34

Committed by Darrick J. Wong on Oct 03, 2016

Define all the tracepoints we need to inspect the runtime operation
of reflink/dedupe/copy-on-write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: return work remaining at the end of a bunmapi operation · 4453593b

Committed by Darrick J. Wong on Oct 03, 2016

Return the range of file blocks that bunmapi didn't free.  This hint
is used by CoW and reflink to figure out what part of an extent
actually got freed so that it can set up the appropriate atomic
remapping of just the freed range.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

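A toy model of the "work remaining" contract, assuming a hypothetical per-block mapping array and a per-call budget of extents to unmap. It walks backwards from the end of the range and hands back the length of the head it did not reach; it is an illustration of the idea, not the real bunmapi code:

```c
#include <stdint.h>

/*
 * Toy model: map[b] != 0 means file block b is mapped.  Unmap the mapped
 * blocks of [start, start + *rlen), working backwards from the end, but
 * process at most max_extents contiguous mapped runs per call.  On return,
 * *rlen is the length of the head [start, start + *rlen) that was not
 * processed, or 0 once the whole range is done.
 */
static void demo_bunmapi(unsigned char *map, uint64_t start, uint64_t *rlen,
			 unsigned int max_extents)
{
	uint64_t end = start + *rlen;	/* exclusive end of the unprocessed part */

	while (end > start) {
		if (!map[end - 1]) {	/* hole: nothing to free, skip it */
			end--;
			continue;
		}
		if (max_extents == 0)
			break;		/* budget spent; leave the rest to the caller */
		while (end > start && map[end - 1])	/* clear one mapped run */
			map[--end] = 0;
		max_extents--;
	}
	*rlen = end - start;
}
```

A caller loops, feeding the updated `*rlen` back in, until it reaches 0; the unfreed head is exactly the range it still has to remap or retry.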

xfs: when replaying bmap operations, don't let unlinked inodes get reaped · 17c12bcd

Committed by Darrick J. Wong on Oct 03, 2016

Log recovery will iget an inode to replay BUI items and iput the inode
when it's done.  Unfortunately, if the inode was unlinked, the iput
will see that i_nlink == 0 and decide to truncate & free the inode,
which prevents us from replaying subsequent BUIs.  We can't skip the
BUIs because we have to replay all the redo items to ensure that
atomic operations complete.

Since unlinked inode recovery will reap the inode anyway, we can
safely introduce a new inode flag to indicate that an inode is in this
'unlinked recovery' state and should not be auto-reaped in the
drop_inode path.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

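A minimal sketch of the drop_inode decision, assuming a hypothetical inode struct; DEMO_IRECOVERY is an invented stand-in for the new inode flag the message describes:

```c
#include <stdbool.h>

/* Hypothetical flag bit: inode is pinned by log recovery. */
#define DEMO_IRECOVERY	(1u << 0)

struct demo_inode {
	unsigned int nlink;	/* link count; 0 means unlinked */
	unsigned int flags;
};

/*
 * Decide whether dropping the last reference should reap (truncate and
 * free) the inode.  An unlinked inode that recovery still needs for
 * replaying BUIs must stay alive; unlinked-inode recovery reaps it later.
 */
static bool demo_should_reap(const struct demo_inode *ip)
{
	if (ip->flags & DEMO_IRECOVERY)
		return false;
	return ip->nlink == 0;
}
```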

xfs: implement deferred bmbt map/unmap operations · 9f3afb57

Committed by Darrick J. Wong on Oct 03, 2016

Implement deferred versions of the inode block map/unmap functions.
These will be used in subsequent patches to make reflink operations
atomic.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: pass bmapi flags through to bmap_del_extent · 4847acf8

Committed by Darrick J. Wong on Oct 03, 2016

Pass BMAPI_ flags from bunmapi into bmap_del_extent and extend
BMAPI_REMAP (which means "don't touch the allocator or the quota
accounting") to apply to bunmapi as well.  This will be used to
implement the unmap operation, which will be used by swapext.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

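A toy model of the remap-only behaviour, assuming hypothetical counters for free space and quota; DEMO_REMAP is an invented stand-in for BMAPI_REMAP, and nothing here is the XFS implementation:

```c
#include <stdint.h>

/* Hypothetical flag: change the mapping only, leave allocator and quota alone. */
#define DEMO_REMAP	(1u << 0)

struct demo_counters {
	uint64_t free_blocks;	/* blocks available to the allocator */
	uint64_t quota_used;	/* blocks charged to the owner's quota */
};

/*
 * Remove an extent of @len blocks from a file's mapping.  Without
 * DEMO_REMAP the blocks go back to the free pool and the quota charge is
 * dropped; with it, only the mapping changes, because the caller (for
 * example an extent swap) keeps the blocks in use elsewhere.
 */
static void demo_del_extent(struct demo_counters *c, uint64_t len,
			    unsigned int flags)
{
	/* ... the extent would be unlinked from the block map here ... */
	if (flags & DEMO_REMAP)
		return;
	c->free_blocks += len;
	c->quota_used -= len;
}
```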

xfs: map an inode's offset to an exact physical block · f65306ea

Committed by Darrick J. Wong on Oct 03, 2016

Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks.  This enables reflink to map a file to blocks that are already
in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

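A toy model of mapping onto caller-chosen physical blocks, assuming a fixed-size file map and a trivial bump allocator in place of real free-space management (all names are hypothetical):

```c
#include <stdint.h>

#define DEMO_NBLOCKS	16
#define DEMO_ALLOC_NEW	UINT64_MAX	/* "give me fresh blocks" */

struct demo_file {
	uint64_t map[DEMO_NBLOCKS];	/* file block -> physical block */
};

/* Trivial bump allocator standing in for the real free-space code. */
static uint64_t demo_next_free = 1000;

/*
 * Map file blocks [off, off + len); the caller keeps off + len within
 * DEMO_NBLOCKS.  With DEMO_ALLOC_NEW, allocate fresh physical blocks;
 * otherwise map onto the caller-supplied range [pblk, pblk + len), which
 * may already be in use by another file.
 */
static void demo_bmapi_write(struct demo_file *f, uint64_t off, uint64_t len,
			     uint64_t pblk)
{
	if (pblk == DEMO_ALLOC_NEW) {
		pblk = demo_next_free;
		demo_next_free += len;
	}
	for (uint64_t i = 0; i < len; i++)
		f->map[off + i] = pblk + i;
}
```

Mapping a first file with `DEMO_ALLOC_NEW` and then a second file onto the first file's starting block leaves both pointing at the same physical range, which is the sharing reflink relies on.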

xfs: log bmap intent items · 77d61fe4

Committed by Darrick J. Wong on Oct 03, 2016

Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create bmbt update intent log items · 6413a014

Committed by Darrick J. Wong on Oct 03, 2016

Create bmbt update intent/done log items to record redo information in
the log.  Because we roll transactions multiple times for reflink
operations, we also have to track the status of the metadata updates
that will be recorded in the post-roll transactions in case we crash
before committing the final transaction.  This mechanism enables log
recovery to finish what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

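A toy model of the recovery side of intent/done pairs, assuming a flat array of hypothetical log records; real BUI/BUD items carry far more state than an id:

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_rec_type { DEMO_INTENT, DEMO_DONE };

/* Hypothetical log record: intent to perform update @id, or its completion. */
struct demo_rec {
	enum demo_rec_type type;
	unsigned int id;
};

/*
 * Recovery sketch: an intent with no later matching done record describes
 * work that was started but not finished before the crash, so it has to be
 * replayed.  Store the ids to replay in @out (room for @n entries) and
 * return how many there are.
 */
static size_t demo_find_unfinished(const struct demo_rec *recs, size_t n,
				   unsigned int *out)
{
	size_t nout = 0;

	for (size_t i = 0; i < n; i++) {
		bool done = false;

		if (recs[i].type != DEMO_INTENT)
			continue;
		for (size_t j = i + 1; j < n && !done; j++)
			done = recs[j].type == DEMO_DONE && recs[j].id == recs[i].id;
		if (!done)
			out[nout++] = recs[i].id;
	}
	return nout;
}
```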

04 Oct, 2016 · 18 commits

xfs: introduce reflink utility functions · 350a27a6

Committed by Darrick J. Wong on Oct 03, 2016

These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

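A minimal sketch of a "find shared blocks" helper, assuming the reference counts are available as a sorted array of hypothetical (start, length, refcount) records rather than as a btree:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: blocks [start, start + len) have refcount @refs. */
struct demo_refc {
	uint32_t start;
	uint32_t len;
	uint32_t refs;
};

/*
 * Look for shared blocks (refcount > 1) in [bno, bno + len).  Records are
 * sorted by start and do not overlap.  On success, report the first shared
 * block and the length of the contiguous shared run starting there,
 * clipped to the query range.
 */
static bool demo_find_shared(const struct demo_refc *r, size_t n,
			     uint32_t bno, uint32_t len,
			     uint32_t *fbno, uint32_t *flen)
{
	uint32_t end = bno + len;

	for (size_t i = 0; i < n; i++) {
		uint32_t s = r[i].start, e = r[i].start + r[i].len;

		if (r[i].refs < 2 || e <= bno || s >= end)
			continue;
		*fbno = s > bno ? s : bno;
		/* extend across directly adjacent shared records */
		while (i + 1 < n && r[i + 1].start == e && r[i + 1].refs >= 2) {
			i++;
			e = r[i].start + r[i].len;
		}
		*flen = (e < end ? e : end) - *fbno;
		return true;
	}
	return false;
}
```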

xfs: reserve AG space for the refcount btree root · d0e853f3

Committed by Darrick J. Wong on Oct 03, 2016

Reduce the max AG usable space size so that we always have space for
the refcount btree root.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add refcount btree block detection to log recovery · a90c00f0

Committed by Darrick J. Wong on Oct 03, 2016

Identify refcountbt blocks in the log correctly so that we can
validate them during log recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: adjust refcount when unmapping file blocks · 62aab20f

Committed by Darrick J. Wong on Oct 03, 2016

When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: connect refcount adjust functions to upper layers · 33ba6129

Committed by Darrick J. Wong on Oct 03, 2016

Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: adjust refcount of an extent of blocks in refcount btree · 31727258

Committed by Darrick J. Wong on Oct 03, 2016

Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

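A toy model of adjusting reference counts over a block range, assuming a flat per-block array (hypothetical) instead of the extent-based refcount btree. Decrementing also reports blocks whose count drops to zero, which is the point at which an unmap would free them:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-block reference counts for one allocation group.  The
 * real refcount btree stores extents; a flat array keeps this sketch short.
 */
#define DEMO_AGBLOCKS	64

/*
 * Add @delta (+1 or -1) to the reference count of every block in
 * [bno, bno + len).  Callers only decrement blocks that are in use.
 * Blocks whose count reaches zero are no longer used by any file; they are
 * counted in the return value so the caller can free them.
 */
static size_t demo_adjust_refcount(uint32_t refs[DEMO_AGBLOCKS], uint32_t bno,
				   uint32_t len, int delta)
{
	size_t freed = 0;

	for (uint32_t b = bno; b < bno + len; b++) {
		refs[b] += delta;
		if (refs[b] == 0)
			freed++;
	}
	return freed;
}
```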

xfs: log refcount intent items · f997ee21

Committed by Darrick J. Wong on Oct 03, 2016

Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create refcount update intent log items · baf4bcac

Committed by Darrick J. Wong on Oct 03, 2016

Create refcount update intent/done log items to record redo
information in the log.  Because we need to roll transactions between
updating the bmbt mapping and updating the refcount btree, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction.  This mechanism enables log recovery to finish
what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add refcount btree operations · bdf28630

Committed by Darrick J. Wong on Oct 03, 2016

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>

xfs: account for the refcount btree in the alloc/free log reservation · f310bd2e

Committed by Darrick J. Wong on Oct 03, 2016

Every time we allocate or free a data extent, we might need to split
the refcount btree.  Reserve some blocks in the transaction to handle
this possibility.  Even though the deferred refcount code can roll a
transaction to avoid overloading the transaction, we can still exceed
the reservation.

Certain pathological workloads (1k blocks, no cowextsize hint, random
direct I/O writes) cause a perfect storm wherein a refcount adjustment
of a large range of blocks causes full tree splits in two separate
extents in two separate refcount tree blocks; allocating new refcount
tree blocks causes rmap btree splits; and all the allocation activity
causes the freespace btrees to split, blowing the reservation.

(Reproduced by generic/167 over NFS atop XFS)
Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick.wong@oracle.com: add commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: add refcount btree support to growfs · ac4fef69

Committed by Darrick J. Wong on Oct 03, 2016

Modify the growfs code to initialize new refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: define the on-disk refcount btree format · 1946b91c

Committed by Darrick J. Wong on Oct 03, 2016

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

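A minimal decoding sketch, assuming an on-disk refcount record made of three big-endian 32-bit fields (start block, block count, reference count); treat that layout as an assumption and confirm it against xfs_format.h. The in-core struct and helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical in-core form of one refcount record. */
struct demo_refcount_irec {
	uint32_t startblock;	/* first AG block of the extent */
	uint32_t blockcount;	/* length of the extent in blocks */
	uint32_t refcount;	/* number of owners sharing the extent */
};

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t demo_get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Decode one 12-byte on-disk record into its in-core form. */
static struct demo_refcount_irec demo_decode_rec(const unsigned char rec[12])
{
	struct demo_refcount_irec irec = {
		.startblock = demo_get_be32(rec),
		.blockcount = demo_get_be32(rec + 4),
		.refcount   = demo_get_be32(rec + 8),
	};
	return irec;
}
```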

xfs: refcount btree add more reserved blocks · af30dfa1

Committed by Darrick J. Wong on Oct 03, 2016

Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce refcount btree definitions · 46eeb521

Committed by Darrick J. Wong on Oct 03, 2016

Add new per-AG refcount btree definitions to the per-AG structures.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: define tracepoints for refcount btree activities · c75c752d

Committed by Darrick J. Wong on Oct 03, 2016

Define all the tracepoints we need to inspect the refcount btree
runtime operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: return an error when an inline directory is too small · 9cdafd8a

Committed by Darrick J. Wong on Oct 03, 2016

If the size of an inline directory is so small that it doesn't
even cover the required header size, return an error to userspace
instead of ASSERTing and returning 0 like everything's ok.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

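A minimal sketch of the check, assuming a hypothetical short-form directory header layout; returning -EUCLEAN as the "filesystem corrupted" error code is likewise an assumption, not necessarily what the patch uses:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical short-form directory header; layout is illustrative only. */
struct demo_sf_hdr {
	unsigned char count;		/* number of entries */
	unsigned char i8count;		/* entries needing 64-bit inode numbers */
	unsigned char parent[8];	/* parent directory inode number */
};

/*
 * Validate an inline directory's size before walking it.  A size that
 * cannot even hold the fixed header means the inode is corrupt, so report
 * an error instead of asserting and pretending the directory is empty.
 */
static int demo_check_inline_dir(size_t di_size)
{
	if (di_size < sizeof(struct demo_sf_hdr))
		return -EUCLEAN;
	return 0;
}
```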

vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks · 71be6b49

Committed by Darrick J. Wong on Oct 03, 2016

Add a new fallocate mode flag that explicitly unshares blocks on
filesystems that support such features.  The new flag can only
be used with an allocate-mode fallocate call.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

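A userspace sketch of requesting an unshare. In mainline `linux/falloc.h` the flag is spelled `FALLOC_FL_UNSHARE_RANGE`; the wrapper name `unshare_range` is hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>

/* Older headers may predate the flag; 0x40 is its value in linux/falloc.h. */
#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40
#endif

/*
 * Ask the filesystem to give [off, off + len) of @fd private copies of any
 * shared blocks.  Returns 0 on success or a positive errno value, for
 * example EOPNOTSUPP on filesystems without shared-extent support.
 */
static int unshare_range(int fd, off_t off, off_t len)
{
	if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE, off, len) == 0)
		return 0;
	return errno;
}
```

Because the flag is an allocate-mode request, it is passed on its own rather than combined with hole-punching or range-shifting modes.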

vfs: support FS_XFLAG_COWEXTSIZE and get/set of CoW extent size hint · 0a6eab8b

Committed by Darrick J. Wong on Oct 03, 2016

Introduce XFLAGs for the new XFS CoW extent size hint, and actually
plumb the CoW extent size hint into the fsxattr structure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

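A userspace sketch that reads the hint through the generic `FS_IOC_FSGETXATTR` ioctl, assuming kernel headers recent enough to contain the `fsx_cowextsize` field; `get_cowextsize` is a hypothetical helper name:

```c
#include <errno.h>
#include <linux/fs.h>
#include <sys/ioctl.h>

#ifndef FS_XFLAG_COWEXTSIZE
#define FS_XFLAG_COWEXTSIZE 0x00010000	/* value from linux/fs.h */
#endif

/*
 * Read the CoW extent size hint of an open file.  Returns 0 and stores the
 * hint in *hint (0 when the hint flag is not set), or returns a positive
 * errno value and leaves *hint untouched.
 */
static int get_cowextsize(int fd, unsigned int *hint)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return errno;
	*hint = (fsx.fsx_xflags & FS_XFLAG_COWEXTSIZE) ? fsx.fsx_cowextsize : 0;
	return 0;
}
```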

03 Oct, 2016 · 8 commits

Merge branch 'xfs-4.9-log-recovery-fixes' into for-next · 155cd433
Committed by Dave Chinner on Oct 03, 2016

Merge branch 'iomap-4.9-dax' into for-next · a1f45e66
Committed by Dave Chinner on Oct 03, 2016

Merge branch 'xfs-4.9-delalloc-rework' into for-next · a89b3f97
Committed by Dave Chinner on Oct 03, 2016

Merge branch 'xfs-4.9-reflink-prep' into for-next · 79ad5761
Committed by Dave Chinner on Oct 03, 2016

Merge branch 'iomap-4.9-misc-fixes-1' into for-next · b036b970
Committed by Dave Chinner on Oct 03, 2016

fs: update atime before I/O in generic_file_read_iter · 0d5b0cf2

Committed by Christoph Hellwig on Oct 03, 2016

After the call to ->direct_IO the final reference to the file might have
been dropped by aio_complete already, and the call to file_accessed might
cause a use after free.

Instead update the access time before the I/O, similar to how we
update the time stamps before writes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

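A toy model of why the ordering matters, assuming a hypothetical refcounted file object whose I/O completion drops the reference it was handed; none of these names are the VFS ones:

```c
#include <stdlib.h>

/* Hypothetical refcounted file object. */
struct demo_file {
	int refs;
	long atime;
};

static int demo_frees;	/* how many file objects have been freed */

static void demo_put(struct demo_file *f)
{
	if (--f->refs == 0) {
		free(f);
		demo_frees++;
	}
}

/*
 * Stand-in for an asynchronous direct I/O submission whose completion has
 * already run by the time this returns; completion drops the reference it
 * was handed, which may be the last one.
 */
static void demo_submit_aio(struct demo_file *f)
{
	demo_put(f);
}

/* Safe ordering: update the access time while @f is known to be alive. */
static void demo_read(struct demo_file *f, long now)
{
	f->atime = now;		/* the file_accessed() step, done first */
	demo_submit_aio(f);	/* @f must not be touched after this call */
}
```

Swapping the two lines in `demo_read` would write to freed memory whenever the submission dropped the final reference.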

xfs: update atime before I/O in xfs_file_dio_aio_read · a447d7cd

Committed by Christoph Hellwig on Oct 03, 2016

After the call to __blkdev_direct_IO the final reference to the file
might have been dropped by aio_complete already, and the call to
file_accessed might cause a use after free.

Instead update the access time before the I/O, similar to how we
update the time stamps before writes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-and-tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

ext2: fix possible integer truncation in ext2_iomap_begin · d5bfccdf

Committed by Christoph Hellwig on Oct 03, 2016

For 32-bit architectures we need to cast first_block to u64 before
shifting it left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

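A minimal illustration of the truncation, using `uint32_t` as an assumed stand-in for a block number type that is 32 bits wide on 32-bit architectures, so the bug reproduces on any host; the function names are hypothetical:

```c
#include <stdint.h>

/* Buggy: the shift happens in 32 bits, so high bits are lost. */
static uint64_t demo_addr_truncated(uint32_t first_block, unsigned int blkbits)
{
	return first_block << blkbits;
}

/* Fixed: widen to 64 bits first, then shift. */
static uint64_t demo_addr(uint32_t first_block, unsigned int blkbits)
{
	return (uint64_t)first_block << blkbits;
}
```

With 4 KiB blocks (`blkbits` = 12), block 0x100000 lies at byte offset 2^32; the buggy version wraps that to 0, while the cast keeps the full value.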

26 Sep, 2016 · 2 commits

xfs: log recovery tracepoints to track current lsn and buffer submission · 5cd9cee9

Committed by Brian Foster on Sep 26, 2016

Log recovery has particular rules around buffer submission along with
tricky corner cases where independent transactions can share an LSN. As
such, it can be difficult to follow when/why buffers are submitted
during recovery.

Add a couple of tracepoints to post the current LSN of a record when a new
record is being processed and when a buffer is being skipped due to LSN
ordering. Also, update the recover item class to include the LSN of the
current transaction for the item being processed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: update metadata LSN in buffers during log recovery · 60a4a222

Committed by Brian Foster on Sep 26, 2016

Log recovery is currently broken for v5 superblocks in that it never
updates the metadata LSN of buffers written out during recovery. The
metadata LSN is recorded in various bits of metadata to provide recovery
ordering criteria that prevent transient corruption states reported by
buffer write verifiers. Without such ordering logic, buffer updates can
be replayed out of order and lead to false positive transient corruption
states. This is generally not a corruption vector on its own, but
corruption detection shuts down the filesystem and ultimately prevents a
mount if it occurs during log recovery. This requires an xfs_repair run
that clears the log and potentially loses filesystem updates.

This problem is avoided in most cases as metadata writes during normal
filesystem operation update the metadata LSN appropriately. The problem
with log recovery not updating metadata LSNs manifests if the system
happens to crash shortly after log recovery itself. In this scenario, it
is possible for log recovery to complete all metadata I/O such that the
filesystem is consistent. If a crash occurs after that point but before
the log tail is pushed forward by subsequent operations, however, the
next mount performs the same log recovery over again. If a buffer is
updated multiple times in the dirty range of the log, an earlier update
in the log might not be valid based on the current state of the
associated buffer after all of the updates in the log had been replayed
(before the previous crash). If a verifier happens to detect such a
problem, the filesystem claims corruption and immediately shuts down.

This commonly manifests in practice as directory block verifier failures
such as the following, likely due to directory verifiers being
particularly detailed in their checks as compared to most others:

  ...
  Mounting V5 Filesystem
  XFS (dm-0): Starting recovery (logdev: internal)
  XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line ... of \
    file fs/xfs/libxfs/xfs_dir2_data.c.  Caller xfs_dir3_data_verify ...
  ...

Update log recovery to update the metadata LSN of recovered buffers.
Since metadata LSNs are already updated by write verifier functions via
attached log items, attach a dummy log item to the buffer during
validation and explicitly set the LSN of the current transaction. This
ensures that the metadata LSN of a buffer is updated based on whether
the recovery I/O actually completes, and if so, that subsequent recovery
attempts identify that the buffer is already up to date with respect to
the current transaction.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

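A toy model of LSN-ordered replay, assuming a hypothetical buffer that holds a single value plus the LSN of its last update; the real code stamps the LSN through a log item attached to the buffer, which this sketch reduces to a boolean:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata buffer: contents plus the LSN of its last update. */
struct demo_buf {
	int value;
	uint64_t lsn;
};

/* Hypothetical logged update of that buffer. */
struct demo_item {
	uint64_t lsn;
	int value;
};

/*
 * Replay @n logged updates into @bp.  An update is skipped when the
 * buffer's metadata LSN shows it already contains that change or a newer
 * one.  With @stamp_lsn set, each replayed update also records its LSN in
 * the buffer, as the commit message describes.  Returns the number applied.
 */
static size_t demo_replay(struct demo_buf *bp, const struct demo_item *items,
			  size_t n, bool stamp_lsn)
{
	size_t applied = 0;

	for (size_t i = 0; i < n; i++) {
		if (bp->lsn >= items[i].lsn)
			continue;	/* buffer already up to date */
		bp->value = items[i].value;
		if (stamp_lsn)
			bp->lsn = items[i].lsn;
		applied++;
	}
	return applied;
}
```

Replaying the same log a second time with `stamp_lsn` set applies nothing. Without it, the second pass re-applies the older update on top of newer contents before the later update lands again, which is the kind of transient state a write verifier can reject.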