Commit c8d85669, authored by Linus Torvalds

Merge tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "For 3.10-rc1 we have a number of bug fixes and cleanups and a
  currently experimental feature from David Chinner, CRCs protection for
  metadata.  CRCs are enabled by using mkfs.xfs to create a filesystem
  with the feature bits set.

   - numerous fixes for speculative preallocation
   - don't verify buffers on IO errors
   - rename of random32 to prandom32
   - refactoring/rearrangement in xfs_bmap.c
   - removal of unused m_inode_shrink in struct xfs_mount
   - fix error handling of xfs_bufs and readahead
   - quota driven preallocation throttling
   - fix WARN_ON in xfs_vm_releasepage
   - add ratelimited printk for different alert levels
   - fix spurious forced shutdowns due to freed Extent Free Intents
   - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
   - remove some obsoleted comments
   - (experimental) CRC support for metadata"

* tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
  xfs: fix da node magic number mismatches
  xfs: Remote attr validation fixes and optimisations
  xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
  xfs: add metadata CRC documentation
  xfs: implement extended feature masks
  xfs: add CRC checks to the superblock
  xfs: buffer type overruns blf_flags field
  xfs: add buffer types to directory and attribute buffers
  xfs: add CRC protection to remote attributes
  xfs: split remote attribute code out
  xfs: add CRCs to attr leaf blocks
  xfs: add CRCs to dir2/da node blocks
  xfs: shortform directory offsets change for dir3 format
  xfs: add CRC checking to dir2 leaf blocks
  xfs: add CRC checking to dir2 data blocks
  xfs: add CRC checking to dir2 free blocks
  xfs: add CRC checks to block format directory blocks
  xfs: add CRC checks to remote symlinks
  xfs: split out symlink code into it's own file.
  xfs: add version 3 inode format with CRCs
  ...
XFS Self Describing Metadata
----------------------------
Introduction
------------
The largest scalability problem facing XFS is not one of algorithmic
scalability, but of verification of the filesystem structure. Scalability of the
structures and indexes on disk and the algorithms for iterating them are
adequate for supporting PB scale filesystems with billions of inodes, however it
is this very scalability that causes the verification problem.
Almost all metadata on XFS is dynamically allocated. The only fixed location
metadata is the allocation group headers (SB, AGF, AGFL and AGI), while all
other metadata structures need to be discovered by walking the filesystem
structure in different ways. While this is already done by userspace tools for
validating and repairing the structure, there are limits to what they can
verify, and this in turn limits the supportable size of an XFS filesystem.
For example, it is entirely possible to manually use xfs_db and a bit of
scripting to analyse the structure of a 100TB filesystem when trying to
determine the root cause of a corruption problem, but it is still mainly a
manual task of verifying that things like single bit errors or misplaced writes
weren't the ultimate cause of a corruption event. It may take a few hours to a
few days to perform such forensic analysis, so at this scale root cause
analysis is entirely possible.
However, if we scale the filesystem up to 1PB, we now have 10x as much metadata
to analyse and so that analysis blows out towards weeks/months of forensic work.
Most of the analysis work is slow and tedious, so as the amount of analysis goes
up, the more likely that the cause will be lost in the noise. Hence the primary
concern for supporting PB scale filesystems is minimising the time and effort
required for basic forensic analysis of the filesystem structure.
Self Describing Metadata
------------------------
One of the problems with the current metadata format is that apart from the
magic number in the metadata block, we have no other way of identifying what it
is supposed to be. We can't even identify if it is the right place. Put simply,
you can't look at a single metadata block in isolation and say "yes, it is
supposed to be there and the contents are valid".
Hence most of the time spent on forensic analysis is spent doing basic
verification of metadata values, looking for values that are in range (and hence
not detected by automated verification checks) but are not correct. Finding and
understanding how things like cross linked block lists (e.g. sibling
pointers in a btree end up with loops in them) are the key to understanding what
went wrong, but it is impossible to tell what order the blocks were linked into
each other or written to disk after the fact.
Hence we need to record more information into the metadata to allow us to
quickly determine if the metadata is intact and can be ignored for the purpose
of analysis. We can't protect against every possible type of error, but we can
ensure that common types of errors are easily detectable. Hence the concept of
self describing metadata.
The first, fundamental requirement of self describing metadata is that the
metadata object contains some form of unique identifier in a well known
location. This allows us to identify the expected contents of the block and
hence parse and verify the metadata object. If we can't independently identify
the type of metadata in the object, then the metadata doesn't describe itself
very well at all!
Luckily, almost all XFS metadata has magic numbers embedded already - only the
AGFL, remote symlinks and remote attribute blocks do not contain identifying
magic numbers. Hence we can change the on-disk format of all these objects to
add more identifying information and detect this simply by changing the magic
numbers in the metadata objects. That is, if it has the current magic number,
the metadata isn't self identifying. If it contains a new magic number, it is
self identifying and we can do much more expansive automated verification of the
metadata object at runtime, during forensic analysis or repair.
As a primary concern, self describing metadata needs some form of overall
integrity checking. We cannot trust the metadata if we cannot verify that it has
not been changed as a result of external influences. Hence we need some form of
integrity check, and this is done by adding CRC32c validation to the metadata
block. If we can verify the block contains the metadata it was intended to
contain, a large amount of the manual verification work can be skipped.
CRC32c was selected as metadata cannot be more than 64k in length in XFS and
hence a 32 bit CRC is more than sufficient to detect multi-bit errors in
metadata blocks. CRC32c is also now hardware accelerated on common CPUs so it is
fast. So while CRC32c is not the strongest of possible integrity checks that
could be used, it is more than sufficient for our needs and has relatively
little overhead. Adding support for larger integrity fields and/or algorithms
does not really provide any extra value over CRC32c, but it does add a lot of
complexity and so there is no provision for changing the integrity checking
mechanism.
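As an illustration of the integrity check described above, here is a minimal userspace sketch of a CRC32c calculation that treats the block's own CRC field as zero, so the stored checksum never feeds into itself. It is not the kernel's implementation: the helper name block_cksum() and its parameters are hypothetical, and the bitwise loop stands in for the hardware-accelerated routine a real filesystem would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Plain CRC32c of a buffer: seed of all ones, final inversion. */
static uint32_t crc32c(const void *buf, size_t len)
{
	return ~crc32c_update(~0u, buf, len);
}

/*
 * Hypothetical helper: checksum a metadata block as if the 4-byte CRC
 * field at crc_off contained zeroes.
 */
static uint32_t block_cksum(const void *buf, size_t len, size_t crc_off)
{
	static const uint8_t zero[4];
	const uint8_t *p = buf;
	uint32_t crc = ~0u;

	assert(crc_off + sizeof(zero) <= len);
	crc = crc32c_update(crc, p, crc_off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, p + crc_off + 4, len - crc_off - 4);
	return ~crc;
}
```

Because the CRC field is skipped, the same function can be used both to stamp a block before write and to check it after read.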
Self describing metadata needs to contain enough information so that the
metadata block can be verified as being in the correct place without needing to
look at any other metadata. This means it needs to contain location information.
Just adding a block number to the metadata is not sufficient to protect against
mis-directed writes - a write might be misdirected to the wrong LUN and so be
written to the "correct block" of the wrong filesystem. Hence location
information must contain a filesystem identifier as well as a block number.
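The following sketch shows why the location stamp needs both parts. The struct and function names are hypothetical and only illustrate the idea: a block written to the "correct" block number of the wrong filesystem still fails the check because the filesystem identifier differs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical location stamp; field names are illustrative only. */
struct loc_stamp {
	uint8_t  fs_uuid[16];	/* which filesystem the block belongs to */
	uint64_t blkno;		/* where in that filesystem it was written */
};

/* True only if the block sits at the expected address of the expected fs. */
static bool loc_matches(const struct loc_stamp *s,
			const uint8_t fs_uuid[16], uint64_t blkno)
{
	return memcmp(s->fs_uuid, fs_uuid, 16) == 0 && s->blkno == blkno;
}
```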
Another key information point in forensic analysis is knowing who the metadata
block belongs to. We already know the type, the location, that it is valid
and/or corrupted, and how long ago that it was last modified. Knowing the owner
of the block is important as it allows us to find other related metadata to
determine the scope of the corruption. For example, if we have an extent btree
object, we don't know what inode it belongs to and hence have to walk the entire
filesystem to find the owner of the block. Worse, the corruption could mean that
no owner can be found (i.e. it's an orphan block), and so without an owner field
in the metadata we have no idea of the scope of the corruption. If we have an
owner field in the metadata object, we can immediately do top down validation to
determine the scope of the problem.
Different types of metadata have different owner identifiers. For example,
directory, attribute and extent tree blocks are all owned by an inode, whilst
freespace btree blocks are owned by an allocation group. Hence the size and
contents of the owner field are determined by the type of metadata object we are
looking at. The owner information can also identify misplaced writes (e.g.
freespace btree block written to the wrong AG).
Self describing metadata also needs to contain some indication of when it was
written to the filesystem. One of the key information points when doing forensic
analysis is how recently the block was modified. Correlation of a set of corrupted
metadata blocks based on modification times is important as it can indicate
whether the corruptions are related, whether there have been multiple corruption
events that led to the eventual failure, and even whether there are corruptions
present that the run-time verification is not detecting.
For example, we can determine whether a metadata object is supposed to be free
space or still allocated if it is still referenced by its owner by looking at
when the free space btree block that contains the block was last written
compared to when the metadata object itself was last written. If the free space
block is more recent than the object and the object's owner, then there is a
very good chance that the block should have been removed from the owner.
To provide this "written timestamp", each metadata block gets the Log Sequence
Number (LSN) of the most recent transaction it was modified on written into it.
This number will always increase over the life of the filesystem, and the only
thing that resets it is running xfs_repair on the filesystem. Further, by use of
the LSN we can tell if the corrupted metadata all belonged to the same log
checkpoint and hence have some idea of how much modification occurred between
the first and last instance of corrupt metadata on disk and, further, how much
modification occurred between the corruption being written and when it was
detected.
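To make the ordering argument concrete, here is a small sketch of comparing two LSNs. It assumes an LSN is a 64 bit value with the log cycle in the upper 32 bits and the block offset within the log in the lower 32 bits; the function names are hypothetical and are not taken from the XFS sources.

```c
#include <stdint.h>

/* Assumed layout: log cycle in the high 32 bits, log block in the low 32. */
static uint32_t lsn_cycle(uint64_t lsn) { return (uint32_t)(lsn >> 32); }
static uint32_t lsn_block(uint64_t lsn) { return (uint32_t)lsn; }

/* Returns <0 if a is older than b, 0 if equal, >0 if a is newer. */
static int lsn_cmp(uint64_t a, uint64_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Hypothetical forensic helper: if the free space btree block was written
 * after the object it covers, the object was probably freed after its own
 * last write.
 */
static int freed_after_last_write(uint64_t freesp_lsn, uint64_t object_lsn)
{
	return lsn_cmp(freesp_lsn, object_lsn) > 0;
}
```

Sorting a set of corrupted blocks with lsn_cmp() gives the order in which they were last written, which is the correlation the text above relies on.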
Runtime Validation
------------------
Validation of self-describing metadata takes place at runtime in two places:
- immediately after a successful read from disk
- immediately prior to write IO submission
The verification is completely stateless - it is done independently of the
modification process, and seeks only to check that the metadata is what it says
it is and that the metadata fields are within bounds and internally consistent.
As such, we cannot catch all types of corruption that can occur within a block
as there may be certain limitations that operational state enforces on the
metadata, or there may be corruption of interblock relationships (e.g. corrupted
sibling pointer lists). Hence we still need stateful checking in the main code
body, but in general most of the per-field validation is handled by the
verifiers.
For read verification, the caller needs to specify the expected type of metadata
that it should see, and the IO completion process verifies that the metadata
object matches what was expected. If the verification process fails, then it
marks the object being read as EFSCORRUPTED. The caller needs to catch this
error (same as for IO errors), and if it needs to take special action due to a
verification error it can do so by catching the EFSCORRUPTED error value. If we
need more discrimination of error type at higher levels, we can define new
error numbers for different errors as necessary.
The first step in read verification is checking the magic number and determining
whether CRC validating is necessary. If it is, the CRC32c is calculated and
compared against the value stored in the object itself. Once this is validated,
further checks are made against the location information, followed by extensive
object specific metadata validation. If any of these checks fail, then the
buffer is considered corrupt and the EFSCORRUPTED error is set appropriately.
Write verification is the opposite of the read verification - first the object
is extensively verified and if it is OK we then update the LSN from the last
modification made to the object. After this, we calculate the CRC and insert it
into the object. Once this is done the write IO is allowed to continue. If any
error occurs during this process, the buffer is again marked with an EFSCORRUPTED
error for the higher layers to catch.
Structures
----------
A typical on-disk structure needs to contain the following information:
	struct xfs_ondisk_hdr {
		__be32	magic;		/* magic number */
		__be32	crc;		/* CRC, not logged */
		uuid_t	uuid;		/* filesystem identifier */
		__be64	owner;		/* parent object */
		__be64	blkno;		/* location on disk */
		__be64	lsn;		/* last modification in log, not logged */
	};
Depending on the metadata, this information may be part of a header structure
separate to the metadata contents, or may be distributed through an existing
structure. The latter occurs with metadata that already contains some of this
information, such as the superblock and AG headers.
Other metadata may have different formats for the information, but the same
level of information is generally provided. For example:
	- short btree blocks have a 32 bit owner (ag number) and a 32 bit block
	  number for location. The two of these combined provide the same
	  information as @owner and @blkno in the above structure, but using 8
	  bytes less space on disk.
	- directory/attribute node blocks have a 16 bit magic number, and the
	  header that contains the magic number has other information in it as
	  well. Hence the additional metadata headers change the overall format
	  of the metadata.
A typical buffer read verifier is structured as follows:
	#define XFS_FOO_CRC_OFF		offsetof(struct xfs_ondisk_hdr, crc)

	static void
	xfs_foo_read_verify(
		struct xfs_buf	*bp)
	{
		struct xfs_mount *mp = bp->b_target->bt_mount;

		if ((xfs_sb_version_hascrc(&mp->m_sb) &&
		     !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
				       XFS_FOO_CRC_OFF)) ||
		    !xfs_foo_verify(bp)) {
			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
			xfs_buf_ioerror(bp, EFSCORRUPTED);
		}
	}
The code ensures that the CRC is only checked if the filesystem has CRCs enabled
by checking the superblock for the feature bit, and then if the CRC verifies OK
(or is not needed) it verifies the actual contents of the block.
The verifier function will take a couple of different forms, depending on
whether the magic number can be used to determine the format of the block. In
the case it can't, the code is structured as follows:
	static bool
	xfs_foo_verify(
		struct xfs_buf	*bp)
	{
		struct xfs_mount	*mp = bp->b_target->bt_mount;
		struct xfs_ondisk_hdr	*hdr = bp->b_addr;

		if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
			return false;

		/* the self describing fields only exist on CRC enabled filesystems */
		if (xfs_sb_version_hascrc(&mp->m_sb)) {
			if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
				return false;
			if (bp->b_bn != be64_to_cpu(hdr->blkno))
				return false;
			if (hdr->owner == 0)
				return false;
		}

		/* object specific verification checks here */

		return true;
	}
If there are different magic numbers for the different formats, the verifier
will look like:
	static bool
	xfs_foo_verify(
		struct xfs_buf	*bp)
	{
		struct xfs_mount	*mp = bp->b_target->bt_mount;
		struct xfs_ondisk_hdr	*hdr = bp->b_addr;

		if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
			if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
				return false;
			if (bp->b_bn != be64_to_cpu(hdr->blkno))
				return false;
			if (hdr->owner == 0)
				return false;
		} else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
			return false;

		/* object specific verification checks here */

		return true;
	}
Write verifiers are very similar to the read verifiers; they just do things in
the opposite order to the read verifiers. A typical write verifier:
	static void
	xfs_foo_write_verify(
		struct xfs_buf	*bp)
	{
		struct xfs_mount	*mp = bp->b_target->bt_mount;
		struct xfs_buf_log_item	*bip = bp->b_fspriv;

		if (!xfs_foo_verify(bp)) {
			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
			xfs_buf_ioerror(bp, EFSCORRUPTED);
			return;
		}

		if (!xfs_sb_version_hascrc(&mp->m_sb))
			return;

		if (bip) {
			struct xfs_ondisk_hdr	*hdr = bp->b_addr;
			hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
		}
		xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
	}
This will verify the internal structure of the metadata before we go any
further, detecting corruptions that have occurred as the metadata has been
modified in memory. If the metadata verifies OK, and CRCs are enabled, we then
update the LSN field (when it was last modified) and calculate the CRC on the
metadata. Once this is done, we can issue the IO.
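The ordering of the two verifiers can be tried out in userspace with a toy block format. The sketch below is an assumption-laden stand-in, not XFS code: the toy_* names, the 64 byte block size, the host-endian fields and the bitwise CRC32c are all chosen for illustration. It only demonstrates the sequence described above: verify, stamp the LSN, then checksum on write; checksum, then verify on read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAGIC 0x544f5931u	/* hypothetical magic number */
#define TOY_BLKSZ 64		/* hypothetical block size */

struct toy_hdr {		/* host-endian for simplicity */
	uint32_t magic;
	uint32_t crc;
	uint64_t blkno;
	uint64_t owner;
	uint64_t lsn;
};

static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* CRC32c over the whole block with the crc field treated as zero. */
static uint32_t toy_cksum(const uint8_t *blk)
{
	static const uint8_t zero[4];
	const size_t off = offsetof(struct toy_hdr, crc);
	uint32_t crc = ~0u;

	crc = crc32c_update(crc, blk, off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, blk + off + 4, TOY_BLKSZ - off - 4);
	return ~crc;
}

/* Stateless structural check shared by the read and write paths. */
static bool toy_verify(const uint8_t *blk, uint64_t blkno)
{
	struct toy_hdr h;

	memcpy(&h, blk, sizeof(h));
	return h.magic == TOY_MAGIC && h.blkno == blkno && h.owner != 0;
}

/* Write side: verify first, then stamp the LSN, then checksum. */
static bool toy_write_verify(uint8_t *blk, uint64_t blkno, uint64_t lsn)
{
	uint32_t crc;

	if (!toy_verify(blk, blkno))
		return false;
	memcpy(blk + offsetof(struct toy_hdr, lsn), &lsn, sizeof(lsn));
	crc = toy_cksum(blk);
	memcpy(blk + offsetof(struct toy_hdr, crc), &crc, sizeof(crc));
	return true;
}

/* Read side: checksum first, then the structural check. */
static bool toy_read_verify(const uint8_t *blk, uint64_t blkno)
{
	uint32_t stored;

	memcpy(&stored, blk + offsetof(struct toy_hdr, crc), sizeof(stored));
	return stored == toy_cksum(blk) && toy_verify(blk, blkno);
}
```

A block that passes toy_write_verify() reads back cleanly at the same block number, is rejected at any other block number, and is rejected once a single bit in the payload is flipped.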
Inodes and Dquots
-----------------
Inodes and dquots are special snowflakes. They have per-object CRC and
self-identifiers, but they are packed so that there are multiple objects per
buffer. Hence we do not use per-buffer verifiers to do the work of per-object
verification and CRC calculations. The per-buffer verifiers simply perform basic
identification of the buffer - that they contain inodes or dquots, and that
there are magic numbers in all the expected spots. All further CRC and
verification checks are done when each inode is read from or written back to the
buffer.
The structure of the verifiers and the identifiers checks is very similar to the
buffer code described above. The only difference is where they are called. For
example, inode read verification is done in xfs_iread() when the inode is first
read out of the buffer and the struct xfs_inode is instantiated. The inode is
already extensively verified during writeback in xfs_iflush_int, so the only
addition here is to add the LSN and CRC to the inode as it is copied back into
the buffer.
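The split between per-buffer identification and per-object verification can be sketched as follows. This is a simplified model under stated assumptions: the object layout, sizes and function names are hypothetical, and real inodes and dquots carry far more state. The point is only that the buffer check looks at magic numbers, while the CRC is checked object by object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_MAGIC	0x4f424a31u	/* hypothetical per-object magic */
#define OBJ_SIZE	32		/* hypothetical: magic, crc, 24 data bytes */
#define OBJS_PER_BUF	4

static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* CRC32c of one object with its crc field (bytes 4..7) treated as zero. */
static uint32_t obj_cksum(const uint8_t *o)
{
	static const uint8_t zero[4];
	uint32_t crc = ~0u;

	crc = crc32c_update(crc, o, 4);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, o + 8, OBJ_SIZE - 8);
	return ~crc;
}

/* Buffer-level verifier: identification only, a magic in every slot. */
static bool buf_verify(const uint8_t *buf)
{
	for (int i = 0; i < OBJS_PER_BUF; i++) {
		uint32_t magic;

		memcpy(&magic, buf + i * OBJ_SIZE, sizeof(magic));
		if (magic != OBJ_MAGIC)
			return false;
	}
	return true;
}

/* Per-object write: stamp the CRC as object i is copied into the buffer. */
static void obj_write(uint8_t *buf, int i)
{
	uint8_t *o = buf + i * OBJ_SIZE;
	uint32_t crc = obj_cksum(o);

	memcpy(o + 4, &crc, sizeof(crc));
}

/* Per-object read: check the CRC as object i is read out of the buffer. */
static bool obj_read_verify(const uint8_t *buf, int i)
{
	const uint8_t *o = buf + i * OBJ_SIZE;
	uint32_t stored;

	memcpy(&stored, o + 4, sizeof(stored));
	return stored == obj_cksum(o);
}
```

With this layout, damage confined to one object's data leaves the buffer-level check passing and the neighbouring objects readable; only the damaged object fails its own CRC check.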
XXX: inode unlinked list modification doesn't recalculate the inode CRC! None of
the unlinked list modifications check or update CRCs, neither during unlink nor
log recovery. So, it's gone unnoticed until now. This won't matter immediately -
repair will probably complain about it - but it needs to be fixed.
......@@ -45,11 +45,11 @@ xfs-y += xfs_aops.o \
xfs_itable.o \
xfs_message.o \
xfs_mru_cache.o \
xfs_super.o \
xfs_xattr.o \
xfs_rename.o \
xfs_super.o \
xfs_utils.o \
xfs_vnodeops.o \
xfs_xattr.o \
kmem.o \
uuid.o
......@@ -58,6 +58,7 @@ xfs-y += xfs_alloc.o \
xfs_alloc_btree.o \
xfs_attr.o \
xfs_attr_leaf.o \
xfs_attr_remote.o \
xfs_bmap.o \
xfs_bmap_btree.o \
xfs_btree.o \
......@@ -73,6 +74,7 @@ xfs-y += xfs_alloc.o \
xfs_inode.o \
xfs_log_recover.o \
xfs_mount.o \
xfs_symlink.o \
xfs_trans.o
# low-level transaction/log code
......
......@@ -30,6 +30,7 @@ struct xfs_trans;
#define XFS_AGF_MAGIC 0x58414746 /* 'XAGF' */
#define XFS_AGI_MAGIC 0x58414749 /* 'XAGI' */
#define XFS_AGFL_MAGIC 0x5841464c /* 'XAFL' */
#define XFS_AGF_VERSION 1
#define XFS_AGI_VERSION 1
......@@ -63,12 +64,29 @@ typedef struct xfs_agf {
__be32 agf_spare0; /* spare field */
__be32 agf_levels[XFS_BTNUM_AGF]; /* btree levels */
__be32 agf_spare1; /* spare field */
__be32 agf_flfirst; /* first freelist block's index */
__be32 agf_fllast; /* last freelist block's index */
__be32 agf_flcount; /* count of blocks in freelist */
__be32 agf_freeblks; /* total free blocks */
__be32 agf_longest; /* longest free space */
__be32 agf_btreeblks; /* # of blocks held in AGF btrees */
uuid_t agf_uuid; /* uuid of filesystem */
/*
* reserve some contiguous space for future logged fields before we add
* the unlogged fields. This makes the range logging via flags and
* structure offsets much simpler.
*/
__be64 agf_spare64[16];
/* unlogged fields, written during buffer writeback. */
__be64 agf_lsn; /* last write sequence */
__be32 agf_crc; /* crc of agf sector */
__be32 agf_spare2;
/* structure must be padded to 64 bit alignment */
} xfs_agf_t;
#define XFS_AGF_MAGICNUM 0x00000001
......@@ -83,7 +101,8 @@ typedef struct xfs_agf {
#define XFS_AGF_FREEBLKS 0x00000200
#define XFS_AGF_LONGEST 0x00000400
#define XFS_AGF_BTREEBLKS 0x00000800
#define XFS_AGF_NUM_BITS 12
#define XFS_AGF_UUID 0x00001000
#define XFS_AGF_NUM_BITS 13
#define XFS_AGF_ALL_BITS ((1 << XFS_AGF_NUM_BITS) - 1)
#define XFS_AGF_FLAGS \
......@@ -98,7 +117,8 @@ typedef struct xfs_agf {
{ XFS_AGF_FLCOUNT, "FLCOUNT" }, \
{ XFS_AGF_FREEBLKS, "FREEBLKS" }, \
{ XFS_AGF_LONGEST, "LONGEST" }, \
{ XFS_AGF_BTREEBLKS, "BTREEBLKS" }
{ XFS_AGF_BTREEBLKS, "BTREEBLKS" }, \
{ XFS_AGF_UUID, "UUID" }
/* disk block (xfs_daddr_t) in the AG */
#define XFS_AGF_DADDR(mp) ((xfs_daddr_t)(1 << (mp)->m_sectbb_log))
......@@ -132,6 +152,7 @@ typedef struct xfs_agi {
__be32 agi_root; /* root of inode btree */
__be32 agi_level; /* levels in inode btree */
__be32 agi_freecount; /* number of free inodes */
__be32 agi_newino; /* new inode just allocated */
__be32 agi_dirino; /* last directory inode chunk */
/*
......@@ -139,6 +160,13 @@ typedef struct xfs_agi {
* still being referenced.
*/
__be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
uuid_t agi_uuid; /* uuid of filesystem */
__be32 agi_crc; /* crc of agi sector */
__be32 agi_pad32;
__be64 agi_lsn; /* last write sequence */
/* structure must be padded to 64 bit alignment */
} xfs_agi_t;
#define XFS_AGI_MAGICNUM 0x00000001
......@@ -171,11 +199,31 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
*/
#define XFS_AGFL_DADDR(mp) ((xfs_daddr_t)(3 << (mp)->m_sectbb_log))
#define XFS_AGFL_BLOCK(mp) XFS_HDR_BLOCK(mp, XFS_AGFL_DADDR(mp))
#define XFS_AGFL_SIZE(mp) ((mp)->m_sb.sb_sectsize / sizeof(xfs_agblock_t))
#define XFS_BUF_TO_AGFL(bp) ((xfs_agfl_t *)((bp)->b_addr))
#define XFS_BUF_TO_AGFL_BNO(mp, bp) \
(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
&(XFS_BUF_TO_AGFL(bp)->agfl_bno[0]) : \
(__be32 *)(bp)->b_addr)
/*
* Size of the AGFL. For CRC-enabled filesystems we steal a couple of
* slots in the beginning of the block for a proper header with the
* location information and CRC.
*/
#define XFS_AGFL_SIZE(mp) \
(((mp)->m_sb.sb_sectsize - \
(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
sizeof(struct xfs_agfl) : 0)) / \
sizeof(xfs_agblock_t))
typedef struct xfs_agfl {
__be32 agfl_bno[1]; /* actually XFS_AGFL_SIZE(mp) */
__be32 agfl_magicnum;
__be32 agfl_seqno;
uuid_t agfl_uuid;
__be64 agfl_lsn;
__be32 agfl_crc;
__be32 agfl_bno[]; /* actually XFS_AGFL_SIZE(mp) */
} xfs_agfl_t;
/*
......
......@@ -33,7 +33,9 @@
#include "xfs_alloc.h"
#include "xfs_extent_busy.h"
#include "xfs_error.h"
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_buf_item.h"
struct workqueue_struct *xfs_alloc_wq;
......@@ -430,53 +432,84 @@ xfs_alloc_fixup_trees(
return 0;
}
static void
static bool
xfs_agfl_verify(
struct xfs_buf *bp)
{
#ifdef WHEN_CRCS_COME_ALONG
/*
* we cannot actually do any verification of the AGFL because mkfs does
* not initialise the AGFL to zero or NULL. Hence the only valid part of
* the AGFL is what the AGF says is active. We can't get to the AGF, so
* we can't verify just those entries are valid.
*
* This problem goes away when the CRC format change comes along as that
* requires the AGFL to be initialised by mkfs. At that point, we can
* verify the blocks in the agfl -active or not- lie within the bounds
* of the AG. Until then, just leave this check ifdef'd out.
*/
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_agfl *agfl = XFS_BUF_TO_AGFL(bp);
int agfl_ok = 1;
int i;
if (!uuid_equal(&agfl->agfl_uuid, &mp->m_sb.sb_uuid))
return false;
if (be32_to_cpu(agfl->agfl_magicnum) != XFS_AGFL_MAGIC)
return false;
/*
* during growfs operations, the perag is not fully initialised,
* so we can't use it for any useful checking. growfs ensures we can't
* use it by using uncached buffers that don't have the perag attached
* so we can detect and avoid this problem.
*/
if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != bp->b_pag->pag_agno)
return false;
for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
if (be32_to_cpu(agfl->agfl_bno[i]) == NULLAGBLOCK ||
if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
agfl_ok = 0;
return false;
}
return true;
}
static void
xfs_agfl_read_verify(
struct xfs_buf *bp)
{
struct xfs_mount *mp = bp->b_target->bt_mount;
int agfl_ok = 1;
/*
* There is no verification of non-crc AGFLs because mkfs does not
* initialise the AGFL to zero or NULL. Hence the only valid part of the
* AGFL is what the AGF says is active. We can't get to the AGF, so we
* can't verify just those entries are valid.
*/
if (!xfs_sb_version_hascrc(&mp->m_sb))
return;
agfl_ok = xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
offsetof(struct xfs_agfl, agfl_crc));
agfl_ok = agfl_ok && xfs_agfl_verify(bp);
if (!agfl_ok) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, agfl);
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
#endif
}
static void
xfs_agfl_write_verify(
struct xfs_buf *bp)
{
xfs_agfl_verify(bp);
}
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_buf_log_item *bip = bp->b_fspriv;
static void
xfs_agfl_read_verify(
struct xfs_buf *bp)
{
xfs_agfl_verify(bp);
/* no verification of non-crc AGFLs */
if (!xfs_sb_version_hascrc(&mp->m_sb))
return;
if (!xfs_agfl_verify(bp)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
return;
}
if (bip)
XFS_BUF_TO_AGFL(bp)->agfl_lsn = cpu_to_be64(bip->bli_item.li_lsn);
xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
offsetof(struct xfs_agfl, agfl_crc));
}
const struct xfs_buf_ops xfs_agfl_buf_ops = {
......@@ -842,7 +875,7 @@ xfs_alloc_ag_vextent_near(
*/
int dofirst; /* set to do first algorithm */
dofirst = random32() & 1;
dofirst = prandom_u32() & 1;
#endif
restart:
......@@ -1982,18 +2015,18 @@ xfs_alloc_get_freelist(
int btreeblk) /* destination is a AGF btree */
{
xfs_agf_t *agf; /* a.g. freespace structure */
xfs_agfl_t *agfl; /* a.g. freelist structure */
xfs_buf_t *agflbp;/* buffer for a.g. freelist structure */
xfs_agblock_t bno; /* block number returned */
__be32 *agfl_bno;
int error;
int logflags;
xfs_mount_t *mp; /* mount structure */
xfs_mount_t *mp = tp->t_mountp;
xfs_perag_t *pag; /* per allocation group data */
agf = XFS_BUF_TO_AGF(agbp);
/*
* Freelist is empty, give up.
*/
agf = XFS_BUF_TO_AGF(agbp);
if (!agf->agf_flcount) {
*bnop = NULLAGBLOCK;
return 0;
......@@ -2001,15 +2034,17 @@ xfs_alloc_get_freelist(
/*
* Read the array of free blocks.
*/
mp = tp->t_mountp;
if ((error = xfs_alloc_read_agfl(mp, tp,
be32_to_cpu(agf->agf_seqno), &agflbp)))
error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
&agflbp);
if (error)
return error;
agfl = XFS_BUF_TO_AGFL(agflbp);
/*
* Get the block number and update the data structures.
*/
bno = be32_to_cpu(agfl->agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
be32_add_cpu(&agf->agf_flfirst, 1);
xfs_trans_brelse(tp, agflbp);
if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
......@@ -2058,11 +2093,14 @@ xfs_alloc_log_agf(
offsetof(xfs_agf_t, agf_freeblks),
offsetof(xfs_agf_t, agf_longest),
offsetof(xfs_agf_t, agf_btreeblks),
offsetof(xfs_agf_t, agf_uuid),
sizeof(xfs_agf_t)
};
trace_xfs_agf(tp->t_mountp, XFS_BUF_TO_AGF(bp), fields, _RET_IP_);
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_AGF_BUF);
xfs_btree_offsets(fields, offsets, XFS_AGF_NUM_BITS, &first, &last);
xfs_trans_log_buf(tp, bp, (uint)first, (uint)last);
}
......@@ -2099,12 +2137,13 @@ xfs_alloc_put_freelist(
int btreeblk) /* block came from a AGF btree */
{
xfs_agf_t *agf; /* a.g. freespace structure */
xfs_agfl_t *agfl; /* a.g. free block array */
__be32 *blockp;/* pointer to array entry */
int error;
int logflags;
xfs_mount_t *mp; /* mount structure */
xfs_perag_t *pag; /* per allocation group data */
__be32 *agfl_bno;
int startoff;
agf = XFS_BUF_TO_AGF(agbp);
mp = tp->t_mountp;
......@@ -2112,7 +2151,6 @@ xfs_alloc_put_freelist(
if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
be32_to_cpu(agf->agf_seqno), &agflbp)))
return error;
agfl = XFS_BUF_TO_AGFL(agflbp);
be32_add_cpu(&agf->agf_fllast, 1);
if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
agf->agf_fllast = 0;
......@@ -2133,32 +2171,38 @@ xfs_alloc_put_freelist(
xfs_alloc_log_agf(tp, agbp, logflags);
ASSERT(be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp));
blockp = &agfl->agfl_bno[be32_to_cpu(agf->agf_fllast)];
agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
blockp = &agfl_bno[be32_to_cpu(agf->agf_fllast)];
*blockp = cpu_to_be32(bno);
startoff = (char *)blockp - (char *)agflbp->b_addr;
xfs_alloc_log_agf(tp, agbp, logflags);
xfs_trans_log_buf(tp, agflbp,
(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl),
(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl +
sizeof(xfs_agblock_t) - 1));
xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
xfs_trans_log_buf(tp, agflbp, startoff,
startoff + sizeof(xfs_agblock_t) - 1);
return 0;
}
static void
static bool
xfs_agf_verify(
struct xfs_mount *mp,
struct xfs_buf *bp)
{
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_agf *agf;
int agf_ok;
struct xfs_agf *agf = XFS_BUF_TO_AGF(bp);
agf = XFS_BUF_TO_AGF(bp);
if (xfs_sb_version_hascrc(&mp->m_sb) &&
!uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_uuid))
return false;
agf_ok = agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) &&
be32_to_cpu(agf->agf_flfirst) < XFS_AGFL_SIZE(mp) &&
be32_to_cpu(agf->agf_fllast) < XFS_AGFL_SIZE(mp) &&
be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp);
if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) &&
be32_to_cpu(agf->agf_flfirst) < XFS_AGFL_SIZE(mp) &&
be32_to_cpu(agf->agf_fllast) < XFS_AGFL_SIZE(mp) &&
be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp)))
return false;
/*
* during growfs operations, the perag is not fully initialised,
......@@ -2166,33 +2210,58 @@ xfs_agf_verify(
* use it by using uncached buffers that don't have the perag attached
* so we can detect and avoid this problem.
*/
if (bp->b_pag)
agf_ok = agf_ok && be32_to_cpu(agf->agf_seqno) ==
bp->b_pag->pag_agno;
if (bp->b_pag && be32_to_cpu(agf->agf_seqno) != bp->b_pag->pag_agno)
return false;
if (xfs_sb_version_haslazysbcount(&mp->m_sb))
agf_ok = agf_ok && be32_to_cpu(agf->agf_btreeblks) <=
be32_to_cpu(agf->agf_length);
if (xfs_sb_version_haslazysbcount(&mp->m_sb) &&
be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
return false;
return true;
if (unlikely(XFS_TEST_ERROR(!agf_ok, mp, XFS_ERRTAG_ALLOC_READ_AGF,
XFS_RANDOM_ALLOC_READ_AGF))) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, agf);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
}
static void
xfs_agf_read_verify(
struct xfs_buf *bp)
{
xfs_agf_verify(bp);
struct xfs_mount *mp = bp->b_target->bt_mount;
int agf_ok = 1;
if (xfs_sb_version_hascrc(&mp->m_sb))
agf_ok = xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
offsetof(struct xfs_agf, agf_crc));
agf_ok = agf_ok && xfs_agf_verify(mp, bp);
if (unlikely(XFS_TEST_ERROR(!agf_ok, mp, XFS_ERRTAG_ALLOC_READ_AGF,
XFS_RANDOM_ALLOC_READ_AGF))) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
}
static void
xfs_agf_write_verify(
struct xfs_buf *bp)
{
xfs_agf_verify(bp);
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_buf_log_item *bip = bp->b_fspriv;
if (!xfs_agf_verify(mp, bp)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
return;
}
if (!xfs_sb_version_hascrc(&mp->m_sb))
return;
if (bip)
XFS_BUF_TO_AGF(bp)->agf_lsn = cpu_to_be64(bip->bli_item.li_lsn);
xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length),
offsetof(struct xfs_agf, agf_crc));
}
const struct xfs_buf_ops xfs_agf_buf_ops = {
......
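The AGF read and write verifiers above pass a buffer, its length, and the offset of the CRC field to xfs_verify_cksum() and xfs_update_cksum(). The following is a minimal userspace sketch of that pattern, not the kernel implementation. It assumes CRC32c with an all-ones seed, a checksum computed as if the 4-byte CRC field were zero, and little-endian storage of the result. The names crc32c_update, block_cksum, demo_update_cksum and demo_verify_cksum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum a block as if the 4-byte CRC field at crc_off held zeroes. */
static uint32_t block_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	static const uint8_t zero[4];
	uint32_t crc = 0xFFFFFFFFu;	/* assumed seed */

	assert(crc_off + sizeof(zero) <= len);
	crc = crc32c_update(crc, buf, crc_off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, buf + crc_off + sizeof(zero),
			    len - crc_off - sizeof(zero));
	return ~crc;
}

/* Hypothetical helpers: store and load a 32-bit value as little-endian. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write path: stamp the checksum into the block just before I/O. */
static void demo_update_cksum(uint8_t *buf, size_t len, size_t crc_off)
{
	put_le32(buf + crc_off, block_cksum(buf, len, crc_off));
}

/* Read path: recompute and compare against the stored checksum. */
static bool demo_verify_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	return get_le32(buf + crc_off) == block_cksum(buf, len, crc_off);
}
```

Because the checksum skips over its own field, the write verifier can change other header fields (such as the LSN in xfs_agf_write_verify) and then recompute the CRC as the last step.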
......@@ -33,6 +33,7 @@
#include "xfs_extent_busy.h"
#include "xfs_error.h"
#include "xfs_trace.h"
#include "xfs_cksum.h"
STATIC struct xfs_btree_cur *
......@@ -272,7 +273,7 @@ xfs_allocbt_key_diff(
return (__int64_t)be32_to_cpu(kp->ar_startblock) - rec->ar_startblock;
}
static void
static bool
xfs_allocbt_verify(
struct xfs_buf *bp)
{
......@@ -280,66 +281,103 @@ xfs_allocbt_verify(
struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
struct xfs_perag *pag = bp->b_pag;
unsigned int level;
int sblock_ok; /* block passes checks */
/*
* magic number and level verification
*
* During growfs operations, we can't verify the exact level as the
* perag is not fully initialised and hence not attached to the buffer.
* In this case, check against the maximum tree depth.
* During growfs operations, we can't verify the exact level or owner as
* the perag is not fully initialised and hence not attached to the
* buffer. In this case, check against the maximum tree depth.
*
* Similarly, during log recovery we will have a perag structure
* attached, but the agf information will not yet have been initialised
* from the on disk AGF. Again, we can only check against maximum limits
* in this case.
*/
level = be16_to_cpu(block->bb_level);
switch (block->bb_magic) {
case cpu_to_be32(XFS_ABTB_CRC_MAGIC):
if (!xfs_sb_version_hascrc(&mp->m_sb))
return false;
if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
return false;
if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
return false;
if (pag &&
be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
return false;
/* fall through */
case cpu_to_be32(XFS_ABTB_MAGIC):
if (pag)
sblock_ok = level < pag->pagf_levels[XFS_BTNUM_BNOi];
else
sblock_ok = level < mp->m_ag_maxlevels;
if (pag && pag->pagf_init) {
if (level >= pag->pagf_levels[XFS_BTNUM_BNOi])
return false;
} else if (level >= mp->m_ag_maxlevels)
return false;
break;
case cpu_to_be32(XFS_ABTC_CRC_MAGIC):
if (!xfs_sb_version_hascrc(&mp->m_sb))
return false;
if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
return false;
if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
return false;
if (pag &&
be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
return false;
/* fall through */
case cpu_to_be32(XFS_ABTC_MAGIC):
if (pag)
sblock_ok = level < pag->pagf_levels[XFS_BTNUM_CNTi];
else
sblock_ok = level < mp->m_ag_maxlevels;
if (pag && pag->pagf_init) {
if (level >= pag->pagf_levels[XFS_BTNUM_CNTi])
return false;
} else if (level >= mp->m_ag_maxlevels)
return false;
break;
default:
sblock_ok = 0;
break;
return false;
}
/* numrecs verification */
sblock_ok = sblock_ok &&
be16_to_cpu(block->bb_numrecs) <= mp->m_alloc_mxr[level != 0];
if (be16_to_cpu(block->bb_numrecs) > mp->m_alloc_mxr[level != 0])
return false;
/* sibling pointer verification */
sblock_ok = sblock_ok &&
(block->bb_u.s.bb_leftsib == cpu_to_be32(NULLAGBLOCK) ||
be32_to_cpu(block->bb_u.s.bb_leftsib) < mp->m_sb.sb_agblocks) &&
block->bb_u.s.bb_leftsib &&
(block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK) ||
be32_to_cpu(block->bb_u.s.bb_rightsib) < mp->m_sb.sb_agblocks) &&
block->bb_u.s.bb_rightsib;
if (!sblock_ok) {
trace_xfs_btree_corrupt(bp, _RET_IP_);
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, block);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
if (!block->bb_u.s.bb_leftsib ||
(be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
return false;
if (!block->bb_u.s.bb_rightsib ||
(be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
return false;
return true;
}
static void
xfs_allocbt_read_verify(
struct xfs_buf *bp)
{
xfs_allocbt_verify(bp);
if (!(xfs_btree_sblock_verify_crc(bp) &&
xfs_allocbt_verify(bp))) {
trace_xfs_btree_corrupt(bp, _RET_IP_);
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
bp->b_target->bt_mount, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
}
static void
xfs_allocbt_write_verify(
struct xfs_buf *bp)
{
xfs_allocbt_verify(bp);
if (!xfs_allocbt_verify(bp)) {
trace_xfs_btree_corrupt(bp, _RET_IP_);
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW,
bp->b_target->bt_mount, bp->b_addr);
xfs_buf_ioerror(bp, EFSCORRUPTED);
}
xfs_btree_sblock_calc_crc(bp);
}
const struct xfs_buf_ops xfs_allocbt_buf_ops = {
......@@ -444,6 +482,9 @@ xfs_allocbt_init_cursor(
cur->bc_private.a.agbp = agbp;
cur->bc_private.a.agno = agno;
if (xfs_sb_version_hascrc(&mp->m_sb))
cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
return cur;
}
......
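The level and sibling pointer checks in xfs_allocbt_verify() can be restated as two small predicates. This is an illustrative sketch on host-endian values (the kernel code converts from big-endian first). DEMO_NULLAGBLOCK is a stand-in for NULLAGBLOCK and is assumed to be the all-ones 32-bit value. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLAGBLOCK 0xFFFFFFFFu	/* assumed "no sibling" marker */

/*
 * A sibling pointer is acceptable if it is non-zero and is either the
 * "no sibling" marker or a block number inside the allocation group.
 */
static bool demo_sibling_ok(uint32_t sib, uint32_t agblocks)
{
	if (sib == 0)
		return false;
	if (sib == DEMO_NULLAGBLOCK)
		return true;
	return sib < agblocks;
}

/*
 * Check the level against the real tree height only when the per-AG
 * information has been initialised; otherwise fall back to the maximum
 * possible tree depth (growfs and log recovery cases).
 */
static bool demo_level_ok(unsigned int level, bool pagf_init,
			  unsigned int tree_levels, unsigned int max_levels)
{
	if (pagf_init)
		return level < tree_levels;
	return level < max_levels;
}
```

Returning early on the first failed check, as the new verifier does, makes each condition readable on its own, unlike the old chained sblock_ok expression.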
......@@ -31,8 +31,10 @@ struct xfs_mount;
* by blockcount and blockno. All blocks look the same to make the code
* simpler; if we have time later, we'll make the optimizations.
*/
#define XFS_ABTB_MAGIC 0x41425442 /* 'ABTB' for bno tree */
#define XFS_ABTC_MAGIC 0x41425443 /* 'ABTC' for cnt tree */
#define XFS_ABTB_MAGIC 0x41425442 /* 'ABTB' for bno tree */
#define XFS_ABTB_CRC_MAGIC 0x41423342 /* 'AB3B' */
#define XFS_ABTC_MAGIC 0x41425443 /* 'ABTC' for cnt tree */
#define XFS_ABTC_CRC_MAGIC 0x41423343 /* 'AB3C' */
/*
* Data record/key structure
......@@ -59,10 +61,10 @@ typedef __be32 xfs_alloc_ptr_t;
/*
* Btree block header size depends on a superblock flag.
*
* (not quite yet, but soon)
*/
#define XFS_ALLOC_BLOCK_LEN(mp) XFS_BTREE_SBLOCK_LEN
#define XFS_ALLOC_BLOCK_LEN(mp) \
(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
XFS_BTREE_SBLOCK_CRC_LEN : XFS_BTREE_SBLOCK_LEN)
/*
* Record, key, and pointer address macros for btree blocks.
......
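To see what the new XFS_ALLOC_BLOCK_LEN(mp) definition means in practice, here is a rough sketch of how a larger header reduces the number of records that fit in a leaf block. The sizes are assumptions, not taken from this diff: a 16-byte short-form header, a 56-byte CRC-enabled header, and 8-byte allocation records. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
	DEMO_SBLOCK_LEN     = 16,	/* assumed short-form btree header size */
	DEMO_SBLOCK_CRC_LEN = 56,	/* assumed header size with CRC fields */
	DEMO_ALLOC_REC_LEN  = 8,	/* assumed size of one allocation record */
};

/* Header length depends on whether the filesystem has CRCs enabled. */
static size_t demo_alloc_block_len(bool has_crc)
{
	return has_crc ? DEMO_SBLOCK_CRC_LEN : DEMO_SBLOCK_LEN;
}

/* Number of leaf records that fit after the header in one block. */
static size_t demo_leaf_maxrecs(size_t blocksize, bool has_crc)
{
	return (blocksize - demo_alloc_block_len(has_crc)) / DEMO_ALLOC_REC_LEN;
}
```

Under these assumptions a 4096-byte block holds 510 leaf records without CRCs and 505 with them, which is why every size calculation has to go through the macro instead of a fixed constant.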
......@@ -953,13 +953,13 @@ xfs_vm_writepage(
unsigned offset_into_page = offset & (PAGE_CACHE_SIZE - 1);
/*
* Just skip the page if it is fully outside i_size, e.g. due
* to a truncate operation that is in progress.
* Skip the page if it is fully outside i_size, e.g. due to a
* truncate operation that is in progress. We must redirty the
* page so that reclaim stops reclaiming it. Otherwise
* xfs_vm_releasepage() is called on it and gets confused.
*/
if (page->index >= end_index + 1 || offset_into_page == 0) {
unlock_page(page);
return 0;
}
if (page->index >= end_index + 1 || offset_into_page == 0)
goto redirty;
/*
* The page straddles i_size. It must be zeroed out on each
......
This diff is collapsed.
......@@ -140,7 +140,6 @@ typedef struct xfs_attr_list_context {
* Overall external interface routines.
*/
int xfs_attr_inactive(struct xfs_inode *dp);
int xfs_attr_rmtval_get(struct xfs_da_args *args);
int xfs_attr_list_int(struct xfs_attr_list_context *);
#endif /* __XFS_ATTR_H__ */
This diff is collapsed.
/*
* Copyright (c) 2000,2002-2003,2005 Silicon Graphics, Inc.
* Copyright (c) 2013 Red Hat, Inc.
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
......@@ -89,7 +90,7 @@ typedef struct xfs_attr_leaf_hdr { /* constant-structure header block */
typedef struct xfs_attr_leaf_entry { /* sorted on key, not name */
__be32 hashval; /* hash value of name */
__be16 nameidx; /* index into buffer of name/value */
__be16 nameidx; /* index into buffer of name/value */
__u8 flags; /* LOCAL/ROOT/SECURE/INCOMPLETE flag */
__u8 pad2; /* unused pad byte */
} xfs_attr_leaf_entry_t;
......@@ -114,6 +115,54 @@ typedef struct xfs_attr_leafblock {
xfs_attr_leaf_name_remote_t valuelist; /* grows from bottom of buf */
} xfs_attr_leafblock_t;
/*
* CRC enabled leaf structures. Called "version 3" structures to match the
* version number of the directory and dablk structures for this feature, and
* attr2 is already taken by the variable inode attribute fork size feature.
*/
struct xfs_attr3_leaf_hdr {
struct xfs_da3_blkinfo info;
__be16 count;
__be16 usedbytes;
__be16 firstused;
__u8 holes;
__u8 pad1;
struct xfs_attr_leaf_map freemap[XFS_ATTR_LEAF_MAPSIZE];
};
#define XFS_ATTR3_LEAF_CRC_OFF (offsetof(struct xfs_attr3_leaf_hdr, info.crc))
struct xfs_attr3_leafblock {
struct xfs_attr3_leaf_hdr hdr;
struct xfs_attr_leaf_entry entries[1];
/*
* The rest of the block contains the following structures after the
* leaf entries, growing from the bottom up. The variables are never
* referenced, the locations accessed purely from helper functions.
*
* struct xfs_attr_leaf_name_local
* struct xfs_attr_leaf_name_remote
*/
};
/*
* incore, neutral version of the attribute leaf header
*/
struct xfs_attr3_icleaf_hdr {
__uint32_t forw;
__uint32_t back;
__uint16_t magic;
__uint16_t count;
__uint16_t usedbytes;
__uint16_t firstused;
__u8 holes;
struct {
__uint16_t base;
__uint16_t size;
} freemap[XFS_ATTR_LEAF_MAPSIZE];
};
/*
* Flags used in the leaf_entry[i].flags field.
* NOTE: the INCOMPLETE bit must not collide with the flags bits specified
......@@ -147,26 +196,43 @@ typedef struct xfs_attr_leafblock {
*/
#define XFS_ATTR_LEAF_NAME_ALIGN ((uint)sizeof(xfs_dablk_t))
static inline int
xfs_attr3_leaf_hdr_size(struct xfs_attr_leafblock *leafp)
{
if (leafp->hdr.info.magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC))
return sizeof(struct xfs_attr3_leaf_hdr);
return sizeof(struct xfs_attr_leaf_hdr);
}
static inline struct xfs_attr_leaf_entry *
xfs_attr3_leaf_entryp(xfs_attr_leafblock_t *leafp)
{
if (leafp->hdr.info.magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC))
return &((struct xfs_attr3_leafblock *)leafp)->entries[0];
return &leafp->entries[0];
}
/*
* Cast typed pointers for "local" and "remote" name/value structs.
*/
static inline xfs_attr_leaf_name_remote_t *
xfs_attr_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
static inline char *
xfs_attr3_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
{
return (xfs_attr_leaf_name_remote_t *)
&((char *)leafp)[be16_to_cpu(leafp->entries[idx].nameidx)];
struct xfs_attr_leaf_entry *entries = xfs_attr3_leaf_entryp(leafp);
return &((char *)leafp)[be16_to_cpu(entries[idx].nameidx)];
}
static inline xfs_attr_leaf_name_local_t *
xfs_attr_leaf_name_local(xfs_attr_leafblock_t *leafp, int idx)
static inline xfs_attr_leaf_name_remote_t *
xfs_attr3_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
{
return (xfs_attr_leaf_name_local_t *)
&((char *)leafp)[be16_to_cpu(leafp->entries[idx].nameidx)];
return (xfs_attr_leaf_name_remote_t *)xfs_attr3_leaf_name(leafp, idx);
}
static inline char *xfs_attr_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
static inline xfs_attr_leaf_name_local_t *
xfs_attr3_leaf_name_local(xfs_attr_leafblock_t *leafp, int idx)
{
return &((char *)leafp)[be16_to_cpu(leafp->entries[idx].nameidx)];
return (xfs_attr_leaf_name_local_t *)xfs_attr3_leaf_name(leafp, idx);
}
/*
......@@ -221,37 +287,37 @@ int xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes);
/*
* Internal routines when attribute fork size == XFS_LBSIZE(mp).
*/
int xfs_attr_leaf_to_node(struct xfs_da_args *args);
int xfs_attr_leaf_to_shortform(struct xfs_buf *bp,
int xfs_attr3_leaf_to_node(struct xfs_da_args *args);
int xfs_attr3_leaf_to_shortform(struct xfs_buf *bp,
struct xfs_da_args *args, int forkoff);
int xfs_attr_leaf_clearflag(struct xfs_da_args *args);
int xfs_attr_leaf_setflag(struct xfs_da_args *args);
int xfs_attr_leaf_flipflags(xfs_da_args_t *args);
int xfs_attr3_leaf_clearflag(struct xfs_da_args *args);
int xfs_attr3_leaf_setflag(struct xfs_da_args *args);
int xfs_attr3_leaf_flipflags(struct xfs_da_args *args);
/*
* Routines used for growing the Btree.
*/
int xfs_attr_leaf_split(struct xfs_da_state *state,
int xfs_attr3_leaf_split(struct xfs_da_state *state,
struct xfs_da_state_blk *oldblk,
struct xfs_da_state_blk *newblk);
int xfs_attr_leaf_lookup_int(struct xfs_buf *leaf,
int xfs_attr3_leaf_lookup_int(struct xfs_buf *leaf,
struct xfs_da_args *args);
int xfs_attr_leaf_getvalue(struct xfs_buf *bp, struct xfs_da_args *args);
int xfs_attr_leaf_add(struct xfs_buf *leaf_buffer,
int xfs_attr3_leaf_getvalue(struct xfs_buf *bp, struct xfs_da_args *args);
int xfs_attr3_leaf_add(struct xfs_buf *leaf_buffer,
struct xfs_da_args *args);
int xfs_attr_leaf_remove(struct xfs_buf *leaf_buffer,
int xfs_attr3_leaf_remove(struct xfs_buf *leaf_buffer,
struct xfs_da_args *args);
int xfs_attr_leaf_list_int(struct xfs_buf *bp,
int xfs_attr3_leaf_list_int(struct xfs_buf *bp,
struct xfs_attr_list_context *context);
/*
* Routines used for shrinking the Btree.
*/
int xfs_attr_leaf_toosmall(struct xfs_da_state *state, int *retval);
void xfs_attr_leaf_unbalance(struct xfs_da_state *state,
int xfs_attr3_leaf_toosmall(struct xfs_da_state *state, int *retval);
void xfs_attr3_leaf_unbalance(struct xfs_da_state *state,
struct xfs_da_state_blk *drop_blk,
struct xfs_da_state_blk *save_blk);
int xfs_attr_root_inactive(struct xfs_trans **trans, struct xfs_inode *dp);
int xfs_attr3_root_inactive(struct xfs_trans **trans, struct xfs_inode *dp);
/*
* Utility routines.
......@@ -261,10 +327,12 @@ int xfs_attr_leaf_order(struct xfs_buf *leaf1_bp,
struct xfs_buf *leaf2_bp);
int xfs_attr_leaf_newentsize(int namelen, int valuelen, int blocksize,
int *local);
int xfs_attr_leaf_read(struct xfs_trans *tp, struct xfs_inode *dp,
int xfs_attr3_leaf_read(struct xfs_trans *tp, struct xfs_inode *dp,
xfs_dablk_t bno, xfs_daddr_t mappedbno,
struct xfs_buf **bpp);
void xfs_attr3_leaf_hdr_from_disk(struct xfs_attr3_icleaf_hdr *to,
struct xfs_attr_leafblock *from);
extern const struct xfs_buf_ops xfs_attr_leaf_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
#endif /* __XFS_ATTR_LEAF_H__ */
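The helpers xfs_attr3_leaf_hdr_size() and xfs_attr3_leaf_entryp() pick a layout by looking at the magic number, and xfs_attr3_leaf_hdr_from_disk() fills an in-core, format-neutral header. The sketch below illustrates that idea with deliberately simplified, hypothetical structures built from byte arrays, so there is no padding and no alignment concern. The magic values 0xfbee and 0x3bee are assumptions, and only the count field is decoded.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ATTR_LEAF_MAGIC	0xfbee	/* assumed old-format magic */
#define DEMO_ATTR3_LEAF_MAGIC	0x3bee	/* assumed CRC-format magic */

/* Simplified on-disk headers; multi-byte fields are big-endian bytes. */
struct demo_blkinfo {
	uint8_t forw[4], back[4], magic[2], pad[2];
};
struct demo_blkinfo3 {
	struct demo_blkinfo hdr;	/* magic stays at the same offset */
	uint8_t crc[4], blkno[8], lsn[8], uuid[16], owner[8];
};
struct demo_leaf_hdr {
	struct demo_blkinfo info;
	uint8_t count[2];
};
struct demo_leaf3_hdr {
	struct demo_blkinfo3 info;
	uint8_t count[2];
};

/* In-core, format-neutral header in host byte order. */
struct demo_icleaf_hdr {
	uint16_t magic;
	uint16_t count;
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode either format; return the on-disk header size that was used. */
static size_t demo_hdr_from_disk(struct demo_icleaf_hdr *to, const void *block)
{
	const struct demo_leaf_hdr *h2 = block;

	to->magic = get_be16(h2->info.magic);
	if (to->magic == DEMO_ATTR3_LEAF_MAGIC) {
		const struct demo_leaf3_hdr *h3 = block;

		to->count = get_be16(h3->count);
		return sizeof(*h3);
	}
	to->count = get_be16(h2->count);
	return sizeof(*h2);
}
```

The dispatch works because the version 3 block info embeds the old block info as its first member, so the magic number can be read before the format is known. Code that operates on the in-core header then no longer needs to care which on-disk format it came from.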
This diff is collapsed.
/*
* Copyright (c) 2013 Red Hat, Inc.
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef __XFS_ATTR_REMOTE_H__
#define __XFS_ATTR_REMOTE_H__
#define XFS_ATTR3_RMT_MAGIC 0x5841524d /* XARM */
struct xfs_attr3_rmt_hdr {
__be32 rm_magic;
__be32 rm_offset;
__be32 rm_bytes;
__be32 rm_crc;
uuid_t rm_uuid;
__be64 rm_owner;
__be64 rm_blkno;
__be64 rm_lsn;
};
#define XFS_ATTR3_RMT_CRC_OFF offsetof(struct xfs_attr3_rmt_hdr, rm_crc)
#define XFS_ATTR3_RMT_BUF_SPACE(mp, bufsize) \
((bufsize) - (xfs_sb_version_hascrc(&(mp)->m_sb) ? \
sizeof(struct xfs_attr3_rmt_hdr) : 0))
extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
int xfs_attr_rmtval_get(struct xfs_da_args *args);
int xfs_attr_rmtval_set(struct xfs_da_args *args);
int xfs_attr_rmtval_remove(struct xfs_da_args *args);
#endif /* __XFS_ATTR_REMOTE_H__ */
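XFS_ATTR3_RMT_BUF_SPACE() subtracts the remote attribute header from the buffer size only on CRC-enabled filesystems. A small sketch of the resulting arithmetic follows. The struct mirrors the field list of struct xfs_attr3_rmt_hdr using plain fixed-width types, and the helper demo_rmt_bufs_needed is hypothetical; it assumes one header per buffer of size bufsize, which is an assumption about layout and not something this header states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the header fields, for size arithmetic only. */
struct demo_rmt_hdr {
	uint32_t rm_magic;
	uint32_t rm_offset;
	uint32_t rm_bytes;
	uint32_t rm_crc;
	uint8_t  rm_uuid[16];
	uint64_t rm_owner;
	uint64_t rm_blkno;
	uint64_t rm_lsn;
};

/* Bytes of attribute value that fit in one buffer. */
static size_t demo_rmt_buf_space(bool has_crc, size_t bufsize)
{
	return bufsize - (has_crc ? sizeof(struct demo_rmt_hdr) : 0);
}

/* Buffers needed for a value, assuming one header per buffer. */
static size_t demo_rmt_bufs_needed(bool has_crc, size_t bufsize,
				   size_t valuelen)
{
	size_t space = demo_rmt_buf_space(has_crc, bufsize);

	return (valuelen + space - 1) / space;	/* round up */
}
```

With 4096-byte buffers, a 65536-byte value fits in 16 buffers without CRCs but needs 17 with them under this assumption, because each buffer gives up 56 bytes to the header.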
This diff is collapsed.
......@@ -18,7 +18,8 @@
#ifndef __XFS_BMAP_BTREE_H__
#define __XFS_BMAP_BTREE_H__
#define XFS_BMAP_MAGIC 0x424d4150 /* 'BMAP' */
#define XFS_BMAP_MAGIC 0x424d4150 /* 'BMAP' */
#define XFS_BMAP_CRC_MAGIC 0x424d4133 /* 'BMA3' */
struct xfs_btree_cur;
struct xfs_btree_block;
......@@ -136,10 +137,10 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
/*
* Btree block header size depends on a superblock flag.
*
* (not quite yet, but soon)
*/
#define XFS_BMBT_BLOCK_LEN(mp) XFS_BTREE_LBLOCK_LEN
#define XFS_BMBT_BLOCK_LEN(mp) \
(xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN)
#define XFS_BMBT_REC_ADDR(mp, block, index) \
((xfs_bmbt_rec_t *) \
......@@ -186,12 +187,12 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
#define XFS_BMAP_BROOT_PTR_ADDR(mp, bb, i, sz) \
XFS_BMBT_PTR_ADDR(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0))
#define XFS_BMAP_BROOT_SPACE_CALC(nrecs) \
(int)(XFS_BTREE_LBLOCK_LEN + \
#define XFS_BMAP_BROOT_SPACE_CALC(mp, nrecs) \
(int)(XFS_BMBT_BLOCK_LEN(mp) + \
((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
#define XFS_BMAP_BROOT_SPACE(bb) \
(XFS_BMAP_BROOT_SPACE_CALC(be16_to_cpu((bb)->bb_numrecs)))
#define XFS_BMAP_BROOT_SPACE(mp, bb) \
(XFS_BMAP_BROOT_SPACE_CALC(mp, be16_to_cpu((bb)->bb_numrecs)))
#define XFS_BMDR_SPACE_CALC(nrecs) \
(int)(sizeof(xfs_bmdr_block_t) + \
((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t))))
......@@ -204,7 +205,7 @@ typedef __be64 xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
/*
* Prototypes for xfs_bmap.c to call.
*/
extern void xfs_bmdr_to_bmbt(struct xfs_mount *, xfs_bmdr_block_t *, int,
extern void xfs_bmdr_to_bmbt(struct xfs_inode *, xfs_bmdr_block_t *, int,
struct xfs_btree_block *, int);
extern void xfs_bmbt_get_all(xfs_bmbt_rec_host_t *r, xfs_bmbt_irec_t *s);
extern xfs_filblks_t xfs_bmbt_get_blockcount(xfs_bmbt_rec_host_t *r);
......
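XFS_BMAP_BROOT_SPACE_CALC() now takes mp because the in-core root block header length depends on the CRC feature. The following sketch shows the same calculation with assumed sizes: a 24-byte long-form header, a 72-byte CRC-enabled header, and 8 bytes each for a key and a pointer. The names are hypothetical.

```c
#include <stdbool.h>

enum {
	DEMO_LBLOCK_LEN     = 24,	/* assumed long-form btree header size */
	DEMO_LBLOCK_CRC_LEN = 72,	/* assumed header size with CRC fields */
	DEMO_BMBT_KEY_LEN   = 8,	/* assumed size of one key */
	DEMO_BMBT_PTR_LEN   = 8,	/* assumed size of one pointer */
};

/* Space for an in-core bmap btree root holding nrecs key/pointer pairs. */
static int demo_broot_space_calc(bool has_crc, int nrecs)
{
	int hdr = has_crc ? DEMO_LBLOCK_CRC_LEN : DEMO_LBLOCK_LEN;

	return hdr + nrecs * (DEMO_BMBT_KEY_LEN + DEMO_BMBT_PTR_LEN);
}
```

XFS_BMDR_SPACE_CALC() keeps its old signature in the diff above because it sizes the on-disk root stored in the inode fork, which uses its own small header (sizeof(xfs_bmdr_block_t)).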
This diff is collapsed.
......@@ -72,6 +72,7 @@
#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/list_sort.h>
#include <linux/ratelimit.h>
#include <asm/page.h>
#include <asm/div64.h>
......
This diff is collapsed.