提交 · 58c75cfb51393a52b45262394c1fa81514b4d9bd · openeuler / Kernel

20 1月, 2010 8 次提交

xfs: make compile warn about char sign mismatches again · 58c75cfb

由 Dave Chinner 提交于 1月 20, 2010

The -fno-unsigned-char directive has no effect anymore as the
XFs build is clean. However, the kernel build hides pointer sign
differences so turn that back on so that we can clean up all the
mismatches prior to a userspace code resync.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

58c75cfb

xfs: clean up sign warnings in dir2 code · 4a24cb71

由 Dave Chinner 提交于 1月 20, 2010

We are now consistently using unsigned char strings for names
so fix up the remaining warnings in the dir2 code to complete
the cleanup.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

4a24cb71

xfs: convert attr to use unsigned names · a9273ca5

由 Dave Chinner 提交于 1月 20, 2010

To be consistent with the directory code, the attr code should use
unsigned names. Convert the names from the vfs at the highest level
to unsigned, and ænsure they are consistenly used as unsigned down
to disk.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

a9273ca5

xfs: xfs_buf_iomove() doesn't care about signedness · b9c48649

由 Dave Chinner 提交于 1月 20, 2010

xfs_buf_iomove() uses xfs_caddr_t as it's parameter types, but it doesn't
care about the signedness of the variables as it is just copying the
data. Change the prototype to use void * so that we don't get sign
warnings at call sites.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

b9c48649

xfs: make xfs_dir_cilookup_result use unsigned char · a3380ae3

由 Dave Chinner 提交于 1月 20, 2010

For consistency with the result of the code.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

a3380ae3

xfs: convert dirnameops to unsigned char names · 2bc75421

由 Dave Chinner 提交于 1月 20, 2010

To be consistent across the codebase, convert the dirnameops to pass
the directory names by unsigned char strings.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

2bc75421

xfs: convert DM ops to use unsigned char names · 046ea753

由 Dave Chinner 提交于 1月 20, 2010

dmops uses a signed char for it's namespace event. To be consistent
with the rest of the code, convert them to unsigned char for the
namespace string.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

046ea753

xfs: directory names are unsigned · e2bcd936

由 Dave Chinner 提交于 1月 20, 2010

Convert the struct xfs_name to use unsigned chars for the name
strings to match both what is stored on disk (__uint8_t) and what
the VFS expects (unsigned char).
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

e2bcd936

16 1月, 2010 26 次提交

xfs: move more buffer helpers into xfs_buf.c · 4e23471a

由 Christoph Hellwig 提交于 1月 13, 2010

Move xfsbdstrat and xfs_bdstrat_cb from xfs_lrw.c and xfs_bioerror
and xfs_bioerror_relse from xfs_rw.c into xfs_buf.c.  This also
means xfs_bioerror and xfs_bioerror_relse can be marked static now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

4e23471a

xfs: clean up xfs_bwrite · 64e0bc7d

由 Christoph Hellwig 提交于 1月 13, 2010

Fold XFS_bwrite into it's only caller, xfs_bwrite and move it into
xfs_buf.c instead of leaving it as a fairly large inline function.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

64e0bc7d

xfs: clean up log buffer writes · 873ff550

由 Christoph Hellwig 提交于 1月 13, 2010

Don't bother using XFS_bwrite as it doesn't provide much code for
our use case.  Instead opencode it and fold xlog_bdstrat_cb into the
new xlog_bdstrat helper.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

873ff550

xfs: embed the pagb_list array in the perag structure · e57336ff

由 Dave Chinner 提交于 1月 11, 2010

Now that the perag structure is allocated memory rather than held in
an array, we don't need to have the busy extent array external to
the structure. Embed it into the perag structure to avoid needing an
extra allocation when setting up.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

e57336ff

xfs: handle ENOMEM correctly during initialisation of perag structures · 8b26c582

由 Dave Chinner 提交于 1月 11, 2010

Add proper error handling in case an error occurs while initializing
new perag structures for a mount point.  The mount structure is
restored to its previous state by deleting and freeing any perag
structures added during the call.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

8b26c582

xfs: Kill filestreams cache flush · b657fc82

由 Dave Chinner 提交于 1月 11, 2010

The filestreams cache flush is not needed in the sync code as it
does not affect data writeback, and it is now not used by the growfs
code, either, so kill it.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b657fc82

xfs: Add trace points for per-ag refcount debugging. · 0fa800fb

由 Dave Chinner 提交于 1月 11, 2010

Uninline xfs_perag_{get,put} so that tracepoints can be inserted
into them to speed debugging of reference count problems.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

0fa800fb

xfs: Reference count per-ag structures · aed3bb90

由 Dave Chinner 提交于 1月 11, 2010

Reference count the per-ag structures to ensure that we keep get/put
pairs balanced. Assert that the reference counts are zero at unmount
time to catch leaks. In future, reference counts will enable us to
safely remove perag structures by allowing us to detect when they
are no longer in use.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

aed3bb90

xfs: Replace per-ag array with a radix tree · 1c1c6ebc

由 Dave Chinner 提交于 1月 11, 2010

The use of an array for the per-ag structures requires reallocation
of the array when growing the filesystem. This requires locking
access to the array to avoid use after free situations, and the
locking is difficult to get right. To avoid needing to reallocate an
array, change the per-ag structures to an allocated object per ag
and index them using a tree structure.

The AGs are always densely indexed (hence the use of an array), but
the number supported is 2^32 and lookups tend to be random and hence
indexing needs to scale. A simple choice is a radix tree - it works
well with this sort of index. This change also removes another
large contiguous allocation from the mount/growfs path in XFS.

The growing process now needs to change to only initialise the new
AGs required for the extra space, and as such only needs to
exclusively lock the tree for inserts. The rest of the code only
needs to lock the tree while doing lookups, and hence this will
remove all the deadlocks that currently occur on the m_perag_lock as
it is now an innermost lock. The lock is also changed to a spinlock
from a read/write lock as the hold time is now extremely short.

To complete the picture, the per-ag structures will need to be
reference counted to ensure that we don't free/modify them while
they are still in use. This will be done in subsequent patch.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

1c1c6ebc

xfs: convert remaining direct references to m_perag · 44b56e0a

由 Dave Chinner 提交于 1月 11, 2010

Convert the remaining direct lookups of the per ag structures to use
get/put accesses. Ensure that the loops across AGs and prior users
of the interface balance gets and puts correctly.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

44b56e0a

xfs: Convert filestreams code to use per-ag get/put routines · 4196ac08

由 Dave Chinner 提交于 1月 11, 2010

Use xfs_perag_get() and xfs_perag_put() in the filestreams code.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

4196ac08

xfs: Don't directly reference m_perag in allocation code · a862e0fd

由 Dave Chinner 提交于 1月 11, 2010

Start abstracting the perag references so that the indexing of the
structures is not directly coded into all the places that uses the
perag structures. This will allow us to separate the use of the
perag structure and the way it is indexed and hence avoid the known
deadlocks related to growing a busy filesystem.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

a862e0fd

xfs: rename xfs_get_perag · 5017e97d

由 Dave Chinner 提交于 1月 11, 2010

xfs_get_perag is really getting the perag that an inode belongs to
based on it's inode number. Convert the use of this function to just
get the perag from a provided ag number.  Use this new function to
obtain the per-ag structure when traversing the per AG inode trees
for sync and reclaim.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5017e97d

xfs: Don't wake xfsbufd when idle · c9c12971

由 Dave Chinner 提交于 1月 11, 2010

The xfsbufd wakes every xfsbufd_centisecs (once per second by
default) for each filesystem even when the filesystem is idle.  If
the xfsbufd has nothing to do, put it into a long term sleep and
only wake it up when there is work pending (i.e. dirty buffers to
flush soon). This will make laptop power misers happy.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c9c12971

xfs: Don't wake the aild once per second · 453eac8a

由 Dave Chinner 提交于 1月 11, 2010

Now that the AIL push algorithm is traversal safe, we don't need a
watchdog function in the xfsaild to catch pushes that fail to make
progress. Remove the watchdog timeout and make pushes purely driven
by demand. This will remove the once-per-second wakeup that is seen
when the filesystem is idle and make laptop power misers happy.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

453eac8a

xfs: Use list_heads for log recovery item lists · f0a76953

由 Dave Chinner 提交于 1月 11, 2010

Remove the roll-your-own linked list operations.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

f0a76953

xfs: make several more functions static · 5d77c0dc

由 Eric Sandeen 提交于 11月 19, 2009

Just minor housekeeping, a lot more functions can be trivially made
static; others could if we reordered things a bit...
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5d77c0dc

xfs: clean up inconsistent variable naming in xfs_swap_extent · 6bded0f3

由 Dave Chinner 提交于 1月 14, 2010

The swap extent ioctl passes in a target inode and a temporary inode
which are clearly named in the ioctl structure. The code then
assigns temp to target and vice versa, making it extremely difficult
to work out which inode is which later in the code.  Make this
consistent throughout the code.

Also make xfs_swap_extent static as there are no external users of
the function.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

6bded0f3

xfs: add tracing to xfs_swap_extents · 3a85cd96

由 Dave Chinner 提交于 1月 14, 2010

To be able to diagnose whether the swap extents function is
detecting compatible inode data fork configurations for swapping
extents, add tracing points to the code to allow us to see the
format of the inode forks before and after the swap.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

3a85cd96

xfs: xfs_swap_extents needs to handle dynamic fork offsets · e09f9860

由 Dave Chinner 提交于 1月 14, 2010

When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats.  This is caused by the two indoes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem.  xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.

Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into.  This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.

To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.

A further complication is even when the swap is allowed to proceed
when the fork offset is different between the two inodes then value
for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

e09f9860

xfs: fix missing error check in xfs_rtfree_range · 3daeb42c

由 Dave Chinner 提交于 1月 14, 2010

When xfs_rtfind_forw() returns an error, the block is returned
uninitialised.  xfs_rtfree_range() is not checking the error return,
so could be using an uninitialised block number for modifying bitmap
summary info.

The problem was found by gcc when compiling the *userspace* libxfs
code - it is an copy of the kernel code with the exact same bug.
gcc gives an uninitialised variable warning on the userspace code
but not on the kernel code. You gotta love the consistency (Mmmm,
slightly chewy today!).
Signed-off-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

3daeb42c

xfs: fix stale inode flush avoidance · 4b6a4688

由 Dave Chinner 提交于 1月 11, 2010

When reclaiming stale inodes, we need to guarantee that inodes are
unpinned before returning with a "clean" status. If we don't we can
reclaim inodes that are pinned, leading to use after free in the
transaction subsystem as transactions complete.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

4b6a4688

xfs: Remove inode iolock held check during allocation · 126976c7

由 Dave Chinner 提交于 1月 10, 2010

lockdep complains about a the lock not being initialised as we do an
ASSERT based check that the lock is not held before we initialise it
to catch inodes freed with the lock held.

lockdep does this check for us in the lock initialisation code, so
remove the ASSERT to stop the lockdep warning.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

126976c7

xfs: reclaim all inodes by background tree walks · 57817c68

由 Dave Chinner 提交于 1月 10, 2010

We cannot do direct inode reclaim without taking the flush lock to
ensure that we do not reclaim an inode under IO. We check the inode
is clean before doing direct reclaim, but this is not good enough
because the inode flush code marks the inode clean once it has
copied the in-core dirty state to the backing buffer.

It is the flush lock that determines whether the inode is still
under IO, even though it is marked clean, and the inode is still
required at IO completion so we can't reclaim it even though it is
clean in core. Hence the requirement that we need to take the flush
lock even on clean inodes because this guarantees that the inode
writeback IO has completed and it is safe to reclaim the inode.

With delayed write inode flushing, we coul dend up waiting a long
time on the flush lock even for a clean inode. The background
reclaim already handles this efficiently, so avoid all the problems
by killing the direct reclaim path altogether.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

57817c68

xfs: Avoid inodes in reclaim when flushing from inode cache · 018027be

由 Dave Chinner 提交于 1月 10, 2010

The reclaim code will handle flushing of dirty inodes before reclaim
occurs, so avoid them when determining whether an inode is a
candidate for flushing to disk when walking the radix trees.  This
is based on a test patch from Christoph Hellwig.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

018027be

xfs: reclaim inodes under a write lock · c8e20be0

由 Dave Chinner 提交于 1月 10, 2010

Make the inode tree reclaim walk exclusive to avoid races with
concurrent sync walkers and lookups. This is a version of a patch
posted by Christoph Hellwig that avoids all the code duplication.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c8e20be0

13 1月, 2010 1 次提交

lib: Introduce generic list_sort function · 2c761270

由 Dave Chinner 提交于 1月 12, 2010

There are two copies of list_sort() in the tree already, one in the DRM
code, another in ubifs.  Now XFS needs this as well.  Create a generic
list_sort() function from the ubifs version and convert existing users
to it so we don't end up with yet another copy in the tree.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Acked-by: NDave Airlie <airlied@redhat.com>
Acked-by: NArtem Bityutskiy <dedekind@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2c761270

12 1月, 2010 2 次提交

smaps: fix wrong rss count · 7f53a09e

由 Minchan Kim 提交于 1月 08, 2010

A long time ago we regarded zero page as file_rss and vm_normal_page
doesn't return NULL.

But now, we reinstated ZERO_PAGE and vm_normal_page's implementation can
return NULL in case of zero page.  Also we don't count it with file_rss
any more.

Then, RSS and PSS can't be matched.  For consistency, Let's ignore zero
page in smaps_pte_range.
Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: NMatt Mackall <mpm@selenic.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7f53a09e

proc: partially revert "procfs: provide stack information for threads" · 1306d603

由 KOSAKI Motohiro 提交于 1月 08, 2010

Commit d899bf7b (procfs: provide stack information for threads) introduced
to show stack information in /proc/{pid}/status.  But it cause large
performance regression.  Unfortunately /proc/{pid}/status is used ps
command too and ps is one of most important component.  Because both to
take mmap_sem and page table walk are heavily operation.

If many process run, the ps performance is,

[before d899bf7b]

% perf stat ps >/dev/null

 Performance counter stats for 'ps':

     4090.435806  task-clock-msecs         #      0.032 CPUs
             229  context-switches         #      0.000 M/sec
               0  CPU-migrations           #      0.000 M/sec
             234  page-faults              #      0.000 M/sec
      8587565207  cycles                   #   2099.425 M/sec
      9866662403  instructions             #      1.149 IPC
      3789415411  cache-references         #    926.409 M/sec
        30419509  cache-misses             #      7.437 M/sec

   128.859521955  seconds time elapsed

[after d899bf7b]

% perf stat  ps  > /dev/null

 Performance counter stats for 'ps':

     4305.081146  task-clock-msecs         #      0.028 CPUs
             480  context-switches         #      0.000 M/sec
               2  CPU-migrations           #      0.000 M/sec
             237  page-faults              #      0.000 M/sec
      9021211334  cycles                   #   2095.480 M/sec
     10605887536  instructions             #      1.176 IPC
      3612650999  cache-references         #    839.160 M/sec
        23917502  cache-misses             #      5.556 M/sec

   152.277819582  seconds time elapsed

Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
provide almost same information. we can use it.

Commit d899bf7b introduced two features:

 1) Add the annotattion of [thread stack: xxxx] mark to
    /proc/{pid}/task/{tid}/maps.
 2) Add StackUsage field to /proc/{pid}/status.

I only revert (2), because I haven't seen (1) cause regression.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1306d603

11 1月, 2010 3 次提交

quota: Fix dquot_transfer for filesystems different from ext4 · 05b5d898

由 Jan Kara 提交于 1月 06, 2010

Commit fd8fbfc1 modified the way we find amount of reserved space
belonging to an inode. The amount of reserved space is checked
from dquot_transfer and thus inode_reserved_space gets called
even for filesystems that don't provide get_reserved_space callback
which results in a BUG.

Fix the problem by checking get_reserved_space callback and return 0 if
the filesystem does not provide it.

CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJan Kara <jack@suse.cz>

05b5d898

GFS2: Use MAX_LFS_FILESIZE for meta inode size · ba198098

由 Steven Whitehouse 提交于 1月 08, 2010

Using ~0ULL was cauing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

ba198098

xfs: Ensure we force all busy extents in range to disk · fd45e478

由 Dave Chinner 提交于 1月 02, 2010

When we search for and find a busy extent during allocation we
force the log out to ensure the extent free transaction is on
disk before the allocation transaction. The current implementation
has a subtle bug in it--it does not handle multiple overlapping
ranges.

That is, if we free lots of little extents into a single
contiguous extent, then allocate the contiguous extent, the busy
search code stops searching at the first extent it finds that
overlaps the allocated range. It then uses the commit LSN of the
transaction to force the log out to.

Unfortunately, the other busy ranges might have more recent
commit LSNs than the first busy extent that is found, and this
results in xfs_alloc_search_busy() returning before all the
extent free transactions are on disk for the range being
allocated. This can lead to potential metadata corruption or
stale data exposure after a crash because log replay won't replay
all the extent free transactions that cover the allocation range.
Modified-by: NAlex Elder <aelder@sgi.com>

(Dropped the "found" argument from the xfs_alloc_busysearch trace
event.)
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

fd45e478

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功