Commits · 84e1e99f112dead8f9ba036c02d24a9f5ce7f544 · openeuler / raspberrypi-kernel

14 Jul, 2007 19 commits

[XFS] Prevent ENOSPC from aborting transactions that need to succeed · 84e1e99f

Committed by David Chinner on Jun 18, 2007

During delayed allocation extent conversion or unwritten extent
conversion, we need to reserve some blocks for transaction reservations.
We need to reserve these blocks in case a btree split occurs and we need
to allocate some blocks.

Unfortunately, we've only ever reserved the number of data blocks we are
allocating, so in both the unwritten and delalloc case we can get ENOSPC
from the transaction reservation. This is bad because in both cases we
cannot report the failure to the writing application.

The fix is two-fold:

1 - leverage the reserved block infrastructure XFS already
    has to reserve a small pool of blocks by default to allow
    specially marked transactions to dip into when we are at
    ENOSPC.
    Default setting is min(5%, 1024 blocks).

2 - convert critical transaction reservations to be allowed
    to dip into this pool. Spots changed are delalloc
    conversion, unwritten extent conversion and growing a
    filesystem at ENOSPC.
    This also allows growing the filesystem to succeed at ENOSPC.
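
The two-part fix above can be pictured with a small model. This is a minimal sketch, not the XFS code: `default_reserve_blocks()` and `try_reserve()` are hypothetical names, and the pool is modelled simply as a floor that ordinary reservations may not push the free count below.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: default pool size, min(5% of data blocks, 1024 blocks). */
static uint64_t default_reserve_blocks(uint64_t data_blocks)
{
    uint64_t five_percent = data_blocks / 20;

    return five_percent < 1024 ? five_percent : 1024;
}

/*
 * Hypothetical reservation check. Ordinary transactions must leave the pool
 * untouched; specially marked ones (may_use_pool != 0) may dip into it.
 */
static int try_reserve(uint64_t *free_blocks, uint64_t pool,
                       uint64_t want, int may_use_pool)
{
    uint64_t floor = may_use_pool ? 0 : pool;

    if (*free_blocks < want || *free_blocks - want < floor)
        return -ENOSPC;
    *free_blocks -= want;
    return 0;
}
```

In this model a delalloc or unwritten extent conversion would pass a nonzero `may_use_pool`, while an ordinary write reservation would not.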

SGI-PV: 964468
SGI-Modid: xfs-linux-melb:xfs-kern:28865a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

84e1e99f

[XFS] Prevent deadlock when flushing inodes on unmount · 641c56fb

Committed by David Chinner on Jun 18, 2007

When we are unmounting the filesystem, we flush all the inodes to disk.
Unfortunately, if we have an inode cluster that has just been freed and
marked stale sitting in an incore log buffer (i.e. hasn't been flushed to
disk), it will be holding all the flush locks on the inodes in that
cluster.

xfs_iflush_all(), which is called during unmount, walks all the inodes
trying to reclaim them, and in doing so calls xfs_finish_reclaim() on each
inode. If the inode is dirty, it grabs the flush lock and flushes it.
Unfortunately, we find dirty inodes that already have their flush lock held
and so we sleep.

At this point in the unmount process, we are running single-threaded.
There is nothing more that can push on the log to force the transaction
holding the inode flush locks to disk and hence we deadlock.

The fix is to issue a log force before flushing the inodes on unmount so
that all the flush locks will be released before we start flushing the
inodes.

SGI-PV: 964538
SGI-Modid: xfs-linux-melb:xfs-kern:28862a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

641c56fb

[XFS] Log the agf_length change in xfs_growfs_data_private(). · 0164af51

Committed by Tim Shimmin on Jun 18, 2007

SGI-PV: 963528
SGI-Modid: xfs-linux-melb:xfs-kern:28856a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>

0164af51

[XFS] Map unwritten extents correctly for I/O completion processing · effd120e

Committed by David Chinner on Jun 18, 2007

If we have multiple unwritten extents within a single page, we fail to
tell the I/O completion construction handlers we need a new handle for the
second and subsequent blocks in the page. While we still issue the I/O
correctly, we do not have the correct ranges recorded in the ioend
structures and hence when we go to convert the unwritten extents we screw
it up.

Make sure we start a new ioend every time the mapping changes so that we
convert the correct ranges on I/O completion.
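
The rule "start a new ioend every time the mapping changes" can be illustrated outside the kernel. This is a minimal sketch under stated assumptions: `struct ioend` here is a hypothetical simplified record, not the XFS structure, and each block of a page is tagged with an illustrative mapping id.

```c
#include <stddef.h>

/* Hypothetical, simplified completion record: one per contiguous mapping. */
struct ioend {
    int    mapping_id;  /* which extent mapping the range belongs to */
    size_t first;       /* first block of the range within the page */
    size_t count;       /* number of blocks in the range */
};

/*
 * Walk the blocks of a page and open a new ioend whenever the mapping
 * changes. 'out' must have room for nblocks entries. Returns the number
 * of ioends built.
 */
static size_t build_ioends(const int *mapping_of_block, size_t nblocks,
                           struct ioend *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nblocks; i++) {
        if (n == 0 || out[n - 1].mapping_id != mapping_of_block[i]) {
            out[n].mapping_id = mapping_of_block[i];
            out[n].first = i;
            out[n].count = 0;
            n++;
        }
        out[n - 1].count++;
    }
    return n;
}
```

With two unwritten extents covering one page, the walk yields two records, so each extent is converted over its own range on completion.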

SGI-PV: 964647
SGI-Modid: xfs-linux-melb:xfs-kern:28797a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

effd120e

[XFS] Apply transaction delta counts atomically to incore counters · 45c34141

Committed by David Chinner on Jun 18, 2007

With the per-cpu superblock counters, batch updates are no longer atomic
across the entire batch of changes. This is not an issue if each
individual change in the batch is applied atomically. Unfortunately, free
block count changes are not applied atomically, and they are applied in a
manner guaranteed to cause problems.

Essentially, the free block count reservation that the transaction took
initially is returned to the in core counters before a second delta takes
away what is used. Because these two operations are not atomic, we can
race with another thread that can use the returned transaction reservation
before the transaction takes the space away again and we can then get
ENOSPC being reported in a spot where we don't have an ENOSPC condition,
nor should we ever see one there.

Fix it up by rolling the two deltas into the one so it can be applied
safely (i.e. atomically) to the incore counters.
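
The race and its fix can be sketched with C11 atomics. This is an illustration only: `free_blocks`, `apply_in_two_steps()` and `apply_combined()` are hypothetical names, and a single atomic variable stands in for the per-cpu incore counters.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the incore free block counter. */
static _Atomic int64_t free_blocks;

/*
 * Racy pattern: the reservation is handed back first, then the blocks
 * actually used are taken away. Another thread can consume the returned
 * space between the two updates.
 */
static void apply_in_two_steps(int64_t reserved, int64_t used)
{
    atomic_fetch_add(&free_blocks, reserved);
    atomic_fetch_sub(&free_blocks, used);
}

/* Fixed pattern: fold both deltas into one and apply it in a single update. */
static void apply_combined(int64_t reserved, int64_t used)
{
    atomic_fetch_add(&free_blocks, reserved - used);
}
```

Both functions leave the counter at the same final value; only the combined form never exposes the intermediate state.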

SGI-PV: 964465
SGI-Modid: xfs-linux-melb:xfs-kern:28796a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

45c34141

[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize(). · b2826136

Committed by David Chinner on Jun 05, 2007

SGI-PV: 965636
SGI-Modid: xfs-linux-melb:xfs-kern:28777a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

b2826136

[XFS] Block on unwritten extent conversion during synchronous direct I/O. · e927af90

Committed by David Chinner on Jun 05, 2007

Currently we do not wait on extent conversion to occur, and hence we can
return to userspace from a synchronous direct I/O write without having
completed all the actions in the write. Hence a read after the write may
see zeroes (unwritten extent) rather than the data that was written.

Block the I/O completion by triggering a synchronous workqueue flush to
ensure that the conversion has occurred before we return to userspace.

SGI-PV: 964092
SGI-Modid: xfs-linux-melb:xfs-kern:28775a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

e927af90

[XFS] Flush the block device before closing it on unmount. · f4a9f28a

Committed by David Chinner on Jun 05, 2007

SGI-PV: 965630
SGI-Modid: xfs-linux-melb:xfs-kern:28774a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

f4a9f28a

[XFS] xfs_bmapi fails to update the previous extent pointer · 4e5ae838

Committed by David Chinner on Jun 05, 2007

When processing multiple extent maps, xfs_bmapi needs to keep track of the
extent behind the one it is currently working on to be able to trim extent
ranges correctly. Failing to update the previous pointer can result in
corrupted extent lists in memory and this will result in panics or assert
failures.

Update the previous pointer correctly when we move to the next extent to
process.

SGI-PV: 965631
SGI-Modid: xfs-linux-melb:xfs-kern:28773a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

4e5ae838

[XFS] Fix the transaction flags to make lazy superblock counters work. · 210c6f1c

Committed by David Chinner on May 24, 2007

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28653a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

210c6f1c

[XFS] Lazy Superblock Counters · 92821e2b

Committed by David Chinner on May 24, 2007

When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? The answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means that when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.
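
The recovery-time summation described above might look roughly like the following. This is a sketch, not the XFS implementation: the structures and field names are hypothetical stand-ins for the AGF and AGI counters, and it assumes the new free space btree block counter is added to the free block total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG counters, standing in for values logged in the AGF/AGI. */
struct ag_counts {
    uint64_t free_blocks;   /* free blocks tracked by the AGF */
    uint64_t btree_blocks;  /* blocks used by the free space btrees */
    uint64_t inodes;        /* allocated inodes tracked by the AGI */
    uint64_t free_inodes;   /* free inodes tracked by the AGI */
};

/* Hypothetical superblock totals rebuilt after log recovery. */
struct sb_counts {
    uint64_t free_blocks;
    uint64_t inodes;
    uint64_t free_inodes;
};

/* Sum every AG's counters to reconstruct the superblock values. */
static struct sb_counts rebuild_sb_counts(const struct ag_counts *ags,
                                          size_t nr_ags)
{
    struct sb_counts sb = {0, 0, 0};

    for (size_t i = 0; i < nr_ags; i++) {
        /* Assumption: free space btree blocks count towards free space. */
        sb.free_blocks += ags[i].free_blocks + ags[i].btree_blocks;
        sb.inodes += ags[i].inodes;
        sb.free_inodes += ags[i].free_inodes;
    }
    return sb;
}
```

After a clean unmount this walk would be skipped, since the superblock values are already known to be correct.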

As a result of this, lazy superblock counters are a mkfs option and at the
moment on linux there is no way to convert an old filesystem. This is
possible - xfs_db can be used to twiddle the right bits and then
xfs_repair will do the format conversion for you. Similarly, you can
convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

92821e2b

[XFS] Use generic shrinker interfaces in XFS. · 3260f78a

Committed by Andrew Morton on May 24, 2007

SGI-PV: 964986
SGI-Modid: xfs-linux-melb:xfs-kern:28642a
Signed-Off-By: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

3260f78a

[XFS] Make hole punching at EOF atomic. · 92dfe8d2

Committed by David Chinner on May 24, 2007

If hole punching at EOF is done as two steps (i.e. truncate then extend)
the file is in a transient state between the two steps where an
application can see the incorrect file size. Punching a hole to EOF needs
to be treated in the same way as all other hole punching cases so that the
file size is never seen to change.

SGI-PV: 962012
SGI-Modid: xfs-linux-melb:xfs-kern:28641a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

92dfe8d2

[XFS] Fix vmalloc leak on mount/unmount. · 511105b3

Committed by David Chinner on May 24, 2007

When setting the length of the iclogbuf to write out we should just be
changing the desired byte count rather than completely reassociating the buffer
memory with the buffer. Reassociating the buffer memory changes the
apparent length of the buffer and hence when we free the buffer, we don't
free all the vmap()d space we originally allocated.

SGI-PV: 964983
SGI-Modid: xfs-linux-melb:xfs-kern:28640a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

511105b3

[XFS] Fix double free in xfs_buf_get_noaddr error handling path · ca165b88

Committed by Christoph Hellwig on May 24, 2007

SGI-PV: 964983
SGI-Modid: xfs-linux-melb:xfs-kern:28639a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

ca165b88

[XFS] Fix use-after-free during log unmount. · 3db296f3

Committed by David Chinner on May 14, 2007

Don't reference the log buffer after running the callbacks as the callback
can trigger the log buffers to be freed during unmount.

SGI-PV: 964545
SGI-Modid: xfs-linux-melb:xfs-kern:28567a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

3db296f3

[XFS] Sleeping with the ilock waiting for I/O completion is Bad. · 40095b64

Committed by David Chinner on May 14, 2007

Recent fixes to the filesystem freezing code introduced a vn_iowait call
in the middle of the sync code. Unfortunately, at the point where this
call was added we are holding the ilock. The ilock is needed by I/O
completion for unwritten extent conversion and now updating the file size.
Hence I/O cannot complete if we hold the ilock while waiting for I/O
completion.

Fix up the bug and clean the code up around it.

SGI-PV: 963674
SGI-Modid: xfs-linux-melb:xfs-kern:28566a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

40095b64

[XFS] Don't grow filesystems past the size they can index. · 4cc929ee

Committed by Nathan Scott on May 14, 2007

When growing a filesystem we don't check to see if the new size overflows
the page cache index range, so we can do silly things like grow a
filesystem past 16TB on a 32-bit system. Check new filesystem sizes against
the limits the kernel can support.
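
The 16TB figure follows from a 32-bit page cache index combined with 4KB pages: 2^32 pages of 2^12 bytes each is 2^44 bytes. A minimal sketch of such a check, with `size_fits_page_index()` as a hypothetical helper rather than the kernel's actual test:

```c
#include <stdint.h>

/*
 * Hypothetical check: can a device of 'nbytes' bytes be addressed with a
 * page cache index of 'index_bits' bits and pages of (1 << page_shift) bytes?
 */
static int size_fits_page_index(uint64_t nbytes, unsigned index_bits,
                                unsigned page_shift)
{
    /* The addressable range covers every 64-bit size: nothing to check. */
    if (index_bits + page_shift >= 64)
        return 1;

    uint64_t limit = (uint64_t)1 << (index_bits + page_shift);

    return nbytes <= limit;
}
```

For a 32-bit index and a page shift of 12 the limit is exactly 16TB, so a grow request one byte larger would be rejected in this model.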

SGI-PV: 957886
SGI-Modid: xfs-linux-melb:xfs-kern:28563a
Signed-Off-By: Nathan Scott <nscott@aconex.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

4cc929ee

[XFS] Only use refcounted pages for I/O · 1fa40b01

Committed by Christoph Hellwig on May 14, 2007

Many block drivers (aoe, iscsi) really want refcountable pages in bios,
which is what almost everyone sends down. XFS unfortunately has a few
places where it sends down buffers that may come from kmalloc, which
breaks them.

Fix the places that use kmalloc()d buffers.

SGI-PV: 964546
SGI-Modid: xfs-linux-melb:xfs-kern:28562a
Signed-Off-By: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

1fa40b01

11 Jul, 2007 1 commit

Make common helpers for seq_files that work with list_heads · bcf67e16

Committed by Pavel Emelianov on Jul 10, 2007

Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing common helpers.
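
The iteration pattern being factored out can be sketched in userspace. This is an illustration, not the kernel code: the `sketch_` helpers below are hypothetical userland analogues of a start/next pair for walking a circular doubly linked list by position.

```c
#include <stddef.h>

/* Minimal userland copy of the circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void sketch_list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void sketch_list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Return the entry at 0-based position 'pos', or NULL when past the end. */
static struct list_head *sketch_list_start(struct list_head *head, long pos)
{
    struct list_head *lh;

    for (lh = head->next; lh != head; lh = lh->next)
        if (pos-- == 0)
            return lh;
    return NULL;
}

/* Advance the position and return the next entry, or NULL at the end. */
static struct list_head *sketch_list_next(struct list_head *cur,
                                          struct list_head *head, long *ppos)
{
    ++*ppos;
    return cur->next == head ? NULL : cur->next;
}
```

Each seq_file user that previously open-coded this walk would only need to supply its own list head and a show routine.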

This makes the code about 300 lines smaller.

The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:

2.6.22-rc1-mm1:              0x001794d5
with the previous version:   0x00179505
with this patch:             0x00179135

The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.

This patch:

Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing common helpers.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bcf67e16

10 Jul, 2007 20 commits

[GFS2] Accept old format NFS filehandles · 3ebf4490

Committed by Steven Whitehouse on Jul 10, 2007

On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
> > -#define GFS2_LARGE_FH_SIZE 10
> > -
> > -struct gfs2_fh_obj {
> > -   struct gfs2_inum_host this;
> > -   u32 imode;
> > -};
> > +#define GFS2_LARGE_FH_SIZE 8
>
> Because gfs2_decode_fh only accepts file handles with GFS2_SMALL_FH_SIZE
> or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by an older
> gfs version anymore.  Stale filehandles because of a new kernel version
> are a big no-no, so please add back code to handle the old filehandles
> on the decode side.
>

This should fix that problem, I think. Since it only relates to the end of
the fh, we can just ignore that field in order to accept the older
format.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wendy Cheng <wcheng@redhat.com>

3ebf4490

[S390] fixed cdl-format detection. · bf1a95a2

Committed by Stefan Haberland on Jul 10, 2007

CDL-formatted DASDs are now detected correctly even if no VOL1 label is
on the disk. This prevents possible loss of data.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

bf1a95a2

pipe: add documentation and comments · 0845718d

Committed by Jens Axboe on Jun 12, 2007

As per Andrew Morton's request, here's a set of documentation for
the generic pipe_buf_operations hooks, the pipe, and pipe_buffer
structures.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0845718d

pipe: change the ->pin() operation to ->confirm() · cac36bb0

Committed by Jens Axboe on Jun 14, 2007

The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cac36bb0

Remove remnants of sendfile() · d96e6e71

Committed by Jens Axboe on Jun 11, 2007

There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d96e6e71

xip sendfile removal · d054fe3d

Committed by Carsten Otte on Jun 15, 2007

This patch removes xip_file_sendfile, the sendfile implementation for
xip, without replacement. Those customers that use xip on s390 are not
using sendfile() as far as we know, and so far s390 is the only platform
this could potentially be used on.
Having sendfile is not a popular feature for execute in place file
systems, however we have a working implementation of splice_read() based
on fs/splice.c if anyone asks for it.
At this point in time, it does not seem preferable to merge
splice_read() for xip because it causes extra maintenance effort due to
code duplication and it requires struct page behind the xip memory
segment. We'd like to get rid of that in favor of supporting flash based
embedded platforms (Monta Vista work) soon.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d054fe3d

splice: completely document external interface with kerneldoc · 932cc6d4

Committed by Jens Axboe on Jun 21, 2007

Also add fs/splice.c as a kerneldoc target with a smaller blurb that
should be expanded to better explain the overview of splice.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

932cc6d4

sendfile: remove bad_sendfile() from bad_file_ops · d6f51756

Committed by Jens Axboe on Jun 04, 2007

do_sendfile() prefers splice over sendfile, so it should not trigger
(directly, at least).
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d6f51756

pipe: allow passing around of ops private pointer · 497f9625

Committed by Jens Axboe on Jun 11, 2007

relay needs this for proper consumption handling, and the network
receive support needs it as well to lookup the sk_buff on pipe
release.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

497f9625

splice: divorce the splice structure/function definitions from the pipe header · d6b29d7c

Committed by Jens Axboe on Jun 04, 2007

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d6b29d7c

sendfile: convert nfsd to splice_direct_to_actor() · cf8208d0

Committed by Jens Axboe on Jun 12, 2007

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cf8208d0

sendfile: convert nfs to using splice_read() · f0930fff

Committed by Jens Axboe on Jun 01, 2007

Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

f0930fff

sendfile: remove .sendfile from filesystems that use generic_file_sendfile() · 5ffc4ef4

Committed by Jens Axboe on Jun 01, 2007

They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

5ffc4ef4

sys_sendfile: switch to using ->splice_read, if available · 534f2aaa

Committed by Jens Axboe on Jun 01, 2007

This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.
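
The preference order can be sketched as a simple dispatch on function pointers. This is a userspace illustration only: `struct file_ops`, `do_send()` and the two demo callbacks are hypothetical, with simplified signatures that do not match the kernel's `file_operations`.

```c
#include <errno.h>
#include <stddef.h>

struct file;

/* Hypothetical, simplified operations table. */
struct file_ops {
    long (*splice_read)(struct file *in, size_t len);
    long (*sendfile)(struct file *in, size_t len);
};

struct file {
    const struct file_ops *ops;
};

/* Demo callbacks that return a tag so the chosen path is visible. */
static long demo_splice_read(struct file *in, size_t len)
{
    (void)in;
    (void)len;
    return 1;
}

static long demo_sendfile(struct file *in, size_t len)
{
    (void)in;
    (void)len;
    return 2;
}

/* Prefer ->splice_read when present, fall back to ->sendfile otherwise. */
static long do_send(struct file *in, size_t len)
{
    if (in->ops->splice_read)
        return in->ops->splice_read(in, len);
    if (in->ops->sendfile)
        return in->ops->sendfile(in, len);
    return -EINVAL;
}
```

Because the fallback is kept, filesystems that have not yet been converted keep working while converted ones take the splice path.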
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

534f2aaa

vmsplice: add vmsplice-to-user support · 6a14b90b

Committed by Jens Axboe on Jun 14, 2007

A bit of a cheat, it actually just copies the data to userspace. But
this makes the interface nice and symmetric and enables people to build
on splice, with room for future improvement in performance.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

6a14b90b

splice: abstract out actor data · c66ab6fa

Committed by Jens Axboe on Jun 12, 2007

For direct splicing (or private splicing), the output may not be a file.
So abstract out the handling into a specified actor function and put
the data in the splice_desc structure earlier, so we can build on top
of that.

This is the first step in better splice handling for drivers, and also
for implementing vmsplice _to_ user memory.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c66ab6fa

unexport bio_{,un}map_user · 72d3a38e

Committed by Adrian Bunk on Jul 09, 2007

bio_{,un}map_user no longer have any modular users.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

72d3a38e

sched: scheduler debugging, core · 43ae34cb

Committed by Ingo Molnar on Jul 09, 2007

scheduler debugging core: implement /proc/sched_debug and
/proc/<PID>/sched files for scheduler debugging.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

43ae34cb

sched: update delay-accounting to use CFS's precise stats · 172ba844

Committed by Balbir Singh on Jul 09, 2007

update delay-accounting to use CFS's precise stats.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

172ba844

sched: make use of precise accounting for /proc task stats · b27f03d4

Committed by Ingo Molnar on Jul 09, 2007

make use of CFS's precise accounting to drive /proc/<pid>/stat statistics.

this code was co-authored by:

 Balbir Singh <balbir@linux.vnet.ibm.com>
 Dmitry Adamushko <dmitry.adamushko@gmail.com>
 Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

b27f03d4