提交 · 2f6f7b3d9b5600e1f6e7622c62ab30f36bd0f57f · openanolis / cloud-kernel

16 10月, 2007 2 次提交

[XFS] kill v_vfsp member from struct bhv_vnode · 2f6f7b3d

由 Christoph Hellwig 提交于 8月 29, 2007

We can easily get at the vfsp through the super_block but it will soon be
gone anyway.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29494a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

2f6f7b3d

[XFS] call common xfs vnode-level helpers directly and remove vnode operations · 739bfb2a

由 Christoph Hellwig 提交于 8月 29, 2007

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29493a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

739bfb2a

15 10月, 2007 10 次提交

[XFS] decontaminate vnode operations from behavior details · 993386c1

由 Christoph Hellwig 提交于 8月 28, 2007

All vnode ops now take struct xfs_inode pointers and the behaviour related
glue is split out into methods of it's own. This required fixing
xfs_create/mkdir/symlink to not mess with the inode pointer but rather use
a separate boolean for error handling. Thanks to Dave Chinner for that
fix.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29492a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

993386c1

[XFS] Radix tree based inode caching · da353b0d

由 David Chinner 提交于 8月 28, 2007

One of the perpetual scaling problems XFS has is indexing it's incore
inodes. We currently uses hashes and the default hash sizes chosen can
only ever be a tradeoff between memory consumption and the maximum
realistic size of the cache.

As a result, anyone who has millions of inodes cached on a filesystem
needs to tunes the size of the cache via the ihashsize mount option to
allow decent scalability with inode cache operations.

A further problem is the separate inode cluster hash, whose size is based
on the ihashsize but is smaller, and so under certain conditions (sparse
cluster cache population) this can become a limitation long before the
inode hash is causing issues.

The following patchset removes the inode hash and cluster hash and
replaces them with radix trees to avoid the scalability limitations of the
hashes. It also reduces the size of the inodes by 3 pointers....

SGI-PV: 969561
SGI-Modid: xfs-linux-melb:xfs-kern:29481a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

da353b0d

[XFS] kill move.[ch] · 39cd9f87

由 Christoph Hellwig 提交于 8月 28, 2007

Kill uio related functions and defines now that they're unused.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29480a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

39cd9f87

[XFS] stop using uio in the readlink code · 804c83c3

由 Christoph Hellwig 提交于 8月 28, 2007

Simplify the readlink code to get rid of the last user of uio.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29479a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

804c83c3

[XFS] use filldir internally · 051e7cd4

由 Christoph Hellwig 提交于 8月 28, 2007

Currently xfs has a rather complicated internal scheme to allow for
different directory formats in IRIX. This patch rips all code related to
this out and pushes useage of the Linux filldir callback into the lowlevel
directory code. This does not make the code any less portable because
filldir can be used to create dirents of all possible variations
(including the IRIX ones as proved by the IRIX binary emulation code under
arch/mips/).

This patch get rid of an unessecary copy in the readdir path, about 400
lines of code and one of the last two users of the uio structure.

This version is updated to deal with dmapi aswell which greatly simplifies
the get_dirattrs code. The dmapi part has been tested using the
get_dirattrs tools from the xfstest dmapi suite1 with various small and
large directories.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29478a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

051e7cd4

[XFS] remove unessecary vfs argument to DM_EVENT_ENABLED · eb9df39d

由 Christoph Hellwig 提交于 8月 16, 2007

SGI-PV: 968690
SGI-Modid: xfs-linux-melb:xfs-kern:29340a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NVlad Apostolov <vapo@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

eb9df39d

[XFS] move linux/log2.h header to xfs_linux.h · af3a2e8a

由 Eric Sandeen 提交于 8月 16, 2007

Generally we try not to directly include linux header files in core xfs
code; xfs_linux.h is the spot for that.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29326a
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

af3a2e8a

[XFS] Remove xfs_physmem · 6385f4d5

由 Eric Sandeen 提交于 8月 16, 2007

Now that nobody's using it, remove xfs_physmem & friends.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29325a
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

6385f4d5

[XFS] Remove m_nreadaheads · 40906630

由 Eric Sandeen 提交于 8月 16, 2007

m_nreadaheads in the mount struct is never used; remove it and the various
macros assigned to it. Also remove a couple other unused macros in the
same areas.

Removes one user of xfs_physmem.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29322a
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

40906630

[XFS] Barriers need to be dynamically checked and switched off · 0bfefc46

由 David Chinner 提交于 5月 14, 2007

If the underlying block device suddenly stops supporting barriers, we need
to handle the -EOPNOTSUPP error in a sane manner rather than shutting
down the filesystem. If we get this error, clear the barrier flag, reissue
the I/O, and tell the world bad things are occurring.

SGI-PV: 964544
SGI-Modid: xfs-linux-melb:xfs-kern:28568a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

0bfefc46

18 9月, 2007 1 次提交

[XFS] Ensure file size updates have been completed before writing inode to disk. · 776a75fa

由 Lachlan McIlroy 提交于 9月 14, 2007

SGI-PV: 968767
SGI-Modid: xfs-linux-melb:xfs-kern:29675a
Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

776a75fa

05 9月, 2007 3 次提交

[XFS] fix sparse shadowed variable warnings · 265c1fac

由 Christoph Hellwig 提交于 8月 16, 2007

- in xfs_probe_cluster rename the inner len to pg_len. There's no harm
  here because the outer len isn't used after the inner len comes into
  existence but it keeps the code clean.
- in xfs_da_do_buf remove the inner i because they don't overlap
  and they are both the same type.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29311a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

265c1fac

[XFS] Fix sparse warning in kmem_shake_allow · 34521c5e

由 Christoph Hellwig 提交于 8月 16, 2007

We can't return a masked result of a __bitwise type. Compare it to 0 first
to keep the behaviour without the warning.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29309a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

34521c5e

[XFS] Set filestreams object timeout to something sane. · 8da22d7a

由 David Chinner 提交于 8月 16, 2007

SGI-PV: 968554
SGI-Modid: xfs-linux-melb:xfs-kern:29303a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

8da22d7a

27 7月, 2007 1 次提交

xfs ioctl __user annotations · ad690ef9

由 Al Viro 提交于 7月 26, 2007

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ad690ef9

20 7月, 2007 4 次提交

mm: Remove slab destructors from kmem_cache_create(). · 20c2df83

由 Paul Mundt 提交于 7月 20, 2007

Slab destructors were no longer supported after Christoph's
c59def9f change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

20c2df83

mm: fault feedback · d0217ac0

由 Nick Piggin 提交于 7月 19, 2007

Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d0217ac0

mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821

由 Nick Piggin 提交于 7月 19, 2007

Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping.  The hitch here
is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
 But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

54cb8821

mm: fix fault vs invalidate race for linear mappings · d00806b1

由 Nick Piggin 提交于 7月 19, 2007

Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache. Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size. do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size). The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data. Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit. He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight. However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit. Scalability is not an issue.

This patch implements this latter approach. ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so). do_no_page only unlocks the page after setting up the mapping
completely. invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d00806b1

19 7月, 2007 1 次提交

[XFS] Implement ->page_mkwrite in XFS. · 4f57dbc6

由 David Chinner 提交于 7月 19, 2007

Hook XFS up to ->page_mkwrite to ensure that we know about mmap pages
being written to. This allows use to do correct delayed allocation and
ENOSPC checking as well as remap unwritten extents so that they get
converted correctly during writeback. This is done via the generic
block_page_mkwrite code.

SGI-PV: 940392
SGI-Modid: xfs-linux-melb:xfs-kern:29149a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

4f57dbc6

18 7月, 2007 3 次提交

knfsd: exportfs: add exportfs.h header · a5694255

由 Christoph Hellwig 提交于 7月 17, 2007

currently the export_operation structure and helpers related to it are in
fs.h.  fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NNeil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a5694255

Freezer: make kernel threads nonfreezable by default · 83144186

由 Rafael J. Wysocki 提交于 7月 17, 2007

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NNigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83144186

mm: clean up and kernelify shrinker registration · 8e1f936b

由 Rusty Russell 提交于 7月 17, 2007

I can never remember what the function to register to receive VM pressure
is called.  I have to trace down from __alloc_pages() to find it.

It's called "set_shrinker()", and it needs Your Help.

1) Don't hide struct shrinker.  It contains no magic.
2) Don't allocate "struct shrinker".  It's not helpful.
3) Call them "register_shrinker" and "unregister_shrinker".
4) Call the function "shrink" not "shrinker".
5) Reduce the 17 lines of waffly comments to 13, but document it properly.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: David Chinner <dgc@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e1f936b

14 7月, 2007 14 次提交

[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode · faa63e95

由 Michal Marek 提交于 7月 11, 2007

* 32bit struct xfs_fsop_bulkreq has different size and layout of
members, no matter the alignment. Move the code out of the #else
branch (why was it there in the first place?). Define _32 variants of
the ioctl constants.
* 32bit struct xfs_bstat is different because of time_t and on
i386 because of different padding. Make xfs_bulkstat_one() accept a
custom "output formatter" in the private_data argument which takes care
of the xfs_bulkstat_one_compat() that takes care of the different
layout in the compat case.
* i386 struct xfs_inogrp has different padding.
Add a similar "output formatter" mecanism to xfs_inumbers().

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29102a
Signed-off-by: NMichal Marek <mmarek@suse.cz>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

faa63e95

[XFS] Compat ioctl handler for handle operations · 1fa503df

由 Michal Marek 提交于 7月 11, 2007

32bit struct xfs_fsop_handlereq has different size and offsets (due to
pointers). TODO: case XFS_IOC_{FSSETDM,ATTRLIST,ATTRMULTI}_BY_HANDLE still
not handled.

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29101a
Signed-off-by: NMichal Marek <mmarek@suse.cz>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

1fa503df

[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1. · 547e00c3

由 Michal Marek 提交于 7月 11, 2007

i386 struct xfs_fsop_geom_v1 has no padding after the last member, so the
size is different.

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29100a
Signed-off-by: NMichal Marek <mmarek@suse.cz>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

547e00c3

[XFS] Concurrent Multi-File Data Streams · 2a82b8be

由 David Chinner 提交于 7月 11, 2007

In media spaces, video is often stored in a frame-per-file format. When
dealing with uncompressed realtime HD video streams in this format, it is
crucial that files do not get fragmented and that multiple files a placed
contiguously on disk.

When multiple streams are being ingested and played out at the same time,
it is critical that the filesystem does not cross the streams and
interleave them together as this creates seek and readahead cache miss
latency and prevents both ingest and playout from meeting frame rate
targets.

This patch set creates a "stream of files" concept into the allocator to
place all the data from a single stream contiguously on disk so that RAID
array readahead can be used effectively. Each additional stream gets
placed in different allocation groups within the filesystem, thereby
ensuring that we don't cross any streams. When an AG fills up, we select a
new AG for the stream that is not in use.

The core of the functionality is the stream tracking - each inode that we
create in a directory needs to be associated with the directories' stream.
Hence every time we create a file, we look up the directories' stream
object and associate the new file with that object.

Once we have a stream object for a file, we use the AG that the stream
object point to for allocations. If we can't allocate in that AG (e.g. it
is full) we move the entire stream to another AG. Other inodes in the same
stream are moved to the new AG on their next allocation (i.e. lazy
update).

Stream objects are kept in a cache and hold a reference on the inode.
Hence the inode cannot be reclaimed while there is an outstanding stream
reference. This means that on unlink we need to remove the stream
association and we also need to flush all the associations on certain
events that want to reclaim all unreferenced inodes (e.g. filesystem
freeze).

SGI-PV: 964469
SGI-Modid: xfs-linux-melb:xfs-kern:29096a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NBarry Naujok <bnaujok@sgi.com>
Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>
Signed-off-by: NVlad Apostolov <vapo@sgi.com>

2a82b8be

[XFS] XFS should not be looking at filp reference counts · fbf3ce8d

由 Christoph Hellwig 提交于 6月 28, 2007

A check for file_count is always a bad idea. Linux has the ->release
method to deal with cleanups on last close and ->flush is only for the
very rare case where we want to perform an operation on every drop of a
reference to a file struct.

This patch gets rid of vop_close and surrounding code in favour of simply
doing the page flushing from ->release.

SGI-PV: 966562
SGI-Modid: xfs-linux-melb:xfs-kern:28952a
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

fbf3ce8d

[XFS] Fix remount,readonly path to flush everything correctly. · 516b2e7c

由 David Chinner 提交于 6月 18, 2007

The remount readonly path can fail to writeback properly because we still
have active transactions after calling xfs_quiesce_fs(). Further
investigation shows that this path is broken in the same ways that the xfs
freeze path was broken so fix it the same way.

SGI-PV: 964464
SGI-Modid: xfs-linux-melb:xfs-kern:28869a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

516b2e7c

[XFS] Map unwritten extents correctly for I/o completion processing · effd120e

由 David Chinner 提交于 6月 18, 2007

If we have multiple unwritten extents within a single page, we fail to
tell the I/o completion construction handlers we need a new handle for the
second and subsequent blocks in the page. While we still issue the I/O
correctly, we do not have the correct ranges recorded in the ioend
structures and hence when we go to convert the unwritten extents we screw
it up.

Make sure we start a new ioend every time the mapping changes so that we
convert the correct ranges on I/O completion.

SGI-PV: 964647
SGI-Modid: xfs-linux-melb:xfs-kern:28797a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

effd120e

[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize(). · b2826136

由 David Chinner 提交于 6月 05, 2007

SGI-PV: 965636
SGI-Modid: xfs-linux-melb:xfs-kern:28777a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NOlaf Weber <olaf@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

b2826136

[XFS] Block on unwritten extent conversion during synchronous direct I/O. · e927af90

由 David Chinner 提交于 6月 05, 2007

Currently we do not wait on extent conversion to occur, and hence we can
return to userspace from a synchronous direct I/O write without having
completed all the actions in the write. Hence a read after the write may
see zeroes (unwritten extent) rather than the data that was written.

Block the I/O completion by triggering a synchronous workqueue flush to
ensure that the conversion has occurred before we return to userspace.

SGI-PV: 964092
SGI-Modid: xfs-linux-melb:xfs-kern:28775a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

e927af90

[XFS] Flush the block device before closing it on unmount. · f4a9f28a

由 David Chinner 提交于 6月 05, 2007

SGI-PV: 965630
SGI-Modid: xfs-linux-melb:xfs-kern:28774a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

f4a9f28a

[XFS] Lazy Superblock Counters · 92821e2b

由 David Chinner 提交于 5月 24, 2007

When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? the answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means the when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.

As a result of this, lazy superblock counters are a mkfs option and at the
moment on linux there is no way to convert an old filesystem. This is
possible - xfs_db can be used to twiddle the right bits and then
xfs_repair will do the format conversion for you. Similarly, you can
convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NTim Shimmin <tes@sgi.com>

92821e2b

[XFS] Use generic shrinker interfaces in XFS. · 3260f78a

由 Andrew Morton 提交于 5月 24, 2007

SGI-PV: 964986
SGI-Modid: xfs-linux-melb:xfs-kern:28642a
Signed-Off-By: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

3260f78a

[XFS] Fix double free in xfs_buf_get_noaddr error handling path · ca165b88

由 Christoph Hellwig 提交于 5月 24, 2007

SGI-PV: 964983
SGI-Modid: xfs-linux-melb:xfs-kern:28639a
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

ca165b88

[XFS] Only use refcounted pages for I/O · 1fa40b01

由 Christoph Hellwig 提交于 5月 14, 2007

Many block drivers (aoe, iscsi) really want refcountable pages in bios,
which is what almost everyone send down. XFS unfortunately has a few
places where it sends down buffers that may come from kmalloc, which
breaks them.

Fix the places that use kmalloc()d buffers.

SGI-PV: 964546
SGI-Modid: xfs-linux-melb:xfs-kern:28562a
Signed-Off-By: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NDavid Chinner <dgc@sgi.com>
Signed-off-by: NTim Shimmin <tes@sgi.com>

1fa40b01

10 7月, 2007 1 次提交

sendfile: remove .sendfile from filesystems that use generic_file_sendfile() · 5ffc4ef4

由 Jens Axboe 提交于 6月 01, 2007

They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5ffc4ef4

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功