提交 · 0743b86800cf1dfbf96df4a438938127bbe4476c · openeuler / raspberrypi-kernel

14 12月, 2006 28 次提交

[PATCH] knfsd: Fix up some bit-rot in exp_export · f988443a

由 NeilBrown 提交于 12月 13, 2006

The nfsservctl system call isn't used but recent nfs-utils releases for
exporting filesystems, and consequently the code that is uses - exp_export -
has suffered some bitrot.

Particular:
  - some newly added fields in 'struct svc_export' are being initialised
    properly.
  - the return value is now always -ENOMEM ...

This patch fixes both these problems.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f988443a

[PATCH] knfsd: nfsd4: simplify filehandle check · 27d630ec

由 J.Bruce Fields 提交于 12月 13, 2006

Kill another big "if" clause.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

27d630ec

[PATCH] knfsd: nfsd4: simplify migration op check · eeac294e

由 J.Bruce Fields 提交于 12月 13, 2006

I'm not too fond of these big if conditions.  Replace them by checks of a flag
in the operation descriptor.  To my eye this makes the code a bit more
self-documenting, and makes the complicated part of the code (proc_compound) a
little more compact.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

eeac294e

[PATCH] knfsd: nfsd4: reorganize compound ops · b591480b

由 J.Bruce Fields 提交于 12月 13, 2006

Define an op descriptor struct, use it to simplify nfsd4_proc_compound().
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b591480b

[PATCH] knfsd: nfsd4: make verify and nverify wrappers · c954e2a5

由 J.Bruce Fields 提交于 12月 13, 2006

Make wrappers for verify and nverify, for consistency with other ops.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c954e2a5

[PATCH] knfsd: nfsd4: don't inline nfsd4 compound op functions · 7191155b

由 J.Bruce Fields 提交于 12月 13, 2006

The inlining contributes to bloating the stack of nfsd4_compound, and I want
to change the compound op functions to function pointers anyway.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7191155b

[PATCH] knfsd: nfsd4: move replay_owner to cstate · a4f1706a

由 J.Bruce Fields 提交于 12月 13, 2006

Tuck away the replay_owner in the cstate while we're at it.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a4f1706a

[PATCH] knfsd: nfsd4: remove spurious replay_owner check · d9e626f1

由 J.Bruce Fields 提交于 12月 13, 2006

OK, this is embarassing--I've even looked back at the history, and cannot for
the life of me figure out why I added this check.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d9e626f1

[PATCH] knfsd: nfsd4: pass saved and current fh together into nfsd4 operations · ca364317

由 J.Bruce Fields 提交于 12月 13, 2006

Pass the saved and current filehandles together into all the nfsd4 compound
operations.

I want a unified interface to these operations so we can just call them by
pointer and throw out the huge switch statement.

Also I'll eventually want a structure like this--that holds the state used
during compound processing--for deferral.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ca364317

[PATCH] knfsd: nfsd: don't drop silently on upcall deferral · e0bb89ef

由 J.Bruce Fields 提交于 12月 13, 2006

To avoid tying up server threads when nfsd makes an upcall (to mountd, to get
export options, to idmapd, for nfsv4 name<->id mapping, etc.), we temporarily
"drop" the request and save enough information so that we can revisit it
later.

Certain failures during the deferral process can cause us to really drop the
request and never revisit it.

This is often less than ideal, and is unacceptable in the NFSv4 case--rfc 3530
forbids the server from dropping a request without also closing the
connection.

As a first step, we modify the deferral code to return -ETIMEDOUT (which is
translated to nfserr_jukebox in the v3 and v4 cases, and remains a drop in the
v2 case).
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e0bb89ef

[PATCH] knfsd: nfsd4: handling more nfsd_cross_mnt errors in nfsd4 readdir · 021d3a72

由 J.Bruce Fields 提交于 12月 13, 2006

This patch on its own causes no change in behavior, since nfsd_cross_mnt()
only returns -EAGAIN; but in the future I'd like it to also be able to return
-ETIMEDOUT, so we may as well handle any possible error here.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

021d3a72

[PATCH] knfsd: nfsd: simplify exp_pseudoroot · 6899320c

由 J.Bruce Fields 提交于 12月 13, 2006

Note there's no need for special handling of -EAGAIN here; nfserrno() does
what we want already.  So this is a pure cleanup with no change in
functionality.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6899320c

[PATCH] knfsd: nfsd: make exp_rootfh handle exp_parent errors · 4b41bd85

由 J.Bruce Fields 提交于 12月 13, 2006

Since exp_parent can fail by returning an error (-EAGAIN) in addition to by
returning NULL, we should check for that case in exp_rootfh.

(TODO: we should check that userland handles these errors too.)
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4b41bd85

[PATCH] knfsd: nfsd4: clarify units of COMPOUND_SLACK_SPACE · e5710199

由 J.Bruce Fields 提交于 12月 13, 2006

A comment here incorrectly states that "slack_space" is measured in words, not
bytes.  Remove the comment, and adjust a variable name and a few comments to
clarify the situation.

This is pure cleanup; there should be no change in functionality.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e5710199

[PATCH] knfsd: nfsd4: remove a dprink from nfsd4_lock · 451c11a1

由 J.Bruce Fields 提交于 12月 13, 2006

This dprintk is printing the wrong error now, but it's probably an unnecessary
dprintk anyway; just remove it.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

451c11a1

[PATCH] one more EXPORT_UNUSED_SYMBOL removal · 029530f8

由 Adrian Bunk 提交于 12月 13, 2006

Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

029530f8

[PATCH] update Tigran's email addresses · 69688262

由 Tigran Aivazian 提交于 12月 13, 2006

As Adrian pointed out recently, there were still a couple of places where
I should have fixed my email address.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

69688262

[PATCH] ncpfs: ensure we free wdog_pid on parse_option or fill_inode failure · 1de24126

由 Eric W. Biederman 提交于 12月 13, 2006

This took a little refactoring but now errors are handled cleanly.  When
this code used pid_t values this wasn't necessary because you can't
leak a pid_t.

Thanks to Peter Vandrovec for spotting this.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Peter Vandrovec <vandrove@vc.cvut.cz>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1de24126

[PATCH] ncpfs: Use struct pid to track the userspace watchdog process · 2154227a

由 Eric W. Biederman 提交于 12月 13, 2006

This patch converts the tracking of the user space watchdog process from using
a pid_t to use struct pid.  This makes us safe from pid wrap around issues and
prepares the way for the pid namespace.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2154227a

[PATCH] smbfs: Make conn_pid a struct pid · a71113da

由 Eric W. Biederman 提交于 12月 13, 2006

smbfs keeps track of the user space server process in conn_pid. This converts
that track to use a struct pid instead of pid_t. This keeps us safe from pid
wrap around issues and prepares the way for the pid namespace.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a71113da

[PATCH] lockd endianness annotations · e8c5c045

由 Al Viro 提交于 12月 13, 2006

Annotated, all places switched to keeping status net-endian.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e8c5c045

[PATCH] Fix numerous kcalloc() calls, convert to kzalloc() · cd861280

由 Robert P. J. Day 提交于 12月 13, 2006

All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cd861280

[PATCH] Use activate_mm() in fs/aio.c:use_mm() · 90aef12e

由 Jeremy Fitzhardinge 提交于 12月 13, 2006

activate_mm() is not the right thing to be using in use_mm().  It should be
switch_mm().

On normal x86, they're synonymous, but for the Xen patches I'm adding a
hook which assumes that activate_mm is only used the first time a new mm
is used after creation (I have another hook for dealing with dup_mm).  I
think this use of activate_mm() is the only place where it could be used
a second time on an mm.

>From a quick look at the other architectures I think this is OK (most
simply implement one in terms of the other), but some are doing some
subtly different stuff between the two.
Acked-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

90aef12e

[PATCH] optimize o_direct on block devices · e61c9018

由 Chen, Kenneth W 提交于 12月 13, 2006

Implement block device specific .direct_IO method instead of going through
generic direct_io_worker for block device.

direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
file system, where it needs to perform block allocation, hole detection,
extents file on write, and tons of other corner cases.  The end result is
that it takes tons of CPU time to submit an I/O.

For block device, the block allocation is much simpler and a tight triple
loop can be written to iterate each iovec and each page within the iovec in
order to construct/prepare bio structure and then subsequently submit it to
the block layer.  This significantly speeds up O_D on block device.

[akpm@osdl.org: small speedup]
Signed-off-by: NKen Chen <kenneth.w.chen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e61c9018

[PATCH] ocfs2: relative atime support · 7e913c53

由 Mark Fasheh 提交于 12月 13, 2006

Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and
to test against mtime / ctime accordingly.

[akpm@osdl.org: cleanups]
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
Cc: Valerie Henson <val_henson@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7e913c53

[PATCH] relative atime · 47ae32d6

由 Valerie Henson 提交于 12月 13, 2006

Add "relatime" (relative atime) support.  Relative atime only updates the
atime if the previous atime is older than the mtime or ctime.  Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.

A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txtSigned-off-by: NValerie Henson <val_henson@linux.intel.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

47ae32d6

[PATCH] touch_atime() cleanup · b2276138

由 Andrew Morton 提交于 12月 13, 2006

Simplify touch_atime() layout.

Cc: Valerie Henson <val_henson@linux.intel.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b2276138

[PATCH] constify pipe_buf_operations · d4c3cca9

由 Eric Dumazet 提交于 12月 13, 2006

- pipe/splice should use const pipe_buf_operations and file_operations

- struct pipe_inode_info has an unused field "start" : get rid of it.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d4c3cca9

13 12月, 2006 3 次提交

fs: Convert kmalloc() + memset() to kzalloc() in fs/. · b87576d5

由 Robert P. J. Day 提交于 12月 12, 2006

Convert the single available instance of kmalloc() + memset() to
kzalloc() in the fs/ directory.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

b87576d5

kconfig: Standardize "depends" -> "depends on" in Kconfig files · bef1f402

由 Robert P. J. Day 提交于 12月 12, 2006

Standardize the miniscule percentage of occurrences of "depends" in
Kconfig files to "depends on", and update kconfig-language.txt to
reflect that.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

bef1f402

EXT{2,3,4}_FS: remove outdated part of the help text · d23edbd3

由 Jan Engelhardt 提交于 12月 12, 2006

Signed-off-by: NJan Engelhardt <jengelh@gmx.de>
Acked-by: NDave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

d23edbd3

12 12月, 2006 2 次提交

[patch 3/3] OCFS2 Configurable timeouts - Protocol changes · 828ae6af

由 Andrew Beekhof 提交于 12月 04, 2006

Modify the OCFS2 handshake to ensure essential timeouts are configured
identically on all nodes.

Only allow changes when there are no connected peers

Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)

Included is the field for userspace-heartbeat timeout to avoid the need for
further protocol changes.

Uses a global spinlock to ensure the decisions to update configfs entries
are made on the correct value.  The region covered by the spinlock when
incrementing the counter is much larger as this is the more critical case.

Small cleanup contributed by Adrian Bunk <bunk@stusta.de>
Signed-off-by: NAndrew Beekhof <abeekhof@suse.de>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

828ae6af

Make SLES9 "get_kernel_version" work on the kernel binary again · 8993780a

由 Linus Torvalds 提交于 12月 11, 2006

As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary.  It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)

That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649:
"[PATCH] Fix linux banner utsname information").

This just restores "linux_banner" as a static string, which should fix
the version finding.  And /proc/version simply uses a different string.

To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.

Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: NHerbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: NOlaf Hering <olaf@aepfle.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8993780a

11 12月, 2006 7 次提交

[PATCH] user of the jiffies rounding code: JBD · 44d306e1

由 Arjan van de Ven 提交于 12月 10, 2006

This patch introduces a user: of the round_jiffies() function; the "5 second"
ext3/jbd wakeup.

While "every 5 seconds" doesn't sound as a problem, there can be many of these
(and these timers do add up over all the kernel).  The "5 second" wakeup isn't
really timing sensitive; in addition even with rounding it'll still happen
every 5 seconds (with the exception of the very first time, which is likely to
be rounded up to somewhere closer to 6 seconds)
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

44d306e1

[PATCH] fdtable: Implement new pagesize-based fdtable allocator · 5466b456

由 Vadim Lobanov 提交于 12月 10, 2006

This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries.  The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

    pagesize / 4
    pagesize / 2
    pagesize      <- memory allocator switch point
    pagesize * 2
    pagesize * 4
    ...etc...

Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.

Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop.  Even together, they will still be, by far, smaller
than the fdarray.  The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5466b456

[PATCH] fdtable: Remove the free_files field · 4fd45812

由 Vadim Lobanov 提交于 12月 10, 2006

An fdtable can either be embedded inside a files_struct or standalone (after
being expanded).  When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded.  We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4fd45812

[PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69

由 Vadim Lobanov 提交于 12月 10, 2006

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bbea9f69

[PATCH] dio: lock refcount operations · 5eb6c7a2

由 Zach Brown 提交于 12月 10, 2006

The wait_for_more_bios() function name was poorly chosen.  While looking to
clean it up it I noticed that the dio struct refcounting between the bio
completion and dio submission paths was racey.

The bio submission path was simply freeing the dio struct if
atomic_dec_and_test() indicated that it dropped the final reference.

The aio bio completion path was dereferencing its dio struct pointer *after
dropping its reference* based on the remaining number of references.

These two paths could race and result in the aio bio completion path
dereferencing a freed dio, though this was not observed in the wild.

This moves the refcount under the bio lock so that bio completion can drop
its reference and decide to wake all in one atomic step.

Once testing and waking is locked dio_await_one() can test its sleeping
condition and mark itself uninterruptible under the lock.  It gets simpler
and wait_for_more_bios() disappears.

The addition of the interrupt masking spin lock acquiry in dio_bio_submit()
looks alarming.  This lock acquiry existed in that path before the recent
dio completion patch set.  We shouldn't expect significant performance
regression from returning to the behaviour that existed before the
completion clean up work.

This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine.
Signed-off-by: NZach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5eb6c7a2

[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a

由 Zach Brown 提交于 12月 10, 2006

The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
conditionals in sync. direct_io_worker() knew when finished_one_bio() was
going to call aio_complete(). It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio(). direct_io_worker() would return < 0, it's callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain. It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio. Instead we return the return
code of the operation and let the aio core call aio_complete(). This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it. XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.
Signed-off-by: NZach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8459d86a

[PATCH] dio: remove duplicate bio wait code · 20258b2b

由 Zach Brown 提交于 12月 10, 2006

Now that we have a single refcount and waiting path we can reuse it in the
async 'should_wait' path.  It continues to rely on the fragile link between
the conditional in dio_complete_aio() which decides to complete the AIO and
the conditional in direct_io_worker() which decides to wait and free.

By waiting before dropping the reference we stop dio_bio_end_aio() from
calling dio_complete_aio() which used to wake up the waiter after seeing the
reference count drop to 0.  We hoist this wake up into dio_bio_end_aio() which
now notices when it's left a single remaining reference that is held by the
waiter.
Signed-off-by: NZach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

20258b2b