Commits · 6110e02b97377a2903853faf3ecaff0e742fbe93 · xiphi1978 / linux

21 Sep, 2007 · 5 commits

ocfs2: Pack vote message and response structures · 813d974c

Authored by Sunil Mushran on Sep 20, 2007

The ocfs2_vote_msg and ocfs2_response_msg structs needed to be
packed to ensure the same sizeof on 32-bit and 64-bit arches. Without this,
we had inadvertently broken 32/64 bit cross mounts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

813d974c
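
As a rough illustration of the packing issue described in 813d974c, the sketch below compares a packed and an unpacked struct. The struct and field names are hypothetical and are not the real ocfs2_vote_msg layout; the `__attribute__((packed))` syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire messages; the fields are illustrative only. */
struct demo_msg_unpacked {
        uint64_t blkno;
        uint32_t generation;
};      /* typically 16 bytes on x86-64 but 12 on i386 */

struct demo_msg_packed {
        uint64_t blkno;
        uint32_t generation;
} __attribute__((packed));      /* 12 bytes regardless of the ABI */

size_t demo_unpacked_size(void)
{
        return sizeof(struct demo_msg_unpacked);
}

size_t demo_packed_size(void)
{
        return sizeof(struct demo_msg_packed);
}
```

Because the unpacked size depends on how the ABI aligns 64-bit integers, two nodes of different word size could disagree on the message length; the packed variant removes that dependency.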

ocfs2: Don't double set write parameters · 5c26a7b7

Authored by Mark Fasheh on Sep 18, 2007

The target page offsets were being incorrectly set a second time in
ocfs2_prepare_page_for_write(), which was causing problems on a 16k page
size kernel. Additionally, ocfs2_write_failure() was incorrectly using those
parameters instead of the parameters for the individual page being cleaned
up.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

5c26a7b7

ocfs2: Fix pos/len passed to ocfs2_write_cluster · db56246c

Authored by Mark Fasheh on Sep 17, 2007

This was broken for file systems whose cluster size is greater than page
size. Pos needs to be incremented as we loop through the descriptors, and
len needs to be capped to the size of a single cluster.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

db56246c
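
The pos/len handling described in db56246c can be sketched as follows. This is a minimal userspace illustration, not the ocfs2 code: `demo_chunk` and `demo_split_write` are hypothetical names, and the sketch only shows pos advancing per iteration while len is capped at the cluster boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of a write that stays inside a single cluster. */
struct demo_chunk {
        uint64_t pos;
        uint32_t len;
};

/* Split [pos, pos + len) into per-cluster chunks; returns the chunk count. */
size_t demo_split_write(uint64_t pos, uint32_t len, uint32_t cluster_size,
                        struct demo_chunk *out, size_t max_out)
{
        size_t n = 0;

        while (len > 0 && n < max_out) {
                uint32_t off = (uint32_t)(pos % cluster_size);
                uint32_t chunk = cluster_size - off;    /* room left in this cluster */

                if (chunk > len)
                        chunk = len;
                out[n].pos = pos;
                out[n].len = chunk;
                n++;
                pos += chunk;   /* advance as we loop */
                len -= chunk;
        }
        return n;
}
```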

ocfs2: Allow smaller allocations during large writes · 415cb800

Authored by Mark Fasheh on Sep 16, 2007

The ocfs2 write code loops through a page much like the block code, except
that ocfs2 allocation units can be any size, including larger than page
size. Typically it's equal to or larger than page size - most kernels run 4k
pages, the minimum ocfs2 allocation (cluster) size.

Some changes introduced during 2.6.23 changed the way writes to pages are
handled, and inadvertently broke support for > 4k page size. Instead of just
writing one cluster at a time, we now handle the whole page in one pass.

This means that multiple (small) separate allocations might happen in the
same pass. The allocation code, however, typically optimizes by getting the
maximum which was reserved. This triggered a BUG_ON in the extend code where
it'd ask for a single bit (for one part of a > 4k page) and get back more
than it asked for.

Fix this by providing a variant of the high level allocation function which
allows the caller to specify a maximum. The traditional function remains and
just calls the new one with a maximum determined from the initial
reservation.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

415cb800
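
The shape of the fix in 415cb800 (a capped variant plus a traditional wrapper) might look roughly like the sketch below. All names here are hypothetical stand-ins, and the reservation is reduced to a simple counter for illustration.

```c
/* Hypothetical reservation: how many bits this caller may still claim. */
struct demo_reservation {
        unsigned int bits_left;
};

/* New variant: hand out at most max_bits, never more than is reserved. */
unsigned int demo_claim_bits_max(struct demo_reservation *res,
                                 unsigned int max_bits)
{
        unsigned int got = res->bits_left < max_bits ? res->bits_left : max_bits;

        res->bits_left -= got;
        return got;
}

/* Traditional entry point: the maximum is whatever was reserved. */
unsigned int demo_claim_bits(struct demo_reservation *res)
{
        return demo_claim_bits_max(res, res->bits_left);
}
```

A caller that needs exactly one bit for part of a large page passes a maximum of 1, so it can no longer be handed the whole reservation.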

signalfd simplification · b8fceee1

Authored by Davide Libenzi on Sep 20, 2007

This simplifies the signalfd code by not keeping it attached to the
sighand during its lifetime.

In this way, the signalfd remains attached to the sighand only during
poll(2) (and select and epoll) and read(2).  This also allows removing
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".

I think this is also what Ben was suggesting some time ago.

The external effect of this, is that a thread can extract only its own
private signals and the group ones.  I think this is an acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8fceee1

20 Sep, 2007 · 5 commits

[XFS] fix valid but harmless sparse warning · 1bc5858d

Authored by Christoph Hellwig on Sep 19, 2007

The new xlog_recover_do_reg_buffer checks call be16_to_cpu on di_gen, which
is a 32-bit value, so sparse rightly complains. Fortunately the warning is
harmless because we don't care about the value, only whether it's
non-zero. Because of that we can simply kill the endian swaps on this and
the previous di_mode check entirely.

SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29709a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

1bc5858d

[XFS] fix filestreams on 32-bit boxes · bcc7b445

Authored by Eric Sandeen on Aug 30, 2007

xfs_filestream_mount() sets up an mru cache with:
  err = xfs_mru_cache_create(&mp->m_filestream, lifetime, grp_count,
  (xfs_mru_cache_free_func_t)xfs_fstrm_free_func);
but that cast is causing problems...
  typedef void (*xfs_mru_cache_free_func_t)(unsigned long, void*);
but:
  void xfs_fstrm_free_func( xfs_ino_t ino, fstrm_item_t *item)
so on a 32-bit box, it's casting (32, 32) args into (64, 32) and I assume
it's getting garbage for *item, which subsequently causes an explosion.
With this change the filestreams xfsqa tests don't oops on my 32-bit box.

SGI-PV: 967795
SGI-Modid: xfs-linux-melb:xfs-kern:29510a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

bcc7b445
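
One way to avoid the mismatched cast described in bcc7b445 is to give the callback exactly the signature the typedef declares and convert inside the function. The sketch below uses hypothetical demo_* names and is not the XFS code.

```c
#include <stdint.h>

typedef uint64_t demo_ino_t;    /* stand-in for a 64-bit inode number type */
typedef void (*demo_free_func_t)(unsigned long key, void *item);

static int demo_freed_value;

/* Signature matches demo_free_func_t, so no function-pointer cast is needed. */
static void demo_free_func(unsigned long key, void *item)
{
        demo_ino_t ino = (demo_ino_t)key;       /* widen inside the callee */

        (void)ino;
        demo_freed_value = *(int *)item;
}

/* Invoke the callback through the typedef'd pointer, as a cache would. */
int demo_run_callback(void)
{
        int payload = 42;
        demo_free_func_t fn = demo_free_func;

        fn(7UL, &payload);
        return demo_freed_value;
}
```

Casting a function pointer to an incompatible type and calling through it is undefined behaviour in C, which is why the arguments can arrive shifted on a 32-bit build.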

ext34: ensure do_split leaves enough free space in both blocks · ef2b02d3

Authored by Eric Sandeen on Sep 18, 2007

The do_split() function for htree dir blocks is intended to split a leaf
block to make room for a new entry.  It sorts the entries in the original
block by hash value, then moves the last half of the entries to the new
block - without accounting for how much space this actually moves.  (IOW,
it moves half of the entry *count* not half of the entry *space*).  If by
chance we have both large & small entries, and we move only the smallest
entries, and we have a large new entry to insert, we may not have created
enough space for it.

The patch below stores each record size when calculating the dx_map, and
then walks the hash-sorted dx_map, calculating how many entries must be
moved to more evenly split the existing entries between the old block and
the new block, guaranteeing enough space for the new entry.

The dx_map "offs" member is reduced to u16 so that the overall map size
does not change - it is temporarily stored at the end of the new block, and
if it grows too large it may be overwritten.  By making offs and size both
u16, we won't grow the map size.

Also add a few comments to the functions involved.

This fixes the testcase reported by hooanon05@yahoo.co.jp on the
linux-ext4 list, "ext3 dir_index causes an error"

Thanks to Andreas Dilger for discussing the problem & solution with me.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Tested-by: Junjiro Okajima <hooanon05@yahoo.co.jp>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef2b02d3
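
The difference between splitting by entry count and splitting by entry space, as described in ef2b02d3, can be sketched like this. It is a simplified illustration under the assumption that the record sizes are already available in hash order; `demo_split_index` is a hypothetical name, not the function in the patch.

```c
/*
 * sizes[] holds the record length of each entry, already sorted by hash.
 * Returns the index of the first entry that moves to the new block.
 */
int demo_split_index(const unsigned short *sizes, int count,
                     unsigned int blocksize)
{
        unsigned int moved = 0; /* bytes moved so far */
        int move = 0;           /* entries moved so far */
        int i;

        for (i = count - 1; i >= 0; i--) {
                /* Stop once more than half of this entry would cross the midpoint. */
                if (moved + sizes[i] / 2 > blocksize / 2)
                        break;
                moved += sizes[i];
                move++;
        }
        return count - move;
}
```

With four 12-byte entries followed by two 500-byte entries in a 1024-byte block, a count-based split would move three entries (1012 bytes), whereas this walk moves only the last one.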

nfs: fix oops re sysctls and V4 support · 49af7ee1

Authored by Alexey Dobriyan on Sep 18, 2007

NFS unregisters sysctls only if V4 support is compiled in.  However, the
sysctl table is not V4 specific, so unregister it always.

Steps to reproduce:

	[build nfs.ko with CONFIG_NFS_V4=n]
	modprobe nfs
	rmmod nfs
	ls /proc/sys

Unable to handle kernel paging request at ffffffff880661c0 RIP:
 [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
PGD 203067 PUD 207063 PMD 7e216067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lockd nfs_acl sunrpc
Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2
RIP: 0010:[<ffffffff802af8e3>]  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
RSP: 0018:ffff81007fd93e78  EFLAGS: 00010286
RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0
RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0
R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280
FS:  00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000)
Stack:  ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40
 ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a
 2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640
Call Trace:
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff802840c7>] vfs_readdir+0xa7/0xc0
 [<ffffffff80284376>] sys_getdents+0x96/0xe0
 [<ffffffff8020bb3e>] system_call+0x7e/0x83

Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b
RIP  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
 RSP <ffff81007fd93e78>
CR2: ffffffff880661c0
Kernel panic - not syncing: Fatal exception
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

49af7ee1

dir_index: error out instead of BUG on corrupt dx dirs · 3d82abae

Authored by Eric Sandeen on Sep 18, 2007

Convert asserts (BUGs) in dx_probe from bad on-disk data to recoverable
errors with helpful warnings.  With help catching other asserts from Duane
Griffin <duaneg@dghda.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3d82abae

18 Sep, 2007 · 2 commits

[XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer. · b394e43e

Authored by Lachlan McIlroy on Sep 14, 2007

SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29676a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

b394e43e

[XFS] Ensure file size updates have been completed before writing inode to disk. · 776a75fa

Authored by Lachlan McIlroy on Sep 14, 2007

SGI-PV: 968767
SGI-Modid: xfs-linux-melb:xfs-kern:29675a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

776a75fa

17 Sep, 2007 · 1 commit

[XFS] On-demand reaping of the MRU cache · 65de5567

Authored by David Chinner on Aug 16, 2007

Instead of running the mru cache reaper all the time based on a timeout,
we should only run it when the cache has active objects. This allows CPUs
to sleep when there is no activity rather than be woken repeatedly just to
check if there is anything to do.

SGI-PV: 968554
SGI-Modid: xfs-linux-melb:xfs-kern:29305a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

65de5567

15 Sep, 2007 · 1 commit

As struct iw_point is bi-directional payload, we should copy back the content · 53c57255

Authored by Masakazu Mokuno on Sep 14, 2007

on return from ioctl calls
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

53c57255

12 Sep, 2007 · 8 commits

Leases can be hidden by flocks · 0e2f6db8

Authored by Pavel Emelyanov on Sep 11, 2007

The inode->i_flock list contains the leases, flocks and posix
locks in the specified order. However, the flocks are added at
the head of this list, thus hiding the leases from the F_GETLEASE
command, from time_out_leases() and from other code that expects
the leases to come first.

The following example will demonstrate this:

#define _GNU_SOURCE

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>

static void show_lease(int fd)
{
        int res;

        res = fcntl(fd, F_GETLEASE);
        switch (res) {
                case F_RDLCK:
                        printf("Read lease\n");
                        break;
                case F_WRLCK:
                        printf("Write lease\n");
                        break;
                case F_UNLCK:
                        printf("No leases\n");
                        break;
                default:
                        printf("Some shit\n");
                        break;
        }
}

int main(int argc, char **argv)
{
        int fd, res;

        fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
                perror("Can't open file");
                return 1;
        }

        res = fcntl(fd, F_SETLEASE, F_WRLCK);
        if (res == -1) {
                perror("Can't set lease");
                return 1;
        }

        show_lease(fd);

        if (flock(fd, LOCK_SH) == -1) {
                perror("Can't flock shared");
                return 1;
        }

        show_lease(fd);

        return 0;
}

The first call to show_lease() will show the write lease set, but
the second will show no leases.

Fix the flock insertion so that the leases always stay at the head
of this list.

Found while making the flocks pid-namespace aware.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0e2f6db8
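
The ordering rule from 0e2f6db8 (leases first, then flocks, then posix locks) can be sketched with a plain singly linked list. The types and the `demo_insert_flock` helper are hypothetical and only model the list discipline, not the kernel's struct file_lock handling.

```c
#include <stddef.h>

enum demo_kind { DEMO_LEASE, DEMO_FLOCK, DEMO_POSIX };

struct demo_lock {
        enum demo_kind kind;
        struct demo_lock *next;
};

/* Insert a flock after any leading leases instead of at the list head. */
void demo_insert_flock(struct demo_lock **head, struct demo_lock *fl)
{
        struct demo_lock **pos = head;

        while (*pos && (*pos)->kind == DEMO_LEASE)
                pos = &(*pos)->next;    /* skip past the leases */
        fl->next = *pos;
        *pos = fl;
}
```

Code that inspects only the first entries to find a lease keeps working, because a newly added flock can no longer land in front of it.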

Fix select on /proc files without ->poll · dd23aae4

Authored by Alexey Dobriyan on Sep 11, 2007

Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e16 aka "Fix rmmod/read/write races in /proc entries" broke the
SBCL + SLIME combo.

The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
->poll handler.  The new code makes ->poll always present and returns 0 by
default, which is not correct.  Return DEFAULT_POLLMASK instead.

Steps to reproduce:

	install emacs, SBCL, SLIME
	emacs
	M-x slime	in *inferior-lisp* buffer
	[watch it doing "Connecting to Swank on port X.."]

Please, apply before 2.6.23.

P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dd23aae4
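
The fallback behaviour described in dd23aae4 might be sketched as below. The bit values and demo_* names are stand-ins chosen for illustration; in the kernel, DEFAULT_POLLMASK is built from POLLIN, POLLOUT, POLLRDNORM and POLLWRNORM.

```c
#include <stddef.h>

/* Stand-in bit values, not the real poll(2) constants. */
#define DEMO_POLLIN   0x1u
#define DEMO_POLLOUT  0x4u
#define DEMO_DEFAULT_POLLMASK (DEMO_POLLIN | DEMO_POLLOUT)

typedef unsigned int (*demo_poll_fn)(void *file);

struct demo_proc_entry {
        demo_poll_fn poll;      /* NULL when the entry has no ->poll handler */
};

/* Wrapper that is always present; it must not report "never ready". */
unsigned int demo_proc_poll(const struct demo_proc_entry *pde, void *file)
{
        if (pde->poll)
                return pde->poll(file);
        return DEMO_DEFAULT_POLLMASK;   /* returning 0 would stall select() */
}
```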

afs: mntput called before dput · 1a1a1a75

Authored by Andreas Gruenbacher on Sep 11, 2007

dput must be called before mntput here.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-By: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1a1a1a75

quota: fix infinite loop · 9c3013e9

Authored by Jan Kara on Sep 11, 2007

If we fail to start a transaction when releasing dquot, we have to call
dquot_release() anyway to mark dquot structure as inactive.  Otherwise we
end in an infinite loop inside dqput().
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: xb <xavier.bru@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c3013e9

ocfs2: Fix calculation of i_blocks during truncate · e535e2ef

Authored by Mark Fasheh on Aug 31, 2007

We were setting i_blocks too early - before truncating any allocation.
Correct things to set i_blocks after the allocation change.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

e535e2ef

[PATCH] ocfs2: Fix a wrong cluster calculation. · 30b8548f

Authored by tao.ma@oracle.com on Sep 06, 2007

In ocfs2_alloc_write_ctxt, the number of clusters written is calculated
from the byte length only. This causes problems if a write starts
near the end of one cluster and continues into a second cluster
while "len" is smaller than the cluster size. In that case, we actually
have to write 2 clusters.
So we have to take the start position into consideration as well.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

30b8548f
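
The arithmetic behind 30b8548f can be shown with two small helpers. This is a sketch with hypothetical names, assuming the cluster size is a power of two expressed as a bit shift; it is not the ocfs2 implementation.

```c
#include <stdint.h>

/* Length-only estimate: rounds len up to whole clusters, ignoring pos. */
uint32_t demo_clusters_by_len(uint32_t len, unsigned int cluster_bits)
{
        return (len + (1u << cluster_bits) - 1) >> cluster_bits;
}

/* Position-aware count: clusters touched by the range [pos, pos + len). */
uint32_t demo_clusters_spanned(uint64_t pos, uint32_t len,
                               unsigned int cluster_bits)
{
        uint64_t first, last;

        if (len == 0)
                return 0;
        first = pos >> cluster_bits;
        last = (pos + len - 1) >> cluster_bits;
        return (uint32_t)(last - first + 1);
}
```

For 4096-byte clusters, a 200-byte write starting at offset 4000 is one cluster by the length-only estimate but touches two clusters once the start position is considered.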

[PATCH] ocfs2: fix mount option parsing · c0123ade

Authored by Tiger Yang on Sep 08, 2007

For some mount option types, ocfs2_parse_options() will try to access
sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that
hasn't been allocated yet and will cause a kernel crash.

Fix this by storing options in a struct which can then get pushed into the
ocfs2_super once it's been allocated later. If we need more options which
store to the ocfs2_super in the future, we can just add fields to this struct.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

c0123ade

ocfs2: update docs for new features · 10b0845b

Authored by Mark Fasheh on Aug 23, 2007

Update documentation listing ocfs2 features to reflect the current state of
the file system. Add missing descriptions for some mount options which ocfs2
supports.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

10b0845b

11 Sep, 2007 · 2 commits

knfsd: Validate filehandle type in fsid_source · b8da0d1c

Authored by Neil Brown on Sep 05, 2007

fsid_source decides where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.

It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8da0d1c

knfsd: Fixed problem with NFS exporting directories which are mounted on. · a1033be7

Authored by Neil Brown on Sep 05, 2007

Recent changes in NFSd cause a directory which is mounted-on
to not appear properly when the filesystem containing it is exported.

*exp_get* now returns -ENOENT rather than NULL, and when commit 5d3dbbea
removed the NULL checks, it didn't add a check for -ENOENT.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a1033be7
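
The class of bug described in a1033be7 (a lookup that switches from returning NULL to returning an encoded error) can be sketched in userspace. The demo_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom and `demo_exp_get` is a hypothetical lookup, not the nfsd function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *demo_err_ptr(intptr_t err)
{
        return (void *)err;
}

static int demo_is_err(const void *p)
{
        return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static intptr_t demo_ptr_err(const void *p)
{
        return (intptr_t)p;
}

/* Hypothetical lookup that reports "not found" as an error pointer. */
void *demo_exp_get(int found, void *exp)
{
        return found ? exp : demo_err_ptr(-ENOENT);
}

/* Callers must test for the error pointer; a NULL test never fires. */
int demo_lookup_missing(const void *exp)
{
        return demo_is_err(exp) && demo_ptr_err(exp) == -ENOENT;
}
```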

05 Sep, 2007 · 6 commits

[XFS] fix nasty quota hashtable allocation bug · 5995cb7d

Authored by Eric Sandeen on Aug 16, 2007

This git mod: 77e4635a
converted to a "greedy" allocation interface, but for the quota hashtables
it switched from allocating XFS_QM_HASHSIZE (nr of elements)
xfs_dqhash_t's to allocating only XFS_QM_HASHSIZE *bytes* - quite a lot
smaller! Then when we converted hsize "back" to nr of elements (the
division line) hsize went to 0. This was leading to oopses when running
any quota tests on the Fedora 8 test kernel, but the problem has been
there for almost a year.

SGI-PV: 968837
SGI-Modid: xfs-linux-melb:xfs-kern:29354a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

5995cb7d
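
The units mix-up described in 5995cb7d (an element count passed where a byte count was expected) can be reproduced in a few lines. The bucket type and function names below are hypothetical; only the arithmetic mirrors the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte bucket standing in for the hash table element. */
struct demo_bucket {
        uint64_t head;
        uint64_t nelems;
        uint64_t version;
};

/* Buggy sizing: the element count is used as if it were a byte count. */
size_t demo_hsize_buggy(size_t nelems)
{
        size_t bytes = nelems;  /* should have been scaled by the element size */

        return bytes / sizeof(struct demo_bucket);
}

/* Fixed sizing: scale to bytes first, then convert back to elements. */
size_t demo_hsize_fixed(size_t nelems)
{
        size_t bytes = nelems * sizeof(struct demo_bucket);

        return bytes / sizeof(struct demo_bucket);
}
```

With 16 requested elements the buggy path divides 16 bytes by the 24-byte element size and ends up with a table of zero buckets.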

[XFS] fix sparse shadowed variable warnings · 265c1fac

Authored by Christoph Hellwig on Aug 16, 2007

- in xfs_probe_cluster rename the inner len to pg_len. There's no harm
  here because the outer len isn't used after the inner len comes into
  existence but it keeps the code clean.
- in xfs_da_do_buf remove the inner i because they don't overlap
  and they are both the same type.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29311a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

265c1fac

[XFS] fix ASSERT and ASSERT_ALWAYS · ee5c8023

Authored by Christoph Hellwig on Aug 16, 2007

- remove the != 0 inside the unlikely in ASSERT_ALWAYS because sparse now
  complains about comparisons between pointers and 0
- add a standalone ASSERT implementation because defining it to
  ASSERT_ALWAYS means the string is expanded before the token passing
  stringification. This way we get the actual content of the
  assertion in the assfail message and don't overflow sparse's
  stringification buffer leading to sparse error messages.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29310a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

ee5c8023
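
The preprocessor behaviour behind the second point of ee5c8023 can be demonstrated with two small macros. The names are hypothetical; the sketch only shows that an argument forwarded through another macro is expanded before it is stringified.

```c
#define DEMO_LIMIT (1 << 4)

/* Stringifies its argument exactly as written at the call site. */
#define DEMO_STR(expr) #expr

/* Forwards to DEMO_STR, so the argument is macro-expanded first. */
#define DEMO_STR_VIA(expr) DEMO_STR(expr)

const char *demo_direct(void)
{
        return DEMO_STR(x < DEMO_LIMIT);        /* keeps the macro name */
}

const char *demo_via(void)
{
        return DEMO_STR_VIA(x < DEMO_LIMIT);    /* shows the expansion */
}
```

An assertion macro defined in terms of another stringifying macro behaves like `DEMO_STR_VIA`, which is why a standalone definition reports the expression as the programmer wrote it.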

[XFS] Fix sparse warning in kmem_shake_allow · 34521c5e

Authored by Christoph Hellwig on Aug 16, 2007

We can't return a masked result of a __bitwise type. Compare it to 0 first
to keep the behaviour without the warning.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29309a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

34521c5e

[XFS] Fix sparse NULL vs 0 warnings · 4b80916b

Authored by Christoph Hellwig on Aug 16, 2007

Sparse now warns about comparing pointers to 0, so change all instances
where that happens to NULL instead.

SGI-PV: 968555
SGI-Modid: xfs-linux-melb:xfs-kern:29308a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>

4b80916b

[XFS] Set filestreams object timeout to something sane. · 8da22d7a

Authored by David Chinner on Aug 16, 2007

SGI-PV: 968554
SGI-Modid: xfs-linux-melb:xfs-kern:29303a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>

8da22d7a

03 Sep, 2007 · 1 commit

[JFFS2] fix write deadlock regression · fc0e0197

Authored by Jason Lunz on Sep 01, 2007

I've bisected the deadlock when many small appends are done on jffs2 down to
this commit:

commit 6fe6900e
Author: Nick Piggin <npiggin@suse.de>
Date:   Sun May 6 14:49:04 2007 -0700

    mm: make read_cache_page synchronous

    Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fixes 7
    possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
    block2mtd.  All depending on whether the filler is async and/or can return
    with a !uptodate page.

It introduced a wait to read_cache_page, as well as a
read_cache_page_async function equivalent to the old read_cache_page
without any callers.

Switching jffs2_gc_fetch_page to read_cache_page_async for the old
behavior makes the deadlocks go away, but maybe reintroduces the
use-before-uptodate problem? I don't understand the mm/fs interaction
well enough to say.

[It's fine. dwmw2.]
Signed-off-by: Jason Lunz <lunz@falooley.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

fc0e0197

01 Sep, 2007 · 9 commits

NFS: Fix a write request leak in nfs_invalidate_page() · 1b3b4a1a

Authored by Trond Myklebust on Aug 28, 2007

Ryusuke Konishi says:

The recent truncate_complete_page() clears the dirty flag from a page
before calling a_ops->invalidatepage(),
^^^^^^
static void
truncate_complete_page(struct address_space *mapping, struct page *page)
{
        ...
        cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at kernel 2.6.20

        if (PagePrivate(page))
                do_invalidatepage(page, 0);   ---> will call a_ops->invalidatepage()
        ...
}

and this prevents nfs_wb_page_priority() from calling
nfs_writepage_locked(), which is expected to handle the pending
request (=nfs_page) associated with the page.

int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
{
        ...
        if (clear_page_dirty_for_io(page)) {
                ret = nfs_writepage_locked(page, &wbc);
                if (ret < 0)
                        goto out;
        }
        ...
}

Since truncate_complete_page() will get rid of the page after
a_ops->invalidatepage() returns, the request (=nfs_page) associated
with the page becomes garbage in nfs_inode->nfs_page_tree.
------------------------

Fix this by ensuring that nfs_wb_page_priority() recognises that it may
also need to clear out non-dirty pages that have an nfs_page associated
with them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1b3b4a1a

NFS: change NFS mount error return when hostname/pathname too long · 7d1cca72

Authored by Chuck Lever on Aug 29, 2007

According to the mount(2) man page, the proper error return code for the
mount(2) system call when the special device name or the mounted-on
directory name is too long is ENAMETOOLONG.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7d1cca72

NFS: Off-by-one length error in string handling · 350c73af

Authored by Chuck Lever on Aug 29, 2007

The hostname was getting truncated in the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

350c73af

NFS: Return a real error code from mount(2) · fdc6e2c8

Authored by Chuck Lever on Aug 29, 2007

Don't filter the return code from the in-kernel rpcbind or NFS mount
clients.  Return the real error code so that callers of the new NFS
text-based mount API can apply a useful retry strategy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fdc6e2c8

NFS: mount option parser chokes on proto= · fdb66ff4

Authored by Chuck Lever on Aug 29, 2007

The new text-based NFS mount option parsing logic doesn't recognize any
valid transport protocols due to a silly mistake in the protocol token
matching logic.  This prevents basic mount requests such as:

   mount.nfs server:/export /mnt -o proto=tcp

from working with the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fdb66ff4

NFSv4: Ensure that we pass the correct dentry to nfs4_intent_set_file · deee9369

Authored by Trond Myklebust on Aug 27, 2007

This patch fixes an Oops that was reported by Gabriel Barazer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

deee9369

NFSv4: Fix a typo in _nfs4_do_open_reclaim · 65bbf6bd

Authored by Trond Myklebust on Aug 27, 2007

This should fix the following Oops reported by Jeff Garzik:

kernel BUG at fs/nfs/nfs4xdr.c:1040!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: nfs lockd sunrpc af_packet
ipv6 cpufreq_ondemand acpi_cpufreq battery floppy nvram sg snd_hda_intel
ata_generic snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_page_alloc e1000
firewire_ohci ata_piix i2c_core sr_mod cdrom sata_sil ahci libata sd_mod
scsi_mod ext3 jbd ehci_hcd uhci_hcd
Pid: 16353, comm: 10.10.10.1-recl Not tainted 2.6.23-rc3 #1
RIP: 0010:[<ffffffff88240980>] [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
RSP: 0018:ffff8100467c5c60  EFLAGS: 00010202
RAX: ffff81000f89b8b8 RBX: 00000000697a6f6d RCX: ffff81000f89b8b8
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8100467c5c80
RBP: ffff8100467c5c80 R08: ffff81000f89bc30 R09: ffff81000f89b83f
R10: 0000000000000001 R11: ffffffff881e79e0 R12: ffff81003cbd1808
R13: ffff81000f89b860 R14: ffff81005fc984e0 R15: ffffffff88240af0
FS:  0000000000000000(0000) GS:ffffffff8052a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002adb9e51a030 CR3: 000000007ea7e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process 10.10.10.1-recl (pid: 16353, threadinfo ffff8100467c4000, task ffff8100038ce780)
Stack:  ffff81004aeb6a40 ffff81003cbd1808 ffff81003cbd1808 ffffffff88240b5d
 ffff81000f89b8bc ffff81005fc984e8 ffff81000f89bc30 ffff81005fc984e8
 0000000300000000 0000000000000000 0000000000000000 ffff81003cbd1800
Call Trace:
 [<ffffffff88240b5d>] :nfs:nfs4_xdr_enc_open_noattr+0x6d/0x90
 [<ffffffff881e74b7>] :sunrpc:rpcauth_wrap_req+0x97/0xf0
 [<ffffffff88240af0>] :nfs:nfs4_xdr_enc_open_noattr+0x0/0x90
 [<ffffffff881df57a>] :sunrpc:call_transmit+0x18a/0x290
 [<ffffffff881e5e7b>] :sunrpc:__rpc_execute+0x6b/0x290
 [<ffffffff881dff76>] :sunrpc:rpc_do_run_task+0x76/0xd0
 [<ffffffff882373f6>] :nfs:_nfs4_proc_open+0x76/0x230
 [<ffffffff88237a2e>] :nfs:nfs4_open_recover_helper+0x5e/0xc0
 [<ffffffff88237b74>] :nfs:nfs4_open_recover+0xe4/0x120
 [<ffffffff88238e14>] :nfs:nfs4_open_reclaim+0xa4/0xf0
 [<ffffffff882413c5>] :nfs:nfs4_reclaim_open_state+0x55/0x1b0
 [<ffffffff882417ea>] :nfs:reclaimer+0x2ca/0x390
 [<ffffffff88241520>] :nfs:reclaimer+0x0/0x390
 [<ffffffff8024e59b>] kthread+0x4b/0x80
 [<ffffffff8020cad8>] child_rip+0xa/0x12
 [<ffffffff8024e550>] kthread+0x0/0x80
 [<ffffffff8020cace>] child_rip+0x0/0x12


Code: 0f 0b eb fe 48 89 ef c7 00 00 00 00 02 be 08 00 00 00 e8 79 
RIP  [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
 RSP <ffff8100467c5c60>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

65bbf6bd

NFS: Fix use of cancel_delayed_work_sync in nfs_release_automount_timer · 560aef74

Authored by Trond Myklebust on Aug 27, 2007

Doh! We can't use cancel_delayed_work_sync because we may have been called
from an unmount that was being performed by nfs_automount_task.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

560aef74

NFS: Fix the mount regression · e89a5a43

Authored by Trond Myklebust on Aug 31, 2007

This avoids the recent NFS mount regression (returning EBUSY when
mounting the same filesystem twice with different parameters).

The best I can do given the constraints appears to be to have the kernel
first look for a superblock that matches both the fsid and the
user-specified mount options, and then spawn off a new superblock if
that search fails.

Note that this is not the same as specifying nosharecache everywhere
since nosharecache will never attempt to match an existing superblock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e89a5a43