Commits · fe8bc91c4c30122b357d197117705cfd4fabaf28 · openanolis / cloud-kernel

11 Nov, 2009 · 2 commits

ext3: Wait for proper transaction commit on fsync · fe8bc91c

Committed by Jan Kara on Oct 16, 2009

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear the dirty bits without forcing a transaction
commit. Instead, we track which transaction last changed the inode and which
transaction last changed its allocation, and force that transaction to disk
on fsync.
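As a rough illustration of the approach, here is a minimal user-space model built on simplified hypothetical structures. `journal_model`, `inode_model`, `commit_up_to()` and `model_fsync()` are invented for this sketch and are not ext3 or jbd code. fsync picks the transaction to force from the recorded IDs, so it still commits after writeback has cleared the dirty bits. Transaction-ID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the ext3/jbd data structures. */
typedef unsigned int txn_id;

struct journal_model {
    txn_id running_tid;   /* transaction currently collecting changes */
    txn_id committed_tid; /* newest transaction known to be on disk */
};

struct inode_model {
    txn_id sync_tid;      /* last transaction that changed the inode */
    txn_id datasync_tid;  /* last transaction that changed its allocation */
    bool buffers_dirty;   /* can be cleared by writeback without a commit */
};

/* Record which transaction touched the inode (and its allocation). */
static void note_inode_change(const struct journal_model *j,
                              struct inode_model *inode, bool allocation)
{
    inode->sync_tid = j->running_tid;
    if (allocation)
        inode->datasync_tid = j->running_tid;
    inode->buffers_dirty = true;
}

/* pdflush-style writeback: dirty bits go away, nothing is committed. */
static void writeback(struct inode_model *inode)
{
    inode->buffers_dirty = false;
}

/* Force every transaction up to and including tid to disk. */
static void commit_up_to(struct journal_model *j, txn_id tid)
{
    if (tid <= j->committed_tid)
        return;                       /* already durable */
    j->committed_tid = tid;
    if (j->running_tid <= tid)
        j->running_tid = tid + 1;     /* a fresh transaction starts */
}

/* fsync uses the recorded tid, never the (possibly cleared) dirty bits. */
static void model_fsync(struct journal_model *j,
                        const struct inode_model *inode, bool datasync)
{
    txn_id needed = datasync ? inode->datasync_tid : inode->sync_tid;

    commit_up_to(j, needed);
    assert(j->committed_tid >= needed);
}
```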
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext3: retry failed direct IO allocations · ea0174a7

Committed by Eric Sandeen on Oct 12, 2009

On a 256M 4k block filesystem, doing this in a loop:

    dd if=/dev/zero of=test oflag=direct bs=1M count=64
    rm -f test

eventually leads to spurious ENOSPC:

    dd: writing `test': No space left on device

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.
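A minimal sketch of the retry pattern follows. It assumes the spurious ENOSPC comes from blocks freed by the previous `rm` that only become allocatable after the running transaction commits. `fs_model`, `should_retry_alloc()` and the limit of 3 retries are illustrative assumptions in the spirit of ext3's `ext3_should_retry_alloc()`, not its actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_ALLOC_RETRIES 3 /* assumption for this sketch */

/* Hypothetical filesystem state, not ext3's. */
struct fs_model {
    int free_blocks;      /* allocatable right now */
    int pending_release;  /* freed, but usable only after a commit */
    int commits;          /* number of forced commits */
};

static int try_alloc(struct fs_model *fs, int nblocks)
{
    if (fs->free_blocks < nblocks)
        return -ENOSPC;
    fs->free_blocks -= nblocks;
    return 0;
}

/*
 * Retry only if a commit could actually release blocks, and only a
 * bounded number of times.
 */
static bool should_retry_alloc(struct fs_model *fs, int *retries)
{
    if (fs->pending_release == 0 || (*retries)++ >= MAX_ALLOC_RETRIES)
        return false;
    fs->free_blocks += fs->pending_release;  /* "force" the commit */
    fs->pending_release = 0;
    fs->commits++;
    return true;
}

static int alloc_with_retry(struct fs_model *fs, int nblocks)
{
    int retries = 0;
    int err;

    do {
        err = try_alloc(fs, nblocks);
    } while (err == -ENOSPC && should_retry_alloc(fs, &retries));
    assert(err == 0 || err == -ENOSPC);
    return err;
}
```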

A similar patch went into ext4 (commit fbbf6945).
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>

08 Nov, 2009 · 2 commits

nilfs2: fix missing cleanup of gc cache on error cases · c083234f

Committed by Ryusuke Konishi on Nov 08, 2009

This fixes an -rc1 regression introduced by commit
1cf58fa8 ("nilfs2: shorten freeze period due to GC in write operation v3").

Although that patch moved the call to nilfs_ioctl_move_blocks() from
nilfs_ioctl_prepare_clean_segments() to nilfs_ioctl_clean_segments(), it
did not move the corresponding cleanup needed in the error case.

This patch moves the missing cleanup to the destination function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jiro SEKIBA <jir@unicus.jp>

nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks · 5399dd1f

Committed by Ryusuke Konishi on Nov 07, 2009

This fixes a kernel oops reported by Markus Trippelsdorf in the email
titled "[NILFS users] kernel Oops while running nilfs_cleanerd".

The oops was caused by a bug in the error path of the
nilfs_ioctl_move_blocks() function, which was inlined in
nilfs_ioctl_clean_segments().

nilfs_ioctl_move_blocks() checks for duplicates among the blocks to be
moved by garbage collection.  However, the check should have been done
within nilfs_ioctl_move_inode_block() to prevent list corruption among
the buffers storing the target blocks.

To fix the kernel oops, this patch moves the duplicate check ahead of
the list insertion.
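The ordering fix can be sketched with a plain singly linked list. The names below are hypothetical and the list stands in for the GC buffer list. The point is only that the duplicate check runs before anything is linked, so a failure leaves nothing half-inserted for the error path to unwind. Returning `-EEXIST` for a duplicate is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical list of blocks queued for GC; not nilfs2's structures. */
struct gc_block {
    unsigned long blocknr;
    struct gc_block *next;
};

struct gc_list {
    struct gc_block *head;
    int count;
};

static bool gc_list_contains(const struct gc_list *l, unsigned long blocknr)
{
    for (const struct gc_block *b = l->head; b; b = b->next)
        if (b->blocknr == blocknr)
            return true;
    return false;
}

/*
 * Check for a duplicate *before* linking the block in.  On failure the
 * list is untouched, so the caller has nothing half-inserted to undo.
 */
static int gc_list_add_once(struct gc_list *l, unsigned long blocknr)
{
    struct gc_block *b;

    if (gc_list_contains(l, blocknr))
        return -EEXIST;
    b = malloc(sizeof(*b));
    if (!b)
        return -ENOMEM;
    b->blocknr = blocknr;
    b->next = l->head;
    l->head = b;
    l->count++;
    return 0;
}

static void gc_list_free(struct gc_list *l)
{
    while (l->head) {
        struct gc_block *b = l->head;

        l->head = b->next;
        free(b);
        l->count--;
    }
    assert(l->count == 0);
}
```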

I also tested this for stable trees [2.6.30, 2.6.31].
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>

07 Nov, 2009 · 2 commits

cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible · f475f677

Committed by Jeff Layton on Nov 06, 2009

Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to
verify the accessibility of the root inode and then falls back to doing a
full QPathInfo if that fails with -EOPNOTSUPP. I have at least one report
of a server that returns NT_STATUS_INTERNAL_ERROR rather than something
that translates to EOPNOTSUPP.

Rather than trying to be clever with that call, just have
is_path_accessible do a normal QPathInfo. That call is widely
supported and it shouldn't increase the overhead significantly.

Cc: Stable <stable@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

cifs: clean up handling when server doesn't consistently support inode numbers · ec06aedd

Committed by Jeff Layton on Nov 06, 2009

It's possible that a server will return a valid FileID when we query the
FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers
when we do a FindFile with an infolevel of
SMB_FIND_FILE_ID_FULL_DIR_INFO.

In this situation, turn off querying for server inode numbers, warn the
user, and generate inode numbers with iunique instead. Once we generate
any inode number with iunique we can no longer use any server inode
numbers or we risk collisions, so ensure that we don't do that in
cifs_get_inode_info either.
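A minimal sketch of the policy described above, built around a hypothetical per-mount flag. The counter in `local_unique_ino()` is a deliberately simplified stand-in for `iunique()`. In the kernel, iunique() also checks that the number is not already in use, which a bare counter does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state; not the cifs superblock fields. */
struct mount_model {
    bool use_server_ino;           /* still trusting server inode numbers? */
    unsigned long next_local_ino;  /* simplified stand-in for iunique() */
};

/* Very simplified stand-in for iunique(): a plain counter. */
static unsigned long local_unique_ino(struct mount_model *m)
{
    return m->next_local_ino++;
}

static unsigned long choose_ino(struct mount_model *m, unsigned long server_ino)
{
    if (m->use_server_ino && server_ino == 0) {
        /* Server returned a zeroed inode number: stop trusting them. */
        fprintf(stderr, "server inode numbers unreliable, using local ones\n");
        m->use_server_ino = false;
    }
    /*
     * Once any number came from the local generator, never switch back:
     * mixing the two number spaces could produce collisions.
     */
    if (m->use_server_ino)
        return server_ino;
    return local_unique_ino(m);
}
```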

Cc: Stable <stable@kernel.org>
Reported-by: Timothy Normand Miller <theosib@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

05 Nov, 2009 · 1 commit

sysfs: Don't leak secdata when a sysfs_dirent is freed. · 4c3da220

Committed by Eric W. Biederman on Nov 04, 2009

While refreshing my sysfs patches I noticed a leak in the secdata
implementation.  We don't free the secdata when we free the
sysfs dirent.

This is a bug in 2.6.32-rc5 that we really should close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

04 Nov, 2009 · 4 commits

x86, fs: Fix x86 procfs stack information for threads on 64-bit · 89240ba0

Committed by Stefani Seibold on Nov 03, 2009

This patch fixes two issues in the procfs stack information on
x86-64 Linux.

The 32-bit loader compat_do_execve() did not store the stack
start (this was figured out by Alexey Dobriyan).

The stack information on an x86_64 kernel always showed 0 kB of
stack usage, because of a missing implementation of the KSTK_ESP
macro, which always returned -1.

The new implementation now returns the correct value.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fuse: invalidate target of rename · 5219f346

Committed by Miklos Szeredi on Nov 04, 2009

Invalidate the target's attributes, which may have changed (such as
nlink, change time) so that they are refreshed on the next getattr().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

fuse: fix kunmap in fuse_ioctl_copy_user · 0bd87182

Committed by Jens Axboe on Nov 03, 2009

Looks like another victim of the confusing kmap() vs kmap_atomic() API
differences.
Reported-by: Todor Gyumyushev <yodor1@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org

fuse: prevent fuse_put_request on invalid pointer · f60311d5

Committed by Anand V. Avati on Oct 22, 2009

fuse_direct_io() has a loop where a request is allocated in each
iteration. If the allocation fails, the loop breaks out and falls
through to an unconditional fuse_put_request() on that invalid pointer.
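The failure mode can be modeled in user space. This sketch re-implements a tiny version of the kernel's `ERR_PTR()`/`IS_ERR()` convention locally. The lowercase helpers are hypothetical stand-ins, and converting integers to pointers is implementation-defined, though it behaves as expected on common platforms. `get_req()`, `put_req()` and `direct_io()` are illustrative and only mirror the loop shape described above; `nchunks` is assumed to be at least 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct req { int chunk; };
static int puts_done; /* counts put_req() calls */

/* Allocate a request; fail deliberately at chunk index fail_at. */
static struct req *get_req(int chunk, int fail_at)
{
    struct req *r;

    if (chunk == fail_at)
        return err_ptr(-ENOMEM);
    r = malloc(sizeof(*r));
    if (!r)
        return err_ptr(-ENOMEM);
    r->chunk = chunk;
    return r;
}

static void put_req(struct req *r)
{
    assert(!is_err(r)); /* putting an error pointer is the bug */
    free(r);
    puts_done++;
}

/* Returns chunks "transferred", or a negative errno if none were. */
static long direct_io(int nchunks, int fail_at)
{
    long done = 0;
    int i = 0;
    struct req *req = get_req(i, fail_at);

    if (is_err(req))
        return ptr_err(req);
    for (;;) {
        done++;                      /* send chunk i using req */
        if (++i == nchunks)
            break;
        put_req(req);
        req = get_req(i, fail_at);   /* may fail mid-loop */
        if (is_err(req))
            break;
    }
    if (!is_err(req))                /* the fix: skip the put on error */
        put_req(req);
    return done;
}
```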
Signed-off-by: Anand V. Avati <avati@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@kernel.org

03 Nov, 2009 · 4 commits

nilfs2: add zero-fill for new btree node buffers · 05b4358a

Committed by Ryusuke Konishi on Sep 14, 2009

Adds the missing initialization of newly allocated b-tree node buffers.
This prevents garbage data from getting mixed into b-tree node blocks.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: fix irregular checkpoint creation due to data flush · aeda7f63

Committed by Ryusuke Konishi on Nov 02, 2009

When nilfs flushes out dirty data to reduce memory pressure, creation
of checkpoints is wrongly postponed.  This bug causes irregular
checkpoint creation, especially on small-footprint systems.

To correct this, the checkpoint creation timer has to keep running
when a log writer does not create a checkpoint.

This patch makes that correction.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: fix dirty page accounting leak causing hang at write · b1e19e56

Committed by Ryusuke Konishi on Nov 03, 2009

Bruno Prémont and Bill Dunphy reported to me that NILFS reliably
hangs on ARM-based targets.

I found this was caused by an underflow of the dirty pages counter.  A
b-tree cache routine was marking pages dirty without adjusting the page
accounting information.

This fixes the dirty page accounting leak and resolves the hang on
ARM-based targets.
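The following model shows how a counter underflows when one path marks a page dirty without accounting for it while writeback still decrements. All names are hypothetical. It does not model how the underflow turns into a hang, only how the counter goes negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model with a global dirty-page counter. */
struct page_model { bool dirty; };
static long nr_dirty; /* plays the role of the dirty-page statistics */

/* Buggy variant: flips the flag but skips the accounting. */
static void set_dirty_unaccounted(struct page_model *p)
{
    p->dirty = true;
}

/* Accounted variant: only a clean->dirty transition bumps the counter. */
static void set_dirty_accounted(struct page_model *p)
{
    if (!p->dirty) {
        p->dirty = true;
        nr_dirty++;
    }
}

/* Writeback assumes every dirty page was counted when it was dirtied. */
static void clear_dirty_accounted(struct page_model *p)
{
    if (p->dirty) {
        p->dirty = false;
        nr_dirty--;
    }
    assert(!p->dirty);
}
```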
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable <stable@kernel.org>

Revert "ext4: Remove journal_checksum mount option and enable it by default" · d4da6c9c

Committed by Linus Torvalds on Nov 02, 2009

This reverts commit d0646f7b, as
requested by Eric Sandeen.

It can basically cause an ext4 filesystem to miss recovery (and thus get
mounted with errors) if the journal checksum does not match.

Quoth Eric:

   "My hand-wavy hunch about what is happening is that we're finding a
    bad checksum on the last partially-written transaction, which is
    not surprising, but if we have a wrapped log and we're doing the
    initial scan for head/tail, and we abort scanning on that bad
    checksum, then we are essentially running an unrecovered filesystem.

    But that's hand-wavy and I need to go look at the code.

    We lived without journal checksums on by default until now, and at
    this point they're doing more harm than good, so we should revert
    the default-changing commit until we can fix it and do some good
    power-fail testing with the fixes in place."

See

	http://bugzilla.kernel.org/show_bug.cgi?id=14354

for all the gory details.
Requested-by: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Alexey Fisher <bug-track@fisher-privat.net>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mathias Burén <mathias.buren@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Nov, 2009 · 5 commits

9p: fix readdir corner cases · 3e2796a9

Committed by Eric Van Hensbergen on Nov 02, 2009

The patch below also addresses a couple of other corner cases in readdir
seen with a large (e.g. 64k) msize. I'm not sure what people think of
my co-opting of fid->aux here. I'd be happy to rework it if there's a
better way.

When the user-supplied buffer passed to readdir is smaller than the
data returned in one go by the 9P read request, v9fs_dir_readdir()
currently discards the extra data, so that on the next call a 9P read
request is issued with offset < previous offset + bytes returned,
which violates the constraint described in paragraph 3 of the read(5)
description. This patch preserves the leftover data in fid->aux for use
in the next call.
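Here is a byte-level sketch of the leftover-buffer idea, with stated assumptions. `REPLY_MAX` stands in for the reply size, `dir_state` plays the role of the state kept in fid->aux, and `server_read()` is a fake server. A real readdir hands out whole directory entries rather than bytes, so this only models the offset bookkeeping: the next read is always issued at previous offset + bytes returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REPLY_MAX 8 /* assumed size of one read reply (stands in for msize) */

/* Hypothetical per-fid state: leftover bytes of the last reply. */
struct dir_state {
    char buf[REPLY_MAX];
    size_t head, tail;   /* unread bytes are buf[head..tail) */
    long next_offset;    /* offset for the next read request */
};

/* Fake 9P read: up to REPLY_MAX bytes of data at offset. */
static size_t server_read(const char *data, size_t len, long offset, char *out)
{
    size_t n;

    if ((size_t)offset >= len)
        return 0;
    n = len - (size_t)offset;
    if (n > REPLY_MAX)
        n = REPLY_MAX;
    memcpy(out, data + offset, n);
    return n;
}

/* Fill the caller's buffer, serving leftover bytes before reading more. */
static size_t readdir_chunk(struct dir_state *st, const char *data, size_t len,
                            char *user, size_t ulen)
{
    size_t copied = 0;

    while (copied < ulen) {
        size_t avail, take;

        if (st->head == st->tail) {
            size_t n = server_read(data, len, st->next_offset, st->buf);

            if (n == 0)
                break;                    /* end of directory */
            st->next_offset += (long)n;
            st->head = 0;
            st->tail = n;
        }
        avail = st->tail - st->head;
        take = avail < ulen - copied ? avail : ulen - copied;
        memcpy(user + copied, st->buf + st->head, take);
        st->head += take;                 /* keep the rest for next call */
        copied += take;
    }
    assert(st->head <= st->tail);
    return copied;
}
```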
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

9p: fix readlink · 2511cd0b

Committed by Martin Stava on Nov 02, 2009

I do not know if you've looked at the patch, but unfortunately it is
incorrect. A suggested better version is in this email. (The old
version didn't work when the user-provided buffer was not long
enough: it incorrectly wrote a null byte in the position of the last
character, and thus broke the contract of the readlink method.) However,
I'm still not sure this is 100% the correct thing to do. I think readlink
is supposed to return the buffer without a trailing null byte in all
cases, but we do return a trailing null byte (as did the old version).
On the other hand, what is in the remaining part of the buffer is likely
unspecified, so a null character may be fine there ;)
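For reference, POSIX readlink(2) copies at most `bufsiz` bytes, does not append a terminating null byte, and returns the number of bytes placed in the buffer. The hypothetical helper below follows that contract. It is not the 9p patch itself, only an illustration of the truncation rule the old version broke by overwriting the last character with a null byte.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy a symlink target into a caller buffer with readlink(2)
 * semantics: at most bufsiz bytes, no terminating NUL, return count.
 */
static size_t copy_link_target(const char *target, char *buf, size_t bufsiz)
{
    size_t len = strlen(target);

    if (len > bufsiz)
        len = bufsiz;          /* silently truncate, as readlink(2) does */
    memcpy(buf, target, len);  /* note: no NUL is written */
    assert(len <= bufsiz);
    return len;
}
```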
Signed-off-by: Martin Stava <martin.stava@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

9p: fix a small bug in readdir for long directories · f91b9099

Committed by Martin Stava on Nov 02, 2009

Here is a proposed patch for a bug in readdir. Listing directories with
many files fails without this patch.
Signed-off-by: Martin Stava <martin.stava@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

Fix bio_alloc() and bio_kmalloc() documentation · 5f04eeb8

Committed by Alberto Bertogli on Nov 02, 2009

Commit 451a9ebf accidentally broke the bio_alloc() and bio_kmalloc()
comments by (almost) swapping them.

This patch fixes that by placing the comments in the right place.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio_put(): add bio_clone() to the list of functions in the comment · ad0bf110

Committed by Alberto Bertogli on Nov 02, 2009

In bio_put()'s comment, add bio_clone() to the list of functions that can
give you a bio reference.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

30 Oct, 2009 · 3 commits

xfs: fix xfs_quota remove error · c7ff91d7

Committed by Ryota Yamauchi on Oct 30, 2009

xfs_quota returns ENOSYS when the remove command is executed.
This is reproducible with the following steps:

    # mount -t xfs -o uquota /dev/sda7 /mnt/mp1
    # xfs_quota -x -c off -c remove
    XFS_QUOTARM: Function not implemented.

The remove command is allowed during quotaoff, but xfs_fs_set_xstate()
checks whether quota is running, which leads to ENOSYS.

To solve this problem, add a check for X_QUOTARM.
Signed-off-by: Ryota Yamauchi <r-yamauchi@vf.jp.nec.com>
Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

xfs: free temporary cursor in xfs_dialloc · 3b826386

Committed by Eric Sandeen on Oct 30, 2009

Commit bd169565 seems to have a slight regression where this code path:

    if (!--searchdistance) {
        /*
         * Not in range - save last search
         * location and allocate a new inode
         */
        ...
        goto newino;
    }

doesn't free the temporary cursor (tcur) that got dup'd in
this function.

This leaks an item in the xfs_btree_cur zone, and it's caught
on module unload:

===========================================================
BUG xfs_btree_cur: Objects remaining on kmem_cache_close()
-----------------------------------------------------------

It seems like maybe a single free at the end of the function might
be cleaner, but for now put a del_cursor right in this code block
similar to the handling in the rest of the function.
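Here is a small model of the fix, using a hypothetical cursor type and a live-object counter standing in for the `xfs_btree_cur` zone. The early exit frees the duplicated cursor before the `goto`, mirroring the "del_cursor right in this code block" approach described above. `searchdistance` is assumed to start positive.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical btree cursor plus a live-object counter (the "zone"). */
struct cursor { int pos; };
static int live_cursors;

static struct cursor *cursor_dup(const struct cursor *c)
{
    struct cursor *t = malloc(sizeof(*t));

    if (t) {
        *t = *c;
        live_cursors++;
    }
    return t;
}

static void cursor_del(struct cursor *c)
{
    free(c);
    live_cursors--;
    assert(live_cursors >= 0);
}

/* 0: target found within searchdistance steps; 1: take the new-inode path. */
static int search_near(struct cursor *cur, int searchdistance, int target)
{
    struct cursor *tcur = cursor_dup(cur);

    if (!tcur)
        return -ENOMEM;
    while (tcur->pos != target) {
        if (!--searchdistance) {
            /* Not in range: free the duplicate before jumping away. */
            cursor_del(tcur);
            goto newino;
        }
        tcur->pos++;
    }
    cur->pos = tcur->pos;
    cursor_del(tcur);
    return 0;
newino:
    return 1;
}
```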
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>

powerpc: Cleanup Kconfig selection of hugetlbfs support · 5a1eb5c4

Committed by Benjamin Herrenschmidt on Oct 30, 2009

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

29 Oct, 2009 · 3 commits

hfs: fix oops on mount with corrupted btree extent records · 47f365eb

Committed by Jeff Mahoney on Oct 26, 2009

A particular fsfuzzer run caused an hfs file system to crash on mount.
This is due to a corrupted MDB extent record causing a miscalculation of
HFS_I(inode)->first_blocks for the extent tree.  If the extent records are
zeroed out, it won't trigger the first_blocks special case.  Instead it
falls through to the extent code, which we're still in the middle of
initializing.

This patch catches the 0 size extent records, reports the corruption, and
fails the mount.
Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hfsplus: refuse to mount volumes larger than 2TB · 5c36fe3d

Committed by Ben Hutchings on Oct 26, 2009

As found in <http://bugs.debian.org/550010>, hfsplus is using type u32
rather than sector_t for some sector number calculations.

In particular, hfsplus_get_block() does:

        u32 ablock, dblock, mask;
...
        map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));

I am not confident that I can find and fix all cases where a sector number
may be truncated.  For now, avoid data loss by refusing to mount HFS+
volumes with more than 2^32 sectors (2TB).
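The truncation is easy to reproduce in plain C. This sketch assumes 4 KiB allocation blocks and 512-byte sectors, so the shift is 3. The function names and the `-EFBIG` error are illustrative choices, not hfsplus code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Sector computation as in the excerpt above: everything in 32 bits. */
static uint32_t sector_u32(uint32_t dblock, unsigned int shift, uint32_t offset)
{
    assert(shift < 32);
    return (dblock << shift) + offset;   /* silently wraps past 2^32 */
}

/* Same computation widened to 64 bits before shifting. */
static uint64_t sector_u64(uint32_t dblock, unsigned int shift, uint64_t offset)
{
    return ((uint64_t)dblock << shift) + offset;
}

/* Refuse volumes whose sector numbers would not fit in 32 bits. */
static int check_volume_sectors(uint64_t nsectors)
{
    if (nsectors > (uint64_t)UINT32_MAX + 1)
        return -EFBIG;   /* error code chosen for this sketch */
    return 0;
}
```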

[akpm@linux-foundation.org: fix 32 and 64-bit issues]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison: fix/proc/meminfo alignment · 370c28de

Committed by Hugh Dickins on Oct 26, 2009

Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted
line is being shown too far right (it does align with x86_64's VmallocChunk
above, but I hope nobody will ever have that much corrupted!). Align it.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27 Oct, 2009 · 2 commits

powerpc: Limit hugetlbfs support to PPC64 Book-3S machines · 0cd9ad73

Committed by Kumar Gala on Oct 16, 2009

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations. · ffb4a73d

Committed by Paul Mundt on Oct 27, 2009

The hugetlb dependencies presently depend on SUPERH && MMU while the
hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This
unfortunately allows SH-3 + MMU configurations to enable hugetlbfs
without a corresponding HPAGE_SHIFT definition, resulting in the build
blowing up.

As SH-3 doesn't support variable page sizes, we tighten up the
dependencies a bit to prevent hugetlbfs from being enabled. These days
we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using
that rather than adding to the list of corner cases in fs/Kconfig.
Reported-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

26 Oct, 2009 · 3 commits

block: use after free bug in __blkdev_get · 960cc0f4

Committed by Neil Brown on Oct 26, 2009

Commit 0762b8bd (from 14 months ago) introduced a use-after-free bug
which has only recently started manifesting in my md testing.
I tried git bisect to find out what caused the bug to start
manifesting, and it could have been the recent change to
blk_unregister_queue (48c0d4d4), but the results were inconclusive.

This patch certainly fixes my symptoms and looks correct as the two
calls are now in the same order as elsewhere in that function.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

NFSv4: The link() operation should return any delegation on the file · 9a3936aa

Committed by Trond Myklebust on Oct 26, 2009

Otherwise, we have to wait for the server to recall it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix two unbalanced put_rpccred() issues. · 141aeb9f

Committed by Trond Myklebust on Oct 26, 2009

Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence
operation) introduce a couple of put_rpccred() calls on credentials for
which there is no corresponding get_rpccred().

See http://bugzilla.kernel.org/show_bug.cgi?id=14249
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24 Oct, 2009 · 2 commits

NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE · 52567b03

Committed by Trond Myklebust on Oct 23, 2009

RFC 3530 states that when we receive the error NFS4ERR_RESOURCE, we are not
supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc.
operations. The problem is that we map that error into EREMOTEIO in the XDR
layer, and so NFSv4 middle-layer routines like seqid_mutating_err()
and nfs_increment_seqid() don't recognise it.

The fix is to defer the mapping until after the middle layers have
processed the error.
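Here is a simplified model of why the ordering matters. Only `NFS4ERR_RESOURCE` (10018, as defined in RFC 3530) is treated as non-seqid-mutating, whereas the real seqid_mutating_err() considers several status codes. The helper names are hypothetical, and the `EREMOTEIO` fallback value is only for platforms that lack this Linux-specific errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121   /* Linux value; fallback for other platforms */
#endif

#define NFS4ERR_RESOURCE 10018   /* status code defined in RFC 3530 */

/* Simplified: only NFS4ERR_RESOURCE leaves the seqid alone here. */
static bool seqid_mutating(int nfs4_status)
{
    return nfs4_status != NFS4ERR_RESOURCE;
}

struct seqid_model { unsigned int counter; };

/* Middle layer: decides about the seqid from the status it is given. */
static void increment_seqid(struct seqid_model *s, int status)
{
    if (status == 0 || seqid_mutating(status))
        s->counter++;
}

/* Map an NFSv4 status to a local errno. */
static int map_status(int nfs4_status)
{
    if (nfs4_status == 0)
        return 0;
    return nfs4_status == NFS4ERR_RESOURCE ? -EREMOTEIO : -EIO;
}

/* Fixed ordering: middle layer sees the raw status, mapping comes last. */
static int finish_seqid_op(struct seqid_model *s, int nfs4_status)
{
    increment_seqid(s, nfs4_status);
    return map_status(nfs4_status);
}

/* Buggy ordering: the middle layer only ever sees the mapped errno. */
static int finish_seqid_op_buggy(struct seqid_model *s, int nfs4_status)
{
    int err = map_status(nfs4_status);

    increment_seqid(s, err);   /* -EREMOTEIO is not recognised */
    return err;
}
```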
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: Panic when commit fails · a8b40bc7

Committed by Terry Loftin on Oct 22, 2009

Actually pass the NFS_FILE_SYNC option to the server to avoid a
panic in nfs_direct_write_complete() when a commit fails.

At the end of an NFS write, if the NFS commit fails, all the writes
are rescheduled. They are supposed to be rescheduled as NFS_FILE_SYNC
writes, but the rpc_task structure is not completely initialized, so
the option is not passed. When the rescheduled writes complete, the
reply indicates that they are NFS_UNSTABLE, and we try to do another
commit. This leads to a panic because the commit data structure pointer
was set to NULL in the initial (failed) commit attempt.
Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

22 Oct, 2009 · 2 commits

nfs: Fix nfs_parse_mount_options() kfree() leak · 4223a4a1

Committed by Yinghai Lu on Oct 20, 2009

Fix a (small) memory leak in one of the error paths of the NFS mount
options parsing code.

Regression introduced in 2.6.30 by commit a67d18f8 (NFS: load the
rpc/rdma transport module automatically).
Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs: pipe.c null pointer dereference · ad396024

Committed by Earl Chew on Oct 19, 2009

This patch fixes a NULL pointer dereference in pipe_rdwr_open(), which
generates the following stack trace:

> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
>  [<ffffffff8028125c>] __dentry_open+0x13c/0x230
>  [<ffffffff8028143d>] do_filp_open+0x2d/0x40
>  [<ffffffff802814aa>] do_sys_open+0x5a/0x100
>  [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67

The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:

=============================================================
while : ; do
   { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
   PID=$!
   OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
        { read PID REST ; echo $PID; } )
   OUT="${OUT%% *}"
   DELAY=$((RANDOM * 1000 / 32768))
   usleep $((DELAY * 1000 + RANDOM % 1000 ))
   echo n > /proc/$OUT/fd/1                 # Trigger defect
done
=============================================================

Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:

 static int
 pipe_rdwr_open(struct inode *inode, struct file *filp)
 {
       msleep(100);
       mutex_lock(&inode->i_mutex);

Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.

The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.

The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.
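The shape of the fix can be modeled in user space with a pthread mutex standing in for inode->i_mutex. The structures and the `rd`/`wr` flags (in place of the file's read/write mode bits) are hypothetical. The essential part is that i_pipe is checked only while the lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the inode and pipe structures. */
struct pipe_info_model { int readers, writers; };

struct inode_model {
    pthread_mutex_t i_mutex;
    struct pipe_info_model *i_pipe;  /* NULL once the pipe is released */
};

/*
 * Take the lock first, then check that i_pipe still exists; report
 * -ENOENT instead of dereferencing a released pipe.
 */
static int pipe_rdwr_open_model(struct inode_model *inode, bool rd, bool wr)
{
    int ret = -ENOENT;

    assert(inode != NULL);
    pthread_mutex_lock(&inode->i_mutex);
    if (inode->i_pipe) {
        ret = 0;
        if (rd)
            inode->i_pipe->readers++;
        if (wr)
            inode->i_pipe->writers++;
    }
    pthread_mutex_unlock(&inode->i_mutex);
    return ret;
}
```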
Signed-off-by: Earl Chew <earl_chew@agilent.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Oct, 2009 · 1 commit

dnotify: ignore FS_EVENT_ON_CHILD · 94552684

Committed by Andreas Gruenbacher on Oct 15, 2009

Mask off FS_EVENT_ON_CHILD in dnotify_handle_event().  Otherwise, when there
is more than one watch on a directory and dnotify_should_send_event()
succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
spurious events.

This case was overlooked in commit e42e2773.

	#define _GNU_SOURCE

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <signal.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <string.h>

	static void create_event(int s, siginfo_t* si, void* p)
	{
		printf("create\n");
	}

	static void delete_event(int s, siginfo_t* si, void* p)
	{
		printf("delete\n");
	}

	int main (void) {
		struct sigaction action;
		char *tmpdir, *file;
		int fd1, fd2;

		sigemptyset (&action.sa_mask);
		action.sa_flags = SA_SIGINFO;

		action.sa_sigaction = create_event;
		sigaction (SIGRTMIN + 0, &action, NULL);

		action.sa_sigaction = delete_event;
		sigaction (SIGRTMIN + 1, &action, NULL);

	#	define TMPDIR "/tmp/test.XXXXXX"
		tmpdir = malloc(strlen(TMPDIR) + 1);
		strcpy(tmpdir, TMPDIR);
		mkdtemp(tmpdir);

	#	define TMPFILE "/file"
		file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
		sprintf(file, "%s%s", tmpdir, TMPFILE);

		fd1 = open (tmpdir, O_RDONLY);
		fcntl(fd1, F_SETSIG, SIGRTMIN);
		fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);

		fd2 = open (tmpdir, O_RDONLY);
		fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
		fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);

		if (fork()) {
			/* This triggers a create event */
			creat(file, 0600);
			/* This triggers a create and delete event (!) */
			unlink(file);
		} else {
			sleep(1);
			rmdir(tmpdir);
		}

		return 0;
	}
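The effect of masking can be shown with plain bit arithmetic. The bit values below are illustrative only. The model assumes, as the description implies, that every directory watch mask includes the on-child flag. Without the mask-off, a child event then matches every watch, which is the spurious delete notification the program above triggers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch only (not the kernel's). */
#define EV_CREATE    0x1u
#define EV_DELETE    0x2u
#define EV_ON_CHILD  0x80000000u

/* Before: the shared on-child bit alone makes any watch match. */
static bool watch_fires_buggy(uint32_t watch_mask, uint32_t event_mask)
{
    return (watch_mask & event_mask) != 0;
}

/* After: mask the flag off before comparing. */
static bool watch_fires(uint32_t watch_mask, uint32_t event_mask)
{
    event_mask &= ~EV_ON_CHILD;
    return (watch_mask & event_mask) != 0;
}
```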
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

19 Oct, 2009 · 3 commits

HWPOISON: fix/proc/meminfo alignment · 5d5429af

Committed by Hugh Dickins on Oct 13, 2009

inotify: fix coalesce duplicate events into a single event in special case · 3de0ef4f

Committed by Wei Yongjun on Oct 14, 2009

If we rename a directory entry like this:

  rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2")
  rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ")

the duplicate events should be coalesced into a single event. But those two
events are not coalesced, due to a faulty check in event_compare(): it
cannot match two NULL inodes as the same event.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: do not set group for a mark before it is on the i_list · 9f0d793b

Committed by Eric Paris on Sep 11, 2009

fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to
set the group and inode for the mark.  fsnotify_destroy_mark_by_entry uses
the fact that ->group != NULL to know if this group should be destroyed or
if it's already been done.

But fsnotify_add_mark sets the group and inode before it actually adds the
mark to the i_list and g_list.  This can result in a race in inotify that
requires three threads:

sys_inotify_add_watch("file")	sys_inotify_add_watch("file")	sys_inotify_rm_watch([a])
inotify_update_watch()
inotify_new_watch()
inotify_add_to_idr()
   ^--- returns wd = [a]
				inotify_update_watch()
				inotify_new_watch()
				inotify_add_to_idr()
				fsnotify_add_mark()
				   ^--- returns wd = [b]
				returns to userspace;
								inotify_idr_find([a])
								   ^--- gives us the pointer from task 1
fsnotify_add_mark()
   ^--- this is going to set the mark->group and mark->inode fields, but will
return -EEXIST because of the race with [b].
								fsnotify_destroy_mark()
								   ^--- since ->group != NULL we call back
									into inotify_freeing_mark() which calls
								inotify_remove_from_idr([a])

since fsnotify_add_mark() failed we call:
inotify_remove_from_idr([a])     <------WHOOPS it's not in the idr, this could
					have been any entry added later!

The fix is to make sure we don't set mark->group until we are sure the mark is
on the inode and fsnotify_add_mark will return success.
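Here is a single-threaded sketch of the ordering rule, using a hypothetical `mark_model` with only the two fields discussed. It does not model the locking or the three-thread race itself. It only shows that publishing `group` last keeps a failed add invisible to destroy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mark; only the fields discussed above. */
struct mark_model {
    void *group;   /* non-NULL means "attached; destroy must clean up" */
    void *inode;
};

/*
 * Attach the mark.  mark->group is published only after every step that
 * can fail has succeeded, so a failed add leaves group == NULL.
 */
static int add_mark(struct mark_model *mark, void *group, void *inode,
                    bool inode_already_marked)
{
    if (inode_already_marked)
        return -EEXIST;              /* nothing published yet */
    /* ... link into the inode's and group's lists (not modeled) ... */
    mark->group = group;
    mark->inode = inode;
    return 0;
}

/* Returns true if this call performed the teardown (and callbacks). */
static bool destroy_mark(struct mark_model *mark)
{
    if (mark->group == NULL)
        return false;                /* never attached, or already gone */
    mark->group = NULL;
    mark->inode = NULL;
    return true;
}
```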
Signed-off-by: Eric Paris <eparis@redhat.com>

15 Oct, 2009 · 1 commit

sysfs: Allow sysfs_notify_dirent to be called from interrupt context. · 83db93f4

Committed by Neil Brown on Sep 15, 2009

sysfs_notify_dirent is a simple atomic operation that can be used to
alert user-space that new data can be read from a sysfs attribute.

Unfortunately it cannot currently be called from non-process context
because of its use of spin_lock which is sometimes taken with
interrupts enabled.

So change all lockers of sysfs_open_dirent_lock to disable interrupts,
thus making sysfs_notify_dirent safe to be called from non-process
context (as drivers/md does in md_safemode_timeout).

sysfs_get_open_dirent is (documented as being) only called from
process context, so it uses spin_lock_irq.  Other places
use spin_lock_irqsave.

The usage for sysfs_notify_dirent in md_safemode_timeout was
introduced in 2.6.28, so this patch is suitable for that and more
recent kernels.
Reported-by: Joel Andres Granados <jgranado@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

openanolis / cloud-kernel · synced successfully about 1 year ago
