提交 · e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1 · openeuler / raspberrypi-kernel

23 5月, 2009 1 次提交

block: Do away with the notion of hardsect_size · e1defc4f

由 Martin K. Petersen 提交于 5月 22, 2009

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block sized used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case.  The
sector size will be 4KB but the logical block size will remain
512-bytes.  Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

e1defc4f

22 5月, 2009 2 次提交

nilfs2: fix memory leak in nilfs_ioctl_clean_segments · d5046853

由 Ryusuke Konishi 提交于 5月 22, 2009

This fixes a new memory leak problem in garbage collection.  The
problem was brought by the bugfix patch ("nilfs2: fix lock order
reversal in nilfs_clean_segments ioctl").

Thanks to Kentaro Suzuki for finding this problem.
Reported-by: NKentaro Suzuki <k_suzuki@ms.sylc.co.jp>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

d5046853

[CIFS] fix posix open regression · 703a3b8e

由 Steve French 提交于 5月 21, 2009

Posix open code was not properly adding the file to the
list of open files.  Fix  allocating cifsFileInfo
more than once, and adding twice to flist and tlist.
Also fix mode setting to be done in one place in these
paths.
Signed-off-by: NSteve French <sfrench@us.ibm.com>
Reviewed-by: NShirish Pargaonkar <shirishp@us.ibm.com>
Tested-by: NJeff Layton <jlayton@redhat.com>
Tested-by: NLuca Tettamanti <kronos.it@gmail.com>

703a3b8e

19 5月, 2009 4 次提交

cifs: fix pointer initialization and checks in cifs_follow_symlink (try #4) · 8b6427a2

由 Jeff Layton 提交于 5月 19, 2009

This is the third respin of the patch posted yesterday to fix the error
handling in cifs_follow_symlink. It also includes a fix for a bogus NULL
pointer check in CIFSSMBQueryUnixSymLink that Jeff Moyer spotted.

It's possible for CIFSSMBQueryUnixSymLink to return without setting
target_path to a valid pointer. If that happens then the current value
to which we're initializing this pointer could cause an oops when it's
kfree'd.

This patch is a little more comprehensive than the last patches. It
reorganizes cifs_follow_link a bit for (hopefully) better readability.
It should also eliminate the uneeded allocation of full_path on servers
without unix extensions (assuming they can get to this point anyway, of
which I'm not convinced).

On a side note, I'm not sure I agree with the logic of enabling this
query even when unix extensions are disabled on the client. It seems
like that should disable this as well. But, changing that is outside the
scope of this fix, so I've left it alone for now.
Reported-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@inraded.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

8b6427a2

splice: fix kmaps in default_file_splice_write() · b2858d7d

由 Miklos Szeredi 提交于 5月 19, 2009

Unfortunately multiple kmap() within a single thread are deadlockable,
so writing out multiple buffers with writev() isn't possible.

Change the implementation so that it does a separate write() for each
buffer.  This actually simplifies the code a lot since the
splice_from_pipe() helper can be used.

This limitation is caused by HIGHMEM pages, and so only affects a
subset of architectures and configurations.  In the future it may be
worth to implement default_file_splice_write() in a more efficient way
on configs that allow it.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b2858d7d

bio: always copy back data for copied kernel requests · 4fc981ef

由 Tejun Heo 提交于 5月 19, 2009

When a read bio_copy_kern() request fails, the content of the bounce
buffer is not copied back.  However, as request failure doesn't
necessarily mean complete failure, the buffer state can be useful.
This behavior is also inconsistent with the user map counterpart and
causes the subtle difference between bounced and unbounced IO causes
confusion.

This patch makes bio_copy_kern_endio() ignore @err and always copy
back data on request completion.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

4fc981ef

nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission. · 7ee2cb7f

由 Frank Filz 提交于 5月 18, 2009

The problem is that permission checking is skipped if atomic open is
possible, but when exec opens a file, it just opens it O_READONLY which
means EXEC permission will not be checked at that time.

This problem is observed by the following sequence (executed as root):

  mount -t nfs4 server:/ /mnt4
  echo "ls" >/mnt4/foo
  chmod 744 /mnt4/foo
  su guest -c "mnt4/foo"
Signed-off-by: NFrank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Tested-by: NEugene Teo <eugeneteo@kernel.sg>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ee2cb7f

18 5月, 2009 3 次提交

reiserfs: fixup perms when xattrs are disabled · b83674c0

由 Jeff Mahoney 提交于 5月 17, 2009

This adds CONFIG_REISERFS_FS_XATTR protection from reiserfs_permission.

This is needed to avoid warnings during file deletions and chowns with
xattrs disabled.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b83674c0

reiserfs: deal with NULL xattr root w/ xattrs disabled · ceb5edc4

由 Jeff Mahoney 提交于 5月 17, 2009

This avoids an Oops in open_xa_root that can occur when deleting a file
with xattrs disabled.  It assumes that the xattr root will be there, and
that is not guaranteed.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ceb5edc4

reiserfs: clean up ifdefs · 12abb35a

由 Jeff Mahoney 提交于 5月 17, 2009

With xattr cleanup even with xattrs disabled, much of the initial setup
is still performed.  Some #ifdefs are just not needed since the options
they protect wouldn't be available anyway.

This cleans those up.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12abb35a

15 5月, 2009 9 次提交

devpts: correctly set default options · 1f71ebed

由 Sukadev Bhattiprolu 提交于 5月 14, 2009

devpts_get_sb() calls memset(0) to clear mount options and calls
parse_mount_options() if user specified any mount options.

The memset(0) is bogus since the 'mode' and 'ptmxmode' options are
non-zero by default.  parse_mount_options() restores options to default
anyway and can properly deal with NULL mount options.

So in devpts_get_sb() remove memset(0) and call parse_mount_options() even
for NULL mount options.

Bug reported by Eric Paris: http://lkml.org/lkml/2009/5/7/448.
Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
Tested-by: NMarc Dionne <marc.c.dionne@gmail.com>
Reported-by: NEric Paris <eparis@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Reviewed-by: N"H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f71ebed

ext4: Fix race in ext4_inode_info.i_cached_extent · 2ec0ae3a

由 Theodore Ts'o 提交于 5月 15, 2009

If two CPU's simultaneously call ext4_ext_get_blocks() at the same
time, there is nothing protecting the i_cached_extent structure from
being used and updated at the same time. This could potentially cause
the wrong location on disk to be read or written to, including
potentially causing the corruption of the block group descriptors
and/or inode table.

This bug has been in the ext4 code since almost the very beginning of
ext4's development. Fortunately once the data is stored in the page
cache cache, ext4_get_blocks() doesn't need to be called, so trying to
replicate this problem to the point where we could identify its root
cause was *extremely* difficult. Many thanks to Kevin Shanahan for
working over several months to be able to reproduce this easily so we
could finally nail down the cause of the corruption.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: N"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

2ec0ae3a

ext4: Clear the unwritten buffer_head flag after the extent is initialized · 2a8964d6

由 Aneesh Kumar K.V 提交于 5月 14, 2009

The BH_Unwritten flag indicates that the buffer is allocated on disk
but has not been written; that is, the disk was part of a persistent
preallocation area. That flag should only be set when a get_blocks()
function is looking up a inode's logical to physical block mapping.

When ext4_get_blocks_wrap() is called with create=1, the uninitialized
extent is converted into an initialized one, so the BH_Unwritten flag
is no longer appropriate. Hence, we need to make sure the
BH_Unwritten is not left set, since the combination of BH_Mapped and
BH_Unwritten is not allowed; among other things, it will result ext4's
get_block() to be called over and over again during the write_begin
phase of write(2).
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2a8964d6

S
Btrfs: Spelling fix in btrfs_lookup_first_block_group comments · 9f55684c
由 Sankar P 提交于 5月 14, 2009
```
Signed-off-by: NSankar P <sankar.curiosity@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
9f55684c

Btrfs: make show_options result match actual option names · 6b65c5c6

由 Sage Weil 提交于 5月 14, 2009

The notreelog and flushoncommit mount options were being printed slightly
differently.
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

6b65c5c6

Btrfs: remove outdated comment in btrfs_ioctl_resize() · 5d847a8e

由 Li Hong 提交于 5月 14, 2009

In Li Zefan's commit dae7b665,
a combination call of kmalloc() and copy_from_user() is replaced by
memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more.
Signed-off-by: NLi Hong <lihong.hi@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5d847a8e

Btrfs: remove some WARN_ONs in the IO failure path · cc7b0c9b

由 Chris Mason 提交于 5月 14, 2009

These debugging WARN_ONs make too much console noise during regular
IO failures. An IO failure will still generate a number of messages
as we verify checksums etc, but these two are not needed.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cc7b0c9b

Btrfs: Don't loop forever on metadata IO failures · 76a05b35

由 Chris Mason 提交于 5月 14, 2009

When a btrfs metadata read fails, the first thing we try to do is find
a good copy on another mirror of the block.  If this fails, read_tree_block()
ends up returning a buffer that isn't up to date.

The btrfs btree reading code was reworked to drop locks and repeat
the search when IO was done, but the changes didn't add a check for failed
reads.  The end result was looping forever on buffers that were never
going to become up to date.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

76a05b35

Btrfs: init inode ordered_data_close flag properly · 2757495c

由 Chris Mason 提交于 5月 14, 2009

This flag is used to decide when we need to send a given file through
the ordered code to make sure it is fully written before a transaction
commits.  It was not being properly set to zero when the inode was
being setup.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2757495c

14 5月, 2009 2 次提交

cifs: fix error handling in parse_DFS_referrals · d8e2f53a

由 Jeff Layton 提交于 5月 14, 2009

cifs_strndup_from_ucs returns NULL on error, not an ERR_PTR
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

d8e2f53a

splice: fix error return code · 77f6bf57

由 Andrew Morton 提交于 5月 14, 2009

fs/splice.c: In function 'default_file_splice_read':
fs/splice.c:566: warning: 'error' may be used uninitialized in this function

which is sort-of true.  The code will in fact return -ENOMEM instead of the
kernel_readv() return value.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

77f6bf57

13 5月, 2009 7 次提交

Remove implementation of readpage from the hugetlbfs_aops · f2deae9d

由 Mel Gorman 提交于 5月 13, 2009

The core VM assumes the page size used by the address_space in
inode->i_mapping is PAGE_SIZE but hugetlbfs breaks this assumption by
inserting pages into the page cache at offsets the core VM considers
unexpected.

This would not be a problem except that hugetlbfs also provide a
->readpage implementation.  As it exists, the core VM can assume the
base page size is being used, allocate pages on behalf of the
filesystem, insert them into the page cache and call ->readpage to
populate them.  These pages are the wrong size and at the wrong offset
for hugetlbfs causing confusion.

This patch deletes the ->readpage implementation for hugetlbfs on the
grounds the core VM should not be allocating and populating pages on
behalf of hugetlbfs.  There should be no existing users of the
->readpage implementation so it should not cause a regression.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f2deae9d

splice: fix repeated kmap()'s in default_file_splice_read() · 4f231228

由 Jens Axboe 提交于 5月 13, 2009

We cannot reliably map more than one page at the time, or we risk
deadlocking. Just allocate the pages from low mem instead.
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

4f231228

P
Squashfs: cody tidying, remove commented out line in Makefile · e5d28753
由 Phillip Lougher 提交于 5月 13, 2009
```
Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
```
e5d28753

Squashfs: check page size is not larger than the filesystem block size · fffb47b8

由 Phillip Lougher 提交于 5月 13, 2009

Normally the block size (by default 128K) will be larger than the
page size, unless a non-standard block size has been specified in
Mksquashfs, and the page size is larger than 4K.
Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>

fffb47b8

Squashfs: fix breakage when page size > metadata block size · a37b06d5

由 Doug Chapman 提交于 5月 13, 2009

Squashfs is broken on any system where the page size is larger than
the metadata size (8192).  This is easily fixed by ensuring cache->pages
is always > 0.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NDoug Chapman <doug.chapman@hp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>

a37b06d5

epoll: fix size check in epoll_create() · bfe3891a

由 Davide Libenzi 提交于 5月 12, 2009

Fix a size check WRT the manual pages.  This was inadvertently broken by
commit 9fe5ad9c ("flag parameters
add-on: remove epoll_create size param").
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Cc: <Hiroyuki.Mach@gmail.com>
Cc: rohit verma <rohit.170309@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bfe3891a

ext4: Use a fake block number for delayed new buffer_head · 33b9817e

由 Aneesh Kumar K.V 提交于 5月 12, 2009

Use a very large unsigned number (~0xffff) as as the fake block number
for the delayed new buffer. The VFS should never try to write out this
number, but if it does, this will make it obvious.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

33b9817e

14 5月, 2009 1 次提交

ext4: Fix sub-block zeroing for writes into preallocated extents · 9c1ee184

由 Aneesh Kumar K.V 提交于 5月 13, 2009

We need to mark the buffer_head mapping preallocated space as new
during write_begin. Otherwise we don't zero out the page cache content
properly for a partial write. This will cause file corruption with
preallocation.

Now that we mark the buffer_head new we also need to have a valid
buffer_head blocknr so that unmap_underlying_metadata() unmaps the
correct block.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9c1ee184

12 5月, 2009 3 次提交

J
nfsd: silence lockdep warning · 8daed1e5
由 J. Bruce Fields 提交于 5月 11, 2009
```
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
```
8daed1e5

dup2: Fix return value with oldfd == newfd and invalid fd · 2b79bc4f

由 Jeff Mahoney 提交于 5月 11, 2009

The return value of dup2 when oldfd == newfd and the fd isn't valid is
not getting properly sign extended.  We end up with 4294967287 instead
of -EBADF.

I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory
(2.6.29-rc5), and Ubuntu 9.04 (2.6.28).

This patch uses a signed int for the error value so it is properly
extended.

Commit 6c5d0512 introduced this
regression.
Reported-by: NJiri Dluhos <jdluhos@novell.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b79bc4f

nilfs2: check size of array structured data exchanged via ioctls · 83aca8f4

由 Ryusuke Konishi 提交于 5月 11, 2009

Although some ioctls of nilfs2 exchange data in the form of indirectly
referenced array, some of them lack size check on the array elements.

This inserts the missing checks and rejects requests if data of ioctl
does not have a valid format.

We usually don't have to check size of structures that we associated
with ioctl commands because the size is tested implicitly for
identifying ioctl command; the checks this patch adds are for the
cases where the implicit check is not applied.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

83aca8f4

11 5月, 2009 6 次提交

splice: implement default splice_write method · 0b0a47f5

由 Miklos Szeredi 提交于 5月 07, 2009

If f_op->splice_write() is not implemented, fall back to a plain write.
Use vfs_writev() to write from the pipe buffers.

This will allow splice on all filesystems and file types.  This
includes "direct_io" files in fuse which bypass the page cache.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0b0a47f5

splice: implement default splice_read method · 6818173b

由 Miklos Szeredi 提交于 5月 07, 2009

If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.

This will allow splice and functions using splice, such as the loop
device, to work on all filesystems.  This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

6818173b

splice: implement pipe to pipe splicing · 7c77f0b3

由 Miklos Szeredi 提交于 5月 07, 2009

Allow splice(2) to work when both the input and the output is a pipe.

Based on the impementation of the tee(2) syscall, but instead of
duplicating the buffer references move the buffers from the input pipe
to the output pipe.

Moving the whole buffer only succeeds if the full length of the buffer
is spliced.  Otherwise duplicate the buffer, just like tee(2), set the
length of the output buffer and advance the offset on the input
buffer.

Since splice is operating on two pipes, special care needs to be taken
with locking to prevent AN ABBA deadlock.  Again this is done
similarly to the tee(2) syscall, first preparing the input and output
pipes so there's data to consume and space for that data, and then
doing the move operation while holding both locks.

If other processes are doing I/O on the same pipes parallel to the
splice, then by the time both inodes are locked there might be no
buffers left to move, or no space to move them to.  In this case retry
the whole operation, including the preparation phase.  This could lead
to starvation, but I'm not sure if that's serious enough to worry
about.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

7c77f0b3

block: add rq->resid_len · c3a4d78c

由 Tejun Heo 提交于 5月 07, 2009

rq->data_len served two purposes - the length of data buffer on issue
and the residual count on completion.  This duality creates some
headaches.

First of all, block layer and low level drivers can't really determine
what rq->data_len contains while a request is executing.  It could be
the total request length or it coulde be anything else one of the
lower layers is using to keep track of residual count.  This
complicates things because blk_rq_bytes() and thus
[__]blk_end_request_all() relies on rq->data_len for PC commands.
Drivers which want to report residual count should first cache the
total request length, update rq->data_len and then complete the
request with the cached data length.

Secondly, it makes requests default to reporting full residual count,
ie. reporting that no data transfer occurred.  The residual count is
an exception not the norm; however, the driver should clear
rq->data_len to zero to signify the normal cases while leaving it
alone means no data transfer occurred at all.  This reverse default
behavior complicates code unnecessarily and renders block PC on some
drivers (ide-tape/floppy) unuseable.

This patch adds rq->resid_len which is used only for residual count.

While at it, remove now unnecessasry blk_rq_bytes() caching in
ide_pc_intr() as rq->data_len is not changed anymore.

Boaz	: spotted missing conversion in osd
Sergei	: spotted too early conversion to blk_rq_bytes() in ide-tape

[ Impact: cleanup residual count handling, report 0 resid by default ]
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c3a4d78c

nilfs2: fix lock order reversal in nilfs_clean_segments ioctl · 4f6b8288

由 Ryusuke Konishi 提交于 5月 10, 2009

This is a companion patch to ("nilfs2: fix possible circular locking
for get information ioctls").

This corrects lock order reversal between mm->mmap_sem and
nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by
lockdep check:

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.30-rc3-nilfs-00003-g360bdc1 #7
 -------------------------------------------------------
 mmap/5294 is trying to acquire lock:
  (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

 but task is already holding lock:
  (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&mm->mmap_sem){++++++}:
        [<c01470a5>] __lock_acquire+0x1066/0x13b0
        [<c01474a9>] lock_acquire+0xba/0xdd
        [<c01836bc>] might_fault+0x68/0x88
        [<c023c61d>] copy_from_user+0x2a/0x111
        [<d0d120d0>] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2]
        [<d0d0e2aa>] nilfs_clean_segments+0x6d/0x1b9 [nilfs2]
        [<d0d11f68>] nilfs_ioctl+0x2ad/0x318 [nilfs2]
        [<c01a3be7>] vfs_ioctl+0x22/0x69
        [<c01a408e>] do_vfs_ioctl+0x460/0x499
        [<c01a4107>] sys_ioctl+0x40/0x5a
        [<c01031a4>] sysenter_do_call+0x12/0x38
        [<ffffffff>] 0xffffffff

 -> #0 (&nilfs->ns_segctor_sem){++++.+}:
        [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
        [<c01474a9>] lock_acquire+0xba/0xdd
        [<c0433f1d>] down_read+0x2a/0x3e
        [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
        [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
        [<c0183b0b>] __do_fault+0x165/0x376
        [<c01855cd>] handle_mm_fault+0x287/0x5d1
        [<c043712d>] do_page_fault+0x2fb/0x30a
        [<c0435462>] error_code+0x72/0x78
        [<ffffffff>] 0xffffffff

where nilfs_clean_segments() holds:

  nilfs->ns_segctor_sem -> copy_from_user()
                             --> page fault -> mm->mmap_sem

And, page fault path may hold:

  page fault -> mm->mmap_sem
         --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem

Even though nilfs_clean_segments() does not perform write access on
given user pages, it may cause deadlock because nilfs->ns_segctor_sem
is shared per device and mm->mmap_sem can be shared with other tasks.

To avoid this problem, this patch moves all calls of copy_from_user()
outside the nilfs->ns_segctor_sem lock in the ioctl.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

4f6b8288

nilfs2: fix possible circular locking for get information ioctls · 47eb6b9c

由 Ryusuke Konishi 提交于 4月 30, 2009

This is one of two patches which are to correct possible circular
locking between mm->mmap_sem and nilfs->ns_segctor_sem.

The problem was detected by lockdep check as follows:

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.30-rc3-nilfs-00002-g3552613 #6
 -------------------------------------------------------
 mmap/5418 is trying to acquire lock:
 (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

 but task is already holding lock:
 (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&mm->mmap_sem){++++++}:
 [<c01470a5>] __lock_acquire+0x1066/0x13b0
 [<c01474a9>] lock_acquire+0xba/0xdd
 [<c01836bc>] might_fault+0x68/0x88
 [<c023c730>] copy_to_user+0x2c/0xfc
 [<d0d11b4f>] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2]
 [<d0d11fa9>] nilfs_ioctl+0x30a/0x3b0 [nilfs2]
 [<c01a3be7>] vfs_ioctl+0x22/0x69
 [<c01a408e>] do_vfs_ioctl+0x460/0x499
 [<c01a4107>] sys_ioctl+0x40/0x5a
 [<c01031a4>] sysenter_do_call+0x12/0x38
 [<ffffffff>] 0xffffffff

 -> #0 (&nilfs->ns_segctor_sem){++++.+}:
 [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
 [<c01474a9>] lock_acquire+0xba/0xdd
 [<c0433f1d>] down_read+0x2a/0x3e
 [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
 [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
 [<c0183b0b>] __do_fault+0x165/0x376
 [<c01855cd>] handle_mm_fault+0x287/0x5d1
 [<c043712d>] do_page_fault+0x2fb/0x30a
 [<c0435462>] error_code+0x72/0x78
 [<ffffffff>] 0xffffffff

 other info that might help us debug this:

 1 lock held by mmap/5418:
 #0:  (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a

 stack backtrace:
 Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6
 Call Trace:
 [<c0432145>] ? printk+0xf/0x12
 [<c0145c48>] print_circular_bug_tail+0xaa/0xb5
 [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
 [<d0d10149>] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2]
 [<c013b59a>] ? up_read+0x16/0x2c
 [<d0d10225>] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2]
 [<c01474a9>] lock_acquire+0xba/0xdd
 [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
 [<c0433f1d>] down_read+0x2a/0x3e
 [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
 [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
 [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
 [<c0183b0b>] __do_fault+0x165/0x376
 [<c01855cd>] handle_mm_fault+0x287/0x5d1
 [<c043700a>] ? do_page_fault+0x1d8/0x30a
 [<c013b54f>] ? down_read_trylock+0x39/0x43
 [<c043712d>] do_page_fault+0x2fb/0x30a
 [<c0436e32>] ? do_page_fault+0x0/0x30a
 [<c0435462>] error_code+0x72/0x78
 [<c0436e32>] ? do_page_fault+0x0/0x30a

This makes the lock granularity of nilfs->ns_segctor_sem finer than
that of the mmap semaphore for ioctl commands except
nilfs_clean_segments().

The successive patch ("nilfs2: fix lock order reversal in
nilfs_clean_segments ioctl") is required to fully resolve the problem.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

47eb6b9c

10 5月, 2009 1 次提交

nilfs2: ensure to clear dirty state when deleting metadata file block · 84338237

由 Ryusuke Konishi 提交于 5月 05, 2009

This would fix the following failure during GC:

 nilfs_cpfile_delete_checkpoints: cannot delete block
 NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

The problem was caused by a break in state consistency between page
cache and btree; the above block was removed from the btree but the
page buffering the block was remaining in the page cache in dirty
state.

This resolves the inconsistency by ensuring to clear dirty state of
the page buffering the deleted block.
Reported-by: NDavid Arendt <admin@prnet.org>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

84338237

09 5月, 2009 1 次提交

Fix races around the access to ->s_options · 2a32cebd

由 Al Viro 提交于 5月 08, 2009

Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2a32cebd