- 12 3月, 2011 2 次提交
-
-
由 Trond Myklebust 提交于
We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 11 3月, 2011 8 次提交
-
-
由 Andy Adamson 提交于
Signed-off-by: NAndy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Andy Adamson 提交于
Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Ricardo Labiaga 提交于
Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: NRicardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Frank Filz 提交于
(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: NFrank Filz <ffilzlnx@us.ibm.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Jovi Zhang 提交于
this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: NJovi Zhang <bookjovi@gmail.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Stanislav Fomichev 提交于
add kmalloc return value check in decode_and_add_ds Signed-off-by: NStanislav Fomichev <kernel@fomichev.me> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Jeff Layton 提交于
I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 05 3月, 2011 2 次提交
-
-
由 Neil Horman 提交于
The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm CCing security on this as well. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sage Weil 提交于
Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 04 3月, 2011 3 次提交
-
-
由 Sage Weil 提交于
First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (949854d0), causing a NULL pointer dereference. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This reverts commit 97d79b40. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 03 3月, 2011 9 次提交
-
-
由 Al Viro 提交于
merge hfs_unlink() and hfs_rmdir(), while we are at it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
(256 << sizeof(x)) - 1 is not the maximal possible value of x... In reality, the maximal allowed value for UDF FileLinkCount is 65535. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
if directory has so many subdirectories that its link count is set to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails, we shouldn't decrement the parent's link count in cleanup path; that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up with parent suddenly getting zero i_nlink, with very unpleasant effects. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Paul Bolle 提交于
This message looks like an error (which it isn't) when booting with a flattened device tree. Remove the message from normal kernel builds. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 02 3月, 2011 3 次提交
-
-
由 Josh Hunt 提交于
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt it as reported and analyzed by Josh. In fact, there is no good reason to mess with i_nlink of the moved file. We did it presumably to simulate linking into the new directory and unlinking from an old one. But the practical effect of this is disputable because fsck can possibly treat file as being properly linked into both directories without writing any error which is confusing. So we just stop increment-decrement games with i_nlink which also fixes the corruption. CC: stable@kernel.org CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NJosh Hunt <johunt@akamai.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Alex Elder 提交于
Commit 493f3358 added this call to xfs_fs_geometry() in order to avoid passing kernel stack data back to user space: + memset(geo, 0, sizeof(*geo)); Unfortunately, one of the callers of that function passes the address of a smaller data type, cast to fit the type that xfs_fs_geometry() requires. As a result, this can happen: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f87aca93 Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358+ #1 Call Trace: [<c12991ac>] ? panic+0x50/0x150 [<c102ed71>] ? __stack_chk_fail+0x10/0x18 [<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs] Fix this by fixing that one caller to pass the right type and then copy out the subset it is interested in. Note: This patch is an alternative to one originally proposed by Eric Sandeen. Reported-by: NJeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Tested-by: NJeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
-
由 Ryusuke Konishi 提交于
According to the report from Jiro SEKIBA titled "regression in 2.6.37?" (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and later kernels, lscp command no longer displays "i" flag on checkpoints that snapshot operations or garbage collection created. This is a regression of nilfs2 checkpointing function, and it's critical since it broke behavior of a part of nilfs2 applications. For instance, snapshot manager of TimeBrowse gets to create meaningless snapshots continuously; snapshot creation triggers another checkpoint, but applications cannot distinguish whether the new checkpoint contains meaningful changes or not without the i-flag. This patch fixes the regression and brings that application behavior back to normal. Reported-by: NJiro SEKIBA <jir@unicus.jp> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: NJiro SEKIBA <jir@unicus.jp> Cc: stable <stable@kernel.org> [2.6.37]
-
- 01 3月, 2011 1 次提交
-
-
由 Randy Dunlap 提交于
Fix new kernel-doc warning in fs/block_dev.c: Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 2月, 2011 5 次提交
-
-
由 Jan Kara 提交于
A race can occur when io_submit() races with io_destroy(): CPU1 CPU2 io_submit() do_io_submit() ... ctx = lookup_ioctx(ctx_id); io_destroy() Now do_io_submit() holds the last reference to ctx. ... queue new AIO put_ioctx(ctx) - frees ctx with active AIOs We solve this issue by checking whether ctx is being destroyed in AIO submission path after adding new AIO to ctx. Then we are guaranteed that either io_destroy() waits for new AIO or we see that ctx is being destroyed and bail out. Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nick Piggin 提交于
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit. lookup_ioctx doesn't implement the rcu lookup pattern properly. rcu_read_lock does not prevent refcount going to zero, so we might take a refcount on a zero count ioctx. Fix the bug by atomically testing for zero refcount before incrementing. [jack@suse.cz: added comment into the code] Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Timo Warns 提交于
The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch changes ldm_parse_vmdb() to Validate the value of vblk_size. Signed-off-by: NTimo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Acked-by: NRichard Russon <ldm@flatcap.org> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davide Libenzi 提交于
In several places, an epoll fd can call another file's ->f_op->poll() method with ep->mtx held. This is in general unsafe, because that other file could itself be an epoll fd that contains the original epoll fd. The code defends against this possibility in its own ->poll() method using ep_call_nested, but there are several other unsafe calls to ->poll elsewhere that can be made to deadlock. For example, the following simple program causes the call in ep_insert recursively call the original fd's ->poll, leading to deadlock: #include <unistd.h> #include <sys/epoll.h> int main(void) { int e1, e2, p[2]; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create(1); e2 = epoll_create(2); pipe(p); epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt); epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt); write(p[1], p, sizeof p); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } On insertion, check whether the inserted file is itself a struct epoll, and if so, do a recursive walk to detect whether inserting this file would create a loop of epoll structures, which could lead to deadlock. [nelhage@ksplice.com: Use epmutex to serialize concurrent inserts] Signed-off-by: NDavide Libenzi <davidel@xmailserver.org> Signed-off-by: NNelson Elhage <nelhage@ksplice.com> Reported-by: NNelson Elhage <nelhage@ksplice.com> Tested-by: NNelson Elhage <nelhage@ksplice.com> Cc: <stable@kernel.org> [2.6.34+, possibly earlier] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Blanchard 提交于
I'm seeing the following oops when testing afs: Unable to handle kernel paging request for data at address 0x00000008 ... NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0 LR [c00000000033987c] .afs_put_writeback+0x98/0xec Call Trace: [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320 [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4 [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178 [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128 [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8 [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0 [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40 afs_write_begin hits an error and calls afs_unlink_writeback. In there we do list_del_init on an uninitialised list. The patch below initialises ->link when creating the afs_writeback struct. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2011 3 次提交
-
-
由 Miklos Szeredi 提交于
Commit e1181ee6 "vfs: pass struct file to do_truncate on O_TRUNC opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse. Fuse assumed that when called from open, a truncate() will be done, not an ftruncate(). Fix by restoring the old behavior, based on the ATTR_OPEN flag. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: NOliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Tejun Heo 提交于
The new implementation of bd_link_disk_holder() added by 49731baa (block: restore multiple bd_link_disk_holder() support) didn't get an extra reference for the holder_dir kobject of the slave bdev; however, bdev kills holder_dir on removal, not release, so if the slave bdev is removed while there are holder links, the holder_dir will be destroyed while there still are holder links, which leads to oops later when bd_unlink_disk_order() tries to remove those links. Make bd_link_disk_holder() grab an extra reference for the slave's holder_dir and put it in bd_unlink_disk_holder(). Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: N"Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com> Tested-by: N"Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 2月, 2011 4 次提交
-
-
由 J. R. Okajima 提交于
By the commit b3e19d92 2011-01-07 fs: scale mntget/mntput vfsmount_lock was introduced around testing mnt_count. Fix the mis-typed 'unlock' Signed-off-by: NJ. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 NeilBrown 提交于
There are two cases when we call flush_disk. In one, the device has disappeared (check_disk_change) so any data will hold becomes irrelevant. In the oter, the device has changed size (check_disk_size_change) so data we hold may be irrelevant. In both cases it makes sense to discard any 'clean' buffers, so they will be read back from the device if needed. In the former case it makes sense to discard 'dirty' buffers as there will never be anywhere safe to write the data. In the second case it *does*not* make sense to discard dirty buffers as that will lead to file system corruption when you simply enlarge the containing devices. flush_disk calls __invalidate_devices. __invalidate_device calls both invalidate_inodes and invalidate_bdev. invalidate_inodes *does* discard I_DIRTY inodes and this does lead to fs corruption. invalidate_bev *does*not* discard dirty pages, but I don't really care about that at present. So this patch adds a flag to __invalidate_device (calling it __invalidate_device2) to indicate whether dirty buffers should be killed, and this is passed to invalidate_inodes which can choose to skip dirty inodes. flusk_disk then passes true from check_disk_change and false from check_disk_size_change. dm avoids tripping over this problem by calling i_size_write directly rathher than using check_disk_size_change. md does use check_disk_size_change and so is affected. This regression was introduced by commit 608aeef1 which causes check_disk_size_change to call flush_disk, so it is suitable for any kernel since 2.6.27. Cc: stable@kernel.org Acked-by: NJeff Moyer <jmoyer@redhat.com> Cc: Andrew Patterson <andrew.patterson@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Miklos Szeredi 提交于
Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reported-by: NMichael Leun <lkml20101129@newton.leun.net> Reported-by: NGurudas Pai <gurudas.pai@oracle.com> Tested-by: NGurudas Pai <gurudas.pai@oracle.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Mason 提交于
The Btrfs fiemap code wasn't properly returning delalloc extents, so applications that trust fiemap to decide if there are holes in the file see holes instead of delalloc. This reworks the btrfs fiemap code, adding a get_extent helper that searches for delalloc ranges and also adding a helper for extent_fiemap that skips past holes in the file. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-