- 20 11月, 2009 9 次提交
-
-
由 David Howells 提交于
FS-Cache has two structs internally for keeping track of the internal state of a cached file: the fscache_cookie struct, which represents the netfs's state, and fscache_object struct, which represents the cache's state. Each has a pointer that points to the other (when both are in existence), and each has a spinlock for pointer maintenance. Since netfs operations approach these structures from the cookie side, they get the cookie lock first, then the object lock. Cache operations, on the other hand, approach from the object side, and get the object lock first. It is not then permitted for a cache operation to get the cookie lock whilst it is holding the object lock lest deadlock occur; instead, it must do one of two things: (1) increment the cookie usage counter, drop the object lock and then get both locks in order, or (2) simply hold the object lock as certain parts of the cookie may not be altered whilst the object lock is held. It is also not permitted to follow either pointer without holding the lock at the end you start with. To break the pointers between the cookie and the object, both locks must be held. fscache_write_op(), however, violates the locking rules: It attempts to get the cookie lock without (a) checking that the cookie pointer is a valid pointer, and (b) holding the object lock to protect the cookie pointer whilst it follows it. This is so that it can access the pending page store tree without interference from __fscache_write_page(). This is fixed by splitting the cookie lock, such that the page store tracking tree is protected by its own lock, and checking that the cookie pointer is non-NULL before we attempt to follow it whilst holding the object lock. The new lock is subordinate to both the cookie lock and the object lock, and so should be taken after those. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
The object-available state in the object processing state machine (as processed by fscache_object_available()) can't rely on the cookie to be available because the FSCACHE_COOKIE_CREATING bit may have been cleared by fscache_obtained_object() prior to the object being put into the FSCACHE_OBJECT_AVAILABLE state. Clearing the FSCACHE_COOKIE_CREATING bit on a cookie permits __fscache_relinquish_cookie() to proceed and detach the cookie from the object. To deal with this, we don't dereference object->cookie in fscache_object_available() if the object has already been detached. In addition, a couple of assertions are added into fscache_drop_object() to make sure the object is unbound from the cookie before it gets there. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Permit the operations to retrieve data from the cache or to allocate space in the cache for future writes to be interrupted whilst they're waiting for permission for the operation to proceed. Typically this wait occurs whilst the cache object is being looked up on disk in the background. If an interruption occurs, and the operation has not yet been given the go-ahead to run, the operation is dequeued and cancelled, and control returns to the read operation of the netfs routine with none of the requested pages having been read or in any way marked as known by the cache. This means that the initial wait is done interruptibly rather than uninterruptibly. In addition, extra stats values are made available to show the number of ops cancelled and the number of cache space allocations interrupted. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
__fscache_write_page() attempts to load the radix tree preallocation pool for the CPU it is on before calling radix_tree_insert(), as the insertion must be done inside a pair of spinlocks. Use of the preallocation pool, however, is contingent on the radix tree being initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT. The solution is to AND out __GFP_WAIT. Additionally, the banner comment to radix_tree_preload() is altered to make note of this prerequisite. Possibly there should be a WARN_ON() too. Without this fix, I have seen the following recursive deadlock caused by radix_tree_insert() attempting to allocate memory inside the spinlocked region, which resulted in FS-Cache being called back into to release memory - which required the spinlock already held. ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc6-cachefs #24 --------------------------------------------- nfsiod/7916 is trying to acquire lock: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache] but task is already holding lock: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache] other info that might help us debug this: 5 locks held by nfsiod/7916: #0: (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2 #1: (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2 #2: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache] #3: (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache] #4: (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache] stack backtrace: Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24 Call Trace: [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3 [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3 [<ffffffff8100e27d>] ? dump_trace+0x248/0x257 [<ffffffff8105ad70>] lock_acquire+0x57/0x6d [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffff8135467c>] _spin_lock+0x2c/0x3b [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache] [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs] [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs] [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70 [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31 [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70 [<ffffffff810934ca>] shrink_list+0x8d/0x8f [<ffffffff81093744>] shrink_zone+0x278/0x33c [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392 [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212 [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1 [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118 [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache] [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs] [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs] [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs] [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc] [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc] [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc] [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2 [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2 [<ffffffff81352433>] ? thread_return+0x3e/0x101 [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc] [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34 [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2 [<ffffffff8104bd21>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44 [<ffffffff8104bca7>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Clear the pointers from the fscache_cookie struct to netfs private data after clearing the pointer to the cookie from the fscache_object struct and releasing the object lock, rather than before. This allows the netfs private data pointers to be relied on simply by holding the object lock, rather than having to hold the cookie lock. This is makes things simpler as the cookie lock has to be taken before the object lock, but sometimes the object pointer is all that the code has. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Count entries to and exits from cache operation table functions. Maintain these as a single counter that's added to or removed from as appropriate. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Allow the current state of all fscache objects to be dumped by doing: cat /proc/fs/fscache/objects By default, all objects and all fields will be shown. This can be restricted by adding a suitable key to one of the caller's keyrings (such as the session keyring): keyctl add user fscache:objlist "<restrictions>" @s The <restrictions> are: K Show hexdump of object key (don't show if not given) A Show hexdump of object aux data (don't show if not given) And paired restrictions: C Show objects that have a cookie c Show objects that don't have a cookie B Show objects that are busy b Show objects that aren't busy W Show objects that have pending writes w Show objects that don't have pending writes R Show objects that have outstanding reads r Show objects that don't have outstanding reads S Show objects that have slow work queued s Show objects that don't have slow work queued If neither side of a restriction pair is given, then both are implied. For example: keyctl add user fscache:objlist KB @s shows objects that are busy, and lists their object keys, but does not dump their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is not implied. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Annotate slow-work runqueue proc lines for FS-Cache work items. Objects include the object ID and the state. Operations include the object ID, the operation ID and the operation type and state. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Wait for outstanding slow work items belonging to a module to clear when unregistering that module as a user of the facility. This prevents the put_ref code of a work item from being taken away before it returns. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 18 11月, 2009 4 次提交
-
-
由 Peter Zijlstra 提交于
This is for consistency with various ioctl() operations that include the suffix "PGRP" in their names, and also for consistency with PRIO_PGRP, used with setpriority() and getpriority(). Also, using PGRP instead of GID avoids confusion with the common abbreviation of "group ID". I'm fine with anything that makes it more consistent, and if PGRP is what is the predominant abbreviation then I see no need to further confuse matters by adding a third one. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefani Seibold 提交于
Fix a small issue for the stack pointer in /proc/<pid>/stat. In case of a kernel thread the value of the printed stack pointer should be 0. Signed-off-by: NStefani Seibold <stefani@seibold.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nathaniel W. Turner 提交于
Access to log items on the AIL is generally protected by m_ail_lock; this is particularly needed when we're getting or setting the 64-bit li_lsn on a 32-bit platform. This patch fixes a couple places where we were accessing the log item after dropping the AIL lock on 32-bit machines. This can result in a partially-zeroed log->l_tail_lsn if xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at least some cases, this can leave the l_tail_lsn with a zero cycle number, which means xlog_space_left will think the log is full (unless CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to processes stuck forever in xlog_grant_log_space. Thanks to Adrian VanderSpek for first spotting the race potential and to Dave Chinner for debug assistance. Signed-off-by: NNathaniel W. Turner <nate@houseofnate.net> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Jan Rekorajski 提交于
Hi, I was hit by a bug in linux 2.6.31 when XFS is not able to recover the log after a crash if fs was mounted with quotas. Gory details in XFS bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855. It looks like wrong struct is used in buffer length check, and the following patch should fix the problem. xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes long, and this is exactly what I see in system logs - "XFS: dquot too small (104) in xlog_recover_do_dquot_trans." Signed-off-by: NJan Rekorajski <baggins@sith.mimuw.edu.pl> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 16 11月, 2009 1 次提交
-
-
由 Suresh Jayaraman 提交于
Fix the commit ec06aedd that intended to turn off querying for server inode numbers when server doesn't consistently support inode numbers. Presumably the commit didn't actually clear the CIFS_MOUNT_SERVER_INUM flag, perhaps a typo. Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de> Acked-by: NJeff Layton <jlayton@redhat.com> Cc: Stable <stable@kernel.org> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 15 11月, 2009 2 次提交
-
-
由 Jiro SEKIBA 提交于
The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: NJiro SEKIBA <jir@unicus.jp> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Petr Vandrovec 提交于
Commit 8177e6d6 ("nfsd: clean up readdirplus encoding") introduced single character typo in nfs3 readdir+ implementation. Unfortunately that typo has quite bad side effects: random memory corruption, followed (on my box) with immediate spontaneous box reboot. Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware ESXi box tries to list contents of my home directory. Signed-off-by: NPetr Vandrovec <petr@vandrovec.name> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 11月, 2009 1 次提交
-
-
由 Ryusuke Konishi 提交于
Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
- 12 11月, 2009 15 次提交
-
-
由 Mike Hommey 提交于
Because of an integer overflow on start_blk, various kind of wrong results would be returned by the generic_block_fiemap() handler, such as no extents when there is a 4GB+ hole at the beginning of the file, or wrong fe_logical when an extent starts after the first 4GB. Signed-off-by: NMike Hommey <mh@glandium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Sandeen <sandeen@sgi.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Blanchard 提交于
In setup_arg_pages we work hard to assign a value to ret, but on exit we always return 0. Also remove a now duplicated exit path and branch to out_unlock instead. Signed-off-by: NAnton Blanchard <anton@samba.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its arg argument as a pointer to userspace. However it is missing a a call to compat_ptr() which will do a proper pointer conversion. This was introduced with 3e63cbb1 "fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls". Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Ankit Jain <me@ankitjain.org> Acked-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NArnd Bergmann <arndbergmann@googlemail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sukadev Bhattiprolu 提交于
Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace' that is discussed in: http://lkml.org/lkml/2009/10/2/159. To summarize the thread, when container-init is terminated, it sets the PF_EXITING flag, zaps other processes in the container and waits to reap them. As a part of reaping, the container-init should flush any /proc dentries associated with the processes. But because the container-init is itself exiting and the following PF_EXITING check, the dentries are not flushed, resulting in leak in /proc inodes and dentries. This fix reverts the commit 7766755a ("Fix /proc dcache deadlock in do_exit") which introduced the check for PF_EXITING. At the time of the commit, shrink_dcache_parent() flushed dentries from other filesystems also and could have caused a deadlock which the commit fixed. But as pointed out by Eric Biederman, after commit 0feae5c4, shrink_dcache_parent() no longer affects other filesystems. So reverting the commit is now safe. As pointed out by Jan Kara, the leak is not as critical since the unclaimed space will be reclaimed under memory pressure or by: echo 3 > /proc/sys/vm/drop_caches But since this check is no longer required, its best to remove it. Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Reported-by: NDaniel Lezcano <dlezcano@fr.ibm.com> Acked-by: NEric W. Biederman <ebiederm@xmission.com> Acked-by: NJan Kara <jack@ucw.cz> Cc: Andrea Arcangeli <andrea@cpushare.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefan Schmidt 提交于
This fixes: ERROR: "log_start_commit" [fs/ext3/ext3.ko] undefined! Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
-
由 Josef Bacik 提交于
There is a problem where iget5_locked will look for an inode, not find it, and then subsequently try to allocate it. Another CPU will have raced in and allocated the inode instead, so when iget5_locked gets the inode spin lock again and does a search, it finds the new inode. So it goes ahead and calls destroy_inode on the inode it just allocated. The problem is we don't set BTRFS_I(inode)->root until the new inode is completely initialized. This patch makes us set root to NULL when alloc'ing a new inode, so when we get to btrfs_destroy_inode and we see that root is NULL we can just free up the memory and continue on. This fixes the panic http://www.kerneloops.org/submitresult.php?number=812690 Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
On an FS where all of the space has not been allocated into chunks yet, the enospc can return enospc just because the existing metadata chunks are full. We get around this by allowing more metadata chunks to be allocated up to a certain limit, and finding the right limit is a little fuzzy. The problem is the reservations for delalloc would preallocate way too much of the FS as metadata. We need to start saying no and just force some IO to happen. But we also need to let a reasonable amount of the FS become metadata. This bumps the hard limit up, later releases will have a better system. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Currently compressed IO does not deal with not having its entire extent able to be allocated. So if we have enough free space to allocate for the extent, but its not contiguous, it will fail spectacularly. This patch fixes this by falling back on uncompressed IO which lets us spread the delalloc extent across multiple extents. I tested this by making us randomly think the reservation had failed to make it fallback on the uncompressed io way and it seemed to work fine. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
This patch changes a few things. Hopefully the comments are helpfull, but I'll try and be as verbose here. Problem: My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root. Part of this problem was we pick the first block group we can find and start caching it, even if it may not have enough free space. The other problem is we only search for cached block groups the first time around, which we won't find any cached block groups because this is a newly mounted fs, so we end up caching several block groups during bootup, which with alot of fragmentation takes around 30-45 seconds to complete, which bogs down the system. So Solution: 1) Don't cache block groups willy-nilly at first. Instead try and figure out which block group has the most free, and therefore will take the least amount of time to cache. 2) Don't be so picky about cached block groups. The other problem is once we've filled up a cluster, if the block group isn't finished caching the next time we try and do the allocation we'll completely ignore the cluster and start searching from the beginning of the space, which makes us cache more block groups, which slows us down even more. So instead of skipping block groups that are not finished caching when we have a hint, only skip the block group if it hasn't started caching yet. There is one other tweak in here. Before if we allocated a chunk and still couldn't find new space, we'd end up switching the space info to force another chunk allocation. This could make us end up with way too many chunks, so keep track of this particular case. With this patch and my previous cluster fixes my fedora box now boots in 43 seconds, and according to the bootchart is not held up by our block group caching at all. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Dan Carpenter 提交于
I re-orderred the checks to avoid dereferencing "em" if it was null. Found by smatch static checker. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Dongyang 提交于
We don't need to call btrfs_release_path because btrfs_free_path will do that for us. Signed-off-by: NLi Dongyang <Jerry87905@gmail.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
We weren't reserving metadata space for rename, rmdir and unlink, which could cause problems. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
This patch fixes a problem where max_size can be set to 0 even though we filled the cluster properly. We set max_size to 0 if we restart the cluster window, but if the new start entry is big enough to be our new cluster then we could return with a max_size set to 0, which will mean the next time we try to allocate from this cluster it will fail. So set max_extent to the entry's size. Tested this on my box and now we actually allocate from the cluster after we fill it. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
We use journal_info to tell if we're in a nested transaction to make sure we don't commit the transaction within a nested transaction. We use another method to see if there are any outstanding ioctl trans handles, so if we're starting one do not set current->journal_info, since it will screw with other filesystems. This patch also cleans up the starting stuff so there aren't any magic numbers. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Sometimes our start allocation hint when we cow a file can be either EXTENT_HOLE or some other such place holder, which is not optimal. So if we find that our em->block_start is one of these special values, check to see where the first block of the inode is stored, and use that as a hint. If that block is also a special value, just fallback on a hint of 0 and let the allocator figure out a good place to put the data. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 11 11月, 2009 3 次提交
-
-
由 Tao Ma 提交于
If journal init fails, we need to free j_wbuf. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Eric Sandeen 提交于
On a 256M 4k block filesystem, doing this in a loop: dd if=/dev/zero of=test oflag=direct bs=1M count=64 rm -f test eventually leads to spurious ENOSPC: dd: writing `test': No space left on device As with other block allocation callers, it looks like we need to potentially retry the allocations on the initial ENOSPC. A similar patch went into ext4 (commit fbbf6945) Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 09 11月, 2009 1 次提交
-
-
由 Theodore Ts'o 提交于
This is a partial revert of commit 6487a9d3 (only the changes made to fs/ext4/namei.c), since it is causing the following brelse() double-free warning when running fsstress on a file system with 1k blocksize and we run into a block allocation failure while converting a single-block directory to a multi-block hash-tree indexed directory. WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33() Hardware name: VFS: brelse: Trying to free free buffer Modules linked in: Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101 Call Trace: [<c01587fb>] warn_slowpath_common+0x65/0x95 [<c0158869>] warn_slowpath_fmt+0x29/0x2c [<c021168e>] __brelse+0x2e/0x33 [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd [<c0175c1f>] ? cpu_clock+0x3f/0x5b [<c017f6ec>] ? lock_release_holdtime+0x36/0x137 [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51 [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124 [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0290d1c>] kjournald2+0x11a/0x310 [<c017118e>] ? autoremove_wake_function+0x0/0x38 [<c0290c02>] ? kjournald2+0x0/0x310 [<c0170ee6>] kthread+0x66/0x6b [<c0170e80>] ? kthread+0x0/0x6b [<c01251b3>] kernel_thread_helper+0x7/0x10 ---[ end trace 5579351b86af61e3 ]--- Commit 6487a9d3 was an attempt some buffer head leaks in an ENOSPC error path, but in some cases it actually results in an excess ENOSPC, as shown above. Fixing this means cleaning up who is responsible for releasing the buffer heads from the callee to the caller of add_dirent_to_buf(). Since that's a relatively complex change, and we're late in the rcX development cycle, I'm reverting this now, and holding back a more complete fix until after 2.6.32 ships. We've lived with this buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a few more months won't kill us. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Curt Wohlgemuth <curtw@google.com>
-
- 08 11月, 2009 2 次提交
-
-
由 Ryusuke Konishi 提交于
This fixes an -rc1 regression brought by the commit: 1cf58fa8 ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: NJiro SEKIBA <jir@unicus.jp>
-
由 Ryusuke Konishi 提交于
This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>
-
- 07 11月, 2009 2 次提交
-
-
由 Jeff Layton 提交于
Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to verify the accessibility of the root inode and then falls back to doing a full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report of a server that returns NT_STATUS_INTERNAL_ERROR rather than something that translates to EOPNOTSUPP. Rather than trying to be clever with that call, just have is_path_accessible do a normal QPathInfo. That call is widely supported and it shouldn't increase the overhead significantly. Cc: Stable <stable@kernel.org> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Jeff Layton 提交于
It's possible that a server will return a valid FileID when we query the FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers when we do a FindFile with an infolevel of SMB_FIND_FILE_ID_FULL_DIR_INFO. In this situation turn off querying for server inode numbers, generate a warning for the user and just generate an inode number using iunique. Once we generate any inode number with iunique we can no longer use any server inode numbers or we risk collisions, so ensure that we don't do that in cifs_get_inode_info either. Cc: Stable <stable@kernel.org> Reported-by: NTimothy Normand Miller <theosib@gmail.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-