- 23 March 2010, 11 commits
-
-
Committed by Sage Weil
Make the variable name slightly more generic, since it will (soon) reflect either the time the request was sent OR the time it was last determined to be still retrying. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
The messenger fault was clearing the BUSY bit, for reasons unclear. This made it possible for the con->ops->fault function to reopen the connection and requeue work in the workqueue, even though the current thread was already in con_work. This avoids a problem where the client busy-loops with connection failures on an unreachable OSD, but doesn't address the root cause of that problem. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Prevent a duplicate 'mds0 caps stale' message from spamming the console every few seconds while the MDS restarts. Set s_renew_requested earlier, so that we only print the message once, even if we don't send an actual request. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
The incremental map decoding of pg pool updates wasn't skipping the snaps and removed_snaps vectors. This caused osd requests to stall when pool snapshots were created or fs snapshots were deleted. Use a common helper for the full and incremental map decoders that decodes pools properly. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
The wait_unsafe_requests() helper dropped the mdsc mutex to wait for each request to complete, and then examined r_node to get the next request after retaking the lock. But the request completion removes the request from the tree, so r_node was always undefined at this point. Since it's a small race, it usually led to a valid request, but not always. The result was an occasional crash in rb_next() while dereferencing node->rb_left. Fix this by clearing the rb_node when removing the request from the request tree, and not walking off into the weeds when we are done waiting for a request. Since the request we waited on will _always_ be out of the request tree, take a ref on the next request, in the hope that it won't be. But if it is, that's OK: we can start over from the beginning (and traverse over older read requests again). Signed-off-by: Sage Weil <sage@newdream.net>
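A minimal kernel-style sketch of the pattern this fix relies on (the struct and helper names are illustrative, not the actual fs/ceph code): clear the rb_node on removal, so a waiter can detect that its request left the tree and restart the walk instead of dereferencing stale child pointers.

```c
#include <linux/rbtree.h>

struct req {
	struct rb_node r_node;
	/* ... */
};

/* On removal, mark the node empty so rb_next() is never called on a
 * detached node whose left/right pointers are stale. */
static void unregister_request(struct rb_root *tree, struct req *req)
{
	rb_erase(&req->r_node, tree);
	RB_CLEAR_NODE(&req->r_node);
}

/* After dropping the lock to wait, re-check: if our request was erased
 * (the common case here), restart from the front of the tree. */
static struct req *next_request(struct rb_root *tree, struct req *req)
{
	struct rb_node *n = RB_EMPTY_NODE(&req->r_node) ?
		rb_first(tree) : rb_next(&req->r_node);

	return n ? rb_entry(n, struct req, r_node) : NULL;
}
```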
-
Committed by Sage Weil
We were releasing used caps (e.g. FILE_CACHE) from encode_inode_release with MDS requests (e.g. setattr). We don't carry refs on most caps, so this code worked most of the time, but for setattr (utimes) we try to drop Fscr. This causes cap state to get slightly out of sync with reality, and may result in subsequent MDS revoke messages getting ignored. Fix by only releasing unused caps. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Drop the session mutex unconditionally in handle_cap_grant, and do the check_caps from the handle_cap_grant helper. This avoids using a magic return value. Also avoid using a flag variable in the IMPORT case and call check_caps at the appropriate point. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Passing a session pointer to ceph_check_caps() used to mean it would leave the session mutex locked. That wasn't always possible if it wasn't passed CHECK_CAPS_AUTHONLY. It could unlock the passed session and lock a different session's mutex, which was clearly wrong, and it also emitted a warning when a racing CPU retook it and we did an unlock from the wrong context. This was only a problem when there was more than one MDS. First, make ceph_check_caps unconditionally drop the session mutex, so that it is free to lock other sessions as needed. Then adjust the one caller that passes in a session (handle_cap_grant) accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
If we don't have the exported cap, it's because we already released it. No need to WARN. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
This causes an oops when debug output is enabled and we kick an osd request with no current r_osd (sometime after an osd failure). Check the pointer before dereferencing it. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Previously we would decode state directly into our current ticket_handler. This is problematic if for some reason we fail to decode, because we end up with half new state and half old state. We are probably already in bad shape if we get an update we can't decode, but we may as well be tidy anyway. Decode into new_* temporaries and update the ticket_handler only on success. Signed-off-by: Sage Weil <sage@newdream.net>
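The decode-then-commit pattern in sketch form, with hypothetical decode_u32()/decode_blob() helpers standing in for the real decoders: parse every field into locals, and only overwrite the long-lived handler once the whole blob decoded cleanly.

```c
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Hypothetical decoders: return 0 on success, advancing *p toward end. */
int decode_u32(void **p, void *end, u32 *v);
int decode_blob(void **p, void *end, void **blob, size_t *len);

struct ticket_handler {
	u32 secret_id;
	void *ticket_blob;
	size_t blob_len;
};

static int update_ticket(struct ticket_handler *th, void *p, void *end)
{
	u32 new_secret_id;
	void *new_blob;
	size_t new_len;

	/* Decode into temporaries; th stays untouched on any failure. */
	if (decode_u32(&p, end, &new_secret_id) ||
	    decode_blob(&p, end, &new_blob, &new_len))
		return -EINVAL;

	/* Everything parsed cleanly: commit the new state in one step. */
	kfree(th->ticket_blob);
	th->secret_id = new_secret_id;
	th->ticket_blob = new_blob;
	th->blob_len = new_len;
	return 0;
}
```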
-
- 21 March 2010, 6 commits
-
-
Committed by Sage Weil
Release the old ticket_blob buffer when we get an updated service ticket from the monitor. Previously these were getting leaked. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
The buffer size was incorrectly calculated for the ceph_x_encrypt() encapsulated ticket blob. Use a helper (with correct arithmetic) and BUG out if we were wrong. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
We were failing to reconnect to services due to an old authenticator, even though we had the new ticket, because we weren't properly retrying the connect handshake: we were calling an old/incorrect helper that left in_base_pos wrong. The result was a failure to reconnect to the OSD or MDS (with an authentication error) if the MDS restarted after the service had been up a few hours (long enough for the original authenticator to become invalid). This was only a problem if AUTH_X authentication was enabled. Now that the 'negotiate' and 'connect' stages are fully separated, use the prepare_read_connect() helper instead, and remove the obsolete one. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
When an inode was dropped while being migrated between two MDSs, i_cap_exporting_issued was non-zero, such that the issued caps were non-zero and __ceph_is_any_caps(ci) was true. This prevented the inode from being removed from the snap realm, even as it was dropped from the cache. Fix this by dropping any residual i_snap_realm ref in destroy_inode. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
All ci->i_snap_realm_item/realm->inodes_with_caps manipulation should be protected by realm->inodes_with_caps_lock. This bug would only have bitten us in a rare race with a realm split (during some snap creations). Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Added an assertion, and cleared up one case where the implemented caps were not following the issued caps. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 19 March 2010, 4 commits
-
-
Committed by Chris Mason
This is used by the inode lookup ioctl to follow all the backrefs up to the subvol root. But the search being done would sometimes land one past the last item in the leaf instead of finding the backref. This changes the search to look for the highest possible backref and hop back one item. It also fixes a leaked path on failure to find the root. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
When a root id of 0 is sent to the inode lookup ioctl, it will use the root of the file we're ioctling and pass the root id back to userland along with the results. This allows userland to do searches based on that root later on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
The search ioctl was skipping large items entirely (ones that are too big for the results buffer). This changes things to at least copy the item header, so that we can send information about the item back to userland. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
The search ioctl was working well for finding tree roots, but using it for generic searches requires a few changes to how the keys are advanced. This treats the search control's min fields for objectid, type and offset more like a key, where we drop the offset to zero once we bump the type, etc. The downside of this is that we are changing the min_type and min_offset fields during the search, so the ioctl caller needs extra checks to make sure the keys in the result are the ones it wanted. This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make things more readable. Signed-off-by: Chris Mason <chris.mason@oracle.com>
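A self-contained userspace sketch of the counter-style key advance described above (the struct is illustrative; the field widths mirror a btrfs key: u64 objectid, u8 type, u64 offset): bump the offset first, and on wraparound carry into type, then objectid, resetting the lower fields to zero.

```c
#include <stdint.h>
#include <stdio.h>

struct min_key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/* Advance the minimum key like a multi-digit counter.
 * Returns 0 when the key space is exhausted. */
static int advance_key(struct min_key *k)
{
	if (k->offset < UINT64_MAX) {
		k->offset++;
	} else if (k->type < UINT8_MAX) {
		k->offset = 0;
		k->type++;
	} else if (k->objectid < UINT64_MAX) {
		k->offset = 0;
		k->type = 0;
		k->objectid++;
	} else {
		return 0;
	}
	return 1;
}

int main(void)
{
	struct min_key k = { 5, UINT8_MAX, UINT64_MAX };

	if (advance_key(&k))	/* carries into objectid 6, type 0, offset 0 */
		printf("%llu %u %llu\n",
		       (unsigned long long)k.objectid, k.type,
		       (unsigned long long)k.offset);
	return 0;
}
```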
-
- 18 March 2010, 2 commits
-
-
Committed by Akinobu Mita
Use bitmap_weight() instead of doing hweight32() for each u32 element in the page. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
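Roughly, the change looks like this (a kernel-style sketch, not the exact ntfs hunk): bitmap_weight() takes the region as an unsigned long bitmap with a length in bits, replacing the hand-rolled per-u32 loop.

```c
#include <linux/bitmap.h>
#include <linux/bitops.h>

/* Before: one hweight32() call per 32-bit word in the page. */
static unsigned int weight_old(const u32 *buf, unsigned int nwords)
{
	unsigned int i, cnt = 0;

	for (i = 0; i < nwords; i++)
		cnt += hweight32(buf[i]);
	return cnt;
}

/* After: one bitmap_weight() call over the whole region. Note the
 * pointer type, and that the length is given in bits, not words. */
static unsigned int weight_new(const unsigned long *buf, unsigned int nbits)
{
	return bitmap_weight(buf, nbits);
}
```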
-
Committed by Venkatesh Pallipadi
jffs2 uses rb_node = NULL; to zero an rb_root. The problem with this is that 17d9ddc7 ("rbtree: Add support for augmented rbtrees") in the linux-next tree adds a new field to that struct which needs to be NULL as well. This patch uses RB_ROOT as the initializer so all of the relevant fields will be NULL'd. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Eric Paris <eparis@redhat.com> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
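In sketch form: the fragile version assumes rb_root contains exactly one pointer, while RB_ROOT stays correct however many fields the struct grows.

```c
#include <linux/rbtree.h>

static void reset_tree(struct rb_root *root)
{
	/* Fragile: zeroes only rb_node, leaving any future field
	 * (e.g. the augmented-rbtree callback) uninitialized:
	 *	root->rb_node = NULL;
	 */

	/* Robust: the official initializer zeroes the whole struct. */
	*root = RB_ROOT;
}
```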
-
- 17 March 2010, 6 commits
-
-
Committed by Dave Chinner
If we are doing a forced shutdown, we can get lots of noise about delalloc pages being discarded. This happens by design during a forced shutdown, so don't spam the logs with these messages. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
-
Committed by Alex Elder
Re-apply a commit that had been reverted due to regressions that have since been fixed. From 95f8e302 Mon Sep 17 00:00:00 2001 From: Nick Piggin <npiggin@suse.de> Date: Tue, 6 Jan 2009 14:43:09 +1100 Implement XFS's large buffer support with the new vmap APIs. See the vmap rewrite (db64fe02) for some numbers. The biggest improvement that comes from using the new APIs is avoiding the global KVA allocation lock on every call. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> The only modifications here were a minor reformat, plus making the patch apply given the new use of xfs_buf_is_vmapped(). Modified-by: Alex Elder <aelder@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
-
Committed by Alex Elder
Re-apply a commit that had been reverted due to regressions that have since been fixed. Original commit: d2859751 Author: Nick Piggin <npiggin@suse.de> Date: Tue, 6 Jan 2009 14:40:44 +1100 XFS's vmap batching simply defers a number (up to 64) of vunmaps, and keeps track of them in a list. To purge the batch, it just goes through the list and calls vunmap on each one. This is pretty poor: a global TLB flush is generally still performed on each vunmap, with the most expensive parts of the operation being the broadcast IPIs and locking involved in the SMP callouts, and the locking involved in the vmap management -- none of these are avoided by just batching up the calls. I'm actually surprised it ever made much difference. (Now that the lazy vmap allocator is upstream, this description is not quite right, but the vunmap batching still doesn't seem to do much.) Rip all this logic out of XFS completely. I will improve vmap performance and scalability directly in a subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> The only change I made was to use the "new" xfs_buf_is_vmapped() function in a place it had been open-coded in the original. Modified-by: Alex Elder <aelder@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
-
Committed by Chris Mason
The space_info ioctl was using copy_to_user inside rcu_read_lock. This commit changes things to copy into a buffer first and then dump the result down to userland. Signed-off-by: Chris Mason <chris.mason@oracle.com>
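The shape of the fix, as a hedged kernel-style sketch (fill_from_rcu_lists() is a hypothetical stand-in for the actual space_info walk): copy_to_user() can fault and sleep, which is forbidden inside an RCU read-side critical section, so snapshot the data into a kernel buffer first and copy it out after dropping the lock.

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

void fill_from_rcu_lists(void *buf, size_t len);	/* hypothetical */

static long dump_info(void __user *uptr, size_t len)
{
	long ret = 0;
	void *buf = kmalloc(len, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;

	rcu_read_lock();
	fill_from_rcu_lists(buf, len);	/* memcpy only; must not sleep */
	rcu_read_unlock();

	if (copy_to_user(uptr, buf, len))	/* may fault: safe out here */
		ret = -EFAULT;
	kfree(buf);
	return ret;
}
```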
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Sage Weil
key->type is u8, not u64. fs/btrfs/ioctl.c: In function 'copy_to_sk': fs/btrfs/ioctl.c:1024: warning: comparison is always true due to limited range of data type Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
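A minimal userspace reproduction of that warning class (the values are illustrative, not the btrfs code):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint8_t type = 200;	/* like key->type: 8 bits wide */

	/* gcc: "comparison is always true due to limited range of data
	 * type" -- a u8 can never exceed a 64-bit-sized constant, so
	 * this check is dead code. */
	if (type <= 0xFFFFFFFFFFFFFFFFULL)
		printf("always taken\n");

	/* Meaningful version: keep the bound within the field's width. */
	if (type <= UINT8_MAX - 1)
		printf("real check\n");
	return 0;
}
```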
-
- 16 March 2010, 1 commit
-
-
Committed by NeilBrown
bdi_unregister is called by nfs_put_super, which is only called by generic_shutdown_super if ->s_root is not NULL. So if we error out in a circumstance where we called nfs_bdi_register (i.e. server != NULL) but have not set s_root, then we need to call bdi_unregister explicitly in nfs_get_sb and various other *_get_sb() functions. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 15 March 2010, 10 commits
-
-
Committed by Dan Carpenter
I fixed the indent level. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
-
Committed by Nick Piggin
GFP_FS must be masked out; NOFS can't be or'd in. Signed-off-by: Chris Mason <chris.mason@oracle.com>
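The distinction in one sketch: GFP_NOFS is GFP_KERNEL with __GFP_FS absent, so OR-ing it into a gfp mask can only add bits and never clears __GFP_FS; forbidding filesystem re-entry requires masking the bit out.

```c
#include <linux/gfp.h>

static gfp_t make_nofs(gfp_t gfp)
{
	/* Wrong: an OR cannot clear anything, __GFP_FS may survive:
	 *	return gfp | GFP_NOFS;
	 */

	/* Right: mask the fs-reentry bit out. */
	return gfp & ~__GFP_FS;
}
```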
-
Committed by Chris Mason
After calling submit_bio, the bio can be freed at any time. The btrfs submission thread helper was checking the bio flags too late, which might not give the correct answer. When CONFIG_DEBUG_PAGE_ALLOC is turned on, it can lead to oopsen. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Xiao Guangrong
We can use btrfs_stack_device_id() to get dev_item->devid. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Akinobu Mita
Use memparse() instead of its own private implementation. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
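For reference, a runnable userspace analogue of the kernel's memparse() (whose signature is `unsigned long long memparse(const char *ptr, char **retptr)`): it parses a number with an optional K/M/G suffix, which is exactly what mount-option size parsing needs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace analogue of memparse(): number plus optional K/M/G suffix.
 * The cases deliberately fall through so 'G' scales three times. */
static unsigned long long my_memparse(const char *s, char **retptr)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
	}
	if (retptr)
		*retptr = end;
	return v;
}

int main(void)
{
	printf("%llu\n", my_memparse("4M", NULL));	/* prints 4194304 */
	return 0;
}
```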
-
Committed by Josef Bacik
df is a very loaded question in btrfs. This gives us a way to get the per-space usage information, so we can tell exactly what is in use where. This will help us figure out ENOSPC problems, and help users better understand where their disk space is going. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
This patch just goes through and fixes everybody that does lock_extent(); blah; unlock_extent() to use lock_extent_bits(); blah; unlock_extent_cached() and pass around an extent_state, so we only have to do the searches once per function. This gives me about a 3 mb/s boost on my random write test. I have not converted some things, like the relocation and ioctls, since they aren't heavily used and the relocation stuff is in the middle of being re-written. I also changed clear_extent_bit() to only unset the cached state if we are clearing EXTENT_LOCKED and related bits, so we can do things like lock_extent_bits(); clear delalloc bits; unlock_extent_cached() without losing our cached state. I tested this thoroughly and turned on LEAK_DEBUG to make sure we weren't leaking extent states; everything worked out fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
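The cached-state pattern in sketch form, with signatures approximated from that era's extent_io API (treat them as illustrative rather than exact, and assume the btrfs extent_io headers):

```c
/* Assumes fs/btrfs extent_io types; signatures approximate. */
static void locked_region_op(struct extent_io_tree *tree, u64 start, u64 end)
{
	struct extent_state *cached = NULL;

	/* lock_extent_bits() hands back the extent_state it found... */
	lock_extent_bits(tree, start, end, 0, &cached, GFP_NOFS);

	/* ... clear delalloc bits, do the real work ... */

	/* ... and unlock_extent_cached() consumes it, skipping the
	 * second tree search a plain unlock_extent() would have done. */
	unlock_extent_cached(tree, start, end, &cached, GFP_NOFS);
}
```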
-
Committed by Josef Bacik
When finishing io we run btrfs_dec_test_ordered_pending, and then immediately run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that already, so we're searching twice when we don't have to. This patch lets us pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending, so if we do complete io on that ordered extent we can just use the one we found then, instead of having to do another btrfs_lookup_ordered_extent. This made my fio job with the other patch go from 24 mb/s to 29 mb/s. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
This patch makes us cache the extent state we find in find_delalloc_range, since we'll have to lock the extent later on in the function. This will keep us from re-searching for the range when we try to lock the extent. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
The ordered tree used to need a mutex, but currently all we use it for is to protect the rb_tree, and a spin_lock is just fine for that. Using a spin_lock instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and with less latency, 3445.138 ms instead of 3820.633 ms. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
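A sketch of the locking change (struct and function names illustrative, not the actual btrfs code): the tree is only touched in short, non-sleeping insert/lookup/erase sections, so a spinlock suffices and avoids the sleep/wake overhead a mutex pays under contention.

```c
#include <linux/rbtree.h>
#include <linux/spinlock.h>

struct ordered_tree {
	spinlock_t lock;	/* was: struct mutex mutex; init with spin_lock_init() */
	struct rb_root tree;
};

static void ordered_tree_insert(struct ordered_tree *t, struct rb_node *n,
				struct rb_node *parent, struct rb_node **link)
{
	spin_lock(&t->lock);	/* was: mutex_lock(&t->mutex) */
	rb_link_node(n, parent, link);
	rb_insert_color(n, &t->tree);
	spin_unlock(&t->lock);	/* was: mutex_unlock(&t->mutex) */
}
```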
-