- 02 5月, 2013 34 次提交
-
-
由 Alex Elder 提交于
Hold off building the osd request message in ceph_writepages_start() until just before it will be submitted to the osd client for execution. We'll still create the request and allocate the page pointer array after we learn we have at least one page to write. A local variable will be used to keep track of the allocated array of pages. Wait until just before submitting the request for assigning that page array pointer to the request message. Create ands use a new function osd_req_op_extent_update() whose purpose is to serve this one spot where the length value supplied when an osd request's op was initially formatted might need to get changed (reduced, never increased) before submitting the request. Previously, ceph_writepages_start() assigned the message header's data length because of this update. That's no longer necessary, because ceph_osdc_build_request() will recalculate the right value to use based on the content of the ops in the request. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Defer building the osd request until just before submitting it in all callers except ceph_writepages_start(). (That caller will be handed in the next patch.) Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
There is a helper function alloc_page_vec() that, despite its generic sounding name depends heavily on an osd request structure being populated with certain information. There is only one place this function is used, and it ends up being a bit simpler to just open code what it does, so get rid of the helper. The real motivation for this is deferring building the of the osd request message, and this is a step in that direction. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Mostly for readability, define ceph_writepages_osd_request() and use it to allocate the osd request for ceph_writepages_start(). Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
This patch moves the call to ceph_osdc_build_request() out of ceph_osdc_new_request() and into its caller. This is in order to defer formatting osd operation information into the request message until just before request is started. The only unusual (ab)user of ceph_osdc_build_request() is ceph_writepages_start(), where the final length of write request may change (downward) based on the current inode size or the oldest snapshot context with dirty data for the inode. The remaining callers don't change anything in the request after has been built. This means the ops array is now supplied by the caller. It also means there is no need to pass the mtime to ceph_osdc_new_request() (it gets provided to ceph_osdc_build_request()). And rather than passing a do_sync flag, have the number of ops in the ops array supplied imply adding a second STARTSYNC operation after the READ or WRITE requested. This and some of the patches that follow are related to having the messenger (only) be responsible for filling the content of the message header, as described here: http://tracker.ceph.com/issues/4589Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
There's one spot in ceph_writepages_start() that open-codes what page_offset() does safely. Use the macro so we don't have to worry about wrapping. This resolves: http://tracker.ceph.com/issues/4648Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
In create_fs_client() a memory pool is set up be used for arrays of pages that might be needed in ceph_writepages_start() if memory is tight. There are two problems with the way it's initialized: - The size provided is the number of pages we want in the array, but it should be the number of bytes required for that many page pointers. - The number of pages computed can end up being 0, while we will always need at least one page. This patch fixes both of these problems. This resolves the two simple problems defined in: http://tracker.ceph.com/issues/4603Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Sage Weil 提交于
Use wrapper functions that check whether the auth op exists so that callers do not need a bunch of conditional checks. Simplifies the external interface. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
Currently the messenger calls out to a get_authorizer con op, which will create a new authorizer if it doesn't yet have one. In the meantime, when we rotate our service keys, the authorizer doesn't get updated. Eventually it will be rejected by the server on a new connection attempt and get invalidated, and we will then rebuild a new authorizer, but this is not ideal. Instead, if we do have an authorizer, call a new update_authorizer op that will verify that the current authorizer is using the latest secret. If it is not, we will build a new one that does. This avoids the transient failure. This fixes one of the sorry sequence of events for bug http://tracker.ceph.com/issues/4282Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Henry C Chang 提交于
We should advance the user data pointer by _len_ instead of _written_. _len_ is the data length written in each iteration while _written_ is the accumulated data length we have writtent out. Signed-off-by: NHenry C Chang <henry.cy.chang@gmail.com> Reviewed-by: NGreg Farnum <greg@inktank.com> Tested-by: NSage Weil <sage@inktank.com>
-
由 Yan, Zheng 提交于
Current ceph code tracks directory's completeness in two places. ceph_readdir() checks i_release_count to decide if it can set the I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE flag. This indirection introduces locking complexity. This patch adds a new variable i_complete_count to ceph_inode_info. Set i_release_count's value to it when marking a directory complete. By comparing the two variables, we know if a directory is complete Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
-
由 Alex Elder 提交于
Change it so we only assign outgoing data information for messages if there is outgoing data to send. This then allows us to add a few more (currently commented-out) assertions. This is related to: http://tracker.ceph.com/issues/4284Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Alex Elder 提交于
Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Record the byte count for an osd request rather than the page count. The number of pages can always be derived from the byte count (and alignment/offset) but the reverse is not true. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Currently ceph_osdc_new_request() assigns an osd request's r_num_pages and r_alignment fields. The only thing it does after that is call ceph_osdc_build_request(), and that doesn't need those fields to be assigned. Move the assignment of those fields out of ceph_osdc_new_request() and into its caller. As a result, the page_align parameter is no longer used, so get rid of it. Note that in ceph_sync_write(), the value for req->r_num_pages had already been calculated earlier (as num_pages, and fortunately it was computed the same way). So don't bother recomputing it, but because it's not needed earlier, move that calculation after the call to ceph_osdc_new_request(). Hold off making the assignment to r_alignment, doing it instead r_pages and r_num_pages are getting set. Similarly, in start_read(), nr_pages already holds the number of pages in the array (and is calculated the same way), so there's no need to recompute it. Move the assignment of the page alignment down with the others there as well. This and the next few patches are preparation work for: http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
(This is being reposted. The first one had a problem because it erroneously added a similar change elsewhere; that change has been dropped.) The next patch in this series points out that the calculation for the number of pages in an osd request is getting done twice. It is not obvious, but the result of both calculations is identical. This patch simplifies one of them--as a separate step--to make it clear that the transformation in the next patch is valid. In ceph_sync_write() there is some magic that computes page_align for an osd request. But a little analysis shows it can be simplified. First, we have: io_align = pos & ~PAGE_MASK; which is used here: page_align = (pos - io_align + buf_align) & ~PAGE_MASK; Note (pos - io_align) simply rounds "pos" down to the nearest multiple of the page size. We also have: buf_align = (unsigned long)data & ~PAGE_MASK; Adding buf_align to that rounded-down "pos" value will stay within the same page; the result will just be offset by the page offset for the "data" pointer. The final mask therefore leaves just the value of "buf_align". One more simplification. Note that the result of calc_pages_for() is invariant of which page the offset starts in--the only thing that matters is the offset within the starting page. We will have put the proper page offset to use into "page_align", so just use that in calculating num_pages. This resolves: http://tracker.ceph.com/issues/4166Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
There's a spot that computes the number of pages to allocate for a page-aligned length by just shifting it. Use calc_pages_for() instead, to be consistent with usage everywhere else. The result is the same. The reason for this is to make it clearer in an upcoming patch that this calculation is duplicated. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Currently, incoming mds messages never use page data, which means there is no need to set the page_alignment field in the message. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Alex Elder 提交于
The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Alex Elder 提交于
The purpose of ceph_calc_object_layout() is to fill in the pool number and seed for a ceph_pg structure provided, based on a given osd map and target object id. Currently that function takes a file layout parameter, but the only thing used out of that is its pool number. Change the function so it takes a pool number rather than the full file layout structure. Only update the ceph_pg if the pool is found in the osd map. Get rid of few useless lines of code from the function while there. Since the function now very clearly just fills in the ceph_pg structure it's provided, rename it ceph_calc_ceph_pg(). Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The pagelist_count field is never actually used, so get rid of it. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Yan, Zheng 提交于
make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller does not hold the i_mutex, so ceph_aio_read() can call safely. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Yan, Zheng 提交于
ceph_aio_write() has an optimization that marks CEPH_CAP_FILE_WR cap dirty before data is copied to page cache and inode size is updated. The optimization avoids slow cap revocation caused by balance_dirty_pages(), but introduces inode size update race. If ceph_check_caps() flushes the dirty cap before the inode size is updated, MDS can miss the new inode size. So just remove the optimization. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Sage Weil 提交于
commit 22cddde1 breaks the atomicity of write operation, it also introduces a deadlock between write and truncate. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com> Conflicts: fs/ceph/addr.c
-
由 Yan, Zheng 提交于
commit c6ffe100 moved the flag that tracks if the dcache contents for a directory are complete to dentry. The problem is there are lots of places that use ceph_dir_{set,clear,test}_complete() while holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may sleep because they call dput(). This patch basically reverts that commit. For ceph_d_prune(), it's called with both the dentry to prune and the parent dentry are locked. So it's safe to access the parent dentry's d_inode and clear I_COMPLETE flag. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Yan, Zheng 提交于
MDS ignores cap update message if migrate_seq mismatch, so when receiving a cap import message with higher migrate_seq, set mds_want according to the cap import message. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Yan, Zheng 提交于
So the client will later send cap release message to MDS Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Yan, Zheng 提交于
commit 6e8575fa makes parse_reply_info_extra() return -EIO for LSSNAP Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
由 Alex Elder 提交于
Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
- 26 4月, 2013 1 次提交
-
-
由 Zhao Hongjiang 提交于
dprintk() shouldn't access @ring after it's unmapped. Signed-off-by: NZhao Hongjiang <zhaohongjiang@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 4月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 3a366e61. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: NWanlong Gao <gaowanlong@cn.fujitsu.com> Requested-by: NJens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 4月, 2013 3 次提交
-
-
由 Vyacheslav Dubeyko 提交于
Change a u32 to loff_t hfsplus_file_truncate(). Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Documentation/filesystems/proc.txt says about coredump_filter bitmask, Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only effected by bit 5-6. However current code can go into the subsequent flag checks of bit 0-4 for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work as written in the document. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Reviewed-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Currently we fail to include any data on hugepages into coredump, because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently introduced by commit 314e51b9 ("mm: kill vma flag VM_RESERVED and mm->reserved_vm counter"). This looks to me a serious regression, so let's fix it. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: NMichal Hocko <mhocko@suse.cz> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 4月, 2013 1 次提交
-
-
由 Suleiman Souhlal 提交于
Revert commit 62a3ddef ("vfs: fix spinning prevention in prune_icache_sb"). This commit doesn't look right: since we are looking at the tail of the list (sb->s_inode_lru.prev) if we want to skip an inode, we should put it back at the head of the list instead of the tail, otherwise we will keep spinning on it. Discovered when investigating why prune_icache_sb came top in perf reports of a swapping load. Signed-off-by: NSuleiman Souhlal <suleiman@google.com> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-