- 10 11月, 2010 2 次提交
-
-
由 Sage Weil 提交于
We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The offset/length arguments aren't used. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 09 11月, 2010 2 次提交
-
-
由 Sage Weil 提交于
The client can have a newer ctime than the MDS due to AUTH_EXCL and XATTR_EXCL caps as well; update the check in ceph_fill_file_time appropriately. This fixes cases where ctime/mtime goes backward under the right sequence of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that goes to the MDS). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We may get updates on the same inode from multiple MDSs; generally we only pay attention if the update is newer than what we already have. The exception is when an MDS sense unstable information, in which case we always update. The old > check got this wrong when our version was odd (e.g. 3) and the reply version was even (e.g. 2): the older stale (v2) info would be applied. Fixed and clarified the comment. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 08 11月, 2010 6 次提交
-
-
由 Sage Weil 提交于
MDS requests can be rebuilt and resent in non-process context, but were filling in uid/gid from current_fsuid/gid. Put that information in the request struct on request setup. This fixes incorrect (and root) uid/gid getting set for requests that are forwarded between MDSs, usually due to metadata migrations. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We used to use rdcache_gen to indicate whether we "might" have cached pages. Now we just look at the mapping to determine that. However, some old behavior remains from that transition. First, rdcache_gen == 0 no longer means we have no pages. That can happen at any time (presumably when we carry FILE_CACHE). We should not reset it to zero, and we should not check that it is zero. That means that the only purpose for rdcache_revoking is to resolve races between new issues of FILE_CACHE and an async invalidate. If they are equal, we should invalidate. On success, we decrement rdcache_revoking, so that it is no longer equal to rdcache_gen. Similarly, if we success in doing a sync invalidate, set revoking = gen - 1. (This is a small optimization to avoid doing unnecessary invalidate work and does not affect correctness.) Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If the auth cap migrates to another MDS, clear requested_max_size so that we resend any pending max_size increase requests. This fixes potential hangs on writes that extend a file and race with an cap migration between MDSs. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Only the auth MDS has a meaningful max_size value for us, so only update it in fill_inode if we're being issued an auth cap. Otherwise, a random stat result from a non-auth MDS can clobber a meaningful max_size, get the client<->mds cap state out of sync, and make writes hang. Specifically, even if the client re-requests a larger max_size (which it will), the MDS won't respond because as far as it knows we already have a sufficiently large value. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Normally when we open a file we already have a cap, and simply update the wanted set. However, if we open a file for write, but don't have an auth cap, that doesn't work; we need to open a new cap with the auth MDS. Only reuse existing caps if we are opening for read or the existing cap is auth. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We dereference *in a few lines down, but only set it on rename. It is apparently pretty rare for this to trigger, but I have been hitting it with a clustered MDSs. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 28 10月, 2010 1 次提交
-
-
由 Sage Weil 提交于
This reverts commit d91f2438. The intent of issue_seq is to distinguish between mds->client messages that (re)create the cap and those that do not, which means we should _only_ be updating that value in the create paths. By updating it in handle_cap_grant, we reset it to zero, which then breaks release. The larger question is what workload/problem made me think it should be updated here... Signed-off-by: NSage Weil <sage@newdream.net>
-
- 21 10月, 2010 14 次提交
-
-
由 Sage Weil 提交于
We were taking dcache_lock inside of i_lock, which introduces a dependency not found elsewhere in the kernel, complicationg the vfs locking scalability work. Since we don't actually need it here anyway, remove it. We only need i_lock to test for the I_COMPLETE flag, so be careful to do so without dcache_lock held. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Julia Lawall 提交于
Convert a sequence of kmalloc and memcpy to use kmemdup. The semantic patch that performs this transformation is: (http://coccinelle.lip6.fr/) // <smpl> @@ expression a,flag,len; expression arg,e1,e2; statement S; @@ a = - \(kmalloc\|kzalloc\)(len,flag) + kmemdup(arg,len,flag) <... when != a if (a == NULL || ...) S ...> - memcpy(a,arg,len+1); // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Greg Farnum 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Randy Dunlap 提交于
Include "super.h" outside of CONFIG_DEBUG_FS to eliminate a compiler warning: fs/ceph/debugfs.c:266: warning: 'struct ceph_fs_client' declared inside parameter list fs/ceph/debugfs.c:266: warning: its scope is only this definition or declaration, which is probably not what you want fs/ceph/debugfs.c:271: warning: 'struct ceph_fs_client' declared inside parameter list Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net>
-
由 Sage Weil 提交于
Switch from using the BKL explicitly to the new lock_flocks() interface. Eventually this will turn into a spinlock. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Greg Farnum 提交于
When the lock_kernel() turns into lock_flocks() and a spinlock, we won't be able to do allocations with the lock held. Preallocate space without the lock, and retry if the lock state changes out from underneath us. Signed-off-by: NGreg Farnum <gregf@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This is simpler and faster. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The i_rdcache_gen value only implies we MAY have cached pages; actually check the mapping to see if it's worth bothering with an invalidate. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Snaps in the root directory are now supported by the MDS, and harmless on older versions. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
This will be used for rbd snapshots administration. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net>
-
由 Yehuda Sadeh 提交于
Allow the messenger to send/receive data in a bio. This is added so that we wouldn't need to copy the data into pages or some other buffer when doing IO for an rbd block device. We can now have trailing variable sized data for osd ops. Also osd ops encoding is more modular. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
The osd requests creation are being decoupled from the vino parameter, allowing clients using the osd to use other arbitrary object names that are not necessarily vino based. Also, calc_raw_layout now takes a snap id. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
Implement a pool lookup by name. This will be used by rbd. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 07 10月, 2010 6 次提交
-
-
由 Sage Weil 提交于
We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Greg Farnum 提交于
If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by: NGreg Farnum <gregf@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Aneesh Kumar K.V 提交于
encode_fh on error should update max_len with minimum required size, so that caller can redo the call with the reallocated buffer. This is required with open by handle patch series Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Aneesh Kumar K.V 提交于
encode_fh function should return 255 on error as done by other file system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units and not in bytes. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If we interrupt an osd request, we call __cancel_request, but it wasn't verifying that req->r_osd was non-NULL before dereferencing it. This could cause a crash if osds were flapping and we aborted a request on said osd. Reported-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Henry C Chang 提交于
Fix argument order. Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 18 9月, 2010 2 次提交
-
-
由 Sage Weil 提交于
We select CRYPTO_AES, but not CRYPTO. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 17 9月, 2010 2 次提交
-
-
由 Sage Weil 提交于
Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The cap_snap creation/queueing relies on both the current i_head_snapc _and_ the i_snap_realm pointers being correct, so that the new cap_snap can properly reference the old context and the new i_head_snapc can be updated to reference the new snaprealm's context. To fix this, we: - move inodes completely to the new (split) realm so that i_snap_realm is correct, and - generate the new snapc's _before_ queueing the cap_snaps in ceph_update_snap_trace(). Signed-off-by: NSage Weil <sage@newdream.net>
-
- 15 9月, 2010 2 次提交
-
-
由 Sage Weil 提交于
Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 14 9月, 2010 1 次提交
-
-
由 Sage Weil 提交于
When adding the readdir results to the cache, ceph_set_dentry_offset was clobbered our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by 1cd3935b. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 12 9月, 2010 2 次提交
-
-
由 Sage Weil 提交于
Cast the value before shifting so that we don't run out of bits with a 32-bit unsigned long. This fixes wrapping of high file offsets into the low 4GB of a file on disk, and the subsequent data corruption for large files. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Fix the reconnect encoding to encode the cap record when the MDS does not have the FLOCK capability (i.e., pre v0.22). Signed-off-by: NSage Weil <sage@newdream.net>
-