Commits · b7495fc2ff941db6a118a93ab8d61149e3f4cef8 · openanolis / cloud-kernel

10 Nov, 2010 2 commits

ceph: make page alignment explicit in osd interface · b7495fc2

Committed by Sage Weil on Nov 09, 2010

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.
Signed-off-by: Sage Weil <sage@newdream.net>
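
The difference between file-offset alignment and buffer alignment can be illustrated with a minimal userspace sketch. It assumes 4 KiB pages, and `SKETCH_PAGE_SIZE`, `page_align_of`, and `pages_spanned` are hypothetical names chosen for illustration, not the kernel's OSD client interface.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Offset of a byte position (file offset or buffer address) within its page. */
static inline uint32_t page_align_of(uint64_t pos)
{
	return (uint32_t)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Pages touched by len bytes that start page_align bytes into the first page. */
static inline uint64_t pages_spanned(uint32_t page_align, uint64_t len)
{
	return (page_align + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

With direct IO, a buffer may start 512 bytes into a page even though the file offset is page aligned, so inferring the alignment from the file offset would give 0 where the data actually sits at 512. Passing the alignment explicitly avoids that mismatch.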

ceph: fix comment, remove extraneous args · e98b6fed

Committed by Sage Weil on Nov 09, 2010

The offset/length arguments aren't used.
Signed-off-by: Sage Weil <sage@newdream.net>

09 Nov, 2010 2 commits

ceph: fix update of ctime from MDS · d8672d64

Committed by Sage Weil on Nov 08, 2010

The client can have a newer ctime than the MDS due to AUTH_EXCL and
XATTR_EXCL caps as well; update the check in ceph_fill_file_time
appropriately.

This fixes cases where ctime/mtime goes backward under the right sequence
of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
goes to the MDS).
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix version check on racing inode updates · 8bd59e01

Committed by Sage Weil on Nov 08, 2010

We may get updates on the same inode from multiple MDSs; generally we only
pay attention if the update is newer than what we already have.  The
exception is when an MDS sends unstable information, in which case we
always update.

The old > check got this wrong when our version was odd (e.g. 3) and the
reply version was even (e.g. 2): the older stale (v2) info would be
applied.  Fixed and clarified the comment.
Signed-off-by: Sage Weil <sage@newdream.net>
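
The off-by-one in the comparison can be sketched as follows. This is an illustration built only from the message above, under the assumption that odd versions mark unstable information and that the client's version is rounded down to even before comparing. The function names are hypothetical and this is not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check as described: skip only when our rounded-down version is strictly newer. */
static bool old_skip_update(uint64_t ours, uint64_t theirs)
{
	return theirs > 0 && (ours & ~1ULL) > theirs;
}

/* Fixed check (assumed form): also skip when the versions tie after rounding ours down. */
static bool new_skip_update(uint64_t ours, uint64_t theirs)
{
	return theirs > 0 && (ours & ~1ULL) >= theirs;
}
```

With ours = 3 and theirs = 2, the old check does not skip, so the stale v2 information is applied, while the sketched fixed check skips it. With ours = 3 and theirs = 3, the update is still applied.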

08 Nov, 2010 6 commits

ceph: fix uid/gid on resent mds requests · cb4276cc

Committed by Sage Weil on Nov 08, 2010

MDS requests can be rebuilt and resent in non-process context, but were
filling in uid/gid from current_fsuid/gid.  Put that information in the
request struct on request setup.

This fixes incorrect (and root) uid/gid getting set for requests that
are forwarded between MDSs, usually due to metadata migrations.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix rdcache_gen usage and invalidate · cd045cb4

Committed by Sage Weil on Nov 04, 2010

We used to use rdcache_gen to indicate whether we "might" have cached
pages. Now we just look at the mapping to determine that. However, some
old behavior remains from that transition.

First, rdcache_gen == 0 no longer means we have no pages. That can happen
at any time (presumably when we carry FILE_CACHE). We should not reset it
to zero, and we should not check that it is zero.

That means that the only purpose for rdcache_revoking is to resolve races
between new issues of FILE_CACHE and an async invalidate. If they are
equal, we should invalidate. On success, we decrement rdcache_revoking,
so that it is no longer equal to rdcache_gen. Similarly, if we succeed
in doing a sync invalidate, set revoking = gen - 1. (This is a small
optimization to avoid doing unnecessary invalidate work and does not
affect correctness.)
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: re-request max_size if cap auth changes · feb4cc9b

Committed by Sage Weil on Nov 07, 2010

If the auth cap migrates to another MDS, clear requested_max_size so that
we resend any pending max_size increase requests.  This fixes potential
hangs on writes that extend a file and race with a cap migration between
MDSs.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: only let auth caps update max_size · 912a9b03

Committed by Sage Weil on Nov 07, 2010

Only the auth MDS has a meaningful max_size value for us, so only update it
in fill_inode if we're being issued an auth cap. Otherwise, a random
stat result from a non-auth MDS can clobber a meaningful max_size, get
the client<->mds cap state out of sync, and make writes hang.

Specifically, even if the client re-requests a larger max_size (which it
will), the MDS won't respond because as far as it knows we already have a
sufficiently large value.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix open for write on clustered mds · 7421ab80

Committed by Sage Weil on Nov 07, 2010

Normally when we open a file we already have a cap, and simply update the
wanted set. However, if we open a file for write, but don't have an auth
cap, that doesn't work; we need to open a new cap with the auth MDS. Only
reuse existing caps if we are opening for read or the existing cap is auth.
Signed-off-by: Sage Weil <sage@newdream.net>
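
The reuse rule stated in the last sentence of that message can be written as a small predicate. This is a sketch only: the flag values and the function name `can_reuse_cap` are hypothetical and do not correspond to kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical open-mode flags for this sketch only. */
enum { SKETCH_FMODE_RD = 1, SKETCH_FMODE_WR = 2 };

/*
 * Decide whether an open can reuse a cap the client already holds
 * instead of opening a new cap with the auth MDS.
 */
static bool can_reuse_cap(bool have_cap, bool cap_is_auth, int fmode)
{
	if (!have_cap)
		return false;
	/* Reads can reuse any cap; writes need the existing cap to be auth. */
	return !(fmode & SKETCH_FMODE_WR) || cap_is_auth;
}
```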

ceph: fix bad pointer dereference in ceph_fill_trace · d8b16b3d

Committed by Sage Weil on Nov 06, 2010

We dereference *in a few lines down, but only set it on rename.  It is
apparently pretty rare for this to trigger, but I have been hitting it
with clustered MDSs.
Signed-off-by: Sage Weil <sage@newdream.net>

28 Oct, 2010 1 commit

Revert "ceph: update issue_seq on cap grant" · 2f56f56a

Committed by Sage Weil on Oct 27, 2010

This reverts commit d91f2438.

The intent of issue_seq is to distinguish between mds->client messages that
(re)create the cap and those that do not, which means we should _only_ be
updating that value in the create paths.  By updating it in handle_cap_grant,
we reset it to zero, which then breaks release.

The larger question is what workload/problem made me think it should be
updated here...
Signed-off-by: Sage Weil <sage@newdream.net>

21 Oct, 2010 14 commits

ceph: do not carry i_lock for readdir from dcache · efa4c120

Committed by Sage Weil on Oct 18, 2010

We were taking dcache_lock inside of i_lock, which introduces a dependency
not found elsewhere in the kernel, complicating the vfs locking
scalability work.  Since we don't actually need it here anyway, remove
it.

We only need i_lock to test for the I_COMPLETE flag, so be careful to do
so without dcache_lock held.
Signed-off-by: Sage Weil <sage@newdream.net>

fs/ceph/xattr.c: Use kmemdup · 61413c2f

Committed by Julia Lawall on Oct 17, 2010

Convert a sequence of kmalloc and memcpy to use kmemdup.

The semantic patch that performs this transformation is:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression a,flag,len;
expression arg,e1,e2;
statement S;
@@

  a =
-  \(kmalloc\|kzalloc\)(len,flag)
+  kmemdup(arg,len,flag)
  <... when != a
  if (a == NULL || ...) S
  ...>
- memcpy(a,arg,len+1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
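
For readers unfamiliar with the idiom, the allocate-then-copy sequence that `kmemdup(src, len, gfp)` collapses into one call looks roughly like this in userspace. `memdup_sketch` is a hypothetical stand-in that uses `malloc` in place of `kmalloc` and has no allocation-flags argument.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the allocate-then-copy idiom that kmemdup() replaces. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;	/* NULL on allocation failure; the caller frees on success */
}
```

Folding both steps into one helper removes the chance of the allocation size and the copy size drifting apart.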

ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl. · 571dba52

Committed by Greg Farnum on Sep 24, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix debugfs warnings · 6f453ed6

Committed by Randy Dunlap on Sep 28, 2010

Include "super.h" outside of CONFIG_DEBUG_FS to eliminate a compiler warning:

fs/ceph/debugfs.c:266: warning: 'struct ceph_fs_client' declared inside parameter list
fs/ceph/debugfs.c:266: warning: its scope is only this definition or declaration, which is probably not what you want
fs/ceph/debugfs.c:271: warning: 'struct ceph_fs_client' declared inside parameter list
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: switch from BKL to lock_flocks() · 496e5955

Committed by Sage Weil on Sep 22, 2010

Switch from using the BKL explicitly to the new lock_flocks() interface.
Eventually this will turn into a spinlock.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: preallocate flock state without locks held · fca4451a

Committed by Greg Farnum on Sep 17, 2010

When the lock_kernel() turns into lock_flocks() and a spinlock, we won't
be able to do allocations with the lock held. Preallocate space without
the lock, and retry if the lock state changes out from underneath us.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use mapping->nrpages to determine if mapping is empty · 18a38193

Committed by Sage Weil on Sep 17, 2010

This is simpler and faster.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: only invalidate on check_caps if we actually have pages · 93afd449

Committed by Sage Weil on Sep 17, 2010

The i_rdcache_gen value only implies we MAY have cached pages; actually
check the mapping to see if it's worth bothering with an invalidate.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: do not hide .snap in root directory · 4c32f5dd

Committed by Sage Weil on Aug 24, 2010

Snaps in the root directory are now supported by the MDS, and harmless on
older versions.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: factor out libceph from Ceph file system · 3d14c5d2

Committed by Yehuda Sadeh on Apr 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph-rbd: osdc support for osd call and rollback operations · ae1533b6

Committed by Yehuda Sadeh on May 18, 2010

This will be used for rbd snapshot administration.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: messenger and osdc changes for rbd · 68b4476b

Committed by Yehuda Sadeh on Apr 06, 2010

Allow the messenger to send/receive data in a bio.  This is added
so that we wouldn't need to copy the data into pages or some other buffer
when doing IO for an rbd block device.

We can now have trailing variable sized data for osd
ops.  Also osd ops encoding is more modular.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: refactor osdc requests creation functions · 3499e8a5

Committed by Yehuda Sadeh on Apr 06, 2010

OSD request creation is now decoupled from the
vino parameter, allowing clients using the osd to use
other arbitrary object names that are not necessarily
vino based. Also, calc_raw_layout now takes a snap id.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: lookup pool in osdmap by name · 7669a2c9

Committed by Yehuda Sadeh on May 17, 2010

Implement a pool lookup by name.  This will be used by rbd.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

07 Oct, 2010 6 commits

ceph: update issue_seq on cap grant · d91f2438

Committed by Sage Weil on Sep 22, 2010

We need to update the issue_seq on any grant operation, be it via an MDS
reply or a separate grant message.  The update in the grant path was
missing.  This broke cap release for inodes in which the MDS sent an
explicit grant message that was not soon after followed by a successful
MDS reply on the same inode.

Also fix the signedness on seq locals.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: send cap release message early on failed revoke. · 21b559de

Committed by Greg Farnum on Oct 06, 2010

If an MDS tries to revoke caps that we don't have, we want to send
releases early since they probably contain the caps message the MDS
is looking for.

Previously, we only sent the messages if we didn't have the inode either. But
in a multi-mds system we can retain the inode after dropping all caps for
a single MDS.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: Update max_len with minimum required size · bba0cd0e

Committed by Aneesh Kumar K.V on Oct 05, 2010

On error, encode_fh should update max_len with the minimum required
size, so that the caller can redo the call with a reallocated buffer.
This is required by the open-by-handle patch series.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: Fix return value of encode_fh function · 92923dcb

Committed by Aneesh Kumar K.V on Oct 05, 2010

The encode_fh function should return 255 on error, as other file
systems do, to indicate EOVERFLOW. Also, max_len is in sizeof(u32) units
and not in bytes.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sage Weil <sage@newdream.net>
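
The two encode_fh conventions described in these commits (size counted in 32-bit words, and 255 returned with the required size written back) can be sketched together. The handle layout `struct sketch_fh`, the handle type `SKETCH_FH_TYPE`, and the function signature are hypothetical; they are not the ceph file handle structures or the VFS prototype.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical handle layout for this sketch only. */
struct sketch_fh {
	uint64_t ino;
	uint32_t gen;
};

#define SKETCH_FH_TYPE 1	/* arbitrary non-error handle type */

/*
 * fh points at a buffer of *max_len 32-bit words. Returns the handle type
 * on success, or 255 with *max_len set to the number of words required.
 */
static int encode_fh_sketch(uint32_t *fh, int *max_len, uint64_t ino, uint32_t gen)
{
	/* Required size in u32 units, rounded up. */
	int needed = (int)((sizeof(struct sketch_fh) + sizeof(uint32_t) - 1) /
			   sizeof(uint32_t));
	struct sketch_fh h;

	if (*max_len < needed) {
		*max_len = needed;	/* tell the caller how much to allocate */
		return 255;
	}
	memset(&h, 0, sizeof(h));	/* zero any padding before copying out */
	h.ino = ino;
	h.gen = gen;
	memcpy(fh, &h, sizeof(h));
	*max_len = needed;
	return SKETCH_FH_TYPE;
}
```

A caller that receives 255 can reallocate to `*max_len` words and call again, which is the retry the open-by-handle series relies on.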

ceph: avoid null deref in osd request error path · 6bc18876

Committed by Sage Weil on Sep 27, 2010

If we interrupt an osd request, we call __cancel_request, but it wasn't
verifying that req->r_osd was non-NULL before dereferencing it. This could
cause a crash if osds were flapping and we aborted a request on said osd.
Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix list_add usage on unsafe_writes list · 936aeb5c

Committed by Henry C Chang on Sep 22, 2010

Fix argument order.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
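
For reference, the kernel's `list_add(new, head)` takes the entry to insert first and the list head second. The sketch below is a minimal userspace reimplementation of that convention, not the kernel header itself; the parameter is named `new_entry` here purely for readability.

```c
/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_entry right after head: entry first, list head second. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
	new_entry->next = head->next;
	new_entry->prev = head;
	head->next->prev = new_entry;
	head->next = new_entry;
}
```

Because both parameters have the same type, swapping them compiles without complaint, which is how this kind of bug can go unnoticed.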

18 Sep, 2010 2 commits

ceph: select CRYPTO · be4f104d

Committed by Sage Weil on Sep 17, 2010

We select CRYPTO_AES, but not CRYPTO.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: check mapping to determine if FILE_CACHE cap is used · a43fb731

Committed by Sage Weil on Sep 17, 2010

See if the i_data mapping has any pages to determine if the FILE_CACHE
capability is currently in use, instead of assuming it is any time the
rdcache_gen value is set (i.e., issued -> used).

This allows the MDS RECALL_STATE process to work for inodes that have cached
pages.
Signed-off-by: Sage Weil <sage@newdream.net>

17 Sep, 2010 2 commits

ceph: only send one flushsnap per cap_snap per mds session · e835124c

Committed by Sage Weil on Sep 17, 2010

Sending multiple flushsnap messages is problematic because we ignore
the response if the tid doesn't match, and the server may only respond to
each one once.  It's also a waste.

So, skip cap_snaps that are already on the flushing list, unless the caller
tells us to resend (because we are reconnecting).
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix cap_snap and realm split · ae00d4f3

Committed by Sage Weil on Sep 16, 2010

The cap_snap creation/queueing relies on both the current i_head_snapc
_and_ the i_snap_realm pointers being correct, so that the new cap_snap
can properly reference the old context and the new i_head_snapc can be
updated to reference the new snaprealm's context.  To fix this, we:

 - move inodes completely to the new (split) realm so that i_snap_realm
   is correct, and
 - generate the new snapc's _before_ queueing the cap_snaps in
   ceph_update_snap_trace().
Signed-off-by: Sage Weil <sage@newdream.net>

15 Sep, 2010 2 commits

ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap · cfc0bf66

Committed by Sage Weil on Sep 14, 2010

Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages
or is still writing.  We'll send the newer capsnaps only after the older
ones complete.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: correctly set 'follows' in flushsnap messages · 8bef9239

Committed by Sage Weil on Sep 14, 2010

The 'follows' should match the seq for the snap context for the given snap
cap, which is the context under which we have been dirtying and writing
data and metadata.  The snapshot that _contains_ those updates thus
_follows_ that context's seq #.
Signed-off-by: Sage Weil <sage@newdream.net>

14 Sep, 2010 1 commit

ceph: fix dn offset during readdir_prepopulate · 467c5251

Committed by Sage Weil on Sep 13, 2010

When adding the readdir results to the cache, ceph_set_dentry_offset was
clobbering our just-set offset.  This can cause the readdir result offsets
to get out of sync with the server.  Add an argument to the helper so
that it does not.

This bug was introduced by 1cd3935b.
Signed-off-by: Sage Weil <sage@newdream.net>

12 Sep, 2010 2 commits

ceph: fix file offset wrapping at 4GB on 32-bit archs · a77d9f7d

Committed by Sage Weil on Sep 11, 2010

Cast the value before shifting so that we don't run out of bits with a
32-bit unsigned long.  This fixes wrapping of high file offsets into the
low 4GB of a file on disk, and the subsequent data corruption for large
files.
Signed-off-by: Sage Weil <sage@newdream.net>
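
The cast-before-shift fix can be demonstrated in userspace. The sketch models a 32-bit `unsigned long` with `uint32_t` and assumes 4 KiB pages and a 32-bit `int`; the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Buggy form: the shift is done in 32 bits, so the high bits are lost. */
static uint64_t offset_shift_then_widen(uint32_t page_index)
{
	return (uint64_t)(page_index << SKETCH_PAGE_SHIFT);
}

/* Fixed form: widen to 64 bits first, then shift. */
static uint64_t offset_widen_then_shift(uint32_t page_index)
{
	return (uint64_t)page_index << SKETCH_PAGE_SHIFT;
}
```

For page index 2^20 (the first page at the 4 GB mark), the buggy form yields offset 0, which is the wrap into the low 4 GB described above, while the fixed form yields 2^32.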

ceph: fix reconnect encoding for old servers · 3612abbd

Committed by Sage Weil on Sep 07, 2010

Fix the reconnect encoding to encode the cap record when the MDS does not
have the FLOCK capability (i.e., pre v0.22).
Signed-off-by: Sage Weil <sage@newdream.net>
