Commits · 9bae113a085b790de384bf86f09e15b42a65a985 · openeuler / raspberrypi-kernel

27 Jul, 2011 · 4 commits

ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC · 9bae113a

Authored by Sage Weil on Jul 26, 2011

We only need to put these on the directory unsafe list if they have
side effects that fsync(2) should flush out.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

9bae113a

ceph: fix bad parent_inode calc in ceph_lookup_open · acda7657

Authored by Sage Weil on Jul 26, 2011

We were always getting NULL here because the intent file f_dentry is always
NULL at this point, which means we were always passing NULL to
ceph_mdsc_do_request.  In reality, this was fine, since this isn't
currently ever a write operation that needs to get strung on the dir's
unsafe list.

Use the dir explicitly, and only pass it if this open has side-effects that
a dir fsync should flush.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

acda7657

ceph: avoid carrying Fw cap during write into page cache · d8de9ab6

Authored by Sage Weil on Jul 26, 2011

The generic_file_aio_write call may block on balance_dirty_pages while we
flush data to the OSDs.  If we hold a reference to the FILE_WR cap during
that interval, revocation by the MDS (e.g., to do a stat(2)) may be very
slow.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

d8de9ab6

ceph: add F_SYNC file flag to force sync (non-O_DIRECT) io · 4918b6d1

Authored by Sage Weil on Jul 26, 2011

This allows us to force IO through the sync path, which you normally only
get when multiple clients are reading/writing to the same file or when
mounting with -o sync.  Among other things, this lets test programs verify
correctness with a single mount.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

4918b6d1

14 Jun, 2011 · 2 commits

ceph: fix sync and dio writes across stripe boundaries · d7f124f1

Authored by Sage Weil on Jun 13, 2011

We were iterating across stripe boundaries properly, but not moving the
write buffer pointer forward. This caused us to rewrite the same data
after the break. Fix by adjusting the data pointer forward, and
recalculating the io and buffer alignment after the break.
Signed-off-by: Sage Weil <sage@newdream.net>

d7f124f1

ceph: fix page alignment corrections · 773e9b44

Authored by Sage Weil on Jun 07, 2011

 dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1
 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1
Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

773e9b44

08 Jun, 2011 · 3 commits

ceph: fix ENOENT logic in striped_read · 0e98728f

Authored by Sage Weil on Jun 07, 2011

Getting ENOENT is equivalent to reading 0 bytes.  Make that correction
before setting up the hit_stripe and was_short flags.

Fixes the following case:
 dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0
 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct
Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

0e98728f

ceph: fix short sync reads from the OSD · c3cd6283

Authored by Sage Weil on Jun 01, 2011

If we get a short read from the OSD because the object is small, we need to
zero the remainder of the buffer.  For O_DIRECT reads, the attempted range
is not trimmed to i_size by the VFS, so we were actually looping
indefinitely.

Fix by trimming to i_size, and then unconditionally zeroing the trailing
range.
Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>

c3cd6283

ceph: use ihold when we already have an inode ref · 70b666c3

Authored by Sage Weil on May 27, 2011

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil <sage@newdream.net>

70b666c3

05 May, 2011 · 1 commit

ceph: do not call __mark_dirty_inode under i_lock · fca65b4a

Authored by Sage Weil on May 04, 2011

The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the
one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that its callers can call __mark_dirty_inode outside of
i_lock.
Signed-off-by: Sage Weil <sage@newdream.net>

fca65b4a

22 Mar, 2011 · 2 commits

ceph: add request to the tail of unsafe write list · 49bcb932

Authored by Henry C Chang on Mar 15, 2011

In sync_write_wait(), we assume that the newest request is at the
tail of the unsafe write list. We should maintain those semantics here.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

49bcb932

ceph: remove request from unsafe list if it is canceled/timed out · 78a25565

Authored by Henry C Chang on Mar 15, 2011

This fixes list corruption warnings like the following:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G        W  2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
[<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
[<ffffffff812351a3>] __list_add+0x68/0x81
[<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
[<ffffffff8111d2a0>] do_sync_write+0xe8/0x125
[<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
[<ffffffff811e8521>] ? security_file_permission+0x16/0x18
[<ffffffff8111d864>] vfs_write+0xae/0x10b
[<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
[<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
---[ end trace 08573eb9f07ff6f4 ]---
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

78a25565

18 Dec, 2010 · 1 commit

ceph: mark user pages dirty on direct-io reads · b6aa5901

Authored by Henry C Chang on Dec 15, 2010

For a read operation, we have to set the _write_ argument of
get_user_pages to 1, since we will write data to the pages. We also need
to SetPageDirty before releasing these pages.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

b6aa5901

16 Dec, 2010 · 1 commit

ceph: fix direct-io on non-page-aligned buffers · ab226e21

Authored by Henry C Chang on Dec 15, 2010

The user buffer may be 512-byte aligned, not page-aligned.  We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ab226e21

10 Nov, 2010 · 2 commits

ceph: make page alignment explicit in osd interface · b7495fc2

Authored by Sage Weil on Nov 09, 2010

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.
Signed-off-by: Sage Weil <sage@newdream.net>

b7495fc2

ceph: fix comment, remove extraneous args · e98b6fed

Authored by Sage Weil on Nov 09, 2010

The offset/length arguments aren't used.
Signed-off-by: Sage Weil <sage@newdream.net>
e98b6fed

08 Nov, 2010 · 1 commit

ceph: fix open for write on clustered mds · 7421ab80

Authored by Sage Weil on Nov 07, 2010

Normally when we open a file we already have a cap, and simply update the
wanted set. However, if we open a file for write, but don't have an auth
cap, that doesn't work; we need to open a new cap with the auth MDS. Only
reuse existing caps if we are opening for read or the existing cap is auth.
Signed-off-by: Sage Weil <sage@newdream.net>

7421ab80

21 Oct, 2010 · 1 commit

ceph: factor out libceph from Ceph file system · 3d14c5d2

Authored by Yehuda Sadeh on Apr 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup are correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>

3d14c5d2

07 Oct, 2010 · 1 commit

ceph: fix list_add usage on unsafe_writes list · 936aeb5c

Authored by Henry C Chang on Sep 22, 2010

Fix argument order.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

936aeb5c

04 Aug, 2010 · 1 commit

ceph: whitespace cleanup · 213c99ee

Authored by Sage Weil on Aug 03, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

213c99ee

03 Aug, 2010 · 1 commit

ceph: add flock/fcntl lock support · 40819f6f

Authored by Greg Farnum on Aug 02, 2010

Implement flock inode operation to support advisory file locking.  All
lock/unlock operations are synchronous with the MDS.  Lock state is
sent when reconnecting to a recovering MDS to restore the shared lock
state.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

40819f6f

02 Aug, 2010 · 3 commits

ceph: code cleanup · cd84db6e

Authored by Yehuda Sadeh on Jun 11, 2010

Mainly fixing minor issues reported by sparse.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

cd84db6e

ceph: perform lazy reads when file mode and caps permit · 2962507c

Authored by Sage Weil on May 27, 2010

If the file mode is marked as "lazy," perform cached/buffered reads when
the caps permit it.  Adjust the rdcache_gen and invalidation logic
accordingly so that we manage our cache based on the FILE_CACHE -or-
FILE_LAZYIO cap bits.
Signed-off-by: Sage Weil <sage@newdream.net>

2962507c

ceph: perform lazy writes when file mode and caps permit · 33caad32

Authored by Sage Weil on May 26, 2010

If we have marked a file as "lazy" (using the ceph ioctl), perform buffered
writes when the MDS caps allow it.
Signed-off-by: Sage Weil <sage@newdream.net>

33caad32

28 Jul, 2010 · 1 commit

ceph: use complete_all and wake_up_all · 03066f23

Authored by Yehuda Sadeh on Jul 27, 2010

This fixes an issue triggered by running concurrent syncs. One of the syncs
would go through while the other would just hang indefinitely. In any case, we
never actually want to wake a single waiter, so the *_all functions should
be used.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

03066f23

30 May, 2010 · 1 commit

fs/ceph: Use ERR_CAST · 7e34bc52

Authored by Julia Lawall on May 22, 2010

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
purpose of the operation clearer, whereas the latter looks like a
no-op.

In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of
the returned value is the same as the type of the enclosing function.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>

7e34bc52

22 May, 2010 · 1 commit

sanitize vfs_fsync calling conventions · 8018ab05

Authored by Christoph Hellwig on Mar 22, 2010

Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsync_range.

The next step will be removing the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8018ab05

18 May, 2010 · 4 commits

ceph: all allocation functions should get gfp_mask · 34d23762

Authored by Yehuda Sadeh on Apr 06, 2010

This is essential, as for the rados block device we'll need
to run in different contexts that need flags other
than GFP_NOFS.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

34d23762

ceph: make ceph_msg_new return NULL on failure; clean up, fix callers · a79832f2

Authored by Sage Weil on Apr 01, 2010

Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure
instead, and fix up the callers (about half of which were wrong anyway).
Signed-off-by: Sage Weil <sage@newdream.net>

a79832f2

ceph: use ceph_sb_to_client instead of ceph_client · 640ef79d

Authored by Cheng Renquan on Mar 26, 2010

ceph_sb_to_client and ceph_client are identical, so one of them should go.
Since the function name ceph_client is easily confused with "struct
ceph_client" and ceph_sb_to_client's name is clearer, switch all callers
to ceph_sb_to_client.

  -static inline struct ceph_client *ceph_client(struct super_block *sb)
  -{
  -	return sb->s_fs_info;
  -}
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

640ef79d

ceph: use __page_cache_alloc and add_to_page_cache_lru · 31459fe4

Authored by Yehuda Sadeh on Mar 17, 2010

Following Nick Piggin's patches in btrfs, pagecache pages should be
allocated with __page_cache_alloc, so they obey pagecache memory
policies.

Also, use add_to_page_cache_lru instead of a private pagevec where
applicable.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

31459fe4

04 May, 2010 · 1 commit

ceph: fix direct io truncate offset · 5c6a2cdb

Authored by Sage Weil on Apr 22, 2010

truncate_inode_pages_range wants the end offset to align with the last byte
in a page.
Signed-off-by: Sage Weil <sage@newdream.net>

5c6a2cdb

30 Mar, 2010 · 1 commit

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

Authored by Tejun Heo on Mar 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there, i.e. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

02 Mar, 2010 · 1 commit

ceph: return EBADF if waiting for caps on closed file · 195d3ce2

Authored by Sage Weil on Mar 01, 2010

Verify the file is actually open for the given caps when we are
waiting for caps.  This ensures we will wake up and return EBADF
if another thread closes the file out from under us.

Note that EBADF is also the correct return code from write(2)
when called on a file handle opened for reading (although the
vfs should catch that).
Signed-off-by: Sage Weil <sage@newdream.net>

195d3ce2

24 Feb, 2010 · 1 commit

ceph: don't clobber write return value when using O_SYNC · 88d892a3

Authored by Yehuda Sadeh on Feb 23, 2010

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

88d892a3

12 Feb, 2010 · 3 commits

ceph: fix sync read eof check deadlock · 6a026589

Authored by Sage Weil on Feb 09, 2010

If a sync read gets a short result from the OSD, it may need to do a
getattr to see if it is short due to reaching end-of-file. The getattr
was being done while holding a reference to FILE_RD, which can lead to
a deadlock if the MDS is revoking that capability bit and can't process
the getattr until it does.

We fix this by setting a flag if EOF size validation is needed, and doing
the getattr in ceph_aio_read, after the RD cap ref is dropped. If the
read needs to be continued, we loop and continue traversing the file.
Signed-off-by: Sage Weil <sage@newdream.net>

6a026589

ceph: sync read/write considers page cache · 29065a51

Authored by Yehuda Sadeh on Feb 09, 2010

In the cases where we either do a sync read or a write, we
need to make sure that everything in the page cache is flushed.
In the case of a sync write we invalidate the relevant pages,
so that subsequent read/write reflects the new data written.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

29065a51

ceph: fix short synchronous reads · 972f0d3a

Authored by Yehuda Sadeh on Feb 04, 2010

Zeroing of holes was not done correctly: page_off was miscalculated and
zeroing the tail did not adjust the 'read' value to include the zeroed
portion.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

972f0d3a

07 Jan, 2010 · 1 commit

ceph: fix copy_user_to_page_vector() · 6a4ef481

Authored by Yehuda Sadeh on Dec 31, 2009

The function was broken when more than one page was involved, which
broke the ceph sync_write case.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

6a4ef481

05 Nov, 2009 · 1 commit

ceph: fix sparse endian warning · 6a18be16

Authored by Sage Weil on Nov 04, 2009

Use the __le macro, even though for -1 it doesn't matter.
Signed-off-by: Sage Weil <sage@newdream.net>

6a18be16