Commits · 4908b822b300d2d7ad0341203181cfbd8a91092a · openeuler / raspberrypi-kernel

07 May, 2014 9 commits

ceph: switch to ->write_iter() · 4908b822

Authored by Al Viro on Apr 03, 2014

```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

ceph_sync_direct_write: stop poking into iov_iter guts · 64c31311

Authored by Al Viro on Apr 03, 2014

```
all needed primitives are there...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

ceph_sync_read: stop poking into iov_iter guts · 2b777c9d

Authored by Al Viro on Apr 03, 2014

```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

ceph: switch to ->read_iter() · 3644424d

Authored by Al Viro on Apr 02, 2014

```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

start adding the tag to iov_iter · 71d8e532

Authored by Al Viro on Mar 05, 2014

For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: generic_file_read_iter() · ed978a81

Authored by Al Viro on Mar 05, 2014

iov_iter-using variant of generic_file_aio_read(). Some callers
converted. Note that it's still not quite there for use as ->read_iter() -
we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately,
that's true for all converted callers (and for generic_file_aio_read() itself).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ceph_aio_read(): keep iov_iter across retries · 05bb2e0b

Authored by Al Viro on Mar 05, 2014

```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

kill generic_segment_checks() · cb66a7a1

Authored by Al Viro on Mar 04, 2014

all callers of ->aio_read() and ->aio_write() have iov/nr_segs already
checked - generic_segment_checks() done after that is just an odd way
to spell iov_length().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
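
To illustrate what "spelling iov_length()" amounts to, here is a minimal userspace sketch that totals the bytes described by an iovec array. The name `iov_total_length` is hypothetical and this is not the kernel's implementation; it only shows the idea of summing segment lengths once the segments have already been validated.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Userspace sketch: total number of bytes described by an iovec array.
 * Assumes the segments were already validated by the caller. */
static size_t iov_total_length(const struct iovec *iov, unsigned long nr_segs)
{
    size_t total = 0;

    for (unsigned long seg = 0; seg < nr_segs; seg++)
        total += iov[seg].iov_len;
    return total;
}
```

For two segments of 3 and 5 bytes, the function returns 8.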

kill iov_iter_copy_from_user() · e7c24607

Authored by Al Viro on Apr 10, 2014

all callers can use copy_page_from_iter() and it actually simplifies
them.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

12 Apr, 2014 2 commits

fs: disallow all fallocate operation on active swapfile · 0790b31b

Authored by Lukas Czerner on Apr 12, 2014

Currently some file systems have an IS_SWAPFILE check in their fallocate
implementations and some do not. However, we should really prevent any
fallocate operation on a swapfile, so move the check to the VFS and remove
the redundant checks from the file systems' fallocate implementations.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure · eab87235

Authored by Al Viro on Apr 03, 2014

ceph_osdc_put_request(ERR_PTR(-error)) oopses.  What we want there
is break, not goto out.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
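
The failure mode described above can be modelled in userspace. The sketch below imitates the kernel's error-pointer idiom (an errno value encoded in the pointer itself) with hypothetical helpers `err_ptr`, `ptr_err`, and `is_err`, plus a hypothetical `fake_request` type. It is an illustration under those assumptions, not the ceph code: the point is only that a release function must never be handed an error pointer, so the error path has to leave before the cleanup step.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095  /* error pointers occupy the top 4095 address values */

static void *err_ptr(long error)      { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr)  { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_request { int refcount; };

/* Hypothetical allocator: returns an error pointer, not NULL, on failure. */
static struct fake_request *new_request(int fail)
{
    struct fake_request *req;

    if (fail)
        return err_ptr(-ENOMEM);
    req = calloc(1, sizeof(*req));
    if (!req)
        return err_ptr(-ENOMEM);
    req->refcount = 1;
    return req;
}

/* Dereferences req, so passing an error pointer here would crash. */
static void put_request(struct fake_request *req)
{
    if (--req->refcount == 0)
        free(req);
}

static long submit(int fail)
{
    struct fake_request *req = new_request(fail);

    if (is_err(req))
        return ptr_err(req);  /* leave before the cleanup that calls put_request() */
    /* ... the request would be sent here ... */
    put_request(req);
    return 0;
}
```

In this sketch the early return plays the role that `break` plays in the commit: it skips the shared cleanup label that would otherwise release the request.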

05 Apr, 2014 1 commit

ceph: drop extra open file reference in ceph_atomic_open() · ab866549

Authored by Yan, Zheng on Apr 01, 2014

ceph_atomic_open() calls ceph_open() after receiving the MDS reply.
ceph_open() grabs an extra open file reference. (The open request
already holds an open file reference.)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

03 Apr, 2014 2 commits

ceph: fscache: Update object store limit after file writing · 32d3e148

Authored by Yunchuan Wen on Dec 26, 2013

Synchronize object->store_limit[_l] with new inode->i_size after file writing.

Tested-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>

ceph: do not chain inode updates to parent fsync · 752c8bdc

Authored by Sage Weil on Feb 05, 2013

The fsync(dirfd) only covers namespace operations, not inode updates.
We do not need to cover setattr variants or O_TRUNC.

Reported-by: Al Viro <viro@xeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

02 Apr, 2014 2 commits

ceph_aio_write(): switch to generic_perform_write() · aec605f4

Authored by Al Viro on Feb 11, 2014

```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

kill the 5th argument of generic_file_buffered_write() · fcacafd2

Authored by Al Viro on Feb 09, 2014

```
same story - it's &iocb->ki_pos in all cases
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

18 Feb, 2014 1 commit

ceph: add missing init_acl() for mkdir() and atomic_open() · b20a95a0

Authored by Yan, Zheng on Feb 11, 2014

```
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
```

29 Jan, 2014 1 commit

ceph: cast PAGE_SIZE to size_t in ceph_sync_write() · 125d725c

Authored by Ilya Dryomov on Jan 28, 2014

Use min_t(size_t, ...) instead of plain min(), which does strict type
checking, to avoid compile warning on i386.

Cc: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
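
As a rough illustration of the min_t() idea, the sketch below defines a hypothetical userspace macro `MIN_T` that casts both operands to one named type before comparing them, so operands of different integer types (for example an `unsigned long` page size and a `size_t` length) are compared as the same type. This is an assumption-laden stand-in, not the kernel macro; unlike the kernel version it evaluates its arguments more than once.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for min_t(type, a, b).
 * Note: arguments are evaluated more than once, so avoid side effects. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Clamp one write chunk to at most a page, comparing both values as size_t. */
static size_t chunk_len(size_t remaining, unsigned long page_size)
{
    return MIN_T(size_t, page_size, remaining);
}
```

With a 4096-byte page, `chunk_len(100, 4096)` yields 100 and `chunk_len(10000, 4096)` yields 4096.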

14 Dec, 2013 3 commits

fs: ceph: new helper: file_inode(file) · aa8b60e0

Authored by Libo Chen on Dec 11, 2013

Signed-off-by: Libo Chen <clbchenlibo.chen@huawei.com>
Signed-off-by: Sage Weil <sage@inktank.com>

ceph: implement readv/preadv for sync operation · 8eb4efb0

Authored by majianpeng on Sep 26, 2013

For readv/preadv sync operations, ceph only processes the first iov.
This patch implements handling for all of them.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: Implement writev/pwritev for sync operation. · e8344e66

Authored by majianpeng on Sep 12, 2013

For writev/pwritev sync operations, ceph only processes the first iov.

I divided the sync write operation into two functions: one for
direct writes, the other for non-direct sync writes. This is because for
a non-direct sync write we can merge the iovs into one, but for a direct
write we can't merge them.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
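
Merging iovs into one, as described for the non-direct sync write, can be pictured as a gather copy into a single contiguous buffer. The following userspace sketch uses a hypothetical helper `gather_iov`; it is not the ceph implementation and ignores everything specific to the kernel (user-space access checks, page handling).

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Copy every segment into one newly allocated contiguous buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
static void *gather_iov(const struct iovec *iov, int nr_segs, size_t *out_len)
{
    size_t total = 0, off = 0;
    char *buf;

    for (int i = 0; i < nr_segs; i++)
        total += iov[i].iov_len;

    buf = malloc(total ? total : 1);  /* avoid a zero-sized allocation */
    if (!buf)
        return NULL;

    for (int i = 0; i < nr_segs; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = total;
    return buf;
}
```

Gathering the segments "abc" and "de" produces the five bytes "abcde", which could then be written in one operation.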

07 Sep, 2013 1 commit

ceph: use fscache as a local presisent cache · 99ccbd22

Authored by Milosz Tanski on Aug 21, 2013

Adding support for fscache to the Ceph filesystem. This would bring it on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>

28 Aug, 2013 3 commits

ceph: allow sync_read/write return partial successed size of read/write. · ee7289bf

Authored by majianpeng on Aug 21, 2013

For sync_read/write, a single call may perform operations on multiple
stripes. If one of those hits an error, we return the size that already
succeeded rather than an error value. The exception is a write operation
that hits -EOLDSNAPC: if this occurs, we retry the whole write.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
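
The "return what already succeeded" rule can be sketched as a loop over stripes. Everything named here (`stripe_op_fn`, `do_striped_io`, and the two example callbacks) is hypothetical, and the -EOLDSNAPC retry is deliberately not modelled; the sketch only shows when the byte count wins over the error code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-stripe operation: bytes transferred, or a negative errno. */
typedef ssize_t (*stripe_op_fn)(size_t stripe_index, size_t len);

static ssize_t do_striped_io(stripe_op_fn op, size_t nr_stripes, size_t stripe_len)
{
    ssize_t done = 0;

    for (size_t i = 0; i < nr_stripes; i++) {
        ssize_t ret = op(i, stripe_len);

        if (ret < 0)
            return done > 0 ? done : ret;  /* keep earlier progress if any */
        done += ret;
    }
    return done;
}

/* Example callbacks: fail with -EIO on the third or on the first stripe. */
static ssize_t fail_at_stripe_2(size_t i, size_t len)
{
    return i == 2 ? -EIO : (ssize_t)len;
}

static ssize_t fail_at_stripe_0(size_t i, size_t len)
{
    return i == 0 ? -EIO : (ssize_t)len;
}
```

With four stripes of 100 bytes, a failure on the third stripe returns 200, while a failure on the very first stripe returns -EIO because nothing had succeeded yet.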

ceph: fix bugs about handling short-read for sync read mode. · 02ae66d8

Authored by majianpeng on Aug 06, 2013

cephfs . show_layout
>layout.data_pool:     0
>layout.object_size:   4194304
>layout.stripe_unit:   4194304
>layout.stripe_count:  1

TestA:
>dd if=/dev/urandom of=test bs=1M count=2 oflag=direct
>dd if=/dev/urandom of=test bs=1M count=2 seek=4  oflag=direct
>dd if=test of=/dev/null bs=6M count=1 iflag=direct
The messages from func striped_read are:
ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
ceph:           file.c:350  : striped_read 2097152~4194304 (read 2097152) got 0 HITSTRIPE SHORT
ceph:           file.c:381  : zero tail 4194304
ceph:           file.c:390  : striped_read returns 6291456
The hole in the file spans 2M to 4M, but the code actually zeroes the last
4M, including the final 2M, which isn't a hole.
With this patch, the messages are:
ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
ceph:           file.c:358  :  zero gap 2097152 to 4194304
ceph:           file.c:350  : striped_read 4194304~2097152 (read 4194304) got 2097152
ceph:           file.c:384  : striped_read returns 6291456

TestB:
>echo majianpeng > test
>dd if=test of=/dev/null bs=2M count=1 iflag=direct
The messages are:
ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
ceph:           file.c:350  : striped_read 11~6291445 (read 11) got 0 HITSTRIPE SHORT
ceph:           file.c:390  : striped_read returns 11
In this case, it performed one more striped_read, which is pointless.
With this patch, the messages are:
ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
ceph:           file.c:384  : striped_read returns 11

Big thanks to Yan Zheng for the patch.

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
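
The two test cases can be reproduced with a small bookkeeping model. This is a simplified sketch under stated assumptions, not the kernel's striped_read(): stripe_count is 1, `stored[i]` is a hypothetical array giving how many bytes object i actually holds, and only byte counts are tracked rather than data. A short read inside the file is treated as a hole to zero before continuing; a short read at end of file stops the loop.

```c
#include <stdint.h>

struct read_result {
    uint64_t returned;  /* bytes reported to the caller */
    uint64_t zeroed;    /* bytes of that total filled with zeros */
};

static struct read_result model_striped_read(const uint64_t *stored,
                                             uint64_t object_size,
                                             uint64_t i_size,
                                             uint64_t pos, uint64_t len)
{
    struct read_result res = { 0, 0 };
    uint64_t end = pos + len;

    while (pos < end) {
        uint64_t obj = pos / object_size;
        uint64_t off = pos % object_size;
        uint64_t want = object_size - off;  /* up to the object boundary */
        uint64_t have = stored[obj] > off ? stored[obj] - off : 0;
        uint64_t got, gap;

        if (want > end - pos)
            want = end - pos;
        got = have < want ? have : want;
        res.returned += got;
        pos += got;
        if (got == want)
            continue;   /* this piece was read in full */
        if (pos >= i_size)
            break;      /* short read at end of file: nothing more to do */

        /* Short read inside the file: the rest of this piece is a hole. */
        gap = want - got;
        if (gap > i_size - pos)
            gap = i_size - pos;
        res.zeroed += gap;
        res.returned += gap;
        pos += gap;
    }
    return res;
}
```

For TestA (objects holding 2M each, 4M object size, 6M file) a 6M read returns 6291456 bytes with only the 2M gap zeroed; for TestB (an 11-byte file) the read returns 11 after a single pass.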

ceph: fix fallocate division · b314a90d

Authored by Sage Weil on Aug 27, 2013

We need to use do_div to divide by a 64-bit value.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
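
For readers unfamiliar with do_div(), the sketch below mimics its calling convention in userspace as generally documented: the 64-bit dividend is updated in place with the quotient and the remainder is returned. The function name `do_div_sketch` is hypothetical and the 32-bit divisor is an assumption about the helper's contract; the real kernel helper is a macro that exists because plain 64-bit division is not available on every 32-bit architecture.

```c
#include <stdint.h>

/* Userspace sketch of the do_div() contract: divide the 64-bit value at *n
 * by base, store the quotient back in *n, and return the remainder. */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```

Dividing 10000000000 by 3 this way leaves 3333333333 in the variable and returns a remainder of 1.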

16 Aug, 2013 2 commits

ceph: punch hole support · ad7a60de

Authored by Li Wang on Aug 15, 2013

This patch implements fallocate and punch hole support for the Ceph kernel client.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>

ceph: introduce i_truncate_mutex · b0d7c223

Authored by Yan, Zheng on Aug 12, 2013

I encountered the deadlock below when running fsstress:

wmtruncate work      truncate                 MDS
---------------  ------------------  --------------------------
                   lock i_mutex
                                      <- truncate file
lock i_mutex (blocked)
                                      <- revoking Fcb (filelock to MIX)
                   send request ->
                                         handle request (xlock filelock)

At the initial time, there are some dirty pages in the page cache.
When the kclient receives the truncate message, it reduces inode size
and creates some 'out of i_size' dirty pages. wmtruncate work can't
truncate these dirty pages because it's blocked by the i_mutex. Later
when the kclient receives the cap message that revokes Fcb caps, it
can't flush all dirty pages because writepages() only flushes dirty
pages within the inode size.

When the MDS handles the 'truncate' request from kclient, it waits
for the filelock to become stable. But the filelock is stuck in
unstable state because it can't finish revoking kclient's Fcb caps.

The truncate pagecache locking has already caused lots of trouble
for us. I think it's time to simplify it by introducing a new mutex.
We use the new mutex to prevent concurrent truncate_inode_pages().
There is no need to worry about race between buffered write and
truncate_inode_pages(), because our "get caps" mechanism prevents
them from concurrent execution.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

10 Aug, 2013 4 commits

ceph: replace hold_mutex flag with goto · 2f75e9e1

Authored by Sage Weil on Aug 09, 2013

All of the early exit paths need to drop the mutex; it is only the normal
path through the function that does not.  Skip the unlock in that case
with a goto out_unlocked.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
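
The control-flow pattern described here can be sketched with a stand-in lock. The `fake_lock` type, its helpers, and `do_write` are hypothetical, and this is not the ceph_aio_write() code; the sketch only shows early exits sharing one unlock label while the normal path, which has already dropped the lock, jumps past it.

```c
struct fake_lock { int held; };  /* stand-in for a mutex */

static void fake_lock_acquire(struct fake_lock *l) { l->held = 1; }
static void fake_lock_release(struct fake_lock *l) { l->held = 0; }

/* Returns 0 on success, -1 on early exit; the lock is never left held. */
static int do_write(struct fake_lock *l, int bad_input, int *released_early)
{
    int ret = -1;

    *released_early = 0;
    fake_lock_acquire(l);
    if (bad_input)
        goto out;               /* early exit: still holding the lock */

    fake_lock_release(l);       /* normal path drops the lock itself... */
    *released_early = 1;
    /* ... work that must run without the lock would go here ... */
    ret = 0;
    goto out_unlocked;          /* ...so it skips the unlock below */
out:
    fake_lock_release(l);
out_unlocked:
    return ret;
}
```

Compared with carrying a "do I still hold the mutex" flag, the two labels make the lock state at each exit visible from the control flow alone.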

ceph: Move the place for EOLDSNAPC handle in ceph_aio_write to easily understand · 0e5dd45c

Authored by majianpeng on Aug 08, 2013

Only ceph_sync_write can see the OSD return EOLDSNAPC, so move the
related code to after the call to ceph_sync_write.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: Don't use ceph-sync-mode for synchronous-fs. · 7ab9b380

Authored by majianpeng on Jun 27, 2013

Sending reads and writes through the sync read/write paths bypasses the
page cache, which is not expected or generally a good idea.  Removing
the write check is safe as there is a conditional vfs_fsync_range() later
in ceph_aio_write that already checks for the same flag (via
IS_SYNC(inode)).

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: cleanup types in striped_read() · 688bac46

Authored by Dan Carpenter on Jul 23, 2013

We pass in a u64 value for "len" and then immediately truncate away the
upper 32 bits.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>
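
A tiny userspace sketch shows what such a truncation does to a large length. The helper `narrow_len` is hypothetical and uses an unsigned 32-bit type for a well-defined result; the variable types in the original striped_read() are not shown in this log.

```c
#include <stdint.h>

/* What happens when a 64-bit length lands in a 32-bit variable. */
static uint32_t narrow_len(uint64_t len)
{
    return (uint32_t)len;  /* keeps only the low 32 bits */
}
```

A length of 0x100000001 (4 GiB plus one byte) comes out as 1, while small values such as 4096 pass through unchanged, which is why this kind of bug tends to stay hidden until a very large request arrives.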

04 Jul, 2013 2 commits

ceph: fix pending vmtruncate race · b415bf4f

Authored by Yan, Zheng on Jul 02, 2013

The locking order for pending vmtruncate is wrong; it can lead to the
following race:

        write                  wmtruncate work
------------------------    ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is to take i_mutex before calling __ceph_do_pending_vmtruncate().

Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: remove sb_start/end_write in ceph_aio_write. · 0405a149

Authored by Jianpeng Ma on Jun 23, 2013

Both vfs_write and io_submit call file_start/end_write.
The difference between file_start/end_write and sb_start/end_write is
that the file_ variants only handle regular files. But I think
ceph_aio_write is only used for regular files.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Acked-by: Yan, Zheng <zheng.z.yan@intel.com>

03 Jul, 2013 1 commit

vfs: export lseek_execute() to modules · 46a1c2c7

Authored by Jie Liu on Jun 25, 2013

For the file systems (btrfs/ext4/ocfs2/tmpfs) that support
SEEK_DATA/SEEK_HOLE, we end up handling a similar matter in
lseek_execute(), which updates the current file offset to the
desired offset if it is valid. ceph also does similar things
in ceph_llseek().

To reduce the duplication, this patch makes lseek_execute()
publicly accessible so that we can call it directly from the
underlying file systems.

Thanks Dave Chinner for this suggestion.

[AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

v2->v1:
- Add kernel-doc comments for lseek_execute()
- Call lseek_execute() in ceph->llseek()

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Ted Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
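
The behaviour described for the shared helper (validate the target offset, then make it the current position) can be sketched in userspace as follows. The `fake_file` type and `set_pos` function are hypothetical stand-ins; the real helper works on the kernel's file structure and handles details this sketch leaves out.

```c
#include <errno.h>
#include <stdint.h>

struct fake_file { int64_t pos; };  /* hypothetical stand-in for an open file */

/* Validate a seek target and, if valid, make it the current position.
 * Returns the new offset, or -EINVAL if the target is out of range. */
static int64_t set_pos(struct fake_file *file, int64_t offset, int64_t maxsize)
{
    if (offset < 0 || offset > maxsize)
        return -EINVAL;
    if (offset != file->pos)
        file->pos = offset;
    return offset;
}
```

A file system's llseek implementation would compute the candidate offset for SEEK_DATA or SEEK_HOLE itself and then hand it to one such helper instead of repeating the range check and update.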

08 May, 2013 1 commit

aio: don't include aio.h in sched.h · a27bb332

Authored by Kent Overstreet on May 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 May, 2013 5 commits

libceph: kill off osd data write_request parameters · 406e2c9f

Authored by Alex Elder on Apr 15, 2013

In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ceph: fix printk format warnings in file.c · ac7f29bf

Authored by Randy Dunlap on Apr 19, 2013

Fix printk format warnings by using %zd for 'ssize_t' variables:

fs/ceph/file.c:751:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]
fs/ceph/file.c:762:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@inktank.com>
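
The same format-specifier rule applies in userspace C, where it is easy to try out. The sketch below formats an `ssize_t` with `%zd` using a hypothetical helper `format_result`; the message text is invented for illustration and is not the string used in fs/ceph/file.c.

```c
#include <stdio.h>
#include <sys/types.h>

/* Format an ssize_t with %zd, which matches its width on both
 * 32-bit and 64-bit builds, unlike %ld. */
static int format_result(char *buf, size_t buflen, ssize_t ret)
{
    return snprintf(buf, buflen, "sync_write returned %zd", ret);
}
```

Calling it with -5 fills the buffer with "sync_write returned -5" without a format warning on either word size.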

ceph: apply write checks in ceph_aio_write · 03d254ed

Authored by Yan, Zheng on Apr 12, 2013

Copy the write checks in __generic_file_aio_write to ceph_aio_write,
so that these checks also cover the sync write path.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>

ceph: take i_mutex before getting Fw cap · 37505d57

Authored by Yan, Zheng on Apr 12, 2013

There is a deadlock, as illustrated below. The fix is to take i_mutex
before getting the Fw cap reference.

      write                    truncate                 MDS
---------------------     --------------------      --------------
get Fw cap
                          lock i_mutex
lock i_mutex (blocked)
                          request setattr.size  ->
                                                <-   revoke Fw cap

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: change how "safe" callback is used · 26be8808

Authored by Alex Elder on Apr 15, 2013

An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation from the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request is added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reported in:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
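
The reworked callback contract (one call when the request becomes unsafe, one when it is durable) can be sketched in userspace. All names here (`inode_state`, `osd_req_model`, `track_unsafe`, `send_request`, `handle_durable_reply`) are hypothetical, a counter stands in for the inode's unsafe list, and the osd client's request mutex is not modelled; this is an illustration of the calling pattern, not the libceph code.

```c
#include <stdbool.h>

struct inode_state { int nr_unsafe; };  /* stands in for the unsafe items list */

struct osd_req_model {
    struct inode_state *inode;
    void (*unsafe_cb)(struct osd_req_model *req, bool unsafe);
};

/* Called twice per request: unsafe=true just before the first send,
 * unsafe=false once the results are known to be durable. */
static void track_unsafe(struct osd_req_model *req, bool unsafe)
{
    if (unsafe)
        req->inode->nr_unsafe++;  /* request enters the unsafe window */
    else
        req->inode->nr_unsafe--;  /* request leaves the unsafe window */
}

static void send_request(struct osd_req_model *req)
{
    if (req->unsafe_cb)
        req->unsafe_cb(req, true);
    /* ... the message would be handed to the messenger here ... */
}

static void handle_durable_reply(struct osd_req_model *req)
{
    if (req->unsafe_cb)
        req->unsafe_cb(req, false);
}
```

Because the bookkeeping happens inside the send path rather than after it returns, the request is already tracked before any reply can arrive, which is the ordering the commit relies on to close the race.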