提交 · 1418bf076d08edd47a610ea3844c6f6012949a51 · openeuler / Kernel

05 2月, 2016 1 次提交

ceph: checking for IS_ERR instead of NULL · 1418bf07

由 Dan Carpenter 提交于 1月 26, 2016

ceph_osdc_alloc_request() returns NULL on error, it never returns error
pointers.

Fixes: 5be0389d ('ceph: re-send AIO write request when getting -EOLDSNAP error')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1418bf07

23 1月, 2016 1 次提交

wrappers for ->i_mutex access · 5955102c

由 Al Viro 提交于 1月 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5955102c

22 1月, 2016 3 次提交

ceph: use i_size_{read,write} to get/set i_size · 99c88e69

由 Yan, Zheng 提交于 12月 30, 2015

Cap message from MDS can update i_size. In that case, we don't
hold i_mutex. So it's unsafe to directly access inode->i_size
while holding i_mutex.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

99c88e69

ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d

由 Yan, Zheng 提交于 12月 24, 2015

When receiving -EOLDSNAP from OSD, we need to re-send corresponding
write request. Due to locking issue, we can send new request inside
another OSD request's complete callback. So we use worker to re-send
request for AIO write.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

5be0389d

ceph: Asynchronous IO support · c8fe9b17

由 Yan, Zheng 提交于 12月 23, 2015

The basic idea of AIO support is simple, just call kiocb::ki_complete()
in OSD request's complete callback. But there are several special cases.

when IO span multiple objects, we need to wait until all OSD requests
are complete, then call kiocb::ki_complete(). Error handling in this case
is tricky too. For simplify, AIO both span multiple objects and extends
i_size are not allowed.

Another special case is check EOF for reading (other client can write to
the file and extend i_size concurrently). For simplify, the direct-IO/AIO
code path does do the check, fallback to normal syn read instead.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

c8fe9b17

03 11月, 2015 1 次提交

ceph: combine as many iovec as possile into one OSD request · b5b98989

由 Zhu, Caifeng 提交于 10月 08, 2015

Both ceph_sync_direct_write and ceph_sync_read iterate iovec elements
one by one, send one OSD request for each iovec. This is sub-optimal,
We can combine serveral iovec into one page vector, and send an OSD
request for the whole page vector.
Signed-off-by: NZhu, Caifeng <zhucaifeng@unissoft-nj.com>
Signed-off-by: NYan, Zheng <zyan@redhat.com>

b5b98989

09 9月, 2015 3 次提交

Y
ceph: get inode size for each append write · 55b0b31c
由 Yan, Zheng 提交于 9月 07, 2015
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
```
55b0b31c

ceph: no need to get parent inode in ceph_open · e36d571d

由 Jianpeng Ma 提交于 8月 18, 2015

parent inode is needed in creating new inode case.  For ceph_open,
the target inode already exists.
Signed-off-by: NJianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: NYan, Zheng <zyan@redhat.com>

e36d571d

ceph: remove the useless judgement · a43137f7

由 Jianpeng Ma 提交于 8月 18, 2015

err != 0 is already handled. So skip this.
Signed-off-by: NJianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: NYan, Zheng <zyan@redhat.com>

a43137f7

25 6月, 2015 5 次提交

ceph: rework dcache readdir · fdd4e158

由 Yan, Zheng 提交于 6月 16, 2015

Previously our dcache readdir code relies on that child dentries in
directory dentry's d_subdir list are sorted by dentry's offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to head of directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping array of pointers
to dentries in page cache of directory inode. the validity of those
pointers are presented by directory inode's complete and ordered
flags. When a dentry gets pruned, we clear directory inode's complete
flag in the d_prune() callback. Before moving a dentry to other
directory, we clear the ordered flag for both old and new directory.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

fdd4e158

ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5

由 Yan, Zheng 提交于 6月 13, 2015

GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path
Signed-off-by: NYan, Zheng <zyan@redhat.com>

687265e5

Y
ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0
由 Yan, Zheng 提交于 6月 10, 2015
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
```
f66fd9f0

ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c

由 Yan, Zheng 提交于 4月 30, 2015

In most cases that snap context is needed, we are holding
reference of CEPH_CAP_FILE_WR. So we can set ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make codes get snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notification more elegantly. Especially when snap context
is updated while someone else is doing write. The old queue
cap_snap code may set cap_snap's context to ether the old
context or the new snap context, depending on if i_head_snapc
is set. The new queue capp_snap code always set cap_snap's
context to the old snap context.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

5dda377c

Y
libceph: allow setting osd_req_op's flags · 144cba14
由 Yan, Zheng 提交于 4月 27, 2015
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
```
144cba14

24 6月, 2015 1 次提交

fs: Rename file_remove_suid() to file_remove_privs() · 5fa8e0a1

由 Jan Kara 提交于 5月 21, 2015

file_remove_suid() is a misnomer since it removes also file capabilities
stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells
something else than whether file_remove_suid() call is necessary which
leads to bugs.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5fa8e0a1

16 4月, 2015 1 次提交

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

由 David Howells 提交于 3月 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2b0143b5

12 4月, 2015 4 次提交

A
mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5
由 Al Viro 提交于 4月 09, 2015
```
... avoiding write_iter/fcntl races.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
2ba48ce5

switch generic_write_checks() to iocb and iter · 3309dd04

由 Al Viro 提交于 4月 09, 2015

... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c

3309dd04

generic_write_checks(): drop isblk argument · 0fa6b005

由 Al Viro 提交于 4月 04, 2015

all remaining callers are passing 0; some just obscure that fact.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0fa6b005

make new_sync_{read,write}() static · 5d5d5689

由 Al Viro 提交于 4月 03, 2015

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5d5d5689

26 3月, 2015 1 次提交

fs: move struct kiocb to fs.h · e2e40f2c

由 Christoph Hellwig 提交于 2月 22, 2015

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e2e40f2c

13 3月, 2015 1 次提交

fs: remove ki_nbytes · 66ee59af

由 Christoph Hellwig 提交于 2月 11, 2015

There is no need to pass the total request length in the kiocb, as
we already get passed in through the iov_iter argument.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

66ee59af

23 2月, 2015 1 次提交

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

由 David Howells 提交于 1月 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e36cb0b8

19 2月, 2015 3 次提交

ceph: fix atomic_open snapdir · bf91c315

由 Yan, Zheng 提交于 1月 19, 2015

ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
and creates snapdir inode if it's -ENOENT
Signed-off-by: NYan, Zheng <zyan@redhat.com>

bf91c315

ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a

由 Yan, Zheng 提交于 1月 10, 2015

when inode has inline data but its size > PAGE_SIZE (it was truncated
to larger size), previous direct read code return -EIO. This patch adds
code to return zeros for data whose offset > PAGE_SIZE.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

fcc02d2a

ceph: properly zero data pages for file holes. · 1487a688

由 Yan, Zheng 提交于 1月 06, 2015

A bug is found in striped_read() of fs/ceph/file.c. striped_read() calls
ceph_zero_pape_vector_range(). The first argument, page_align + read + ret,
passed to ceph_zero_pape_vector_range() is wrong.

When a file has holes, this wrong parameter may cause memory corruption
either in kernal space or user space. Kernel space memory may be corrupted in
the case of non direct IO; user space memory may be corrupted in the case of
direct IO. In the latter case, the application doing direct IO may crash due
to memory corruption, as we have experienced.

The correct value should be initial_align + read + ret, where intial_align =
o_direct ? buf_align : io_align. Compared with page_align, the current page
offest, initial_align is the initial page offest, which should be used to
calculate the page and offset in ceph_zero_pape_vector_range().
Reported-by: Ncaifeng zhu <zhucaifeng@unissoft-nj.com>
Signed-off-by: NYan, Zheng <zyan@redhat.com>

1487a688

21 1月, 2015 1 次提交

fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info · de1414a6

由 Christoph Hellwig 提交于 1月 14, 2015

Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case.  Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

de1414a6

18 12月, 2014 4 次提交

ceph: convert inline data to normal data before data write · 28127bdd

由 Yan, Zheng 提交于 11月 14, 2014

Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares a xattr named 'inline_version' to prevent
old data overwrites newer data.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

28127bdd

ceph: sync read inline data · 83701246

由 Yan, Zheng 提交于 11月 14, 2014

we can't use getattr to fetch inline data while holding Fr cap,
because it can cause deadlock. If we need to sync read inline data,
drop cap refs first, then use getattr to fetch inline data.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

83701246

ceph: fetch inline data when getting Fcr cap refs · 3738daa6

由 Yan, Zheng 提交于 11月 14, 2014

we can't use getattr to fetch inline data after getting Fcr caps,
because it can cause deadlock. The solution is try bringing inline
data to page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in page cache for later IO.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

3738daa6

libceph: specify position of extent operation · 715e4cd4

由 Yan, Zheng 提交于 11月 13, 2014

allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NIlya Dryomov <idryomov@redhat.com>

715e4cd4

20 11月, 2014 2 次提交
- A
  kill f_dentry uses · b583043e
  由 Al Viro 提交于 10月 31, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  b583043e
- A
  assorted conversions to %p[dD] · a455589f
  由 Al Viro 提交于 10月 21, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  a455589f
15 10月, 2014 3 次提交

ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa

由 Yan, Zheng 提交于 9月 16, 2014

Current code set new file/directory's initial ACL in a non-atomic
manner.
Client first sends request to MDS to create new file/directory, then set
the initial ACL after the new file/directory is successfully created.

The fix is include the initial ACL in create/mkdir/mknod MDS requests.
So MDS can handle creating file/directory and setting the initial ACL in
one request.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

b1ee94aa

ceph: remove redundant io_iter_advance() · 3b70b388

由 Yan, Zheng 提交于 9月 17, 2014

ceph_sync_read and generic_file_read_iter() have already advanced the
IO iterator.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

3b70b388

ceph: request xattrs if xattr_version is zero · 508b32d8

由 Yan, Zheng 提交于 9月 16, 2014

Following sequence of events can happen.
  - Client releases an inode, queues cap release message.
  - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because MDS didn't receive the cap release
    message and thought client already has up-to-data xattrs.

The fix is force sending a getattr request to MDS if xattrs_version
is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
does not have xattr.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

508b32d8

28 7月, 2014 1 次提交

ceph: fix append mode write · 06fee30f

由 Yan, Zheng 提交于 7月 28, 2014

generic_write_checks() may update 'pos', so we need to pass 'pos'
to ceph_sync_write() and ceph_sync_direct_write();
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>

06fee30f

21 7月, 2014 1 次提交
- Y
  ceph: check zero length in ceph_sync_read() · d0d0db22
  由 Yan, Zheng 提交于 7月 21, 2014
```
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
```
  d0d0db22
08 7月, 2014 2 次提交
- Y
  ceph: pass proper page offset to copy_page_to_iter() · 5aaa432a
  由 Yan, Zheng 提交于 7月 02, 2014
```
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
```
  5aaa432a
- Y
  ceph: check unsupported fallocate mode · 494d77bf
  由 Yan, Zheng 提交于 6月 26, 2014
```
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
```
  494d77bf

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功