Commits · 99c88e6900fb05d267ae9f6d5e15dc7192ba6f8d · openeuler / raspberrypi-kernel

22 Jan, 2016 · 5 commits

ceph: use i_size_{read,write} to get/set i_size · 99c88e69

Committed by Yan, Zheng on Dec 30, 2015

Cap messages from the MDS can update i_size, and in that case we don't
hold i_mutex. So it's unsafe to access inode->i_size directly, even
while holding i_mutex.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

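A minimal userspace sketch of why every access goes through an accessor, assuming C11 atomics as a stand-in for the kernel's i_size_read()/i_size_write() (which, on 32-bit SMP builds, use a seqcount rather than an atomic). The names `sketch_inode`, `sketch_i_size_read()`, `sketch_i_size_write()` and `sketch_handle_cap_size()` are hypothetical, and writers are assumed to be serialised by some other lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode; only the size is modelled. */
struct sketch_inode {
	_Atomic int64_t i_size;
};

/* Analogue of i_size_read(): a single atomic load, so a reader never
 * sees a torn 64-bit value even if a writer runs concurrently. */
static int64_t sketch_i_size_read(struct sketch_inode *inode)
{
	return atomic_load_explicit(&inode->i_size, memory_order_relaxed);
}

/* Analogue of i_size_write(): callers are assumed to serialise writers
 * themselves; the accessor only protects readers. */
static void sketch_i_size_write(struct sketch_inode *inode, int64_t size)
{
	atomic_store_explicit(&inode->i_size, size, memory_order_relaxed);
}

/* Example: a cap-message handler growing the size without i_mutex.
 * The check-then-write is only safe under the assumed writer lock. */
static void sketch_handle_cap_size(struct sketch_inode *inode, int64_t new_size)
{
	if (new_size > sketch_i_size_read(inode))
		sketch_i_size_write(inode, new_size);
}
```

Any path that holds only i_mutex would read the size through `sketch_i_size_read()` as well, since the cap handler does not take that mutex.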

ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d

Committed by Yan, Zheng on Dec 24, 2015

When receiving -EOLDSNAP from the OSD, we need to re-send the corresponding
write request. Due to a locking issue, we can't send a new request inside
another OSD request's completion callback, so we use a worker to re-send
the request for AIO writes.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

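The hand-off described in this commit can be sketched single-threaded: the completion callback only queues the request, and a separate worker (a workqueue item in the kernel) performs the re-send. All names, including `SKETCH_EOLDSNAP` and its value, are hypothetical; this is not the ceph code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EOLDSNAP 1000 /* hypothetical value, not the kernel constant */

struct sketch_write_req {
	int id;
	int attempts;                  /* how many times it was submitted */
	struct sketch_write_req *next; /* link in the retry queue */
};

/* FIFO of requests waiting to be re-sent by the worker. */
static struct sketch_write_req *retry_head, **retry_tail = &retry_head;

/* Stand-in for sending the write to the OSD. */
static void sketch_submit(struct sketch_write_req *req)
{
	req->attempts++;
}

/* Completion callback: must not submit here, so it only queues. */
static void sketch_write_complete(struct sketch_write_req *req, int result)
{
	if (result == -SKETCH_EOLDSNAP) {
		req->next = NULL;
		*retry_tail = req;
		retry_tail = &req->next;
	}
}

/* Worker: runs in its own context, where sending is allowed.
 * Returns the number of requests re-sent. */
static int sketch_retry_worker(void)
{
	int n = 0;

	while (retry_head) {
		struct sketch_write_req *req = retry_head;

		retry_head = req->next;
		if (!retry_head)
			retry_tail = &retry_head;
		sketch_submit(req);
		n++;
	}
	return n;
}
```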

ceph: Asynchronous IO support · c8fe9b17

Committed by Yan, Zheng on Dec 23, 2015

The basic idea of AIO support is simple: just call kiocb::ki_complete()
in the OSD request's completion callback. But there are several special cases.

When an IO spans multiple objects, we need to wait until all OSD requests
are complete, then call kiocb::ki_complete(). Error handling in this case
is tricky too. For simplicity, AIO that both spans multiple objects and
extends i_size is not allowed.

Another special case is checking EOF for reads (another client can write to
the file and extend i_size concurrently). For simplicity, the direct-IO/AIO
code path does not do this check; it falls back to a normal sync read instead.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

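A sketch of the "wait until all OSD requests are complete" rule, assuming a shared atomic counter of outstanding requests plus a first-error slot; the last completion calls a stand-in for kiocb::ki_complete(). The struct and function names are hypothetical, and the sketch ignores the i_size-extending case that the commit rules out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-AIO state shared by every OSD request it was split into. */
struct sketch_aio {
	atomic_int pending; /* OSD requests not yet completed */
	atomic_int error;   /* first error seen, 0 if none */
	long total;         /* bytes to report on success */
	int completed;      /* how many times the ki_complete() analogue ran */
	long result;        /* value passed to the completion */
};

/* Stand-in for kiocb::ki_complete(). */
static void sketch_ki_complete(struct sketch_aio *aio, long res)
{
	aio->completed++;
	aio->result = res;
}

static void sketch_aio_init(struct sketch_aio *aio, int nr_reqs, long total)
{
	atomic_init(&aio->pending, nr_reqs);
	atomic_init(&aio->error, 0);
	aio->total = total;
	aio->completed = 0;
	aio->result = 0;
}

/* Called from each OSD request's completion callback. */
static void sketch_osd_req_done(struct sketch_aio *aio, int rc)
{
	if (rc < 0) {
		int expected = 0;
		/* Keep only the first error. */
		atomic_compare_exchange_strong(&aio->error, &expected, rc);
	}
	/* Whoever drops the count to zero reports the whole AIO once. */
	if (atomic_fetch_sub(&aio->pending, 1) == 1) {
		int err = atomic_load(&aio->error);
		sketch_ki_complete(aio, err ? err : aio->total);
	}
}
```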

ceph: Avoid to propagate the invalid page point · 458c4703

Committed by Minfei Huang on Dec 19, 2015

The variable pagep still receives an invalid page pointer even when
ceph_update_writeable_page() fails.

To fix this issue, assign the page to pagep only after
ceph_update_writeable_page() has succeeded.
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix double page_unlock() in page_mkwrite() · f9cac5ac

Committed by Yan, Zheng on Dec 17, 2015

ceph_update_writeable_page() unlocks the page on errors, so
page_mkwrite() should not unlock the page again.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

07 Nov, 2015 · 1 commit

mm, fs: introduce mapping_gfp_constraint() · c62d2555

Committed by Michal Hocko on Nov 06, 2015

There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

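A userspace model of the helper, assuming it is simply the bitwise AND of the mapping's allowed mask with the caller's mask (which is what "restriction" means here). The flag values and all `sketch_`/`SK_` names are illustrative only and do not match the kernel's GFP bits.

```c
#include <assert.h>

typedef unsigned int sketch_gfp_t;

/* Illustrative flag bits only; the real GFP values differ. */
#define SK_GFP_IO     0x1u
#define SK_GFP_FS     0x2u
#define SK_GFP_WAIT   0x4u
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOFS   (SK_GFP_WAIT | SK_GFP_IO)

struct sketch_mapping {
	sketch_gfp_t gfp_mask; /* allocation flags allowed for this mapping */
};

static sketch_gfp_t sketch_mapping_gfp_mask(const struct sketch_mapping *m)
{
	return m->gfp_mask;
}

/* Makes "restrict gfp_mask by what the mapping allows" explicit. */
static sketch_gfp_t sketch_mapping_gfp_constraint(const struct sketch_mapping *m,
						  sketch_gfp_t gfp_mask)
{
	return sketch_mapping_gfp_mask(m) & gfp_mask;
}
```

A caller that would otherwise write `mapping_gfp_mask(m) & GFP_KERNEL` inline now states the intent by name, which is the readability gain the commit describes.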

03 Nov, 2015 · 7 commits

libceph: msg signing callouts don't need con argument · 79dbd1ba

Committed by Ilya Dryomov on Oct 26, 2015

We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set.  We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: make fsync() wait unsafe requests that created/modified inode · 68cd5b4b

Committed by Yan, Zheng on Oct 27, 2015

If we get an unsafe reply for a request that created/modified an inode,
add the unsafe request to a list in the newly created/modified
inode, so that fsync() can wait for these unsafe requests.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: add request to i_unsafe_dirops when getting unsafe reply · 4c06ace8

Committed by Yan, Zheng on Oct 27, 2015

Previously we added a request to i_unsafe_dirops when registering
it, so ceph_fsync() also waited for incomplete requests.
This is unnecessary: ceph_fsync() only needs to wait for unsafe
requests.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't invalidate page cache when inode is no longer used · 5e804ac4

Committed by Yan, Zheng on Oct 26, 2015

ceph_check_caps() invalidates the page cache when an inode is not used
by any open file. This behaviour is not friendly to workloads
that repeatedly read files.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: combine as many iovec as possile into one OSD request · b5b98989

Committed by Zhu, Caifeng on Oct 08, 2015

Both ceph_sync_direct_write and ceph_sync_read iterate over iovec elements
one by one, sending one OSD request for each iovec. This is sub-optimal.
We can combine several iovecs into one page vector, and send one OSD
request for the whole page vector.
Signed-off-by: Zhu, Caifeng <zhucaifeng@unissoft-nj.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

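A sketch of the batching idea: gather several iovecs into one contiguous buffer, standing in for the single page vector behind one OSD request, instead of issuing one request per iovec. It copies data for simplicity and does not model how the kernel actually obtains the pages; `sketch_gather_iovecs()` is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Gather iovcnt user buffers into buf (one "request" worth of data).
 * Returns the number of bytes gathered, or -1 if buf is too small. */
static long sketch_gather_iovecs(const struct iovec *iov, int iovcnt,
				 char *buf, size_t buflen)
{
	size_t off = 0;

	for (int i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len > buflen - off)
			return -1;
		memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	return (long)off;
}
```

With two iovecs of 6 and 5 bytes, one call yields a single 11-byte payload, where the old loop would have produced two separate requests.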

ceph: fix message length computation · 777d738a

Committed by Arnd Bergmann on Sep 30, 2015

create_request_message() computes the maximum length of a message,
but uses the wrong type for the time stamp: sizeof(struct timespec)
may be 8 or 16 depending on the architecture, while sizeof(struct
ceph_timespec) is always 8, and that is what gets put into the
message.

Found while auditing the uses of timespec for y2038 problems.

Fixes: b8e69066 ("ceph: include time stamp in every MDS request")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

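The size mismatch can be checked directly. This sketch mirrors the on-wire timestamp as two packed 32-bit fields, matching the 8 bytes the commit describes, and sizes the message field by that type rather than by the host's struct timespec. The `sketch_` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Mirror of the encoded timestamp: always 8 bytes on every architecture. */
struct sketch_ceph_timespec {
	uint32_t tv_sec;
	uint32_t tv_nsec;
} __attribute__((packed));

/* Correct: size the field by the type that is actually encoded. */
static size_t sketch_stamp_len(void)
{
	return sizeof(struct sketch_ceph_timespec);
}

/* The bug: the host type is 8 or 16 bytes depending on the ABI. */
static size_t sketch_host_timespec_len(void)
{
	return sizeof(struct timespec);
}
```

On a typical 64-bit build the host struct timespec is 16 bytes, so using it would over-estimate the message length by 8 bytes per timestamp.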

ceph: fix a comment typo · 1291fb95

Committed by Geliang Tang on Sep 30, 2015

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

23 Oct, 2015 · 1 commit

Move locks API users to locks_lock_inode_wait() · 4f656367

Committed by Benjamin Coddington on Oct 22, 2015

Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait().  This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>

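A userspace sketch of the dispatch that moves into one helper: callers hand over the lock, and the helper picks the POSIX or flock path from the flags. The `sketch_`/`SK_` names are hypothetical stand-ins, and returning -EINVAL for a lock with neither flag is this sketch's choice, not necessarily the kernel's.

```c
#include <assert.h>
#include <errno.h>

#define SK_FL_POSIX 1u
#define SK_FL_FLOCK 2u

struct sketch_file_lock {
	unsigned int fl_flags;
};

/* Counters so the dispatch can be observed. */
static int posix_calls, flock_calls;

static int sketch_posix_lock_inode_wait(struct sketch_file_lock *fl)
{
	(void)fl;
	posix_calls++;
	return 0;
}

static int sketch_flock_lock_inode_wait(struct sketch_file_lock *fl)
{
	(void)fl;
	flock_calls++;
	return 0;
}

/* Single entry point: callers no longer test the lock type themselves. */
static int sketch_locks_lock_inode_wait(struct sketch_file_lock *fl)
{
	switch (fl->fl_flags & (SK_FL_POSIX | SK_FL_FLOCK)) {
	case SK_FL_POSIX:
		return sketch_posix_lock_inode_wait(fl);
	case SK_FL_FLOCK:
		return sketch_flock_lock_inode_wait(fl);
	default:
		return -EINVAL; /* sketch-only handling of an invalid lock */
	}
}
```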

11 Sep, 2015 · 1 commit

mm: mark most vm_operations_struct const · 7cbea8dc

Committed by Kirill A. Shutemov on Sep 09, 2015

With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Sep, 2015 · 9 commits

ceph: improve readahead for file holes · 43838685

Committed by Yan, Zheng on Sep 07, 2015

When readahead encounters file holes, the OSD reply returns -ENOENT and
finish_read() skips adding the pages to the page cache, so readahead
does not work for file holes. The fix is to add zero-filled pages to the
page cache when -ENOENT is returned.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

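A sketch of the readahead decision, assuming a simplified finish_read() that receives the OSD result code and byte count separately: -ENOENT (object absent, i.e. a hole) is treated as a successful read of zero bytes, so the pages are zero-filled and marked up to date instead of being dropped. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SK_PAGE_SIZE 4096

/* Hypothetical page-cache slot. */
struct sketch_page {
	char data[SK_PAGE_SIZE];
	int uptodate; /* 1 once the page holds valid data */
};

/* Returns the number of pages made up to date. The first bytes_read
 * bytes are assumed to be filled in already by the OSD reply. */
static int sketch_finish_read(struct sketch_page *pages, int nr, int rc,
			      int bytes_read)
{
	if (rc == -ENOENT) { /* object does not exist: a file hole */
		rc = 0;
		bytes_read = 0;
	}
	if (rc < 0)
		return 0; /* real error: leave pages out of the cache */

	for (int i = 0; i < nr; i++) {
		int valid = bytes_read - i * SK_PAGE_SIZE;

		if (valid < 0)
			valid = 0;
		if (valid > SK_PAGE_SIZE)
			valid = SK_PAGE_SIZE;
		/* Zero whatever the OSD did not return (all of it for a hole). */
		memset(pages[i].data + valid, 0, SK_PAGE_SIZE - valid);
		pages[i].uptodate = 1;
	}
	return nr;
}
```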

ceph: get inode size for each append write · 55b0b31c

Committed by Yan, Zheng on Sep 07, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: cleanup use of ceph_msg_get · 5fdb1389

Committed by Jianpeng Ma on Aug 18, 2015

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: no need to get parent inode in ceph_open · e36d571d

Committed by Jianpeng Ma on Aug 18, 2015

The parent inode is only needed when creating a new inode. For ceph_open,
the target inode already exists.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: remove the useless judgement · a43137f7

Committed by Jianpeng Ma on Aug 18, 2015

The err != 0 case is already handled, so this check is redundant.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: remove redundant test of head->safe and silence static analysis warnings · 1550d34e

Committed by Brad Hubbard on Aug 18, 2015

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix queuing inode to mdsdir's snaprealm · 23078637

Committed by Yan, Zheng on Jul 20, 2015

During MDS failovers, an MClientSnap message may cause the kclient to move
some inodes from the root directory's snaprealm to mdsdir's snaprealm
and queue snapshots for these inodes. For a FS that has never created any
snapshot, both the root directory's snaprealm and mdsdir's snaprealm
share the same snapshot context (both are ceph_empty_snapc). This
confuses ceph_put_wrbuffer_cap_refs(), making it unable to distinguish
snapshot buffers from head buffers.

The fix is to not use ceph_empty_snapc as a snaprealm's cached context.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: invalidate dirty pages after forced umount · a341d4df

Committed by Yan, Zheng on Jul 01, 2015

After a forced umount, ceph_writepages_start() skips flushing dirty
pages. To make sure the inode's reference count drops to zero,
we need to invalidate the dirty pages.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: EIO all operations after forced umount · 48fec5d0

Committed by Yan, Zheng on Jul 01, 2015

This patch makes try_get_cap_refs() and __do_request() check
whether the file system was forcibly unmounted, and return -EIO if it was.
This patch also adds a helper function to drop dirty caps and
wake up blocked operations.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Sep, 2015 · 1 commit

fs: create and use seq_show_option for escaping · a068acf2

Committed by Kees Cook on Sep 04, 2015

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

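A minimal userspace sketch of the escaping, assuming the same approach as seq_escape(): characters from a per-field set are written as a backslash plus three octal digits, so a value containing a space or newline can no longer fake a new /proc/mounts entry. The exact escape sets below are an assumption, the `sketch_` names are hypothetical, and the caller must supply a buffer with room for up to four bytes per input character plus separators.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy s into out at pos, writing any character found in esc as \ooo. */
static size_t sketch_escape(char *out, size_t pos, const char *s,
			    const char *esc)
{
	for (; *s; s++) {
		if (strchr(esc, *s))
			pos += (size_t)sprintf(out + pos, "\\%03o",
					       (unsigned char)*s);
		else
			out[pos++] = *s;
	}
	out[pos] = '\0';
	return pos;
}

/* Emit ",name" or ",name=value" with separators and whitespace escaped. */
static size_t sketch_show_option(char *out, size_t pos, const char *name,
				 const char *value)
{
	out[pos++] = ',';
	pos = sketch_escape(out, pos, name, ",= \t\n\\");
	if (value) {
		out[pos++] = '=';
		pos = sketch_escape(out, pos, value, ", \t\n\\");
	}
	out[pos] = '\0';
	return pos;
}
```

Fed the malicious workdir from the overlay example, the newline and spaces come out as `\012` and `\040`, so the output stays on one line.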

31 Jul, 2015 · 2 commits

ceph: always re-send cap flushes when MDS recovers · fc927cd3

Committed by Yan, Zheng on Jul 20, 2015

Commit e548e9b9 makes the kclient re-send a cap flush only once during
MDS failover. If the kclient sends a cap flush after the MDS enters the
reconnect stage but before the MDS recovers, the kclient will skip
re-sending the same cap flush when the MDS recovers.

This causes problems for newly created inodes. The MDS handles cap
flushes before replaying unsafe requests, so the MDS may find that the
corresponding inode is missing when handling the cap flush. The fix
is to revert to the old behaviour: always re-send when the MDS recovers.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix ceph_encode_locks_to_buffer() · f6762cb2

Committed by Yan, Zheng on Jul 07, 2015

POSIX locks should be in the ctx->flc_posix list.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25 Jun, 2015 · 13 commits

ceph: fix ceph_writepages_start() · e1966b49

Committed by Yan, Zheng on Jun 18, 2015

Before a page gets locked, someone else can write data to the page
and increase i_size. So we should re-check i_size after the
pages are locked.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: rework dcache readdir · fdd4e158

Committed by Yan, Zheng on Jun 16, 2015

Previously our dcache readdir code relied on the child dentries in the
directory dentry's d_subdir list being sorted by dentry offset in
descending order. When adding dentries to the dcache, if a dentry
already existed, our readdir code moved it to the head of the directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggested using ncpfs's approach: keep an array of pointers
to dentries in the page cache of the directory inode. The validity of those
pointers is indicated by the directory inode's complete and ordered
flags. When a dentry gets pruned, we clear the directory inode's complete
flag in the d_prune() callback. Before moving a dentry to another
directory, we clear the ordered flag for both the old and new directory.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5

Committed by Yan, Zheng on Jun 13, 2015

GFP_NOFS memory allocation is required in the page writeback path,
but there is no need to use GFP_NOFS in the syscall path and the readpage
path.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Committed by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9

Committed by Yan, Zheng on Jun 10, 2015

If flushing caps were revoked, we should re-send the cap flush in the
client reconnect stage. This guarantees that the MDS processes the cap
flush message before issuing the flushing caps to other clients.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: send TID of the oldest pending caps flush to MDS · a2971c8c

Committed by Yan, Zheng on Jun 10, 2015

With this information, the MDS can trim its completed caps flush
list (which is used to detect duplicate cap flushes).
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: track pending caps flushing globally · 8310b089

Committed by Yan, Zheng on Jun 09, 2015

This lets us know the TID of the oldest pending caps flush. A later patch
will send this information to the MDS, so that the MDS can trim its
completed caps flush list.

Tracking pending caps flushing globally also simplifies the syncfs code.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

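The bookkeeping described in these caps-flushing commits can be sketched with an ordered list of pending flush TIDs: a TID is assigned once when a flush is first sent, kept unchanged for any re-send after failover, and the head of the list is the oldest pending TID reported to the MDS. The fixed-size array, the "0 means nothing pending" convention, and all names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_FLUSHES 16

/* Hypothetical client-wide record of cap flushes awaiting an ack. */
struct sketch_flush_tracker {
	uint64_t next_tid;                /* last TID handed out */
	uint64_t pending[SK_MAX_FLUSHES]; /* TIDs in issue (ascending) order */
	int nr;
};

/* Assign a TID once, when the flush is first sent. */
static uint64_t sketch_flush_start(struct sketch_flush_tracker *t)
{
	assert(t->nr < SK_MAX_FLUSHES);
	uint64_t tid = ++t->next_tid;
	t->pending[t->nr++] = tid;
	return tid;
}

/* The MDS acked this TID: drop it, keeping the rest in order. */
static void sketch_flush_ack(struct sketch_flush_tracker *t, uint64_t tid)
{
	for (int i = 0; i < t->nr; i++) {
		if (t->pending[i] == tid) {
			for (int j = i; j < t->nr - 1; j++)
				t->pending[j] = t->pending[j + 1];
			t->nr--;
			return;
		}
	}
}

/* Oldest pending TID (0 if none); a re-send reuses pending[] as-is. */
static uint64_t sketch_oldest_flush_tid(const struct sketch_flush_tracker *t)
{
	return t->nr ? t->pending[0] : 0;
}
```

Because TIDs are never reassigned, an MDS that already processed a flush can recognise the re-sent copy, and it can forget every completed TID older than the reported oldest one.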

ceph: track pending caps flushing accurately · 553adfd9

Committed by Yan, Zheng on Jun 09, 2015

Previously we did not track the accurate TID for flushing caps. When
the MDS fails over, we have no choice but to re-send all flushing caps
with a new TID. This can cause problems because the MDS may have already
flushed some caps and issued the same caps to another client.
The re-sent cap flush has a new TID, which makes the MDS unable to
detect whether it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending a cap flush is needed, we use its original flush
TID.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix directory fsync · da819c81

Committed by Yan, Zheng on May 27, 2015

fsync() on a directory should flush dirty caps and wait for any
uncommitted directory operations to commit. But ceph_dir_fsync()
only waits for uncommitted directory operations.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix flushing caps · 89b52fe1

Committed by Yan, Zheng on May 27, 2015

Currently ceph_fsync() only flushes dirty caps and waits for them to be
flushed. It doesn't wait for caps that are already being flushed.
This patch makes ceph_fsync() wait for pending flushing caps too.
Besides, this patch also makes caps_are_flushed() properly handle
TID wrapping.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

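A common way to make such comparisons wrap-safe is to compare the signed difference of two sequence numbers; whether caps_are_flushed() does exactly this is an assumption. The sketch uses 32-bit TIDs only so that wrap-around is easy to exercise, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "a is at or before b" for sequence numbers: the signed difference
 * stays correct while the two are within half the number space. */
static int sketch_tid_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Have all flushes up to want_tid completed, given the oldest TID that
 * is still pending? True when the oldest pending TID comes after it. */
static int sketch_caps_are_flushed(uint32_t oldest_pending, uint32_t want_tid)
{
	return !sketch_tid_before_eq(oldest_pending, want_tid);
}
```

A plain `oldest_pending > want_tid` would wrongly report 0xfffffffe as newer than 2 after the counter wraps; the signed difference treats 2 as the later TID.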

ceph: don't include used caps in cap_wanted · 41445999

Committed by Yan, Zheng on May 25, 2015

When copying files to cephfs, file data may stay in the page cache after
the corresponding file is closed. Cached data uses the Fc capability. If we
include the Fc capability in cap_wanted, the MDS will treat files with cached
data as open files, and journal them in an EOpen event when trimming
a log segment.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: ratelimit warn messages for MDS closes session · 3e0708b9

Committed by Yan, Zheng on May 22, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: simplify two mount_timeout sites · 5be73034

Committed by Ilya Dryomov on May 19, 2015

No need to bifurcate wait now that we've got ceph_timeout_jiffies().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

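A sketch of the helper's role, assuming (as we understand libceph's convention) that a mount_timeout of 0 means "wait forever": the conversion happens in one place, so each call site can issue a single timed wait instead of branching between timed and untimed variants. `SK_MAX_SCHEDULE_TIMEOUT` stands in for the kernel's MAX_SCHEDULE_TIMEOUT (LONG_MAX), and `sketch_timeout_jiffies()` is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for MAX_SCHEDULE_TIMEOUT ("no timeout"). */
#define SK_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Map a configured timeout to a value any timed wait can take:
 * 0 (assumed to mean "no timeout") becomes the wait-forever value. */
static unsigned long sketch_timeout_jiffies(unsigned long timeout)
{
	return timeout ? timeout : (unsigned long)SK_MAX_SCHEDULE_TIMEOUT;
}
```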