Commits · 4214fb158cc423ac31b841000e219855be055388 · openanolis / cloud-kernel

07 Sep, 2017 (4 commits)

ceph: validate correctness of some mount options · 4214fb15

Authored by Yan, Zheng on Jul 11, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Authored by Yan, Zheng on Jul 11, 2017

The OSD has a configurable limit on the maximum write size and returns
an error if a write request is larger than that limit. For now,
set the max write size to CEPH_MSG_MAX_DATA_LEN, which should be small
enough.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

95cca2b4
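
As a rough illustration of the limit described in this commit, the sketch below shows one way a client could keep every write request at or below a maximum size by splitting a larger write into pieces. It is a user-space Python model, not the kernel code: the 16 MiB value given to `CEPH_MSG_MAX_DATA_LEN` and the helper name `split_write` are assumptions made for the example.

```python
# Assumed value for illustration only; the real constant is defined in the
# libceph headers and may differ between kernel versions.
CEPH_MSG_MAX_DATA_LEN = 16 * 1024 * 1024


def split_write(offset, length, max_len=CEPH_MSG_MAX_DATA_LEN):
    """Return (offset, length) pairs, each at most max_len bytes long."""
    if length < 0 or max_len <= 0:
        raise ValueError("length must be >= 0 and max_len must be > 0")
    chunks = []
    while length > 0:
        n = min(length, max_len)  # never exceed the per-request limit
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks
```

For example, `split_write(0, 10, 4)` yields three requests covering bytes 0-3, 4-7 and 8-9.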

ceph: limit osd read size to CEPH_MSG_MAX_DATA_LEN · aa187926

Authored by Yan, Zheng on Jul 11, 2017

libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN.

Link: http://tracker.ceph.com/issues/20528
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove unused cap_release_safety mount option · 2ae409dc

Authored by Yan, Zheng on Jul 11, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

07 Jul, 2017 (2 commits)

ceph: new mount option that specifies fscache uniquifier · 1d8f8360

Authored by Yan, Zheng on Jun 27, 2017

Currently ceph uses the FSID as the primary index key of fscache data.
This allows ceph to retain cached data across remounts, but it causes
problems (kernel oops; fscache does not support sharing data) when
a filesystem is mounted several times (with fscache enabled, with
different mount options).

The fix is to add a new mount option that specifies a uniquifier
for fscache.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
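
The idea of keying the cache by FSID plus a uniquifier can be sketched as follows. This is a toy user-space model under stated assumptions, not the kernel implementation: the class `CacheRegistry`, its methods and the choice of raising `ValueError` on a duplicate key are all hypothetical.

```python
class CacheRegistry:
    """Toy model: one cache index per (fsid, uniquifier) pair."""

    def __init__(self):
        self._keys = set()

    def register(self, fsid, uniquifier=""):
        """Claim the index for this mount, refusing to share an existing one."""
        key = (fsid, uniquifier)
        if key in self._keys:
            # Two mounts sharing one index is the situation the commit avoids.
            raise ValueError("cache index already in use: %r" % (key,))
        self._keys.add(key)
        return key

    def unregister(self, key):
        """Release the index on unmount so that a later remount can reuse it."""
        self._keys.discard(key)
```

In this model a second mount of the same filesystem has to pass a distinct uniquifier, while a remount after unmount can reuse the original key and therefore the cached data.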

ceph: avoid invalid memory dereference in the middle of umount · 62a65f36

Authored by Yan, Zheng on Jun 22, 2017

extra_mon_dispatch() and debugfs' foo_show functions dereference
fsc->mdsc. We should clean up fsc->client->extra_mon_dispatch
and debugfs before destroying fsc->mdsc.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

04 May, 2017 (1 commit)

libceph, ceph: always advertise all supported features · 74da4a0f

Authored by Ilya Dryomov on Mar 03, 2017

No reason to hide CephFS-specific features in the rbd case.  Recent
feature bits mix RADOS and CephFS-specific stuff together anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

21 Apr, 2017 (1 commit)

ceph: Convert to separately allocated bdi · 09dc9fc2

Authored by Jan Kara on Apr 12, 2017

Allocate struct backing_dev_info separately instead of embedding it
inside client structure. This unifies handling of bdi among users.

CC: Ilya Dryomov <idryomov@gmail.com>
CC: "Yan, Zheng" <zyan@redhat.com>
CC: Sage Weil <sage@redhat.com>
CC: ceph-devel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

25 Feb, 2017 (1 commit)

ceph: remove special ack vs commit behavior · 55f2a045

Authored by Ilya Dryomov on Feb 13, 2017

- ask for a commit reply instead of an ack reply in
  __ceph_pool_perm_get()
- don't ask for both ack and commit replies in ceph_sync_write()
- since only one reply is requested now, i_unsafe_writes list
  will always be empty -- kill ceph_sync_write_wait() and go back to
  a standard ->evict_inode()

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

20 Feb, 2017 (1 commit)

ceph: set io_pages bdi hint · 7c94ba27

Authored by Andreas Gerstmayr on Jan 10, 2017

This patch sets the io_pages bdi hint based on the rsize mount option.
Without this patch large buffered reads (request size > max readahead)
are processed sequentially in chunks of the readahead size (i.e. read
requests are sent out up to the readahead size, then the
do_generic_file_read() function waits until the first page is received).

With this patch read requests are sent out at once up to the size
specified in the rsize mount option (default: 64 MB).

Signed-off-by: Andreas Gerstmayr <andreas.gerstmayr@catalysts.cc>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
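
The relationship between the rsize mount option and a page-count hint such as io_pages can be sketched as a round-up division. This is only an illustrative model: the 4096-byte page size and the helper name `io_pages_for_rsize` are assumptions, and the kernel's actual expression is not quoted in this log.

```python
PAGE_SIZE = 4096  # assumed page size; the real value is architecture dependent


def io_pages_for_rsize(rsize, page_size=PAGE_SIZE):
    """Number of pages needed to cover rsize bytes, rounded up."""
    if rsize <= 0 or page_size <= 0:
        raise ValueError("rsize and page_size must be positive")
    return (rsize + page_size - 1) // page_size
```

With the 64 MB default mentioned above and 4 KiB pages, the hint would be 16384 pages.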

13 Dec, 2016 (1 commit)

ceph: check availability of mds cluster on mount · e9e427f0

Authored by Yan, Zheng on Nov 10, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

29 Oct, 2016 (2 commits)

ceph: switch to use of ->d_init() · ad5cb123

Authored by Al Viro on Oct 28, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ceph: unify dentry_operations instances · 18fc8abd

Authored by Al Viro on Oct 28, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18 Oct, 2016 (1 commit)

ceph: fix uninitialized dentry pointer in ceph_real_mount() · 31ca5878

Authored by Geert Uytterhoeven on Oct 13, 2016

    fs/ceph/super.c: In function ‘ceph_real_mount’:
    fs/ceph/super.c:818: warning: ‘root’ may be used uninitialized in this function

If s_root is already valid, dentry pointer root is never initialized,
and returned by ceph_real_mount(). This will cause a crash later when
the caller dereferences the pointer.

Fixes: ce2728aa ("ceph: avoid accessing / when mounting a subpath")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

03 Oct, 2016 (1 commit)

ceph: avoid accessing / when mounting a subpath · ce2728aa

Authored by Yan, Zheng on Sep 14, 2016

Accessing / causes a failure if the client has caps that restrict the path.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

28 Jul, 2016 (3 commits)

ceph: Mark the file cache as unreclaimable · 6b1a9a6c

Authored by Nikolay Borisov on Jul 25, 2016

Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
that it can satisfy its internal needs. Inspecting the code shows that
most of the caches are indeed reclaimable since they are directly
related to the generic inode/dentry shrinkers. However, one of the
caches, used to satisfy struct file, is not reclaimable since its
entries are freed only when the last reference to the file is
dropped. If a heavily loaded node opens a lot of files, it can
introduce non-trivial discrepancies between memory shown as reclaimable
and what is actually reclaimed when drop_caches is used.

Fix this by removing the reclaimable flag for the file's cache.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: mount non-default filesystem by name · 430afbad

Authored by Yan, Zheng on Jul 08, 2016

To mount a non-default filesystem, the user currently needs to provide
the mds namespace ID. This is inconvenient.

This patch lets the user mount a filesystem by name. If the user
wants to mount a non-default filesystem, the client first subscribes to
fsmap.user, then subscribes to mdsmap.<ID> after getting the ID of the
filesystem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
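
The two-step lookup described in this commit (resolve the name to an ID, then subscribe to the map for that ID) can be modelled roughly as below. This is a sketch under assumptions rather than the client code: the fsmap is represented as a plain name-to-ID dictionary, the helper name `mdsmap_subscription` is hypothetical, and the plain `mdsmap` name used for the default filesystem is an assumption.

```python
def mdsmap_subscription(fsmap, fs_name=None):
    """Return the map name to subscribe to for the requested filesystem.

    fsmap is a hypothetical {name: id} view of what fsmap.user provides.
    """
    if fs_name is None:
        # Default filesystem: no name lookup is needed (assumed map name).
        return "mdsmap"
    if fs_name not in fsmap:
        raise LookupError("no filesystem named %r" % fs_name)
    return "mdsmap.%d" % fsmap[fs_name]
```

For instance, with `{"home": 1, "scratch": 7}` as the fsmap, asking for `"scratch"` gives `mdsmap.7`.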

ceph: wait unsafe sync writes for evicting inode · 9a5530c6

Authored by Yan, Zheng on Jun 15, 2016

Otherwise ceph_sync_write_unsafe() may access/modify freed inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

26 May, 2016 (3 commits)

ceph: report mount root in session metadata · 3f384954

Authored by Yan, Zheng on Apr 21, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: CEPH_FEATURE_MDSENC support · d463a43d

Authored by Yan, Zheng on Mar 31, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: multiple filesystem support · 235a0982

Authored by Yan, Zheng on Mar 30, 2016

To access non-default filesystem, we just need to subscribe to
mdsmap.<MDS_NAMESPACE_ID> and add a new mount option for mds
namespace id.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: switch to a new libceph API]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

05 Apr, 2016 (1 commit)

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Authored by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
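
To make the rewrite rules in this commit concrete, here is a rough regular-expression approximation of them in Python. It is not coccinelle and does not understand C syntax: it works on raw text, so it cannot reason about operator precedence, comments or strings. The names `SHIFT_DIFF`, `RENAMES` and `convert` are hypothetical and exist only for this example.

```python
import re

# Matches "<< (PAGE_CACHE_SHIFT - PAGE_SHIFT)" or ">> (...)" plus leading spaces.
SHIFT_DIFF = re.compile(
    r"\s*(?:<<|>>)\s*\(\s*PAGE_CACHE_SHIFT\s*-\s*PAGE_SHIFT\s*\)"
)

# Identifier renames listed in the commit message.
RENAMES = {
    "PAGE_CACHE_SHIFT": "PAGE_SHIFT",
    "PAGE_CACHE_SIZE": "PAGE_SIZE",
    "PAGE_CACHE_MASK": "PAGE_MASK",
    "PAGE_CACHE_ALIGN": "PAGE_ALIGN",
    "page_cache_get": "get_page",
    "page_cache_release": "put_page",
}
IDENT = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")


def convert(src):
    """Apply the shift removal first, then the whole-identifier renames."""
    src = SHIFT_DIFF.sub("", src)
    return IDENT.sub(lambda m: RENAMES[m.group(1)], src)
```

The shift rule has to run before the renames, otherwise `PAGE_CACHE_SHIFT` inside the shift expression would already have been rewritten and the pattern would no longer match.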

26 Mar, 2016 (4 commits)

ceph: fix mounting same fs multiple times · 132ca7e1

Authored by Yan, Zheng on Mar 12, 2016

Now __ceph_open_session() only accepts a closed client. An opened
client will trigger BUG_ON().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: kill ceph_empty_snapc · 34b759b4

Authored by Ilya Dryomov on Feb 16, 2016

ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.

Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless.  The two cases where it
actually mattered were removed in commits 86056090 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
queuing inode to mdsdir's snaprealm").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: don't enable rbytes mount option by default · 133e9156

Authored by Yan, Zheng on Jan 25, 2016

When the rbytes mount option is enabled, a directory's size is its
recursive size. The recursive size is not updated instantly, which can
cause the directory size to change between successive stat(1) calls.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: revamp subs code, switch to SUBSCRIBE2 protocol · 82dcabad

Authored by Ilya Dryomov on Jan 19, 2016

It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime". To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs. Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
it before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

15 Jan, 2016 (1 commit)

kmemcg: account certain kmem allocations to memcg · 5d097056

Authored by Vladimir Davydov on Jan 14, 2016

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Sep, 2015 (1 commit)

ceph: EIO all operations after forced umount · 48fec5d0

Authored by Yan, Zheng on Jul 01, 2015

This patch makes try_get_cap_refs() and __do_request() check
whether the file system was forcibly unmounted, and return -EIO if it was.
This patch also adds a helper function that drops dirty caps and
wakes up blocked operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Sep, 2015 (1 commit)

fs: create and use seq_show_option for escaping · a068acf2

Authored by Kees Cook on Sep 04, 2015

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
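
The kind of escaping this commit introduces can be illustrated with a small user-space model. The sketch below is not the kernel helper: the function names `escape` and `show_option`, the exact sets of escaped characters and the backslash-plus-octal output form are assumptions chosen to show how a newline or space in an option value stops being able to forge a new line in a mounts-style listing.

```python
def escape(text, special):
    """Replace each character found in `special` with a backslash followed
    by its three-digit octal code; leave every other character unchanged."""
    return "".join("\\%03o" % ord(c) if c in special else c for c in text)


def show_option(name, value=None):
    """Format one mount option as ',name' or ',name=value' with escaping."""
    out = "," + escape(name, ",= \t\n\\")
    if value is not None:
        out += "=" + escape(value, ", \t\n\\")
    return out
```

Applied to the `workdir` value from the example above, the embedded newline and spaces are emitted as octal escapes, so the option stays on a single line.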

25 Jun, 2015 (3 commits)

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Authored by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: store timeouts in jiffies, verify user input · a319bf56

Authored by Ilya Dryomov on May 15, 2015

There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
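
The validation and "0 means infinite" handling described in this commit can be modelled as below. This is a simplified sketch, not the libceph code: the tick rate `HZ`, the stand-in value for `MAX_SCHEDULE_TIMEOUT`, the range bound and the helper names `parse_timeout` and `timeout_jiffies` are assumptions, and the conversion is a plain multiplication rather than msecs_to_jiffies().

```python
HZ = 250                           # assumed tick rate, for illustration
INT_MAX = 2**31 - 1
MAX_SCHEDULE_TIMEOUT = 2**63 - 1   # stand-in for "wait forever"


def parse_timeout(seconds, allow_zero=True):
    """Validate a user-supplied timeout in seconds and return it in ticks.

    The upper bound keeps seconds * 1000 within a 32-bit signed int, so the
    later conversion cannot overflow in this model.
    """
    lowest = 0 if allow_zero else 1
    if not (lowest <= seconds <= INT_MAX // 1000):
        raise ValueError("timeout out of range: %r" % (seconds,))
    return seconds * HZ


def timeout_jiffies(stored):
    """Map a stored timeout of 0 to an infinite wait; pass others through."""
    return MAX_SCHEDULE_TIMEOUT if stored == 0 else stored
```

Every wait on a user-specified timeout would then go through `timeout_jiffies()`, so a stored 0 is never handed to a wait primitive as "return immediately".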

ceph: check OSD caps before read/write · 10183a69

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

20 Apr, 2015 (3 commits)

ceph: show non-default options only · ff7eeb82

Authored by Ilya Dryomov on Mar 25, 2015

Don't pollute /proc/mounts with default options (presently these are
dcache, nofsc and acl).  Leave the acl/noacl however - it's a bit of
a special case due to CONFIG_CEPH_FS_POSIX_ACL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
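
The "show non-default options only" rule, with acl/noacl kept as a special case, can be sketched like this. It is a toy model built from the commit message alone: the `DEFAULTS` table, the `ALWAYS_SHOWN` set, the `no` prefix convention for disabled flags and the function name `show_options` are assumptions, and only boolean flags are handled.

```python
# Defaults named in the commit message: dcache, nofsc and acl.
DEFAULTS = {"dcache": True, "fsc": False, "acl": True}
# acl/noacl is always printed in this model, mirroring the special case.
ALWAYS_SHOWN = {"acl"}


def show_options(flags):
    """Return a comma-separated list of the flags worth displaying."""
    parts = []
    for name in sorted(flags):
        enabled = flags[name]
        if name not in ALWAYS_SHOWN and DEFAULTS.get(name) == enabled:
            continue  # default value: keep it out of the listing
        parts.append(name if enabled else "no" + name)
    return ",".join(parts)
```

A mount that uses only default settings would therefore list nothing but `acl`, while any deviation from the defaults shows up explicitly.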

libceph, ceph: split ceph_show_options() · ff40f9ae

Authored by Ilya Dryomov on Mar 25, 2015

Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: kstrdup() memory handling · a149bb9a

Authored by Sanidhya Kashyap on Mar 21, 2015

Currently, the result of kstrdup() is not checked for r_path2,
r_path1 and snapdir_name at various locations, even though the
allocation can fail under memory pressure. Therefore, return
-ENOMEM where the checks were missing.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

16 Apr, 2015 (1 commit)

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

Authored by David Howells on Mar 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

19 Feb, 2015 (1 commit)

ceph: show nocephx_require_signatures and notcp_nodelay options · 2a0b61ce

Authored by Ilya Dryomov on Feb 02, 2015

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

21 Jan, 2015 (2 commits)

fs: remove default_backing_dev_info · df0ce26c

Authored by Christoph Hellwig on Jan 14, 2015

Now that default_backing_dev_info is not used for writeback purposes we can
get rid of it easily:

 - instead of using its name for tracing unregistered bdi we just use
   "unknown"
 - btrfs and ceph can just assign the default read ahead window themselves
   like several other filesystems already do.
 - we can assign noop_backing_dev_info as the default one in alloc_super.
   All filesystems already either assigned their own or
   noop_backing_dev_info.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

ceph: remove call to bdi_unregister · e4d27509

Authored by Christoph Hellwig on Jan 14, 2015

bdi_destroy already does all the work, and if we delay freeing the
anon bdev we can get away with just that single call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

18 Dec, 2014 (1 commit)

ceph: support inline data feature · 65a22662

Authored by Yan, Zheng on Nov 17, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>
