Commits · 6b0490816671b2f4126a99998c9bf3c8c0472de2 · openeuler / Kernel

15 Oct, 2014 · 40 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 6b049081

Committed by Linus Torvalds on Oct 15, 2014

Pull Ceph updates from Sage Weil:
 "There is the long-awaited discard support for RBD (Guangliang Zhao,
  Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
  (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
  performance and debugging improvements (Yan, Zheng, John Spray), and a
  smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  ceph: fix divide-by-zero in __validate_layout()
  rbd: rbd workqueues need a resque worker
  libceph: ceph-msgr workqueue needs a resque worker
  ceph: fix bool assignments
  libceph: separate multiple ops with commas in debugfs output
  libceph: sync osd op definitions in rados.h
  libceph: remove redundant declaration
  ceph: additional debugfs output
  ceph: export ceph_session_state_name function
  ceph: include the initial ACL in create/mkdir/mknod MDS requests
  ceph: use pagelist to present MDS request data
  libceph: reference counting pagelist
  ceph: fix llistxattr on symlink
  ceph: send client metadata to MDS
  ceph: remove redundant code for max file size verification
  ceph: remove redundant io_iter_advance()
  ceph: move ceph_find_inode() outside the s_mutex
  ceph: request xattrs if xattr_version is zero
  rbd: set the remaining discard properties to enable support
  rbd: use helpers to handle discard for layered images correctly
  ...

Merge branch 'CVE-2014-7970' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux · ce9d7f7b

Committed by Linus Torvalds on Oct 15, 2014

Pull pivot_root() fix from Andy Lutomirski.

Prevent a leak of unreachable mounts.

* 'CVE-2014-7970' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux:
  mnt: Prevent pivot_root from creating a loop in the mount tree

mnt: Prevent pivot_root from creating a loop in the mount tree · 0d082601

Committed by Eric W. Biederman on Oct 08, 2014

Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.

In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another.  This results in all kinds of ugliness, including a leak of
the mounts involved in the mount loop.

Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.

[Added stable cc.  Fixes CVE-2014-7970.  --Andy]

Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
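
The reachability condition described in the pivot_root commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct sketch_mount` and `sketch_reachable()` are hypothetical names, the mount tree is modeled with NULL-terminated parent pointers, and the real kernel data structures and checks differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mount node: only the parent link is modeled. */
struct sketch_mount {
	struct sketch_mount *parent;	/* NULL at the top of the tree */
};

/* Walk up from mnt; it is reachable from root if the walk passes through root. */
static bool sketch_reachable(const struct sketch_mount *mnt,
			     const struct sketch_mount *root)
{
	while (mnt) {
		if (mnt == root)
			return true;
		mnt = mnt->parent;
	}
	return false;
}
```

In this model, a new root that sits above or beside the current root is rejected, which is the situation the commit message says would otherwise produce a loop.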

ceph: fix divide-by-zero in __validate_layout() · 0bc62284

Committed by Yan, Zheng on Oct 14, 2014

The 'stripe_unit' field is 64 bits; casting it to 32 bits can result in zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
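
The truncation hazard named in the commit above is easy to reproduce in userspace. The following is a minimal sketch, not the kernel's `__validate_layout()`: `sketch_layout_ok()` is a hypothetical helper that tests the value after the 32-bit cast, so a 64-bit stripe unit such as `1ULL << 32` is rejected instead of being used as a divisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is object_size a multiple of stripe_unit (as 32-bit values)? */
static bool sketch_layout_ok(uint64_t object_size, uint64_t stripe_unit)
{
	/* The cast keeps only the low 32 bits, so 1ULL << 32 becomes 0 here. */
	uint32_t su = (uint32_t)stripe_unit;

	if (su == 0)
		return false;	/* would otherwise divide by zero below */
	return ((uint32_t)object_size % su) == 0;
}
```

The point of the sketch is the ordering: a nonzero test on the 64-bit value alone does not protect a modulo that is performed on the truncated value.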

rbd: rbd workqueues need a resque worker · 792c3a91

Committed by Ilya Dryomov on Oct 10, 2014

Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.

Cc: stable@vger.kernel.org # 3.17, needs backporting for 3.16
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: ceph-msgr workqueue needs a resque worker · f9865f06

Committed by Ilya Dryomov on Oct 10, 2014

Commit f363e45f ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
wrong - libceph is very much a memory reclaim path, so restore it.

Cc: stable@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: fix bool assignments · ab6c2c3e

Committed by Fabian Frederick on Oct 09, 2014

Fix some coccinelle warnings:
fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

libceph: separate multiple ops with commas in debugfs output · 25f89777

Committed by Ilya Dryomov on Oct 06, 2014

For requests with multiple ops, separate ops with commas instead of \t,
which is a field separator here.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
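
To illustrate the formatting choice in the commit above, here is a minimal userspace sketch that joins op names with commas so that a tab can remain the field separator. `sketch_join_ops()` and the op names used with it are hypothetical and do not come from the libceph debugfs code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the names in ops[0..n-1] into buf, separated by commas.
 * Returns the number of characters written (excluding the NUL).
 * If the next name does not fit, output stops cleanly before it.
 */
static size_t sketch_join_ops(char *buf, size_t len,
			      const char *const *ops, size_t n)
{
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s%s",
				 i ? "," : "", ops[i]);

		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)w;
	}
	return used;
}
```

A caller would then emit the joined string as a single tab-delimited field, so a request with several ops still occupies exactly one column.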

libceph: sync osd op definitions in rados.h · 70b5bfa3

Committed by Ilya Dryomov on Oct 02, 2014

Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
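
One common way to use macros to avoid multiple points of maintenance is the "X-macro" pattern, where a single list generates both the enum values and the name strings. The sketch below shows that pattern only; the macro names, op names, and numeric values are hypothetical and are not the definitions in rados.h.

```c
/* Single source of truth: (identifier, numeric value, printable name). */
#define SKETCH_FORALL_OPS(f)		\
	f(READ,  0x01, "read")		\
	f(WRITE, 0x02, "write")		\
	f(ZERO,  0x03, "zero")

/* Expand the list once into enum constants... */
enum sketch_op {
#define SKETCH_GEN_ENUM(name, value, str) SKETCH_OP_##name = (value),
	SKETCH_FORALL_OPS(SKETCH_GEN_ENUM)
#undef SKETCH_GEN_ENUM
};

/* ...and once more into the switch that maps a value back to its name. */
static const char *sketch_op_name(int op)
{
	switch (op) {
#define SKETCH_GEN_CASE(name, value, str) case SKETCH_OP_##name: return (str);
	SKETCH_FORALL_OPS(SKETCH_GEN_CASE)
#undef SKETCH_GEN_CASE
	default:
		return "???";
	}
}
```

Adding an op then means adding one line to the list, and the enum and the string table cannot drift apart.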

libceph: remove redundant declaration · eb179d39

Committed by Fabian Frederick on Sep 30, 2014

ceph_release_page_vector was declared twice in libceph.h.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

ceph: additional debugfs output · 14ed9703

Committed by John Spray on Sep 12, 2014

MDS session state and client global ID are
useful instrumentation when testing.

Signed-off-by: John Spray <john.spray@redhat.com>

ceph: export ceph_session_state_name function · a687ecaf

Committed by John Spray on Sep 19, 2014

...so that it can be used from the ceph debugfs
code when dumping session info.

Signed-off-by: John Spray <john.spray@redhat.com>

ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa

Committed by Yan, Zheng on Sep 16, 2014

The current code sets a new file or directory's initial ACL in a
non-atomic manner: the client first sends a request to the MDS to create
the new file or directory, then sets the initial ACL after the creation
succeeds.

The fix is to include the initial ACL in create/mkdir/mknod MDS requests,
so the MDS can create the file or directory and set the initial ACL in
one request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: use pagelist to present MDS request data · 25e6bae3

Committed by Yan, Zheng on Sep 16, 2014

The current code uses a page array to present MDS request data. Pages in
the array are allocated and freed by the caller of ceph_mdsc_do_request().
If the request is interrupted, the pages can be freed while they are still
being used by the request message.

The fix is to use a pagelist to present MDS request data. The pagelist is
reference counted.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: reference counting pagelist · e4339d28

Committed by Yan, Zheng on Sep 16, 2014

This allows a pagelist to present data that may be sent multiple times.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
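
The idea behind the two pagelist commits above is that a buffer shared between a caller and an in-flight message should be freed only when the last user drops it. A minimal userspace sketch of that get/put discipline follows; `struct sketch_pagelist` and its functions are hypothetical stand-ins and do not reproduce the libceph API.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted buffer list. */
struct sketch_pagelist {
	atomic_int refcnt;
	size_t length;
};

/* A new object starts with one reference, owned by the creator. */
static struct sketch_pagelist *sketch_pagelist_alloc(void)
{
	struct sketch_pagelist *pl = calloc(1, sizeof(*pl));

	if (pl)
		atomic_init(&pl->refcnt, 1);
	return pl;
}

/* Each additional user (for example a queued message) takes a reference. */
static void sketch_pagelist_get(struct sketch_pagelist *pl)
{
	atomic_fetch_add(&pl->refcnt, 1);
}

/* Drop a reference; returns 1 if this call freed the object, 0 otherwise. */
static int sketch_pagelist_put(struct sketch_pagelist *pl)
{
	if (atomic_fetch_sub(&pl->refcnt, 1) == 1) {
		free(pl);
		return 1;
	}
	return 0;
}
```

With this scheme an interrupted caller can drop its own reference early without pulling the data out from under a message that still holds one.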

ceph: fix llistxattr on symlink · 0abb43dc

Committed by Yan, Zheng on Sep 18, 2014

Only regular files and directories have vxattrs.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: send client metadata to MDS · dbd0c8bf

Committed by John Spray on Sep 09, 2014

Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax,
which includes additional client metadata to allow
the MDS to report on clients by user-sensible names
like hostname.

Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: remove redundant code for max file size verification · a4483e8a

Committed by Chao Yu on Sep 17, 2014

Both ceph_update_writeable_page and ceph_setattr verify the file size
against the maximum size ceph supports.

There are two callers of ceph_update_writeable_page: ceph_write_begin and
ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, there is no
chance to change the file size through mmap. Likewise, we have already verified
the size in inode_change_ok when we call ceph_setattr.

So let's remove the redundant code for max file size verification.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: remove redundant io_iter_advance() · 3b70b388

Committed by Yan, Zheng on Sep 17, 2014

ceph_sync_read and generic_file_read_iter() have already advanced the
IO iterator.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: move ceph_find_inode() outside the s_mutex · 6cd3bcad

Committed by Yan, Zheng on Sep 17, 2014

ceph_find_inode() may wait on an inode that is being freed, so using it
inside the s_mutex may cause a deadlock (the inode being freed is waiting
for an OSD read reply, but the dispatch thread is blocked by the s_mutex).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: request xattrs if xattr_version is zero · 508b32d8

Committed by Yan, Zheng on Sep 16, 2014

The following sequence of events can happen:
  - Client releases an inode, queues cap release message.
  - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because MDS didn't receive the cap release
    message and thought client already has up-to-date xattrs.

The fix is to force sending a getattr request to MDS if xattrs_version
is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
does not have xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

rbd: set the remaining discard properties to enable support · b76f8239

Committed by Josh Durgin on Apr 07, 2014

max_discard_sectors must be set for the queue to support discard.
Operations implementing discard for rbd zero data, so report that.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: use helpers to handle discard for layered images correctly · d3246fb0

Committed by Josh Durgin on Apr 07, 2014

Only allocate two osd ops for discard requests, since the
preallocation hint is only added for regular writes.  Use
rbd_img_obj_request_fill() to recreate the original write or discard
osd operations, isolating that logic to one place, and change the
assert in rbd_osd_req_create_copyup() to accept discard requests as
well.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: extract a method for adding object operations · 3b434a2a

Committed by Josh Durgin on Apr 04, 2014

rbd_img_request_fill() creates a ceph_osd_request and has logic for
adding the appropriate osd ops to it based on the request type and
image properties.

For layered images, the original rbd_obj_request is resent with a
copyup operation in front, using a new ceph_osd_request. The logic for
adding the original operations should be the same as when first
sending them, so move it to a helper function.

op_type only needs to be checked once, so create a helper for that as
well and call it outside the loop in rbd_img_request_fill().

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: make discard trigger copy-on-write · 1c220881

Committed by Josh Durgin on Apr 04, 2014

Discard requests are a form of write, so they should go through the
same process as plain write requests and trigger copy-on-write for
layered images.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: tolerate -ENOENT for discard operations · d0265de7

Committed by Josh Durgin on Apr 07, 2014

Discard may try to delete an object that does not exist from a
non-layered image. If this occurs, the image already has no data in that
range, so change the result to success.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
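
The result handling described in the commit above amounts to treating "object not found" as success for discards only. A minimal sketch of that mapping, assuming a hypothetical helper `sketch_discard_result()` that is not part of the rbd driver:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Map an object-level result to the result reported for the request.
 * For a discard, a missing object already holds no data, so -ENOENT
 * is reported as success; every other result passes through unchanged.
 */
static int sketch_discard_result(bool is_discard, int result)
{
	if (is_discard && result == -ENOENT)
		return 0;
	return result;
}
```

Reads and writes keep their original error, so the tolerance stays limited to the one case where a missing object is harmless.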

rbd: fix snapshot context reference count for discards · bef95455

Committed by Josh Durgin on Apr 04, 2014

Discards take a reference to the snapshot context of an image when
they are created.  This reference needs to be cleaned up when the
request is done just as it is for regular writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: read image size for discard check safely · 3c5df893

Committed by Josh Durgin on Apr 04, 2014

In rbd_img_request_fill() the image size is only checked to determine
whether we can truncate an object instead of zeroing it for discard
requests. Take rbd_dev->header_rwsem while reading the image size, and
move this read into the discard check, so that non-discard ops don't
need to take the semaphore in this function.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: initial discard bits from Guangliang Zhao · 90e98c52

Committed by Guangliang Zhao on Apr 01, 2014

This patch adds discard support to the rbd driver.

There are three types of operation in the driver:
1. Objects are removed if they are completely contained within the
   discard range.
2. Objects are truncated if they are partly contained within the
   discard range and the range aligns with their boundary.
3. Others are zeroed.

A discard request from blkdev_issue_discard() is defined as one with
both REQ_WRITE and REQ_DISCARD set and no data, so we must check
REQ_DISCARD first when getting the request type.

This resolves:
	http://tracker.ceph.com/issues/190

[ Ilya Dryomov: This is incomplete and somewhat buggy, see follow up
  commits by Josh Durgin for refinements and fixes which weren't
  folded in to preserve authorship. ]

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
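
The three cases listed in the commit above can be sketched as a per-object classification. This is a simplified illustration under stated assumptions: the names are hypothetical, `off` and `len` describe the part of the discard range that falls inside one object (with `off + len <= object_size`), and layered images, which later commits in this series handle, are ignored.

```c
#include <stdint.h>

/* Hypothetical action names for the three cases in the commit message. */
enum sketch_discard_action {
	SKETCH_REMOVE,		/* whole object is covered */
	SKETCH_TRUNCATE,	/* range runs to the end of the object */
	SKETCH_ZERO		/* anything else */
};

static enum sketch_discard_action
sketch_classify_discard(uint64_t off, uint64_t len, uint64_t object_size)
{
	if (off == 0 && len == object_size)
		return SKETCH_REMOVE;
	if (off + len == object_size)
		return SKETCH_TRUNCATE;
	return SKETCH_ZERO;
}
```

For a 4 MiB object, discarding all of it maps to a remove, discarding the last 3 MiB maps to a truncate, and discarding only the first 1 MiB maps to zeroing.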

rbd: extend the operation type · 6d2940c8

Committed by Guangliang Zhao on Mar 13, 2014

It can only handle read and write operations now;
extend it for the coming discard support.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: skip the copyup when an entire object writing · c622d226

Committed by Guangliang Zhao on Apr 01, 2014

A layered write needs to copy up the parent's content, but a write
covering the entire object would overwrite it, so skip the copyup.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: add img_obj_request_simple() helper · 70d045f6

Committed by Ilya Dryomov on Sep 12, 2014

To clarify the conditions and make it easier to add new ones.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

rbd: access snapshot context and mapping size safely · 4e752f0a

Committed by Josh Durgin on Apr 08, 2014

These fields may both change while the image is mapped if a snapshot
is created or deleted or the image is resized. They are guarded by
rbd_dev->header_rwsem, so hold that while reading them, and store a
local copy to refer to outside of the critical section. The local copy
will stay consistent since the snapshot context is reference counted,
and the mapping size is just a u64. This prevents torn loads from
giving us inconsistent values.

Move reading header.snapc into the caller of rbd_img_request_create()
so that we only need to take the semaphore once. The read-only caller,
rbd_parent_request_create() can just pass NULL for snapc, since the
snapshot context is only relevant for writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: do not return -ERANGE on auth failures · 7dd440c9

Committed by Ilya Dryomov on Sep 11, 2014

Trying to map an image out of a pool for which we don't have an 'x'
permission bit fails with -ERANGE from ceph_extract_encoded_string()
due to an unsigned vs signed bug.  Fix it and get rid of the -EINVAL
sink, thus propagating rbd::get_id cls method errors.  (I've seen
a bunch of unexplained -ERANGE reports, I bet this is it).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: don't try checking queue_work() return value · 91883cd2

Committed by Ilya Dryomov on Sep 11, 2014

queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it.  So don't bother at all.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

ceph: make sure request isn't in any waiting list when kicking request. · 03974e81

Committed by Yan, Zheng on Sep 11, 2014

We may corrupt the waiting list if a request in the waiting list is kicked.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: protect kick_requests() with mdsc->mutex · 656e4382

Committed by Yan, Zheng on Sep 11, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: Convert pr_warning to pr_warn · b9a67899

Committed by Joe Perches on Sep 09, 2014

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

ceph: trim unused inodes before reconnecting to recovering MDS · 5d23371f

Committed by Yan, Zheng on Sep 10, 2014

So the recovering MDS does not need to fetch these unused inodes during
cache rejoin. This may reduce MDS recovery time.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: fix a use after free issue in osdmap_set_max_osd · 589506f1

Committed by Li RongQing on Sep 07, 2014

If the state variable is krealloced successfully, the old map->osd_state
is freed. If one of the following two reallocations then fails, the
function exits without resetting map->osd_state, which becomes a wild
pointer.

Fix it by resetting each pointer right after its krealloc succeeds.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
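
The ordering problem described in the commit above also exists with userspace `realloc()`, which makes it easy to sketch. The structure and function below are hypothetical and only model the pattern: each new pointer is stored back into the map as soon as its reallocation succeeds, so an early return on a later failure never leaves a stale pointer behind.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified map with two parallel arrays. */
struct sketch_map {
	size_t max;
	uint8_t *state;
	uint32_t *weight;
};

/* Grow both arrays to `max` entries; assumes max > 0 and max >= m->max. */
static int sketch_set_max(struct sketch_map *m, size_t max)
{
	uint8_t *state = realloc(m->state, max * sizeof(*state));

	if (!state)
		return -ENOMEM;
	m->state = state;	/* store at once: the old block may be gone */

	uint32_t *weight = realloc(m->weight, max * sizeof(*weight));

	if (!weight)
		return -ENOMEM;	/* m->state still points at valid memory */
	m->weight = weight;

	/* Zero only the newly added tail entries. */
	memset(m->state + m->max, 0, (max - m->max) * sizeof(*state));
	memset(m->weight + m->max, 0, (max - m->max) * sizeof(*weight));
	m->max = max;
	return 0;
}
```

If the second allocation fails, the map keeps its old `max` together with a valid, merely larger, `state` array, so later use or cleanup stays safe.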

openeuler / Kernel · synced successfully more than 1 year ago