提交 · acead002b200569273bed331c93c4a91d25e10b8 · openeuler / Kernel

02 5月, 2013 9 次提交

libceph: don't build request in ceph_osdc_new_request() · acead002

由 Alex Elder 提交于 3月 14, 2013

This patch moves the call to ceph_osdc_build_request() out of
ceph_osdc_new_request() and into its caller.

This is in order to defer formatting osd operation information into
the request message until just before request is started.

The only unusual (ab)user of ceph_osdc_build_request() is
ceph_writepages_start(), where the final length of write request may
change (downward) based on the current inode size or the oldest
snapshot context with dirty data for the inode.

The remaining callers don't change anything in the request after has
been built.

This means the ops array is now supplied by the caller.  It also
means there is no need to pass the mtime to ceph_osdc_new_request()
(it gets provided to ceph_osdc_build_request()).  And rather than
passing a do_sync flag, have the number of ops in the ops array
supplied imply adding a second STARTSYNC operation after the READ or
WRITE requested.

This and some of the patches that follow are related to having the
messenger (only) be responsible for filling the content of the
message header, as described here:
    http://tracker.ceph.com/issues/4589Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

acead002

ceph: use page_offset() in ceph_writepages_start() · 25d71cb9

由 Alex Elder 提交于 4月 03, 2013

There's one spot in ceph_writepages_start() that open-codes what
page_offset() does safely.  Use the macro so we don't have to worry
about wrapping.

This resolves:
    http://tracker.ceph.com/issues/4648Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

25d71cb9

libceph: record byte count not page count · e0c59487

由 Alex Elder 提交于 3月 07, 2013

Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e0c59487

libceph: separate read and write data · 0fff87ec

由 Alex Elder 提交于 2月 14, 2013

An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.

Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.

This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data.  It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.

This resolves:
    http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

0fff87ec

libceph: distinguish page and bio requests · 2ac2b7a6

由 Alex Elder 提交于 2月 14, 2013

An osd request uses either pages or a bio list for its data.  Use a
union to record information about the two, and add a data type
tag to select between them.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

2ac2b7a6

libceph: separate osd request data info · 2794a82a

由 Alex Elder 提交于 2月 14, 2013

Pull the fields in an osd request structure that define the data for
the request out into a separate structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

2794a82a

libceph: don't assign page info in ceph_osdc_new_request() · 153e5167

由 Alex Elder 提交于 3月 01, 2013

Currently ceph_osdc_new_request() assigns an osd request's
r_num_pages and r_alignment fields.  The only thing it does
after that is call ceph_osdc_build_request(), and that doesn't
need those fields to be assigned.

Move the assignment of those fields out of ceph_osdc_new_request()
and into its caller.  As a result, the page_align parameter is no
longer used, so get rid of it.

Note that in ceph_sync_write(), the value for req->r_num_pages had
already been calculated earlier (as num_pages, and fortunately
it was computed the same way).  So don't bother recomputing it,
but because it's not needed earlier, move that calculation after the
call to ceph_osdc_new_request().  Hold off making the assignment to
r_alignment, doing it instead r_pages and r_num_pages are
getting set.

Similarly, in start_read(), nr_pages already holds the number of
pages in the array (and is calculated the same way), so there's no
need to recompute it.  Move the assignment of the page alignment
down with the others there as well.

This and the next few patches are preparation work for:
    http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

153e5167

ceph: use calc_pages_for() in start_read() · cf7b7e14

由 Alex Elder 提交于 3月 01, 2013

There's a spot that computes the number of pages to allocate for a
page-aligned length by just shifting it.  Use calc_pages_for()
instead, to be consistent with usage everywhere else.  The result
is the same.

The reason for this is to make it clearer in an upcoming patch that
this calculation is duplicated.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

cf7b7e14

ceph: revert commit · 7971bd92

由 Sage Weil 提交于 5月 01, 2013

commit 22cddde1 breaks the atomicity of write operation, it also
introduces a deadlock between write and truncate.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: NGreg Farnum <greg@inktank.com>

Conflicts:
	fs/ceph/addr.c

7971bd92

27 2月, 2013 1 次提交

libceph: update osd request/reply encoding · 1b83bef2

由 Sage Weil 提交于 2月 25, 2013

Use the new version of the encoding for osd requests and replies.  In the
process, update the way we are tracking request ops and reply lengths and
results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
appropriately.

The main changes are:
 - we keep pointers into the request memory for fields we need to update
   each time the request is sent out over the wire
 - we keep information about the result in an array in the request struct
   where the users can easily get at it.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

1b83bef2

23 2月, 2013 1 次提交
- A
  new helper: file_inode(file) · 496ad9aa
  由 Al Viro 提交于 1月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  496ad9aa
19 2月, 2013 4 次提交

ceph: kill ceph_osdc_new_request() "num_reply" parameter · a3bea47e

由 Alex Elder 提交于 2月 15, 2013

The "num_reply" parameter to ceph_osdc_new_request() is never
used inside that function, so get rid of it.

Note that ceph_sync_write() passes 2 for that argument, while all
other callers pass 1.  It doesn't matter, but perhaps someone should
verify this doesn't indicate a problem.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

a3bea47e

ceph: kill ceph_osdc_writepages() "flags" parameter · 24808826

由 Alex Elder 提交于 2月 15, 2013

There is only one caller of ceph_osdc_writepages(), and it always
passes 0 as its "flags" argument.  Get rid of that argument and
replace its use in ceph_osdc_writepages() with 0.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

24808826

ceph: kill ceph_osdc_writepages() "dosync" parameter · fbf8685f

由 Alex Elder 提交于 2月 15, 2013

There is only one caller of ceph_osdc_writepages(), and it always
passes 0 as its "dosync" argument.  Get rid of that argument and
replace its use in ceph_osdc_writepages() with 0.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

fbf8685f

ceph: kill ceph_osdc_writepages() "nofail" parameter · 87f979d3

由 Alex Elder 提交于 2月 15, 2013

There is only one caller of ceph_osdc_writepages(), and it always
passes the value true as its "nofail" argument.  Get rid of that
argument and replace its use in ceph_osdc_writepages() with the
constant value true.

This and a number of cleanup patches that follow resolve:
    http://tracker.ceph.com/issues/4126Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

87f979d3

13 12月, 2012 1 次提交

libceph: Unlock unprocessed pages in start_read() error path · 8884d53d

由 David Zafman 提交于 12月 03, 2012

Function start_read() can get an error before processing all pages.
It must not only release the remaining pages, but unlock them too.

This fixes http://tracker.newdream.net/issues/3370Signed-off-by: NDavid Zafman <david.zafman@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

8884d53d

06 11月, 2012 1 次提交

ceph: Fix i_size update race · 22cddde1

由 Sage Weil 提交于 11月 05, 2012

ceph_aio_write() has an optimization that marks cap EPH_CAP_FILE_WR
dirty before data is copied to page cache and inode size is updated.
If ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. The fix is move
ceph_{get,put}_cap_refs() into ceph_write_{begin,end}() and call
__ceph_mark_dirty_caps() after inode size is updated.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NSage Weil <sage@inktank.com>

22cddde1

09 10月, 2012 1 次提交

mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4

由 Konstantin Khlebnikov 提交于 10月 08, 2012

Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0b173bc4

03 10月, 2012 1 次提交

ceph: avoid 32-bit page index overflow · 6285bc23

由 Alex Elder 提交于 10月 02, 2012

A pgoff_t is defined (by default) to have type (unsigned long).  On
architectures such as i686 that's a 32-bit type.  The ceph address
space code was attempting to produce 64 bit offsets by shifting a
page's index by PAGE_CACHE_SHIFT, but the result was not what was
desired because the shift occurred before the result got promoted
to 64 bits.

Fix this by converting all uses of page->index used in this way to
use the page_offset() macro, which ensures the 64-bit result has the
intended value.

This fixes http://tracker.newdream.net/issues/3112Reported-by: NMohamed Pakkeer <pakkeer.mohideen@realimage.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

6285bc23

02 10月, 2012 1 次提交

ceph: propagate layout error on osd request creation · 6816282d

由 Sage Weil 提交于 9月 24, 2012

If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller.  We switch up the return to have an error
code instead of NULL implying -ENOMEM.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

6816282d

31 7月, 2012 1 次提交

ceph: Push file_update_time() into ceph_page_mkwrite() · 3ca9c3bd

由 Jan Kara 提交于 6月 12, 2012

CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: NSage Weil <sage@newdream.net>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3ca9c3bd

20 6月, 2012 1 次提交

ceph: check PG_Private flag before accessing page->private · 61600ef8

由 Yan, Zheng 提交于 5月 28, 2012

I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Signed-off-by: NSage Weil <sage@inktank.com>
(cherry picked from commit 28c0254e)

61600ef8

01 6月, 2012 1 次提交

ceph: check PG_Private flag before accessing page->private · 28c0254e

由 Yan, Zheng 提交于 5月 28, 2012

I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Signed-off-by: NSage Weil <sage@inktank.com>

28c0254e

08 12月, 2011 1 次提交

ceph: use i_ceph_lock instead of i_lock · be655596

由 Sage Weil 提交于 11月 30, 2011

We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().
Reported-by: NAmon Ott <a.ott@m-privacy.de>
Signed-off-by: NSage Weil <sage@newdream.net>

be655596

26 10月, 2011 3 次提交

libceph: fix double-free of page vector · 33957340

由 Sage Weil 提交于 10月 24, 2011

ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.
Reported-by: NJeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: NSage Weil <sage@newdream.net>

33957340

ceph: implement (optional) max read size · 0d66a487

由 Sage Weil 提交于 8月 04, 2011

The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD.  This is distinct from
'rasize', which controls the size of the readahead window.
Signed-off-by: NSage Weil <sage@newdream.net>

0d66a487

ceph: make readpages fully async · 7c272194

由 Sage Weil 提交于 8月 03, 2011

When we get a ->readpages() aop, submit async reads for all page ranges
in the provided page list.  Lock the pages immediately, so that VFS/MM
will block until the reads complete.
Signed-off-by: NSage Weil <sage@newdream.net>

7c272194

08 6月, 2011 1 次提交

ceph: use ihold when we already have an inode ref · 70b666c3

由 Sage Weil 提交于 5月 27, 2011

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: NSage Weil <sage@newdream.net>

70b666c3

20 5月, 2011 2 次提交

ceph: check return value for start_request in writepages · 9d6fcb08

由 Sage Weil 提交于 5月 12, 2011

Since we pass the nofail arg, we should never get an error; BUG if we do.
(And fix the function to not return an error if __map_request fails.)
Signed-off-by: NSage Weil <sage@newdream.net>

9d6fcb08

ceph: remove useless check · 6b4a3b51

由 Sage Weil 提交于 5月 12, 2011

rc is only ever 0 or negative in this method.
Signed-off-by: NSage Weil <sage@newdream.net>

6b4a3b51

04 5月, 2011 1 次提交

ceph: handle ceph_osdc_new_request failure in ceph_writepages_start · 8c71897b

由 Henry C Chang 提交于 5月 03, 2011

We should unlock the page and return -ENOMEM if ceph_osdc_new_request
failed.
Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: NSage Weil <sage@newdream.net>

8c71897b

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

29 3月, 2011 1 次提交

fs: don't use igrab() while holding i_lock · 0444d76a

由 Dave Chinner 提交于 3月 29, 2011

Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0444d76a

10 11月, 2010 1 次提交

ceph: make page alignment explicit in osd interface · b7495fc2

由 Sage Weil 提交于 11月 09, 2010

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.
Signed-off-by: NSage Weil <sage@newdream.net>

b7495fc2

27 10月, 2010 1 次提交

writeback: remove nonblocking/encountered_congestion references · 1b430bee

由 Wu Fengguang 提交于 10月 26, 2010

This removes more dead code that was somehow missed by commit 0d99519e
(writeback: remove unused nonblocking and congestion checks).  There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion.  The latter will lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.

We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
   that would skip some dirty inodes on congestion and page out others, which
   is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b430bee

21 10月, 2010 1 次提交

ceph: factor out libceph from Ceph file system · 3d14c5d2

由 Yehuda Sadeh 提交于 4月 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: NSage Weil <sage@newdream.net>

3d14c5d2

17 9月, 2010 1 次提交

ceph: fix cap_snap and realm split · ae00d4f3

由 Sage Weil 提交于 9月 16, 2010

The cap_snap creation/queueing relies on both the current i_head_snapc
_and_ the i_snap_realm pointers being correct, so that the new cap_snap
can properly reference the old context and the new i_head_snapc can be
updated to reference the new snaprealm's context.  To fix this, we:

 - move inodes completely to the new (split) realm so that i_snap_realm
   is correct, and
 - generate the new snapc's _before_ queueing the cap_snaps in
   ceph_update_snap_trace().
Signed-off-by: NSage Weil <sage@newdream.net>

ae00d4f3

12 9月, 2010 1 次提交

ceph: fix file offset wrapping at 4GB on 32-bit archs · a77d9f7d

由 Sage Weil 提交于 9月 11, 2010

Cast the value before shifting so that we don't run out of bits with a
32-bit unsigned long.  This fixes wrapping of high file offsets into the
low 4GB of a file on disk, and the subsequent data corruption for large
files.
Signed-off-by: NSage Weil <sage@newdream.net>

a77d9f7d

25 8月, 2010 1 次提交

ceph: maintain i_head_snapc when any caps are dirty, not just for data · 7d8cb26d

由 Sage Weil 提交于 8月 24, 2010

We used to use i_head_snapc to keep track of which snapc the current epoch
of dirty data was dirtied under.  It is used by queue_cap_snap to set up
the cap_snap.  However, since we queue cap snaps for any dirty caps, not
just for dirty file data, we need to keep a valid i_head_snapc anytime
we have dirty|flushing caps.  This fixes a NULL pointer deref in
queue_cap_snap when writing back dirty caps without data (e.g.,
snaptest-authwb.sh).
Signed-off-by: NSage Weil <sage@newdream.net>

7d8cb26d

23 8月, 2010 1 次提交

mm: exporting account_page_dirty · 679ceace

由 Michael Rubin 提交于 8月 20, 2010

This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.
Signed-off-by: NMichael Rubin <mrubin@google.com>
Signed-off-by: NSage Weil <sage@newdream.net>

679ceace

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功