1. 15 Oct, 2014 1 commit
    • C
      ceph: remove redundant code for max file size verification · a4483e8a
      Chao Yu committed
      Both ceph_update_writeable_page and ceph_setattr verify the file size
      against the maximum size ceph supports.
      There are two callers of ceph_update_writeable_page: ceph_write_begin and
      ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
      generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, there is no
      chance to change the file size through mmap. Likewise, we have already verified the size
      in inode_change_ok when we call ceph_setattr.
      So let's remove the redundant code for max file size verification.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
      a4483e8a
  2. 07 Jun, 2014 1 commit
  3. 06 Jun, 2014 1 commit
  4. 07 May, 2014 1 commit
  5. 29 Jan, 2014 1 commit
  6. 01 Jan, 2014 2 commits
  7. 14 Dec, 2013 2 commits
  8. 24 Nov, 2013 1 commit
  9. 07 Sep, 2013 3 commits
    • M
      ceph: page still marked private_2 · d4d3aa38
      Milosz Tanski committed
      The previous patch allowed us to clean up most of the issues with pages marked
      as private_2 when calling ceph_readpages. However, there seems to be a case in
      the error cleanup path in start_read that still triggers this from time to
      time. I've only seen this a couple of times.
      
      BUG: Bad page state in process petabucket  pfn:335b82
      page:ffffea000cd6e080 count:0 mapcount:0 mapping:          (null) index:0x0
      page flags: 0x200000000001000(private_2)
      Call Trace:
       [<ffffffff81563442>] dump_stack+0x46/0x58
       [<ffffffff8112c7f7>] bad_page+0xc7/0x120
       [<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
       [<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
       [<ffffffff81132427>] __put_single_page+0x27/0x30
       [<ffffffff81132d95>] put_page+0x25/0x40
       [<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
       [<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      d4d3aa38
    • M
      ceph: clean PgPrivate2 on returning from readpages · 76be778b
      Milosz Tanski committed
      In some cases the ceph readpages code bails without filling all the pages
      already marked by fscache. When we return to the readahead code, this causes
      a BUG.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      76be778b
    • M
      ceph: use fscache as a local persistent cache · 99ccbd22
      Milosz Tanski committed
      Add support for fscache to the Ceph filesystem. This brings it on
      par with some of the other network filesystems in Linux (such as NFS and AFS).
      
      To mount the filesystem with fscache, the 'fsc' mount option must be
      passed.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      99ccbd22
  10. 28 Aug, 2013 1 commit
    • S
      ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem · 7d6e1f54
      Sha Zhengju committed
      Next we will begin to add memcg dirty page accounting around
      __set_page_dirty_{buffers,nobuffers} in the vfs layer, so we'd better use the vfs interface to
      avoid exporting those details to filesystems.
      
      Since vfs set_page_dirty() should be called under the page lock, we no longer need elaborate
      code to handle races here, and two WARN_ON()s are added to detect such exceptions.
      Thanks very much to Sage and Yan Zheng for their coaching!
      
      I tested it in a two-server ceph environment in which one server is the client and the other is
      mds/osd/mon, and ran the following fsx tests from xfstests:
      
        ./fsx   1MB -N 50000 -p 10000 -l 1048576
        ./fsx  10MB -N 50000 -p 10000 -l 10485760
        ./fsx 100MB -N 50000 -p 10000 -l 104857600
      
      fsx performs many mmap-read/mmap-write/truncate operations, and the tests completed
      successfully without triggering any of the WARN_ON()s.
      Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      7d6e1f54
  11. 16 Aug, 2013 1 commit
  12. 10 Aug, 2013 1 commit
  13. 04 Jul, 2013 2 commits
  14. 22 May, 2013 2 commits
    • L
      ceph: use ->invalidatepage() length argument · 569d39fc
      Lukas Czerner committed
      The ->invalidatepage() aop now accepts a range to invalidate, so we can make
      use of it in ceph_invalidatepage().
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Acked-by: Sage Weil <sage@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      569d39fc
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate a partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the existing functionality was enough for the file system truncate
      operation to work properly. However, more file systems now support the punch
      hole feature, and it can benefit from mm supporting truncating a page just
      up to a certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept the range to be invalidated and updates all
      instances of it.
      
      We also change block_invalidatepage() in the same way and actually
      make use of the new length argument by implementing range invalidation.
      
      Actual file system implementations will follow, except for the file systems
      where the changes are really simple and should not change the behaviour
      in any way. An implementation of truncate_page_range(), which will be able
      to accept page-unaligned ranges, will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
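The range-based invalidation described in d47992f8 can be illustrated with a small user-space sketch. This is not kernel code: `struct sketch_page`, `SKETCH_PAGE_SIZE`, and `sketch_invalidate_range()` are hypothetical names, and zeroing bytes only stands in for whatever discarding a real filesystem would do. The assumption it illustrates is that the callback receives an offset and a length within the page, and that per-page private state should be dropped only when the range covers the whole page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a cached page; not the kernel's struct page. */
struct sketch_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    bool has_private;   /* models filesystem-private state attached to the page */
};

/*
 * Range-aware invalidation: discard bytes [offset, offset + length).
 * Private state is dropped only when the range covers the whole page.
 * Returns true if the whole page was invalidated.
 */
static bool sketch_invalidate_range(struct sketch_page *page,
                                    unsigned int offset, unsigned int length)
{
    /* The range must lie entirely inside the page. */
    assert(offset <= SKETCH_PAGE_SIZE && length <= SKETCH_PAGE_SIZE - offset);

    memset(page->data + offset, 0, length);

    if (offset == 0 && length == SKETCH_PAGE_SIZE) {
        page->has_private = false;
        return true;
    }
    return false;   /* partial invalidation: keep private state */
}
```

With the older single-offset form, only "from offset to the end of the page" could be expressed; the extra length argument is what lets a caller stop short of the page end.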
  15. 02 May, 2013 20 commits
    • A
      libceph: kill off osd data write_request parameters · 406e2c9f
      Alex Elder committed
      In the incremental move toward supporting distinct data items in an
      osd request some of the functions had "write_request" parameters to
      indicate, basically, whether the data belonged to in_data or the
      out_data.  Now that we maintain the data fields in the op structure
      there is no need to indicate the direction, so get rid of the
      "write_request" parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      406e2c9f
    • Y
      ceph: fix race between writepages and truncate · 1ac0fc8a
      Yan, Zheng committed
      ceph_writepages_start() reads inode->i_size in two places. It can get
      different values between successive reads, because truncate can change
      inode->i_size at any time. The race can lead to a mismatch between the data
      length of the osd request and the pages marked as writeback. When the osd request
      finishes, it clears writeback pages according to its data length, so
      some pages can be left in writeback state forever. The fix is to
      read inode->i_size only once, save its value to a local variable, and use
      the local variable wherever i_size is needed.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      1ac0fc8a
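The idea behind the fix in 1ac0fc8a (read the size once, then derive everything from that snapshot) can be sketched in user-space C. All names here (`sketch_inode`, `sketch_plan`, `sketch_plan_write()`) are hypothetical and the structure is heavily simplified; it assumes `offset` is page-aligned and only models how the page count and the request length stay consistent when both come from a single read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical inode whose size may be changed concurrently by truncate. */
struct sketch_inode {
    _Atomic int64_t i_size;
};

/* What a writeback pass decides: pages to mark and bytes to send. */
struct sketch_plan {
    unsigned int pages_marked;
    uint64_t data_len;
};

/*
 * Snapshot i_size exactly once. Both the number of pages marked for
 * writeback and the request's data length are computed from the same
 * local value, so they cannot disagree even if i_size changes later.
 */
static struct sketch_plan sketch_plan_write(struct sketch_inode *inode,
                                            uint64_t offset,
                                            unsigned int max_pages)
{
    const uint64_t size = (uint64_t)atomic_load(&inode->i_size); /* single read */
    struct sketch_plan plan = { 0, 0 };
    uint64_t len, max_len;

    assert(offset % SKETCH_PAGE_SIZE == 0);
    if (offset >= size)
        return plan;            /* nothing to write past end of file */

    len = size - offset;
    max_len = (uint64_t)max_pages * SKETCH_PAGE_SIZE;
    if (len > max_len)
        len = max_len;

    plan.data_len = len;
    plan.pages_marked = (unsigned int)((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
    return plan;
}
```

Had the function loaded `i_size` a second time when computing `data_len`, a truncate between the two loads could leave `pages_marked` larger than the length covers, which is the stuck-writeback situation the commit message describes.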
    • A
      libceph: combine initializing and setting osd data · a4ce40a9
      Alex Elder committed
      This ends up being a rather large patch but what it's doing is
      somewhat straightforward.
      
      Basically, this is replacing two calls with one.  The first of the
      two calls is initializing a struct ceph_osd_data with data (either a
      page array, a page list, or a bio list); the second is setting an
      osd request op so it associates that data with one of the op's
      parameters.  In place of those two will be a single function that
      initializes the op directly.
      
      That means we sort of fan out a set of the needed functions:
          - extent ops with pages data
          - extent ops with pagelist data
          - extent ops with bio list data
      and
          - class ops with page data for receiving a response
      
      We also have to define another one, but it's only used internally:
          - class ops with pagelist data for request parameters
      
      Note that we *still* haven't gotten rid of the osd request's
      r_data_in and r_data_out fields.  All the osd ops refer to them for
      their data.  For now, these data fields are pointers assigned to the
      appropriate r_data_* field when these new functions are called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a4ce40a9
    • A
      libceph: specify osd op by index in request · c99d2d4a
      Alex Elder committed
      An osd request now holds all of its source op structures, and every
      place that initializes one of these is in fact initializing one
      of the entries in the osd request's array.
      
      So rather than supplying the address of the op to initialize, have
      the caller specify the osd request and an indication of which op it
      would like to initialize.  This better hides the details of the
      op structure (and facilitates moving the data pointers they use).
      
      Since osd_req_op_init() is a common routine, and it's not used
      outside the osd client code, give it static scope.  Also make
      it return the address of the specified op (so all the other
      init routines don't have to repeat that code).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c99d2d4a
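The "initialize by index, return the op address" pattern from c99d2d4a can be sketched as follows. This is a simplified user-space illustration, not the libceph code: the `sketch_*` types and functions, the field layout, and the opcode values are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_OP 2

/* Hypothetical, flattened op structure (the real one is more involved). */
struct sketch_op {
    uint16_t opcode;
    uint64_t offset;
    uint64_t length;
};

/* The request owns its array of source ops. */
struct sketch_request {
    unsigned int num_ops;
    struct sketch_op ops[SKETCH_MAX_OP];
};

/*
 * Common init routine: the caller names the request and an index rather
 * than passing an op pointer. Returning the op's address lets the more
 * specific init helpers avoid repeating the lookup.
 */
static struct sketch_op *sketch_op_init(struct sketch_request *req,
                                        unsigned int which, uint16_t opcode)
{
    struct sketch_op *op;

    assert(which < req->num_ops);
    op = &req->ops[which];
    memset(op, 0, sizeof(*op));
    op->opcode = opcode;
    return op;
}

/* A type-specific helper built on top of the common routine. */
static void sketch_op_extent_init(struct sketch_request *req,
                                  unsigned int which, uint16_t opcode,
                                  uint64_t offset, uint64_t length)
{
    struct sketch_op *op = sketch_op_init(req, which, opcode);

    op->offset = offset;
    op->length = length;
}
```

Callers never see the array layout, so the op structure (and any data pointers later attached to it) can change without touching them.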
    • A
      libceph: add data pointers in osd op structures · 8c042b0d
      Alex Elder committed
      An extent type osd operation currently implies that there will
      be corresponding data supplied in the data portion of the request
      (for write) or response (for read) message.  Similarly, an osd class
      method operation implies a data item will be supplied to receive
      the response data from the operation.
      
      Add a ceph_osd_data pointer to each of those structures, and assign
      it to point to either the incoming or the outgoing data structure in
      the osd message.  The data is not always available when an op is
      initially set up, so add two new functions to allow setting them
      after the op has been initialized.
      
      Begin to make use of the data item pointer available in the osd
      operation rather than the request data in or out structure in
      places where it's convenient.  Add some assertions to verify
      pointers are always set the way they're expected to be.
      
      This is a sort of stepping stone toward really moving the data
      into the osd request ops, to allow for some validation before
      making that jump.
      
      This is the first in a series of patches that resolve:
          http://tracker.ceph.com/issues/4657
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      8c042b0d
    • A
      libceph: keep source rather than message osd op array · 79528734
      Alex Elder committed
      An osd request keeps a pointer to the osd operations (ops) array
      that it builds in its request message.
      
      In order to allow each op in the array to have its own distinct
      data, we will need to keep track of each op's data, and that
      information does not go over the wire.
      
      As long as we're tracking the data we might as well just track the
      entire (source) op definition for each of the ops.  And if we're
      doing that, we'll have no more need to keep a pointer to the
      wire-encoded version.
      
      This patch makes the array of source ops be kept with the osd
      request structure, and uses that instead of the version encoded in
      the message in places where that was previously used.  The array
      will be embedded in the request structure, and the maximum number of
      ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
      to 2 to reduce the size of the structure.
      
      The result of doing this sort of ripples back up, and as a result
      various function parameters and local variables become unnecessary.
      
      Make r_num_ops be unsigned, and move the definition of struct
      ceph_osd_req_op earlier to ensure it's defined where needed.
      
      It does not yet add per-op data, that's coming soon.
      
      This resolves:
          http://tracker.ceph.com/issues/4656
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      79528734
    • A
      libceph: a few more osd data cleanups · 87060c10
      Alex Elder committed
      These are very small changes that make use of osd_data local pointers
      as shorthands for structures being operated on.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      87060c10
    • A
      libceph: define osd data initialization helpers · 43bfe5de
      Alex Elder committed
      Define and use functions that encapsulate the initialization of a
      ceph_osd_data structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      43bfe5de
    • A
      ceph: build osd request message later for writepages · e5975c7c
      Alex Elder committed
      Hold off building the osd request message in ceph_writepages_start()
      until just before it will be submitted to the osd client for
      execution.
      
      We'll still create the request and allocate the page pointer array
      after we learn we have at least one page to write.  A local variable
      will be used to keep track of the allocated array of pages.  Wait
      until just before submitting the request for assigning that page
      array pointer to the request message.
      
      Create and use a new function, osd_req_op_extent_update(), whose
      purpose is to serve this one spot where the length value supplied
      when an osd request's op was initially formatted might need to get
      changed (reduced, never increased) before submitting the request.
      
      Previously, ceph_writepages_start() assigned the message header's
      data length because of this update.  That's no longer necessary,
      because ceph_osdc_build_request() will recalculate the right
      value to use based on the content of the ops in the request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e5975c7c
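The "reduce, never increase" contract that e5975c7c describes for osd_req_op_extent_update() can be sketched like this. The struct and function below are hypothetical user-space stand-ins; in particular, the `payload_len` field and the way it is adjusted are assumptions for illustration, not a statement of what the libceph function does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent op: where to write, how much, and bytes to send. */
struct sketch_extent_op {
    uint64_t offset;
    uint64_t length;
    uint64_t payload_len;
};

/*
 * Shrink the length of an already-initialized extent op. Growing is
 * treated as a programming error; the payload length is reduced by the
 * same amount so the two stay consistent.
 */
static void sketch_extent_update(struct sketch_extent_op *op, uint64_t length)
{
    uint64_t previous = op->length;

    if (length == previous)
        return;                     /* nothing to do */

    assert(length < previous);      /* reduced, never increased */
    op->length = length;
    op->payload_len -= previous - length;
}
```

Because the message is built only after this adjustment, the header's data length can be computed from the ops rather than patched by the caller.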
    • A
      libceph: hold off building osd request · 02ee07d3
      Alex Elder committed
      Defer building the osd request until just before submitting it in
      all callers except ceph_writepages_start().  (That caller will be
      handled in the next patch.)
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      02ee07d3
    • A
      ceph: kill ceph alloc_page_vec() · 88486957
      Alex Elder committed
      There is a helper function alloc_page_vec() that, despite its
      generic-sounding name, depends heavily on an osd request structure
      being populated with certain information.
      
      There is only one place this function is used, and it ends up
      being a bit simpler to just open code what it does, so get
      rid of the helper.
      
      The real motivation for this is deferring building of the osd
      request message, and this is a step in that direction.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      88486957
    • A
      ceph: define ceph_writepages_osd_request() · 94fe8420
      Alex Elder committed
      Mostly for readability, define ceph_writepages_osd_request() and
      use it to allocate the osd request for ceph_writepages_start().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      94fe8420
    • A
      libceph: don't build request in ceph_osdc_new_request() · acead002
      Alex Elder committed
      This patch moves the call to ceph_osdc_build_request() out of
      ceph_osdc_new_request() and into its caller.
      
      This is in order to defer formatting osd operation information into
      the request message until just before the request is started.
      
      The only unusual (ab)user of ceph_osdc_build_request() is
      ceph_writepages_start(), where the final length of write request may
      change (downward) based on the current inode size or the oldest
      snapshot context with dirty data for the inode.
      
      The remaining callers don't change anything in the request after it has
      been built.
      
      This means the ops array is now supplied by the caller.  It also
      means there is no need to pass the mtime to ceph_osdc_new_request()
      (it gets provided to ceph_osdc_build_request()).  And rather than
      passing a do_sync flag, have the number of ops in the ops array
      supplied imply adding a second STARTSYNC operation after the READ or
      WRITE requested.
      
      This and some of the patches that follow are related to having the
      messenger (only) be responsible for filling the content of the
      message header, as described here:
          http://tracker.ceph.com/issues/4589
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      acead002
    • A
      ceph: use page_offset() in ceph_writepages_start() · 25d71cb9
      Alex Elder committed
      There's one spot in ceph_writepages_start() that open-codes what
      page_offset() does safely.  Use the macro so we don't have to worry
      about wrapping.
      
      This resolves:
          http://tracker.ceph.com/issues/4648
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      25d71cb9
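The wrapping concern mentioned in 25d71cb9 can be shown with a small user-space sketch. It models a page index as a 32-bit value (as it would be on a 32-bit system) and compares an open-coded shift with a version that widens to 64 bits first, which is the approach a helper like page_offset() is understood to take. The function names and `SKETCH_PAGE_SHIFT` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* 4 KiB pages, assumed for the example */

/* Open-coded shift performed in 32 bits: wraps for large indexes. */
static uint32_t sketch_offset_narrow(uint32_t index)
{
    return index << SKETCH_PAGE_SHIFT;
}

/* Widen to a 64-bit file offset before shifting, so nothing is lost. */
static int64_t sketch_page_offset(uint32_t index)
{
    return (int64_t)index << SKETCH_PAGE_SHIFT;
}
```

For a page at index `0x100000` (byte offset 4 GiB), the narrow version wraps to 0 while the widened version yields 4294967296.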
    • A
      libceph: record byte count not page count · e0c59487
      Alex Elder committed
      Record the byte count for an osd request rather than the page count.
      The number of pages can always be derived from the byte count (and
      alignment/offset) but the reverse is not true.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e0c59487
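The claim in e0c59487 that a page count follows from a byte count plus offset, but not the other way round, can be checked with a short sketch. `sketch_calc_pages()` below is a hypothetical user-space helper written in the spirit of calc_pages_for(); it assumes 4 KiB pages and a non-zero length.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/*
 * Number of pages touched by the byte range [off, off + len), len > 0:
 * index one past the last touched page minus index of the first page.
 */
static unsigned int sketch_calc_pages(uint64_t off, uint64_t len)
{
    assert(len > 0);
    return (unsigned int)(((off + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)
                          - (off >> SKETCH_PAGE_SHIFT));
}
```

Both a 4097-byte and an 8192-byte range starting at offset 0 need two pages, so storing only "2 pages" loses the length; a 2-byte range starting at offset 4095 also needs two pages, which is why the offset matters as well.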
    • A
      libceph: separate read and write data · 0fff87ec
      Alex Elder committed
      An osd request defines information about where data to be read
      should be placed as well as where data to write comes from.
      Currently these are represented by common fields.
      
      Keep information about data for writing separate from data to be
      read by splitting these into data_in and data_out fields.
      
      This is the key patch in this whole series, in that it actually
      identifies which osd requests generate outgoing data and which
      generate incoming data.  It's less obvious (currently) that an osd
      CALL op generates both outgoing and incoming data; that's the focus
      of some upcoming work.
      
      This resolves:
          http://tracker.ceph.com/issues/4127
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0fff87ec
    • A
      libceph: distinguish page and bio requests · 2ac2b7a6
      Alex Elder committed
      An osd request uses either pages or a bio list for its data.  Use a
      union to record information about the two, and add a data type
      tag to select between them.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2ac2b7a6
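The "union plus type tag" arrangement described in 2ac2b7a6 is a standard tagged union. The sketch below shows the shape of the idea only; the type names, field names, and init helpers are hypothetical and do not reproduce the actual ceph_osd_data definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag saying which member of the union is valid. */
enum sketch_data_type {
    SKETCH_DATA_NONE = 0,
    SKETCH_DATA_PAGES,
    SKETCH_DATA_BIO
};

struct sketch_bio;  /* opaque here; only a pointer is stored */

struct sketch_osd_data {
    enum sketch_data_type type;
    union {
        struct {
            void **vec;         /* array of page pointers */
            uint64_t length;    /* bytes of data */
            uint32_t alignment; /* offset into the first page */
        } pages;
        struct {
            struct sketch_bio *bio;
        } bio;
    } u;
};

/* Init helpers set the tag and the matching union member together. */
static void sketch_data_pages_init(struct sketch_osd_data *d, void **vec,
                                   uint64_t length, uint32_t alignment)
{
    d->type = SKETCH_DATA_PAGES;
    d->u.pages.vec = vec;
    d->u.pages.length = length;
    d->u.pages.alignment = alignment;
}

static void sketch_data_bio_init(struct sketch_osd_data *d,
                                 struct sketch_bio *bio)
{
    d->type = SKETCH_DATA_BIO;
    d->u.bio.bio = bio;
}

/* Readers check the tag before touching a union member. */
static uint64_t sketch_data_pages_length(const struct sketch_osd_data *d)
{
    assert(d->type == SKETCH_DATA_PAGES);
    return d->u.pages.length;
}
```

Keeping the tag and the union assignment inside one helper means no caller can set one without the other.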
    • A
      libceph: separate osd request data info · 2794a82a
      Alex Elder committed
      Pull the fields in an osd request structure that define the data for
      the request out into a separate structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2794a82a
    • A
      libceph: don't assign page info in ceph_osdc_new_request() · 153e5167
      Alex Elder committed
      Currently ceph_osdc_new_request() assigns an osd request's
      r_num_pages and r_alignment fields.  The only thing it does
      after that is call ceph_osdc_build_request(), and that doesn't
      need those fields to be assigned.
      
      Move the assignment of those fields out of ceph_osdc_new_request()
      and into its caller.  As a result, the page_align parameter is no
      longer used, so get rid of it.
      
      Note that in ceph_sync_write(), the value for req->r_num_pages had
      already been calculated earlier (as num_pages, and fortunately
      it was computed the same way).  So don't bother recomputing it,
      but because it's not needed earlier, move that calculation after the
      call to ceph_osdc_new_request().  Hold off making the assignment to
      r_alignment, doing it instead when r_pages and r_num_pages are
      getting set.
      
      Similarly, in start_read(), nr_pages already holds the number of
      pages in the array (and is calculated the same way), so there's no
      need to recompute it.  Move the assignment of the page alignment
      down with the others there as well.
      
      This and the next few patches are preparation work for:
          http://tracker.ceph.com/issues/4127
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      153e5167
    • A
      ceph: use calc_pages_for() in start_read() · cf7b7e14
      Alex Elder committed
      There's a spot that computes the number of pages to allocate for a
      page-aligned length by just shifting it.  Use calc_pages_for()
      instead, to be consistent with usage everywhere else.  The result
      is the same.
      
      The reason for this is to make it clearer in an upcoming patch that
      this calculation is duplicated.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      cf7b7e14