1. 16 Oct, 2016 1 commit
  2. 03 Oct, 2016 1 commit
    • ceph: ignore error from invalidate_inode_pages2_range() in direct write · 5d7eb1a3
      NeilBrown committed
      This call can fail if there are dirty pages.  The preceding call to
      filemap_write_and_wait_range() will normally remove dirty pages, but
      as inode_lock() is not held over calls to ceph_direct_read_write(), it
      could race with non-direct writes and pages could be dirtied
      immediately after filemap_write_and_wait_range() returns.
      
      If there are dirty pages, they will be removed by the subsequent call
      to truncate_inode_pages_range(), so having them here is not a problem.
      
      If the 'ret' value is left holding an error, then in the async IO case
      (aio_req is not NULL) the loop that would normally call
      ceph_osdc_start_request() will see the error in 'ret' and abort all
      requests.  This doesn't seem like correct behaviour.
      
      So use separate 'ret2' instead of overloading 'ret'.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
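The effect of keeping the invalidation result out of 'ret' can be sketched in userspace C. This is only an illustration of the pattern described in the commit message above, not the kernel code: `invalidate_pages_stub()`, `start_requests()` and the `overload_ret` switch are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in: fails with -EBUSY when dirty pages are present. */
static int invalidate_pages_stub(int dirty)
{
	return dirty ? -EBUSY : 0;
}

/*
 * Returns how many of 'nreq' requests get started.
 * With overload_ret != 0 the invalidation result is stored in 'ret'
 * (the old behaviour); otherwise it is kept in a separate 'ret2'.
 */
static int start_requests(int nreq, int dirty, int overload_ret,
			  int *cleanup_err)
{
	int ret = 0;
	int ret2;
	int started = 0;

	assert(nreq >= 0 && cleanup_err != NULL);

	ret2 = invalidate_pages_stub(dirty);
	if (overload_ret)
		ret = ret2;		/* the error leaks into 'ret' */
	*cleanup_err = ret2;		/* still visible to the caller */

	for (int i = 0; i < nreq; i++) {
		if (ret < 0)
			break;		/* an error in 'ret' aborts all requests */
		started++;
	}
	return started;
}
```

With `overload_ret` set and dirty pages present, no request is started; with the separate variable, all of them are started and the invalidation error remains available for inspection.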
  3. 28 Sep, 2016 1 commit
  4. 28 Jul, 2016 5 commits
  5. 06 Jul, 2016 1 commit
  6. 01 Jun, 2016 1 commit
  7. 31 May, 2016 1 commit
  8. 26 May, 2016 7 commits
    • ceph: renew caps for read/write if mds session got killed. · 77310320
      Yan, Zheng committed
      When an MDS session gets killed, read/write operations may hang.
      The client waits for Frw caps, but the MDS does not know what caps the
      client wants. To recover from this, the client sends an open request
      to the MDS. The request tells the MDS what caps the client wants.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e
      Ilya Dryomov committed
      If you specify ACK | ONDISK and set ->r_unsafe_callback, both
      ->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
      very confusing.  Redo this so that only one of them is called:
      
          ->r_unsafe_callback(true), on ack
          ->r_unsafe_callback(false), on commit
      
      or
      
          ->r_callback, on ack|commit
      
      Decode everything in decode_MOSDOpReply() to reduce clutter.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
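The "only one of them is called" rule from the commit message above can be sketched as a small dispatcher. This is a hedged userspace illustration, not the libceph implementation: `struct sketch_req`, `sketch_handle_reply()` and the counting callbacks are hypothetical, and the question of which reply (ack or commit) reaches a request that only has `r_callback` is left outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request with the two callback slots named in the message. */
struct sketch_req {
	void (*r_callback)(struct sketch_req *req);
	void (*r_unsafe_callback)(struct sketch_req *req, bool unsafe);
	int n_callback, n_unsafe_true, n_unsafe_false;	/* for observation */
};

enum sketch_reply { SKETCH_ACK, SKETCH_COMMIT };

/* Counting callbacks so the dispatch can be observed. */
static void sketch_count_callback(struct sketch_req *req)
{
	req->n_callback++;
}

static void sketch_count_unsafe(struct sketch_req *req, bool unsafe)
{
	if (unsafe)
		req->n_unsafe_true++;
	else
		req->n_unsafe_false++;
}

/*
 * Exactly one callback fires per reply: r_unsafe_callback if it is set
 * (true on ack, false on commit), otherwise r_callback.
 */
static void sketch_handle_reply(struct sketch_req *req, enum sketch_reply reply)
{
	assert(req->r_callback || req->r_unsafe_callback);

	if (req->r_unsafe_callback)
		req->r_unsafe_callback(req, reply == SKETCH_ACK);
	else
		req->r_callback(req);
}
```

A request that sets both slots sees only `r_unsafe_callback(true)` on ack and `r_unsafe_callback(false)` on commit in this sketch; `r_callback` is never invoked for it.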
    • libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov committed
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov committed
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all calls to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1
      Ilya Dryomov committed
      Introduce ceph_osd_request_target, containing all mapping-related
      fields of ceph_osd_request and calc_target() for calculating mappings
      and populating it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: variable-sized ceph_object_id · d30291b9
      Ilya Dryomov committed
      Currently ceph_object_id can hold object names of up to 100
      (CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
      except one - long rbd image names:
      
      - a format 1 header is named "<imgname>.rbd"
      - an object that points to a format 2 header is named "rbd_id.<imgname>"
      
      We operate on these potentially long-named objects during rbd map, and,
      for format 1 images, during header refresh.  (A format 2 header name is
      a small system-generated string.)
      
      Lift this 100 character limit by making ceph_object_id be able to point
      to an externally-allocated string.  Apart from being able to work with
      almost arbitrarily-long named objects, this allows us to reduce the
      size of ceph_object_id from >100 bytes to 64 bytes.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
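One way to let an object id "point to an externally-allocated string", as the commit message above puts it, is a small inline buffer plus a name pointer that refers either to that buffer or to heap memory. The following is a userspace sketch under that assumption; `struct sketch_oid`, the `sketch_oid_*()` helpers and the 52-byte inline size are illustrative choices, not taken from the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OID_INLINE_LEN 52	/* assumed inline capacity */

struct sketch_oid {
	char *name;	/* points at inline_name or at a heap buffer */
	char inline_name[SKETCH_OID_INLINE_LEN];
	int name_len;
};

static void sketch_oid_init(struct sketch_oid *oid)
{
	oid->name = oid->inline_name;
	oid->inline_name[0] = '\0';
	oid->name_len = 0;
}

static int sketch_oid_is_inline(const struct sketch_oid *oid)
{
	return oid->name == oid->inline_name;
}

/* Free any external buffer and return to the empty inline state. */
static void sketch_oid_destroy(struct sketch_oid *oid)
{
	if (!sketch_oid_is_inline(oid))
		free(oid->name);
	sketch_oid_init(oid);
}

/* Short names live inline; long names get an external allocation. */
static int sketch_oid_set_name(struct sketch_oid *oid, const char *s)
{
	size_t len = strlen(s);
	char *buf = NULL;

	if (len + 1 > SKETCH_OID_INLINE_LEN) {
		buf = malloc(len + 1);
		if (!buf)
			return -1;	/* old name is left untouched */
	}
	sketch_oid_destroy(oid);
	if (buf)
		oid->name = buf;
	memcpy(oid->name, s, len + 1);
	oid->name_len = (int)len;
	return 0;
}
```

Because `name` may point into the struct itself, a struct like this cannot simply be copied by value; a copy helper would have to re-point or duplicate the name.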
    • libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16
      Ilya Dryomov committed
      The size of ->r_request and ->r_reply messages depends on the size of
      the object name (ceph_object_id), while the size of ceph_osd_request is
      fixed.  Move message allocation into a separate function that would
      have to be called after ceph_object_id and ceph_object_locator (which
      is also going to become variable in size with RADOS namespaces) have
      been filled in:
      
          req = ceph_osdc_alloc_request(...);
          <fill in req->r_base_oid>
          <fill in req->r_base_oloc>
          ceph_osdc_alloc_messages(req);
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  9. 02 May, 2016 1 commit
  10. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 26 Mar, 2016 4 commits
    • ceph: use kmem_cache_zalloc · 99ec2697
      Geliang Tang committed
      Use kmem_cache_zalloc() instead of kmem_cache_alloc() with the __GFP_ZERO flag.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: fix security xattr deadlock · 315f2408
      Yan, Zheng committed
      When security is enabled, the security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and
      the corresponding caps, getxattr/setxattr needs to send a new request
      to the MDS and wait for the reply. This makes the MDS dispatch thread
      sleep, so nobody handles later MDS replies.

      The fix is to make sure the lookup/atomic_open reply includes xattrs
      and the corresponding caps, so that getxattr can be served from cached
      xattrs. This requires some modification to both the MDS and the request
      message. (The client tells the MDS what caps it wants; the MDS encodes
      the proper caps in the reply.)

      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us. So just make setxattr return an error when called by the MDS
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: replace CURRENT_TIME by current_fs_time() · 8bbd4714
      Deepa Dinamani committed
      CURRENT_TIME macro is not appropriate for filesystems as it
      doesn't use the right granularity for filesystem timestamps.
      Use current_fs_time() instead.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: remove useless BUG_ON · a587d71b
      Yan, Zheng committed
      ceph_osdc_start_request() never returns -EOLDSNAP.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  12. 05 Feb, 2016 2 commits
  13. 23 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
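The wrapper pattern described above, inode_foo(inode) standing for mutex_foo(&inode->i_mutex), can be illustrated with a toy, single-threaded lock. Everything here is a hypothetical stand-in (`struct toy_mutex`, `struct toy_inode` and the `toy_*` functions); the point is only that callers never name the lock field, so its type can change later without touching them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, non-concurrent lock used only to show the wrapper pattern. */
struct toy_mutex {
	bool locked;
};

static inline void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->locked);	/* a real mutex would block here instead */
	m->locked = true;
}

static inline void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->locked);
	m->locked = false;
}

static inline bool toy_mutex_trylock(struct toy_mutex *m)
{
	if (m->locked)
		return false;
	m->locked = true;
	return true;
}

static inline bool toy_mutex_is_locked(const struct toy_mutex *m)
{
	return m->locked;
}

struct toy_inode {
	struct toy_mutex i_mutex;	/* callers should not touch this directly */
};

/* Wrappers: toy_inode_foo(inode) is toy_mutex_foo(&inode->i_mutex). */
static inline void toy_inode_lock(struct toy_inode *inode)
{
	toy_mutex_lock(&inode->i_mutex);
}

static inline void toy_inode_unlock(struct toy_inode *inode)
{
	toy_mutex_unlock(&inode->i_mutex);
}

static inline bool toy_inode_trylock(struct toy_inode *inode)
{
	return toy_mutex_trylock(&inode->i_mutex);
}

static inline bool toy_inode_is_locked(const struct toy_inode *inode)
{
	return toy_mutex_is_locked(&inode->i_mutex);
}
```

If the lock field were later replaced by a different lock type, only the four wrapper bodies would need to change in this sketch.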
  14. 22 Jan, 2016 3 commits
    • ceph: use i_size_{read,write} to get/set i_size · 99c88e69
      Yan, Zheng committed
      A cap message from the MDS can update i_size, and in that case we
      do not hold i_mutex. So it is unsafe to access inode->i_size
      directly, even while holding i_mutex.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d
      Yan, Zheng committed
      When receiving -EOLDSNAP from an OSD, we need to re-send the
      corresponding write request. Due to a locking issue, we cannot send a
      new request inside another OSD request's completion callback. So we
      use a worker to re-send the request for AIO writes.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: Asynchronous IO support · c8fe9b17
      Yan, Zheng committed
      The basic idea of AIO support is simple: just call kiocb::ki_complete()
      in the OSD request's completion callback. But there are several special
      cases.

      When IO spans multiple objects, we need to wait until all OSD requests
      are complete, then call kiocb::ki_complete(). Error handling in this case
      is tricky too. For simplicity, AIO that both spans multiple objects and
      extends i_size is not allowed.

      Another special case is checking EOF for reads (another client can write
      to the file and extend i_size concurrently). For simplicity, the
      direct-IO/AIO code path does do the check, fallback to normal sync read
      instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
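The multi-object case described above (wait for every OSD request, then complete the kiocb once) amounts to counting outstanding requests. The sketch below is a single-threaded illustration under that assumption; `struct sketch_aio`, `sketch_osd_req_done()` and `sketch_record_completion()` are hypothetical names, and a kernel implementation would need atomic counters and proper locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one AIO spanning several OSD requests. */
struct sketch_aio {
	int pending;	/* OSD requests not yet completed */
	long total;	/* bytes transferred by successful requests */
	int error;	/* first error seen, or 0 */
	void (*ki_complete)(struct sketch_aio *aio, long res);
	long reported;	/* result handed to ki_complete (for observation) */
	int completions;	/* how many times ki_complete ran */
};

/* Example completion hook that just records what it was given. */
static void sketch_record_completion(struct sketch_aio *aio, long res)
{
	aio->reported = res;
	aio->completions++;
}

/*
 * Called once per finished OSD request with its result 'rc'
 * (bytes on success, negative errno on failure).  Only the last
 * request to finish triggers ki_complete().
 */
static void sketch_osd_req_done(struct sketch_aio *aio, long rc)
{
	assert(aio->pending > 0 && aio->ki_complete != NULL);

	if (rc < 0) {
		if (!aio->error)
			aio->error = (int)rc;	/* keep the first error */
	} else {
		aio->total += rc;
	}

	if (--aio->pending == 0)
		aio->ki_complete(aio, aio->error ? aio->error : aio->total);
}
```

In this sketch a failure in any sub-request makes the whole AIO report that error, which is one simple policy; the commit message only says that error handling in this case is tricky.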
  15. 03 Nov, 2015 1 commit
  16. 09 Sep, 2015 3 commits
  17. 25 Jun, 2015 5 commits
    • ceph: rework dcache readdir · fdd4e158
      Yan, Zheng committed
      Previously our dcache readdir code relied on the child dentries in
      the directory dentry's d_subdir list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already existed, our readdir code moved it to the head of the directory
      dentry's d_subdir list. This design relies on dcache internals.
      Al Viro suggested using ncpfs's approach: keeping an array of pointers
      to dentries in the page cache of the directory inode. The validity of
      those pointers is represented by the directory inode's complete and
      ordered flags. When a dentry gets pruned, we clear the directory
      inode's complete flag in the d_prune() callback. Before moving a dentry
      to another directory, we clear the ordered flag for both the old and
      the new directory.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5
      Yan, Zheng committed
      GFP_NOFS memory allocation is required for the page writeback path.
      But there is no need to use GFP_NOFS in the syscall path and the
      readpage path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • Y
      f66fd9f0
    • ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c
      Yan, Zheng committed
      In most cases where the snap context is needed, we are holding a
      reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's
      i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
      and make the code get the snap context from i_head_snapc. This makes
      the code simpler.

      Another benefit of this change is that we can handle snap
      notification more elegantly, especially when the snap context
      is updated while someone else is doing a write. The old queue
      cap_snap code may set cap_snap's context to either the old
      context or the new snap context, depending on whether i_head_snapc
      is set. The new queue cap_snap code always sets cap_snap's
      context to the old snap context.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • libceph: allow setting osd_req_op's flags · 144cba14
      Yan, Zheng committed
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
  18. 24 Jun, 2015 1 commit