  1. 04 May, 2017 2 commits
    • ceph: choose readdir frag based on previous readdir reply · b50c2de5
      Committed by Yan, Zheng
      The dirfragtree is lazily updated, so it is not always accurate. An
      infinite loop can happen in the following circumstance:

      - the client sends a request to read frag A
      - frag A has been fragmented into frags B and C, so the MDS fills the
        reply with the contents of frag B
      - the client wants to read the next frag, C, but
        ceph_choose_frag(frag value of C) returns frag A

      The fix is to use the previous readdir reply to calculate the next
      readdir frag when possible.
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
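The idea of deriving the next frag from the frag named in the previous reply, rather than from the possibly stale tree, can be sketched as follows. This is a minimal illustration, not the kernel code: the encoding (split depth in the top 8 bits, a left-aligned 24-bit value below it) and the helper names `frag_make`, `frag_next` and `frag_is_rightmost` are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed frag encoding for this sketch:
 * top 8 bits = number of split bits (0..24),
 * low 24 bits = left-aligned value prefix. */
static uint32_t frag_mask(uint32_t bits)
{
	return bits ? (0xffffffu << (24 - bits)) & 0xffffffu : 0;
}

static uint32_t frag_make(uint32_t bits, uint32_t value)
{
	return (bits << 24) | (value & frag_mask(bits));
}

static uint32_t frag_bits(uint32_t f)  { return f >> 24; }
static uint32_t frag_value(uint32_t f) { return f & 0xffffffu; }

/* True when no frag with the same depth follows this one. */
static int frag_is_rightmost(uint32_t f)
{
	return frag_value(f) == frag_mask(frag_bits(f));
}

/* Next sibling at the same depth, computed purely from the frag the
 * reply said it covered; no lookup in the local frag tree is needed. */
static uint32_t frag_next(uint32_t f)
{
	uint32_t bits = frag_bits(f);

	return frag_make(bits, frag_value(f) + (0x1000000u >> bits));
}
```

In the scenario above, if the reply says it covered frag B (depth 1, value 0), `frag_next()` yields frag C directly, so the client never falls back to the stale frag A.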
    • ceph: make seeky readdir more efficient · 79162547
      Committed by Yan, Zheng
      The current cephfs client uses a string to indicate the start position
      of readdir. The string is the last entry of the previous readdir reply.
      This approach does not work for seeky readdir because we cannot
      easily convert the new position to a string. For seeky readdir, the
      MDS needs to return dentries from the beginning, and the client keeps
      retrying if the reply does not contain the dentry it wants.

      In the current version of Ceph, the MDS sorts CDentry in its cache in
      hash order. The client also uses the dentry hash to compose the dir
      position. For seeky readdir, if the client passes the hash part of the
      dir position to the MDS, the MDS can avoid replying with useless
      dentries.
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
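A directory position composed from a dentry hash can be sketched roughly as below. This is an illustration under stated assumptions: the 28-bit offset width and the names `make_fpos`, `fpos_hash` and `fpos_off` are chosen for the example and are not taken from the commit.

```c
#include <stdint.h>

#define OFFSET_BITS 28                          /* assumed width */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

/* Pack the dentry hash above a small per-hash offset. Because the
 * hash occupies the high bits, positions sort in hash order. */
static uint64_t make_fpos(uint32_t hash, uint32_t off)
{
	return ((uint64_t)hash << OFFSET_BITS) | (off & OFFSET_MASK);
}

/* After a seek, the hash part can be recovered from the position
 * and handed to the server as the place to resume from. */
static uint32_t fpos_hash(uint64_t pos)
{
	return (uint32_t)(pos >> OFFSET_BITS);
}

static uint32_t fpos_off(uint64_t pos)
{
	return (uint32_t)(pos & OFFSET_MASK);
}
```

With a layout like this, an arbitrary seek target still carries enough information (the hash) for the server to skip entries that sort before it, which a name string cannot provide.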
  2. 20 Feb, 2017 4 commits
  3. 19 Jan, 2017 1 commit
  4. 08 Dec, 2016 1 commit
    • ceph: don't set req->r_locked_dir in ceph_d_revalidate · c3f4688a
      Committed by Jeff Layton
      This function sets req->r_locked_dir which is supposed to indicate to
      ceph_fill_trace that the parent's i_rwsem is locked for write.
      Unfortunately, there is no guarantee that the dir will be locked when
      d_revalidate is called, so we really don't want ceph_fill_trace to do
      any dcache manipulation from this context. Clear req->r_locked_dir since
      it's clearly not safe to do that.
      
      What we really want to know with d_revalidate is whether the dentry
      still points to the same inode. ceph_fill_trace installs a pointer to
      the inode in req->r_target_inode, so we can just compare that to
      d_inode(dentry) to see if it's the same one after the lookup.
      
      Also, since we aren't generally interested in the parent here, we can
      switch to using a GETATTR to hint that to the MDS, which also means that
      we only need to reserve one cap.
      
      Finally, just remove the d_unhashed check. That's really outside the
      purview of a filesystem's d_revalidate. If the thing became unhashed
      while we're checking it, then that's up to the VFS to handle anyway.
      
      Fixes: 200fd27c ("ceph: use lookup request to revalidate dentry")
      Link: http://tracker.ceph.com/issues/18041
      Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  5. 29 Oct, 2016 2 commits
  6. 08 Oct, 2016 1 commit
  7. 27 Sep, 2016 2 commits
    • fs: rename "rename2" i_op to "rename" · 2773bf00
      Committed by Miklos Szeredi
      Generated patch:
      
      sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
      sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fs: make remaining filesystems use .rename2 · 1cd66c93
      Committed by Miklos Szeredi
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not trivial, like for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
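The three conversion steps listed in this commit can be sketched in plain C as follows. This is a userspace illustration only: `struct foo_inode_ops`, `foo_rename` and `FOO_RENAME_NOREPLACE` are hypothetical stand-ins, and the real kernel callback takes inode and dentry arguments rather than strings.

```c
#include <errno.h>

#define FOO_RENAME_NOREPLACE (1u << 0)  /* hypothetical flag for the sketch */

/* Hypothetical operations table with a flags-aware rename slot. */
struct foo_inode_ops {
	int (*rename2)(const char *old_name, const char *new_name,
		       unsigned int flags);
};

/* Step 1: the rename function gains a flags argument. */
static int foo_rename(const char *old_name, const char *new_name,
		      unsigned int flags)
{
	(void)old_name;
	(void)new_name;

	/* Step 2: no flags are supported, so reject anything non-zero. */
	if (flags)
		return -EINVAL;

	/* ... the filesystem's existing rename logic would run here ... */
	return 0;
}

/* Step 3: the function is assigned to .rename2 instead of .rename. */
static const struct foo_inode_ops foo_dir_ops = {
	.rename2 = foo_rename,
};
```

Rejecting unknown flags up front keeps the behaviour honest: a caller asking for no-replace semantics gets an error instead of a rename that silently ignores the request.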
  8. 05 Sep, 2016 1 commit
    • ceph: do not modify fi->frag in need_reset_readdir() · 0f5aa88a
      Committed by Nicolas Iooss
      Commit f3c4ebe6 ("ceph: using hash value to compose dentry offset")
      modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
      fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
      comparison operator with an assignment one.
      
      This looks like a typo which is reported by clang when building the
      kernel with some warning flags:
      
          fs/ceph/dir.c:600:22: error: using the result of an assignment as a
          condition without parentheses [-Werror,-Wparentheses]
                  } else if (fi->frag |= fpos_frag(new_pos)) {
                             ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
          fs/ceph/dir.c:600:22: note: place parentheses around the assignment
          to silence this warning
                  } else if (fi->frag |= fpos_frag(new_pos)) {
                                      ^
                             (                             )
          fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
          assignment into an inequality comparison
                  } else if (fi->frag |= fpos_frag(new_pos)) {
                                      ^~
                                      !=
      
      Fixes: f3c4ebe6 ("ceph: using hash value to compose dentry offset")
      Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
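The difference between the two conditions can be shown with a small standalone sketch. The struct and function names here (`file_info`, `frag_changed_buggy`, `frag_changed_fixed`) are hypothetical and only mirror the shape of the expressions quoted in the commit message.

```c
/* Hypothetical stand-in for the per-file readdir state. */
struct file_info {
	unsigned int frag;
};

/* Buggy form: "|=" ORs new_frag into fi->frag and then tests the
 * result, so the stored frag is modified as a side effect and the
 * condition is true whenever any bit ends up set. */
static int frag_changed_buggy(struct file_info *fi, unsigned int new_frag)
{
	return (fi->frag |= new_frag) != 0;
}

/* Intended form: a pure comparison that leaves fi->frag untouched. */
static int frag_changed_fixed(const struct file_info *fi,
			      unsigned int new_frag)
{
	return new_frag != fi->frag;
}
```

With `fi->frag == 1` and `new_frag == 1`, the buggy form still reports a change, and with `new_frag == 2` it also corrupts the stored value to 3; the fixed form does neither.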
  9. 28 Jul, 2016 4 commits
  10. 26 May, 2016 9 commits
  11. 24 Apr, 2016 1 commit
    • ceph: Switch to generic xattr handlers · 2cdeb1e4
      Committed by Andreas Gruenbacher
      Add a catch-all xattr handler at the end of ceph_xattr_handlers.  Check
      for valid attribute names there, and remove those checks from
      __ceph_{get,set,remove}xattr instead.  No "system.*" xattrs need to be
      handled by the catch-all handler anymore.
      
      The set xattr handler is called with a NULL value to indicate that the
      attribute should be removed; __ceph_setxattr already handles that case
      correctly (ceph_set_acl could already be calling __ceph_setxattr with a
      NULL value).
      
      Move the check for snapshots from ceph_{set,remove}xattr into
      __ceph_{set,remove}xattr.  With that, ceph_{get,set,remove}xattr can be
      replaced with the generic iops.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 26 Mar, 2016 4 commits
    • ceph: use kmem_cache_zalloc · 99ec2697
      Committed by Geliang Tang
      Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: use lookup request to revalidate dentry · 200fd27c
      Committed by Yan, Zheng
      If a dentry has no lease, ceph_d_revalidate() previously returned 0.
      This causes the VFS to invalidate the dentry and create a new dentry
      for later lookups. Invalidating a dentry also detaches any mount
      points underneath it, so a mount point inside cephfs can disappear
      mysteriously (even if the mount point is not modified by other hosts).

      The fix is to use a lookup request to revalidate a dentry without a
      lease. This partly solves the disappearing mount point issue (as long
      as the mount point is not modified by other hosts).
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: kill ceph_get_dentry_parent_inode() · 641235d8
      Committed by Yan, Zheng
      Use the VFS helper dget_parent() instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix security xattr deadlock · 315f2408
      Committed by Yan, Zheng
      When security is enabled, a security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and
      the corresponding caps, getxattr/setxattr needs to send a new request
      to the MDS and wait for the reply. This puts the MDS dispatch thread
      to sleep, so nobody handles later MDS replies.

      The fix is to make sure lookup/atomic_open replies include xattrs and
      the corresponding caps, so that getxattr can be served from cached
      xattrs. This requires some modification to both the MDS and the
      request message. (The client tells the MDS what caps it wants; the MDS
      encodes the proper caps in the reply.)

      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us. So just make setxattr return an error when called by the MDS
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  14. 23 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Committed by Al Viro
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. 25 Jun, 2015 5 commits
    • ceph: rework dcache readdir · fdd4e158
      Committed by Yan, Zheng
      Previously our dcache readdir code relied on the child dentries in the
      directory dentry's d_subdir list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already exists, our readdir code moves it to the head of the directory
      dentry's d_subdir list. This design relies on dcache internals.
      Al Viro suggests using ncpfs's approach: keeping an array of pointers
      to dentries in the page cache of the directory inode. The validity of
      those pointers is represented by the directory inode's complete and
      ordered flags. When a dentry gets pruned, we clear the directory
      inode's complete flag in the d_prune() callback. Before moving a dentry
      to another directory, we clear the ordered flag for both the old and
      the new directory.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5
      Committed by Yan, Zheng
      GFP_NOFS memory allocation is required for the page writeback path,
      but there is no need to use GFP_NOFS in the syscall path or the
      readpage path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix directory fsync · da819c81
      Committed by Yan, Zheng
      fsync() on a directory should flush dirty caps and wait for any
      uncommitted directory operations to commit. But ceph_dir_fsync()
      only waits for uncommitted directory operations.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: simplify two mount_timeout sites · 5be73034
      Committed by Ilya Dryomov
      No need to bifurcate wait now that we've got ceph_timeout_jiffies().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
    • libceph: store timeouts in jiffies, verify user input · a319bf56
      Committed by Ilya Dryomov
      There are currently three libceph-level timeouts that the user can
      specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
      these are in seconds and no checking is done on user input: negative
      values are accepted, we multiply them all by HZ which may or may not
      overflow, arbitrarily large jiffies then get added together, etc.
      
      There is also a bug in the way mount_timeout=0 is handled.  It's
      supposed to mean "infinite timeout", but that's not how wait.h APIs
      treat it and so __ceph_open_session() for example will busy loop
      without much chance of being interrupted if none of ceph-mons are
      there.
      
      Fix all this by verifying user input, storing timeouts capped by
      msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
      helper for all user-specified waits to handle infinite timeouts
      correctly.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
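The approach described in this commit (validate the user's value, store it in ticks, and map zero to "wait forever") can be sketched in userspace C as below. Everything here is an assumption made for illustration: `SKETCH_HZ`, `SKETCH_MAX_TIMEOUT`, the upper bound of `INT_MAX / 1000` seconds, and the function names are not taken from the commit itself.

```c
#include <limits.h>

#define SKETCH_HZ          250UL                     /* assumed tick rate */
#define SKETCH_MAX_TIMEOUT ((unsigned long)LONG_MAX) /* "wait forever" */

/* Validate a user-supplied timeout in seconds and store it in ticks.
 * Negative values are rejected, and the upper bound keeps the
 * multiplication from overflowing. Returns 0 on success, -1 on error. */
static int parse_timeout_secs(long secs, unsigned long *ticks_out)
{
	if (secs < 0 || secs > INT_MAX / 1000)
		return -1;

	*ticks_out = (unsigned long)secs * SKETCH_HZ;
	return 0;
}

/* Translate a stored timeout into the value passed to a wait call:
 * zero means "infinite", so it is mapped to the maximum wait. */
static unsigned long timeout_ticks(unsigned long stored)
{
	return stored ? stored : SKETCH_MAX_TIMEOUT;
}
```

Mapping zero to the maximum wait at the point of use avoids the busy loop the commit describes, where a literal zero timeout makes each wait return immediately.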
  16. 24 Jun, 2015 1 commit