1. 26 Mar, 2016 3 commits
    • Y
      ceph: fix security xattr deadlock · 315f2408
      Committed by Yan, Zheng
      When security is enabled, the security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and
      the corresponding caps, getxattr/setxattr needs to send a new request
      to the MDS and wait for the reply. This puts the MDS dispatch thread
      to sleep, so nobody handles later MDS replies.
      
      The fix is to make sure the lookup/atomic_open reply includes xattrs
      and the corresponding caps, so getxattr can be served from cached xattrs.
      This requires some modification to both the MDS and the request message.
      (The client tells the MDS what caps it wants; the MDS encodes the proper
      caps in the reply.)
      
      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us. So just make setxattr return an error when called by the MDS
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      315f2408
    • D
      ceph: replace CURRENT_TIME by current_fs_time() · 8bbd4714
      Committed by Deepa Dinamani
      CURRENT_TIME macro is not appropriate for filesystems as it
      doesn't use the right granularity for filesystem timestamps.
      Use current_fs_time() instead.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      8bbd4714
    • Y
      ceph: remove useless BUG_ON · a587d71b
      Committed by Yan, Zheng
      ceph_osdc_start_request() never returns -EOLDSNAP.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      a587d71b
  2. 05 Feb, 2016 2 commits
  3. 23 Jan, 2016 1 commit
    • A
      wrappers for ->i_mutex access · 5955102c
      Committed by Al Viro
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5955102c
  4. 22 Jan, 2016 3 commits
    • Y
      ceph: use i_size_{read,write} to get/set i_size · 99c88e69
      Committed by Yan, Zheng
      A cap message from the MDS can update i_size, and in that case we
      don't hold i_mutex. So directly accessing inode->i_size is unsafe
      even while holding i_mutex.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      99c88e69
    • Y
      ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d
      Committed by Yan, Zheng
      When receiving -EOLDSNAP from the OSD, we need to re-send the
      corresponding write request. Due to a locking issue, we can't send a
      new request inside another OSD request's completion callback, so we
      use a worker to re-send the request for AIO writes.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      5be0389d
    • Y
      ceph: Asynchronous IO support · c8fe9b17
      Committed by Yan, Zheng
      The basic idea of AIO support is simple: just call kiocb::ki_complete()
      in the OSD request's completion callback. But there are several special
      cases.
      
      When an IO spans multiple objects, we need to wait until all OSD requests
      are complete, then call kiocb::ki_complete(). Error handling in this case
      is tricky too. For simplicity, AIO that both spans multiple objects and
      extends i_size is not allowed.
      
      Another special case is checking EOF for reads (another client can write
      to the file and extend i_size concurrently). For simplicity, the
      direct-IO/AIO code path does not do the check itself and falls back to a
      normal synchronous read instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      c8fe9b17
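The "wait until all OSD requests are complete, then call kiocb::ki_complete()" rule in the commit above can be pictured with a pending counter. The following is a userspace sketch only, not the kernel code: `struct aio_request`, `aio_request_init()`, `aio_subreq_done()` and the `demo_*` names are hypothetical, and the real implementation may track state differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for per-AIO bookkeeping; not the kernel's structure. */
struct aio_request {
    atomic_int pending_reqs;   /* sub-requests still in flight */
    atomic_long total_len;     /* bytes transferred by successful sub-requests */
    atomic_int error;          /* first error seen, 0 if none */
    void (*ki_complete)(struct aio_request *req, long result);
};

/* Demo completion hook: records how often it fired and with what result. */
static int demo_calls;
static long demo_result;

static inline void demo_complete(struct aio_request *req, long result)
{
    (void)req;
    demo_calls++;
    demo_result = result;
}

static inline void aio_request_init(struct aio_request *req, int nr_subreqs,
                                    void (*done)(struct aio_request *, long))
{
    assert(nr_subreqs > 0);
    atomic_init(&req->pending_reqs, nr_subreqs);
    atomic_init(&req->total_len, 0);
    atomic_init(&req->error, 0);
    req->ki_complete = done;
}

/* Called once from each sub-request's completion callback. */
static inline void aio_subreq_done(struct aio_request *req, long len, int err)
{
    if (err) {
        int expected = 0;
        /* Keep only the first error; later ones are dropped. */
        atomic_compare_exchange_strong(&req->error, &expected, err);
    } else {
        atomic_fetch_add(&req->total_len, len);
    }

    /* atomic_fetch_sub returns the old value: 1 means this was the last one. */
    if (atomic_fetch_sub(&req->pending_reqs, 1) != 1)
        return;

    int e = atomic_load(&req->error);
    req->ki_complete(req, e ? (long)e : atomic_load(&req->total_len));
}
```

With three sub-requests, the completion hook runs exactly once, after the third call to `aio_subreq_done()`. Reporting the first error instead of a partial length is a simplification chosen for this sketch.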
  5. 03 Nov, 2015 1 commit
  6. 09 Sep, 2015 3 commits
  7. 25 Jun, 2015 5 commits
    • Y
      ceph: rework dcache readdir · fdd4e158
      Committed by Yan, Zheng
      Previously, our dcache readdir code relied on the child dentries in the
      directory dentry's d_subdir list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already existed, our readdir code moved it to the head of the directory
      dentry's d_subdir list. This design relies on dcache internals.
      Al Viro suggested using ncpfs's approach: keeping an array of pointers
      to dentries in the page cache of the directory inode. The validity of
      those pointers is represented by the directory inode's complete and
      ordered flags. When a dentry gets pruned, we clear the directory inode's
      complete flag in the d_prune() callback. Before moving a dentry to
      another directory, we clear the ordered flag for both the old and new
      directories.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      fdd4e158
    • Y
      ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5
      Committed by Yan, Zheng
      GFP_NOFS memory allocation is required for the page writeback path,
      but there is no need to use GFP_NOFS in the syscall path or the
      readpage path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      687265e5
    • Y
      f66fd9f0
    • Y
      ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c
      Committed by Yan, Zheng
      In most cases where the snap context is needed, we are holding a
      reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's
      i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
      and make the code get the snap context from i_head_snapc. This makes
      the code simpler.
      
      Another benefit of this change is that we can handle snap
      notifications more elegantly, especially when the snap context
      is updated while someone else is doing a write. The old queue
      cap_snap code may set cap_snap's context to either the old
      context or the new snap context, depending on whether i_head_snapc
      is set. The new queue cap_snap code always sets cap_snap's
      context to the old snap context.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      5dda377c
    • Y
      libceph: allow setting osd_req_op's flags · 144cba14
      Committed by Yan, Zheng
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      144cba14
  8. 24 Jun, 2015 1 commit
  9. 16 Apr, 2015 1 commit
  10. 12 Apr, 2015 4 commits
  11. 26 Mar, 2015 1 commit
  12. 13 Mar, 2015 1 commit
  13. 23 Feb, 2015 1 commit
    • D
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      Committed by David Howells
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      # $fd must be declared with "my" for the script to compile under "use strict".
      open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  14. 19 Feb, 2015 3 commits
    • Y
      ceph: fix atomic_open snapdir · bf91c315
      Committed by Yan, Zheng
      ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
      and creates the snapdir inode if it's -ENOENT.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      bf91c315
    • Y
      ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a
      Committed by Yan, Zheng
      When an inode has inline data but its size > PAGE_SIZE (it was truncated
      to a larger size), the previous direct read code returned -EIO. This patch
      adds code to return zeros for data whose offset > PAGE_SIZE.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      fcc02d2a
    • Y
      ceph: properly zero data pages for file holes. · 1487a688
      Committed by Yan, Zheng
      A bug was found in striped_read() of fs/ceph/file.c. striped_read() calls
      ceph_zero_page_vector_range(). The first argument, page_align + read + ret,
      passed to ceph_zero_page_vector_range() is wrong.
      
      When a file has holes, this wrong parameter may cause memory corruption
      in either kernel space or user space. Kernel space memory may be corrupted in
      the case of non-direct IO; user space memory may be corrupted in the case of
      direct IO. In the latter case, the application doing direct IO may crash due
      to memory corruption, as we have experienced.
      
      The correct value should be initial_align + read + ret, where initial_align =
      o_direct ? buf_align : io_align. Compared with page_align, the current page
      offset, initial_align is the initial page offset, which should be used to
      calculate the page and offset in ceph_zero_page_vector_range().
      Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      1487a688
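To see why the first argument matters in the commit above, it may help to look at how a byte offset into a page vector is turned into a page index plus an in-page offset. The following is a userspace model, not the kernel function: `sketch_zero_page_vector_range()` and `SKETCH_PAGE_SIZE` are hypothetical names, and pages are modelled as plain byte arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Zero `len` bytes starting `off` bytes into a vector of pages.
 * `off` is measured from the start of pages[0], so the caller must build it
 * from the alignment of the first page in the vector (initial_align in the
 * commit message), not from the alignment of the page currently being read.
 */
static inline void sketch_zero_page_vector_range(size_t off, size_t len,
                                                 unsigned char **pages)
{
    assert(pages != NULL);

    size_t i = off / SKETCH_PAGE_SIZE;        /* page that holds `off` */
    size_t in_page = off % SKETCH_PAGE_SIZE;  /* offset inside that page */

    while (len > 0) {
        size_t chunk = SKETCH_PAGE_SIZE - in_page;
        if (chunk > len)
            chunk = len;
        memset(pages[i] + in_page, 0, chunk);
        len -= chunk;
        in_page = 0;   /* later pages are zeroed from their start */
        i++;
    }
}
```

If the caller passes an offset built from the wrong alignment, both the page index and the in-page offset shift, so the zeroes land on bytes outside the intended hole. That is consistent with the memory corruption the commit message describes.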
  15. 21 Jan, 2015 1 commit
  16. 18 Dec, 2014 4 commits
  17. 20 Nov, 2014 2 commits
  18. 15 Oct, 2014 3 commits