1. 20 7月, 2018 25 次提交
    • V
      ovl: Check redirect on index as well · 0a2d0d3f
      Vivek Goyal 提交于
      Right now we seem to check redirect only if upperdentry is found.  But it
      is possible that there is no upperdentry but later we found an index.
      
      We need to check redirect on index as well and set it in
      ovl_inode->redirect.  Otherwise link code can assume that dentry does not
      have redirect and place a new one which breaks things.  In my testing
      overlay/033 test started failing in xfstests.  Following are the details.
      
      For example do following.
      
      $ mkdir lower upper work merged
      
       - Make lower dir with 4 links.
        $ echo "foo" > lower/l0.txt
        $ ln  lower/l0.txt lower/l1.txt
        $ ln  lower/l0.txt lower/l2.txt
        $ ln  lower/l0.txt lower/l3.txt
      
       - Mount with index on and metacopy on.
      
        $ mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,\
                              index=on,metacopy=on none merged
      
       - Link lower
      
        $ ln merged/l0.txt merged/l4.txt
          (This will metadata copy up of l0.txt and put an absolute redirect
           /l0.txt)
      
        $ echo 2 > /proc/sys/vm/drop/caches
      
        $ ls merged/l1.txt
        (Now l1.txt will be looked up.  There is no upper dentry but there is
         lower dentry and index will be found.  We don't check for redirect on
         index, hence ovl_inode->redirect will be NULL.)
      
       - Link Upper
      
        $ ln merged/l4.txt merged/l5.txt
        (Lookup of l4.txt will use inode from l1.txt lookup which is still in
         cache.  It has ovl_inode->redirect NULL, hence link will put a new
         redirect and replace /l0.txt with /l4.txt
      
       - Drop caches.
        echo 2 > /proc/sys/vm/drop_caches
      
       - List l1.txt and it returns -ESTALE
      
        $ ls merged/l0.txt
      
        (It returns stale because, we found a metacopy of l0.txt in upper and it
         has redirect l4.txt but there is no file named l4.txt in lower layer.
         So lower data copy is not found and -ESTALE is returned.)
      
      So problem here is that we did not process redirect on index.  Check
      redirect on index as well and then problem is fixed.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      0a2d0d3f
    • V
      ovl: Set redirect on upper inode when it is linked · 4120fe64
      Vivek Goyal 提交于
      When we create a hardlink to a metacopy upper file, first the redirect on
      that inode.  Path based lookup will not work with newly created link and
      redirect will solve that issue.
      
      Also use absolute redirect as two hardlinks could be in different
      directores and relative redirect will not work.
      
      I have not put any additional locking around setting redirects while
      introducing redirects for non-dir files.  For now it feels like existing
      locking is sufficient.  If that's not the case, we will have add more
      locking.  Following is my rationale about why do I think current locking
      seems ok.
      
      Basic problem for non-dir files is that more than on dentry could be
      pointing to same inode and in theory only relying on dentry based locks
      (d->d_lock) did not seem sufficient.
      
      We set redirect upon rename and upon link creation.  In both the paths for
      non-dir file, VFS locks both source and target inodes (->i_rwsem).  That
      means vfs rename and link operations on same source and target can't he
      happening in parallel (Even if there are multiple dentries pointing to same
      inode).  So that probably means that at a time on an inode, only one call
      of ovl_set_redirect() could be working and we don't need additional locking
      in ovl_set_redirect().
      
      ovl_inode->redirect is initialized only when inode is created new.  That
      means it should not race with any other path and setting
      ovl_inode->redirect should be fine.
      
      Reading of ovl_inode->redirect happens in ovl_get_redirect() path.  And
      this called only in ovl_set_redirect().  And ovl_set_redirect() already
      seemed to be protected using ->i_rwsem.  That means ovl_set_redirect() and
      ovl_get_redirect() on source/target inode should not make progress in
      parallel and is mutually exclusive.  Hence no additional locking required.
      
      Now, only case where ovl_set_redirect() and ovl_get_redirect() could race
      seems to be case of absolute redirects where ovl_get_redirect() has to
      travel up the tree.  In that case we already take d->d_lock and that should
      be sufficient as directories will not have multiple dentries pointing to
      same inode.
      
      So given VFS locking and current usage of redirect, current locking around
      redirect seems to be ok for non-dir as well.  Once we have the logic to
      remove redirect when metacopy file gets copied up, then we probably will
      need additional locking.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      4120fe64
    • V
      ovl: Set redirect on metacopy files upon rename · 7bb08383
      Vivek Goyal 提交于
      Set redirect on metacopy files upon rename.  This will help find data
      dentry in lower dirs.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      7bb08383
    • V
      ovl: Do not set dentry type ORIGIN for broken hardlinks · 60124877
      Vivek Goyal 提交于
      If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN.  So far this
      decision was easy that we had to check only for oe->numlower and if it is
      non-zero, we knew there is copy up origin.  (For non-dir we installed
      origin dentry in lowerstack[0]).
      
      But we don't create ORGIN xattr for broken hardlinks (index=off).  And with
      metacopy feature it is possible that we will install lowerstack[0] but
      ORIGIN xattr is not there.  It is data dentry of upper metacopy dentry
      which has been found using regular name based lookup or using REDIRECT.  So
      with addition of this new case, just presence of oe->numlower is not
      sufficient to guarantee that ORIGIN xattr is present.
      
      So to differentiate between two cases, look at OVL_CONST_INO flag.  If this
      flag is set and upperdentry is there, that means it can be marked as type
      ORIGIN.  OVL_CONST_INO is not set if lower hardlink is broken or will be
      broken over copy up.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      60124877
    • V
      ovl: Add an inode flag OVL_CONST_INO · a00c2d59
      Vivek Goyal 提交于
      Add an ovl_inode flag OVL_CONST_INO.  This flag signifies if inode number
      will remain constant over copy up or not.  This flag does not get updated
      over copy up and remains unmodifed after setting once.
      
      Next patch in the series will make use of this flag.  It will basically
      figure out if dentry is of type ORIGIN or not.  And this can be derived by
      this flag.
      
      ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)).
      Suggested-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a00c2d59
    • V
      ovl: Treat metacopy dentries as type OVL_PATH_MERGE · 0b17c28a
      Vivek Goyal 提交于
      Right now OVL_PATH_MERGE is used only for merged directories.  But
      conceptually, a metacopy dentry (backed by a lower data dentry) is a merged
      entity as well.
      
      So mark metacopy dentries as OVL_PATH_MERGE and ovl_rename() makes use of
      this property later to set redirect on a metacopy file.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      0b17c28a
    • V
      ovl: Check redirects for metacopy files · b8a8824c
      Vivek Goyal 提交于
      Right now we rely on path based lookup for data origin of metacopy upper.
      This will work only if upper has not been renamed.  We solved this problem
      already for merged directories using redirect.  Use same logic for metacopy
      files.
      
      This patch just goes on to check redirects for metacopy files.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      b8a8824c
    • V
      ovl: Move some dir related ovl_lookup_single() code in else block · 0618a816
      Vivek Goyal 提交于
      Move some directory related code in else block.  This is pure code
      reorganization and no functionality change.
      
      Next patch enables redirect processing on metacopy files and needs this
      change.  By keeping non-functional changes in a separate patch, next patch
      looks much smaller and cleaner.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      0618a816
    • V
      ovl: Do not expose metacopy only dentry from d_real() · 2c3d7358
      Vivek Goyal 提交于
      Metacopy dentry/inode is internal to overlay and is never exposed outside
      of it.  Exception is metacopy upper file used for fsync().  Modify d_real()
      to look for dentries/inode which have data, but also allow matching upper
      inode without data for the fsync case.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2c3d7358
    • V
      ovl: Open file with data except for the case of fsync · 8c444d2a
      Vivek Goyal 提交于
      ovl_open() should open file which contains data and not open metacopy
      inode.  With the introduction of metacopy inodes, with current
      implementaion we will end up opening metacopy inode as well.
      
      But there can be certain circumstances like ovl_fsync() where we want to
      allow opening a metacopy inode instead.
      
      Hence, change ovl_open_realfile() and and add extra parameter which
      specifies whether to allow opening metacopy inode or not.  If this
      parameter is false, we look for data inode and open that.
      
      This should allow covering both the cases.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      8c444d2a
    • V
      ovl: Add helper ovl_inode_realdata() · 4823d49c
      Vivek Goyal 提交于
      Add an helper to retrieve real data inode associated with overlay inode.
      This helper will ignore all metacopy inodes and will return only the real
      inode which has data.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      4823d49c
    • V
      ovl: Store lower data inode in ovl_inode · 2664bd08
      Vivek Goyal 提交于
      Right now ovl_inode stores inode pointer for lower inode.  This helps with
      quickly getting lower inode given overlay inode (ovl_inode_lower()).
      
      Now with metadata only copy-up, we can have metacopy inode in middle layer
      as well and inode containing data can be different from ->lower.  I need to
      be able to open the real file in ovl_open_realfile() and for that I need to
      quickly find the lower data inode.
      
      Hence store lower data inode also in ovl_inode.  Also provide an helper
      ovl_inode_lowerdata() to access this field.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2664bd08
    • V
      ovl: Fix ovl_getattr() to get number of blocks from lower · 67d756c2
      Vivek Goyal 提交于
      If an inode has been copied up metadata only, then we need to query the
      number of blocks from lower and fill up the stat->st_blocks.
      
      We need to be careful about races where we are doing stat on one cpu and
      data copy up is taking place on other cpu.  We want to return
      stat->st_blocks either from lower or stable upper and not something in
      between.  Hence, ovl_has_upperdata() is called first to figure out whether
      block reporting will take place from lower or upper.
      
      We now support metacopy dentries in middle layer.  That means number of
      blocks reporting needs to come from lowest data dentry and this could be
      different from lower dentry.  Hence we end up making a separate
      vfs_getxattr() call for metacopy dentries to get number of blocks.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      67d756c2
    • V
      ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry · 647d253f
      Vivek Goyal 提交于
      Now we have the notion of data dentry and metacopy dentry.
      ovl_dentry_lower() will return uppermost lower dentry, but it could be
      either data or metacopy dentry.  Now we support metacopy dentries in lower
      layers so it is possible that lowerstack[0] is metacopy dentry while
      lowerstack[1] is actual data dentry.
      
      So add an helper which returns lowest most dentry which is supposed to be
      data dentry.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      647d253f
    • V
      ovl: Copy up meta inode data from lowest data inode · 4f93b426
      Vivek Goyal 提交于
      So far lower could not be a meta inode.  So whenever it was time to copy up
      data of a meta inode, we could copy it up from top most lower dentry.
      
      But now lower itself can be a metacopy inode.  That means data copy up
      needs to take place from a data inode in metacopy inode chain.  Find lower
      data inode in the chain and use that for data copy up.
      
      Introduced a helper called ovl_path_lowerdata() to find the lower data
      inode chain.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      4f93b426
    • V
      ovl: Modify ovl_lookup() and friends to lookup metacopy dentry · 9d3dfea3
      Vivek Goyal 提交于
      This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
      It also allows for presence of metacopy dentries in lower layer.
      
      During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
      set OVL_UPPERDATA bit in flags.
      
      We don't support metacopy feature with nfs_export.  So in nfs_export code,
      we set OVL_UPPERDATA flag set unconditionally if upper inode exists.
      
      Do not follow metacopy origin if we find a metacopy only inode and metacopy
      feature is not enabled for that mount.  Like redirect, this can have
      security implications where an attacker could hand craft upper and try to
      gain access to file on lower which it should not have to begin with.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9d3dfea3
    • V
      ovl: Use out_err instead of out_nomem · 027065b7
      Vivek Goyal 提交于
      Right now we use goto out_nomem which assumes error code is -ENOMEM.  But
      there are other errors returned like -ESTALE as well.  So instead of
      out_nomem, use out_err which will do ERR_PTR(err).  That way one can put
      error code in err and jump to out_err.
      
      This just code reorganization and no change of functionality.
      
      I am about to add more code and this organization helps laying more code
      and error paths on top of it.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      027065b7
    • V
      ovl: A new xattr OVL_XATTR_METACOPY for file on upper · 0c288874
      Vivek Goyal 提交于
      Now we will have the capability to have upper inodes which might be only
      metadata copy up and data is still on lower inode.  So add a new xattr
      OVL_XATTR_METACOPY to distinguish between two cases.
      
      Presence of OVL_XATTR_METACOPY reflects that file has been copied up
      metadata only and and data will be copied up later from lower origin.  So
      this xattr is set when a metadata copy takes place and cleared when data
      copy takes place.
      
      We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
      whether ovl inode has data or not (as opposed to metadata only copy up).
      
      If a file is copied up metadata only and later when same file is opened for
      WRITE, then data copy up takes place.  We copy up data, remove METACOPY
      xattr and then set the UPPERDATA flag in ovl_inode->flags.  While all these
      operations happen with oi->lock held, read side of oi->flags can be
      lockless.  That is another thread on another cpu can check if UPPERDATA
      flag is set or not.
      
      So this gives us an ordering requirement w.r.t UPPERDATA flag.  That is, if
      another cpu sees UPPERDATA flag set, then it should be guaranteed that
      effects of data copy up and remove xattr operations are also visible.
      
      For example.
      
      	CPU1				CPU2
      ovl_open()				acquire(oi->lock)
       ovl_open_maybe_copy_up()                ovl_copy_up_data()
        open_open_need_copy_up()		 vfs_removexattr()
         ovl_already_copied_up()
          ovl_dentry_needs_data_copy_up()	 ovl_set_flag(OVL_UPPERDATA)
           ovl_test_flag(OVL_UPPERDATA)       release(oi->lock)
      
      Say CPU2 is copying up data and in the end sets UPPERDATA flag.  But if
      CPU1 perceives the effects of setting UPPERDATA flag but not the effects of
      preceding operations (ex. upper that is not fully copied up), it will be a
      problem.
      
      Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
      and smp_rmb() on UPPERDATA flag test operation.
      
      May be some other lock or barrier is already covering it. But I am not sure
      what that is and is it obvious enough that we will not break it in future.
      
      So hence trying to be safe here and introducing barriers explicitly for
      UPPERDATA flag/bit.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      0c288874
    • V
      ovl: Add helper ovl_already_copied_up() · 2002df85
      Vivek Goyal 提交于
      There are couple of places where we need to know if file is already copied
      up (in lockless manner).  Right now its open coded and there are only two
      conditions to check.  Soon this patch series will introduce another
      condition to check and Amir wants to introduce one more.  So introduce a
      helper instead to check this so that code is easier to read.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2002df85
    • V
      ovl: Copy up only metadata during copy up where it makes sense · 44d5bf10
      Vivek Goyal 提交于
      If it makes sense to copy up only metadata during copy up, do it.  This is
      done for regular files which are not opened for WRITE.
      
      Right now ->metacopy is set to 0 always.  Last patch in the series will
      remove the hard coded statement and enable metacopy feature.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      44d5bf10
    • V
      ovl: During copy up, first copy up metadata and then data · bd64e575
      Vivek Goyal 提交于
      Just a little re-ordering of code.  This helps with next patch where after
      copying up metadata, we skip data copying step, if needed.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      bd64e575
    • V
      ovl: Provide a mount option metacopy=on/off for metadata copyup · d5791044
      Vivek Goyal 提交于
      By default metadata only copy up is disabled.  Provide a mount option so
      that users can choose one way or other.
      
      Also provide a kernel config and module option to enable/disable metacopy
      feature.
      
      metacopy feature requires redirect_dir=on when upper is present.
      Otherwise, it requires redirect_dir=follow atleast.
      
      As of now, metacopy does not work with nfs_export=on.  So if both
      metacopy=on and nfs_export=on then nfs_export is disabled.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d5791044
    • V
      ovl: Move the copy up helpers to copy_up.c · d6eac039
      Vivek Goyal 提交于
      Right now two copy up helpers are in inode.c.  Amir suggested it might be
      better to move these to copy_up.c.
      
      There will one more related function which will come in later patch.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d6eac039
    • V
      ovl: Initialize ovl_inode->redirect in ovl_get_inode() · 9cec54c8
      Vivek Goyal 提交于
      ovl_inode->redirect is an inode property and should be initialized in
      ovl_get_inode() only when we are adding a new inode to cache.  If inode is
      already in cache, it is already initialized and we should not be touching
      ovl_inode->redirect field.
      
      As of now this is not a problem as redirects are used only for directories
      which don't share inode.  But soon I want to use redirects for regular
      files also and there it can become an issue.
      
      Hence, move ->redirect initialization in ovl_get_inode().
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9cec54c8
    • M
      ovl: fix documentation of non-standard behavior · 0c31d675
      Miklos Szeredi 提交于
      We can now drop description of the ro/rw inconsistency from the
      documentation.
      
      Also clarify, that now fully standard compliant behavior can be enabled
      with kernel/module/mount options.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      0c31d675
  2. 18 7月, 2018 15 次提交