1. 12 Apr, 2018 18 commits
    • ovl: add support for "xino" mount and config options · 795939a9
      Amir Goldstein committed
      With mount option "xino=on", mounter declares that there are enough
      free high bits in underlying fs to hold the layer fsid.
      If overlayfs does encounter underlying inodes using the high xino
      bits reserved for layer fsid, a warning will be emitted and the original
      inode number will be used.
      
      The mount option name "xino" is borrowed from an aufs mount option of
      similar meaning, but in the overlayfs case the mapping is stateless.
      
      An example use case for "xino=on" is when upper/lower is on an xfs
      filesystem. xfs uses 64-bit inode numbers, but it currently never uses the
      upper 8 bits for inode numbers exposed via stat(2), and that is not likely
      to change in the future without the user opting in to a new xfs feature.
      The actual number of unused upper bits is much larger and is determined by
      the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That
      means that for all practical purposes, there are enough unused bits in xfs
      inode numbers for more than OVL_MAX_STACK unique fsids.
      
      Another use case for "xino=on" is when upper/lower is on tmpfs. tmpfs inode
      numbers are allocated sequentially since boot, so they will practically
      never use the high inode number bits.
      
      For compatibility with applications that expect 32-bit inodes, the feature
      can be disabled with "xino=off". The option "xino=auto" automatically
      detects underlying filesystems that use 32-bit inodes and enables the
      feature. The Kconfig option OVERLAY_FS_XINO_AUTO and the module parameter
      of the same name determine whether the default mode for an overlayfs mount
      is "xino=auto" or "xino=off".
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
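As a rough illustration of the xfs arithmetic quoted in the commit message above, the sketch below evaluates 64 - agno_log - agblklog - inopblog and checks whether a given range of layer fsids fits into the unused high bits. The function names are hypothetical and are not taken from the kernel source; any geometry values passed in are assumptions made for the example only.

```c
#include <stdbool.h>

/*
 * Unused high bits of a 64-bit xfs inode number, following the formula
 * quoted in the commit message: 64 - agno_log - agblklog - inopblog.
 * (Hypothetical helper, not a kernel function.)
 */
unsigned xfs_unused_ino_bits(unsigned agno_log, unsigned agblklog,
                             unsigned inopblog)
{
    return 64 - agno_log - agblklog - inopblog;
}

/* Smallest number of bits that can represent every fsid in 0..max_fsid. */
unsigned bits_for_fsid(unsigned max_fsid)
{
    unsigned bits = 0;

    while (max_fsid) {
        bits++;
        max_fsid >>= 1;
    }
    return bits;
}

/* True when fsids 0..max_fsid fit into the unused high inode bits. */
bool xino_fits(unsigned unused_bits, unsigned max_fsid)
{
    return bits_for_fsid(max_fsid) <= unused_bits;
}
```

With an assumed geometry of agno_log = 4, agblklog = 22 and inopblog = 4, `xfs_unused_ino_bits(4, 22, 4)` evaluates to 34, so `xino_fits(34, 3)` holds for fsids 0 through 3.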
    • ovl: consistent d_ino for non-samefs with xino · adbf4f7e
      Amir Goldstein committed
      When overlay layers are not all on the same fs, but all inode numbers
      of underlying fs do not use the high 'xino' bits, overlay st_ino values
      are constant and persistent.
      
      In that case, relax non-samefs constraint for consistent d_ino and always
      iterate non-merge dir using ovl_fill_real() actor so we can remap lower
      inode numbers to unique lower fs range.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: consistent i_ino for non-samefs with xino · 12574a9f
      Amir Goldstein committed
      When overlay layers are not all on the same fs, but all inode numbers
      of underlying fs do not use the high 'xino' bits, overlay st_ino values
      are constant and persistent.
      
      In that case, set i_ino value to the same value as st_ino for nfsd
      readdirplus validator.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: constant st_ino for non-samefs with xino · e487d889
      Amir Goldstein committed
      On 64bit systems, when overlay layers are not all on the same fs, but
      all inode numbers of underlying fs are not using the high bits, use the
      high bits to partition the overlay st_ino address space.  The high bits
      hold the fsid (upper fsid is 0).  This way overlay inode numbers are unique
      and all inodes use overlay st_dev.  Inode numbers are also persistent
      for a given layer configuration.
      
      Currently, our only indication for available high ino bits is from a
      filesystem that supports file handles and uses the default encode_fh()
      operation, which encodes a 32bit inode number.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
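The partitioning of the st_ino address space described above can be modelled in user space. The following is a minimal sketch, assuming a hypothetical helper `xino_map()` and a caller-supplied count of reserved high bits; it is not the kernel implementation. It places the fsid in the top bits and reports failure when the underlying inode number already uses those bits, in which case the caller keeps the original number.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Fold a layer fsid into the top `xinobits` bits of a 64-bit inode number.
 * Returns true and stores the combined value in *out when the real inode
 * number leaves those bits free and the fsid fits into them. Returns false
 * otherwise, leaving *out untouched so the caller can fall back to the
 * original inode number. (Hypothetical helper for illustration.)
 */
bool xino_map(uint64_t real_ino, unsigned fsid, unsigned xinobits,
              uint64_t *out)
{
    unsigned shift;

    if (xinobits == 0 || xinobits >= 64)
        return false;

    shift = 64 - xinobits;
    if (real_ino >> shift)
        return false;           /* underlying fs uses the reserved bits */
    if ((uint64_t)fsid >> xinobits)
        return false;           /* fsid does not fit in the reserved bits */

    *out = real_ino | ((uint64_t)fsid << shift);
    return true;
}
```

Because the upper fsid is 0, an upper inode number passes through this mapping unchanged, while lower inode numbers land in a distinct range per fsid.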
    • ovl: allocate anon bdev per unique lower fs · 5148626b
      Amir Goldstein committed
      Instead of allocating an anonymous bdev per lower layer, allocate
      one anonymous bdev per every unique lower fs that is different than
      upper fs.
      
      Every unique lower fs is assigned an fsid > 0 and the number of
      unique lower fs is stored in ofs->numlowerfs.
      
      The assigned fsid is stored in the lower layer struct and will be
      used also for inode number multiplexing.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
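The fsid assignment described above amounts to de-duplicating the lower layers by filesystem. A minimal sketch, assuming each filesystem is identified by an opaque integer (standing in for a super block) and using the hypothetical name `assign_fsids()`, could look like this:

```c
#include <stddef.h>

/*
 * layer_fs[i] identifies the filesystem of lower layer i; upper_fs
 * identifies the upper filesystem. On return, fsid_out[i] is 0 when
 * layer i lives on the upper fs, otherwise a value > 0 that is shared
 * by all lower layers on the same filesystem. The return value is the
 * number of unique lower filesystems that differ from the upper fs.
 * (Illustrative model only; names and types are assumptions.)
 */
unsigned assign_fsids(const int *layer_fs, size_t nlayers, int upper_fs,
                      unsigned *fsid_out)
{
    unsigned numlowerfs = 0;
    size_t i, j;

    for (i = 0; i < nlayers; i++) {
        if (layer_fs[i] == upper_fs) {
            fsid_out[i] = 0;
            continue;
        }
        /* Reuse the fsid of an earlier layer on the same filesystem. */
        for (j = 0; j < i; j++)
            if (layer_fs[j] == layer_fs[i])
                break;
        if (j < i)
            fsid_out[i] = fsid_out[j];
        else
            fsid_out[i] = ++numlowerfs;
    }
    return numlowerfs;
}
```

In this model one anonymous device would be needed per nonzero fsid, rather than one per lower layer.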
    • ovl: factor out ovl_map_dev_ino() helper · da309e8c
      Amir Goldstein committed
      A helper for ovl_getattr() to map the values of st_dev and st_ino
      according to constant st_ino rules.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: cleanup ovl_update_time() · 8f35cf51
      Miklos Szeredi committed
      No need to mess with an alias, the upperdentry can be retrieved directly
      from the overlay inode.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: add WARN_ON() for non-dir redirect cases · 3a291774
      Miklos Szeredi committed
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: cleanup setting OVL_INDEX · 0471a9cd
      Vivek Goyal committed
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: set d->is_dir and d->opaque for last path element · 102b0d11
      Vivek Goyal committed
      Certain properties in ovl_lookup_data should be set only for the last
      element of the path. IOW, if we are calling ovl_lookup_single() for an
      absolute redirect, then d->is_dir and d->opaque do not make much sense
      for intermediate path elements. Instead, set them only if the dentry being
      looked up is the last path element.
      
      As of now we do not seem to be making use of d->opaque if it is set for
      a path/dentry in lower. But just define the semantics so that future code
      can make use of this assumption.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: Do not check for redirect if this is last layer · e9b77f90
      Vivek Goyal committed
      If we are looking in the last layer, there should not be any need to
      process a redirect. Redirect information is used only for lookup in the
      next lower layer, and there is no lower layer left to look into, so there
      is no need to process redirects.
      
      IOW, ignore redirects on lowest layer.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: lookup in inode cache first when decoding lower file handle · 8b58924a
      Amir Goldstein committed
      When decoding a lower file handle, we need to check if lower file was
      copied up and indexed and if it has a whiteout index, we need to check
      if this is an unlinked but open non-dir before returning -ESTALE.
      
      To find out if this is an unlinked but open non-dir we need to lookup
      an overlay inode in inode cache by lower inode and that requires decoding
      the lower file handle before looking in inode cache.
      
      Before this change, if the lower inode turned out to be a directory, we
      may have paid an expensive cost to reconnect that lower directory for
      nothing.
      
      After this change, we start by decoding a disconnected lower dentry and
      using the lower inode for looking up an overlay inode in inode cache.
      If we find overlay inode and dentry in cache, we avoid the index lookup
      overhead. If we don't find an overlay inode and dentry in cache, then we
      only need to decode a connected lower dentry in case the lower dentry is
      a non-indexed directory.
      
      The xfstests group overlay/exportfs tests decoding overlayfs file
      handles after drop_caches with different states of the file at encode
      and decode time. Overall the tests in the group call ovl_lower_fh_to_d()
      89 times to decode a lower file handle.
      
      Before this change, the tests called ovl_get_index_fh() 75 times and
      reconnect_one() 61 times.
      After this change, the tests call ovl_get_index_fh() 70 times and
      reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided
      are cases where a non-upper directory file handle was encoded, then the
      directory removed and then file handle was decoded.
      
      To demonstrate the effect on decoding file handles with hot inode/dentry
      cache, the drop_caches call in the tests was disabled. Without
      drop_caches, there are no reconnect_one() calls at all before or after
      the change. Before the change, there are 75 calls to ovl_get_index_fh(),
      exactly as the case with drop_caches. After the change, there are only
      10 calls to ovl_get_index_fh().
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: do not try to reconnect a disconnected origin dentry · 8a22efa1
      Amir Goldstein committed
      On lookup of non directory, we try to decode the origin file handle
      stored in upper inode. The origin file handle is supposed to be decoded
      to a disconnected non-dir dentry, which is fine, because we only need
      the lower inode of a copy up origin.
      
      However, if the origin file handle somehow turns out to be a directory,
      we pay the expensive cost of reconnecting the directory dentry, only to
      find a mismatched file type and drop the dentry.
      
      Optimize this case by explicitly opting out of reconnecting the dentry.
      Opting-out of reconnect is done by passing a NULL acceptable callback
      to exportfs_decode_fh().
      
      While the case described above is a strange corner case that does not
      really need to be optimized, the API added for this optimization will
      be used by a following patch to optimize a more common case of decoding
      an overlayfs file handle.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: disambiguate ovl_encode_fh() · 5b2cccd3
      Amir Goldstein committed
      Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the
      exportfs function ovl_encode_inode_fh() and change the latter to
      ovl_encode_fh() to match the exportfs method name.
      
      Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: set lower layer st_dev only if setting lower st_ino · 9f99e50d
      Amir Goldstein committed
      For broken hardlinks, we do not return lower st_ino, so we should
      also not return lower pseudo st_dev.
      
      Fixes: a0c5ad30 ("ovl: relax same fs constraint for constant st_ino")
      Cc: <stable@vger.kernel.org> #v4.15
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix lookup with middle layer opaque dir and absolute path redirects · 3ec9b3fa
      Amir Goldstein committed
      As of now, if we encounter an opaque dir while looking for a dentry, we set
      d->last=true. This means that there is no need to look further in any of
      the lower layers. This works fine as long as there are no redirects or
      only relative redirects. But what if there is an absolute redirect on a
      child dentry of the opaque directory? We still need to continue to look
      into the next lower layer. This patch fixes it.
      
      Here is an example to demonstrate the issue. Say you have following setup.
      
      upper:  /redirect (redirect=/a/b/c)
      lower1: /a/[b]/c       ([b] is opaque) (c has absolute redirect=/a/b/d/)
      lower0: /a/b/d/foo
      
      Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
      Note that despite lower1:/a/[b] being opaque, we need to continue to look
      into lower0 because the child c has an absolute redirect.
      
      Following is a reproducer.
      
      Watch me make foo disappear:
      
       $ mkdir lower middle upper work work2 merged
       $ mkdir lower/origin
       $ touch lower/origin/foo
       $ mount -t overlay none merged/ \
               -olowerdir=lower,upperdir=middle,workdir=work2
       $ mkdir merged/pure
       $ mv merged/origin merged/pure/redirect
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
       $ mv merged/pure/redirect merged/redirect
      
      Now you see foo inside a twice redirected merged dir:
      
       $ ls merged/redirect
       foo
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
      
      After mount cycle you don't see foo inside the same dir:
      
       $ ls merged/redirect
      
      During middle layer lookup, the opaqueness of middle/pure is left in
      the lookup state and then middle/pure/redirect is wrongly treated as
      opaque.
      
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: Set d->last properly during lookup · 452061fd
      Vivek Goyal committed
      d->last signifies that this is the last layer we are looking into and there
      are no more. This allows for some optimization opportunities during
      lookup. For example, in ovl_lookup_single() we don't have to check for the
      opaque xattr of a directory if this is the last layer we are looking into
      (d->last = true).

      But knowing for sure whether we are looking into the last layer can be very
      tricky. If redirects are not enabled, then we can look at poe->numlower and
      figure out whether the lookup we are about to do is in the last layer or
      not. But if redirects are enabled, then it is possible that poe->numlower
      suggests that we are looking in the last layer, but there is an absolute
      redirect present in the found element that redirects us to a layer in
      root, and that means the lookup will continue further in lower layers.
      
      For example, consider following.
      
      /upperdir/pure (opaque=y)
      /upperdir/pure/foo (opaque=y,redirect=/bar)
      /lowerdir/bar
      
      In this case pure is "pure upper". When we look for "foo", that time
      poe->numlower=0. But that alone does not mean that we will not search for a
      merge candidate in /lowerdir. Absolute redirect changes that.
      
      IOW, d->last should not be set just based on poe->numlower if redirects are
      enabled. That can lead to setting d->last while it should not have and that
      means we will not check for opaque xattr while we should have.
      
      So do this.
      
       - If redirects are not enabled, then continue to rely on poe->numlower
         information to determine if it is last layer or not.
      
       - If redirects are enabled, then set d->last = true only if this is the
         last layer in root ovl_entry (roe).
      Suggested-by: Amir Goldstein <amir73il@gmail.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
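The two-branch rule at the end of the commit message above can be written down as a small predicate. This is only a sketch of the rule as stated in the message: the function name, the parameter names, and the convention that lower layers are numbered from 1 in the root stack are assumptions made for illustration.

```c
#include <stdbool.h>

/*
 * Decide whether the lower layer about to be searched is the last one.
 *
 *   redirects_enabled: whether redirects are followed on this mount
 *   i:                 0-based position in the parent's lower stack
 *   poe_numlower:      number of lower layers of the parent entry (poe)
 *   layer_idx:         1-based index of this layer in the root stack
 *   roe_numlower:      number of lower layers of the root entry (roe)
 */
bool lookup_is_last(bool redirects_enabled, unsigned i,
                    unsigned poe_numlower, unsigned layer_idx,
                    unsigned roe_numlower)
{
    /* Without redirects, the parent's lower stack is authoritative. */
    if (!redirects_enabled)
        return i + 1 == poe_numlower;

    /* With redirects, only the bottom layer of the root stack is last. */
    return layer_idx == roe_numlower;
}
```

For a parent with a single lower layer that sits at index 1 of a two-layer root stack, the predicate is true without redirects and false with redirects, which matches the case the message warns about.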
    • ovl: set i_ino to the value of st_ino for NFS export · 695b46e7
      Amir Goldstein committed
      Eddie Horng reported that readdir of an overlayfs directory that
      was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN.
      The reason is that while preparing the response for readdirplus, nfsd
      checks inside encode_entryplus_baggage() that a child dentry's inode
      number matches the value of d_ino returned by the overlayfs readdir iterator.
      
      Because the overlayfs inodes use arbitrary inode numbers that are not
      correlated with the values of st_ino/d_ino, NFSv3 falls back to not
      encoding d_type. Although this is an allowed behavior, we can fix it for
      the case of all overlayfs layers on the same underlying filesystem.
      
      When NFS export is enabled and d_ino is consistent with st_ino
      (samefs), set the same value also to i_ino in ovl_fill_inode() for all
      overlayfs inodes, so that nfsd readdirplus sanity checks will pass.
      ovl_fill_inode() may be called from ovl_new_inode(), before the real inode
      was created, with ino arg 0. In that case, i_ino will be updated to the
      real upper inode i_ino on ovl_inode_init() or ovl_inode_update().
      Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
      Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Fixes: 8383f174 ("ovl: wire up NFS export operations")
      Cc: <stable@vger.kernel.org> #v4.16
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 07 Mar, 2018 1 commit
  3. 26 Feb, 2018 2 commits
    • ovl: redirect_dir=nofollow should not follow redirect for opaque lower · d1fe96c0
      Vivek Goyal committed
      redirect_dir=nofollow should not follow a redirect. But in a specific
      configuration it can still follow it.  For example try this.
      
      $ mkdir -p lower0 lower1/foo upper work merged
      $ touch lower1/foo/lower-file.txt
      $ setfattr -n "trusted.overlay.opaque" -v "y" lower1/foo
      $ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=on none merged
      $ cd merged
      $ mv foo foo-renamed
      $ umount merged
      
      # mount again. This time with redirect_dir=nofollow
      $ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=nofollow none merged
      $ ls merged/foo-renamed/
      # This lists lower-file.txt, while it should not have.
      
      Basically, we are doing redirect check after we check for d.stop. And
      if this is not last lower, and we find an opaque lower, d.stop will be
      set.
      
      ovl_lookup_single()
              if (!d->last && ovl_is_opaquedir(this)) {
                      d->stop = d->opaque = true;
                      goto out;
              }
      
      To fix this, first check whether following the redirect is allowed, and
      only after that check whether d.stop has been set.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Fixes: 438c84c2 ("ovl: don't follow redirects if redirect_dir=off")
      Cc: <stable@vger.kernel.org> #v4.15
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix ptr_ret.cocci warnings · b5095f24
      Fengguang Wu committed
      fs/overlayfs/export.c:459:10-16: WARNING: PTR_ERR_OR_ZERO can be used
      
       Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
      
      Generated by: scripts/coccinelle/api/ptr_ret.cocci
      
      Fixes: 4b91c30a ("ovl: lookup connected ancestor of dir in inode cache")
      CC: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
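For readers unfamiliar with the idiom behind the warning above, the following user-space sketch re-implements simplified versions of the kernel's error-pointer helpers and shows that the long form, if (IS_ERR(...)) followed by PTR_ERR, and the short form give the same result. The lower-case helper names are stand-ins written for this example and are not the kernel macros themselves; `check_long()` and `check_short()` are hypothetical callers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (model of ERR_PTR). */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer (model of PTR_ERR). */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top MAX_ERRNO addresses (model of IS_ERR). */
bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Model of PTR_ERR_OR_ZERO: the errno for an error pointer, else 0. */
int ptr_err_or_zero(const void *ptr)
{
    return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}

/* The long form that the coccinelle script flags. */
int check_long(const void *p)
{
    if (is_err(p))
        return (int)ptr_err(p);
    return 0;
}

/* The equivalent short form suggested by the warning. */
int check_short(const void *p)
{
    return ptr_err_or_zero(p);
}
```

Both callers return 0 for an ordinary pointer and the encoded errno for an error pointer, so the replacement is purely a readability change.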
  4. 16 Feb, 2018 3 commits
    • ovl: check ERR_PTR() return value from ovl_lookup_real() · 7168179f
      Amir Goldstein committed
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 06170154 ("ovl: lookup indexed ancestor of lower dir")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: check lower ancestry on encode of lower dir file handle · 2ca3c148
      Amir Goldstein committed
      This change relaxes copy up on encode of merge dir with lower layer > 1
      and handles the case of encoding a merge dir with lower layer 1, where an
      ancestor is a non-indexed merge dir. In that case, decode of the lower
      file handle will not be possible if the non-indexed ancestor is
      redirected before or after encode.
      
      Before encoding a non-upper directory file handle from real layer N, we
      need to check if it will be possible to reconnect an overlay dentry from
      the real lower decoded dentry. This is done by following the overlay
      ancestry up to a "layer N connected" ancestor and verifying that all
      parents along the way are "layer N connectable". If an ancestor that is
      NOT "layer N connectable" is found, we need to copy up an ancestor, which
      is "layer N connectable", thus making that ancestor "layer N connected".
      For example:
      
       layer 1: /a
       layer 2: /a/b/c
      
      The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
      copied up and renamed, upper dir /a will be indexed by lower dir /a from
      layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
      in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
      overlay dentry from the connected lower dentry /a/b/c.
      
      To avoid this problem at decode time, we need to copy up an ancestor of
      /a/b/c, which is "layer 2 connectable", at encode time. That ancestor is
      /a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
      and when the time comes to decode the file handle from lower dentry /a/b/c,
      ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
      a connected overlay dentry will be accomplished.
      
      (*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
      an entry /a in the lower layers above layer N and find the indexed dir /a
      from layer 1. If that improvement is made, then the check for "layer N
      connected" will need to verify there are no redirects in lower layers above
      layer N. In the example above, /a will be "layer 2 connectable". However,
      if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
      "layer 2 connectable":
      
       layer 1: /A (redirect = /a)
       layer 2: /a/b/c
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: hash non-dir by lower inode for fsnotify · 764baba8
      Amir Goldstein committed
      Commit 31747eda ("ovl: hash directory inodes for fsnotify")
      fixed an issue of inotify watch on directory that stops getting
      events after dropping dentry caches.
      
      A similar issue exists for non-dir non-upper files, for example:
      
      $ mkdir -p lower upper work merged
      $ touch lower/foo
      $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged
      $ inotifywait merged/foo &
      $ echo 2 > /proc/sys/vm/drop_caches
      $ cat merged/foo
      
      inotifywait doesn't get the OPEN event, because ovl_lookup() called
      from 'cat' allocates a new overlay inode and does not reuse the
      watched inode.
      
      Fix this by hashing non-dir overlay inodes by lower real inode in
      the following cases that were not hashed before this change:
       - A non-upper overlay mount
       - A lower non-hardlink when index=off
      
      A helper ovl_hash_bylower() was added to put all the logic and
      documentation about which real inode an overlay inode is hashed by
      into one place.
      
      The issue dates back to initial version of overlayfs, but this
      patch depends on ovl_inode code that was introduced in kernel v4.13.
      
      Cc: <stable@vger.kernel.org> #v4.13
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
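The hashing cases listed in the commit message above can be modelled as a single predicate. The sketch below is a simplified user-space model, not the body of ovl_hash_bylower(): the struct layouts, field names and the exact ordering of the checks are assumptions chosen so that the two cases named in the message (a non-upper overlay mount, and a lower non-hardlink when index=off) come out as hashed by lower inode.

```c
#include <stdbool.h>

/* Hypothetical description of an overlay mount. */
struct ovl_mount_model {
    bool has_upper_layer;   /* false for a non-upper overlay mount */
    bool index_on;          /* index=on */
};

/* Hypothetical description of one overlay object. */
struct ovl_obj_model {
    bool has_lower;         /* a lower real inode exists */
    bool has_upper;         /* already copied up */
    bool indexed;           /* has an index entry */
    bool lower_is_dir;
    unsigned lower_nlink;   /* link count of the lower inode */
};

/* Should the overlay inode be hashed by its lower real inode? */
bool hash_by_lower(const struct ovl_mount_model *mnt,
                   const struct ovl_obj_model *obj)
{
    /* Pure upper: there is no lower inode to hash by. */
    if (!obj->has_lower)
        return false;

    /* An indexed object can always be found again by its lower inode. */
    if (obj->indexed)
        return true;

    /* A non-upper mount never copies up, so the lower inode is stable. */
    if (!mnt->has_upper_layer)
        return true;

    /* A non-indexed lower hardlink is broken on copy up. */
    if ((obj->has_upper || !mnt->index_on) &&
        !obj->lower_is_dir && obj->lower_nlink > 1)
        return false;

    return true;
}
```

Under this model a lower file with a single link on an index=off mount is hashed by its lower inode, so a lookup after dropping caches would find the same overlay inode that carries the inotify watch.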
  5. 05 Feb, 2018 2 commits
  6. 24 Jan, 2018 14 commits
    • ovl: wire up NFS export operations · 8383f174
      Amir Goldstein committed
      Now that NFS export operations are implemented, enable overlayfs NFS
      export support if the "nfs_export" feature is enabled.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: lookup indexed ancestor of lower dir · 06170154
      Amir Goldstein committed
      ovl_lookup_real() in lower layer walks back lower parents to find the
      topmost indexed parent. If an indexed ancestor is found before reaching
      lower layer root, ovl_lookup_real() is called recursively with upper
      layer to walk back from indexed upper to the topmost connected/hashed
      upper parent (or up to root).
      
      ovl_lookup_real() in upper layer then walks forward to connect the topmost
      upper overlay dir dentry and ovl_lookup_real() in lower layer continues to
      walk forward to connect the decoded lower overlay dir dentry.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: lookup connected ancestor of dir in inode cache · 4b91c30a
      Amir Goldstein committed
      Decoding a dir file handle requires walking backward up to layer root and
      for lower dir also checking the index to see if any of the parents have
      been copied up.
      
      Lookup overlay ancestor dentry in inode/dentry cache by decoded real
      parents to shortcut looking up all the way back to layer root.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: hash non-indexed dir by upper inode for NFS export · 7a9dadef
      Amir Goldstein committed
      Non-indexed upper dirs are encoded as upper file handles. When NFS export
      is enabled, hash non-indexed directory inodes by upper inode, so we can
      find them in inode cache using the decoded upper inode.
      
      When NFS export is disabled, directories are not indexed on copy up, so
      hash non-indexed directory inodes by origin inode, the same hash key
      that is used before copy up.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode pure lower dir file handles · 98892516
      Amir Goldstein committed
      Similar to decoding a pure upper dir file handle, decoding a pure lower
      dir file handle is implemented by looking up an overlay dentry of the same
      path as the pure lower path and verifying that the overlay dentry's
      real lower matches the decoded real lower file handle.
      
      Unlike the case of upper dir file handle, the lookup of overlay path by
      lower real path can fail or find a mismatched overlay dentry if any of
      the lower parents have been copied up and renamed. To address this case
      we will need to check if any of the lower parents are indexed.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode indexed dir file handles · 3b0bfc6e
      Amir Goldstein committed
      Decoding an indexed dir file handle is done by looking up the file handle
      in index dir by name and then decoding the upper dir from the index origin
      file handle. The decoded upper path is used to look up an overlay dentry of
      the same path.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode lower file handles of unlinked but open files · 9436a1a3
      Amir Goldstein committed
      Look up the overlay inode in cache by origin inode, so we can decode a file
      handle of an open file even if the index has a whiteout index entry marking
      that this overlay inode was unlinked.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode indexed non-dir file handles · f71bd9cf
      Amir Goldstein committed
      Decoding an indexed non-dir file handle is similar to decoding a lower
      non-dir file handle, but additionally, we look up the file handle in index
      dir by name to find the real upper inode.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode lower non-dir file handles · f941866f
      Amir Goldstein committed
      Decoding a lower non-dir file handle is done by decoding the lower dentry
      from underlying lower fs, finding or allocating an overlay inode that is
      hashed by the real lower inode and instantiating an overlay dentry with
      that inode.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: encode lower file handles · 03e1c584
      Amir Goldstein committed
      For indexed or lower non-dir, encode a non-connectable lower file handle
      from origin inode. For indexed or lower dir, when ofs->numlower == 1,
      encode a lower file handle from lower dir.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: copy up before encoding non-connectable dir file handle · 05e1f118
      Amir Goldstein committed
      Decoding a merge dir whose origin's parent is under a redirected
      lower dir is not always possible. As a simple approximation, we do
      not encode lower dir file handles when overlay has multiple lower
      layers and origin is below the topmost lower layer.
      
      We should later relax this condition and copy up only the parent
      that is under a redirected lower.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: encode non-indexed upper file handles · b305e844
      Amir Goldstein committed
      We only need to encode origin if there is a chance that the same object was
      encoded pre copy up and then we need to stay consistent with the same
      encoding also after copy up.
      
      In case a non-pure upper is not indexed, then it was copied up before NFS
      export support was enabled. In that case, we don't need to worry about
      staying consistent with pre copy up encoding and we encode an upper file
      handle.
      
      This mitigates the problem that with no index, we cannot find an upper
      inode from origin inode, so we cannot decode a non-indexed upper from
      origin file handle.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode connected upper dir file handles · 3985b70a
      Amir Goldstein committed
      Until this change, we decoded upper file handles by instantiating an
      overlay dentry from the real upper dentry. This is sufficient to handle
      pure upper files, but insufficient to handle merge/impure dirs.
      
      To that end, if decoded real upper dir is connected and hashed, we
      lookup an overlay dentry with the same path as the real upper dir.
      If decoded real upper is non-dir, we instantiate a disconnected overlay
      dentry as before this change.
      
      Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
      exportfs never needs to call get_parent() and get_name() to reconnect an
      upper overlay dir. Because connectable non-dir file handles are not
      supported, exportfs will not be able to use fh_to_parent() and get_name()
      methods to reconnect a disconnected non-dir to its parent. Therefore, the
      methods get_parent() and get_name() are implemented just to print out a
      sanity warning and the method fh_to_parent() is implemented to warn the
      user that using the 'subtree_check' exportfs option is not supported.
      
      An alternative approach could have been to implement instantiating of
      an overlay directory inode from origin/index and implement get_parent()
      and get_name() by calling into underlying fs operations and then
      instantiating the overlay parent dir.
      
      The reasons for not choosing the get_parent() approach were:
      - Obtaining a disconnected overlay dir dentry would require a
        delicate re-factoring of ovl_lookup() to get a dentry with overlay
        parent info. It was preferred to avoid doing that re-factoring unless
        it was proven worthy.
      - Going down the path of disconnected dir would mean that the (non
        trivial) code path of d_splice_alias() could be traveled, and that
        meant writing more tests and introducing race cases that are very hard
        to hit on purpose. Taking the path of connecting overlay dentry by
        forward lookup is therefore the safe and boring way to avoid surprises.
      
      The drawbacks of the chosen "connected overlay dentry" approach:
      - We need to take special care of renames of ancestors while connecting
        the overlay dentry by real dentry path. These subtleties are usually
        handled by generic exportfs and VFS code.
      - In a hypothetical workload, we could end up in a loop trying to connect,
        interrupted by rename and restarting connect forever.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: decode pure upper file handles · 8556a420
      Amir Goldstein committed
      Decoding an upper file handle is done by decoding the upper dentry from
      underlying upper fs, finding or allocating an overlay inode that is
      hashed by the real upper inode and instantiating an overlay dentry with
      that inode.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>