  1. 20 Jul, 2018 3 commits
    • ovl: Store lower data inode in ovl_inode · 2664bd08
      Authored by Vivek Goyal
      Right now ovl_inode stores inode pointer for lower inode.  This helps with
      quickly getting lower inode given overlay inode (ovl_inode_lower()).
      
      Now with metadata only copy-up, we can have metacopy inode in middle layer
      as well and inode containing data can be different from ->lower.  I need to
      be able to open the real file in ovl_open_realfile() and for that I need to
      quickly find the lower data inode.
      
      Hence store lower data inode also in ovl_inode.  Also provide a helper
      ovl_inode_lowerdata() to access this field.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
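
A minimal user-space sketch of the idea in this commit, assuming a simplified model of `struct ovl_inode` with only three pointer fields. The helper names come from the commit message; the field names and signatures here are assumptions for illustration, and the real kernel helpers may take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct inode; only identity matters here. */
struct inode {
    unsigned long i_ino;
};

/* Simplified model: only the fields discussed in the commit message. */
struct ovl_inode {
    struct inode *upper;      /* upper inode, NULL if not copied up */
    struct inode *lower;      /* topmost lower inode (may be a metacopy) */
    struct inode *lowerdata;  /* lower inode that actually holds the data */
};

/* Existing accessor: the topmost lower inode. */
static struct inode *ovl_inode_lower(const struct ovl_inode *oi)
{
    assert(oi != NULL);
    return oi->lower;
}

/* New accessor: the lower inode whose data should be opened. */
static struct inode *ovl_inode_lowerdata(const struct ovl_inode *oi)
{
    assert(oi != NULL);
    return oi->lowerdata;
}
```

With a metacopy inode in a middle layer, the two accessors return different inodes; without one, they would typically return the same inode.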
    • ovl: A new xattr OVL_XATTR_METACOPY for file on upper · 0c288874
      Authored by Vivek Goyal
      Now we will have the capability to have upper inodes which might be only
      metadata copy up and data is still on lower inode.  So add a new xattr
      OVL_XATTR_METACOPY to distinguish between two cases.
      
      Presence of OVL_XATTR_METACOPY reflects that file has been copied up
      metadata only and data will be copied up later from lower origin.  So
      this xattr is set when a metadata copy takes place and cleared when data
      copy takes place.
      
      We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
      whether ovl inode has data or not (as opposed to metadata only copy up).
      
      If a file is copied up metadata only and later when same file is opened for
      WRITE, then data copy up takes place.  We copy up data, remove METACOPY
      xattr and then set the UPPERDATA flag in ovl_inode->flags.  While all these
      operations happen with oi->lock held, read side of oi->flags can be
      lockless.  That is another thread on another cpu can check if UPPERDATA
      flag is set or not.
      
      So this gives us an ordering requirement w.r.t UPPERDATA flag.  That is, if
      another cpu sees UPPERDATA flag set, then it should be guaranteed that
      effects of data copy up and remove xattr operations are also visible.
      
      For example.
      
      CPU1                                  CPU2
      ovl_open()                            acquire(oi->lock)
       ovl_open_maybe_copy_up()              ovl_copy_up_data()
        ovl_open_need_copy_up()              vfs_removexattr()
         ovl_already_copied_up()
          ovl_dentry_needs_data_copy_up()    ovl_set_flag(OVL_UPPERDATA)
           ovl_test_flag(OVL_UPPERDATA)     release(oi->lock)
      
      Say CPU2 is copying up data and in the end sets the UPPERDATA flag.  If CPU1
      perceives the effect of setting the UPPERDATA flag but not the effects of
      the preceding operations (e.g. it sees an upper file that is not fully
      copied up), that is a problem.
      
      Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
      and smp_rmb() on UPPERDATA flag test operation.
      
      Maybe some other lock or barrier is already covering it.  But I am not sure
      what that is, or whether it is obvious enough that we will not break it in
      the future.
      
      So to be safe, introduce barriers explicitly for the UPPERDATA flag/bit.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
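
The ordering requirement described in this commit can be sketched in user space with C11 atomics. This is only an analogy: a release store and an acquire load stand in for the kernel's smp_wmb() and smp_rmb(), and `struct demo_inode` and the `demo_*` functions are hypothetical names, not overlayfs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by a data copy up. */
struct demo_inode {
    int data_copied;        /* 1 once the data copy has finished */
    int metacopy_xattr;     /* 1 while the METACOPY xattr is present */
    atomic_bool upperdata;  /* models the OVL_UPPERDATA bit */
};

/* Start in the "metadata only copy up" state. */
static void demo_init_metacopy(struct demo_inode *di)
{
    di->data_copied = 0;
    di->metacopy_xattr = 1;
    atomic_init(&di->upperdata, false);
}

/* Writer side: in the commit this runs with oi->lock held. */
static void demo_data_copy_up(struct demo_inode *di)
{
    di->data_copied = 1;     /* stands in for ovl_copy_up_data() */
    di->metacopy_xattr = 0;  /* stands in for vfs_removexattr() */
    /* Release store: the two writes above become visible first. */
    atomic_store_explicit(&di->upperdata, true, memory_order_release);
}

/* Lockless reader: true if the upper file is known to hold data. */
static bool demo_has_upper_data(struct demo_inode *di)
{
    /* Acquire load pairs with the release store in the writer. */
    if (!atomic_load_explicit(&di->upperdata, memory_order_acquire))
        return false;
    /* Seeing the flag implies the copy up effects are visible too. */
    assert(di->data_copied == 1 && di->metacopy_xattr == 0);
    return true;
}
```

The point of the pairing is that a reader never needs the lock: once it observes the flag, everything written before the flag was set is guaranteed to be visible to it.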
    • ovl: Provide a mount option metacopy=on/off for metadata copyup · d5791044
      Authored by Vivek Goyal
      By default metadata only copy up is disabled.  Provide a mount option so
      that users can choose one way or other.
      
      Also provide a kernel config and module option to enable/disable metacopy
      feature.
      
      The metacopy feature requires redirect_dir=on when upper is present.
      Otherwise, it requires at least redirect_dir=follow.
      
      As of now, metacopy does not work with nfs_export=on.  So if both
      metacopy=on and nfs_export=on are given, nfs_export is disabled.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 18 Jul, 2018 5 commits
  3. 31 May, 2018 3 commits
  4. 12 Apr, 2018 3 commits
    • ovl: add support for "xino" mount and config options · 795939a9
      Authored by Amir Goldstein
      With mount option "xino=on", mounter declares that there are enough
      free high bits in underlying fs to hold the layer fsid.
      If overlayfs does encounter underlying inodes using the high xino
      bits reserved for layer fsid, a warning will be emitted and the original
      inode number will be used.
      
      The mount option name "xino" follows a mount option of similar meaning
      in aufs, but in the overlayfs case the mapping is stateless.
      
      An example use case for "xino=on" is when upper/lower is on an xfs
      filesystem. xfs uses 64bit inode numbers, but it currently never uses the
      upper 8 bits for inode numbers exposed via stat(2), and that is not likely
      to change in the future without the user opting in to a new xfs feature.
      The actual number of unused upper bits is much larger and is determined by
      the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That
      means that for all practical purposes, there are enough unused bits in xfs
      inode numbers for more than OVL_MAX_STACK unique fsid's.
      
      Another use case for "xino=on" is when upper/lower is on tmpfs. tmpfs inode
      numbers are allocated sequentially since boot, so they will practically
      never use the high inode number bits.
      
      For compatibility with applications that expect 32bit inodes, the feature
      can be disabled with "xino=off". The option "xino=auto" automatically
      detects underlying filesystems that use 32bit inodes and enables the
      feature. The Kconfig option OVERLAY_FS_XINO_AUTO and the module parameter
      of the same name determine whether the default mode for an overlayfs mount
      is "xino=auto" or "xino=off".
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
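
The xfs arithmetic in this commit message (64 - agno_log - agblklog - inopblog) can be worked through in a short sketch. The function names are hypothetical, and any geometry values passed in are illustrative inputs, not values read from a real filesystem.

```c
#include <assert.h>
#include <stdbool.h>

/* Unused high bits of a 64-bit xfs inode number for a given geometry. */
static int demo_xfs_unused_ino_bits(int agno_log, int agblklog, int inopblog)
{
    int used = agno_log + agblklog + inopblog;

    assert(agno_log >= 0 && agblklog >= 0 && inopblog >= 0 && used <= 64);
    return 64 - used;
}

/* Smallest number of bits able to represent the values 0..count-1. */
static int demo_bits_for(unsigned int count)
{
    int bits = 0;

    while (bits < 32 && (1u << bits) < count)
        bits++;
    return bits;
}

/* True if nr_fsid distinct layer fsids fit into the unused bits. */
static bool demo_fsids_fit(int unused_bits, unsigned int nr_fsid)
{
    return demo_bits_for(nr_fsid) <= unused_bits;
}
```

For example, a hypothetical geometry of agno_log = 2, agblklog = 20 and inopblog = 3 uses 25 bits and leaves 39 free, far more than a few hundred fsids need.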
    • ovl: constant st_ino for non-samefs with xino · e487d889
      Authored by Amir Goldstein
      On 64bit systems, when overlay layers are not all on the same fs, but
      all inode numbers of underlying fs are not using the high bits, use the
      high bits to partition the overlay st_ino address space.  The high bits
      hold the fsid (upper fsid is 0).  This way overlay inode numbers are unique
      and all inodes use overlay st_dev.  Inode numbers are also persistent
      for a given layer configuration.
      
      Currently, our only indication for available high ino bits is from a
      filesystem that supports file handles and uses the default encode_fh()
      operation, which encodes a 32bit inode number.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: allocate anon bdev per unique lower fs · 5148626b
      Authored by Amir Goldstein
      Instead of allocating an anonymous bdev per lower layer, allocate
      one anonymous bdev per every unique lower fs that is different than
      upper fs.
      
      Every unique lower fs is assigned an fsid > 0 and the number of
      unique lower fs is stored in ofs->numlowerfs.
      
      The assigned fsid is stored in the lower layer struct and will be
      used also for inode number multiplexing.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
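
The fsid assignment described in this commit can be sketched as follows. `struct demo_layer` and `demo_assign_fsids()` are hypothetical, and a filesystem is identified here by an opaque pointer rather than a real superblock.

```c
#include <assert.h>
#include <stddef.h>

struct demo_layer {
    const void *sb;  /* identity of the underlying filesystem */
    int fsid;        /* 0 = same fs as upper, > 0 = unique lower fs */
};

/*
 * Assign fsid 0 to layers on the upper fs and one fsid > 0 per unique
 * lower fs. Returns the number of unique lower filesystems, i.e. the
 * number of anonymous bdevs that would need to be allocated.
 */
static int demo_assign_fsids(struct demo_layer *layers, size_t n,
                             const void *upper_sb)
{
    int numlowerfs = 0;

    assert(layers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t j;

        if (layers[i].sb == upper_sb) {
            layers[i].fsid = 0;
            continue;
        }
        /* Reuse the fsid of an earlier layer on the same fs. */
        for (j = 0; j < i; j++)
            if (layers[j].sb == layers[i].sb)
                break;
        layers[i].fsid = (j < i) ? layers[j].fsid : ++numlowerfs;
    }
    return numlowerfs;
}
```

Four lower layers spread over two lower filesystems plus the upper filesystem would therefore need two anonymous bdevs rather than four.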
  5. 16 Feb, 2018 1 commit
    • ovl: check lower ancestry on encode of lower dir file handle · 2ca3c148
      Authored by Amir Goldstein
      This change relaxes copy up on encode of merge dir with lower layer > 1
      and handles the case of encoding a merge dir with lower layer 1, where an
      ancestor is a non-indexed merge dir. In that case, decode of the lower
      file handle will not have been possible if the non-indexed ancestor is
      redirected before or after encode.
      
      Before encoding a non-upper directory file handle from real layer N, we
      need to check if it will be possible to reconnect an overlay dentry from
      the real lower decoded dentry. This is done by following the overlay
      ancestry up to a "layer N connected" ancestor and verifying that all
      parents along the way are "layer N connectable". If an ancestor that is
      NOT "layer N connectable" is found, we need to copy up an ancestor, which
      is "layer N connectable", thus making that ancestor "layer N connected".
      For example:
      
       layer 1: /a
       layer 2: /a/b/c
      
      The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
      copied up and renamed, upper dir /a will be indexed by lower dir /a from
      layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
      in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
      overlay dentry from the connected lower dentry /a/b/c.
      
      To avoid this problem at decode time, we need to copy up an ancestor of
      /a/b/c, which is "layer 2 connectable", at encode time. That ancestor is
      /a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
      and when the time comes to decode the file handle from lower dentry /a/b/c,
      ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
      a connected overlay dentry will be accomplished.
      
      (*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
      an entry /a in the lower layers above layer N and find the indexed dir /a
      from layer 1. If that improvement is made, then the check for "layer N
      connected" will need to verify there are no redirects in lower layers above
      layer N. In the example above, /a will be "layer 2 connectable". However,
      if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
      "layer 2 connectable":
      
       layer 1: /A (redirect = /a)
       layer 2: /a/b/c
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  6. 24 Jan, 2018 9 commits
    • ovl: wire up NFS export operations · 8383f174
      Authored by Amir Goldstein
      Now that NFS export operations are implemented, enable overlayfs NFS
      export support if the "nfs_export" feature is enabled.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: store 'has_upper' and 'opaque' as bit flags · c62520a8
      Authored by Amir Goldstein
      We need to make some room in struct ovl_entry to store information
      about redirected ancestors for NFS export, so cram two booleans as
      bit flags.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: use directory index entries for consistency verification · ad1d615c
      Authored by Amir Goldstein
      A directory index is a directory type entry in index dir with a
      "trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
      directory upper dir inode.
      
      On lookup of non-dir files, lower file is followed by origin file handle.
      On lookup of dir entries, lower dir is found by name and then compared
      to origin file handle. We only trust dir index if we verified that lower
      dir matches origin file handle, otherwise index may be inconsistent and
      we ignore it.
      
      If we find an indexed non-upper dir or an indexed merged dir, whose
      index 'upper' xattr points to a different upper dir, that means that the
      lower directory may be also referenced by another upper dir via redirect,
      so we fail the lookup on inconsistency error.
      
      To be consistent with directory index entries format, the association of
      index dir to upper root dir, that was stored by older kernels in
      "trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
      xattr. This also serves as an indication that overlay was mounted with a
      kernel that supports index directory entries. For backward compatibility,
      if an 'origin' xattr exists on the index dir we also verify it on mount.
      
      Directory index entries are going to be used for NFS export.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: add support for "nfs_export" configuration · f168f109
      Authored by Amir Goldstein
      Introduce the "nfs_export" config, module and mount options.
      
      The NFS export feature depends on the "index" feature and enables two
      implicit overlayfs features: "index_all" and "verify_lower".
      The "index_all" feature creates an index on copy up of every file and
      directory. The "verify_lower" feature uses the full index to detect
      overlay filesystem inconsistencies on lookup, like redirect from
      multiple upper dirs to the same lower dir.
      
      NFS export can be enabled for non-upper mount with no index. However,
      because lower layer redirects cannot be verified with the index, enabling
      NFS export support on an overlay with no upper layer requires turning off
      redirect follow (e.g. "redirect_dir=nofollow").
      
      The full index may incur some overhead on mount time, especially when
      verifying that lower directory file handles are not stale.
      
      NFS export support, full index and consistency verification will be
      implemented by following patches.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: generalize ovl_verify_origin() and helpers · 05122443
      Authored by Amir Goldstein
      Remove the "origin" language from the functions that handle set, get
      and verify of "origin" xattr and pass the xattr name as an argument.
      
      The same helpers are going to be used for NFS export to set, get and
      verify the "upper" xattr for directory index entries.
      
      ovl_verify_origin() is now a helper used only to verify the non-upper
      file handle stored in the "origin" xattr of an upper inode.
      
      The upper root dir file handle is still stored in "origin" xattr on
      the index dir for backward compatibility. This is going to be changed
      by the patch that adds directory index entries support.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: simplify arguments to ovl_check_origin_fh() · 1eff1a1d
      Authored by Amir Goldstein
      Pass the fs instance with lower_layers array instead of the dentry
      lowerstack array to ovl_check_origin_fh(), because the dentry members
      of lowerstack play no role in this helper.
      
      This change simplifies the argument list of ovl_check_origin(),
      ovl_cleanup_index() and ovl_verify_index().
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: store layer index in ovl_layer · d583ed7d
      Authored by Amir Goldstein
      Store the fs root layer index inside ovl_layer struct, so we can
      get the root fs layer index from merge dir lower layer instead of
      finding it with the ovl_find_layer() helper.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: force r/o mount when index dir creation fails · 972d0093
      Authored by Amir Goldstein
      When work dir creation fails, a warning is emitted and overlay is
      mounted r/o. Trying to remount r/w will fail with no work dir.
      
      When index dir creation fails, the same warning is emitted and overlay
      is mounted r/o, but trying to remount r/w will succeed. This may cause
      unintentional corruption of filesystem consistency.
      
      Adjust the behavior of index dir creation failure to that of work dir
      creation failure and do not allow remounting r/w. The user needs to state
      an explicit intention to work without an index by mounting with
      option 'index=off' to allow r/w mount with no index dir.
      
      When mounting with option 'index=on' and no 'upperdir', index is
      implicitly disabled, so do not warn about no file handle support.
      
      The issue was introduced with inodes index feature in v4.13, but this
      patch will not apply cleanly before ovl_fill_super() re-factoring in
      v4.15.
      
      Fixes: 02bcd157 ("ovl: introduce the inodes index dir feature")
      Cc: <stable@vger.kernel.org> #v4.13
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: disable index when no xattr support · a683737b
      Authored by Amir Goldstein
      Overlayfs falls back to index=off if lower/upper fs does not support
      file handles. Do the same if upper fs does not support xattr.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  7. 20 Jan, 2018 1 commit
  8. 19 Jan, 2018 1 commit
    • ovl: hash directory inodes for fsnotify · 31747eda
      Authored by Amir Goldstein
      fsnotify pins a watched directory inode in cache, but if directory dentry
      is released, new lookup will allocate a new dentry and a new inode.
      Directory events will be notified on the new inode, while fsnotify listener
      is watching the old pinned inode.
      
      Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
      dirs are hashed by real upper inode, merge and lower dirs are hashed by
      real lower inode.
      
      The reference to lower inode was being held by the lower dentry object
      in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
      may drop lower inode refcount to zero. Add a refcount on behalf of the
      overlay inode to prevent that.
      
      As a by-product, hashing directory inodes also detects multiple
      redirected dirs to the same lower dir and an uncovered redirected dir
      target, and returns -ESTALE on lookup.
      
      The reported issue dates back to initial version of overlayfs, but this
      patch depends on ovl_inode code that was introduced in kernel v4.13.
      
      Cc: <stable@vger.kernel.org> #v4.13
      Reported-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Tested-by: Niklas Cassel <niklas.cassel@axis.com>
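
The choice of hash key stated in this commit (pure upper dirs by real upper inode, merge and lower dirs by real lower inode) can be sketched as below. `struct demo_real_inode` and `demo_dir_hash_key()` are hypothetical; the real code also takes a reference on the lower inode, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a real (underlying) inode; only identity matters. */
struct demo_real_inode {
    unsigned long i_ino;
};

/*
 * Pick the real inode used as hash key for an overlay directory:
 * - pure upper dir (no lower): the real upper inode
 * - merge dir or lower dir:    the real lower inode
 */
static const struct demo_real_inode *
demo_dir_hash_key(const struct demo_real_inode *upper,
                  const struct demo_real_inode *lower)
{
    assert(upper != NULL || lower != NULL);
    return lower != NULL ? lower : upper;
}
```

Keying a merge dir by its lower inode is what lets a second lookup find the inode that fsnotify has pinned, and also what exposes two redirected dirs pointing at the same lower dir.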
  9. 11 Dec, 2017 2 commits
  10. 28 Nov, 2017 1 commit
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Authored by Linus Torvalds
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 10 Nov, 2017 9 commits
  12. 09 Nov, 2017 2 commits