1. 27 Mar, 2020 3 commits
    • ovl: enable xino automatically in more cases · 926e94d7
      Amir Goldstein committed
      So far, with xino=auto, we only enable xino if we know that all
      underlying filesystems use 32bit inode numbers.
      
      When users configure overlay with xino=auto, they already declare that
      they are ready to handle 64bit inode number from overlay.
      
      It is a very common case, that underlying filesystem uses 64bit ino,
      but rarely or never uses the high inode number bits (e.g. tmpfs, xfs).
      Leaving it for the users to declare high ino bits are unused with
      xino=on is not a recipe for many users to enjoy the benefits of xino.
      
      There appears to be very little reason not to enable xino when users
      declare xino=auto even if we do not know how many bits underlying
      filesystem uses for inode numbers.
      
      In the worst case of xino bits overflow by real inode number, we
      already fall back to the non-xino behavior - real inode number with
      unique pseudo dev or to non persistent inode number and overlay st_dev
      (for directories).
      
      The only annoyance from auto enabling xino is that xino bits overflow
      emits a warning to kmsg. Suppress those warnings unless users explicitly
      asked for xino=on, suggesting that they expected high ino bits to be
      unused by underlying filesystem.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: avoid possible inode number collisions with xino=on · dfe51d47
      Amir Goldstein committed
      When xino feature is enabled and a real directory inode number overflows
      the lower xino bits, we cannot map this directory inode number to a unique
      and persistent inode number and we fall back to the real inode st_ino and
      overlay st_dev.
      
      The real inode st_ino with high bits may collide with a lower inode number
      on overlay st_dev that was mapped using xino.
      
      To avoid possible collision with legitimate xino values, map a non
      persistent inode number to a dedicated range in the xino address space.
      The dedicated range is created by adding one more bit to the number of
      reserved high xino bits.  We could have added just one more fsid, but that
      would have had the undesired effect of changing persistent overlay inode
      numbers on kernel or require more complex xino mapping code.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
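
The xino mapping and its dedicated non-persistent range, as described in the two commits above, might be sketched in user-space C as follows. This is an illustration only, not the kernel's code: the helper names `xino_map` and `xino_nonpersistent` and the exact bit layout (real inode number in the low bits, one reserved bit marking the non-persistent range, fsid above it) are assumptions chosen to match the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of a 64-bit overlay inode number with `xinobits`
 * reserved high bits (hypothetical, for illustration):
 *   [ fsid : xinobits-1 ][ non-persistent flag : 1 ][ real ino : 64-xinobits ]
 */

/* Map a real inode number into the xino space; returns false on overflow. */
static bool xino_map(uint64_t real_ino, unsigned int fsid,
                     unsigned int xinobits, uint64_t *out)
{
    assert(xinobits >= 2 && xinobits < 64);
    unsigned int shift = 64 - xinobits;

    /* The real inode number uses the reserved bits: caller must fall back. */
    if (real_ino >> shift)
        return false;

    /* fsid must fit in the bits above the non-persistent flag. */
    assert(((uint64_t)fsid >> (xinobits - 1)) == 0);

    /* The flag bit (bit `shift`) stays 0 for persistent numbers. */
    *out = real_ino | ((uint64_t)fsid << (shift + 1));
    return true;
}

/* Fallback: place a counter value in the dedicated non-persistent range. */
static uint64_t xino_nonpersistent(uint64_t counter, unsigned int xinobits)
{
    assert(xinobits >= 2 && xinobits < 64);
    unsigned int shift = 64 - xinobits;
    uint64_t low_mask = (UINT64_C(1) << shift) - 1;

    /* Setting the flag bit keeps these values apart from mapped ones. */
    return (counter & low_mask) | (UINT64_C(1) << shift);
}
```

Because every persistent value has the flag bit clear and every non-persistent value has it set, the two ranges cannot collide in this sketch, which is the property the commit message asks for.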
    • ovl: use a private non-persistent ino pool · 4d314f78
      Amir Goldstein committed
      There is no reason to deplete the system's global get_next_ino() pool for
      overlay non-persistent inode numbers and there is no reason at all to
      allocate non-persistent inode numbers for non-directories.
      
      For non-directories, it is much better to leave i_ino the same as real
      i_ino, to be consistent with st_ino/d_ino.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
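
A minimal sketch of the policy in the commit above, assuming a hypothetical per-mount counter (`struct mount_sketch` and `pick_i_ino` are illustrative names, not kernel identifiers): directories that need a non-persistent number draw from a private counter, while non-directories simply keep the real inode number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount state holding a private counter. */
struct mount_sketch {
    uint64_t last_ino;   /* independent of any system-wide pool */
};

/* Choose i_ino for an overlay inode that has no persistent mapping. */
static uint64_t pick_i_ino(struct mount_sketch *m, bool is_dir,
                           uint64_t real_ino)
{
    assert(m != NULL);

    /* Non-directories keep the real number, consistent with st_ino/d_ino. */
    if (!is_dir)
        return real_ino;

    /* Directories get the next value from the private pool. */
    return ++m->last_ino;
}
```

In this sketch a lookup of a regular file never consumes a counter value, so only directories advance the pool.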
  2. 17 Mar, 2020 8 commits
  3. 13 Mar, 2020 1 commit
    • ovl: fix some xino configurations · 53afcd31
      Amir Goldstein committed
      Fix up two bugs in the conversion to xino_mode:
      1. xino=off does not always end up in disabled mode
      2. xino=auto on 32bit arch should end up in disabled mode
      
      Take a proactive approach to disabling xino on 32bit kernel:
      1. Disable XINO_AUTO config during build time
      2. Disable xino with a warning on mount time
      
      As a by-product, xino=on on 32bit arch also ends up in disabled mode.
      We never intended to enable xino on 32bit arch and this will make the
      rest of the logic simpler.
      
      Fixes: 0f831ec8 ("ovl: simplify ovl_same_sb() helper")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  4. 24 Jan, 2020 6 commits
    • ovl: implement async IO routines · 2406a307
      Jiufei Xue committed
      A performance regression was observed since linux v4.19 with aio test using
      fio with iodepth 128 on overlayfs.  The queue depth of the device was
      always 1 which is unexpected.
      
      After investigation, it was found that commit 16914e6f ("ovl: add
      ovl_read_iter()") and commit 2a92e07e ("ovl: add ovl_write_iter()")
      resulted in vfs_iter_{read,write} being called on underlying filesystem,
      which always results in synchronous IO.
      
      Implement async IO for stacked reading and writing.  This resolves the
      performance regression.
      
      This is implemented by allocating a new kiocb for submitting the AIO
      request on the underlying filesystem.  When the request is completed, the
      new kiocb is freed and the completion callback is called on the original
      iocb.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
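
The completion forwarding described in the commit above (allocate a new request for the underlying filesystem, free it on completion, then call the original request's callback) can be illustrated with a small user-space analogue. Everything here is hypothetical: `struct req_sketch` stands in for a kiocb, and `stacked_submit`, `stacked_complete` and `record_result` are names invented for this sketch, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for a kiocb: a request with a completion callback. */
struct req_sketch {
    void (*complete)(struct req_sketch *req, long res);
    long result;      /* filled in by the demo callback below */
    int completed;
};

/* Wrapper allocated per submission; embeds the request sent to the lower fs. */
struct stacked_req {
    struct req_sketch lower;   /* handed to the underlying filesystem */
    struct req_sketch *orig;   /* the caller's original request */
};

/* Lower request finished: free the wrapper, then notify the original. */
static void stacked_complete(struct req_sketch *lower, long res)
{
    struct stacked_req *s = (struct stacked_req *)
        ((char *)lower - offsetof(struct stacked_req, lower));
    struct req_sketch *orig = s->orig;

    free(s);
    orig->complete(orig, res);
}

/* Prepare a lower request without waiting; NULL on allocation failure. */
static struct req_sketch *stacked_submit(struct req_sketch *orig)
{
    assert(orig != NULL && orig->complete != NULL);

    struct stacked_req *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->orig = orig;
    s->lower.complete = stacked_complete;
    return &s->lower;   /* a real lower fs would queue this and finish later */
}

/* Demo callback for the original request: record the result. */
static void record_result(struct req_sketch *req, long res)
{
    req->result = res;
    req->completed = 1;
}
```

The caller returns as soon as the lower request is prepared; the original request is only completed when the lower layer later invokes `lower->complete(lower, res)`, which is what lets more than one request be in flight.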
    • ovl: layer is const · 13464165
      Miklos Szeredi committed
      The ovl_layer struct is never modified except at initialization.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix corner case of non-constant st_dev;st_ino · b7bf9908
      Amir Goldstein committed
      On non-samefs overlay without xino, non pure upper inodes should use a
      pseudo_dev assigned to each unique lower fs, but if a lower layer is on
      the same fs as the upper layer, it has no pseudo_dev assigned.
      
      In this overlay layers setup:
       - two filesystems, A and B
       - upper layer is on A
       - lower layer 1 is also on A
       - lower layer 2 is on B
      
      Non pure upper overlay inode, whose origin is in layer 1 will have the
      st_dev;st_ino values of the real lower inode before copy up and the
      st_dev;st_ino values of the real upper inode after copy up.
      
      Fix this inconsistency by assigning a unique pseudo_dev also for upper fs,
      that will be used as st_dev value along with the lower inode st_dev for
      overlay inodes in the case above.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix corner case of conflicting lower layer uuid · 1b81dddd
      Amir Goldstein committed
      This fixes ovl_lower_uuid_ok() to correctly detect the corner case:
       - two filesystems, A and B, both have null uuid
       - upper layer is on A
       - lower layer 1 is also on A
       - lower layer 2 is on B
      
      In this case, bad_uuid would not have been set for B, because the check
      only involved the list of lower fs.  Hence we'll try to decode a layer 2
      origin on layer 1 and fail.
      
      We check for conflicting (and null) uuid among all lower layers, including
      those layers that are on the same fs as the upper layer.
      Reported-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
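
The corner case above can be reproduced with a simplified uuid check. The sketch below is an assumption-laden illustration, not the kernel's `ovl_lower_uuid_ok()`: `struct fs_sketch`, its `is_lower` and `bad_uuid` fields, and the function `lower_uuid_ok` are hypothetical, and a 16-byte array of zeroes stands in for a null uuid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

struct fs_sketch {
    unsigned char uuid[UUID_LEN];  /* all zeroes means "null uuid" */
    bool is_lower;                 /* hosts at least one lower layer */
    bool bad_uuid;                 /* do not decode origin handles from it */
};

/*
 * Check the uuid of a new lower fs against every known fs that hosts a
 * lower layer, including the upper fs when it also hosts one.
 * On conflict, mark the existing fs and report failure.
 */
static bool lower_uuid_ok(struct fs_sketch *fs, size_t numfs,
                          const unsigned char uuid[UUID_LEN])
{
    assert(fs != NULL || numfs == 0);

    for (size_t i = 0; i < numfs; i++) {
        if (fs[i].is_lower && memcmp(fs[i].uuid, uuid, UUID_LEN) == 0) {
            fs[i].bad_uuid = true;
            return false;
        }
    }
    return true;
}
```

With filesystem A (null uuid, hosting both the upper layer and lower layer 1) already in the array, checking filesystem B's null uuid returns false and marks A, so neither is trusted for decoding origin file handles.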
    • ovl: generalize the lower_fs[] array · 07f1e596
      Amir Goldstein committed
      Rename lower_fs[] array to fs[], extend its size by one and use index fsid
      (instead of fsid-1) to access the fs[] array.
      
      Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower
      only overlay, so fs[0] remains null in this case.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: simplify ovl_same_sb() helper · 0f831ec8
      Amir Goldstein committed
      No code uses the sb returned from this helper, so make it return a boolean
      and rename it to ovl_same_fs().
      
      The xino mode is irrelevant when all layers are on same fs, so instead of
      describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which
      is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino
      enabled.
      
      Create a new helper ovl_same_dev(), to use instead of the common check for
      (ovl_same_fs() || xinobits).
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
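
The three-way xino_mode state described above lends itself to a compact illustration. The sketch below is not the kernel source: `struct ovl_sketch` and the helper names are hypothetical, and treating a positive xino_mode as the number of xino bits is an assumption made here for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* xino_mode: 0 = all layers on same fs, -1 = xino=off, > 0 = xino enabled */
struct ovl_sketch {
    int xino_mode;
};

static bool same_fs(const struct ovl_sketch *o)
{
    assert(o->xino_mode >= -1);
    return o->xino_mode == 0;
}

/* Assumption: a positive mode value is the number of reserved xino bits. */
static unsigned int xino_bits(const struct ovl_sketch *o)
{
    return o->xino_mode > 0 ? (unsigned int)o->xino_mode : 0;
}

/* Replaces the repeated check (same_fs || xinobits). */
static bool same_dev(const struct ovl_sketch *o)
{
    return same_fs(o) || xino_bits(o) > 0;
}
```

With this encoding, `same_dev()` is true exactly when `xino_mode` is non-negative, so only the xino=off state on a multi-filesystem overlay reports different devices.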
  5. 23 Jan, 2020 2 commits
  6. 10 Dec, 2019 1 commit
    • ovl: fix lookup failure on multi lower squashfs · 7e63c87f
      Amir Goldstein committed
      In the past, overlayfs required that lower fs have non null uuid in
      order to support nfs export and decode copy up origin file handles.
      
      Commit 9df085f3 ("ovl: relax requirement for non null uuid of
      lower fs") relaxed this requirement for nfs export support, as long
      as uuid (even if null) is unique among all lower fs.
      
      However, said commit unintentionally also relaxed the non null uuid
      requirement for decoding copy up origin file handles, regardless of
      the unique uuid requirement.
      
      Amend this mistake by disabling decoding of copy up origin file handle
      from lower fs with a conflicting uuid.
      
      We still encode copy up origin file handles from those fs, because
      file handles like those already exist in the wild and because they
      might provide useful information in the future.
      
      There is an unhandled corner case described by Miklos this way:
      - two filesystems, A and B, both have null uuid
      - upper layer is on A
      - lower layer 1 is also on A
      - lower layer 2 is on B
      
      In this case bad_uuid won't be set for B, because the check only
      involves the list of lower fs.  Hence we'll try to decode a layer 2
      origin on layer 1 and fail.
      
      We will deal with this corner case later.
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Tested-by: Colin Ian King <colin.king@canonical.com>
      Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/
      Fixes: 9df085f3 ("ovl: relax requirement for non null uuid ...")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  7. 16 Jul, 2019 1 commit
  8. 19 Jun, 2019 1 commit
  9. 18 Jun, 2019 3 commits
    • ovl: fix typo in MODULE_PARM_DESC · 253e7483
      Nicolas Schier committed
      Change the first argument of the MODULE_PARM_DESC() calls so that each
      one matches the actual module parameter name.  This changes the 'parm'
      section of the output of `modinfo overlay` from:
      
          parm: ovl_check_copy_up:Obsolete; does nothing
          parm: redirect_max:ushort
          parm: ovl_redirect_max:Maximum length of absolute redirect xattr value
          parm: redirect_dir:bool
          parm: ovl_redirect_dir_def:Default to on or off for the redirect_dir feature
          parm: redirect_always_follow:bool
          parm: ovl_redirect_always_follow:Follow redirects even if redirect_dir feature is turned off
          parm: index:bool
          parm: ovl_index_def:Default to on or off for the inodes index feature
          parm: nfs_export:bool
          parm: ovl_nfs_export_def:Default to on or off for the NFS export feature
          parm: xino_auto:bool
          parm: ovl_xino_auto_def:Auto enable xino feature
          parm: metacopy:bool
          parm: ovl_metacopy_def:Default to on or off for the metadata only copy up feature
      
      into:
      
          parm: check_copy_up:Obsolete; does nothing
          parm: redirect_max:Maximum length of absolute redirect xattr value (ushort)
          parm: redirect_dir:Default to on or off for the redirect_dir feature (bool)
          parm: redirect_always_follow:Follow redirects even if redirect_dir feature is turned off (bool)
          parm: index:Default to on or off for the inodes index feature (bool)
          parm: nfs_export:Default to on or off for the NFS export feature (bool)
          parm: xino_auto:Auto enable xino feature (bool)
          parm: metacopy:Default to on or off for the metadata only copy up feature (bool)
      Signed-off-by: Nicolas Schier <n.schier@avm.de>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix bogus -Wmaybe-uninitialized warning · 1dac6f5b
      Arnd Bergmann committed
      gcc gets a bit confused by the logic in ovl_setup_trap() and
      can't figure out whether the local 'trap' variable in the caller
      was initialized or not:
      
      fs/overlayfs/super.c: In function 'ovl_fill_super':
      fs/overlayfs/super.c:1333:4: error: 'trap' may be used uninitialized in this function [-Werror=maybe-uninitialized]
          iput(trap);
          ^~~~~~~~~~
      fs/overlayfs/super.c:1312:17: note: 'trap' was declared here
      
      Reword slightly to make it easier for the compiler to understand.
      
      Fixes: 146d62e5 ("ovl: detect overlapping layers")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: don't fail with disconnected lower NFS · 9179c21d
      Miklos Szeredi committed
      NFS mounts can be disconnected from fs root.  Don't fail the overlapping
      layer check because of this.
      
      The check is not authoritative anyway, since topology can change during or
      after the check.
      
      Reported-by: Antti Antinoja <antti@fennosys.fi>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 146d62e5 ("ovl: detect overlapping layers")
  10. 29 May, 2019 1 commit
    • ovl: detect overlapping layers · 146d62e5
      Amir Goldstein committed
      Overlapping overlay layers are not supported and can cause unexpected
      behavior, but overlayfs does not currently check or warn about these
      configurations.
      
      Users are not supposed to specify the same directory for upper and
      lower dirs or for different lower layers, nor directories that are
      descendants of each other as overlay layers, but that is exactly what
      this syzbot repro did:
      
          https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000
      
      Moving layer root directories into other layers while overlayfs
      is mounted could also result in unexpected behavior.
      
      This commit places "traps" in the overlay inode hash table.
      Those traps are dummy overlay inodes that are hashed by the layers
      root inodes.
      
      On mount, the hash table trap entries are used to verify that overlay
      layers are not overlapping.  While at it, we also verify that overlay
      layers are not overlapping with directories "in-use" by other overlay
      instances as upperdir/workdir.
      
      On lookup, the trap entries are used to verify that overlay layers
      root inodes have not been moved into other layers after mount.
      
      Some examples:
      
      $ ./run --ov --samefs -s
      ...
      ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt
        mount -o bind base/lower lower
        mount -o bind base/upper upper
        mount -t overlay none mnt ...
              -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w)
      
      $ umount mnt
      $ mount -t overlay none mnt ...
              -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w
      
        [   94.434900] overlayfs: overlapping upperdir path
        mount: mount overlay on mnt failed: Too many levels of symbolic links
      
      $ mount -t overlay none mnt ...
              -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w
      
        [  151.350132] overlayfs: conflicting lowerdir path
        mount: none is already mounted or mnt busy
      
      $ mount -t overlay none mnt ...
              -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w
      
        [  201.205045] overlayfs: overlapping lowerdir path
        mount: mount overlay on mnt failed: Too many levels of symbolic links
      
      $ mount -t overlay none mnt ...
              -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w
      $ mv base/upper/0/ base/lower/
      $ find mnt/0
        mnt/0
        mnt/0/w
        find: 'mnt/0/w/work': Too many levels of symbolic links
        find: 'mnt/0/u': Too many levels of symbolic links
      
      Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  11. 02 May, 2019 1 commit
  12. 02 Nov, 2018 1 commit
    • ovl: automatically enable redirect_dir on metacopy=on · d47748e5
      Miklos Szeredi committed
      Current behavior is to automatically disable metacopy if redirect_dir is
      not enabled and proceed with the mount.
      
      If "metacopy=on" mount option was given, then this behavior can confuse the
      user: no mount failure, yet metacopy is disabled.
      
      This patch makes metacopy=on imply redirect_dir=on.
      
      The converse is also true: turning off full redirect with redirect_dir=
      {off|follow|nofollow} will disable metacopy.
      
      If both metacopy=on and redirect_dir={off|follow|nofollow} are specified,
      then mount will fail, since there's no way to correctly resolve the
      conflict.
      Reported-by: Daniel Walsh <dwalsh@redhat.com>
      Fixes: d5791044 ("ovl: Provide a mount option metacopy=on/off...")
      Cc: <stable@vger.kernel.org> # v4.19
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
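
The dependency rules in the commit above (metacopy=on implies redirect_dir=on, an explicit non-"on" redirect_dir disables metacopy, and two explicit conflicting options fail the mount) might be expressed as the following sketch. The struct and function names are hypothetical and the option parsing itself is left out; only the resolution step is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct opts_sketch {
    bool metacopy;       /* effective value (default or mount option) */
    bool redirect_dir;   /* true only for full redirect_dir=on */
    bool metacopy_opt;   /* metacopy=on was given explicitly */
    bool redirect_opt;   /* redirect_dir=... was given explicitly */
};

/* Resolve the metacopy -> redirect_dir dependency; 0 or -EINVAL. */
static int resolve_metacopy(struct opts_sketch *o)
{
    assert(o != NULL);

    if (!o->metacopy || o->redirect_dir)
        return 0;                 /* nothing to resolve */

    if (o->metacopy_opt && o->redirect_opt)
        return -EINVAL;           /* both explicit: no correct resolution */

    if (o->redirect_opt)
        o->metacopy = false;      /* explicit redirect_dir wins */
    else
        o->redirect_dir = true;   /* metacopy implies redirect_dir=on */

    return 0;
}
```

The point of the design is that an explicit user choice is never silently overridden: either the other option follows it, or the mount fails.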
  13. 27 Oct, 2018 1 commit
    • ovl: relax requirement for non null uuid of lower fs · 9df085f3
      Amir Goldstein committed
      We use uuid to associate an overlay lower file handle with a lower layer,
      so we can accept lower fs with null uuid as long as all lower layers with
      null uuid are on the same fs.
      
      This change allows enabling index and nfs_export features for the setup of
      single lower fs of type squashfs - squashfs supports file handles, but has
      a null uuid. This change also allows enabling index and nfs_export features
      for nested overlayfs, where the lower overlay has nfs_export enabled.
      
      Enabling the index feature with single lower squashfs fixes the
      unionmount-testsuite test:
        ./run --ov --squashfs --verify
      
      As a by-product, if, like the lower squashfs, upper fs also uses the
      generic export_encode_fh() implementation to export 32bit inode file
      handles (e.g. ext4), then the xino_auto config/module/mount option will
      enable unique overlay inode numbers.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  14. 10 Sep, 2018 1 commit
  15. 20 Jul, 2018 4 commits
    • ovl: Do not expose metacopy only dentry from d_real() · 2c3d7358
      Vivek Goyal committed
      Metacopy dentry/inode is internal to overlay and is never exposed outside
      of it.  Exception is metacopy upper file used for fsync().  Modify d_real()
      to look for dentries/inode which have data, but also allow matching upper
      inode without data for the fsync case.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: Store lower data inode in ovl_inode · 2664bd08
      Vivek Goyal committed
      Right now ovl_inode stores inode pointer for lower inode.  This helps with
      quickly getting lower inode given overlay inode (ovl_inode_lower()).
      
      Now with metadata only copy-up, we can have metacopy inode in middle layer
      as well and inode containing data can be different from ->lower.  I need to
      be able to open the real file in ovl_open_realfile() and for that I need to
      quickly find the lower data inode.
      
      Hence store lower data inode also in ovl_inode.  Also provide a helper
      ovl_inode_lowerdata() to access this field.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: A new xattr OVL_XATTR_METACOPY for file on upper · 0c288874
      Vivek Goyal committed
      Now we will have the capability to have upper inodes which might be only
      metadata copy up and data is still on lower inode.  So add a new xattr
      OVL_XATTR_METACOPY to distinguish between two cases.
      
      Presence of OVL_XATTR_METACOPY reflects that file has been copied up
      metadata only and data will be copied up later from lower origin.  So
      this xattr is set when a metadata copy takes place and cleared when data
      copy takes place.
      
      We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
      whether ovl inode has data or not (as opposed to metadata only copy up).
      
      If a file is copied up metadata only and later when same file is opened for
      WRITE, then data copy up takes place.  We copy up data, remove METACOPY
      xattr and then set the UPPERDATA flag in ovl_inode->flags.  While all these
      operations happen with oi->lock held, read side of oi->flags can be
      lockless.  That is another thread on another cpu can check if UPPERDATA
      flag is set or not.
      
      So this gives us an ordering requirement w.r.t UPPERDATA flag.  That is, if
      another cpu sees UPPERDATA flag set, then it should be guaranteed that
      effects of data copy up and remove xattr operations are also visible.
      
      For example:
      
      CPU1                                  CPU2
      ovl_open()                            acquire(oi->lock)
       ovl_open_maybe_copy_up()              ovl_copy_up_data()
        ovl_open_need_copy_up()              vfs_removexattr()
         ovl_already_copied_up()
          ovl_dentry_needs_data_copy_up()    ovl_set_flag(OVL_UPPERDATA)
           ovl_test_flag(OVL_UPPERDATA)     release(oi->lock)
      
      Say CPU2 is copying up data and in the end sets UPPERDATA flag.  But if
      CPU1 perceives the effects of setting UPPERDATA flag but not the effects of
      preceding operations (ex. upper that is not fully copied up), it will be a
      problem.
      
      Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
      and smp_rmb() on UPPERDATA flag test operation.
      
      Some other lock or barrier may already be covering this, but I am not
      sure what that is, or whether it is obvious enough that we will not
      break it in the future.
      
      Hence, to be safe, introduce barriers explicitly for the UPPERDATA
      flag/bit.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
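
The ordering requirement on the UPPERDATA flag can be illustrated in user space with C11 atomics. Note that this is an analogue rather than the patch itself: the commit uses the kernel's smp_wmb() and smp_rmb(), whereas the sketch below substitutes release and acquire operations from `<stdatomic.h>`, and `struct inode_sketch` with its fields is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define UPPERDATA_FLAG 1u

struct inode_sketch {
    int upper_has_data;   /* stands in for "data copied, xattr removed" */
    atomic_uint flags;
};

/* Writer side: in the commit this runs with oi->lock held. */
static void set_upperdata(struct inode_sketch *oi)
{
    assert(oi != NULL);
    oi->upper_has_data = 1;
    /* Release: the write above becomes visible before the flag does. */
    atomic_fetch_or_explicit(&oi->flags, UPPERDATA_FLAG,
                             memory_order_release);
}

/* Lockless reader side. */
static bool test_upperdata(struct inode_sketch *oi)
{
    /* Acquire: if the flag is seen, the writer's earlier writes are too. */
    return atomic_load_explicit(&oi->flags, memory_order_acquire)
           & UPPERDATA_FLAG;
}
```

A reader that observes `test_upperdata()` returning true may then rely on `upper_has_data` being set, which mirrors the guarantee the commit wants for the effects of data copy up and xattr removal.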
    • ovl: Provide a mount option metacopy=on/off for metadata copyup · d5791044
      Vivek Goyal committed
      By default metadata only copy up is disabled.  Provide a mount option so
      that users can choose one way or other.
      
      Also provide a kernel config and module option to enable/disable metacopy
      feature.
      
      metacopy feature requires redirect_dir=on when upper is present.
      Otherwise, it requires at least redirect_dir=follow.
      
      As of now, metacopy does not work with nfs_export=on.  So if both
      metacopy=on and nfs_export=on then nfs_export is disabled.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  16. 18 Jul, 2018 5 commits