1. 24 Jan 2021, 7 commits
    • overlayfs: do not mount on top of idmapped mounts · 029a52ad
      Committed by Christian Brauner
      Prevent overlayfs from being mounted on top of idmapped mounts.
      Stacking filesystems need to be prevented from being mounted on top of
      idmapped mounts until they have been converted to handle this.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-29-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • fs: make helpers idmap mount aware · 549c7297
      Committed by Christian Brauner
      Extend some inode methods with an additional user namespace argument. A
      filesystem that is aware of idmapped mounts will receive the user
      namespace the mount has been marked with. This can be used for
      additional permission checking and also to enable filesystems to
      translate between uids and gids if they need to. We have implemented all
      relevant helpers in earlier patches.
      
      As requested, we simply extend the existing inode methods instead of
      introducing new ones. This is a little more code churn but it's mostly
      mechanical and doesn't leave us with additional inode methods.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • xattr: handle idmapped mounts · c7c7a1a1
      Committed by Tycho Andersen
      When interacting with extended attributes the vfs verifies that the
      caller is privileged over the inode with which the extended attribute is
      associated. For posix access and posix default extended attributes a uid
      or gid can be stored on-disk. Let the functions handle posix extended
      attributes on idmapped mounts. If the inode is accessed through an
      idmapped mount we need to map it according to the mount's user
      namespace. Afterwards the checks are identical to non-idmapped mounts.
      This has no effect for e.g. security xattrs since they don't store uids
      or gids and don't perform permission checks on them like posix acls do.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • acl: handle idmapped mounts · e65ce2a5
      Committed by Christian Brauner
      The posix acl permission checking helpers determine whether a caller is
      privileged over an inode according to the acls associated with the
      inode. Add helpers that make it possible to handle acls on idmapped
      mounts.
      
      The vfs and the filesystems targeted by this first iteration make use of
      posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
      translate basic posix access and default permissions such as the
      ACL_USER and ACL_GROUP type according to the initial user namespace (or
      the superblock's user namespace) to and from the caller's current user
      namespace. Adapt these two helpers to handle idmapped mounts whereby we
      either map from or into the mount's user namespace depending on in which
      direction we're translating.
      Similarly, cap_convert_nscap() is used by the vfs to translate user
      namespace and non-user namespace aware filesystem capabilities from the
      superblock's user namespace to the caller's user namespace. Enable it to
      handle idmapped mounts by accounting for the mount's user namespace.
      
      In addition, the filesystems targeted in the first iteration of this patch
      series make use of the posix_acl_chmod() and posix_acl_update_mode()
      helpers. Both helpers perform permission checks on the target inode. Let
      them handle idmapped mounts. These two helpers are called when posix
      acls are set by the respective filesystems. To handle this case, we extend
      the ->set() method to take an additional user namespace argument to pass
      the mount's user namespace down.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • attr: handle idmapped mounts · 2f221d6f
      Committed by Christian Brauner
      When file attributes are changed most filesystems rely on the
      setattr_prepare(), setattr_copy(), and notify_change() helpers for
      initialization and permission checking. Let them handle idmapped mounts.
      If the inode is accessed through an idmapped mount, map it into the
      mount's user namespace. Afterwards the checks are identical to
      non-idmapped mounts. If the initial user namespace is passed nothing
      changes so non-idmapped mounts will see identical behavior as before.
      
      Helpers that perform checks on the ia_uid and ia_gid fields in struct
      iattr assume that ia_uid and ia_gid are intended values and have already
      been mapped correctly at the userspace-kernelspace boundary as we
      already do today. If the initial user namespace is passed nothing
      changes so non-idmapped mounts will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • inode: make init and permission helpers idmapped mount aware · 21cb47be
      Committed by Christian Brauner
      The inode_owner_or_capable() helper determines whether the caller is the
      owner of the inode or is capable with respect to that inode. Allow it to
      handle idmapped mounts. If the inode is accessed through an idmapped
      mount, map it according to the mount's user namespace. Afterwards the checks
      are identical to non-idmapped mounts. If the initial user namespace is
      passed nothing changes so non-idmapped mounts will see identical
      behavior as before.
      
      Similarly, allow the inode_init_owner() helper to handle idmapped
      mounts. It initializes a new inode on idmapped mounts by mapping the
      fsuid and fsgid of the caller from the mount's user namespace. If the
      initial user namespace is passed nothing changes so non-idmapped mounts
      will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
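A minimal C sketch of the idea behind these idmapped-mount helpers follows. It uses a deliberately simplified model. struct id_extent, struct userns_map, map_id_down() and owner_or_capable() are hypothetical stand-ins and not the kernel's types or signatures. The on-disk owner is first mapped through the mount's uid map, and after that the ownership check is the same as on a non-idmapped mount. The model also ignores the caller's own namespace mapping, which the real capability path checks as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* One uid_map line: ids [first, first + count) map to [lower_first, ...). */
struct id_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Hypothetical stand-in for the uid mapping of a mount's user namespace. */
struct userns_map {
    const struct id_extent *extents;
    size_t nr_extents;
};

/* Translate an on-disk id through the map; INVALID_ID if it has no mapping. */
static uint32_t map_id_down(const struct userns_map *map, uint32_t id)
{
    for (size_t i = 0; i < map->nr_extents; i++) {
        const struct id_extent *e = &map->extents[i];

        if (id >= e->first && id - e->first < e->count)
            return e->lower_first + (id - e->first);
    }
    return INVALID_ID;
}

/* Identity map standing in for the initial user namespace (a no-op). */
static const struct id_extent init_extent = { 0, 0, UINT32_MAX };
static const struct userns_map init_map = { &init_extent, 1 };

/*
 * Simplified owner check: map the inode owner into the mount's namespace,
 * then compare with the caller's fsuid exactly as for a non-idmapped mount.
 * An owner without a mapping grants no privilege, even with CAP_FOWNER.
 */
static bool owner_or_capable(const struct userns_map *mnt_map,
                             uint32_t inode_uid, uint32_t caller_fsuid,
                             bool caller_has_cap_fowner)
{
    uint32_t mapped;

    assert(mnt_map != NULL);
    mapped = map_id_down(mnt_map, inode_uid);
    if (mapped == INVALID_ID)
        return false;
    return mapped == caller_fsuid || caller_has_cap_fowner;
}
```

Passing the identity map reduces the check to a plain uid comparison. This mirrors the point made in these commits that the initial user namespace leaves non-idmapped mounts unchanged.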
    • capability: handle idmapped mounts · 0558c1bf
      Committed by Christian Brauner
      In order to determine whether a caller holds privilege over a given
      inode the capability framework exposes the two helpers
      privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
      verifies that the inode has a mapping in the caller's user namespace and
      the latter additionally verifies that the caller has the requested
      capability in their current user namespace.
      If the inode is accessed through an idmapped mount, map it into the
      mount's user namespace. Afterwards the checks are identical to
      non-idmapped inodes. If the initial user namespace is passed all
      operations are a nop so non-idmapped mounts will not see a change in
      behavior.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  2. 14 Dec 2020, 2 commits
    • ovl: unprivieged mounts · 459c7c56
      Committed by Miklos Szeredi
      Enable unprivileged user namespace mounts of overlayfs.  Overlayfs's
      permission model (*) ensures that the mounter itself cannot gain additional
      privileges by the act of creating an overlayfs mount.
      
      This feature request is coming from the "rootless" container crowd.
      
      (*) Documentation/filesystems/overlayfs.txt#Permission model
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: user xattr · 2d2f2d73
      Committed by Miklos Szeredi
      Optionally allow using "user.overlay." namespace instead of
      "trusted.overlay."
      
      This is necessary for overlayfs to be able to be mounted in an unprivileged
      namespace.

      Make the option explicit, since it makes the filesystem format
      incompatible.
      
      Disable redirect_dir and metacopy options, because these would allow
      privilege escalation through direct manipulation of the
      "user.overlay.redirect" or "user.overlay.metacopy" xattrs.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
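A hypothetical C sketch of the two rules this commit describes follows. The first rule picks the private xattr prefix. The second refuses redirect_dir and metacopy when the "user.overlay." namespace is in use. struct ovl_opts and the helper names are assumptions made for illustration, and the real overlayfs option parsing is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of overlay mount options. */
struct ovl_opts {
    bool userxattr;     /* use "user.overlay." instead of "trusted.overlay." */
    bool redirect_dir;
    bool metacopy;
};

static const char *ovl_xattr_prefix(const struct ovl_opts *o)
{
    return o->userxattr ? "user.overlay." : "trusted.overlay.";
}

/* Build e.g. "trusted.overlay.redirect" into buf; -1 if buf is too small. */
static int ovl_xattr_name(const struct ovl_opts *o, const char *suffix,
                          char *buf, size_t len)
{
    const char *prefix = ovl_xattr_prefix(o);

    assert(buf != NULL && suffix != NULL);
    if (strlen(prefix) + strlen(suffix) + 1 > len)
        return -1;
    strcpy(buf, prefix);
    strcat(buf, suffix);
    return 0;
}

/*
 * An unprivileged user can write user.* xattrs directly, so a forged
 * redirect or metacopy xattr could escalate privileges: reject that combo.
 */
static bool ovl_opts_valid(const struct ovl_opts *o)
{
    return !(o->userxattr && (o->redirect_dir || o->metacopy));
}
```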
  3. 12 Nov 2020, 2 commits
    • ovl: expand warning in ovl_d_real() · cef4cbff
      Committed by Miklos Szeredi
      There was a syzbot report with this warning but insufficient information...
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: introduce new "uuid=off" option for inodes index feature · 5830fb6b
      Committed by Pavel Tikhomirov
      This replaces the uuid with null in overlayfs file handles and thus relaxes
      uuid checks for the overlay index feature. This is only possible when there
      is only one filesystem for all the work/upper/lower directories and bare
      file handles from this backing filesystem are unique. Otherwise, when there
      are multiple filesystems, just fall back to "uuid=on", which is equivalent
      to how it worked before, with all uuid checks.

      This is needed when overlayfs is/was mounted in a container with index
      enabled (e.g. to be able to resolve inotify watch file handles on it to
      paths in CRIU), and this container is copied and started alongside the
      original one. The "copy" container can't have the same uuid on its
      superblock, so mounting the overlayfs from it later would fail.
      
      Here is an example of the problem on top of loop+ext4:
      
      dd if=/dev/zero of=loopbackfile.img bs=100M count=10
      losetup -fP loopbackfile.img
      losetup -a
        #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img)
      mkfs.ext4 loopbackfile.img
      mkdir loop-mp
      mount -o loop /dev/loop0 loop-mp
      mkdir loop-mp/{lower,upper,work,merged}
      mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
      upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
      umount loop-mp/merged
      umount loop-mp
      e2fsck -f /dev/loop0
      tune2fs -U random /dev/loop0
      
      mount -o loop /dev/loop0 loop-mp
      mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
      upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
        #mount: /loop-test/loop-mp/merged:
        #mount(2) system call failed: Stale file handle.
      
      Simply changing the uuid of the backing filesystem is enough to make the
      overlay fail to mount. In Virtuozzo we copy container disks (ploops) when
      creating a copy of a container, and we require the fs uuid to be unique
      for the new container.
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
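A minimal C sketch of the "uuid=off" rule follows, under a simplified configuration struct. struct ovl_config, its nr_fs field and the helper names are hypothetical. With uuid=off and a single backing filesystem, a file handle stores a null uuid and is compared against a null uuid, so changing the backing fs uuid with tune2fs no longer causes a mismatch. With more than one backing filesystem, the sketch falls back to uuid=on.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define UUID_SIZE 16

/* Hypothetical, simplified view of the relevant mount state. */
struct ovl_config {
    bool uuid_on;        /* false when mounted with "uuid=off" */
    unsigned int nr_fs;  /* distinct backing filesystems across all layers */
};

/* uuid=off is only honoured with a single backing fs; else behave as uuid=on. */
static bool ovl_use_uuid(const struct ovl_config *cfg)
{
    assert(cfg->nr_fs >= 1);
    return cfg->uuid_on || cfg->nr_fs > 1;
}

/* The uuid an overlay file handle should carry: the real one or all zeroes. */
static void ovl_expected_uuid(const struct ovl_config *cfg,
                              const unsigned char sb_uuid[UUID_SIZE],
                              unsigned char out[UUID_SIZE])
{
    if (ovl_use_uuid(cfg))
        memcpy(out, sb_uuid, UUID_SIZE);
    else
        memset(out, 0, UUID_SIZE);
}

/* Encoding stores the expected uuid; decoding requires it to match again. */
static bool ovl_fh_uuid_matches(const struct ovl_config *cfg,
                                const unsigned char sb_uuid[UUID_SIZE],
                                const unsigned char fh_uuid[UUID_SIZE])
{
    unsigned char expected[UUID_SIZE];

    ovl_expected_uuid(cfg, sb_uuid, expected);
    return memcmp(expected, fh_uuid, UUID_SIZE) == 0;
}
```

In the reproducer's terms, a handle encoded before "tune2fs -U random" stops matching with uuid=on. With uuid=off on a single filesystem it still matches.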
  4. 02 Sep 2020, 5 commits
    • ovl: pass ovl_fs down to functions accessing private xattrs · 610afc0b
      Committed by Miklos Szeredi
      This paves the way for optionally using the "user.overlay." xattr
      namespace.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: drop flags argument from ovl_do_setxattr() · 26150ab5
      Committed by Miklos Szeredi
      All callers pass zero flags to ovl_do_setxattr().  So drop this argument.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: adhere to the vfs_ vs. ovl_do_ conventions for xattrs · 71097047
      Committed by Miklos Szeredi
      Call ovl_do_*xattr() when accessing an overlay private xattr, vfs_*xattr()
      otherwise.
      
      This has an effect on debug output, which is made more consistent by this
      patch.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: provide a mount option "volatile" · c86243b0
      Committed by Vivek Goyal
      Container folks are complaining that dnf/yum issues too many syncs while
      installing packages, and this slows down image builds. Their build
      requirement is such that they don't care if a node goes down while a build
      is still in progress; in that case, they simply throw away the unfinished
      layer and start a new build. So they don't care about syncing intermediate
      state to disk and don't want to pay the price associated with sync.

      So they are asking for a mount option that disables sync on the overlay
      mount point.

      They primarily seem to have two use cases.

      - For building images, they mount overlay with nosync, then sync the
        upper layer after unmounting the overlay and reuse the upper as the
        lower for the next layer.

      - For running containers, they don't seem to care about syncing the upper
        layer because, if the node goes down, they simply throw away the upper
        layer and create a fresh one.

      So this patch provides a mount option "volatile" which disables all forms
      of sync. It is now the caller's responsibility to throw away the upper
      layer and start fresh if the system crashes or shuts down.
      
      With "volatile", I am seeing a roughly 20% speed-up in my VM when just
      installing emacs in an image: installation time drops from 31 seconds to
      25 seconds when the nosync option is used. This is for the case of building
      on top of an image where all packages are already cached, which takes
      network latency out of the measurement.

      Giuseppe is also looking to cut down on the number of iops done on the
      disk; in the cloud, their VMs are often throttled if they cross the iops
      limit. This option can help them reduce the number of iops (by cutting
      down on frequent syncs and writebacks).
      Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: check for incompatible features in work dir · 235ce9ed
      Committed by Amir Goldstein
      An incompatible feature is marked by a non-empty directory nested
      2 levels deep under "work" dir, e.g.:
      workdir/work/incompat/volatile.
      
      This commit checks for marked incompat features, warns about them
      and fails to mount the overlay, for example:
        overlayfs: overlay with incompat feature 'volatile' cannot be mounted
      
      Very old kernels (i.e. v3.18) will fail to remove a non-empty "work"
      dir and fail the mount.  Newer kernels will fail to remove a "work"
      dir with entries nested 3 levels deep and fall back to a read-only mount.
      
      A user mounting with an old kernel will see warnings like these in dmesg:
        overlayfs: cleanup of 'incompat/...' failed (-39)
        overlayfs: cleanup of 'work/incompat' failed (-39)
        overlayfs: cleanup of 'ovl-work/work' failed (-39)
        overlayfs: failed to create directory /vdf/ovl-work/work (errno: 17);
                   mounting read-only
      
      These warnings should give the hint to the user that:
      1. mount failure is caused by backward incompatible features
      2. mount failure can be resolved by manually removing the "work" directory
      
      There is nothing preventing users on old kernels from manually removing
      workdir entirely or mounting overlay with a new workdir, so this is in
      no way a foolproof backward compatibility enforcement, but only a best
      effort.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  5. 16 Jul 2020, 5 commits
    • ovl: fix mount option checks for nfs_export with no upperdir · f0e1266e
      Committed by Amir Goldstein
      Without the upperdir mount option, there is no index dir, so the
      dependency check nfs_export => index in mount option parsing is incorrect.
      
      Allow the combination nfs_export=on,index=off with no upperdir and move
      the check for dependency redirect_dir=nofollow for non-upper mount case
      to mount options parsing.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: force read-only sb on failure to create index dir · 470c1563
      Committed by Amir Goldstein
      With the index feature enabled, overlay is mounted read-only when it
      fails to create the index dir.  However, we do not forbid the user from
      remounting the overlay read-write.  Fix that by setting ofs->workdir to
      NULL, which prevents a read-write remount.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix regression with re-formatted lower squashfs · a888db31
      Committed by Amir Goldstein
      Commit 9df085f3 ("ovl: relax requirement for non null uuid of lower
      fs") relaxed the requirement for non null uuid with single lower layer to
      allow enabling index and nfs_export features with single lower squashfs.
      
      Fabian reported a regression in a setup where overlay re-uses an existing
      upper layer and the lower squashfs image is re-formatted.  Because squashfs
      has no uuid, the origin xattrs in the upper layer are decoded from the new
      lower layer, where they may resolve to a wrong origin file, and the user
      may get an ESTALE or EIO error on lookup.
      
      To avoid the reported regression while still allowing the new features
      with single lower squashfs, do not allow decoding origin with lower null
      uuid unless user opted-in to one of the new features that require
      following the lower inode of non-dir upper (index, xino, metacopy).
      Reported-by: Fabian <godi.beat@gmx.net>
      Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/
      Fixes: 9df085f3 ("ovl: relax requirement for non null uuid of lower fs")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on · 20396365
      Committed by Amir Goldstein
      Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic
      since v5.8-rc1 overlayfs updates.
      
       overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2)
       BUG: kernel NULL pointer dereference, address: 0000000000000030
       RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay]
      
      Bisect points at commit c21c839b ("ovl: whiteout inode sharing")
      
      Minimal reproducer:
      --------------------------------------------------
      rm -rf l u w m
      mkdir -p l u w m
      mkdir -p l/testdir
      touch l/testdir/testfile
      mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
      echo 1 > m/testdir/testfile
      umount m
      rm -rf u/testdir
      mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
      umount m
      --------------------------------------------------
      
      When mounting with nfs_export=on and failing to verify an orphan index, we
      clean this index from indexdir by calling ovl_cleanup_and_whiteout().
      This dereferences ofs->workdir, which was earlier set to NULL.
      
      The design was that ovl->workdir would point at ovl->indexdir, but we are
      assigning ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup().
      There is no reason not to do it sooner, because once we get success from
      ofs->indexdir = ovl_workdir_create(... there is no turning back.
      Reported-and-tested-by: Murphy Zhou <jencce.kernel@gmail.com>
      Fixes: c21c839b ("ovl: whiteout inode sharing")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: inode reference leak in ovl_is_inuse true case. · 24f14009
      Committed by youngjun
      When ovl_is_inuse() returns true, the trap inode reference is not put.
      Also add a comment explaining the sequence of ovl_is_inuse() after
      ovl_setup_trap().
      
      Fixes: 0be0bfd2 ("ovl: fix regression caused by overlapping layers detection")
      Cc: <stable@vger.kernel.org> # v4.19+
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: youngjun <her0gyugyu@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  6. 08 Jun 2020, 1 commit
  7. 04 Jun 2020, 3 commits
    • ovl: make private mounts longterm · df820f8d
      Committed by Miklos Szeredi
      Overlayfs is using clone_private_mount() to create internal mounts for
      underlying layers.  These are used for operations requiring a path, such as
      dentry_open().
      
      Since these private mounts are not in any namespace they are treated as
      short term, "detached" mounts and mntput() involves taking the global
      mount_lock, which can result in serious cacheline pingpong.
      
      Make these private mounts longterm instead, which trades the penalty on
      mntput() for a slightly longer shutdown time due to an added RCU grace
      period when putting these mounts.
      
      Introduce a new helper kern_unmount_many() that can take care of multiple
      longterm mounts with a single RCU grace period.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: get rid of redundant members in struct ovl_fs · b8e42a65
      Committed by Miklos Szeredi
      ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be
      used instead of a separate ->upperdir_trap.
      
      Split the lowerdir option early to get the number of layers, then allocate
      the ->layers array, and finally fill the upper and lower layers, as before.
      
      Get rid of path_put_init() in ovl_lower_dir(), since the only caller will
      take care of that.
      
      [Colin Ian King] Fix null pointer dereference on null stack pointer on
      error return found by Coverity.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: add accessor for ofs->upper_mnt · 08f4c7c8
      Committed by Miklos Szeredi
      Next patch will remove ofs->upper_mnt, so add an accessor function for this
      field.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  8. 13 May 2020, 6 commits
  9. 27 Mar 2020, 3 commits
    • ovl: enable xino automatically in more cases · 926e94d7
      Committed by Amir Goldstein
      So far, with xino=auto, we only enable xino if we know that all
      underlying filesystems use 32-bit inode numbers.

      When users configure overlay with xino=auto, they already declare that
      they are ready to handle 64-bit inode numbers from overlay.

      It is very common for an underlying filesystem to use 64-bit inode numbers
      but rarely or never use the high inode number bits (e.g. tmpfs, xfs).
      Leaving it to users to declare with xino=on that the high ino bits are
      unused is not a recipe for many users to enjoy the benefits of xino.
      
      There appears to be very little reason not to enable xino when users
      declare xino=auto even if we do not know how many bits underlying
      filesystem uses for inode numbers.
      
      In the worst case of xino bits overflow by real inode number, we
      already fall back to the non-xino behavior - real inode number with
      unique pseudo dev or to non persistent inode number and overlay st_dev
      (for directories).
      
      The only annoyance from auto enabling xino is that xino bits overflow
      emits a warning to kmsg. Suppress those warnings unless users explicitly
      asked for xino=on, suggesting that they expected high ino bits to be
      unused by underlying filesystem.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: avoid possible inode number collisions with xino=on · dfe51d47
      Committed by Amir Goldstein
      When xino feature is enabled and a real directory inode number overflows
      the lower xino bits, we cannot map this directory inode number to a unique
      and persistent inode number and we fall back to the real inode st_ino and
      overlay st_dev.
      
      The real inode st_ino with high bits may collide with a lower inode number
      on overlay st_dev that was mapped using xino.
      
      To avoid possible collision with legitimate xino values, map a non
      persistent inode number to a dedicated range in the xino address space.
      The dedicated range is created by adding one more bit to the number of
      reserved high xino bits.  We could have added just one more fsid, but that
      would have had the undesired effect of changing persistent overlay inode
      numbers on kernel or require more complex xino mapping code.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
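A hypothetical C sketch of the resulting st_ino layout follows. It rests on one reading of this commit: the extra reserved bit is the lowest of the high xino bits and marks the non-persistent range, while the bits above it hold the layer's fsid (0 for the upper layer). struct xino_layout and both helpers are illustrative names only. Under this layout, every persistent mapping leaves the marker bit clear and every non-persistent number sets it, so the two ranges cannot collide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: xinobits = fsid bits + 1 marker bit, taken from the
 * top of a 64-bit st_ino. */
struct xino_layout {
    unsigned int xinobits;
};

/*
 * Persistent mapping: real ino below the reserved bits, fsid above the
 * marker bit. Returns false on overflow; the caller must then fall back.
 */
static bool xino_map(const struct xino_layout *l, uint64_t real_ino,
                     unsigned int fsid, uint64_t *ovl_ino)
{
    unsigned int shift;

    assert(l->xinobits >= 2 && l->xinobits < 64);
    assert(((uint64_t)fsid >> (l->xinobits - 1)) == 0);   /* fsid must fit */
    shift = 64 - l->xinobits;
    if (real_ino >> shift)
        return false;
    *ovl_ino = real_ino | ((uint64_t)fsid << (shift + 1));
    return true;
}

/* Non-persistent ino: counter value below the reserved bits, marker bit set. */
static uint64_t xino_nonpersistent(const struct xino_layout *l, uint64_t counter)
{
    unsigned int shift = 64 - l->xinobits;
    uint64_t mask = (UINT64_C(1) << shift) - 1;

    return (counter & mask) | (UINT64_C(1) << shift);
}
```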
    • ovl: use a private non-persistent ino pool · 4d314f78
      Committed by Amir Goldstein
      There is no reason to deplete the system's global get_next_ino() pool for
      overlay non-persistent inode numbers and there is no reason at all to
      allocate non-persistent inode numbers for non-directories.
      
      For non-directories, it is much better to leave i_ino the same as real
      i_ino, to be consistent with st_ino/d_ino.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  10. 17 Mar 2020, 6 commits