1. 24 1月, 2018 15 次提交
    • A
      ovl: cleanup temp index entries · 9ee60ce2
      Amir Goldstein 提交于
      A previous failed attempt to create or whiteout a directory index may
      leave index entries named '#%x' in the index dir. Cleanup those temp
      entries on mount instead of failing the mount.
      
      In the future, we may drop 'work' dir and use 'index' dir instead.
      This change is enough for cleaning up copy up leftovers 'from the future',
      but it is not enough for cleaning up rmdir leftovers 'from the future'
      (i.e. temp dir containing whiteouts).
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9ee60ce2
    • A
      ovl: verify directory index entries on mount · e8f9e5b7
      Amir Goldstein 提交于
      Directory index entries should have 'upper' xattr pointing to the real
      upper dir. Verifying that the upper dir file handle is not stale is
      expensive, so only verify stale directory index entries on mount if
      NFS export feature is enabled.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      e8f9e5b7
    • A
      ovl: verify whiteout index entries on mount · 7db25d36
      Amir Goldstein 提交于
      Whiteout index entries are used as an indication that an exported
      overlay file handle should be treated as stale (i.e. after unlink
      of the overlay inode).
      
      Check on mount that whiteout index entries have a name that looks like
      a valid file handle and cleanup invalid index entries.
      
      For whiteout index entries, do not check that they also have valid
      origin fh and nlink xattr, because those xattr do not exist for a
      whiteout index entry.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      7db25d36
    • A
      ovl: use directory index entries for consistency verification · ad1d615c
      Amir Goldstein 提交于
      A directory index is a directory type entry in index dir with a
      "trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
      directory upper dir inode.
      
      On lookup of non-dir files, lower file is followed by origin file handle.
      On lookup of dir entries, lower dir is found by name and then compared
      to origin file handle. We only trust dir index if we verified that lower
      dir matches origin file handle, otherwise index may be inconsistent and
      we ignore it.
      
      If we find an indexed non-upper dir or an indexed merged dir, whose
      index 'upper' xattr points to a different upper dir, that means that the
      lower directory may be also referenced by another upper dir via redirect,
      so we fail the lookup on inconsistency error.
      
      To be consistent with directory index entries format, the association of
      index dir to upper root dir, that was stored by older kernels in
      "trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
      xattr. This also serves as an indication that overlay was mounted with a
      kernel that support index directory entries. For backward compatibility,
      if an 'origin' xattr exists on the index dir we also verify it on mount.
      
      Directory index entries are going to be used for NFS export.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      ad1d615c
    • A
      ovl: unbless lower st_ino of unverified origin · 86eaa130
      Amir Goldstein 提交于
      On a malformed overlay, several redirected dirs can point to the same
      dir on a lower layer. This presents a similar challenge as broken
      hardlinks, because different objects in the overlay can return the same
      st_ino/st_dev pair from stat(2).
      
      For broken hardlinks, we do not provide constant st_ino on copy up to
      avoid this inconsistency. When NFS export feature is enabled, apply
      the same logic to files and directories with unverified lower origin.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      86eaa130
    • A
      ovl: verify stored origin fh matches lower dir · 37b12916
      Amir Goldstein 提交于
      When the NFS export feature is enabled, overlayfs implicitly enables the
      feature "verify_lower". When the "verify_lower" feature is enabled, a
      directory inode found in lower layer by name or by redirect_dir is
      verified against the file handle of the copy up origin that is stored in
      the upper layer.
      
      This introduces a change of behavior for the case of lower layer
      modification while overlay is offline. A lower directory created or
      moved offline under an exisitng upper directory, will not be merged with
      that upper directory.
      
      The NFS export feature should not be used after copying layers, because
      the new lower directory inodes would fail verification and won't be
      merged with upper directories.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      37b12916
    • A
      ovl: add support for "nfs_export" configuration · f168f109
      Amir Goldstein 提交于
      Introduce the "nfs_export" config, module and mount options.
      
      The NFS export feature depends on the "index" feature and enables two
      implicit overlayfs features: "index_all" and "verify_lower".
      The "index_all" feature creates an index on copy up of every file and
      directory. The "verify_lower" feature uses the full index to detect
      overlay filesystems inconsistencies on lookup, like redirect from
      multiple upper dirs to the same lower dir.
      
      NFS export can be enabled for non-upper mount with no index. However,
      because lower layer redirects cannot be verified with the index, enabling
      NFS export support on an overlay with no upper layer requires turning off
      redirect follow (e.g. "redirect_dir=nofollow").
      
      The full index may incur some overhead on mount time, especially when
      verifying that lower directory file handles are not stale.
      
      NFS export support, full index and consistency verification will be
      implemented by following patches.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      f168f109
    • A
      ovl: update documentation of inodes index feature · 60b86642
      Amir Goldstein 提交于
      Document that inode index feature solves breaking hard links on
      copy up.
      
      Simplify Kconfig backward compatibility disclaimer.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      60b86642
    • A
      ovl: generalize ovl_verify_origin() and helpers · 05122443
      Amir Goldstein 提交于
      Remove the "origin" language from the functions that handle set, get
      and verify of "origin" xattr and pass the xattr name as an argument.
      
      The same helpers are going to be used for NFS export to get, get and
      verify the "upper" xattr for directory index entries.
      
      ovl_verify_origin() is now a helper used only to verify non upper
      file handle stored in "origin" xattr of upper inode.
      
      The upper root dir file handle is still stored in "origin" xattr on
      the index dir for backward compatibility. This is going to be changed
      by the patch that adds directory index entries support.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      05122443
    • A
      ovl: simplify arguments to ovl_check_origin_fh() · 1eff1a1d
      Amir Goldstein 提交于
      Pass the fs instance with lower_layers array instead of the dentry
      lowerstack array to ovl_check_origin_fh(), because the dentry members
      of lowerstack play no role in this helper.
      
      This change simplifies the argument list of ovl_check_origin(),
      ovl_cleanup_index() and ovl_verify_index().
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      1eff1a1d
    • A
      ovl: factor out ovl_check_origin_fh() · 2e1a5328
      Amir Goldstein 提交于
      Re-factor ovl_check_origin() and ovl_get_origin(), so origin fh xattr is
      read from upper inode only once during lookup with multiple lower layers
      and only once when verifying index entry origin.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2e1a5328
    • A
      ovl: store layer index in ovl_layer · d583ed7d
      Amir Goldstein 提交于
      Store the fs root layer index inside ovl_layer struct, so we can
      get the root fs layer index from merge dir lower layer instead of
      find it with ovl_find_layer() helper.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d583ed7d
    • A
      ovl: force r/o mount when index dir creation fails · 972d0093
      Amir Goldstein 提交于
      When work dir creation fails, a warning is emitted and overlay is
      mounted r/o. Trying to remount r/w will fail with no work dir.
      
      When index dir creation fails, the same warning is emitted and overlay
      is mounted r/o, but trying to remount r/w will succeed. This may cause
      unintentional corruption of filesystem consistency.
      
      Adjust the behavior of index dir creation failure to that of work dir
      creation failure and do not allow to remount r/w. User needs to state
      an explicitly intention to work without an index by mounting with
      option 'index=off' to allow r/w mount with no index dir.
      
      When mounting with option 'index=on' and no 'upperdir', index is
      implicitly disabled, so do not warn about no file handle support.
      
      The issue was introduced with inodes index feature in v4.13, but this
      patch will not apply cleanly before ovl_fill_super() re-factoring in
      v4.15.
      
      Fixes: 02bcd157 ("ovl: introduce the inodes index dir feature")
      Cc: <stable@vger.kernel.org> #v4.13
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      972d0093
    • A
      ovl: disable index when no xattr support · a683737b
      Amir Goldstein 提交于
      Overlayfs falls back to index=off if lower/upper fs does not support
      file handles. Do the same if upper fs does not support xattr.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a683737b
    • A
      ovl: fix inconsistent d_ino for legacy merge dir · 9678e630
      Amir Goldstein 提交于
      For a merge dir that was copied up before v4.12 or that was hand crafted
      offline (e.g. mkdir {upper/lower}/dir), upper dir does not contain the
      'trusted.overlay.origin' xattr.  In that case, stat(2) on the merge dir
      returns the lower dir st_ino, but getdents(2) returns the upper dir d_ino.
      
      After this change, on merge dir lookup, missing origin xattr on upper
      dir will be fixed and 'impure' xattr will be fixed on parent of the legacy
      merge dir.
      Suggested-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9678e630
  2. 20 1月, 2018 4 次提交
    • A
      ovl: take mnt_want_write() for removing impure xattr · a5a927a7
      Amir Goldstein 提交于
      The optimization in ovl_cache_get_impure() that tries to remove an
      unneeded "impure" xattr needs to take mnt_want_write() on upper fs.
      
      Fixes: 4edb83bb ("ovl: constant d_ino for non-merge dirs")
      Cc: <stable@vger.kernel.org> #v4.14
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a5a927a7
    • A
      ovl: take mnt_want_write() for work/index dir setup · 2ba9d57e
      Amir Goldstein 提交于
      There are several write operations on upper fs not covered by
      mnt_want_write():
      
      - test set/remove OPAQUE xattr
      - test create O_TMPFILE
      - set ORIGIN xattr in ovl_verify_origin()
      - cleanup of index entries in ovl_indexdir_cleanup()
      
      Some of these go way back, but this patch only applies over the
      v4.14 re-factoring of ovl_fill_super().
      
      Cc: <stable@vger.kernel.org> #v4.14
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2ba9d57e
    • A
      ovl: fix another overlay: warning prefix · f8167817
      Amir Goldstein 提交于
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      f8167817
    • A
      ovl: take lower dir inode mutex outside upper sb_writers lock · 6d0a8a90
      Amir Goldstein 提交于
      The functions ovl_lower_positive() and ovl_check_empty_dir() both take
      inode mutex on the real lower dir under ovl_want_write() which takes
      the upper_mnt sb_writers lock.
      
      While this is not a clear locking order or layering violation, it creates
      an undesired lock dependency between two unrelated layers for no good
      reason.
      
      This lock dependency materializes to a false(?) positive lockdep warning
      when calling rmdir() on a nested overlayfs, where both nested and
      underlying overlayfs both use the same fs type as upper layer.
      
      rmdir() on the nested overlayfs creates the lock chain:
        sb_writers of upper_mnt (e.g. tmpfs) in ovl_do_remove()
        ovl_i_mutex_dir_key[] of lower overlay dir in ovl_lower_positive()
      
      rmdir() on the underlying overlayfs creates the lock chain in
      reverse order:
        ovl_i_mutex_dir_key[] of lower overlay dir in vfs_rmdir()
        sb_writers of nested upper_mnt (e.g. tmpfs) in ovl_do_remove()
      
      To rid of the unneeded locking dependency, move both ovl_lower_positive()
      and ovl_check_empty_dir() to before ovl_want_write() in rmdir() and
      rename() implementation.
      
      This change spreads the pieces of ovl_check_empty_and_clear() directly
      inside the rmdir()/rename() implementations so the helper is no longer
      needed and removed.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      6d0a8a90
  3. 19 1月, 2018 2 次提交
    • A
      ovl: fix failure to fsync lower dir · d796e77f
      Amir Goldstein 提交于
      As a writable mount, it is not expected for overlayfs to return
      EINVAL/EROFS for fsync, even if dir/file is not changed.
      
      This commit fixes the case of fsync of directory, which is easier to
      address, because overlayfs already implements fsync file operation for
      directories.
      
      The problem reported by Raphael is that new PostgreSQL 10.0 with a
      database in overlayfs where lower layer in squashfs fails to start.
      The failure is due to fsync error, when PostgreSQL does fsync on all
      existing db directories on startup and a specific directory exists
      lower layer with no changes.
      Reported-by: NRaphael Hertzog <raphael@ouaza.com>
      Cc: <stable@vger.kernel.org> # v3.18
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Tested-by: NRaphaël Hertzog <hertzog@debian.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d796e77f
    • A
      ovl: hash directory inodes for fsnotify · 31747eda
      Amir Goldstein 提交于
      fsnotify pins a watched directory inode in cache, but if directory dentry
      is released, new lookup will allocate a new dentry and a new inode.
      Directory events will be notified on the new inode, while fsnotify listener
      is watching the old pinned inode.
      
      Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
      dirs are hashes by real upper inode, merge and lower dirs are hashed by
      real lower inode.
      
      The reference to lower inode was being held by the lower dentry object
      in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
      may drop lower inode refcount to zero. Add a refcount on behalf of the
      overlay inode to prevent that.
      
      As a by-product, hashing directory inodes also detects multiple
      redirected dirs to the same lower dir and uncovered redirected dir
      target on and returns -ESTALE on lookup.
      
      The reported issue dates back to initial version of overlayfs, but this
      patch depends on ovl_inode code that was introduced in kernel v4.13.
      
      Cc: <stable@vger.kernel.org> #v4.13
      Reported-by: NNiklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Tested-by: NNiklas Cassel <niklas.cassel@axis.com>
      31747eda
  4. 15 1月, 2018 6 次提交
    • L
      Linux 4.15-rc8 · a8750ddc
      Linus Torvalds 提交于
      a8750ddc
    • L
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aaae98a8
      Linus Torvalds 提交于
      Pull x86 fixlet from Thomas Gleixner.
      
      Remove a warning about lack of compiler support for retpoline that most
      people can't do anything about, so it just annoys them needlessly.
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/retpoline: Remove compile time warning
      aaae98a8
    • L
      Merge tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 6bb82119
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "One fix for an oops at boot if we take a hotplug interrupt before we
        are ready to handle it.
      
        The bulk is patches to implement mitigation for Meltdown, see the
        change logs for more details.
      
        Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
        Masters, Jose Ricardo Ziviani, David Gibson"
      
      * tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Check device-tree for RFI flush settings
        powerpc/pseries: Query hypervisor for RFI flush settings
        powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
        powerpc/64s: Add support for RFI flush of L1-D cache
        powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
        powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
        powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
        powerpc/64s: Simple RFI macro conversions
        powerpc/64: Add macros for annotating the destination of rfid/hrfid
        powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
        powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
      6bb82119
    • T
      x86/retpoline: Remove compile time warning · b8b9ce4b
      Thomas Gleixner 提交于
      Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
      does not have retpoline support. Linus rationale for this is:
      
        It's wrong because it will just make people turn off RETPOLINE, and the
        asm updates - and return stack clearing - that are independent of the
        compiler are likely the most important parts because they are likely the
        ones easiest to target.
      
        And it's annoying because most people won't be able to do anything about
        it. The number of people building their own compiler? Very small. So if
        their distro hasn't got a compiler yet (and pretty much nobody does), the
        warning is just annoying crap.
      
        It is already properly reported as part of the sysfs interface. The
        compile-time warning only encourages bad things.
      
      Fixes: 76b04384 ("x86/retpoline: Add initial retpoline support")
      Requested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
      b8b9ce4b
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 9443c168
      Linus Torvalds 提交于
      Pull NVMe fix from Jens Axboe:
       "Just a single fix for nvme over fabrics that should go into 4.15"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvme-fabrics: initialize default host->id in nvmf_host_default()
      9443c168
    • L
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 40548c6b
      Linus Torvalds 提交于
      Pull x86 pti updates from Thomas Gleixner:
       "This contains:
      
         - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
           disabled. This seems to cause issues on a virtual machine at least
           and is incorrect according to the AMD manual.
      
         - a PTI bugfix which disables the perf BTS facility if PTI is
           enabled. The BTS AUX buffer is not globally visible and causes the
           CPU to fault when the mapping disappears on switching CR3 to user
           space. A full fix which restores BTS on PTI is non trivial and will
           be worked on.
      
         - PTI bugfixes for EFI and trusted boot which make sure that the user
           space visible page table entries have the NX bit cleared
      
         - removal of dead code in the PTI pagetable setup functions
      
         - add PTI documentation
      
         - add a selftest for vsyscall to verify that the kernel actually
           implements what it advertises.
      
         - a sysfs interface to expose vulnerability and mitigation
           information so there is a coherent way for users to retrieve the
           status.
      
         - the initial spectre_v2 mitigations, aka retpoline:
      
            + The necessary ASM thunk and compiler support
      
            + The ASM variants of retpoline and the conversion of affected ASM
              code
      
            + Make LFENCE serializing on AMD so it can be used as speculation
              trap
      
            + The RSB fill after vmexit
      
         - initial objtool support for retpoline
      
        As I said in the status mail this is the most of the set of patches
        which should go into 4.15 except two straight forward patches still on
        hold:
      
         - the retpoline add on of LFENCE which waits for ACKs
      
         - the RSB fill after context switch
      
        Both should be ready to go early next week and with that we'll have
        covered the major holes of spectre_v2 and go back to normality"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
        x86,perf: Disable intel_bts when PTI
        security/Kconfig: Correct the Documentation reference for PTI
        x86/pti: Fix !PCID and sanitize defines
        selftests/x86: Add test_vsyscall
        x86/retpoline: Fill return stack buffer on vmexit
        x86/retpoline/irq32: Convert assembler indirect jumps
        x86/retpoline/checksum32: Convert assembler indirect jumps
        x86/retpoline/xen: Convert Xen hypercall indirect jumps
        x86/retpoline/hyperv: Convert assembler indirect jumps
        x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
        x86/retpoline/entry: Convert entry assembler indirect jumps
        x86/retpoline/crypto: Convert crypto assembler indirect jumps
        x86/spectre: Add boot time option to select Spectre v2 mitigation
        x86/retpoline: Add initial retpoline support
        objtool: Allow alternatives to be ignored
        objtool: Detect jumps to retpoline thunks
        x86/pti: Make unpoison of pgd for trusted boot work for real
        x86/alternatives: Fix optimize_nops() checking
        sysfs/cpu: Fix typos in vulnerability documentation
        x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
        ...
      40548c6b
  5. 14 1月, 2018 13 次提交