1. 02 Sep, 2018 3 commits
  2. 01 Sep, 2018 1 commit
  3. 31 Aug, 2018 4 commits
  4. 30 Aug, 2018 5 commits
  5. 27 Aug, 2018 3 commits
  6. 24 Aug, 2018 12 commits
  7. 23 Aug, 2018 12 commits
    • powerpc/mce: Fix SLB rebolting during MCE recovery path. · 0f52b3a0
      Authored by Mahesh Salgaonkar
      The commit e7e81847 ("powerpc/64s: move machine check SLB flushing
      to mm/slb.c") introduced a bug in reloading bolted SLB entries. Unused
      bolted entries are stored with .esid=0 in the slb_shadow area, and
      that value is now used directly as the RB input to slbmte, which means
      the RB[52:63] index field is set to 0, which causes SLB entry 0 to be
      cleared.
      
      Fix this by storing the index bits in the unused bolted entries, which
      directs the slbmte to the right place.
      
      The SLB shadow area is also used by the hypervisor, but PAPR is okay
      with that, from LoPAPR v1.1, 14.11.1.3 SLB Shadow Buffer:
      
        Note: SLB is filled sequentially starting at index 0
        from the shadow buffer ignoring the contents of
        RB field bits 52-63
      
      Fixes: e7e81847 ("powerpc/64s: move machine check SLB flushing to mm/slb.c")
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
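
The SLB index problem in the commit above can be illustrated with a toy model. This is a sketch in Python, not kernel code: the function names are hypothetical, and the model assumes only what the message states, namely that RB[52:63] (the low 12 bits of the 64-bit value in IBM bit numbering) selects the SLB entry that slbmte writes.

```python
# Toy model of the slbmte RB operand; names are hypothetical, not kernel APIs.
SLB_INDEX_MASK = 0xFFF  # RB[52:63] in IBM bit numbering = low 12 bits


def rb_index(rb: int) -> int:
    """Return the SLB entry number that an RB value selects."""
    return rb & SLB_INDEX_MASK


def unused_shadow_esid_before(index: int) -> int:
    """Before the fix: an unused bolted slot is stored with .esid = 0."""
    return 0


def unused_shadow_esid_after(index: int) -> int:
    """After the fix: the unused slot carries its own index bits
    (in this model, nothing else is set)."""
    return index & SLB_INDEX_MASK


# Reloading unused bolted slot 2 from the shadow area:
print(rb_index(unused_shadow_esid_before(2)))  # entry 0 is hit by mistake
print(rb_index(unused_shadow_esid_after(2)))   # entry 2 is hit as intended
```

With the index bits stored, each unused slot is directed at its own SLB entry, so entry 0 is no longer cleared as a side effect.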
    • KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages · 8cfbdbdc
      Authored by Paul Mackerras
      Commit 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in
      the pinned physical page", 2018-07-17) added some checks to ensure
      that guest DMA mappings don't attempt to map more than the guest is
      entitled to access. However, errors in the logic mean that legitimate
      guest requests to map pages for DMA are being denied in some
      situations. Specifically, if the first page of the range passed to
      mm_iommu_get() is mapped with a normal page, and subsequent pages are
      mapped with transparent huge pages, we end up with mem->pageshift ==
      0. That means that the page size checks in mm_iommu_ua_to_hpa() and
      mm_iommu_ua_to_hpa_rm() will always fail for every page in that
      region, and thus the guest can never map any memory in that region for
      DMA, typically leading to a flood of error messages like this:
      
        qemu-system-ppc64: VFIO_MAP_DMA: -22
        qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)
      
      The logic errors in mm_iommu_get() are:
      
        (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
            call (meaning that find_linux_pte() returns the pte for the
            first address in the range, not the address we are currently up
            to);
        (b) use of 'pageshift' as the variable to receive the hugepage shift
            returned by find_linux_pte() - for a normal page this gets set
            to 0, leading to us setting mem->pageshift to 0 when we conclude
            that the pte returned by find_linux_pte() didn't match the page
            we were looking at;
        (c) comparing 'compshift', which is a page order, i.e. log base 2 of
            the number of pages, with 'pageshift', which is a log base 2 of
            the number of bytes.
      
      To fix these problems, this patch introduces 'cur_ua' to hold the
      current user address and uses that in the find_linux_pte() call;
      introduces 'pteshift' to hold the hugepage shift found by
      find_linux_pte(); and compares 'pteshift' with 'compshift +
      PAGE_SHIFT' rather than 'compshift'.
      
      The patch also moves the local_irq_restore to the point after the PTE
      pointer returned by find_linux_pte() has been dereferenced because
      otherwise the PTE could change underneath us, and adds a check to
      avoid doing the find_linux_pte() call once mem->pageshift has been
      reduced to PAGE_SHIFT, as an optimization.
      
      Fixes: 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
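
The three logic errors listed in the commit above, and their fixes, can be followed in a small Python model of the per-page loop. It is a sketch under stated assumptions, not the kernel function: `PAGE_SHIFT = 16` (64 KiB base pages), the tuple representation of a pinned page, the `initial_shift` parameter, and the use of an equality comparison are all choices made for illustration.

```python
PAGE_SHIFT = 16  # assumption: 64 KiB base pages


def region_pageshift(pages, initial_shift):
    """Model of how the region's page shift is narrowed, after the fix.

    pages: one (compshift, pteshift, pte_matches) tuple per pinned page
      compshift   - compound page order (log2 of the number of base pages),
                    0 for a normal page
      pteshift    - hugepage shift reported by the pte lookup done at *this*
                    page's address (log2 of bytes), 0 for a normal pte
      pte_matches - whether the pte still points at the pinned page
    """
    mem_pageshift = initial_shift
    for compshift, pteshift, pte_matches in pages:
        pageshift = PAGE_SHIFT  # default: base page size, never 0
        # Skip the lookup once the region is already at the base page size.
        if mem_pageshift > PAGE_SHIFT and compshift > 0:
            # Compare bytes with bytes: order + PAGE_SHIFT, not the bare order.
            if pte_matches and pteshift == compshift + PAGE_SHIFT:
                pageshift = max(pteshift, PAGE_SHIFT)
        mem_pageshift = min(mem_pageshift, pageshift)
    return mem_pageshift


# First page normal, the rest backed by 16 MiB huge pages (order 8):
mixed = [(0, 0, True), (8, 24, True), (8, 24, True)]
print(region_pageshift(mixed, 64))
```

Because the hugepage shift lands in its own variable, a normal first page narrows the region to `PAGE_SHIFT` rather than to 0, which is the case that previously made every later page-size check fail.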
    • powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition · f08d08f3
      Authored by Aneesh Kumar K.V
      The Nest MMU workaround is only needed for RW upgrades. Avoid doing
      that for other PTE updates.
      
      We also avoid clearing the PTE while marking it invalid, because other
      page table walkers would then see the PTE as none, which can lead to
      unexpected behaviour. Instead we clear _PAGE_PRESENT and set the
      software PTE bit _PAGE_INVALID. pte_present() is already updated to
      check for both bits, so page table walkers still find the PTE present
      and helpers like pte_pfn(pte) return the right value.
      
      Based on an original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
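
The "R -> RW transition" condition from the commit above can be written as a one-line predicate. The sketch below is a Python model with hypothetical bit values and a hypothetical function name; it shows only the idea that the workaround applies when write permission is being added, not how the kernel tests for it.

```python
PAGE_READ = 0x1   # hypothetical bit values, chosen only for this sketch
PAGE_WRITE = 0x2


def is_r_to_rw_upgrade(old_flags: int, new_flags: int) -> bool:
    """True only when write permission is added to a PTE that lacked it."""
    return bool(new_flags & PAGE_WRITE) and not (old_flags & PAGE_WRITE)


# Read-only to read-write: the workaround path is taken.
print(is_r_to_rw_upgrade(PAGE_READ, PAGE_READ | PAGE_WRITE))
# Any other PTE update: the workaround is skipped.
print(is_r_to_rw_upgrade(PAGE_READ | PAGE_WRITE, PAGE_READ))
```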
    • powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid. · bd0dbb73
      Authored by Aneesh Kumar K.V
      When splitting a huge pmd pte, we need to mark the pmd entry invalid. We
      can do that by clearing the _PAGE_PRESENT bit, but then the entry would be
      taken as a swap pte. To differentiate between the two, use a software pte
      bit when invalidating.
      
      For a regular pte, due to bd5050e3 ("powerpc/mm/radix: Change pte relax
      sequence to handle nest MMU hang") we need to mark the pte entry invalid when
      relaxing access permission. Instead of marking it pte_none, which can result
      in different page table walk routines possibly skipping this pte entry,
      invalidate it but still keep it marked present.
      
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
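
The distinction the commit above draws between a none pte, a swap pte, and a temporarily invalid pte can be modeled with two flag bits. This is a Python sketch: the bit positions, `PFN_SHIFT`, and the helper bodies are hypothetical stand-ins that only mirror the behaviour described in the message.

```python
PAGE_PRESENT = 0x1  # hypothetical bit positions for this sketch
PAGE_INVALID = 0x2  # software bit: "temporarily invalid"
PFN_SHIFT = 12      # hypothetical: pfn stored above the flag bits


def make_pte(pfn):
    return (pfn << PFN_SHIFT) | PAGE_PRESENT


def pte_none(pte):
    return pte == 0


def pte_present(pte):
    # Checks both bits, so an invalidated pte still counts as present.
    return bool(pte & (PAGE_PRESENT | PAGE_INVALID))


def pte_pfn(pte):
    return pte >> PFN_SHIFT


def is_swap_pte(pte):
    return not pte_none(pte) and not pte_present(pte)


def pte_mark_invalid(pte):
    # Clear the present bit and set the software bit instead of zeroing
    # the whole pte.
    return (pte & ~PAGE_PRESENT) | PAGE_INVALID


inv = pte_mark_invalid(make_pte(0x1234))
print(pte_present(inv), pte_none(inv), is_swap_pte(inv), hex(pte_pfn(inv)))
```

In this model the invalidated entry has the present bit clear, yet walkers using `pte_present()` neither skip it nor mistake it for a swap entry, and the pfn is preserved.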
    • powerpc/nohash: fix pte_access_permitted() · 810e9f86
      Authored by Christophe Leroy
      Commit 5769beaf ("powerpc/mm: Add proper pte access check helper
      for other platforms") replaced generic pte_access_permitted() by an
      arch specific one.
      
      The generic one is defined as
      (pte_present(pte) && (!(write) || pte_write(pte)))
      
      The arch specific one is open coded, checking that the _PAGE_USER and
      _PAGE_WRITE (_PAGE_RW) flags are set, but failing to check that
      _PAGE_RO and _PAGE_PRIVILEGED are unset. This leads to a useless test
      on targets like the 8xx, which defines _PAGE_RW and _PAGE_USER as 0.
      
      Commit 5fa5b16b ("powerpc/mm/hugetlb: Use pte_access_permitted
      for hugetlb access check") replaced some tests performed with
      pte helpers by a call to pte_access_permitted(), leading to the same
      issue.
      
      This patch rewrites powerpc/nohash pte_access_permitted()
      using pte helpers.
      
      Fixes: 5769beaf ("powerpc/mm: Add proper pte access check helper for other platforms")
      Fixes: 5fa5b16b ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
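
Why a mask test becomes useless when _PAGE_RW and _PAGE_USER are 0 is easy to see in a toy model. The Python sketch below assumes a hypothetical 8xx-like layout in which "user" and "writable" are expressed by the absence of bits; the bit values, the inclusion of the present bit in the mask, and both function names are assumptions made for illustration.

```python
# Hypothetical 8xx-like layout: permissions are encoded by *absent* bits,
# so the "positive" flags are defined as 0.
PAGE_PRESENT = 0x1
PAGE_RO = 0x2
PAGE_PRIVILEGED = 0x4
PAGE_USER = 0
PAGE_RW = 0


def pte_present(pte):
    return bool(pte & PAGE_PRESENT)


def pte_write(pte):
    return not (pte & PAGE_RO)


def pte_user(pte):
    return not (pte & PAGE_PRIVILEGED)


def access_permitted_open_coded(pte, write):
    """Mask-based check: only tests that 'needed' bits are set."""
    need = PAGE_PRESENT | PAGE_USER | (PAGE_RW if write else 0)
    return (pte & need) == need


def access_permitted_helpers(pte, write):
    """Helper-based check, following the generic definition plus a user test."""
    return pte_present(pte) and pte_user(pte) and (not write or pte_write(pte))


ro_kernel_pte = PAGE_PRESENT | PAGE_RO | PAGE_PRIVILEGED
print(access_permitted_open_coded(ro_kernel_pte, write=True))  # wrongly allowed
print(access_permitted_helpers(ro_kernel_pte, write=True))     # denied
```

The helpers know how each platform encodes a permission, which is why rewriting the check in terms of them works on every layout.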
    • x86/mm/tlb: Revert the recent lazy TLB patches · 52a288c7
      Authored by Peter Zijlstra
      Revert commits:
      
        95b0e635 x86/mm/tlb: Always use lazy TLB mode
        64482aaf x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
        ac031589 x86/mm/tlb: Make lazy TLB mode lazier
        61d0beb5 x86/mm/tlb: Restructure switch_mm_irqs_off()
        2ff6ddf1 x86/mm/tlb: Leave lazy TLB mode at page table free time
      
      This is done in order to simplify the TLB invalidate fixes for x86 and
      to unify the parts that need backporting.  We'll try again later.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/compiler*.h: make compiler-*.h mutually exclusive · 815f0ddb
      Authored by Nick Desaulniers
      Commit cafa0010 ("Raise the minimum required gcc version to 4.6")
      recently exposed a brittle part of the build for supporting non-gcc
      compilers.
      
      Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and
      __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't
      added compiler specific checks for __clang__ or __INTEL_COMPILER.
      
      This is brittle, as they happened to get compatibility by posing as a
      certain version of GCC.  This broke when upgrading the minimal version
      of GCC required to build the kernel, to a version above what ICC and
      Clang claim to be.
      
      Rather than always including compiler-gcc.h then undefining or
      redefining macros in compiler-intel.h or compiler-clang.h, let's
      separate out the compiler specific macro definitions into mutually
      exclusive headers, do more proper compiler detection, and keep shared
      definitions in compiler_types.h.
      
      Fixes: cafa0010 ("Raise the minimum required gcc version to 4.6")
      Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Suggested-by: Eli Friedman <efriedma@codeaurora.org>
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
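
The idea of mutually exclusive compiler headers in the commit above amounts to choosing exactly one header, testing the most specific macro first. The Python sketch below models that selection; the function name and the order of the checks are assumptions, and the real mechanism is a preprocessor conditional rather than a function.

```python
def select_compiler_header(defined_macros):
    """Pick exactly one compiler-specific header for a set of predefined macros.

    Clang and ICC also define __GNUC__, so their own macros are tested first.
    """
    if "__clang__" in defined_macros:
        return "compiler-clang.h"
    if "__INTEL_COMPILER" in defined_macros:
        return "compiler-intel.h"
    if "__GNUC__" in defined_macros:
        return "compiler-gcc.h"
    raise ValueError("unknown compiler")


# Clang poses as GCC, but must not pick up the GCC header:
print(select_compiler_header({"__clang__", "__GNUC__", "__GNUC_MINOR__"}))
print(select_compiler_header({"__GNUC__", "__GNUC_MINOR__"}))
```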
    • ia64: Fix allnoconfig section mismatch for ioc_init/ioc_iommu_info · 2edd73a4
      Authored by Tony Luck
      This has been broken for an embarrassingly long time (since v4.4).
      
      Just needs a couple of __init tags on functions to make the sections
      match up.
      
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • module: use relative references for __ksymtab entries · 7290d580
      Authored by Ard Biesheuvel
      An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
      each consisting of two 64-bit fields containing absolute references, to
      the symbol itself and to a char array containing its name, respectively.
      
      When we build the same configuration with KASLR enabled, we end up with an
      additional ~192 KB of relocations in the .init section, i.e., one 24 byte
      entry for each absolute reference, which all need to be processed at boot
      time.
      
      Given how the struct kernel_symbol that describes each entry is completely
      local to module.c (except for the references emitted by EXPORT_SYMBOL()
      itself), we can easily modify it to contain two 32-bit relative references
      instead.  This reduces the size of the __ksymtab section by 50% for all
      64-bit architectures, and gets rid of the runtime relocations entirely for
      architectures implementing KASLR, either via standard PIE linking (arm64)
      or using custom host tools (x86).
      
      Note that the binary search involving __ksymtab contents relies on each
      section being sorted by symbol name.  This is implemented based on the
      input section names, not the names in the ksymtab entries, so this patch
      does not interfere with that.
      
      Given that the use of place-relative relocations requires support both in
      the toolchain and in the module loader, we cannot enable this feature for
      all architectures.  So make it dependent on whether
      CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
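
The sizes quoted in the __ksymtab commit above follow from simple arithmetic, and a place-relative reference is just a signed 32-bit offset from the field's own address. The Python sketch below works both through; the helper names and the example addresses are hypothetical, and the 64 KB figure is the approximate one from the message.

```python
import struct

KSYMTAB_BYTES = 64 * 1024           # "~64 KB" from the commit message
ABS_ENTRY = struct.calcsize("<QQ")  # two 64-bit absolute references
REL_ENTRY = struct.calcsize("<ii")  # two 32-bit relative references
RELA_BYTES = 24                     # one relocation record per absolute reference

entries = KSYMTAB_BYTES // ABS_ENTRY
rela_total = entries * 2 * RELA_BYTES
print(entries, rela_total // 1024, REL_ENTRY / ABS_ENTRY)


def encode_rel32(field_addr, target_addr):
    """Store the target as a signed 32-bit offset from the field itself."""
    return struct.pack("<i", target_addr - field_addr)


def decode_rel32(field_addr, raw):
    """Recover the absolute address: field address plus stored offset."""
    (offset,) = struct.unpack("<i", raw)
    return field_addr + offset


field = 0xFFFF000008001000   # hypothetical address of a ksymtab field
target = 0xFFFF000008345678  # hypothetical address of the exported symbol
raw = encode_rel32(field, target)

# Moving the whole image by the same amount leaves the stored bytes valid,
# which is why no runtime relocation is needed.
slide = 0x2000000
print(decode_rel32(field + slide, raw) == target + slide)
```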
    • module: allow symbol exports to be disabled · f922c4ab
      Authored by Ard Biesheuvel
      To allow existing C code to be incorporated into the decompressor or the
      UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
      declarations into nops, and #define it in places where such exports are
      undesirable.  Note that this gets rid of a rather dodgy redefine of
      linux/export.h's header guard.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: enable relative relocations for arm64, power and x86 · 271ca788
      Authored by Ard Biesheuvel
      Patch series "add support for relative references in special sections", v10.
      
      This adds support for emitting special sections such as initcall arrays,
      PCI fixups and tracepoints as relative references rather than absolute
      references.  This reduces the size by 50% on 64-bit architectures, but
      more importantly, it removes the need for carrying relocation metadata for
      these sections in relocatable kernels (e.g., for KASLR) that needs to be
      fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
      such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
      4 byte relative reference).
      
      Patch #3 was sent out before as a single patch.  This series supersedes
      the previous submission.  This version makes relative ksymtab entries
      dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
      than trying to infer from kbuild test robot replies for which
      architectures it should be blacklisted.
      
      Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
      and sets it for the main architectures that are expected to benefit the
      most from this feature, i.e., 64-bit architectures or ones that use
      runtime relocations.
      
      Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
      ksymtab/kcrctab sections in decompressor and EFI stub objects when
      rebuilding existing C files to run in a different context.
      
      Patches #4 - #6 implement relative references for initcalls, PCI fixups
      and tracepoints, respectively, all of which produce sections with on the
      order of 1000 entries on an arm64 defconfig kernel with tracing enabled.
      This means we save about 28 KB of vmlinux space for each of these patches.
      
      [From the v7 series blurb, which included the jump_label patches as well]:
      
        For the arm64 kernel, all patches combined reduce the memory footprint
        of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
        KASLR enabled), of which ~1 MB is the size reduction of the RELA section
        in .init, and the remaining 300 KB is reduction of .text/.data.
      
      This patch (of 6):
      
      Before updating certain subsystems to use place relative 32-bit
      relocations in special sections, to save space and reduce the number of
      absolute relocations that need to be processed at runtime by relocatable
      kernels, introduce the Kconfig symbol and define it for some architectures
      that should be able to support and benefit from it.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
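
The "8x" and "about 28 KB" figures in the series description above can be checked with a few lines of arithmetic. The Python sketch below uses only the byte sizes stated in the message; the constant and function names are hypothetical, and 1000 stands in for "on the order of 1000 entries".

```python
ABS_REF = 8      # bytes: 64-bit absolute reference
RELA_ENTRY = 24  # bytes: relocation record carried for each absolute reference
REL_REF = 4      # bytes: 32-bit place-relative reference


def footprint_ratio():
    """How many times larger an absolute reference is in a relocatable image."""
    return (ABS_REF + RELA_ENTRY) / REL_REF


def bytes_saved(n_entries):
    """Space saved by converting n_entries references to relative form."""
    return n_entries * (ABS_REF + RELA_ENTRY - REL_REF)


print(footprint_ratio())   # ratio per reference
print(bytes_saved(1000))   # bytes saved for a section of about 1000 entries
```

Each converted reference drops from 32 bytes to 4, so a section of roughly a thousand entries saves roughly 28,000 bytes, in line with the per-patch estimate.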
    • mm: zero out the vma in vma_init() · a670468f
      Authored by Andrew Morton
      Zero out the vma in vma_init() rather than in vm_area_alloc(), to
      ensure that the various oddball stack-based vmas are in a good state.
      Some of the callers were zeroing them out, others were not.
      
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>