1. 23 August 2018 (3 commits)
    • arch: enable relative relocations for arm64, power and x86 · 271ca788
      Ard Biesheuvel authored
      Patch series "add support for relative references in special sections", v10.
      
      This adds support for emitting special sections such as initcall arrays,
      PCI fixups and tracepoints as relative references rather than absolute
      references.  This reduces the size by 50% on 64-bit architectures, but
      more importantly, it removes the need to carry relocation metadata for
      these sections in relocatable kernels (e.g., for KASLR), metadata that
      would otherwise have to be fixed up at boot time.  On arm64, this
      reduces the vmlinux footprint of such a reference by 8x (8-byte
      absolute reference plus 24-byte RELA entry vs a 4-byte relative
      reference).
      
      Patch #3 was sent out before as a single patch.  This series supersedes
      the previous submission.  This version makes relative ksymtab entries
      dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
      than trying to infer from kbuild test robot replies for which
      architectures it should be blacklisted.
      
      Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
      and sets it for the main architectures that are expected to benefit the
      most from this feature, i.e., 64-bit architectures or ones that use
      runtime relocations.
      
      Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
      ksymtab/kcrctab sections in decompressor and EFI stub objects when
      rebuilding existing C files to run in a different context.
      
      Patches #4 - #6 implement relative references for initcalls, PCI fixups
      and tracepoints, respectively, all of which produce sections with on
      the order of 1,000 entries on an arm64 defconfig kernel with tracing
      enabled.  This means we save about 28 KB of vmlinux space for each of
      these patches.
      
      [From the v7 series blurb, which included the jump_label patches as well]:
      
        For the arm64 kernel, all patches combined reduce the memory footprint
        of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
        KASLR enabled), of which ~1 MB is the size reduction of the RELA section
        in .init, and the remaining 300 KB is reduction of .text/.data.
      
      This patch (of 6):
      
      Before updating certain subsystems to use place-relative 32-bit
      relocations in special sections, to save space and reduce the number of
      absolute relocations that need to be processed at runtime by
      relocatable kernels, introduce the Kconfig symbol and define it for
      some architectures that should be able to support and benefit from it.
      (A sketch of the relative-reference idea follows this entry.)
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      271ca788
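
      The saving described above comes from storing each table entry as a
      32-bit offset from the entry's own location instead of a 64-bit
      absolute pointer.  Below is a minimal sketch of the idea; the section
      name, macro name and rel_to_ptr() helper are made up for illustration
      and are not the kernel's actual macros.

          #include <stdint.h>

          /*
           * Emit a 4-byte place-relative reference to "sym" into a table
           * section.  "sym - ." is resolved at link time, so a relocatable
           * (KASLR) kernel needs no RELA entry for it at boot.
           */
          #define REL_TABLE_ENTRY(sym)                             \
                  asm(".pushsection my_table_rel, \"a\"\n"         \
                      ".balign 4\n"                                \
                      ".long " #sym " - .\n"                       \
                      ".popsection\n")

          /* Consumer side: turn a stored offset back into a pointer. */
          static inline void *rel_to_ptr(const int32_t *offset)
          {
                  return (char *)offset + *offset;
          }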
    • mm: zero out the vma in vma_init() · a670468f
      Andrew Morton authored
      Zero the vma in vma_init() rather than in vm_area_alloc(), to ensure
      that the various oddball stack-based vmas are in a good state.  Some of
      the callers were zeroing them out, others were not.  (A sketch of the
      resulting helper follows this entry.)
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a670468f
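
      A hedged sketch of the resulting helper; only the move of the zeroing
      into vma_init() is taken from the log above, the exact fields
      re-initialized after the memset() are an assumption:

          static inline void vma_init(struct vm_area_struct *vma,
                                      struct mm_struct *mm)
          {
                  /* Start every vma, including stack-allocated ones, zeroed. */
                  memset(vma, 0, sizeof(*vma));
                  vma->vm_mm = mm;
                  INIT_LIST_HEAD(&vma->anon_vma_chain);
          }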
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko authored
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start, and that is a problem for the
      oom_reaper because it needs to guarantee forward progress and therefore
      cannot depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable
      locks, there is no reason to automatically assume those locks are held.
      Moreover, the majority of notifiers only care about a portion of the
      address space, and there is absolutely zero reason to fail when we are
      unmapping an unrelated range.  Many notifiers do genuinely block and
      wait for HW, though; that is harder to handle, and in those cases we
      have to bail out.
      
      This patch handles the low-hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag, and
      callbacks are not allowed to sleep if the flag is set to false.  This
      is achieved by using trylock instead of the sleepable lock for most
      callbacks and continuing as long as we do not block down the call
      chain.  (A sketch of the resulting callback pattern follows this
      entry.)
      
      I think we can improve that even further because there is a common
      pattern of doing a range lookup first and then acting on the result.
      The first part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper side then simply retries if there is at least one
      notifier which couldn't make any progress in !blockable mode.  A retry
      loop is already implemented to wait for the mmap_sem, and this is
      basically the same thing.
      
      The simplest way for driver developers to test this code path is to
      wrap userspace code which uses these notifiers into a memcg and set the
      hard limit so that the oom is hit.  This can be done e.g. by having the
      test fault in all the mmu-notifier-managed memory and then setting the
      hard limit to something really small.  Then we are looking for a proper
      process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93065ac7
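
      A hedged sketch of the callback pattern described above; struct my_dev,
      its lock and my_dev_invalidate() are placeholders, not real kernel
      symbols:

          /* Placeholder driver state wrapping its mmu_notifier. */
          struct my_dev {
                  struct mmu_notifier mn;
                  struct mutex lock;
          };

          static int my_invalidate_range_start(struct mmu_notifier *mn,
                                               struct mm_struct *mm,
                                               unsigned long start,
                                               unsigned long end,
                                               bool blockable)
          {
                  struct my_dev *dev = container_of(mn, struct my_dev, mn);

                  if (blockable)
                          mutex_lock(&dev->lock);
                  else if (!mutex_trylock(&dev->lock))
                          return -EAGAIN;  /* !blockable: ask the caller to retry */

                  my_dev_invalidate(dev, start, end);  /* placeholder invalidation */
                  mutex_unlock(&dev->lock);
                  return 0;
          }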
  2. 21 August 2018 (1 commit)
  3. 18 August 2018 (6 commits)
  4. 17 August 2018 (3 commits)
    • arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid() · 5ad356ea
      Greg Hackmann authored
      ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input
      before seeing if the PFN is valid.  This leads to false positives when
      some of the upper bits are set, but the lower bits match a valid PFN.
      
      For example, the following userspace code looks up a bogus entry in
      /proc/kpageflags:
      
          int pagemap = open("/proc/self/pagemap", O_RDONLY);
          int pageflags = open("/proc/kpageflags", O_RDONLY);
          uint64_t pfn, val;
      
          lseek64(pagemap, [...], SEEK_SET);
          read(pagemap, &pfn, sizeof(pfn));
          if (pfn & (1UL << 63)) {        /* valid PFN */
              pfn &= ((1UL << 55) - 1);   /* clear flag bits */
              pfn |= (1UL << 55);
              lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
              read(pageflags, &val, sizeof(val));
          }
      
      On ARM64 this causes the userspace process to crash with SIGSEGV rather
      than reading (1 << KPF_NOPAGE).  kpageflags_read() treats the offset as
      valid, and stable_page_flags() will try to access an address between
      the user and kernel address ranges.  (A sketch of the corrected
      pfn_valid() check follows this entry.)
      
      Fixes: c1cc1552 ("arm64: MMU initialisation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      5ad356ea
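
      A hedged sketch of the corrected check: reject any pfn whose upper bits
      would be lost by the pfn-to-physical-address shift before consulting
      memblock.

          int pfn_valid(unsigned long pfn)
          {
                  phys_addr_t addr = pfn << PAGE_SHIFT;

                  /* pfn bits that get shifted out of phys_addr_t would be
                   * silently dropped; treat such inputs as invalid. */
                  if ((addr >> PAGE_SHIFT) != pfn)
                          return 0;

                  return memblock_is_map_memory(addr);
          }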
    • arm64: Avoid calling stop_machine() when patching jump labels · f6cc0c50
      Will Deacon authored
      Patching a jump label involves patching a single instruction at a time,
      swizzling between a branch and a NOP. The architecture treats these
      instructions specially, so a concurrently executing CPU is guaranteed to
      see either the NOP or the branch, rather than an amalgamation of the two
      instruction encodings.
      
      However, in order to guarantee that the new instruction is visible, it
      is necessary to send an IPI to the concurrently executing CPU so that it
      discards any previously fetched instructions from its pipeline. This
      operation therefore cannot be completed from a context with IRQs
      disabled, but this is exactly what happens on the jump label path where
      the hotplug lock is held and irqs are subsequently disabled by
      stop_machine_cpuslocked(). This results in a deadlock during boot on
      Hikey-960.
      
      Due to the architectural guarantees around patching NOPs and branches,
      we don't actually need stop_machine() at all on the jump label path,
      so we can avoid the deadlock by using the "nosync" variant of our
      instruction patching routine (sketched after this entry).
      
      Fixes: 693350a7 ("arm64: insn: Don't fallback on nosync path for general insn patching")
      Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Valentin Schneider <valentin.schneider@arm.com>
      Tested-by: Tuomas Tynkkynen <tuomas@tuxera.com>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      f6cc0c50
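
      A sketch in the spirit of the fix, assuming the pre-existing arm64 insn
      helpers; the struct jump_entry field names are shown from memory and
      are for illustration only:

          void arch_jump_label_transform(struct jump_entry *entry,
                                         enum jump_label_type type)
          {
                  void *addr = (void *)entry->code;
                  u32 insn;

                  if (type == JUMP_LABEL_JMP)
                          insn = aarch64_insn_gen_branch_imm(entry->code,
                                                             entry->target,
                                                             AARCH64_INSN_BRANCH_NOLINK);
                  else
                          insn = aarch64_insn_gen_nop();

                  /* No stop_machine(): patch in place; concurrent CPUs see
                   * either the old or the new instruction, never a mix. */
                  aarch64_insn_patch_text_nosync(addr, insn);
          }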
    • Fix kexec forbidding kernels signed with keys in the secondary keyring to boot · ea93102f
      Yannik Sembritzki authored
      The split of .system_keyring into .builtin_trusted_keys and
      .secondary_trusted_keys broke kexec, thereby preventing kernels signed by
      keys which are now in the secondary keyring from being kexec'd.
      
      Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
      verify_pefile_signature() (sketched after this entry).
      
      Fixes: d3bfe841 ("certs: Add a secondary system keyring that can be added to dynamically")
      Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: keyrings@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea93102f
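
      A hedged sketch of the x86 kexec signature hook with this fix; the
      surrounding function is shown from memory, and only the
      VERIFY_USE_SECONDARY_KEYRING argument is taken from the log above:

          static int bzImage64_verify_sig(const char *kernel,
                                          unsigned long kernel_len)
          {
                  /* Accept keys from the secondary trusted keyring, not just
                   * the builtin one, when verifying the PE-signed kernel. */
                  return verify_pefile_signature(kernel, kernel_len,
                                                 VERIFY_USE_SECONDARY_KEYRING,
                                                 VERIFYING_KEXEC_PE_SIGNATURE);
          }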
  5. 16 August 2018 (3 commits)
  6. 15 August 2018 (3 commits)
  7. 14 August 2018 (1 commit)
  8. 13 August 2018 (20 commits)