1. 07 12月, 2018 2 次提交
    • W
      arm64: entry: Place an SB sequence following an ERET instruction · 679db708
      Will Deacon 提交于
      Some CPUs can speculate past an ERET instruction and potentially perform
      speculative accesses to memory before processing the exception return.
      Since the register state is often controlled by a lower privilege level
      at the point of an ERET, this could potentially be used as part of a
      side-channel attack.
      
      This patch emits an SB sequence after each ERET so that speculation is
      held up on exception return.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      679db708
    • W
      arm64: Add support for SB barrier and patch in over DSB; ISB sequences · bd4fb6d2
      Will Deacon 提交于
      We currently use a DSB; ISB sequence to inhibit speculation in set_fs().
      Whilst this works for current CPUs, future CPUs may implement a new SB
      barrier instruction which acts as an architected speculation barrier.
      
      On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB
      sequence and advertise the presence of the new instruction to userspace.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      bd4fb6d2
  2. 06 12月, 2018 7 次提交
  3. 04 12月, 2018 1 次提交
    • A
      arm64: relocatable: fix inconsistencies in linker script and options · 3bbd3db8
      Ard Biesheuvel 提交于
      readelf complains about the section layout of vmlinux when building
      with CONFIG_RELOCATABLE=y (for KASLR):
      
        readelf: Warning: [21]: Link field (0) should index a symtab section.
        readelf: Warning: [21]: Info field (0) should index a relocatable section.
      
      Also, it seems that our use of '-pie -shared' is contradictory, and
      thus ambiguous. In general, the way KASLR is wired up at the moment
      is highly tailored to how ld.bfd happens to implement (and conflate)
      PIE executables and shared libraries, so given the current effort to
      support other toolchains, let's fix some of these issues as well.
      
      - Drop the -pie linker argument and just leave -shared. In ld.bfd,
        the differences between them are unclear (except for the ELF type
        of the produced image [0]) but lld chokes on seeing both at the
        same time.
      
      - Rename the .rela output section to .rela.dyn, as is customary for
        shared libraries and PIE executables, so that it is not misidentified
        by readelf as a static relocation section (producing the warnings
        above).
      
      - Pass the -z notext and -z norelro options to explicitly instruct the
        linker to permit text relocations, and to omit the RELRO program
        header (which requires a certain section layout that we don't adhere
        to in the kernel). These are the defaults for current versions of
        ld.bfd.
      
      - Discard .eh_frame and .gnu.hash sections to avoid them from being
        emitted between .head.text and .text, screwing up the section layout.
      
      These changes only affect the ELF image, and produce the same binary
      image.
      
      [0] b9dce7f1 ("arm64: kernel: force ET_DYN ELF type for ...")
      
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Smith <peter.smith@linaro.org>
      Tested-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3bbd3db8
  4. 30 11月, 2018 9 次提交
    • A
      arm64/lib: improve CRC32 performance for deep pipelines · efdb25ef
      Ard Biesheuvel 提交于
      Improve the performance of the crc32() asm routines by getting rid of
      most of the branches and small sized loads on the common path.
      
      Instead, use a branchless code path involving overlapping 16 byte
      loads to process the first (length % 32) bytes, and process the
      remainder using a loop that processes 32 bytes at a time.
      
      Tested using the following test program:
      
        #include <stdlib.h>
      
        extern void crc32_le(unsigned short, char const*, int);
      
        int main(void)
        {
          static const char buf[4096];
      
          srand(20181126);
      
          for (int i = 0; i < 100 * 1000 * 1000; i++)
            crc32_le(0, buf, rand() % 1024);
      
          return 0;
        }
      
      On Cortex-A53 and Cortex-A57, the performance regresses but only very
      slightly. On Cortex-A72 however, the performance improves from
      
        $ time ./crc32
      
        real  0m10.149s
        user  0m10.149s
        sys   0m0.000s
      
      to
      
        $ time ./crc32
      
        real  0m7.915s
        user  0m7.915s
        sys   0m0.000s
      
      Cc: Rui Sun <sunrui26@huawei.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      efdb25ef
    • M
      arm64: ftrace: always pass instrumented pc in x0 · 7dc48bf9
      Mark Rutland 提交于
      The core ftrace hooks take the instrumented PC in x0, but for some
      reason arm64's prepare_ftrace_return() takes this in x1.
      
      For consistency, let's flip the argument order and always pass the
      instrumented PC in x0.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7dc48bf9
    • M
      arm64: ftrace: remove return_regs macros · 49e258e0
      Mark Rutland 提交于
      The save_return_regs and restore_return_regs macros are only used by
      return_to_handler, and having them defined out-of-line only serves to
      obscure the logic.
      
      Before we complicate, let's clean this up and fold the logic directly
      into return_to_handler, saving a few lines of macro boilerplate in the
      process. At the same time, a missing trailing space is added to the
      comments, fixing a code style violation.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      49e258e0
    • M
      arm64: ftrace: don't adjust the LR value · 6e803e2e
      Mark Rutland 提交于
      The core ftrace code requires that when it is handed the PC of an
      instrumented function, this PC is the address of the instrumented
      instruction. This is necessary so that the core ftrace code can identify
      the specific instrumentation site. Since the instrumented function will
      be a BL, the address of the instrumented function is LR - 4 at entry to
      the ftrace code.
      
      This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers,
      which acquire the PC of the instrumented function.
      
      The mcount_get_lr helper is used to acquire the LR of the instrumented
      function, whose value does not require this adjustment, and cannot be
      adjusted to anything meaningful. No adjustment of this value is made on
      other architectures, including arm. However, arm64 adjusts this value by
      4.
      
      This patch brings arm64 in line with other architectures and removes the
      adjustment of the LR value.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6e803e2e
    • M
      arm64: ftrace: enable graph FP test · 5c176aff
      Mark Rutland 提交于
      The core frace code has an optional sanity check on the frame pointer
      passed by ftrace_graph_caller and return_to_handler. This is cheap,
      useful, and enabled unconditionally on x86, sparc, and riscv.
      
      Let's do the same on arm64, so that we can catch any problems early.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      5c176aff
    • M
      arm64: ftrace: use GLOBAL() · e4fe1966
      Mark Rutland 提交于
      The global exports of ftrace_call and ftrace_graph_call are somewhat
      painful to read. Let's use the generic GLOBAL() macro to ameliorate
      matters.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      e4fe1966
    • M
      linkage: add generic GLOBAL() macro · ad697a1a
      Mark Rutland 提交于
      Declaring a global symbol in assembly is tedious, error-prone, and
      painful to read. While ENTRY() exists, this is supposed to be used for
      function entry points, and this affects alignment in a potentially
      undesireable manner.
      
      Instead, let's add a generic GLOBAL() macro for this, as x86 added
      locally in commit:
      
        95695547 ("x86: asm linkage - introduce GLOBAL macro")
      
      ... thus allowing us to use this more freely in the kernel.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Torsten Duwe <duwe@suse.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      ad697a1a
    • A
      arm64: drop linker script hack to hide __efistub_ symbols · dd6846d7
      Ard Biesheuvel 提交于
      Commit 1212f7a1 ("scripts/kallsyms: filter arm64's __efistub_
      symbols") updated the kallsyms code to filter out symbols with
      the __efistub_ prefix explicitly, so we no longer require the
      hack in our linker script to emit them as absolute symbols.
      
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      dd6846d7
    • W
      arm64: io: Ensure value passed to __iormb() is held in a 64-bit register · 1b57ec8c
      Will Deacon 提交于
      As of commit 6460d320 ("arm64: io: Ensure calls to delay routines
      are ordered against prior readX()"), MMIO reads smaller than 64 bits
      fail to compile under clang because we end up mixing 32-bit and 64-bit
      register operands for the same data processing instruction:
      
      ./include/asm-generic/io.h:695:9: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
              return readb(addr);
                     ^
      ./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb'
                                                                             ^
      ./include/asm-generic/io.h:695:9: note: use constraint modifier "w"
      ./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
                                                                     ^
      ./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb'
              asm volatile("eor       %0, %1, %1\n"                           \
                                          ^
      
      Fix the build by casting the macro argument to 'unsigned long' when used
      as an input to the inline asm.
      Reported-by: NNick Desaulniers <nick.desaulniers@gmail.com>
      Reported-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      1b57ec8c
  5. 28 11月, 2018 5 次提交
    • W
      arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE · 3d65b6bb
      Will Deacon 提交于
      In order to reduce the possibility of soft lock-ups, we bound the
      maximum number of TLBI operations performed by a single call to
      flush_tlb_range() to an arbitrary constant of 1024.
      
      Whilst this does the job of avoiding lock-ups, we can actually be a bit
      smarter by defining this as PTRS_PER_PTE. Due to the structure of our
      page tables, using PTRS_PER_PTE means that an outer loop calling
      flush_tlb_range() for entire table entries will end up performing just a
      single TLBI operation for each entry. As an example, mremap()ing a 1GB
      range mapped using 4k pages now requires only 512 TLBI operations when
      moving the page tables as opposed to 262144 operations (512*512) when
      using the current threshold of 1024.
      
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3d65b6bb
    • A
      arm64/module: switch to ADRP/ADD sequences for PLT entries · bdb85cd1
      Ard Biesheuvel 提交于
      Now that we have switched to the small code model entirely, and
      reduced the extended KASLR range to 4 GB, we can be sure that the
      targets of relative branches that are out of range are in range
      for a ADRP/ADD pair, which is one instruction shorter than our
      current MOVN/MOVK/MOVK sequence, and is more idiomatic and so it
      is more likely to be implemented efficiently by micro-architectures.
      
      So switch over the ordinary PLT code and the special handling of
      the Cortex-A53 ADRP errata, as well as the ftrace trampline
      handling.
      Reviewed-by: NTorsten Duwe <duwe@lst.de>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [will: Added a couple of comments in the plt equality check]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      bdb85cd1
    • A
      arm64/insn: add support for emitting ADR/ADRP instructions · 7aaf7b2f
      Ard Biesheuvel 提交于
      Add support for emitting ADR and ADRP instructions so we can switch
      over our PLT generation code in a subsequent patch.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7aaf7b2f
    • J
      arm64: Use a raw spinlock in __install_bp_hardening_cb() · d8797b12
      James Morse 提交于
      __install_bp_hardening_cb() is called via stop_machine() as part
      of the cpu_enable callback. To force each CPU to take its turn
      when allocating slots, they take a spinlock.
      
      With the RT patches applied, the spinlock becomes a mutex,
      and we get warnings about sleeping while in stop_machine():
      | [    0.319176] CPU features: detected: RAS Extension Support
      | [    0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002
      | [    0.319955] Modules linked in:
      | [    0.319958] Preemption disabled at:
      | [    0.319969] [<ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108
      | [    0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted 4.19.1-rt3-00250-g330fc2c2a880 #2
      | [    0.319975] Hardware name: linux,dummy-virt (DT)
      | [    0.319976] Call trace:
      | [    0.319981]  dump_backtrace+0x0/0x148
      | [    0.319983]  show_stack+0x14/0x20
      | [    0.319987]  dump_stack+0x80/0xa4
      | [    0.319989]  __schedule_bug+0x94/0xb0
      | [    0.319991]  __schedule+0x510/0x560
      | [    0.319992]  schedule+0x38/0xe8
      | [    0.319994]  rt_spin_lock_slowlock_locked+0xf0/0x278
      | [    0.319996]  rt_spin_lock_slowlock+0x5c/0x90
      | [    0.319998]  rt_spin_lock+0x54/0x58
      | [    0.320000]  enable_smccc_arch_workaround_1+0xdc/0x260
      | [    0.320001]  __enable_cpu_capability+0x10/0x20
      | [    0.320003]  multi_cpu_stop+0x84/0x108
      | [    0.320004]  cpu_stopper_thread+0x84/0x108
      | [    0.320008]  smpboot_thread_fn+0x1e8/0x2b0
      | [    0.320009]  kthread+0x124/0x128
      | [    0.320010]  ret_from_fork+0x10/0x18
      
      Switch this to a raw spinlock, as we know this is only called with
      IRQs masked.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d8797b12
    • J
      arm64: acpi: Prepare for longer MADTs · 9eb1c92b
      Jeremy Linton 提交于
      The BAD_MADT_GICC_ENTRY check is a little too strict because
      it rejects MADT entries that don't match the currently known
      lengths. We should remove this restriction to avoid problems
      if the table length changes. Future code which might depend on
      additional fields should be written to validate those fields
      before using them, rather than trying to globally check
      known MADT version lengths.
      
      Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.linton@arm.comSigned-off-by: NJeremy Linton <jeremy.linton@arm.com>
      [lorenzo.pieralisi@arm.com: added MADT macro comments]
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: NSudeep Holla <sudeep.holla@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Al Stone <ahs3@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      9eb1c92b
  6. 27 11月, 2018 2 次提交
    • W
      arm64: io: Ensure calls to delay routines are ordered against prior readX() · 6460d320
      Will Deacon 提交于
      A relatively standard idiom for ensuring that a pair of MMIO writes to a
      device arrive at that device with a specified minimum delay between them
      is as follows:
      
      	writel_relaxed(42, dev_base + CTL1);
      	readl(dev_base + CTL1);
      	udelay(10);
      	writel_relaxed(42, dev_base + CTL2);
      
      the intention being that the read-back from the device will push the
      prior write to CTL1, and the udelay will hold up the write to CTL1 until
      at least 10us have elapsed.
      
      Unfortunately, on arm64 where the underlying delay loop is implemented
      as a read of the architected counter, the CPU does not guarantee
      ordering from the readl() to the delay loop and therefore the delay loop
      could in theory be speculated and not provide the desired interval
      between the two writes.
      
      Fix this in a similar manner to PowerPC by introducing a dummy control
      dependency on the output of readX() which, combined with the ISB in the
      read of the architected counter, guarantees that a subsequent delay loop
      can not be executed until the readX() has returned its result.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6460d320
    • A
      arm64: mm: Don't wait for completion of TLB invalidation when page aging · 3403e56b
      Alex Van Brunt 提交于
      When transitioning a PTE from young to old as part of page aging, we
      can avoid waiting for the TLB invalidation to complete and therefore
      drop the subsequent DSB instruction. Whilst this opens up a race with
      page reclaim, where a PTE in active use via a stale, young TLB entry
      does not update the underlying descriptor, the worst thing that happens
      is that the page is reclaimed and then immediately faulted back in.
      
      Given that we have a DSB in our context-switch path, the window for a
      spurious reclaim is fairly limited and eliding the barrier claims to
      boost NVMe/SSD accesses by over 10% on some platforms.
      
      A similar optimisation was made for x86 in commit b13b1d2d ("x86/mm:
      In the PTE swapout page reclaim case clear the accessed bit instead of
      flushing the TLB").
      Signed-off-by: NAlex Van Brunt <avanbrunt@nvidia.com>
      Signed-off-by: NAshish Mhetre <amhetre@nvidia.com>
      [will: rewrote patch]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3403e56b
  7. 20 11月, 2018 3 次提交
    • J
      arm64/module: use plt section indices for relocations · c8ebf64e
      Jessica Yu 提交于
      Instead of saving a pointer to the .plt and .init.plt sections to apply
      plt-based relocations, save and use their section indices instead.
      
      The mod->arch.{core,init}.plt pointers were problematic for livepatch
      because they pointed within temporary section headers (provided by the
      module loader via info->sechdrs) that would be freed after module load.
      Since livepatch modules may need to apply relocations post-module-load
      (for example, to patch a module that is loaded later), using section
      indices to offset into the section headers (instead of accessing them
      through a saved pointer) allows livepatch modules on arm64 to pass in
      their own copy of the section headers to apply_relocate_add() to apply
      delayed relocations.
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: NJessica Yu <jeyu@kernel.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c8ebf64e
    • A
      arm64: mm: apply r/o permissions of VM areas to its linear alias as well · c55191e9
      Ard Biesheuvel 提交于
      On arm64, we use block mappings and contiguous hints to map the linear
      region, to minimize the TLB footprint. However, this means that the
      entire region is mapped using read/write permissions, which we cannot
      modify at page granularity without having to take intrusive measures to
      prevent TLB conflicts.
      
      This means the linear aliases of pages belonging to read-only mappings
      (executable or otherwise) in the vmalloc region are also mapped read/write,
      and could potentially be abused to modify things like module code, bpf JIT
      code or other read-only data.
      
      So let's fix this, by extending the set_memory_ro/rw routines to take
      the linear alias into account. The consequence of enabling this is
      that we can no longer use block mappings or contiguous hints, so in
      cases where the TLB footprint of the linear region is a bottleneck,
      performance may be affected.
      
      Therefore, allow this feature to be runtime en/disabled, by setting
      rodata=full (or 'on' to disable just this enhancement, or 'off' to
      disable read-only mappings for code and r/o data entirely) on the
      kernel command line. Also, allow the default value to be set via a
      Kconfig option.
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c55191e9
    • A
      arm64: mm: purge lazily unmapped vm regions before changing permissions · b34d2ef0
      Ard Biesheuvel 提交于
      Call vm_unmap_aliases() every time we apply any changes to permission
      attributes of mappings in the vmalloc region. This avoids any potential
      issues resulting from lingering writable or executable aliases of
      mappings that should be read-only or non-executable, respectively.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      b34d2ef0
  8. 19 11月, 2018 11 次提交
    • L
      Linux 4.20-rc3 · 9ff01193
      Linus Torvalds 提交于
      9ff01193
    • L
      Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 25e19c1f
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
       "A small batch of fixes for v4.20-rc3.
      
        The overflow continuation fix addresses something that has been broken
        for several releases. Arguably it could wait even longer, but it's a
        one line fix and this finishes the last of the known address range
        scrub bug reports. The revert addresses a lockdep regression. The unit
        tests are not critical to fix, but no reason to hold this fix back.
      
        Summary:
      
         - Address Range Scrub overflow continuation handling has been broken
           since it was initially merged. It was only recently that error
           injection and platform-BIOS support enabled this corner case to be
           exercised.
      
         - The recent attempt to provide more isolation for the kernel Address
           Range Scrub state machine from userapace initiated sessions
           triggers a lockdep report. Revert and try again at the next merge
           window.
      
         - Fix a kasan reported buffer overflow in libnvdimm unit test
           infrastrucutre (nfit_test)"
      
      * tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        Revert "acpi, nfit: Further restrict userspace ARS start requests"
        acpi, nfit: Fix ARS overflow continuation
        tools/testing/nvdimm: Fix the array size for dimm devices.
      25e19c1f
    • L
      Merge branch 'akpm' (patches from Andrew) · c67a98c0
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
        mm, page_alloc: check for max order in hot path
        scripts/spdxcheck.py: make python3 compliant
        tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
        lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
        mm/vmstat.c: fix NUMA statistics updates
        mm/gup.c: fix follow_page_mask() kerneldoc comment
        ocfs2: free up write context when direct IO failed
        scripts/faddr2line: fix location of start_kernel in comment
        mm: don't reclaim inodes with many attached pages
        mm, memory_hotplug: check zone_movable in has_unmovable_pages
        mm/swapfile.c: use kvzalloc for swap_info_struct allocation
        MAINTAINERS: update OMAP MMC entry
        hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
        kernel/sched/psi.c: simplify cgroup_move_task()
        z3fold: fix possible reclaim races
      c67a98c0
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 03582f33
      Linus Torvalds 提交于
      Pull scheduler fix from Ingo Molnar:
       "Fix an exec() related scalability/performance regression, which was
        caused by incorrectly calculating load and migrating tasks on exec()
        when they shouldn't be"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix cpu_util_wake() for 'execl' type workloads
      03582f33
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b53e27f6
      Linus Torvalds 提交于
      Pull perf fixes from Ingo Molnar:
       "Fix uncore PMU enumeration for CofeeLake CPUs"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Support CoffeeLake 8th CBOX
        perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
      b53e27f6
    • L
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 743a4863
      Linus Torvalds 提交于
      Pull EFI fixes from Ingo Molnar:
       "Misc fixes: two warning splat fixes, a leak fix and persistent memory
        allocation fixes for ARM"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi: Permit calling efi_mem_reserve_persistent() from atomic context
        efi/arm: Defer persistent reservations until after paging_init()
        efi/arm/libstub: Pack FDT after populating it
        efi/arm: Revert deferred unmap of early memmap mapping
        efi: Fix debugobjects warning on 'efi_rts_work'
      743a4863
    • L
      Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm · cfaa9f02
      Linus Torvalds 提交于
      Pull ARM spectre updates from Russell King:
       "These are the currently known final bits that resolve the Spectre
        issues. big.Little systems used to be sufficiently identical in that
        there were no differences between individual CPUs in the system that
        mattered to the kernel. With the advent of the Spectre problem, the
        CPUs now have differences in how the workaround is applied.
      
        As a result of previous Spectre patches, these systems ended up
        reporting quite a lot of:
      
           "CPUx: Spectre v2: incorrect context switching function, system vulnerable"
      
        messages due to the action of the big.Little switcher causing the CPUs
        to be re-initialised regularly. This series resolves that issue by
        making the CPU vtable unique to each CPU.
      
        However, since this is used very early, before per-cpu is setup,
        per-cpu can't be used. We also have a problem that two of the methods
        are not called from preempt-safe paths, but thankfully these remain
        identical between all CPUs in the system. To make sure, we validate
        that these are identical during boot"
      
      * 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: spectre-v2: per-CPU vtables to work around big.Little systems
        ARM: add PROC_VTABLE and PROC_TABLE macros
        ARM: clean up per-processor check_bugs method call
        ARM: split out processor lookup
        ARM: make lookup_processor_type() non-__init
      cfaa9f02
    • C
    • M
      mm, page_alloc: check for max order in hot path · c63ae43b
      Michal Hocko 提交于
      Konstantin has noticed that kvmalloc might trigger the following
      warning:
      
        WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
        [...]
        Call Trace:
         fragmentation_index+0x76/0x90
         compaction_suitable+0x4f/0xf0
         shrink_node+0x295/0x310
         node_reclaim+0x205/0x250
         get_page_from_freelist+0x649/0xad0
         __alloc_pages_nodemask+0x12a/0x2a0
         kmalloc_large_node+0x47/0x90
         __kmalloc_node+0x22b/0x2e0
         kvmalloc_node+0x3e/0x70
         xt_alloc_table_info+0x3a/0x80 [x_tables]
         do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
         nf_setsockopt+0x44/0x60
         SyS_setsockopt+0x6f/0xc0
         do_syscall_64+0x67/0x120
         entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      the problem is that we only check for an out of bound order in the slow
      path and the node reclaim might happen from the fast path already.  This
      is fixable by making sure that kvmalloc doesn't ever use kmalloc for
      requests that are larger than KMALLOC_MAX_SIZE but this also shows that
      the code is rather fragile.  A recent UBSAN report just underlines that
      by the following report
      
        UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
        shift exponent 51 is too large for 32-bit type 'int'
        CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0xd2/0x148 lib/dump_stack.c:113
         ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
         __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
         __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
         zone_watermark_fast mm/page_alloc.c:3216 [inline]
         get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
         __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
         alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
         alloc_pages include/linux/gfp.h:509 [inline]
         __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
         dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
         raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
         raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
         fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
         fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
         __blkdev_driver_ioctl block/ioctl.c:303 [inline]
         blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
         block_ioctl+0x105/0x150 fs/block_dev.c:1883
         vfs_ioctl fs/ioctl.c:46 [inline]
         do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
         ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
         __do_sys_ioctl fs/ioctl.c:709 [inline]
         __se_sys_ioctl fs/ioctl.c:707 [inline]
         __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
         do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note that this is not a kvmalloc path.  It is just that the fast path
      really depends on having sanitzed order as well.  Therefore move the
      order check to the fast path.
      
      Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reported-by: NKyungtae Kim <kt0755@gmail.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Byoungyoung Lee <lifeasageek@gmail.com>
      Cc: "Dae R. Jeong" <threeearcat@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c63ae43b
    • U
      scripts/spdxcheck.py: make python3 compliant · 6f4d29df
      Uwe Kleine-König 提交于
      Without this change the following happens when using Python3 (3.6.6):
      
      	$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
      	FAIL: 'str' object has no attribute 'decode'
      	Traceback (most recent call last):
      	  File "scripts/spdxcheck.py", line 253, in <module>
      	    parser.parse_lines(sys.stdin, args.maxlines, '-')
      	  File "scripts/spdxcheck.py", line 171, in parse_lines
      	    line = line.decode(locale.getpreferredencoding(False), errors='ignore')
      	AttributeError: 'str' object has no attribute 'decode'
      
      So as the line is already a string, there is no need to decode it and
      the line can be dropped.
      
      /usr/bin/python on Arch is Python 3.  So this would indeed be worth
      going into 4.19.
      
      Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f4d29df
    • Y
      tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset · 1a413646
      Yufen Yu 提交于
      Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
      lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.
      
      man 2 lseek says
      
      :      EINVAL whence  is  not  valid.   Or: the resulting file offset would be
      :             negative, or beyond the end of a seekable device.
      :
      :      ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is  beyond
      :             the end of the file.
      
      Make tmpfs return ENXIO under these circumstances as well.  After this,
      tmpfs also passes xfstests's generic/448.
      
      [akpm@linux-foundation.org: rewrite changelog]
      Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.comSigned-off-by: NYufen Yu <yuyufen@huawei.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a413646