1. 29 June 2019, 1 commit
    • x86/timer: Skip PIT initialization on modern chipsets · c8c40767
      Thomas Gleixner authored
      Recent Intel chipsets including Skylake and ApolloLake have a special
      ITSSPRC register which allows the 8254 PIT to be gated.  When gated, the
      8254 registers can still be programmed as normal, but there are no IRQ0
      timer interrupts.
      
      Some products such as the Connex L1430 and exone go Rugged E11 use this
      register to ship with the PIT gated by default. This causes Linux to fail
      to boot:
      
        Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
        apic=debug and send a report.
      
      The panic happens before the framebuffer is initialized, so to the user, it
      appears as an early boot hang on a black screen.
      
      Affected products typically have a BIOS option that can be used to enable
      the 8254 and make Linux work (Chipset -> South Cluster Configuration ->
      Miscellaneous Configuration -> 8254 Clock Gating), however it would be best
      to make Linux support the no-8254 case.
      
      Modern systems allow the TSC and local APIC timer frequencies to be
      discovered directly, so the calibration against the PIT is not
      required. These systems have always-running timers, and the local
      APIC timer keeps working even in deep power states.
      
      So setting up the PIT, including the IO-APIC timer interrupt
      delivery checks, is a pointless exercise.
      
      Skip the PIT setup and the IO-APIC timer interrupt checks on these
      systems, which avoids the panic caused by non-ticking PITs and also
      speeds up the boot process.
      
      Thanks to Daniel for providing the changelog, initial analysis of the
      problem and testing against a variety of machines.
      Reported-by: Daniel Drake <drake@endlessm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Cc: linux@endlessm.com
      Cc: rafael.j.wysocki@intel.com
      Cc: hdegoede@redhat.com
      Link: https://lkml.kernel.org/r/20190628072307.24678-1-drake@endlessm.com
      c8c40767
  2. 23 June 2019, 1 commit
  3. 17 June 2019, 1 commit
  4. 09 May 2019, 3 commits
  5. 06 May 2019, 1 commit
  6. 05 May 2019, 1 commit
    • perf/x86/intel: Fix race in intel_pmu_disable_event() · 6f55967a
      Jiri Olsa authored
      A new race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() on cpuc->active_mask with separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes panic for PEBS events with enabled callchains:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to take samples from the PEBS drain code,
      but when it is disabled, we go through the NMI path instead,
      where data->callchain does not get allocated and we crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fix this by disabling the event itself before switching off the
      PEBS bit.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
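The ordering fix can be modeled in miniature. The struct and flags below are stand-ins invented for this sketch, not the kernel's cpuc/hwc state:

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up miniature of the PMU state; not the kernel's structures. */
struct fake_event {
	bool counter_enabled;  /* stands in for the hardware enable bit */
	bool pebs_enabled;     /* stands in for the PEBS drain config   */
};

/* The crash window: the counter still looks active but its PEBS
 * configuration is already gone, so an NMI takes the non-PEBS sample
 * path without an allocated callchain. */
static bool nmi_window_open(const struct fake_event *ev)
{
	return ev->counter_enabled && !ev->pebs_enabled;
}

/* Fixed order: disable the counter first, then tear down PEBS, so the
 * window above can never be observed by a late NMI. */
static void disable_event_fixed(struct fake_event *ev)
{
	ev->counter_enabled = false;   /* disable the event itself first */
	assert(!nmi_window_open(ev));  /* no racy window at this point   */
	ev->pebs_enabled = false;      /* then drop the PEBS bit         */
}
```

The buggy ordering did the two stores in the opposite order, leaving `nmi_window_open()` true between them.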
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f55967a
  7. 04 May 2019, 1 commit
  8. 03 May 2019, 7 commits
    • s390/vdso: drop unnecessary cc-ldoption · ce968f60
      Nick Desaulniers authored
      Towards the goal of removing cc-ldoption: --hash-style= was added
      to binutils 2.17.50.0.2 in 2006, while the minimum binutils version
      required for the kernel according to
      Documentation/process/changes.rst is 2.20.
      
      Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
      Cc: clang-built-linux@googlegroups.com
      Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ce968f60
    • s390: fix clang -Wpointer-sign warnings in boot code · 4ae98789
      Arnd Bergmann authored
      The arch/s390/boot directory is built with its own set of compiler
      options that, unlike the rest of the kernel, does not include
      -Wno-pointer-sign. This causes a lot of harmless but correct
      warnings when building with clang.
      
      For the atomics, we can add type casts to avoid the warnings; for
      everything else, the easiest way is to slightly adapt the types
      to be more consistent.
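The class of warning and the cast-based fix can be shown with a tiny standalone example; the function below is invented for illustration and is not taken from arch/s390/boot:

```c
#include <assert.h>

/* A helper taking unsigned char *, as the atomics-style code does. */
static unsigned int byte_sum(const unsigned char *p, unsigned int n)
{
	unsigned int sum = 0;
	while (n--)
		sum += *p++;
	return sum;
}

static unsigned int demo(void)
{
	const char *msg = "abc";  /* plain char: signedness differs */
	/* Without the cast, clang's -Wpointer-sign warns that the pointer
	 * signedness differs; the cast documents that only the type, not
	 * the value, changes. */
	return byte_sum((const unsigned char *)msg, 3);
}
```

Adjusting the declared types so callers and callees agree, as the commit does for the non-atomic cases, removes the need for the cast entirely.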
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      4ae98789
    • s390: drop CONFIG_VIRT_TO_BUS · 964d06b4
      Arnd Bergmann authored
      VIRT_TO_BUS is only used by legacy PCI and ISA device drivers that
      use virt_to_bus() instead of the streaming DMA mapping API, and the
      remaining drivers generally don't work on 64-bit architectures.
      
      Two of these drivers also cause a build warning on s390, so instead
      of trying to fix that, let's just disable the option as we do on
      most architectures now.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      964d06b4
    • s390: boot, purgatory: pass $(CLANG_FLAGS) where needed · 96fb54a1
      Arnd Bergmann authored
      The purgatory and boot Makefiles do not inherit the original
      cflags, so clang falls back to the default target architecture when
      building them; typically this would be x86 when cross-compiling.
      
      Add $(CLANG_FLAGS) everywhere so we pass the correct
      --target=s390x-linux option when cross-compiling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      96fb54a1
    • s390: only build for new CPUs with clang · c263a4e9
      Arnd Bergmann authored
      llvm does not understand -march=z9-109 and older target
      specifiers, so disable the respective Kconfig settings and
      the logic that makes the boot code work on old systems when
      building with clang.
      
      Part of the early boot code is normally compiled with -march=z900
      for maximum compatibility. This also has to get changed with
      clang to the oldest supported ISA, which is -march=z10 here.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      c263a4e9
    • perf/x86/intel/pt: Remove software double buffering PMU capability · 72e830f6
      Alexander Shishkin authored
      Now that all AUX allocations are high-order by default, the
      software double buffering PMU capability doesn't make sense any
      more; get rid of it. In case some PMUs choose to opt out, we can
      re-introduce it.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: adrian.hunter@intel.com
      Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      72e830f6
    • perf/x86/amd: Update generic hardware cache events for Family 17h · 0e3b74e2
      Kim Phillips authored
      Add a new amd_hw_cache_event_ids_f17h assignment structure set
      for AMD families 17h and above, since a lot has changed.  Specifically:
      
      L1 Data Cache
      
      The data cache access counter remains the same on Family 17h.
      
      For DC misses, PMCx041's definition changes with Family 17h,
      so instead we use the L2 cache accesses from L1 data cache
      misses counter (PMCx060,umask=0xc8).
      
      For DC hardware prefetch events, Family 17h breaks compatibility
      for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
      Prefetch DC Fills."
      
      L1 Instruction Cache
      
      PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
      compatible on Family 17h.
      
      For prefetches, we remove the erroneous PMCx04B assignment which
      counts how many software data cache prefetch load instructions were
      dispatched.
      
      LL - Last Level Cache
      
      Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
      on Family 17h, where the last level cache is L3.  L3 counters
      can be accessed using the existing AMD Uncore driver.
      
      Data TLB
      
      On Intel machines, data TLB accesses ("dTLB-loads") are assigned
      to counters that count load/store instructions retired.  This
      is inconsistent with instruction TLB accesses, where Intel
      implementations report iTLB misses that hit in the STLB.
      
      Ideally, dTLB-loads would count higher level dTLB misses that hit
      in lower level TLBs, and dTLB-load-misses would report those
      that also missed in those lower-level TLBs, therefore causing
      a page table walk.  That would be consistent with instruction
      TLB operation, remove the redundancy between dTLB-loads and
      L1-dcache-loads, and prevent perf from producing artificially
      low percentage ratios, i.e. the "0.01%" below:
      
              42,550,869      L1-dcache-loads
              41,591,860      dTLB-loads
                   4,802      dTLB-load-misses          #    0.01% of all dTLB cache hits
               7,283,682      L1-dcache-stores
               7,912,392      dTLB-stores
                     310      dTLB-store-misses
      
      On AMD Families prior to 17h, the "Data Cache Accesses" counter is
      used, which is slightly better than load/store instructions retired,
      but still counts in terms of individual load/store operations
      instead of TLB operations.
      
      So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
      to a counter for L1 dTLB misses that hit in the L2 dTLB, and
      "dTLB-load-misses" to a counter for L1 DTLB misses that caused
      L2 DTLB misses and therefore also caused page table walks.  This
      results in a much more accurate view of data TLB performance:
      
              60,961,781      L1-dcache-loads
                   4,601      dTLB-loads
                     963      dTLB-load-misses          #   20.93% of all dTLB cache hits
      
      Note that for all AMD families, data loads and stores are combined
      in a single accesses counter, so no 'L1-dcache-stores' are reported
      separately, and stores are counted with loads in 'L1-dcache-loads'.
      
      Also note that the "% of all dTLB cache hits" string is misleading
      because (a) "dTLB cache": although TLBs can be considered caches for
      page tables, in this context, it can be misinterpreted as data cache
      hits because the figures are similar (at least on Intel), and (b) not
      all those loads (technically accesses) technically "hit" at that
      hardware level.  "% of all dTLB accesses" would be more clear/accurate.
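The quoted percentages are simply misses divided by loads; a one-line sketch (the function name is ours, the formula mirrors what perf prints) reproduces both figures above:

```c
#include <assert.h>
#include <math.h>

/* The ratio perf prints next to dTLB-load-misses: misses / loads * 100. */
static double dtlb_miss_pct(double misses, double loads)
{
	return misses / loads * 100.0;
}
```

With the old event assignment, 4,802 / 41,591,860 gives the artificially low 0.01%; with the new one, 963 / 4,601 gives the meaningful 20.93%.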
      
      Instruction TLB
      
      On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
      STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
      the STLB and completed a page table walk.
      
      For AMD Family 17h and above, for 'iTLB-loads' we replace the
      erroneous instruction cache fetches counter with PMCx084
      "L1 ITLB Miss, L2 ITLB Hit".
      
      For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
      L2 ITLB Miss", but set a 0xff umask because without it the event
      does not get counted.
      
      Branch Predictor (BPU)
      
      PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.
      
      Node Level Events
      
      Family 17h does not have a PMCx0e9 counter, and corresponding counters
      have not been made available publicly, so for now, we mark them as
      unsupported for Families 17h and above.
      
      Reference:
      
        "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
        Released 7/17/2018, Publication #56255, Revision 3.03:
        https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf
      
      [ mingo: tidied up the line breaks. ]
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-perf-users@vger.kernel.org
      Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0e3b74e2
  9. 02 May 2019, 10 commits
    • s390: simplify disabled_wait · 98587c2d
      Martin Schwidefsky authored
      The disabled_wait() function uses its argument as the PSW address when
      it stops the CPU with a wait PSW that is disabled for interrupts.
      The different callers sometimes use a specific number like
      0xdeadbeef to indicate a specific failure; the early boot code uses
      0 and some other call sites use __builtin_return_address(0).
      
      At the time a dump is created, the current PSW and the registers of
      a CPU are written to lowcore to make them available to the dump
      analysis tool. For a CPU stopped with disabled_wait, the PSW and
      the registers do not really make sense together; the PSW address
      does not point to the function the registers belong to.
      
      Simplify disabled_wait() by using _THIS_IP_ for the PSW address and
      drop the argument to the function.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      98587c2d
    • s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR · ec7bf478
      Martin Schwidefsky authored
      Make the call chain more reliable by tagging the ftrace stack entries
      with the stack pointer that is associated with the return address.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ec7bf478
    • s390/unwind: introduce stack unwind API · 78c98f90
      Martin Schwidefsky authored
      Rework the dump_trace() stack unwinder interface to support different
      unwinding algorithms. The new interface looks like this:
      
      	struct unwind_state state;
      	unwind_for_each_frame(&state, task, regs, start_stack)
      		do_something(state.sp, state.ip, state.reliable);
      
      The unwind_bc.c file contains the implementation for the classic
      back-chain unwinder.
      
      One positive side effect of the new code is that it now handles
      ftraced functions gracefully: it prints the real name of the return
      function instead of 'return_to_handler'.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      78c98f90
    • s390/bug: add entry size to the __bug_table section · e21f8baf
      Martin Schwidefsky authored
      Change the __EMIT_BUG inline assembly to emit mergeable __bug_table
      entries with type @progbits and to specify the size of each entry.
      The entry size is encoded in the sh_entsize field of the section
      definition; it allows identifying which struct bug_entry to use to
      decode the entries. This will be needed for the objtool support.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      e21f8baf
    • s390: use proper expoline sections for .dma code · bf726301
      Martin Schwidefsky authored
      The text_dma.S code uses its own macro to generate an inline
      version of an expoline. To make it easier to identify all expolines
      in the kernel, use a thunk and a branch to the thunk, just like the
      rest of the kernel code does.
      
      The name of the text_dma.S expoline thunk is __dma__s390_indirect_jump_r14
      and the section is named .dma.text.__s390_indirect_jump_r14.
      
      This will be needed for the objtool support.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      bf726301
    • s390/nospec: rename assembler generated expoline thunks · 40a3abf7
      Martin Schwidefsky authored
      The assembler version of the expoline thunk uses the naming
      __s390x_indirect_jump_rxuse_ry while the compiler generates names
      like __s390_indirect_jump_rx_use_ry. Make the naming more
      consistent.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      40a3abf7
    • s390: add missing ENDPROC statements to assembler functions · 26a374ae
      Martin Schwidefsky authored
      The assembler code in arch/s390 is missing proper ENDPROC
      statements to end functions in a few places. Add them.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      26a374ae
    • powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX · 12f36351
      Christophe Leroy authored
      Serge reported some crashes with CONFIG_STRICT_KERNEL_RWX enabled
      on a book3s32 machine.
      
      Analysis shows two issues:
        - BATs addresses and sizes are not properly aligned.
        - There is a gap between the last address covered by BATs and the
          first address covered by pages.
      
      Memory mapped with DBATs:
      0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
      1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent
      2: 0xc0c00000-0xc13fffff 0x00c00000 Kernel RW coherent
      3: 0xc1400000-0xc23fffff 0x01400000 Kernel RW coherent
      4: 0xc2400000-0xc43fffff 0x02400000 Kernel RW coherent
      5: 0xc4400000-0xc83fffff 0x04400000 Kernel RW coherent
      6: 0xc8400000-0xd03fffff 0x08400000 Kernel RW coherent
      7: 0xd0400000-0xe03fffff 0x10400000 Kernel RW coherent
      
      Memory mapped with pages:
      0xe1000000-0xefffffff  0x21000000       240M        rw       present           dirty  accessed
      
      This patch fixes both issues. With the patch, we get the following
      which is as expected:
      
      Memory mapped with DBATs:
      0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
      1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent
      2: 0xc0c00000-0xc0ffffff 0x00c00000 Kernel RW coherent
      3: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
      4: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
      5: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
      6: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
      7: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
      
      Memory mapped with pages:
      0xe0000000-0xefffffff  0x20000000       256M        rw       present           dirty  accessed
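The corrected table follows the BAT hardware constraint that each block has a power-of-two size and is naturally aligned to that size. A greedy largest-fit sketch (the function name is invented; this is not the book3s32 code itself) reproduces the RW entries above:

```c
#include <assert.h>
#include <stdint.h>

/* Pick the largest power-of-two block (256M is the largest BAT block
 * size) that both fits in the remaining range and is naturally aligned
 * at addr.  Illustrative only. */
static uint32_t next_bat_size(uint32_t addr, uint32_t remaining)
{
	uint32_t size = 256u << 20;
	while (size > remaining || (addr & (size - 1)))
		size >>= 1;
	return size;
}
```

Walking the RW region 0xc0c00000..0xe0000000 with this rule yields 4M, 16M, 32M, 64M, 128M, 256M, matching DBAT entries 2 through 7 of the fixed mapping.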
      
      Fixes: 63b2bc61 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
      Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
      Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      12f36351
    • gcc-9: properly declare the {pv,hv}clock_page storage · 459e3a21
      Linus Torvalds authored
      The pvclock_page and hvclock_page variables are (as the name
      implies) addresses of pages created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
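The shape of the fix can be shown standalone; PAGE_SIZE and the local array definition below stand in for the real linker-script symbol:

```c
#include <assert.h>

#define PAGE_SIZE 4096
typedef unsigned char u8;

/* Declaring the symbol as "extern u8 hvclock_page" tells gcc it is a
 * single byte, so any access past offset 0 trips -Warray-bounds under
 * gcc 9.  Declaring it with its true extent, as below, matches what
 * the linker actually provides.  The definition here is a stand-in for
 * the linker-script placement. */
u8 hvclock_page[PAGE_SIZE];

static u8 read_page_byte(unsigned int offset)
{
	return hvclock_page[offset % PAGE_SIZE];  /* always in bounds */
}
```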
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      459e3a21
  10. 01 May 2019, 9 commits
  11. 30 April 2019, 5 commits