1. 25 Jul 2017: 1 commit
    • s390/mm: tag normal pages vs pages used in page tables · c9b5ad54
      Martin Schwidefsky committed
      The ESSA instruction has a new option that allows tagging pages that
      are not used as page tables. Without the tag, the hypervisor has to
      assume that any guest page could be used in a page table inside the
      guest. This forces the hypervisor to flush all guest TLB entries
      whenever a host page table entry is invalidated. With the tag, the
      host can skip the TLB flush if the page is tagged as a normal page.
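      As a toy model of the decision described above, the sketch below lets the host skip the guest TLB flush only for pages tagged as normal. It does not use the real ESSA encoding or KVM code; the enum, struct and function names are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical tag states a guest could attach to a page (not the real ESSA encoding). */
enum guest_page_tag {
	PAGE_TAG_NONE,		/* no information: the page may back a guest page table */
	PAGE_TAG_NORMAL,	/* the guest states the page is not used as a page table */
};

struct guest_page {
	enum guest_page_tag tag;
};

/*
 * Hypothetical helper: does invalidating the host PTE that maps this
 * guest page require flushing all guest TLB entries?
 */
bool needs_guest_tlb_flush(const struct guest_page *pg)
{
	/* Without the tag the host has to assume the worst case. */
	return pg->tag != PAGE_TAG_NORMAL;
}
```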
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  2. 23 Jul 2017: 2 commits
  3. 21 Jul 2017: 4 commits
    • x86/devicetree: Convert to using %pOF instead of ->full_name · db15e7f2
      Rob Herring committed
      Now that we have a custom printf format specifier, convert users of
      full_name to use %pOF instead. This is in preparation for no longer
      storing the full path string of each device node.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devicetree@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
      [ Clarify the error message while at it, as 'node' is ambiguous. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Jiri Olsa committed
      We have two functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/restore
      
      Both of them are called from intel_pmu_sched_task(), and
      either of them can be unintentionally triggered when the
      other one is configured to run.
      
      Say a PEBS drain is configured in the sched_task callback
      for an event. The callback itself (intel_pmu_sched_task())
      will then also run the LBR save/restore code, which we did
      not ask for, because intel_pmu_sched_task() does not check
      which work was actually requested.
      
      This can waste cycles in some perf monitoring scenarios,
      for example when we monitor a PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need a PEBS, non-freq, non-timestamp event to enable
         the sched_task callback.)
      
      The perf stat output for cycles and msr:write_msr for the
      above command, before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no effect on cycles:k because the sched_task callback runs
      with events switched off. However, the msr:write_msr tracepoint
      count (down from ~91.2M to ~27.2M), together with the elapsed time
      dropping by roughly 42% (from 29.3 s to 17.0 s), shows the
      improvement.
      
      Monitoring an LBR event with the extra PEBS drain processing
      in the sched_task callback showed only a small speedup, because
      the drain function does little extra work when there is no
      PEBS data.
      
      Add conditions to recognize the configured work that needs
      to be done in the x86_pmu's sched_task callback.
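      A minimal sketch of the idea, assuming a per-CPU bitmask that records which kinds of sched_task work were requested. The flag names, struct and helpers are hypothetical; the real intel_pmu_sched_task() derives this from PMU state rather than from such a mask.

```c
/* Hypothetical flags for the work a shared sched_task callback may be asked to do. */
#define SCHED_WORK_PEBS_DRAIN	(1u << 0)
#define SCHED_WORK_LBR		(1u << 1)

struct cpu_pmu_state {
	unsigned int requested;		/* bitmask of SCHED_WORK_* flags */
	unsigned int pebs_drains;	/* counters standing in for the real work */
	unsigned int lbr_switches;
};

static void pebs_drain(struct cpu_pmu_state *s)
{
	s->pebs_drains++;
}

static void lbr_save_restore(struct cpu_pmu_state *s)
{
	s->lbr_switches++;
}

/* Shared callback: run only the parts that were actually requested. */
void pmu_sched_task(struct cpu_pmu_state *s)
{
	if (s->requested & SCHED_WORK_PEBS_DRAIN)
		pebs_drain(s);
	if (s->requested & SCHED_WORK_LBR)
		lbr_save_restore(s);
}
```

      With only SCHED_WORK_PEBS_DRAIN set, a context switch no longer pays for the LBR save/restore path.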
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/platform/uv/BAU: Disable BAU on single hub configurations · 2fe9a5c6
      Andrew Banman committed
      The BAU confers no benefit to a UV system running with only one hub/socket.
      Permanently disable the BAU driver if fewer than two hubs are online, to
      avoid BAU overhead. We have observed failed boots caused by the BAU on
      single-socket UV4 systems; this patch avoids them.
      
      Also, while at it, consolidate initialization error blocks and fix a
      memory leak.
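      Consolidating initialization error blocks is commonly done in kernel C with a single chain of goto labels that unwinds in reverse order, which makes leaks on partial failure easier to avoid. The sketch below shows that general pattern together with the "fewer than two hubs" check; all names and sizes are hypothetical and this is not the actual BAU driver code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state standing in for the real per-hub BAU structures. */
struct bau_state {
	int *msg_queue;
	int *stats;
};

/*
 * Hypothetical init routine: returns 0 on success or a negative errno.
 * Each failure jumps to a label that frees exactly what was allocated
 * so far, so nothing leaks.
 */
int bau_init(struct bau_state *st, int hubs_online)
{
	/* A single hub gains nothing from the BAU: leave it disabled. */
	if (hubs_online < 2)
		return -ENODEV;

	st->msg_queue = calloc(64, sizeof(*st->msg_queue));
	if (!st->msg_queue)
		goto err;

	st->stats = calloc(16, sizeof(*st->stats));
	if (!st->stats)
		goto err_free_queue;

	return 0;

err_free_queue:
	free(st->msg_queue);
	st->msg_queue = NULL;
err:
	return -ENOMEM;
}
```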
      Signed-off-by: Andrew Banman <abanman@hpe.com>
      Acked-by: Russ Anderson <rja@hpe.com>
      Acked-by: Mike Travis <mike.travis@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: tony.ernst@hpe.com
      Link: http://lkml.kernel.org/r/1500588351-78016-1-git-send-email-abanman@hpe.com
      [ Minor cleanups. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: mark kprobe templates as character arrays, not single characters · 54a7d50b
      Linus Torvalds committed
      They really are, and the "take the address of a single character" makes
      the string fortification code unhappy (it believes that you can now only
      access one byte, rather than a byte range, and then raises errors for
      the memory copies going on in there).
      
      We could now remove a few 'addressof' operators (since arrays naturally
      decay to pointers), but this is the minimal patch that just changes
      the C prototypes of those template arrays (the templates themselves are
      defined in inline asm).
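      The array-versus-single-character distinction can be shown in plain C. In the sketch below, tmpl_array is a hypothetical stand-in for one of the templates (the real ones are only declared in C and defined in inline asm; the byte values here are arbitrary). Declared as an array, its type covers the whole byte range, and its name already decays to a pointer to the first byte, so an '&' is redundant.

```c
#include <string.h>

/*
 * Hypothetical stand-in for an asm-defined template. Defined here only
 * so the example links; unsigned char mirrors a byte-sized opcode type.
 */
const unsigned char tmpl_array[] = { 0x55, 0x48, 0x89, 0xe5, 0xc3 };

/* Copy up to buflen template bytes into buf and return how many were copied. */
size_t copy_template(unsigned char *buf, size_t buflen)
{
	size_t len = sizeof(tmpl_array);	/* size of the whole array, not 1 */

	if (len > buflen)
		len = buflen;
	memcpy(buf, tmpl_array, len);	/* the array name decays to &tmpl_array[0] */
	return len;
}
```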
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 20 Jul 2017: 13 commits
    • kvm: x86: hyperv: avoid livelock in oneshot SynIC timers · f1ff89ec
      Roman Kagan committed
      If the SynIC timer message delivery fails due to the SINT message slot
      being busy, there's no point in attempting to start the timer again
      until the guest notifies us that the slot has been released (via EOM
      or EOI).
      
      Even worse, when a oneshot timer fails to deliver its message,
      re-arming it with an expiration time in the past leads to an immediate
      retry of the delivery, and so on, without ever letting the guest vcpu
      run and release the slot, which results in a livelock.
      
      To avoid that, only start the timer when there's no timer message
      pending delivery.  When there is, meaning the slot is busy, the
      processing will be restarted upon notification from the guest that the
      slot is released.
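      A simplified state-machine sketch of the rule above, assuming one timer and one message slot; the struct and function names are hypothetical and this is not KVM's actual Hyper-V code.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one SynIC timer and its SINT message slot. */
struct synic_timer_model {
	bool msg_pending;	/* a timer message is waiting for a free slot */
	bool armed;		/* the oneshot timer is running */
};

/*
 * Only (re)start the timer when no message is pending delivery. If the
 * slot is busy, do nothing: re-arming now would just retry and fail again.
 */
void timer_try_start(struct synic_timer_model *t)
{
	if (t->msg_pending)
		return;
	t->armed = true;
}

/* Called on the guest's EOM/EOI notification: the slot is free again. */
void timer_slot_released(struct synic_timer_model *t)
{
	t->msg_pending = false;	/* assume the pending message is now delivered */
	timer_try_start(t);
}
```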
      Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: VMX: Fix invalid guest state detection after task-switch emulation · f244deed
      Wanpeng Li committed
      This can be reproduced with EPT=1, unrestricted_guest=N,
      emulate_invalid_state=Y, or with EPT=0. The trace of
      kvm-unit-tests/taskswitch2.flat below shows it trying to emulate a
      task switch with invalid guest state:
      
      kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
      kvm_emulate_insn: 42000:0:0f 0b (0x2)
      kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
      kvm_inj_exception: #UD (0x0)
      kvm_entry: vcpu 0
      kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
      kvm_emulate_insn: 42000:0:0f 0b (0x2)
      kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
      kvm_inj_exception: #UD (0x0)
      ......................
      
      It appears that the task-switch emulation updates rflags (and the vm86
      flag) only after the segments are loaded, causing vmx->emulation_required
      to be set when in fact invalid guest state emulation is not needed.
      
      This patch fixes it by updating vmx->emulation_required after rflags
      (and the vm86 flag) are updated in task-switch emulation.
      
      Thanks Radim for moving the update to vmx__set_flags and adding Paolo's
      suggestion for the check.
      Suggested-by: Nadav Amit <nadav.amit@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • debug: Fix WARN_ON_ONCE() for modules · 325cdacd
      Josh Poimboeuf committed
      Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
      code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
      completely broken when called from a module.
      
      The bug was introduced with the following commit:
      
        19d43626 ("debug: Add _ONCE() logic to report_bug()")
      
      That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
      trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
      bit can be written to the flags to indicate the first warning has
      occurred.
      
      The bug table was made writable for vmlinux, which relies on
      vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
      it wasn't made writable for modules, which rely on the ELF section
      header flags.
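      To show why the 'once' logic needs a writable table, here is a simplified model of the trap-handler check: the first hit of a _ONCE entry stores a DONE bit into the entry itself, a write that would fault if the table were mapped read-only. The flag values and names are hypothetical, modelled on the BUGFLAG_DONE scheme described above rather than copied from the kernel.

```c
#include <stdbool.h>

/* Hypothetical flag bits modelled on the BUGFLAG_DONE scheme. */
#define MODEL_BUGFLAG_ONCE	(1u << 0)
#define MODEL_BUGFLAG_DONE	(1u << 1)

struct model_bug_entry {
	unsigned int line;	/* source line of the WARN_ON_ONCE() site */
	unsigned int flags;	/* updated at runtime, so must be writable */
};

/* Returns true if this hit should print a warning. */
bool model_report_bug(struct model_bug_entry *bug)
{
	if (bug->flags & MODEL_BUGFLAG_ONCE) {
		if (bug->flags & MODEL_BUGFLAG_DONE)
			return false;
		bug->flags |= MODEL_BUGFLAG_DONE;	/* store into the bug table */
	}
	return true;
}
```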
      Reported-by: Mike Galbraith <efault@gmx.de>
      Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 19d43626 ("debug: Add _ONCE() logic to report_bug()")
      Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/platform/intel-mid: Fix a format string overflow warning · 0bc73048
      Arnd Bergmann committed
      We have space for exactly three characters for the index in "max7315_%d_base",
      but as GCC points out, a longer index would cause a string overflow:
      
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c: In function 'max7315_platform_data':
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: error: '%d' directive writing between 1 and 11 bytes into a region of size 9 [-Werror=format-overflow=]
           sprintf(base_pin_name, "max7315_%d_base", nr);
                                ^~~~~~~~~~~~~~~~~
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: note: directive argument in the range [-2147483647, 2147483647]
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:3: note: 'sprintf' output between 15 and 25 bytes into a destination of size 17
           sprintf(base_pin_name, "max7315_%d_base", nr);
      
      This makes it use snprintf() to truncate the string if that happened,
      rather than overflowing the stack. In practice this is safe, because
      there won't be a large number of max7315 devices in a system, and
      both the format and the length are defined by the firmware interface.
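      A minimal sketch of the truncating variant, assuming a 17-byte destination as in GCC's note above ("max7315_" plus "_base" is 13 characters, leaving room for three digits and the NUL). The helper name is hypothetical; the real fix lives in platform_max7315.c.

```c
#include <stdio.h>

/*
 * Build "max7315_<nr>_base" into buf. snprintf() never writes more than
 * len bytes (including the terminating NUL), so an oversized nr is
 * truncated instead of overflowing the buffer. The return value is the
 * length the untruncated string would have had.
 */
int max7315_base_name(char *buf, size_t len, int nr)
{
	return snprintf(buf, len, "max7315_%d_base", nr);
}
```

      With a 17-byte buffer, nr = 3 yields "max7315_3_base", while nr = 12345 is cut to "max7315_12345_ba" and the call returns 18, which a caller could compare against the buffer size to detect truncation.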
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-9-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG · d689c64d
      Arnd Bergmann committed
      The IOSF_MBI option requires PCI support; without it, we get a harmless
      Kconfig warning when it gets selected by PUNIT_ATOM_DEBUG:
      
        warning: (X86_INTEL_LPSS && SND_SST_IPC_ACPI && MMC_SDHCI_ACPI && PUNIT_ATOM_DEBUG) selects IOSF_MBI which has unmet direct dependencies (PCI)
      
      This adds another dependency to avoid the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-8-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/build: Silence the build with "make -s" · d460131d
      Arnd Bergmann committed
      Every kernel build on x86 will result in some output:
      
        Setup is 13084 bytes (padded to 13312 bytes).
        System is 4833 kB
        CRC 6d35fa35
        Kernel: arch/x86/boot/bzImage is ready  (#2)
      
      This shuts it up, so that 'make -s' is truly silent as long as
      everything works. Building without '-s' should produce unchanged
      output.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-6-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl · 7206f9bf
      Arnd Bergmann committed
      The x86 version of insb/insw/insl uses inline assembly that does
      not list the target buffer as an output. This can confuse the
      compiler, leading it to conclude that the buffer is still
      uninitialized when it is accessed afterwards:
      
        drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
        drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c: In function 'sb1000_rx':
        drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I tried to mark the exact input buffer as an output here, but couldn't
      figure out how. As suggested by Linus, marking all memory as clobbered
      is good enough, however. For the outs operations, I also added the
      memory clobber, to force the input to be written to local variables.
      This is probably already guaranteed by the "asm volatile", but it can't
      hurt to do this for symmetry.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
      Link: https://lkml.org/lkml/2017/7/12/605
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning · 5623452a
      Arnd Bergmann committed
      gcc-7.1.1 produces this warning:
      
        arch/x86/math-emu/reg_add_sub.c: In function 'FPU_add':
        arch/x86/math-emu/reg_add_sub.c:80:48: error: ?: using integer constants in boolean context [-Werror=int-in-bool-context]
      
      This appears to be a bug in gcc-7.1.1, and I have reported it as
      PR81484. The compiler suggests that code written as
      
      	if (a & b ? c : d)
      
      is usually incorrect and should have been
      
      	if (a & (b ? c : d))
      
      However, in this case, we correctly write
      
      	if ((a & b) ? c : d)
      
      and should not get a warning for it.
      
      This adds a dirty workaround for the problem, adding a comparison with
      zero inside of the macro. The warning is currently disabled in the kernel,
      so we may decide not to apply the patch, and instead wait for future gcc
      releases to fix the problem. On the other hand, it seems to be the
      only instance of this particular problem.
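      The three spellings differ only in grouping. Since '&' binds more tightly than '?:', the unparenthesized form already means (a & b) ? c : d, while a & (b ? c : d) is a different expression. The functions below use plain ints as hypothetical stand-ins for the math-emu operands; form_workaround shows the compare-with-zero style of workaround in general terms, not the exact macro from the patch.

```c
/* '&' binds more tightly than '?:', so this is (a & b) ? c : d. */
int form_unparenthesized(int a, int b, int c, int d)
{
	return a & b ? c : d;
}

/* The grouping the warning suspects was intended instead. */
int form_other_grouping(int a, int b, int c, int d)
{
	return a & (b ? c : d);
}

/* Workaround style: an explicit comparison with zero makes the condition clearly boolean. */
int form_workaround(int a, int b, int c, int d)
{
	return ((a & b) != 0) ? c : d;
}
```

      For a = 6, b = 4, c = 1, d = 2 the first and third functions return 1, while the second returns 6 & 1 = 0.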
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-4-arnd@arndb.de
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/math-emu: Fix possible uninitialized variable use · 75e2f0a6
      Arnd Bergmann committed
      When building the kernel with "make EXTRA_CFLAGS=...", this overrides
      the "PARANOID" preprocessor macro defined in arch/x86/math-emu/Makefile,
      and we run into a build warning:
      
        arch/x86/math-emu/reg_compare.c: In function ‘compare_i_st_st’:
        arch/x86/math-emu/reg_compare.c:254:6: error: ‘f’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This fixes the implementation to work correctly even without the PARANOID
      flag, and also fixes the Makefile to not use the EXTRA_CFLAGS variable
      but instead use the ccflags-y variable in the Makefile that is meant
      for this purpose.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-3-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Shut up false-positive -Wmaybe-uninitialized warning · 11d8b058
      Arnd Bergmann committed
      The initialization function checks for various failure scenarios, but
      unfortunately the compiler gets a little confused about the possible
      combinations, leading to a false-positive build warning when
      -Wmaybe-uninitialized is set:
      
        arch/x86/events/core.c: In function ‘init_hw_perf_events’:
        arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n",
      
      We can't actually run into this case, so this shuts up the warning
      by initializing the variables to a known-invalid state.
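      The pattern can be sketched as follows: the failure details are only read when a failure flag is set, but the compiler may not see that link, so the variables start at known-invalid values. This simplified version (hypothetical names, not the actual init_hw_perf_events() code) may not itself provoke the warning; it only illustrates the initialization style.

```c
#include <stdint.h>

/*
 * Hypothetical check: compare each register value with what was
 * expected and remember the first mismatch. reg_fail/val_fail start at
 * known-invalid values instead of being left uninitialized.
 */
int check_regs(const uint32_t *regs, const uint64_t *vals,
	       const uint64_t *expected, int n,
	       uint32_t *reg_out, uint64_t *val_out)
{
	int bios_fail = 0;
	uint32_t reg_fail = ~0u;	/* known-invalid register number */
	uint64_t val_fail = ~0ull;	/* known-invalid value */

	for (int i = 0; i < n; i++) {
		if (!bios_fail && vals[i] != expected[i]) {
			bios_fail = 1;
			reg_fail = regs[i];
			val_fail = vals[i];
		}
	}

	if (bios_fail) {
		/* Only report reg_fail/val_fail on the failure path. */
		*reg_out = reg_fail;
		*val_out = val_fail;
	}
	return bios_fail;
}
```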
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-2-arnd@arndb.de
      Link: https://patchwork.kernel.org/patch/9392595/
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/defconfig: Remove stale, old Kconfig options · 0e7f0b6c
      Krzysztof Kozlowski committed
      Remove old, dead Kconfig options (in the order they appear in this commit):
      
       - EXPERIMENTAL is gone since v3.9;
       - IP_NF_TARGET_ULOG: commit d4da843e ("netfilter: kill remnants of ulog targets");
       - USB_LIBUSUAL: commit f61870ee ("usb: remove libusual");
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1500526885-4341-1-git-send-email-krzk@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ioapic: Pass the correct data to unmask_ioapic_irq() · e708e35b
      Seunghun Han committed
      One of the rarely executed code paths in check_timer() calls
      unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
      
      That's wrong, as unmask_ioapic_irq() expects a pointer to the irq data of
      interrupt 0. irq_get_chip_data(0) returns NULL, so the following
      dereference in unmask_ioapic_irq() causes a kernel panic.
      
      The issue went unnoticed in the first place because irq_get_chip_data()
      returns a void pointer, so the compiler cannot do a type check on the
      argument. The code path was added for machines with broken configurations,
      but it seems that those machines are either not running current kernels or
      simply no longer exist.
      
      Pass irq_get_irq_data(0) as the argument instead, which provides the correct data.
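      Why the compiler stayed quiet can be shown with hypothetical, heavily simplified stand-ins for the two accessors: in C a void * converts implicitly to any object pointer type, so passing the chip-data result compiles without a diagnostic. None of these names are the kernel's real types or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-IRQ data that unmask expects. */
struct irq_data_model {
	unsigned int irq;
	int masked;
};

static struct irq_data_model irq0_data = { .irq = 0, .masked = 1 };

/* Like irq_get_chip_data(): returns void *, and here NULL for IRQ 0. */
void *model_get_chip_data(unsigned int irq)
{
	(void)irq;
	return NULL;
}

/* Like irq_get_irq_data(): returns a typed pointer to the IRQ's data. */
struct irq_data_model *model_get_irq_data(unsigned int irq)
{
	return irq == 0 ? &irq0_data : NULL;
}

void model_unmask(struct irq_data_model *data)
{
	data->masked = 0;	/* dereference: crashes if data is NULL */
}
```

      Calling model_unmask(model_get_chip_data(0)) compiles cleanly yet dereferences NULL, whereas model_unmask(model_get_irq_data(0)) passes the right object.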
      
      [ tglx: Rewrote changelog ]
      
      Fixes: 4467715a ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
      Signed-off-by: Seunghun Han <kkamagui@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1500369644-45767-1-git-send-email-kkamagui@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/acpi: Prevent out of bound access caused by broken ACPI tables · dad5ab0d
      Seunghun Han committed
      The bus_irq argument of mp_override_legacy_irq() is used as the index into
      the isa_irq_to_gsi[] array. The bus_irq argument originates from
      ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
      tables, but is nowhere sanity checked.
      
      That allows broken or malicious ACPI tables to overwrite memory, which
      might cause malfunction, panic or arbitrary code execution.
      
      Add a sanity check and emit a warning when that triggers.
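      A minimal sketch of such a check, assuming a 16-entry legacy table (the assumed size of the kernel's isa_irq_to_gsi[] array). The array, macro and function names are hypothetical stand-ins, and fprintf() stands in for the kernel's warning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: 16 legacy ISA IRQs, the assumed size of the kernel's table. */
#define MODEL_NR_IRQS_LEGACY 16

static unsigned int isa_irq_to_gsi_model[MODEL_NR_IRQS_LEGACY];

/*
 * Hypothetical override path: bus_irq comes straight from firmware
 * tables, so validate it before using it as an array index.
 */
bool model_override_legacy_irq(unsigned int bus_irq, unsigned int gsi)
{
	if (bus_irq >= MODEL_NR_IRQS_LEGACY) {
		fprintf(stderr, "Invalid bus_irq %u for legacy override\n", bus_irq);
		return false;
	}
	isa_irq_to_gsi_model[bus_irq] = gsi;
	return true;
}
```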
      
      [ tglx: Added warning and rewrote changelog ]
      Signed-off-by: Seunghun Han <kkamagui@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 19 Jul 2017: 4 commits
  6. 18 Jul 2017: 10 commits
    • perf/x86/intel: Fix debug_store reset field for freq events · dc853e26
      Jiri Olsa committed
      There's a bug in the PEBS event enabling code that prevents PEBS
      freq events from working properly after a non-freq PEBS event has run.
      
      freq events - perf_event_attr::freq set
                    -F <freq> option of perf record
      
      PEBS events - perf_event_attr::precise_ip > 0
                    default for perf record
      
      For example, with CPU 0 busy, we expect ~10000 samples for the
      following perf tool run:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ]
      
      Everything's fine, but once we run a non-freq PEBS event like:
      
        # perf record -c 10000 -C 0 sleep 1
        [ perf record: Woken up 4 times to write data ]
        [ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ]
      
      the freq events start to fail like this:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ]
      
      The issue is in the non-freq PEBS event initialization of the debug_store
      reset field, whose value is used to auto-reload the counter value after a
      PEBS event drain. This value is not used for PEBS freq events, but once
      we run a non-freq event, it stays in the debug_store data and breaks the
      sample_freq counting for PEBS freq events.
      
      Set the reset field to 0 for freq events.
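      A simplified sketch of the fix, assuming a per-counter reset slot in a debug-store-like struct: fixed-period events store -period (the value the counter is reloaded with), while freq events explicitly clear the slot so a stale value cannot linger. All names are hypothetical, and the counter-width masking the real code would apply is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified slice of the DS area. */
struct ds_model {
	uint64_t pebs_event_reset[4];	/* per-counter auto-reload value */
};

struct pebs_event_model {
	int idx;		/* hardware counter index */
	bool freq;		/* perf_event_attr::freq set (perf record -F) */
	uint64_t sample_period;	/* fixed period for non-freq events (-c) */
};

/* Program the reset value when a PEBS event is enabled. */
void pebs_enable_model(struct ds_model *ds, const struct pebs_event_model *ev)
{
	if (ev->freq)
		ds->pebs_event_reset[ev->idx] = 0;	/* no stale reload value */
	else
		ds->pebs_event_reset[ev->idx] = (uint64_t)0 - ev->sample_period;
}
```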
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170714163551.19459-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add Goldmont Plus CPU PMU support · dd0b06b5
      Kan Liang committed
      Add perf core PMU support for Intel Goldmont Plus CPU cores:
      
       - The init code is based on Goldmont.
       - There is a new cache event list, based on the Goldmont cache event
         list.
       - All four general-purpose performance counters support PEBS.
       - The first general-purpose performance counter is used for the
         reduced-skid PEBS mechanism. Use :ppp to indicate an event that
         wants reduced-skid PEBS.
       - Goldmont Plus has a 4-wide pipeline for Topdown.
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Enable C-state residency events for Apollo Lake · 5c10b048
      Harry Pan committed
      The Goldmont microarchitecture supports C1/C3/C6 and PC2/PC3/PC6/PC10 state
      residency counters; this patch enables them for the Apollo Lake platform.
      
      The MSR information is based on Intel Software Developers' Manual,
      Vol. 4, Order No. 335592, Table 2-6 and 2-12.
      Signed-off-by: Harry Pan <harry.pan@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@suse.de
      Cc: davidcc@google.com
      Cc: gs0622@gmail.com
      Cc: lukasz.odzioba@intel.com
      Cc: piotr.luc@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170717103749.24337-1-harry.pan@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y · 029d9252
      Michael Ellerman committed
      Currently even with STRICT_KERNEL_RWX we leave the __init text marked
      executable after init, which is bad.
      
      Add a hook to mark it NX (no-execute) before we free it, and implement
      it for radix and hash.
      
      Note that we use __init_end as the end address, not _einittext,
      because overlaps_kernel_text() uses __init_end: there are
      additional executable sections other than .init.text between
      __init_begin and __init_end.
      
      Tested on radix and hash with:
      
        0:mon> p $__init_begin
        *** 400 exception occurred
      
      Fixes: 1e0fc9d1 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/hash: Refactor hash__mark_rodata_ro() · fa7f9189
      Michael Ellerman committed
      Move the core logic into a helper, so we can use it for changing other
      permissions.
      
      We also change the logic to align start down and end up. This means
      calling the function with a range will expand that range to be at
      least 1 mmu_linear_psize page in size. We need that so we can use it
      on __init_begin ... __init_end, which is not a full page in size.
      
      This should always work for _stext/__init_begin, because we align
      __init_begin to _stext + 16M in the linker script.
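      The range expansion described above is plain power-of-two alignment arithmetic. A minimal sketch with hypothetical helper names (not the powerpc code itself), where page_size plays the role of mmu_linear_psize:

```c
#include <stdint.h>

/* Round down/up to a power-of-two size. */
static uint64_t align_down(uint64_t x, uint64_t size)
{
	return x & ~(size - 1);
}

static uint64_t align_up(uint64_t x, uint64_t size)
{
	return (x + size - 1) & ~(size - 1);
}

/*
 * Expand [start, end) outward to whole pages, so a non-empty range
 * smaller than one page still covers at least one full page.
 */
void expand_to_pages(uint64_t start, uint64_t end, uint64_t page_size,
		     uint64_t *out_start, uint64_t *out_end)
{
	*out_start = align_down(start, page_size);
	*out_end = align_up(end, page_size);
}
```

      With a 16M page, a small range such as [0x1001234, 0x1005678) expands to [0x1000000, 0x2000000).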
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Refactor radix__mark_rodata_ro() · b134bd90
      Michael Ellerman committed
      Move the core logic into a helper, so we can use it for changing permissions
      other than _PAGE_WRITE.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT · 4c07f904
      Roman Kagan committed
      A recent commit:
      
        d6e41f11 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      
      introduced a VM_WARN_ON(!in_atomic()) which generates false positives
      on every VM entry on !CONFIG_PREEMPT_COUNT kernels.
      
      Replace it with a test for preemptible(), which appears to match the
      original intent and works across different CONFIG_PREEMPT* variations.
      Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-mm@kvack.org
      Fixes: d6e41f11 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • powerpc/64s: Fix hypercall entry clobbering r12 input · 76fc0cfc
      Nicholas Piggin committed
      A previous optimisation incorrectly assumed the PAPR hcall does
      not use r12, and clobbers it upon entry. In fact it is used as
      an input. This can result in KVM guests crashing (observed with
      PR KVM).
      
      Instead of using r12 to save r13, this patch saves r13 in ctr.
      This is more costly, but not as slow as using the SPRG.
      
      Fixes: acd7d8ce ("powerpc/64s: Optimize hypercall/syscall entry")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Avoid spurious PMU interrupts after idle · 101dd590
      Nicholas Piggin committed
      POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
      some conditions.
      
      A solution is to save and reload MMCR0 over state-loss idle.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Blackfin: flat: Use %x to format u32 · cb0fbbf2
      Geert Uytterhoeven committed
      Several variables had their types changed from unsigned long to u32,
      but the printk()-style format to print them wasn't updated, leading to:
      
          arch/blackfin/kernel/flat.c: In function 'bfin_get_addr_from_rp':
          arch/blackfin/kernel/flat.c:35:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'u32' [-Wformat]
          arch/blackfin/kernel/flat.c: In function 'bfin_put_addr_at_rp':
          arch/blackfin/kernel/flat.c:80:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'u32' [-Wformat]
      
      Fixes: 468138d7 ("binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 17 Jul 2017: 4 commits
  8. 16 Jul 2017: 2 commits