1. 19 Feb 2017, 1 commit
  2. 18 Feb 2017, 2 commits
    • bpf: make jited programs visible in traces · 74451e66
      Committed by Daniel Borkmann
      A long-standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in the
      core kernel, modules and dynamically allocated ftrace trampolines.
      What is still missing are BPF JITed programs (interpreted programs
      are not an issue, as __bpf_prog_run() will be attributed to them),
      so when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same holds for address
      correlation done from user space via reading /proc/kallsyms. That
      file is read by tools like perf, but it is also useful for permanent
      live tracing with eBPF itself in combination with stack maps when
      other eBPF types are part of the callchain. See the offwaketime
      example on dumping a stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in the fast path, and that is also shared for the
      symbol/size/offset lookup of a specific address in kallsyms. The
      slow-path iteration through all symbols in the seq file is done via
      an RCU list, which holds a tiny fraction of all exported ksyms,
      usually below 0.1 percent. Function symbols are exported as
      bpf_prog_<tag>, in order to aid debugging and attribution. This
      facility is currently enabled for root only when bpf_jit_kallsyms
      is set to 1, and disabled if hardening is active in any mode. The
      rationale behind this is that a lot of systems still ship with
      world-readable permissions on kallsyms, so addresses should not
      suddenly get exposed for them. If that situation improves in the
      future, we always have the option to change the default. Likewise,
      unprivileged programs are not allowed to add entries either, but
      that is less of a concern, as most program types relevant in this
      context are root-only anyway. If enabled, call graphs and stack
      traces will then show a correct attribution; one example is
      illustrated below, where the trace is now visible in tooling such
      as perf script --kallsyms=/proc/kallsyms and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
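      
      The fast-path lookup described above can be pictured with a minimal
      sketch: a latched RB tree searched under RCU, so readers never block
      while the JIT adds or removes programs. Names such as prog_tree and
      prog_tree_ops are illustrative, not the exact upstream
      implementation:
      
        /* Is this address inside some BPF JITed image? Search the
         * latched RB tree under RCU; writers flip between two tree
         * copies (see <linux/rbtree_latch.h>), so readers never block.
         */
        static bool bpf_jit_text_address(unsigned long addr)
        {
                bool ret;
      
                rcu_read_lock();
                ret = latch_tree_find((void *)addr, &prog_tree,
                                      &prog_tree_ops) != NULL;
                rcu_read_unlock();
      
                return ret;
        }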
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74451e66
    • bpf: remove stubs for cBPF from arch code · 9383191d
      Committed by Daniel Borkmann
      Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make it
      a single __weak function in the core that can be overridden,
      similarly to the eBPF one. Also remove stale pr_err() mentions
      of bpf_jit_compile.
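      
      The pattern is the usual weak-symbol override; a sketch, not the
      literal diff:
      
        /* Core kernel: a no-op default for the cBPF JIT entry point. */
        void __weak bpf_jit_compile(struct bpf_prog *prog)
        {
        }
      
        /* An arch with a cBPF JIT overrides it simply by providing a
         * strong definition of the same name; the linker picks that one. */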
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9383191d
  3. 16 Feb 2017, 3 commits
    • ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user() · 9e344048
      Committed by Kees Cook
      The 64-bit get_user() wasn't clearing the high word due to a typo
      in the error handler. The exception handler entry was already
      correct, though. This was noticed during the recent usercopy test
      additions in lib/test_user_copy.c.
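      
      The contract under test is that a faulting get_user() zeroes the
      entire destination, including the high 32 bits. A hedged sketch of
      such a check, assuming uaddr points at an unmapped user address:
      
        u64 val = 0xdeadbeefdeadbeefULL;
      
        /* The read must fault... */
        if (get_user(val, (u64 __user *)uaddr) != -EFAULT)
                pr_err("get_user() unexpectedly succeeded\n");
      
        /* ...and the destination must be fully zeroed, both words. */
        if (val != 0)
                pr_err("high word leaked: %#llx\n", val);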
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      9e344048
    • ARM: 8657/1: uaccess: consistently check object sizes · 32b14363
      Committed by Kees Cook
      In commit 76624175 ("arm64: uaccess: consistently check object sizes"),
      the object size checks were moved outside the access_ok() check so
      that bad destinations are detected before hitting the
      "memset(dest, 0, size)" in the copy_from_user() failure path.
      
      This makes the same change for arm, with attention given to
      possibly extracting the uaccess routines into a common header file
      for all architectures in the future.
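      
      A sketch of the resulting ordering; check_object_size() is the real
      hardened-usercopy hook, while the wrapper name and exact failure
      path here are illustrative:
      
        static inline unsigned long
        arm_copy_from_user_checked(void *to, const void __user *from,
                                   unsigned long n)
        {
                /* Validate the kernel object *before* access_ok(), so a
                 * bogus destination is caught even when the fault path
                 * below memsets it. */
                check_object_size(to, n, false);
      
                if (likely(access_ok(VERIFY_READ, from, n)))
                        n = __copy_from_user(to, from, n);
                else            /* security hole: zero the memory */
                        memset(to, 0, n);
      
                return n;
        }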
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      32b14363
    • powerpc/64: Disable use of radix under a hypervisor · 3f91a89d
      Committed by Paul Mackerras
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it may try to use the radix MMU even though it doesn't have
      the necessary code to do so (it doesn't negotiate use of radix, and it
      doesn't do the H_REGISTER_PROC_TBL hcall).  If the hypervisor supports
      both radix and HPT, then it will set up the guest to use HPT (since the
      guest doesn't request radix in the CAS call), but if the radix feature
      bit is set in the ibm,pa-features property (which is valid, since
      ibm,pa-features is defined to represent the capabilities of the
      processor) the guest will try to use radix, resulting in a crash when
      it turns the MMU on.
      
      This is the minimal fix for the current code: disable radix unless
      we are running in hypervisor mode.
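      
      In sketch form (close to the actual one-liner; treat the
      surrounding context as assumed):
      
        /* Radix is advertised via ibm,pa-features, but we can't yet run
         * radix as a guest, so only honour it in hypervisor mode. */
        if (disable_radix || !(mfmsr() & MSR_HV))
                cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;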
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3f91a89d
  4. 14 Feb 2017, 1 commit
  5. 11 Feb 2017, 1 commit
  6. 10 Feb 2017, 4 commits
    • x86/mm/ptdump: Fix soft lockup in page table walker · 146fbb76
      Committed by Andrey Ryabinin
      CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow.
      In that case ptdump_walk_pgd_level_core() takes a long time to
      walk across all the page tables, and doing this without
      rescheduling causes soft lockups:
      
       NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
       ...
       Call Trace:
        ptdump_walk_pgd_level_core+0x40c/0x550
        ptdump_walk_pgd_level_checkwx+0x17/0x20
        mark_rodata_ro+0x13b/0x150
        kernel_init+0x2f/0x120
        ret_from_fork+0x2c/0x40
      
      I guess this issue might arise even without KASAN on huge machines
      with several terabytes of RAM.
      
      Stick a cond_resched() into the pgd loop to fix this.
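      
      A sketch of the fixed loop; the walker's helper name and arguments
      are schematic, only the cond_resched() placement is the point:
      
        for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
                if (!pgd_none(*pgd))
                        walk_pud_level(m, &st, *pgd, i * PGD_LEVEL_MULT);
      
                /* Yield once per pgd entry so a multi-terabyte walk
                 * (e.g. the KASAN shadow) can't trip the soft lockup
                 * watchdog. */
                cond_resched();
        }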
      Reported-by: Tobias Regnery <tobias.regnery@gmail.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Alexander Potapenko <glider@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      146fbb76
    • x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable · 5f2e71e7
      Committed by Thomas Gleixner
      When the TSC is marked reliable, the synchronization check is
      skipped, but that also skips the TSC ADJUST sanitizing code. So on
      a machine with a wrecked BIOS the TSC deviation between CPUs might
      go unnoticed.
      
      Let the TSC ADJUST sanitizing code run unconditionally and just
      skip the expensive synchronization checks when the TSC is marked
      reliable.
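      
      Schematically, the per-CPU sync path ends up ordered like this (a
      sketch under the assumption that tsc_store_and_check_tsc_adjust()
      is the sanitizing step; details differ from the actual patch):
      
        void check_tsc_sync_target(void)
        {
                /* Always sanitize the TSC_ADJUST MSR first... */
                tsc_store_and_check_tsc_adjust(false);
      
                /* ...then skip only the expensive sync test when the
                 * TSC is trusted anyway. */
                if (unsynchronized_tsc() || tsc_clocksource_reliable)
                        return;
      
                /* expensive write/compare synchronization check follows */
        }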
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Olof Johansson <olof@lixom.net>
      Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5f2e71e7
    • x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST · f2e04214
      Committed by Thomas Gleixner
      Olof reported that on a machine with a BIOS-wrecked TSC, the
      timestamps in dmesg make a large jump because the TSC value jumps
      forward after resetting the TSC ADJUST register to a sane value.
      
      This can be avoided by calling the TSC ADJUST sanitizing function
      before initializing the per-CPU sched clock machinery. That takes
      the offset into account and avoids the time jump.
      
      What cannot be avoided is that the 'Firmware Bug' warnings on the
      secondary CPUs are printed with the large time offsets, because it
      would be too much effort and ugly hackery to print those warnings
      into a buffer and emit them after the adjustment on the starting
      CPUs. It's a firmware bug and should be fixed in firmware. The
      weird timestamps are collateral damage and just illustrate the
      silliness of the BIOS folks:
      
      [    0.397445] smp: Bringing up secondary CPUs ...
      [    0.402100] x86: Booting SMP configuration:
      [    0.406343] .... node  #0, CPUs:      #1
      [1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101
      [1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101
      [    0.508119]  #2
      [1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677
      [1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677
      [    0.607643]  #3
      [1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530
      [1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530
      [    0.707108] smp: Brought up 1 node, 4 CPUs
      [    0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)
      Reported-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f2e04214
    • ARM: orion: remove unused wnr854t_switch_plat_data · 15c2e102
      Committed by Arnd Bergmann
      The other instances of this structure got removed along with the MDIO
      device change, but this one was left behind and needs to be removed
      as well:
      
      arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
       static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
      
      Fixes: 575e93f7 ("ARM: orion: Register DSA switch as a MDIO device")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15c2e102
  7. 09 Feb 2017, 4 commits
  8. 08 Feb 2017, 2 commits
  9. 07 Feb 2017, 2 commits
    • ARM: orion: Register DSA switch as a MDIO device · 575e93f7
      Committed by Florian Fainelli
      Utilize the ability to pass board-specific MDIO bus information to
      a particular MDIO device, thus allowing us to provide the per-port
      switch layout to the Marvell 88E6XXX switch driver.
      
      Since we would otherwise end up with conflicting registration
      paths, do not register the "dsa" platform device anymore.
      
      Note that the MDIO devices registered by the code in net/dsa/dsa2.c
      do not parse a dsa_platform_data, but directly take a dsa_chip_data
      (specific to a single switch chip), so we update the different call
      sites to pass this structure down to orion_ge00_switch_init().
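      
      A hedged sketch of what such board code looks like; the bus id,
      MDIO address and port names here are made up for illustration:
      
        static struct dsa_chip_data board_switch_chip_data = {
                .port_names[0] = "lan1",
                .port_names[1] = "lan2",
                .port_names[3] = "cpu",
        };
      
        static const struct mdio_board_info board_switch_info = {
                .bus_id        = "orion-mii",    /* MDIO bus to attach to */
                .modalias      = "mv88e6085",    /* driver to probe */
                .mdio_addr     = 8,              /* address on the bus */
                .platform_data = &board_switch_chip_data,
        };
      
        static void __init board_switch_init(void)
        {
                /* registered early, before the MDIO bus itself probes */
                mdiobus_register_board_info(&board_switch_info, 1);
        }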
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      575e93f7
    • ARM: defconfigs: make NF_CT_PROTO_SCTP and NF_CT_PROTO_UDPLITE built-in · 5aff1d24
      Committed by Arnd Bergmann
      These symbols can no longer be built as loadable modules, leading
      to harmless Kconfig warnings:
      
      arch/arm/configs/imote2_defconfig:60:warning: symbol value 'm' invalid for NF_CT_PROTO_UDPLITE
      arch/arm/configs/imote2_defconfig:59:warning: symbol value 'm' invalid for NF_CT_PROTO_SCTP
      arch/arm/configs/ezx_defconfig:68:warning: symbol value 'm' invalid for NF_CT_PROTO_UDPLITE
      arch/arm/configs/ezx_defconfig:67:warning: symbol value 'm' invalid for NF_CT_PROTO_SCTP
      
      Let's make them built-in.
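      
      i.e., the affected defconfig lines change from =m to =y (sketch):
      
        CONFIG_NF_CT_PROTO_SCTP=y
        CONFIG_NF_CT_PROTO_UDPLITE=y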
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      5aff1d24
  10. 05 Feb 2017, 2 commits
    • x86/CPU/AMD: Fix Zen SMT topology · 08b25963
      Committed by Yazen Ghannam
      After:
      
        a33d3317 ("x86/CPU/AMD: Fix Bulldozer topology")
      
      our SMT scheduling topology for Fam17h systems is broken, because
      the ThreadId is included in the ApicId when SMT is enabled.
      
      So, without further decoding, cpu_core_id is unique for each thread
      rather than the same for threads on the same core. This didn't
      affect systems with SMT disabled. Make cpu_core_id be what it is
      defined to be.
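      
      Illustratively: with SMT on, Zen's ThreadId occupies the low bit of
      the ApicId, so deriving a per-core id means stripping that bit (a
      sketch of the decoding, not the literal patch, which derives the
      topology from CPUID):
      
        /* ApicId = {CoreId, ThreadId}: with two threads per core the
         * ThreadId is one bit wide, so shift it away to get a core id
         * shared by both siblings. */
        if (smp_num_siblings > 1)
                c->cpu_core_id = c->apicid >> 1;    /* illustrative */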
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.9
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      08b25963
    • x86/CPU/AMD: Bring back Compute Unit ID · 79a8b9aa
      Committed by Borislav Petkov
      Commit:
      
        a33d3317 ("x86/CPU/AMD: Fix Bulldozer topology")
      
      restored the initial approach we had with the Fam15h topology of
      enumerating CU (Compute Unit) threads as cores. And this is still
      correct - they're beefier than HT threads but still have some
      shared functionality.
      
      Our current approach has a problem with the Mad Max Steam game, for
      example. Yves Dionne reported a certain "choppiness" while playing
      on v4.9.5.
      
      That problem most likely stems from the fact that the CU threads
      share resources within one CU, and when we schedule to a thread of
      a different compute unit, this incurs latency due to migrating the
      working set to a different CU through the caches.
      
      When the thread siblings mask mirrors that aspect of the CUs and
      threads, the scheduler pays attention to it and tries to schedule
      within one CU first, which takes care of the latency.
      Reported-by: Yves Dionne <yves.dionne@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.9
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yazen Ghannam <yazen.ghannam@amd.com>
      Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      79a8b9aa
  11. 04 Feb 2017, 2 commits
    • KVM: x86: do not save guest-unsupported XSAVE state · 00c87e9a
      Committed by Radim Krčmář
      Saving unsupported state prevents migration when the new host does
      not support an XSAVE feature of the original host, even if the
      feature is not exposed to the guest.
      
      We've masked host features with guest-visible features before, in
      4344ee98 ("KVM: x86: only copy XSAVE state for the supported
      features"), and dropped the masking when implementing XSAVES. Do it
      again.
      
      Fixes: df1daba7 ("KVM: x86: support XSAVES usage in the host")
      Cc: stable@vger.kernel.org
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      00c87e9a
    • modversions: treat symbol CRCs as 32 bit quantities · 71810db2
      Committed by Ard Biesheuvel
      The modversion symbol CRCs are emitted as ELF symbols, which allows us
      to easily populate the kcrctab sections by relying on the linker to
      associate each kcrctab slot with the correct value.
      
      This has a couple of downsides:
      
       - Given that the CRCs are treated as memory addresses, we waste 4 bytes
         for each CRC on 64 bit architectures,
      
       - On architectures that support runtime relocation, a R_<arch>_RELATIVE
         relocation entry is emitted for each CRC value, which identifies it
         as a quantity that requires fixing up based on the actual runtime
         load offset of the kernel. This results in corrupted CRCs unless we
         explicitly undo the fixup (which is currently handled in the core
         module code).
      
       - Such runtime relocation entries take up 24 bytes of __init space
         each, resulting in an 8x overhead in [uncompressed] kernel size for
         CRCs.
      
      Switching to explicit 32 bit values on 64 bit architectures fixes most
      of these issues, given that 32 bit values are not treated as quantities
      that require fixing up based on the actual runtime load offset.  Note
      that on some ELF64 architectures [such as PPC64], these 32-bit values
      are still emitted as [absolute] runtime relocatable quantities, even if
      the value resolves to a build time constant.  Since relative relocations
      are always resolved at build time, this patch enables MODULE_REL_CRCS on
      powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
      references into relative references into .rodata where the actual CRC
      value is stored.
      
      So redefine all CRC fields and variables as u32, and redefine the
      __CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
      inline assembler (which is necessary since 64-bit C code cannot use
      32-bit types to hold memory addresses, even if they are ultimately
      resolved using values that do not exceed 0xffffffff).  To avoid
      potential problems with legacy 32-bit architectures using legacy
      toolchains, the equivalent C definition of the kcrctab entry is retained
      for 32-bit architectures.
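      
      The 64-bit definition then looks roughly like this (a sketch close
      to the patch; the "- ." makes the reference relative when
      MODULE_REL_CRCS is enabled):
      
        /* Emit the CRC reference as a 32-bit .long from inline asm,
         * since 64-bit C code cannot hold a symbol address in a u32. */
        #define __CRC_SYMBOL(sym, sec)                                      \
                asm("   .section \"___kcrctab" sec "+" #sym "\", \"a\"  \n" \
                    "   .weak   __crc_" #sym "                          \n" \
                    "   .long   __crc_" #sym " - .                      \n" \
                    "   .previous                                       \n")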
      
      Note that this mostly reverts commit d4703aef ("module: handle ppc64
      relocating kcrctabs when CONFIG_RELOCATABLE=y").
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71810db2
  12. 03 Feb 2017, 2 commits
  13. 01 Feb 2017, 6 commits
    • perf/x86/intel/uncore: Make package handling more robust · fff4b87e
      Committed by Thomas Gleixner
      The package management code in uncore relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but
      that left uncore in a broken state. This was not noticed because on
      a regular boot all CPUs are online before uncore is initialized.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.377156255@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fff4b87e
    • perf/x86/intel/uncore: Clean up hotplug conversion fallout · 1aa6cfd3
      Committed by Thomas Gleixner
      The recent conversion to the hotplug state machine kept two mechanisms from
      the original code:
      
       1) The first_init logic which adds the number of online CPUs in a package
          to the refcount. That's wrong because the callbacks are executed for
          all online CPUs.
      
          Remove it so the refcounting is correct.
      
       2) The on_each_cpu() call to undo box->init() in the error handling
          path. That's bogus because when the prepare callback fails no box has
          been initialized yet.
      
          Remove it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Fixes: 1a246b9f ("perf/x86/intel/uncore: Convert to hotplug state machine")
      Link: http://lkml.kernel.org/r/20170131230141.298032324@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1aa6cfd3
    • perf/x86/intel/rapl: Make package handling more robust · dd86e373
      Committed by Thomas Gleixner
      The package management code in RAPL relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but
      that left RAPL in a broken state. This was not noticed because on a
      regular boot all CPUs are online before RAPL is initialized.
      
      A possible fix would be to reintroduce the mess which allocates a
      package data structure in CPU prepare and, when it turns out to
      already exist in starting, throws it away later in the CPU online
      callback. But that's a horrible hack and not required at all,
      because RAPL becomes functional for perf only in the CPU online
      callback. That's correct because user space is not yet informed
      about the CPU being onlined, so nothing can rely on RAPL being
      available on that particular CPU.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      
      This also adds a missing check for available package data in the
      event_init() function.
      Reported-by: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dd86e373
    • xtensa: fix noMMU build on cores with MMU · 4b3e6f2e
      Committed by Max Filippov
      Commit bf15f86b ("xtensa: initialize MMU before jumping to reset
      vector") calls MMU management functions even when CONFIG_MMU is not
      selected. That breaks the noMMU build on cores with an MMU.
      
      Don't manage the MMU when CONFIG_MMU is not selected.
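      
      The guard is the usual build-time one (sketched; init_mmu() stands
      in for the MMU management calls in question):
      
        #ifdef CONFIG_MMU
                /* A noMMU kernel may still run on a core that has an MMU;
                 * only touch MMU state when the kernel is built for it. */
                init_mmu();
        #endif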
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      4b3e6f2e
    • x86/mce: Make timer handling more robust · 0becc0ae
      Committed by Thomas Gleixner
      Erik reported that on preproduction hardware a CMCI storm triggers
      the BUG_ON in add_timer_on(). The reason is that the per-CPU MCE
      timer is started by the CMCI logic before the MCE CPU hotplug
      callback starts the timer with add_timer_on(). So the timer is
      already queued, which triggers the BUG.
      
      Using add_timer_on() is pretty pointless in this code because the
      timer is strictly per CPU, initialized as pinned, and all
      operations which arm the timer happen on the CPU to which the
      timer belongs.
      
      Simplify the whole machinery by using mod_timer() instead of
      add_timer_on(), which avoids the problem because mod_timer() can
      handle already queued timers. Use __start_timer() everywhere so the
      earliest armed expiry time is preserved.
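      
      A sketch of that helper, close in spirit to the patch (treat the
      rounding detail as an assumption):
      
        static void __start_timer(struct timer_list *t, unsigned long interval)
        {
                unsigned long when = jiffies + interval;
                unsigned long flags;
      
                local_irq_save(flags);
                /* Unlike add_timer_on(), mod_timer() copes with a timer
                 * that is already queued; keep the earlier expiry. */
                if (!timer_pending(t) || time_before(when, t->expires))
                        mod_timer(t, round_jiffies(when));
                local_irq_restore(flags);
        }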
      Reported-by: Erik Veijola <erik.veijola@intel.com>
      Tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701310936080.3457@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0becc0ae
    • x86/irq: Make irq activate operations symmetric · aaaec6fc
      Committed by Thomas Gleixner
      The recent commit which prevents double activation of interrupts unearthed
      interesting code in x86. The code (ab)uses irq_domain_activate_irq() to
      reconfigure an already activated interrupt. That trips over the prevention
      code now.
      
      Fix it by deactivating the interrupt before activating the new configuration.
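      
      In sketch form, at the reconfiguration site (the irqd variable is
      illustrative):
      
        /* Make the operations symmetric: tear down the existing
         * activation before activating the new configuration, so the
         * irqdomain core's double-activation check is not tripped. */
        irq_domain_deactivate_irq(irqd);
        irq_domain_activate_irq(irqd);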
      
      Fixes: 08d85f3e ("irqdomain: Avoid activating interrupts more than once")
      Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
      Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311901580.3457@nanos
      aaaec6fc
  14. 31 Jan 2017, 4 commits
  15. 30 Jan 2017, 4 commits
    • ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write · 228dbbfb
      Committed by Dave Martin
      Ensure that if userspace supplies insufficient data to
      PTRACE_SETREGSET to fill all the registers, the thread's old
      registers are preserved.
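      
      The usual fix pattern for a regset .set() handler, sketched with
      the generic regset helpers (function name and details assumed, not
      quoted from the patch):
      
        static int gpr_set(struct task_struct *target,
                           const struct user_regset *regset,
                           unsigned int pos, unsigned int count,
                           const void *kbuf, const void __user *ubuf)
        {
                int ret;
                /* Start from the thread's current registers so a short
                 * write updates only the leading part and preserves the
                 * rest, instead of leaving it uninitialized. */
                struct pt_regs newregs = *task_pt_regs(target);
      
                ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
                                         &newregs, 0, -1);
                if (ret)
                        return ret;
      
                if (!valid_user_regs(&newregs))
                        return -EINVAL;
      
                *task_pt_regs(target) = newregs;
                return 0;
        }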
      
      Cc: <stable@vger.kernel.org> # 3.0.x-
      Fixes: 5be6f62b ("ARM: 6883/1: ptrace: Migrate to regsets framework")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      228dbbfb
    • ARM: 8642/1: LPAE: catch pending imprecise abort on unmask · 97a98ae5
      Committed by Alexander Sverdlin
      An asynchronous external abort is encoded differently in the DFSR
      when LPAE is enabled.
      
      Fixes: 9254970c ("ARM: 8447/1: catch pending imprecise abort on unmask")
      Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      97a98ae5
    • x86/microcode: Do not access the initrd after it has been freed · 24c25032
      Committed by Borislav Petkov
      When we look for microcode blobs, we first try the builtin ones
      and, if that doesn't succeed, we fall back to the initrd supplied
      to the kernel.
      
      However, at some point during boot, that initrd gets jettisoned and
      we shouldn't access it anymore. But we do, as the KASAN report
      below shows. That's because find_microcode_in_initrd() doesn't
      check whether the initrd is still valid or not.
      
      So do that. See the sketch after the report.
      
        ==================================================================
        BUG: KASAN: use-after-free in find_cpio_data
        Read of size 1 by task swapper/1/0
        page:ffffea0000db9d40 count:0 mapcount:0 mapping:          (null) index:0x1
        flags: 0x100000000000000()
        raw: 0100000000000000 0000000000000000 0000000000000001 00000000ffffffff
        raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
        page dumped because: kasan: bad access detected
        CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.10.0-rc5-debug-00075-g2dbde22 #3
        Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.2.3 12/01/2016
        Call Trace:
         dump_stack
         ? _atomic_dec_and_lock
         ? __dump_page
         kasan_report_error
         ? pointer
         ? find_cpio_data
         __asan_report_load1_noabort
         ? find_cpio_data
         find_cpio_data
         ? vsprintf
         ? dump_stack
         ? get_ucode_user
         ? print_usage_bug
         find_microcode_in_initrd
         __load_ucode_intel
         ? collect_cpu_info_early
         ? debug_check_no_locks_freed
         load_ucode_intel_ap
         ? collect_cpu_info
         ? trace_hardirqs_on
         ? flat_send_IPI_mask_allbutself
         load_ucode_ap
         ? get_builtin_firmware
         ? flush_tlb_func
         ? do_raw_spin_trylock
         ? cpumask_weight
         cpu_init
         ? trace_hardirqs_off
         ? play_dead_common
         ? native_play_dead
         ? hlt_play_dead
         ? syscall_init
         ? arch_cpu_idle_dead
         ? do_idle
         start_secondary
         start_cpu
        Memory state around the buggy address:
         ffff880036e74f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
         ffff880036e74f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        >ffff880036e75000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                           ^
         ffff880036e75080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
         ffff880036e75100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ==================================================================
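      
      A hedged sketch of the fix's shape: a flag records that the initrd
      memory was handed back, and the scan refuses to run afterwards.
      The helper name and the scan call are illustrative:
      
        static bool initrd_gone;
      
        /* called once the initrd pages go back to the page allocator */
        void __init microcode_mark_initrd_gone(void)
        {
                initrd_gone = true;
        }
      
        struct cpio_data find_microcode_in_initrd(const char *path)
        {
                /* Touching the initrd after it was freed would be a
                 * use-after-free, as in the KASAN report above. */
                if (initrd_gone)
                        return (struct cpio_data){ NULL, 0, "" };
      
                return find_cpio_data(path, (void *)initrd_start,
                                      initrd_end - initrd_start, NULL);
        }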
      Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Tested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170126165833.evjemhbqzaepirxo@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      24c25032
    • powerpc/mm: Use the correct pointer when setting a 2MB pte · a0615a16
      Committed by Reza Arbab
      When setting a 2MB pte, radix__map_kernel_page() uses the address
      
      	ptep = (pte_t *)pudp;
      
      Fix this conversion to use pmdp instead. Use pmdp_ptep() to do this
      rather than casting the pointer.
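      
      i.e., the corrected line becomes (pmdp_ptep() is the existing
      book3s64 helper; shown as a sketch):
      
        /* a 2MB mapping lives at the pmd level, so derive the pte
         * pointer from the pmd, not the pud */
        ptep = pmdp_ptep(pmdp);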
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a0615a16