1. 28 Jun 2017, 8 commits
    • MIPS: Perform post-DMA cache flushes on systems with MAARs · 498e9ade
      Paul Burton committed
      Recent CPUs from Imagination Technologies such as the I6400 or P6600 are
      able to speculatively fetch data from memory into caches. This means
      that if used in a system with non-coherent DMA they require that caches
      be invalidated after a device performs DMA, and before the CPU reads the
      DMA'd data, in order to ensure that stale values weren't speculatively
      prefetched.
      
      Such CPUs also introduced Memory Accessibility Attribute Registers
      (MAARs) in order to control the regions in which they are allowed to
      speculate. Thus we can use the presence of MAARs as a good indication
      that the CPU requires the above cache maintenance. Use the presence of
      MAARs to determine the result of cpu_needs_post_dma_flush() in the
      default case, in order to handle these recent CPUs correctly.
      
      Note that the return type of cpu_needs_post_dma_flush() is changed to
      bool, so that it's clearer what happens when cpu_has_maar is cast to
      bool for the return value. Before v4.7, MIPS_CPU_MAAR was 1ull<<34, so
      if this patch were backported to such a kernel, casting it to an int
      would incorrectly return 0. MIPS_CPU_MAAR is currently 1ull<<30, which
      happens to give a non-zero value when truncated to an int anyway, but
      even so, the implicit conversion from long long int to bool makes it
      clearer what will happen than the implicit conversion from long long
      int to int would. The bool return type also fits this usage better
      semantically, so it seems like an all-round win.
      
      Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting
      the return type change.
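The truncation hazard behind the bool return type can be shown with a minimal C sketch. It assumes a 64-bit option word and GCC's modulo behaviour when an out-of-range value is converted to int (the C standard leaves that conversion implementation-defined). The MAAR_BIT_* macros and both helpers are hypothetical stand-ins, not the kernel's cpu_has_maar machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two MIPS_CPU_MAAR values quoted above. */
#define MAAR_BIT_PRE_V4_7 (1ULL << 34)
#define MAAR_BIT_CURRENT  (1ULL << 30)

/* Returning the 64-bit flag test as int keeps only the low 32 bits on
 * common ABIs, so a flag at bit 34 silently becomes 0. */
static int needs_flush_as_int(uint64_t cpu_options, uint64_t maar_bit)
{
	return (int)(cpu_options & maar_bit);
}

/* Returning it as bool converts any non-zero value to true. */
static bool needs_flush_as_bool(uint64_t cpu_options, uint64_t maar_bit)
{
	return cpu_options & maar_bit;
}
```

With the bit-34 flag the int variant reports that no flush is needed, while the bool variant correctly reports true; with the bit-30 flag both happen to be non-zero.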
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Reviewed-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Ed Blake <ed.blake@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16363/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Give __secure_computing() access to syscall arguments. · 669c4092
      David Daney committed
      KProbes of __seccomp_filter() are not very useful without access to
      the syscall arguments.
      
      Do what x86 does, and populate a struct seccomp_data to be passed to
      __secure_computing().  This allows samples/bpf/tracex5 to extract a
      sensible trace.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16368/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Add support for eBPF JIT. · f381bf6d
      David Daney committed
      Since the eBPF machine has 64-bit registers, we only support this in
      64-bit kernels.  As of this writing, test-bpf shows:
      
        test_bpf: Summary: 316 PASSED, 0 FAILED, [308/308 JIT'ed]
      
      All current test cases are successfully compiled.
      
      Many examples in samples/bpf are usable; in particular tracex5, which
      uses tail calls, works.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16369/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Add some instructions to uasm. · dc190129
      David Daney committed
      Follow-on patches for eBPF JIT require these additional instructions:
      
         insn_bgtz, insn_blez, insn_break, insn_ddivu, insn_dmultu,
         insn_dsbh, insn_dshd, insn_dsllv, insn_dsra32, insn_dsrav,
         insn_dsrlv, insn_lbu, insn_movn, insn_movz, insn_multu, insn_nor,
         insn_sb, insn_sh, insn_slti, insn_dinsu, insn_lwu
      
      ... so, add them.
      
      Sort the insn_* enumeration values alphabetically.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16367/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Correctly define DBSHFL type instruction opcodes. · 1f22d599
      David Daney committed
      DSHD was incorrectly classified as being BSHFL, and DSBH was missing
      altogether.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16366/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Optimize uasm insn lookup. · ce807d5f
      David Daney committed
      Instead of doing a linear search through the insn_table for each
      instruction, use the opcode as a direct index into the table.  This
      gives constant-time lookup performance as the number of supported
      opcodes increases.  Make the tables const as they are only ever read.
      For uasm-mips.c, sort the table alphabetically and remove duplicate
      entries; uasm-micromips.c was already sorted and duplicate-free.
      There is a small saving in object size as struct insn loses a field:
      
      $ size arch/mips/mm/uasm-mips.o arch/mips/mm/uasm-mips.o.save
         text	   data	    bss	    dec	    hex	filename
        10040	      0	      0	  10040	   2738	arch/mips/mm/uasm-mips.o
         9240	   1120	      0	  10360	   2878	arch/mips/mm/uasm-mips.o.save
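The lookup change can be sketched in C as follows. The enum, the field flags and the three-entry tables are a hypothetical subset for illustration (the match values are the standard MIPS encodings of addiu, addu and and with all operand fields zero); the real uasm tables are built with their own macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand-field flags (uasm has its own set). */
enum fields { RS = 1, RT = 2, RD = 4, SIMM = 8 };

/* Hypothetical, alphabetically sorted opcode enumeration. */
enum opcode { insn_addiu, insn_addu, insn_and, insn_count };

/* Old layout: each entry records its own opcode and is found by scanning. */
struct insn_old {
	enum opcode opcode;
	uint32_t match;     /* fixed encoding bits */
	int fields;         /* which operand fields the instruction takes */
};

/* New layout: the opcode is the array index, so that field disappears. */
struct insn {
	uint32_t match;
	int fields;
};

static const struct insn_old old_table[] = {
	{ insn_addu,  0x00000021, RS | RT | RD },
	{ insn_and,   0x00000024, RS | RT | RD },
	{ insn_addiu, 0x24000000, RS | RT | SIMM },
};

static const struct insn insn_table[insn_count] = {
	[insn_addiu] = { 0x24000000, RS | RT | SIMM },
	[insn_addu]  = { 0x00000021, RS | RT | RD },
	[insn_and]   = { 0x00000024, RS | RT | RD },
};

/* O(n): walk the table until the matching opcode is found. */
static const struct insn_old *lookup_linear(enum opcode op)
{
	for (size_t i = 0; i < sizeof(old_table) / sizeof(old_table[0]); i++)
		if (old_table[i].opcode == op)
			return &old_table[i];
	return NULL;
}

/* O(1): the opcode indexes the table directly. */
static const struct insn *lookup_direct(enum opcode op)
{
	return &insn_table[op];
}
```

Designated initializers keep the direct-indexed table correct even if entries are listed out of order, and dropping the opcode member is what shrinks each entry.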
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16365/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: module: Unify rel & rela reloc handling · 430d0b88
      Paul Burton committed
      The module load code has previously had entirely separate
      implementations for rel & rela style relocs, which unnecessarily
      duplicates a whole lot of code. Unify the implementations of both types
      of reloc, sharing the bulk of the code.
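For a simple word relocation, the only difference between the two reloc styles is where the addend comes from, which is what makes sharing the code possible. A minimal C sketch of that idea follows; reloc_addend() and apply_word_reloc() are hypothetical helpers, not the kernel's functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick the addend according to the reloc style. */
static uint32_t reloc_addend(const uint32_t *location, bool rela,
			     int32_t r_addend)
{
	/* REL: the addend is the value already stored at the target.
	 * RELA: the addend is carried in the relocation entry itself. */
	return rela ? (uint32_t)r_addend : *location;
}

/* One shared handler for an R_MIPS_32-style relocation: S + A. */
static int apply_word_reloc(uint32_t *location, uint32_t sym_value,
			    bool rela, int32_t r_addend)
{
	*location = sym_value + reloc_addend(location, rela, r_addend);
	return 0;
}
```

Both styles then produce the same result for the same symbol and addend; only the addend lookup differs.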
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15832/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: module: Ensure we always clean up r_mips_hi16_list · 351b0940
      Paul Burton committed
      If we hit an error whilst processing a reloc then we would return early
      from apply_relocate & potentially not free entries in r_mips_hi16_list,
      thereby leaking memory. Fix this by ensuring that we always run the code
      to free r_mips_hi16_list when errors occur.
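The pattern this fix describes, a single exit label that always frees the pending list, can be sketched in C as below. The list type, the global list head, apply_relocate_sketch() and the entries_freed counter are all hypothetical; only the R_MIPS_* values are the standard ELF ones.

```c
#include <errno.h>
#include <stdlib.h>

#define R_MIPS_32   2
#define R_MIPS_HI16 5
#define R_MIPS_LO16 6

struct hi16_entry {
	struct hi16_entry *next;
	int index;                 /* which reloc queued this entry */
};

static struct hi16_entry *r_mips_hi16_list;   /* pending HI16 relocs */
static int entries_freed;                      /* instrumentation only */

static void free_hi16_list(void)
{
	while (r_mips_hi16_list) {
		struct hi16_entry *next = r_mips_hi16_list->next;

		free(r_mips_hi16_list);
		r_mips_hi16_list = next;
		entries_freed++;
	}
}

static int apply_relocate_sketch(const int *types, int n)
{
	int err = 0;

	for (int i = 0; i < n; i++) {
		switch (types[i]) {
		case R_MIPS_HI16: {
			/* Queue the HI16 until its matching LO16 arrives. */
			struct hi16_entry *e = malloc(sizeof(*e));

			if (!e) {
				err = -ENOMEM;
				goto out;
			}
			e->index = i;
			e->next = r_mips_hi16_list;
			r_mips_hi16_list = e;
			break;
		}
		case R_MIPS_LO16:
			/* A real loader would patch the queued HI16s here. */
			free_hi16_list();
			break;
		case R_MIPS_32:
			break;
		default:
			err = -ENOEXEC;  /* unsupported reloc: bail out... */
			goto out;        /* ...but never skip the cleanup */
		}
	}
out:
	free_hi16_list();
	return err;
}
```

Routing every early return through `out` means a newly added error path cannot forget the cleanup, which is the class of leak the commit fixes.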
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 861667dc ("MIPS: Fix race condition in module relocation code.")
      Fixes: 04211a57 ("MIPS: Bail on unsupported module relocs")
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15831/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
  2. 27 Jun 2017, 2 commits
    • MIPS: defconfig: Cleanup from old Kconfig options · 59baa24d
      Krzysztof Kozlowski committed
      Remove old, dead Kconfig options (in order appearing in this commit):
       - EXPERIMENTAL is gone since v3.9;
       - INET_LRO: commit 7bbf3cae ("ipv4: Remove inet_lro library");
       - MTD_CONCAT: commit f53fdebc ("mtd: drop MTD_CONCAT from Kconfig
         entirely");
       - MTD_CHAR: commit 660685d9 ("mtd: merge mtdchar module with
         mtdcore");
       - NETDEV_1000 and NETDEV_10000: commit f860b052 ("drivers/net:
         Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
         just ETHERNET, but that is a separate change;
       - MISC_DEVICES: commit 7c5763b8 ("drivers: misc: Remove
         MISC_DEVICES config option");
       - HID_SUPPORT: commit 1f41a6a9 ("HID: Fix the generic Kconfig
         options");
       - BT_L2CAP and BT_SCO: commit f1e91e16 ("Bluetooth: Always compile
         SCO and L2CAP in Bluetooth Core");
       - DEBUG_ERRORS: commit b025a3f8 ("ARM: 6876/1: Kconfig.debug:
         Remove unused CONFIG_DEBUG_ERRORS");
       - USB_DEVICE_CLASS: commit 007bab91 ("USB: remove
         CONFIG_USB_DEVICE_CLASS");
       - RCU_CPU_STALL_DETECTOR: commit a00e0d71 ("rcu: Remove conditional
         compilation for RCU CPU stall warnings");
       - IP_NF_QUEUE: commit 3dd6664f ("netfilter: remove unused "config
         IP_NF_QUEUE"");
       - IP_NF_TARGET_ULOG: commit d4da843e ("netfilter: kill remnants of
         ulog targets");
       - IP6_NF_QUEUE: commit d16cf20e ("netfilter: remove ip_queue
         support");
       - IP6_NF_TARGET_LOG: commit 6939c33a ("netfilter: merge ipt_LOG and
         ip6_LOG into xt_LOG");
       - USB_LED: commit a335aaf3 ("usb: misc: remove outdated USB LED
         driver");
       - MMC_UNSAFE_RESUME: commit 2501c917 ("mmc: core: Use
         MMC_UNSAFE_RESUME as default behavior");
       - AUTOFS_FS: commit 561c5cf9 ("staging: Remove autofs3");
       - VIDEO_OUTPUT_CONTROL: commit f167a64e ("video / output: Drop
         display output class support");
       - USB_LIBUSUAL: commit f61870ee ("usb: remove libusual");
       - CRYPTO_ZLIB: commit 11049218 ("crypto: compress - remove unused pcomp
         interface");
       - BLK_DEV_UB: commit 68a5059e ("block: remove the deprecated ub
         driver");
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: bcm-kernel-feedback-list@broadcom.com
      Cc: linux-mips@linux-mips.org
      Cc: linux-arm-kernel@lists.infradead.org
      Patchwork: https://patchwork.linux-mips.org/patch/16342/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Sort MIPS Kconfig Alphabetically. · 12597988
      Matt Redfearn committed
      Sort the entries in config MIPS alphabetically so as to make entries
      easier to find.
      Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16068/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
  3. 24 Jun 2017, 1 commit
  4. 23 Jun 2017, 1 commit
    • powerpc/64: Initialise thread_info for emergency stacks · 34f19ff1
      Nicholas Piggin committed
      Emergency stacks have their thread_info mostly uninitialised, which in
      particular means garbage preempt_count values.
      
      Emergency stack code runs with interrupts disabled entirely, and is
      used very rarely, so this has gone unnoticed so far. It was found by a
      proposed new powerpc watchdog that takes a soft-NMI directly from the
      masked_interrupt handler, using the emergency stack. That crashed
      at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
      garbage.
      
      To fix this, zero the entire THREAD_SIZE allocation, and initialize
      the thread_info.
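A minimal C sketch of the fix's two steps, zeroing the whole allocation and then explicitly initialising thread_info, is below. THREAD_SIZE_SKETCH, struct thread_info_sketch and alloc_emergency_stack() are hypothetical simplifications; the preempt_count of 0 follows the maintainer note that it must be 0 rather than HARDIRQ_OFFSET.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; the real THREAD_SIZE depends on the kernel config. */
#define THREAD_SIZE_SKETCH 16384

/* Hypothetical, trimmed-down thread_info at the base of the stack. */
struct thread_info_sketch {
	int preempt_count;
	int cpu;
	unsigned long flags;
};

/* Allocate an emergency stack with no garbage left in thread_info. */
static void *alloc_emergency_stack(int cpu)
{
	void *stack = malloc(THREAD_SIZE_SKETCH);
	struct thread_info_sketch *ti;

	if (!stack)
		return NULL;

	/* Step 1: zero the entire allocation, not just the part in use. */
	memset(stack, 0, THREAD_SIZE_SKETCH);

	/* Step 2: initialise the fields that code such as in_nmi(), which
	 * is derived from preempt_count, may inspect. */
	ti = stack;
	ti->cpu = cpu;
	ti->preempt_count = 0;
	return stack;
}
```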
      
      Cc: stable@vger.kernel.org
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Move it all into setup_64.c, use a function not a macro. Fix
            crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 22 Jun 2017, 4 commits
    • KVM: x86: fix singlestepping over syscall · c8401dda
      Paolo Bonzini committed
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior; otherwise you could get a #DB on a user stack, which is
      not nice.  This does not affect Linux guests, as they use an IST or task
      gate for #DB.
      
      This fixes CVE-2017-7518.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD · bbd5ff50
      Alistair Popple committed
      NPU2 requires an extra explicit flush to an active GPU PID when
      sending address translation shoot downs (ATSDs) to reliably flush the
      GPU TLB. This patch adds just such a flush at the end of each sequence
      of ATSDs.
      
      We can safely use PID 0 which is always reserved and active on the
      GPU. PID 0 is only used for init_mm which will never be a user mm on
      the GPU. To enforce this we add a check in pnv_npu2_init_context()
      just in case someone tries to use PID 0 on the GPU.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      [mpe: Use true/false for bool literals]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · addb63c1
      Heiko Carstens committed
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. The
      translation was correct only for real-space designations with a token
      origin of zero and addresses below one megabyte.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • perf/x86/intel: Add 1G DTLB load/store miss support for SKL · fb3a5055
      Kan Liang committed
      The current DTLB load/store miss events (0x608/0x649) only count 4K, 2M
      and 4M page sizes. They need to be extended to support any page size
      (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
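The relationship between the old and new codes can be checked with a small C sketch. It assumes Intel's raw event encoding (unit mask in bits 15:8, event select in bits 7:0) and the Skylake WALK_COMPLETED umask bits 0x02 = 4K, 0x04 = 2M/4M and 0x08 = 1G; the helper and enumerator names are hypothetical.

```c
#include <stdint.h>

/* Split a raw event code into event select (bits 7:0) and umask (15:8). */
static uint8_t event_select(uint16_t code) { return code & 0xff; }
static uint8_t event_umask(uint16_t code)  { return code >> 8; }

/* Assumed Skylake umask bits for the WALK_COMPLETED events. */
enum {
	WALK_COMPLETED_4K    = 0x02,
	WALK_COMPLETED_2M_4M = 0x04,
	WALK_COMPLETED_1G    = 0x08,
};
```

Under those assumptions, 0x608 keeps event 0x08 with umask 0x06 (4K | 2M/4M), and 0xe08 adds the 1G bit to give umask 0x0e; the store events 0x649/0xe49 differ in the same way.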
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 20 Jun 2017, 2 commits
    • KVM: MIPS: Fix maybe-uninitialized build failure · e27a9eca
      James Cowgill committed
      This commit fixes a "maybe-uninitialized" build failure in
      arch/mips/kvm/tlb.c when KVM, DYNAMIC_DEBUG and JUMP_LABEL are all
      enabled. The failure is:
      
      In file included from ./include/linux/printk.h:329:0,
                       from ./include/linux/kernel.h:13,
                       from ./include/asm-generic/bug.h:15,
                       from ./arch/mips/include/asm/bug.h:41,
                       from ./include/linux/bug.h:4,
                       from ./include/linux/thread_info.h:11,
                       from ./include/asm-generic/current.h:4,
                       from ./arch/mips/include/generated/asm/current.h:1,
                       from ./include/linux/sched.h:11,
                       from arch/mips/kvm/tlb.c:13:
      arch/mips/kvm/tlb.c: In function ‘kvm_mips_host_tlb_inv’:
      ./include/linux/dynamic_debug.h:126:3: error: ‘idx_kernel’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         __dynamic_pr_debug(&descriptor, pr_fmt(fmt), \
         ^~~~~~~~~~~~~~~~~~
      arch/mips/kvm/tlb.c:169:16: note: ‘idx_kernel’ was declared here
        int idx_user, idx_kernel;
                      ^~~~~~~~~~
      
      There is a similar error relating to "idx_user". Both errors were
      observed with GCC 6.
      
      As far as I can tell, it is impossible for either idx_user or idx_kernel
      to be uninitialized when they are later read in the calls to kvm_debug,
      but to satisfy the compiler, add zero initializers to both variables.
      Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
      Fixes: 57e3869c ("KVM: MIPS/TLB: Generalise host TLB invalidate to kernel ASID")
      Cc: <stable@vger.kernel.org> # 4.11+
      Acked-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · dbb236c1
      Will Deacon committed
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, accelerating the MONOTONIC_RAW clockid uncovered
      potential 1ns time inconsistencies caused by the timekeeping core not
      handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
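A simplified C sketch of where the shift and the sub-ns remainder are now combined is below. Both helper names are hypothetical, and the arithmetic ignores seconds rollover; it only shows why adding tkr_raw.xtime_nsec before the final shift matters.

```c
#include <stdint.h>

/* Kernel side (hypothetical helper): the value stored in the vDSO data
 * page for CLOCK_MONOTONIC_RAW. The shift and the sub-ns remainder
 * (tkr_raw.xtime_nsec) are combined once, here. */
static uint64_t raw_nsec_for_data_page(uint64_t raw_tv_nsec,
				       uint64_t xtime_nsec, uint32_t shift)
{
	return (raw_tv_nsec << shift) + xtime_nsec;
}

/* vDSO side (hypothetical helper): add the scaled cycle delta to the
 * already-shifted value, then shift back down to nanoseconds. */
static uint64_t vdso_raw_ns(uint64_t data_page_nsec, uint64_t cycle_delta,
			    uint32_t mult, uint32_t shift)
{
	return (data_page_nsec + cycle_delta * mult) >> shift;
}
```

For example, with shift = 8, xtime_nsec = 200 (200/256 ns) and one cycle at mult = 128 (0.5 ns), the total is about 101.28 ns and reads as 101; dropping xtime_nsec gives 100.5 ns, which reads as 100, a 1ns discrepancy of the kind the commit describes.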
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 19 Jun 2017, 1 commit
    • mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins committed
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses alloca() allocations as large as 64kB in many
      commonly used functions. Others use constructs like gid_t
      buffer[NGROUPS_MAX], which is 256kB, or stack strings with MAX_ARG_STRLEN.
      
      This becomes especially dangerous for suid binaries combined with the
      default of no stack size limit, because such applications can be
      tricked into consuming a large portion of the stack, after which a
      single glibc call could jump over the guard page. Unfortunately, these
      attacks are not theoretical.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somewhat inherent, but it should reduce the attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
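The gap-between-vmas approach can be sketched in C around the vm_start_gap() helper named above. struct vma_sketch, the VM_GROWSDOWN value and fits_below() are simplified, hypothetical stand-ins; the default gap of 256 pages is the 1MB-with-4K-pages figure from the commit.

```c
#include <stdbool.h>

#define PAGE_SHIFT   12         /* assume 4K pages */
#define VM_GROWSDOWN 0x100UL    /* hypothetical flag value for this sketch */

/* 256 pages = 1MB with 4K pages; the commit makes this tunable via the
 * stack_guard_gap command-line option. */
static unsigned long stack_guard_gap = 256UL << PAGE_SHIFT;

struct vma_sketch {
	unsigned long vm_start, vm_end, vm_flags;
};

/* Start of the vma as seen by placement code: for a downward-growing
 * stack, pretend it begins stack_guard_gap bytes lower. */
static unsigned long vm_start_gap(const struct vma_sketch *vma)
{
	unsigned long vm_start = vma->vm_start;

	if (vma->vm_flags & VM_GROWSDOWN) {
		vm_start -= stack_guard_gap;
		if (vm_start > vma->vm_start)   /* wrapped below address 0 */
			vm_start = 0;
	}
	return vm_start;
}

/* A new mapping [addr, addr + len) may sit below `next` only if it ends
 * at or before next's gap-adjusted start. */
static bool fits_below(unsigned long addr, unsigned long len,
		       const struct vma_sketch *next)
{
	return addr + len <= vm_start_gap(next);
}
```

Because the gap lives outside the stack vma, it is not charged against RLIMIT_AS or RLIMIT_MLOCK, which is the accounting problem the commit describes with the old in-vma guard page.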
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 16 Jun 2017, 8 commits
    • powerpc/perf: Fix oops when kthread execs user process · bf05fc25
      Ravi Bangoria committed
      When a kthread calls call_usermodehelper() the steps are:
        1. allocate current->mm
        2. load_elf_binary()
        3. populate current->thread.regs
      
      While doing this, interrupts are not disabled. If a perf interrupt
      arrives in the middle of this process (i.e. step 1 has completed but
      step 3 has not yet been reached) and perf tries to read the userspace
      regs, the kernel oopses with the following log:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000da0fc
        ...
        Call Trace:
        perf_output_sample_regs+0x6c/0xd0
        perf_output_sample+0x4e4/0x830
        perf_event_output_forward+0x64/0x90
        __perf_event_overflow+0x8c/0x1e0
        record_and_restart+0x220/0x5c0
        perf_event_interrupt+0x2d8/0x4d0
        performance_monitor_exception+0x54/0x70
        performance_monitor_common+0x158/0x160
        --- interrupt: f01 at avtab_search_node+0x150/0x1a0
            LR = avtab_search_node+0x100/0x1a0
        ...
        load_elf_binary+0x6e8/0x15a0
        search_binary_handler+0xe8/0x290
        do_execveat_common.isra.14+0x5f4/0x840
        call_usermodehelper_exec_async+0x170/0x210
        ret_from_kernel_thread+0x5c/0x7c
      
      Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
      pt_regs are not set.
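A minimal C sketch of the guard: only report a user-register ABI when pt_regs actually exist. The enum values mirror perf's UAPI perf_sample_regs_abi; the structs and get_regs_user() are hypothetical simplifications.

```c
#include <stddef.h>

/* Values of the perf UAPI enum perf_sample_regs_abi. */
enum {
	PERF_SAMPLE_REGS_ABI_NONE = 0,
	PERF_SAMPLE_REGS_ABI_32   = 1,
	PERF_SAMPLE_REGS_ABI_64   = 2,
};

struct pt_regs_sketch { unsigned long gpr[32]; };   /* hypothetical */

struct regs_user_sketch {
	const struct pt_regs_sketch *regs;
	int abi;
};

/* During call_usermodehelper() the task's pt_regs may not be populated
 * yet; reporting ABI_NONE tells the sample writer not to dereference. */
static void get_regs_user(struct regs_user_sketch *out,
			  const struct pt_regs_sketch *task_regs, int task_abi)
{
	out->regs = task_regs;
	out->abi = task_regs ? task_abi : PERF_SAMPLE_REGS_ABI_NONE;
}
```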
      
      Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Handle data breakpoints in Radix mode · d89ba535
      Naveen N. Rao committed
      On Power9, trying to use data breakpoints throws the splat shown
      below. This is because the check for a data breakpoint in DSISR is in
      do_hash_page(), which is not called when in Radix mode.
      
        Unable to handle kernel paging request for data at address 0xc000000000e19218
        Faulting instruction address: 0xc0000000001155e8
        cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
        pc: c0000000001155e8: find_pid_ns+0x48/0xe0
        lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
        sp: c0000000ef1e7da0
        msr: 9000000000009033
        dar: c000000000e19218
        dsisr: 400000
      
      Move the check to handle_page_fault() so as to catch data breakpoints
      in both Hash and Radix MMU modes.
      
      We have to change the check in do_hash_page() against 0xa410 to use
      0xa450, so as to include the DSISR_DABRMATCH bit (0x00400000 >> 16 = 0x40).
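The mask arithmetic can be checked with a short C sketch. It assumes, as the 0xa410 value and the "dsisr: 400000" splat suggest, that the hash path tests the upper 16 bits of DSISR; dsisr_upper() and the two mask constants are hypothetical names.

```c
#include <stdint.h>

/* DSISR bit reported for a data breakpoint match (cf. "dsisr: 400000"). */
#define DSISR_DABRMATCH 0x00400000u

/* Upper halfword of DSISR, the part the 0xa410/0xa450 masks apply to. */
static uint32_t dsisr_upper(uint32_t dsisr) { return dsisr >> 16; }

static const uint32_t old_mask = 0xa410;
static const uint32_t new_mask = 0xa410 | (DSISR_DABRMATCH >> 16);
```

0xa410 | 0x40 is 0xa450: the old mask ignores a breakpoint-only DSISR, while the new one catches it.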
      
      There are two sites that call handle_page_fault() when in Radix, both
      already pass DSISR in r4.
      
      Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Fix the fall-through case on hash, we need to reload DSISR]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kprobes: Skip livepatch_handler() for jprobes · c05b8c44
      Naveen N. Rao committed
      ftrace_caller() depends on a modified regs->nip to detect if a certain
      function has been livepatched. However, with KPROBES_ON_FTRACE, it is
      possible for regs->nip to have been modified by the kprobes pre_handler
      (jprobes, for instance). In this case, we do not want to invoke the
      livepatch_handler so as not to consume the livepatch stack.
      
      To distinguish between the two (kprobes and livepatch), we check if
      there is an active kprobe on the current function. If there is, then we
      know for sure that it must have modified the NIP as we don't support
      livepatching a kprobe'd function. In this case, we simply skip the
      livepatch_handler and branch to the new NIP. Otherwise, the
      livepatch_handler is invoked.
      
      Fixes: ead514d5 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS · a4979a7e
      Naveen N. Rao committed
      For DYNAMIC_FTRACE_WITH_REGS, we should be passing in the original set
      of registers in pt_regs, to capture the state _before_ ftrace_caller.
      However, we are instead passing the stack pointer *after* allocating a
      stack frame in ftrace_caller. Fix this by saving the proper value of r1
      in pt_regs. Also, use SAVE_10GPRS() to simplify the code.
      
      Fixes: 15308664 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kprobes: Pause function_graph tracing during jprobes handling · a9f8553e
      Naveen N. Rao committed
      This fixes a crash when function_graph and jprobes are used together.
      This is essentially commit 237d28db ("ftrace/jprobes/x86: Fix
      conflict between jprobes and function graph tracing"), but for powerpc.
      
      Jprobes breaks function_graph tracing since the jprobe hook needs to use
      jprobe_return(), which never returns back to the hook, but instead to
      the original jprobe'd function. The solution is to momentarily pause
      function_graph tracing before invoking the jprobe hook and re-enable it
      when returning back to the original jprobe'd function.
      
      Fixes: 6794c782 ("powerpc64: port of the function graph tracer")
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path · a093c92d
      Alexey Kardashevskiy committed
      When trapped on WARN_ON(), report_bug() is expected to return
      BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue.
      The __builtin_constant_p() path of the PPC's WARN_ON()
      calls (indirectly) __WARN_FLAGS(), which has BUGFLAG_WARNING set;
      however, the other branch does not, which makes report_bug() report a
      bug rather than a warning.
      
      Fixes: f26dee15 ("debug: Avoid setting BUGFLAG_WARNING twice")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · 3d3efb68
      Paul Mackerras committed
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Save/restore host values of debug registers · 7ceaa6dc
      Paul Mackerras committed
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
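The save/restore pattern can be modelled in C with a global standing in for the hardware registers. The struct and function names are hypothetical, since the real code is assembly in kvmppc_hv_entry(). The frame-size arithmetic rests on one assumption: the three 8-byte registers (24 bytes) are rounded up to a 16-byte multiple for stack alignment, which yields the 32-byte growth from 112 to 144.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CIABR, DAWR and DAWRX SPRs. */
struct debug_regs {
	uint64_t ciabr;
	uint64_t dawr;
	uint64_t dawrx;
};

static struct debug_regs hw; /* models the live hardware registers */

static void run_guest(const struct debug_regs *guest)
{
	struct debug_regs host = hw; /* save host breakpoints before entry */
	hw = *guest;                 /* guest values are loaded */
	/* ... guest runs ... */
	hw = host;                   /* restore host breakpoints on exit */
}

/* 3 registers * 8 bytes = 24, rounded up to a 16-byte multiple (assumed
 * alignment requirement) = 32, so the frame grows from 112 to 144 bytes. */
#define OLD_FRAME  112
#define SAVE_BYTES (((3 * 8) + 15) & ~15)
#define NEW_FRAME  (OLD_FRAME + SAVE_BYTES)
```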
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  9. 15 June 2017, 5 commits
    • powerpc/xive: Fix offset for store EOI MMIOs · 25642705
      Benjamin Herrenschmidt committed
      Architecturally we should apply a 0x400 offset for these. Not doing
      it will break future HW implementations.
      
      The offset of 0 is supposed to remain for "triggers" though not all
      sources support both trigger and store EOI, and in P9 specifically,
      some sources will treat 0 as a store EOI. But future chips will not.
      So this makes us use the properly architected offset, which should
      always work.
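The fix comes down to address arithmetic within a source's MMIO page. In the sketch below, the constant and helper names are hypothetical. Only the offsets 0x0 (trigger) and 0x400 (store EOI) come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; the 0x000 and 0x400 values are from the commit text. */
#define ESB_TRIGGER_OFF   0x000u
#define ESB_STORE_EOI_OFF 0x400u

/* Address to store to for a "trigger" on this source. */
static uint64_t trigger_addr(uint64_t esb_page)
{
	return esb_page + ESB_TRIGGER_OFF;
}

/* Address to store to for a store EOI, per the architected layout. */
static uint64_t store_eoi_addr(uint64_t esb_page)
{
	return esb_page + ESB_STORE_EOI_OFF;
}
```

Before the fix, store EOIs went to the trigger address, which P9 happens to accept for some sources. Future chips will not.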
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • MIPS: .its targets depend on vmlinux · bcd7c45e
      Paul Burton committed
      The .its targets require information about the kernel binary, such as
      its entry point, which is extracted from the vmlinux ELF. We therefore
      require that the ELF is built before the .its files are generated.
      Declare this requirement in the Makefile such that make will ensure this
      is always the case, otherwise in corner cases we can hit issues as the
      .its is generated with an incorrect (either invalid or stale) entry
      point.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: cf2a5e0b ("MIPS: Support generating Flattened Image Trees (.itb)")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Patchwork: https://patchwork.linux-mips.org/patch/16179/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Fix bnezc/jialc return address calculation · 1a73d931
      Paul Burton committed
      The code handling the pop76 opcode (i.e. the bnezc and jialc
      instructions) in __compute_return_epc_for_insn() needs to set the
      value of $31 in the jialc case, which is encoded with rs = 0. However,
      its check to differentiate bnezc (rs != 0) from jialc (rs = 0) was
      unfortunately backwards, meaning that if we emulate a bnezc
      instruction we clobber $31, and if we emulate a jialc instruction it
      actually behaves like a jic instruction.
      
      Fix this by inverting the check of rs to match the way the instructions
      are actually encoded.
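A simplified C model of the corrected check follows. The rs field selects the form: rs == 0 is jialc, which jumps to GPR[rt] plus an immediate and links into $31, and rs != 0 is bnezc, which branches when GPR[rs] is non-zero and leaves $31 alone. The struct and function names are hypothetical, and the instruction fields are assumed to be already decoded, with the immediate already sign-extended for whichever form applies.

```c
#include <assert.h>
#include <stdint.h>

struct mips_regs_sketch {
	uint32_t gpr[32];
};

/* Hypothetical emulation of a MIPS R6 POP76 instruction; returns the new PC. */
static uint32_t emulate_pop76(struct mips_regs_sketch *r, uint32_t pc,
			      unsigned rs, unsigned rt, int32_t imm)
{
	if (rs == 0) {
		/* JIALC: compact jump (no delay slot), so the link value is pc + 4 */
		r->gpr[31] = pc + 4;
		return r->gpr[rt] + (uint32_t)imm;
	}
	/* BNEZC: compact branch on GPR[rs] != 0; must not touch $31 */
	if (r->gpr[rs] != 0)
		return pc + 4 + ((uint32_t)imm << 2);
	return pc + 4;
}
```

The bug described above amounts to the two arms being swapped, so bnezc wrote $31 and jialc did not.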
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 28d6f93d ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions")
      Cc: stable <stable@vger.kernel.org> # v4.0+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16178/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 46a704f8
      Paul Mackerras committed
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
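The two cases above can be sketched as one check run on KVM_RUN before guest entry. This is a minimal model under stated assumptions. The struct and function names are hypothetical, the real code operates on the thread struct and live SPRs, and setting the exit reason to KVM_EXIT_FAIL_ENTRY is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TFHAR, TFIAR and TEXASR SPRs. */
struct htm_sprs {
	uint64_t tfhar, tfiar, texasr;
};

struct task_sketch {
	bool htm_enabled;      /* HTM facility enabled for this task */
	bool tm_suspended;     /* task has started and suspended a transaction */
	struct htm_sprs saved; /* copy kept in the thread struct */
};

/* Returns 0 if guest entry may proceed, or -EINVAL for a suspended transaction. */
static int kvm_run_htm_check(struct task_sketch *t, const struct htm_sprs *live)
{
	if (t->tm_suspended)
		return -EINVAL;       /* entering would lose TM state ("TM Bad Thing") */
	if (t->htm_enabled) {
		t->saved = *live;     /* preserve the userspace SPR values */
		t->htm_enabled = false; /* next HTM use traps; the handler reloads them */
	}
	return 0;
}
```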
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 4c3bb4cc
      Paul Mackerras committed
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
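The per-exit rules for IAMR, PSPB and UAMOR fit in one small function. It is a sketch only: the names are hypothetical, and it leaves out TAR, VRSAVE and FSCR, which the text says are handled at the KVM_RUN level rather than on every exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct exit_sprs {
	uint64_t iamr, pspb, uamor;
};

/* Hypothetical: host SPR values to load on guest exit, per the rules above. */
static struct exit_sprs host_sprs_on_exit(bool is_power9, uint64_t iamr_saved_at_entry)
{
	struct exit_sprs s;
	/* POWER9 (possibly radix host): restore the value saved at entry;
	 * POWER8: IAMR is normally 0, so just clear it. */
	s.iamr = is_power9 ? iamr_saved_at_entry : 0;
	s.pspb = 0;  /* stop userspace inheriting the guest's PSPB (SMT priority) */
	s.uamor = 0; /* keep the AMR from becoming a covert channel */
	return s;
}
```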
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  10. 14 June 2017, 1 commit
  11. 13 June 2017, 3 commits
    • x86/mm: Disable 1GB direct mappings when disabling 2MB mappings · d9ee35ac
      Vlastimil Babka committed
      The kmemleak and debug_pagealloc features both disable using huge pages for
      direct mappings so they can do cpa() on page level granularity in any context.
      
      However, they only do that for 2MB pages, which means 1GB pages can
      still be used if the CPU supports them, unless disabled by a boot
      parameter, which is non-obvious. Also disable 1GB pages when disabling
      2MB pages.
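The corrected policy can be expressed as a small pure function. Both feature flags and the struct are hypothetical stand-ins for the kernel's configuration checks. The point it illustrates is that disabling huge direct mappings now clears both page sizes.

```c
#include <assert.h>
#include <stdbool.h>

struct direct_map_cfg {
	bool use_2mb;
	bool use_1gb;
};

/* Hypothetical policy: which page sizes may back the direct mapping. */
static struct direct_map_cfg choose_direct_map(bool cpu_has_gbpages,
					       bool debug_pagealloc, bool kmemleak)
{
	struct direct_map_cfg c = { .use_2mb = true, .use_1gb = cpu_has_gbpages };

	if (debug_pagealloc || kmemleak) {
		/* cpa() must work at 4K granularity in any context */
		c.use_2mb = false;
		c.use_1gb = false; /* the fix: previously left enabled */
	}
	return c;
}
```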
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegardno@ifi.uio.no>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/2be70c78-6130-855d-3dfa-d87bd1dd4fda@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • KVM: PPC: Book3S HV: Context-switch EBB registers properly · ca8efa1d
      Paul Mackerras committed
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
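The text describes saving host values once per KVM_RUN rather than on every guest entry and exit. The sketch below models that with a global standing in for the hardware. It uses EBBHR, EBBRR and BESCR as the three EBB SPRs; the register names reflect the author's understanding and are not stated in the commit. The function names and the simulated guest write are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ebb_regs {
	uint64_t ebbhr, ebbrr, bescr;
};

static struct ebb_regs hw_ebb; /* models the live SPRs */

/* Hypothetical KVM_RUN: 'n_entries' guest entry/exit cycles. */
static void kvm_run_sketch(struct ebb_regs *guest, int n_entries)
{
	struct ebb_regs host = hw_ebb; /* saved once, on entry from userspace */

	for (int i = 0; i < n_entries; i++) {
		hw_ebb = *guest;   /* load guest values on guest entry */
		hw_ebb.ebbhr += 1; /* pretend the guest modified a register */
		*guest = hw_ebb;   /* save guest values on guest exit */
	}
	hw_ebb = host;             /* restore before returning to userspace */
}
```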
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • x86/debug: Handle early WARN_ONs proper · 8a524f80
      Peter Zijlstra committed
      Hans managed to trigger a WARN very early in the boot which killed his
      (Virtual) box.
      
      The reason is that the recent rework of WARN() to use UD0 forgot to add the
      fixup_bug() call to early_fixup_exception(). As a result the kernel does
      not handle the WARN_ON injected UD0 exception and panics.
      
      Add the missing fixup call, so that early UDs injected by WARN() get handled.
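The C sketch below models the control flow of the fix: the early exception path gives a fixup_bug()-style helper a chance to recognise a WARN's #UD before giving up. Several details are assumptions. warn_site() fakes the bug-table lookup, WARN_INSN_LEN is an assumed instruction length rather than the kernel's constant, and the -1 return stands in for the panic.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_TRAP_UD   6 /* #UD (invalid opcode) vector */
#define WARN_INSN_LEN 2 /* assumed length of the WARN's UD instruction */

struct pt_regs_sketch {
	unsigned long ip;
};

/* Hypothetical bug-table lookup: is there a WARN at this address? */
static bool warn_site(unsigned long ip)
{
	return ip == 0x1000;
}

/* fixup_bug()-like helper: skip over the WARN's UD and resume. */
static bool fixup_bug_sketch(struct pt_regs_sketch *regs, int trapnr)
{
	if (trapnr != X86_TRAP_UD || !warn_site(regs->ip))
		return false;
	regs->ip += WARN_INSN_LEN;
	return true;
}

/* Early handler: before the fix, this path had no fixup_bug() call. */
static int early_fixup_exception_sketch(struct pt_regs_sketch *regs, int trapnr)
{
	if (fixup_bug_sketch(regs, trapnr))
		return 0; /* WARN handled; boot continues */
	return -1;        /* unhandled; the real code would panic here */
}
```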
      
      Fixes: 9a93848f ("x86/debug: Implement __WARN() using UD0")
      Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frank Mehnert <frank.mehnert@oracle.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Michael Thayer <michael.thayer@oracle.com>
      Link: http://lkml.kernel.org/r/20170612180108.w4vgu2ckucmllf3a@hirez.programming.kicks-ass.net
  12. 11 June 2017, 2 commits
    • KVM: async_pf: avoid async pf injection when in guest mode · 9bc1f09f
      Wanpeng Li committed
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM
      simultaneously and then putting memory pressure on L1; I can observe
      this hang on L1 when at least ~70% of the swap area is occupied on L0.
      
      This is because the async PF was injected into L2 when it should have
      been injected into L1. The L2 guest starts receiving page faults with
      a bogus %cr2 (actually an APF token from the host), and the L1 guest
      starts accumulating tasks stuck in D state in kvm_async_pf_task_wait()
      because of the missing PAGE_READY async PFs.
      
      This patch fixes the hang by doing async PF only when executing the L1
      guest.
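The fix can be reduced to a guard on the injection path. The sketch below uses a hypothetical struct in place of the vCPU state. Its in_guest_mode field models what the kernel tests with is_guest_mode(), meaning the vCPU is currently running a nested (L2) guest.

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu_sketch {
	bool apf_enabled;   /* L1 registered for async page faults */
	bool in_guest_mode; /* vCPU is currently running a nested L2 guest */
};

/* Hypothetical guard: only inject an async PF while L1 itself is running. */
static bool can_inject_async_pf(const struct vcpu_sketch *v)
{
	if (!v->apf_enabled)
		return false;
	return !v->in_guest_mode; /* the APF token is meaningless to L2 */
}
```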
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • hexagon: Use raw_copy_to_user · 4d801cca
      Guenter Roeck committed
      Commit ac4691fa ("hexagon: switch to RAW_COPY_USER") replaced
      __copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
      all callers, resulting in the following build error.
      
      arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
      arch/hexagon/mm/uaccess.c:40:3: error:
      	implicit declaration of function '__copy_to_user_hexagon'
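For context on the caller that broke, here is a user-space analogue of a clear-user routine built on a copy primitive. It is a sketch that assumes the function clears memory by copying from a zero page in page-sized chunks, with raw_copy_to_user()'s convention of returning the number of bytes not copied. copy_out(), zero_chunk and CHUNK are hypothetical stand-ins for raw_copy_to_user(), the zero page and PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4096 /* stands in for PAGE_SIZE */

static const unsigned char zero_chunk[CHUNK]; /* stands in for the zero page */

/* Stand-in for raw_copy_to_user(): returns bytes NOT copied (0 here,
 * since ordinary memory cannot fault). */
static size_t copy_out(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Returns the number of bytes left uncleared (0 on success). */
static size_t clear_user_sketch(unsigned char *dest, size_t count)
{
	while (count > CHUNK) {
		size_t uncleared = copy_out(dest, zero_chunk, CHUNK);
		if (uncleared)
			return count - (CHUNK - uncleared);
		count -= CHUNK;
		dest += CHUNK;
	}
	if (count)
		count = copy_out(dest, zero_chunk, count);
	return count;
}
```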
      
      Fixes: ac4691fa ("hexagon: switch to RAW_COPY_USER")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Richard Kuo <rkuo@codeaurora.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
  13. 09 June 2017, 1 commit
  14. 08 June 2017, 1 commit