1. 27 2月, 2016 1 次提交
  2. 25 2月, 2016 2 次提交
    • M
      KVM: x86: MMU: fix ubsan index-out-of-range warning · 17e4bce0
      Mike Krinkin 提交于
      Ubsan reports the following warning due to a typo in
      update_accessed_dirty_bits template, the patch fixes
      the typo:
      
      [  168.791851] ================================================================================
      [  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
      [  168.791866] index 4 is out of range for type 'u64 [4]'
      [  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
      [  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
      [  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
      [  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
      [  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
      [  168.791891] Call Trace:
      [  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
      [  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
      [  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
      [  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
      [  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
      [  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
      [  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
      [  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
      [  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
      [  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
      [  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
      [  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
      [  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
      [  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
      [  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
      [  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
      [  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
      [  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
      [  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
      [  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
      [  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
      [  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
      [  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
      [  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
      [  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
      [  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
      [  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
      [  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
      [  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
      [  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
      [  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
      [  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
      [  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
      [  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
      [  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
      [  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
      [  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
      [  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
      [  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
      [  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
      [  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
      [  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
      [  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
      [  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
      [  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
      [  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
      [  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  168.792780] ================================================================================
      Signed-off-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      17e4bce0
    • M
      arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2 · fd451b90
      Marc Zyngier 提交于
      The GICv3 architecture spec says:
      
      Writing to the active priority registers in any order other than
      the following order will result in UNPREDICTABLE behavior:
      - ICH_AP0R<n>_EL2.
      - ICH_AP1R<n>_EL2.
      
      So let's not pointlessly go against the rule...
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      fd451b90
  3. 24 2月, 2016 10 次提交
    • P
      KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4
      Paolo Bonzini 提交于
      Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
      2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
      case actually could modify "la", but after the patch this change is
      not reflected in *linear.
      
      The bug is visible whenever a non-zero segment base causes the linear
      address to wrap around the 4GB mark.
      
      Fixes: e8dd2d2d
      Cc: stable@vger.kernel.org
      Reported-by: NAurelien Jarno <aurelien@aurel32.net>
      Tested-by: NAurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0c1d77f4
    • P
      KVM: x86: fix missed hardware breakpoints · 172b2386
      Paolo Bonzini 提交于
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in two threads.  To see the bug, run a 2-VCPU guest with
      "taskset -c 0" and run "./bp 0 1" inside the guest.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      172b2386
    • M
      arm/arm64: KVM: Feed initialized memory to MMIO accesses · 1d6a8212
      Marc Zyngier 提交于
      On an MMIO access, we always copy the on-stack buffer info
      the shared "run" structure, even if this is a read access.
      This ends up leaking up to 8 bytes of uninitialized memory
      into userspace, depending on the size of the access.
      
      An obvious fix for this one is to only perform the copy if
      this is an actual write.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      1d6a8212
    • V
      arc: SMP: CONFIG_ARC_IPI_DBG cleanup · 9ef2d8be
      Valentin Rothberg 提交于
      Previous Commit ("ARC: SMP: No need for CONFIG_ARC_IPI_DBG") removed
      the Kconfig option ARC_IPI_DBG.  Remove the last reference on this
      option.
      Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      9ef2d8be
    • V
      ARC: SMP: No need for CONFIG_ARC_IPI_DBG · d73b73f5
      Vineet Gupta 提交于
      This was more relevant during SMP bringup.
      
      The warning for bogus msg better be visible always.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      d73b73f5
    • V
      ARCv2: Elide sending new cross core intr if receiver didn't ack prev · 3dea30ca
      Vineet Gupta 提交于
      ARConnect/MCIP IPI sending has a retry-wait loop in case caller had
      not seen a previous such interrupt. Turns out that it is not needed at
      all. Linux cross core calling allows coalescing multiple IPIs to same
      receiver - it is fine as long as there is one.
      
      This logic is built into upper layer already, at a higher level of
      abstraction. ipi_send_msg_one() sets the actual msg payload, but it only
      calls MCIP IPI sending if msg holder was empty (using
      atomic-set-new-and-get-old construct). Thus it is unlikely that the
      retry-wait looping was ever getting exercised at all.
      
      Cc: Chuck Jordan <cjordan@synopsys.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      3dea30ca
    • V
      ARCv2: SMP: Push IPI_IRQ into IPI provider · 96817879
      Vineet Gupta 提交于
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      96817879
    • V
      ARC: [intc-compact] Remove IPI setup from ARCompact port · dbcbc7e7
      Vineet Gupta 提交于
      There is no real ARC700 based SMP SoC so remove IPI definition.
      EZChip's SMP ARC700 is going to use a different intc and IPI provider
      anyways.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      dbcbc7e7
    • V
      ARCv2: SMP: Emulate IPI to self using software triggered interrupt · bb143f81
      Vineet Gupta 提交于
      ARConnect/MCIP Inter-Core-Interrupt module can't send interrupt to
      local core. So use core intc capability to trigger software
      interrupt to self, using an unsued IRQ #21.
      
      This showed up as csd deadlock with LTP trace_sched on a dual core
      system. This test acts as scheduler fuzzer, triggering all sorts of
      schedulting activity. Trouble starts with IPI to self, which doesn't get
      delivered (effectively lost due to H/w capability), but the msg intended
      to be sent remain enqueued in per-cpu @ipi_data.
      
      All subsequent IPIs to this core from other cores get elided due to the
      IPI coalescing optimization in ipi_send_msg_one() where a pending msg
      implies an IPI already sent and assumes other core is yet to ack it.
      After the elided IPI, other core simply goes into csd_lock_wait()
      but never comes out as this core never sees the interrupt.
      
      Fixes STAR 9001008624
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>        [4.2]
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      bb143f81
    • L
      x86: fix SMAP in 32-bit environments · de9e478b
      Linus Torvalds 提交于
      In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space
      accesses") I changed how the stac/clac instructions were generated
      around the user space accesses, which then made it possible to do
      batched accesses efficiently for user string copies etc.
      
      However, in doing so, I completely spaced out, and didn't even think
      about the 32-bit case.  And nobody really even seemed to notice, because
      SMAP doesn't even exist until modern Skylake processors, and you'd have
      to be crazy to run 32-bit kernels on a modern CPU.
      
      Which brings us to Andy Lutomirski.
      
      He actually tested the 32-bit kernel on new hardware, and noticed that
      it doesn't work.  My bad.  The trivial fix is to add the required
      uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
      
      I feel a bit bad about this patch, just because that header file really
      should be cleaned up to avoid all the duplicated code in it, and this
      commit just expands on the problem.  But this just fixes the bug without
      any bigger cleanup surgery.
      Reported-and-tested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de9e478b
  4. 23 2月, 2016 1 次提交
    • A
      arc: get rid of DEVTMPFS dependency on INITRAMFS_SOURCE · 3e5177c1
      Alexey Brodkin 提交于
      Even though DEVTMPFS is required when our pre-built initramfs
      is used it is not the case in general. It is perfectly possible
      to use initramfs with device nodes already populated or there
      could be other usages, see discussion below for more detials:
      http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/37819/focus=37821
      
      This change removes mentioned dependency from arch/arc/Kconfig
      updating instead those defconfigs that are usually used with this
      kind of pre-build initramfs.
      
      And while at it all touched defconfigs were regenerated via
      savedefconfig and some options were removed:
       * USB is selected by other options implicitly
       * VGA_CONSOLE is disableb for ARC since
         031e29b5
       * EXT3_FS automatically selects EXT4_FS
       * MTDxxx and JFFS2_FS make no sense for AXS because
         AXS NAND controller is not upstreamed
       * NET_OSCI_LAN is not in upstream as well
       * ARCPGU_xxx options make no sense because ARC PGU is not yet
         in upstream and when it gets there all config options would
         be taken from devicetree
      Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      3e5177c1
  5. 22 2月, 2016 4 次提交
    • M
      s390/fpu: signals vs. floating point control register · 1b17cb79
      Martin Schwidefsky 提交于
      git commit 904818e2
      "s390/kernel: introduce fpu-internal.h with fpu helper functions"
      introduced the fpregs_store / fp_regs_load helper. These function
      fail to save and restore the floating pointer control registers.
      
      The effect is that the FPC is not correctly handled on signal
      delivery and signal return.
      
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1b17cb79
    • M
      s390/compat: correct restore of high gprs on signal return · 342300cc
      Martin Schwidefsky 提交于
      git commit 80703617
      "s390: add support for vector extension"
      broke 31-bit compat processes in regard to signal handling.
      
      The restore_sigregs_ext32() function is used to restore the additional
      elements from the user space signal frame. Among the additional elements
      are the upper registers halves for 64-bit register support for 31-bit
      processes. The copy_from_user that is used to retrieve the high-gprs
      array from the user stack uses an incorrect length, 8 bytes instead of
      64 bytes. This causes incorrect upper register halves to get loaded.
      
      Cc: stable@vger.kernel.org # 3.8+
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      342300cc
    • A
      powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23
      Aneesh Kumar K.V 提交于
      We can get a hash pte fault with 4k base page size and find the pte
      already inserted with 64K base page size. In that case we need to clear
      the existing slot information from the old pte. Fix this correctly
      
      With THP, we also clear the slot information with respect to all
      the 64K hash pte mapping that 16MB page. They are all invalid
      now. This make sure we don't find the slot valid when we fault with
      4k base page size. Finding the slot valid should not result in any wrong
      behavior because we do check again in hash page table for the validity.
      But we can avoid that check completely.
      
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9ab3ac23
    • G
      powerpc/eeh: Fix partial hotplug criterion · f6bf0fa1
      Gavin Shan 提交于
      During error recovery, the device could be removed as part of the
      partial hotplug. The criterion used to come with partial hotplug
      is: if the device driver provides error_detected(), slot_reset()
      and resume() callbacks, it's immune from hotplug. Otherwise,
      it's going to experience partial hotplug during EEH recovery. But
      the criterion isn't correct enough: mlx4_core driver for Mellanox
      adapters provides error_detected(), slot_reset() callbacks, but
      resume() isn't there. Those Mellanox adapters won't be to involved
      in the partial hotplug.
      
      This fixes the criterion to a practical one: adpater with driver
      that provides error_detected(), slot_reset() will be immune from
      partial hotplug. resume() isn't mandatory.
      
      Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion")
      Cc: stable@vger.kernel.org #v4.4+
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f6bf0fa1
  6. 19 2月, 2016 3 次提交
  7. 18 2月, 2016 6 次提交
    • V
      ARCv2: boot report CCMs (Closely Coupled Memories) · a150b085
      Vineet Gupta 提交于
      - ARCv2 uses a seperate BCR for {I,D}CCM base address:
        ARCompact encoded both base/size in same BCR
      
      - Size encoding in common BCR is different for ARCompact/ARCv2
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      a150b085
    • V
      ARCv2: boot print Low Latency Memory · 98341f7d
      Vineet Gupta 提交于
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      98341f7d
    • V
      ARC: Assume multiplier is always present · 0eca6fdb
      Vineet Gupta 提交于
      It is unlikely that designs running Linux will not have multiplier.
      Further the current support is not complete as tool don't generate a
      multilib w/o multiplier.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      0eca6fdb
    • T
      x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b
      Toshi Kani 提交于
      A kernel page fault oops with the callstack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
      system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
      range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared by
      user processes.  pgd_ctor() sets the kernel's PGD entries to
      user's during fork().  When allocation of the vmalloc ranges
      crosses a 512GB boundary, ioremap() allocates a new pud table
      and updates the kernel PGD entry to point it.  If user process's
      PGD entry does not have this update yet, a read/write syscall
      to the range will cause a vmalloc fault, which hits the Oops
      above as it does not handle a large page properly.
      
      Following changes are made to vmalloc_fault().
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
      Reported-by: NHenning Schild <henning.schild@siemens.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Acked-by: NBorislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f4eafd8b
    • B
      Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" · 67b4eab9
      Bjorn Helgaas 提交于
      Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and
      pci_dev->irq_managed").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael@kernel.org>
      CC: Jiang Liu <jiang.liu@linux.intel.com>
      67b4eab9
    • B
      Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" · fe25d078
      Bjorn Helgaas 提交于
      Revert 8affb487 ("x86/PCI: Don't alloc pcibios-irq when MSI is
      enabled").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael@kernel.org>
      CC: Jiang Liu <jiang.liu@linux.intel.com>
      CC: Joerg Roedel <jroedel@suse.de>
      fe25d078
  8. 17 2月, 2016 7 次提交
  9. 16 2月, 2016 1 次提交
  10. 15 2月, 2016 5 次提交