1. 27 Feb 2016, 1 commit
  2. 25 Feb 2016, 1 commit
    • KVM: x86: MMU: fix ubsan index-out-of-range warning · 17e4bce0
      Authored by Mike Krinkin
      UBSAN reports the following warning due to a typo in the
      update_accessed_dirty_bits template; the patch fixes the typo:
      
      [  168.791851] ================================================================================
      [  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
      [  168.791866] index 4 is out of range for type 'u64 [4]'
      [  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
      [  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
      [  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
      [  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
      [  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
      [  168.791891] Call Trace:
      [  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
      [  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
      [  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
      [  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
      [  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
      [  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
      [  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
      [  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
      [  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
      [  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
      [  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
      [  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
      [  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
      [  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
      [  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
      [  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
      [  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
      [  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
      [  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
      [  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
      [  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
      [  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
      [  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
      [  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
      [  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
      [  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
      [  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
      [  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
      [  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
      [  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
      [  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
      [  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
      [  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
      [  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
      [  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
      [  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
      [  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
      [  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
      [  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
      [  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
      [  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
      [  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
      [  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
      [  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
      [  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
      [  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
      [  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  168.792780] ================================================================================
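      As an illustration of the bug class UBSAN is flagging, here is a
      minimal, hypothetical userspace reduction (not the actual
      paging_tmpl.h code):

          #include <stdio.h>

          /* A 4-level walk caches one entry per level; valid indices
           * are 0..3, so indexing with 'level' instead of 'level - 1'
           * reads one element past the end -- the "index 4 is out of
           * range for type 'u64 [4]'" report above. */
          #define PT_MAX_LEVEL 4

          struct walker {
              unsigned long long ptes[PT_MAX_LEVEL];   /* u64 [4] */
          };

          int main(void)
          {
              struct walker w = { { 1, 2, 3, 4 } };
              int level = PT_MAX_LEVEL;

              printf("%llu\n", w.ptes[level - 1]);     /* in range */
              /* w.ptes[level] would be the out-of-range access */
              return 0;
          }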
      Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 24 Feb 2016, 3 commits
    • KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4
      Authored by Paolo Bonzini
      Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
      2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
      case actually could modify "la", but after the patch this change is
      not reflected in *linear.
      
      The bug is visible whenever a non-zero segment base causes the linear
      address to wrap around the 4GB mark.
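      A minimal userspace sketch of the bug pattern (hypothetical, not
      the emulate.c code): the out-parameter is written before the
      "default:" case truncates the local variable, so the wrap-around
      is lost:

          #include <stdint.h>
          #include <stdio.h>

          static void linearize_buggy(uint64_t base, uint64_t offset,
                                      uint64_t *linear)
          {
              uint64_t la = base + offset;

              *linear = la;           /* stored too early ...          */
              switch (0) {
              default:
                  la = (uint32_t)la;  /* ... the 4GB wrap-around never
                                       * reaches *linear */
                  break;
              }
          }

          int main(void)
          {
              uint64_t lin;

              /* segment base + offset crosses the 4GB mark */
              linearize_buggy(0xf0000000u, 0x20000000u, &lin);
              /* prints 0x110000000; the correct 32-bit linear
               * address would be 0x10000000 */
              printf("0x%llx\n", (unsigned long long)lin);
              return 0;
          }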
      
      Fixes: e8dd2d2d
      Cc: stable@vger.kernel.org
      Reported-by: Aurelien Jarno <aurelien@aurel32.net>
      Tested-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix missed hardware breakpoints · 172b2386
      Authored by Paolo Bonzini
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in two threads.  To see the bug, run a 2-VCPU guest with
      "taskset -c 0" and run "./bp 0 1" inside the guest.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: Nadav Amit <namit@cs.technion.ac.il>
      Reported-by: Andrey Wagin <avagin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: fix SMAP in 32-bit environments · de9e478b
      Authored by Linus Torvalds
      In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space
      accesses") I changed how the stac/clac instructions were generated
      around the user space accesses, which then made it possible to do
      batched accesses efficiently for user string copies etc.
      
      However, in doing so, I completely spaced out, and didn't even think
      about the 32-bit case.  And nobody really even seemed to notice, because
      SMAP doesn't even exist until modern Skylake processors, and you'd have
      to be crazy to run 32-bit kernels on a modern CPU.
      
      Which brings us to Andy Lutomirski.
      
      He actually tested the 32-bit kernel on new hardware, and noticed that
      it doesn't work.  My bad.  The trivial fix is to add the required
      uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
      
      I feel a bit bad about this patch, just because that header file really
      should be cleaned up to avoid all the duplicated code in it, and this
      commit just expands on the problem.  But this just fixes the bug without
      any bigger cleanup surgery.
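      The shape of the fix, as a kernel-style sketch (assumed helper
      names from the earlier SMAP rework, not the literal
      <asm/uaccess_32.h> diff):

          /* __uaccess_begin()/__uaccess_end() emit STAC/CLAC when SMAP
           * is enabled; every raw user access must sit between them. */
          __uaccess_begin();                     /* STAC: allow access  */
          ret = __copy_to_user_ll(to, from, n);  /* raw, unchecked copy */
          __uaccess_end();                       /* CLAC: forbid access */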
      Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 19 Feb 2016, 1 commit
  5. 18 Feb 2016, 3 commits
    • x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b
      Authored by Toshi Kani
      A kernel page fault oops with the callstack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges was allocated by ioremap() on an
      x86_64 system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault(), however, assumes that the
      vmalloc range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared with
      user processes: pgd_ctor() copies the kernel's PGD entries into a
      user process's PGD during fork().  When allocation of the vmalloc
      ranges crosses a 512GB boundary, ioremap() allocates a new pud
      table and updates the kernel PGD entry to point to it.  If a user
      process's PGD entry does not have this update yet, a read/write
      syscall to the range will cause a vmalloc fault, which hits the
      Oops above because it does not handle a large page properly.
      
      The following changes are made to vmalloc_fault() (see the
      sketch after this list):
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
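      A sketch of the new 64-bit validation (assumed shape based on the
      description above, not the literal patch):

          /* Stop the walk at a huge PUD and compare pfns instead of
           * descending: an ioremap range has no struct page and no
           * direct mapping to go through. */
          if (pud_huge(*pud)) {
              if (pud_pfn(*pud) != pud_pfn(*pud_ref))
                  BUG();
          } else {
              /* descend to the PMD level, with the analogous
               * pmd_huge()/pmd_pfn() check, then the PTEs */
          }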
      Reported-by: Henning Schild <henning.schild@siemens.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" · 67b4eab9
      Authored by Bjorn Helgaas
      Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and
      pci_dev->irq_managed").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
    • Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" · fe25d078
      Authored by Bjorn Helgaas
      Revert 8affb487 ("x86/PCI: Don't alloc pcibios-irq when MSI is
      enabled").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
  6. 17 Feb 2016, 4 commits
    • hpet: Drop stale URLs · 4e7f9df2
      Authored by Michael S. Tsirkin
      Looks like the HPET spec at intel.com got moved.  It isn't hard
      to find, so drop the link and just mention the assumed revision.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() · a82eee74
      Authored by Toshi Kani
      Data corruption issues were observed in tests which initiated
      a system crash/reset while accessing BTT devices.  This problem
      is reproducible.
      
      The BTT driver calls pmem_rw_bytes() to update data in pmem
      devices.  This interface calls __copy_user_nocache(), which
      uses non-temporal stores so that the stores to pmem are
      persistent.
      
      __copy_user_nocache() uses non-temporal stores when a request
      size is 8 bytes or larger (and aligned to 8 bytes).  The BTT
      driver updates the BTT map table, whose entries are 4 bytes each.
      Therefore, updates to the map table entries remain cached, and
      are not written to pmem after a crash.
      
      Change __copy_user_nocache() to use a non-temporal store when
      the request size is 4 bytes.  The change extends the current
      byte-copy path for requests smaller than 8 bytes, and does not
      add any overhead to the regular path.
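      The same idea, shown with the compiler intrinsic rather than the
      kernel's assembly (userspace illustration; the kernel path issues
      MOVNTI directly):

          #include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
          #include <stdint.h>

          /* A 4-byte non-temporal store bypasses the cache just like
           * the existing 8-byte MOVNTI path; before this fix, 4-byte
           * requests fell back to ordinary cached byte copies. */
          static void store_4byte_nocache(uint32_t *dst, uint32_t val)
          {
              _mm_stream_si32((int *)dst, (int)val);  /* MOVNTI */
              _mm_sfence();        /* order the non-temporal store */
          }

          int main(void)
          {
              static uint32_t slot;

              store_4byte_nocache(&slot, 0xdeadbeef);
              return slot != 0xdeadbeef;
          }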
      Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
      Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
      [ Small readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable · ee9737c9
      Authored by Toshi Kani
      Add comments to __copy_user_nocache() to clarify its procedures
      and alignment requirements.
      
      Also change numeric branch target labels to named local labels.
      
      No code changed:
      
       arch/x86/lib/copy_user_64.o:
      
          text    data     bss     dec     hex filename
          1239       0       0    1239     4d7 copy_user_64.o.before
          1239       0       0    1239     4d7 copy_user_64.o.after
      
       md5:
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brian.boylston@hpe.com
      Cc: dan.j.williams@intel.com
      Cc: linux-nvdimm@lists.01.org
      Cc: micah.parrish@hpe.com
      Cc: ross.zwisler@linux.intel.com
      Cc: vishal.l.verma@intel.com
      Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
      [ Small readability edits and added object file comparison. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/amd/uncore: Plug reference leak · 8bc9162c
      Authored by Thomas Gleixner
      In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
      struct is freed, but the percpu pointer still references it. Set it to NULL.
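      The shape of the fix (assumed names, per the description above):

          /* error path of amd_uncore_cpu_up_prepare(): clear the percpu
           * reference before freeing, so nothing is left pointing at
           * freed memory */
          *per_cpu_ptr(amd_uncore_nb, cpu) = NULL;
          kfree(uncore_nb);
          return -ENOMEM;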
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 15 Feb 2016, 1 commit
  8. 12 Feb 2016, 1 commit
  9. 08 Feb 2016, 1 commit
    • x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels · 59fd1214
      Authored by Ingo Molnar
      The following commit:
      
        a0acda91 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable")
      
      introduced numa_clear_kernel_node_hotplug(), which is executed
      during early bootup and marks all currently reserved memblock
      regions as hot-memory-unswappable as well.
      
      y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels,
      the grsecurity/PAX kernel patch flagged a size overflow in this function:
      
        PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...]
      
      ... the reason for the overflow is that memblock_clear_hotplug()
      takes physical addresses as arguments, while the start/end
      variables used by numa_clear_kernel_node_hotplug() are 'unsigned
      long', which is 32 bits on PAE kernels, even though PAE uses
      64-bit physical addresses.
      
      So on 32-bit PAE kernels that have physical memory above the 4GB boundary,
      we truncate a 64-bit physical address range to 32 bits and pass it to
      memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug
      bugfix from working, but might have other side effects as well.
      
      The fix is to use the proper type to handle physical addresses, phys_addr_t.
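      The truncation is easy to demonstrate in isolation (hypothetical
      address; on a 32-bit PAE build 'unsigned long' behaves like the
      uint32_t used here):

          #include <stdint.h>
          #include <stdio.h>

          typedef uint64_t phys_addr_t;   /* PAE physical addresses */

          int main(void)
          {
              phys_addr_t start = 0x180000000ull; /* 6GB, above 4GB */
              uint32_t trunc = (uint32_t)start;   /* a 32-bit 'unsigned
                                                   * long' equivalent */

              /* prints phys=0x180000000 trunc=0x80000000: the high
               * bits are lost, which is the overflow PaX flagged */
              printf("phys=%#llx trunc=%#x\n",
                     (unsigned long long)start, trunc);
              return 0;
          }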
      Reported-by: y14sg1 <y14sg1@comcast.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 06 Feb 2016, 1 commit
    • mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206
      Authored by Vlastimil Babka
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the runtime gigantic page allocation via
      alloc_contig_range(), making this support available only when CONFIG_CMA
      is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
      associated infrastructure, it is possible with few simple adjustments to
      require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.
      
      After this patch, alloc_contig_range() and related functions are
      available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
      enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
      supporting runtime gigantic pages without the CMA-specific checks in
      page allocator fastpaths.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 05 Feb 2016, 1 commit
    • x86: Fix KASAN false positives in thread_saved_pc() · 75edb54a
      Authored by Dmitry Vyukov
      thread_saved_pc() reads the stack of a potentially running task.
      This can cause false KASAN stack-out-of-bounds reports, because
      the running task concurrently poisons and unpoisons its own
      stack.
      
      The same happens in get_wchan(), which was fixed by using
      READ_ONCE_NOCHECK().  Do the same here.
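      The fix follows the get_wchan() precedent, along these lines
      (sketch only; the exact expression lives in the x86 headers):

          /* Read the saved PC from another task's stack without KASAN
           * instrumentation: the owning task may be concurrently
           * poisoning/unpoisoning that stack, so an instrumented read
           * can trigger false positives. */
          #define thread_saved_pc(t) \
              READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))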
      
      Example KASAN report triggered by sysrq-t:
      
        BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18
        Read of size 8 by task syz-executor/23839
        [...]
        page dumped because: kasan: bad access detected
        [...]
        Call Trace:
         [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40
         [<ffffffff813e7a26>] sched_show_task+0x306/0x3b0
         [<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0
         [<ffffffff82d2ca00>] fn_show_state+0x10/0x20
         [<ffffffff82d2cf98>] k_spec+0xa8/0xe0
         [<ffffffff82d3354f>] kbd_event+0xb9f/0x4000
         [<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0
         [<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0
         [<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070
         [<ffffffff843d3a47>] input_inject_event+0x237/0x280
         [<ffffffff843e8c28>] evdev_write+0x478/0x680
         [<ffffffff817ac653>] __vfs_write+0x113/0x480
         [<ffffffff817ae0e7>] vfs_write+0x167/0x4a0
         [<ffffffff817b13d1>] SyS_write+0x111/0x220
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: glider@google.com
      Cc: kasan-dev@googlegroups.com
      Cc: kcc@google.com
      Cc: linux-kernel@vger.kernel.org
      Cc: ryabinin.a.a@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 29 Jan 2016, 3 commits
    • x86/mm/pat: Avoid truncation when converting cpa->numpages to address · 74256377
      Authored by Matt Fleming
      There are a couple of nasty truncation bugs lurking in the
      pageattr code that can be triggered when mapping EFI regions,
      e.g. when we pass a cpa->pgd pointer.  Because cpa->numpages is a
      32-bit value, shifting left by PAGE_SHIFT will truncate the
      resultant address to 32 bits.
      
      Viorel-Cătălin managed to trigger this bug on his Dell machine that
      provides a ~5GB EFI region which requires 1236992 pages to be mapped.
      When calling populate_pud() the end of the region gets calculated
      incorrectly in the following buggy expression,
      
        end = start + (cpa->numpages << PAGE_SHIFT);
      
      And only 188416 pages are mapped.  Next, populate_pud() gets
      invoked for a second time because of the loop in
      __change_page_attr_set_clr(), only this time no pages get mapped
      because shifting the remaining number of pages (1048576) by
      PAGE_SHIFT is zero.  At which point the loop in
      __change_page_attr_set_clr() spins forever because we fail to
      make progress.
      
      Hitting this bug depends very much on the virtual address we pick to
      map the large region at and how many pages we map on the initial run
      through the loop. This explains why this issue was only recently hit
      with the introduction of commit
      
        a5caa209 ("x86/efi: Fix boot crash by mapping EFI memmap
         entries bottom-up at runtime, instead of top-down")
      
      It's interesting to note that safe uses of cpa->numpages do exist in
      the pageattr code. If instead of shifting ->numpages we multiply by
      PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
      so the result is unsigned long.
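      The difference is easy to reproduce with the numbers from this
      report (assuming a 64-bit build and PAGE_SHIFT == 12):

          #include <stdio.h>

          int main(void)
          {
              int numpages = 1236992;     /* the ~5GB EFI region */

              /* int << 12 is evaluated in 32 bits (formally undefined
               * signed overflow); in practice only the low bits
               * survive: 0x2e000000, i.e. 188416 pages' worth */
              unsigned long shifted = numpages << 12;

              /* multiplying by the UL constant promotes to 64 bits
               * first, so nothing is lost: 0x12e000000 */
              unsigned long multiplied = numpages * 4096UL;

              printf("shifted=%#lx multiplied=%#lx\n",
                     shifted, multiplied);
              return 0;
          }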
      
      To avoid surprises when users try to convert very large cpa->numpages
      values to addresses, change the data type from 'int' to 'unsigned
      long', thereby making it suitable for shifting by PAGE_SHIFT without
      any type casting.
      
      The alternative would be to make liberal use of casting, but that is
      far more likely to cause problems in the future when someone adds more
      code and fails to cast properly; this bug was difficult enough to
      track down in the first place.
      Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
      Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • perf/x86: De-obfuscate code · 8f04b853
      Authored by Peter Zijlstra
      Get rid of the 'onln' obfuscation.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix uninitialized value usage · e01d8718
      Authored by Peter Zijlstra
      When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
      initialize alt_idx and then use this uninitialized value to index an
      array.
      
      Where that is not fatal, it can result in an infinite loop in
      its caller, __intel_shared_reg_get_constraints(), with IRQs
      disabled.
      
      Alternative error modes are random memory corruption due to the
      cpuc->shared_regs->regs[] array overrun, which manifest in either
      get_constraints or put_constraints doing weird stuff.
      
      Only took 6 hours of painful debugging to find this. Neither GCC nor
      Smatch warnings flagged this bug.
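      A simplified sketch of the bug class (hypothetical, not the
      actual intel_alt_er() code):

          static const int regs[2] = { 10, 20 };

          static int pick_alt(int idx)
          {
              int alt_idx;              /* uninitialized */

              if (idx == 0)
                  alt_idx = 1;
              else if (idx == 1)
                  alt_idx = 0;
              /* any other idx leaves alt_idx unset, so the read below
               * indexes regs[] with whatever garbage is on the stack */
              return regs[alt_idx];
          }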
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: ae3f011f ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 27 Jan 2016, 1 commit
  14. 25 Jan 2016, 1 commit
  15. 23 Jan 2016, 1 commit
    • pmem: add wb_cache_pmem() to the PMEM API · 3f4a2670
      Authored by Ross Zwisler
      __arch_wb_cache_pmem() was already an internal implementation detail of
      the x86 PMEM API, but this functionality needs to be exported as part of
      the general PMEM API to handle the fsync/msync case for DAX mmaps.
      
      One thing worth noting is that we really do want this to be part of the
      PMEM API as opposed to a stand-alone function like clflush_cache_range()
      because of ordering restrictions.  By having wb_cache_pmem() as part of
      the PMEM API we can leave it unordered, call it multiple times to write
      back large amounts of memory, and then order the multiple calls with a
      single wmb_pmem().
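      A usage sketch of the intended pattern (assumed call sites,
      following the ordering rationale above):

          /* wb_cache_pmem() is deliberately unordered, so it can be
           * issued once per dirty range ... */
          for (i = 0; i < nranges; i++)
              wb_cache_pmem(addr[i], len[i]);

          /* ... and a single fence orders the whole batch */
          wmb_pmem();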
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 22 Jan 2016, 1 commit
  17. 21 Jan 2016, 4 commits
  18. 20 Jan 2016, 2 commits
  19. 19 Jan 2016, 4 commits
  20. 17 Jan 2016, 1 commit
  21. 16 Jan 2016, 4 commits