1. 09 1月, 2016 1 次提交
  2. 09 12月, 2015 1 次提交
  3. 06 12月, 2015 4 次提交
  4. 04 12月, 2015 1 次提交
    • K
      x86/mm: Fix regression with huge pages on PAE · 70f15287
      Kirill A. Shutemov 提交于
      Recent PAT patchset has caused issue on 32-bit PAE machines:
      
        page:eea45000 count:0 mapcount:-128 mapping:  (null) index:0x0 flags: 0x40000000()
        page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
        ------------[ cut here ]------------
        kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
        invalid opcode: 0000 [#1] SMP
        [...]
        Call Trace:
         unmap_single_vma
         ? __wake_up
         unmap_vmas
         unmap_region
         do_munmap
         vm_munmap
         SyS_munmap
         do_fast_syscall_32
         ? __do_page_fault
         sysenter_past_esp
        Code: ...
        EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98
      
      The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
      helpers use PMD_PAGE_MASK to calculate resulting mask.
      PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
      phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
      result, the upper bits of resulting mask get truncated.
      
      pud_pfn_mask() and pud_flags_mask() aren't problematic since we
      don't have PUD page table level on 32-bit systems, but it's
      reasonable to keep them consistent with PMD counterpart.
      
      Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
      addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
      use them.
      Reported-and-Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      [ Fix -Woverflow warnings from the realmode code. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jürgen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Fixes: f70abb0f ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
      Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      70f15287
  5. 02 12月, 2015 1 次提交
  6. 01 12月, 2015 1 次提交
  7. 26 11月, 2015 1 次提交
  8. 25 11月, 2015 1 次提交
    • H
      KVM: nVMX: remove incorrect vpid check in nested invvpid emulation · b2467e74
      Haozhong Zhang 提交于
      This patch removes the vpid check when emulating nested invvpid
      instruction of type all-contexts invalidation. The existing code is
      incorrect because:
       (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
           Translations Based on VPID", invvpid instruction does not check
           vpid in the invvpid descriptor when its type is all-contexts
           invalidation.
       (2) According to the same document, invvpid of type all-contexts
           invalidation does not require there is an active VMCS, so/and
           get_vmcs12() in the existing code may result in a NULL-pointer
           dereference. In practice, it can crash both KVM itself and L1
           hypervisors that use invvpid (e.g. Xen).
      Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b2467e74
  9. 24 11月, 2015 1 次提交
    • A
      x86/entry/64: Fix irqflag tracing wrt context tracking · f1075053
      Andy Lutomirski 提交于
      Paolo pointed out that enter_from_user_mode could be called
      while irqflags were traced as though IRQs were on.
      
      In principle, this could confuse lockdep.  It doesn't cause any
      problems that I've seen in any configuration, but if I build
      with CONFIG_DEBUG_LOCKDEP=y, enable a nohz_full CPU, and add
      code like:
      
      	if (irqs_disabled()) {
      		spin_lock(&something);
      		spin_unlock(&something);
      	}
      
      to the top of enter_from_user_mode, then lockdep will complain
      without this fix.  It seems that lockdep's irqflags sanity
      checks are too weak to detect this bug without forcing the
      issue.
      
      This patch adds one byte to normal kernels, and it's IMO a bit
      ugly. I haven't spotted a better way to do this yet, though.
      The issue is that we can't do TRACE_IRQS_OFF until after SWAPGS
      (if needed), but we're also supposed to do it before calling C
      code.
      
      An alternative approach would be to call trace_hardirqs_off in
      enter_from_user_mode.  That would be less code and would not
      bloat normal kernels at all, but it would be harder to see how
      the code worked.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/86237e362390dfa6fec12de4d75a238acb0ae787.1447361906.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f1075053
  10. 23 11月, 2015 4 次提交
  11. 19 11月, 2015 2 次提交
  12. 18 11月, 2015 4 次提交
  13. 14 11月, 2015 1 次提交
  14. 12 11月, 2015 5 次提交
    • H
      perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro · 41ac18eb
      Huang Rui 提交于
      Signed-off-by: NHuang Rui <ray.huang@amd.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      41ac18eb
    • H
      x86/fpu: Fix get_xsave_addr() behavior under virtualization · a05917b6
      Huaitong Han 提交于
      KVM uses the get_xsave_addr() function in a different fashion from
      the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
      not the currently running task.
      
      But 'xsave' is replaced with current task's (host) xsave structure, so
      get_xsave_addr() will incorrectly return the bad xsave address to KVM.
      
      Fix it so that the passed in 'xsave' address is used - as intended
      originally.
      Signed-off-by: NHuaitong Han <huaitong.han@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
      [ Tidied up the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a05917b6
    • D
      x86/fpu: Fix 32-bit signal frame handling · ab6b5294
      Dave Hansen 提交于
      (This should have gone to LKML originally. Sorry for the extra
       noise, folks on the cc.)
      
      Background:
      
      Signal frames on x86 have two formats:
      
        1. For 32-bit executables (whether on a real 32-bit kernel or
           under 32-bit emulation on a 64-bit kernel) we have a
          'fpregset_t' that includes the "FSAVE" registers.
      
        2. For 64-bit executables (on 64-bit kernels obviously), the
           'fpregset_t' is smaller and does not contain the "FSAVE"
           state.
      
      When creating the signal frame, we have to be aware of whether
      we are running a 32 or 64-bit executable so we create the
      correct format signal frame.
      
      Problem:
      
      save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
      called for a 32-bit executable.  This is for real 32-bit and
      ia32 emulation.
      
      But, fpu__init_prepare_fx_sw_frame() only initializes
      'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
      32-bit kernels.
      
      This leads to really wierd situations where 32-bit programs
      lose their extended state when returning from a signal handler.
      The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
      out to userspace in save_xstate_epilog().  But when returning
      from the signal, the kernel errors out in check_for_xstate()
      when it does not see FP_XSTATE_MAGIC1 present (because it was
      zeroed).  This leads to the FPU/XSAVE state being initialized.
      
      For MPX, this leads to the most permissive state and means we
      silently lose bounds violations.  I think this would also mean
      that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
      no one has spotted this bug.
      
      I believe this was broken by:
      
      	72a671ce ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
      
      way back in 2012.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@sr71.net
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab6b5294
    • D
      x86/mpx: Fix 32-bit address space calculation · f3119b83
      Dave Hansen 提交于
      I received a bug report that running 32-bit MPX binaries on
      64-bit kernels was broken.  I traced it down to this little code
      snippet.  We were switching our "number of bounds directory
      entries" calculation correctly.  But, we didn't switch the other
      side of the calculation: the virtual space size.
      
      This meant that we were calculating an absurd size for
      bd_entry_virt_space() on 32-bit because we used the 64-bit
      virt_space.
      
      This was _also_ broken for 32-bit kernels running on 64-bit
      hardware since boot_cpu_data.x86_virt_bits=48 even when running
      in 32-bit mode.
      
      Correct that and properly handle all 3 possible cases:
      
       1. 32-bit binary on 64-bit kernel
       2. 64-bit binary on 64-bit kernel
       3. 32-bit binary on 32-bit kernel
      
      This manifested in having bounds tables not properly unmapped.
      It "leaked" memory but had no functional impact otherwise.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f3119b83
    • D
      x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels · 46561c39
      Dave Hansen 提交于
      When you call get_user(foo, bar), you effectively do a
      
      	copy_from_user(&foo, bar, sizeof(*bar));
      
      Note that the sizeof() is implicit.
      
      When we reach out to userspace to try to zap an entire "bounds
      table" we need to go read a "bounds directory entry" in order to
      locate the table's address.  The size of a "directory entry"
      depends on the binary being run and is always the size of a
      pointer.
      
      But, when we have a 64-bit kernel and a 32-bit application, the
      directory entry is still only 32-bits long, but we fetch it with
      a 64-bit pointer which makes get_user() does a 64-bit fetch.
      Reading 4 extra bytes isn't harmful, unless we are at the end of
      and run off the table.  It might also cause the zero page to get
      faulted in unnecessarily even if you are not at the end.
      
      Fix it up by doing a special 32-bit get_user() via a cast when
      we have 32-bit userspace.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      46561c39
  15. 10 11月, 2015 12 次提交