1. 01 Feb, 2013 (3 commits)
  2. 31 Jan, 2013 (1 commit)
  3. 26 Jan, 2013 (5 commits)
    • x86, kvm: Fix kvm's use of __pa() on percpu areas · 5dfd486c
      Dave Hansen authored
      In short, it is illegal to call __pa() on an address holding
      a percpu variable.  This replaces those __pa() calls with
      slow_virt_to_phys().  All of the cases in this patch are
      in boot time (or CPU hotplug time at worst) code, so the
      slow pagetable walking in slow_virt_to_phys() is not expected
      to have a performance impact.
      
      The times when this actually matters are pretty obscure
      (certain 32-bit NUMA systems), but it _does_ happen.  It is
      important to keep KVM guests working on these systems because
      the real hardware is getting harder and harder to find.
      
      This bug manifested first by me seeing a plain hang at boot
      after this message:
      
      	CPU 0 irqstacks, hard=f3018000 soft=f301a000
      
      or, sometimes, it would actually make it out to the console:
      
      [    0.000000] BUG: unable to handle kernel paging request at ffffffff
      
      I eventually traced it down to the KVM async pagefault code.
      This can be worked around by disabling that code either at
      compile-time, or on the kernel command-line.
      
      The kvm async pagefault code was injecting page faults into
      the guest, which the guest misinterpreted because its
      "reason" was not being properly sent from the host.
      
      The guest passes the physical address of a per-cpu async page
      fault structure via an MSR to the host.  Since __pa() is
      broken on percpu data, the physical address it sent was
      basically bogus and the host went scribbling on random data.
      The guest never saw the real reason for the page fault (it
      was injected by the host), assumed that the kernel had taken
      a _real_ page fault, and panic()'d.  The behavior varied,
      though, depending on what got corrupted by the bad write.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
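      For illustration, here is a minimal sketch of the pattern this fix
      establishes in arch/x86/kernel/kvm.c.  The surrounding code and names
      are reconstructed from the commit description, not quoted from the
      diff, so details may differ:

      	/* Sketch: percpu MSR registration in kvm_guest_cpu_init() */
      	static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason)
      		__aligned(64);

      	static void kvm_guest_cpu_init(void)
      	{
      		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
      			/*
      			 * Broken: __pa(&__get_cpu_var(apf_reason)) returns
      			 * garbage when the percpu area lives in alloc_remap()
      			 * memory (32-bit NUMA), so the host scribbles on
      			 * random pages.  Walk the page tables instead:
      			 */
      			u64 pa = slow_virt_to_phys(&__get_cpu_var(apf_reason));

      			pa |= KVM_ASYNC_PF_ENABLED;
      			wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
      		}
      	}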
    • x86, mm: Create slow_virt_to_phys() · d7656534
      Dave Hansen authored
      This is necessary because __pa() does not work on some kinds of
      memory, like vmalloc() or the alloc_remap() areas on 32-bit
      NUMA systems.  We have some functions to do conversions _like_
      this in the vmalloc() code (like vmalloc_to_page()), but they
      do not work on sizes other than 4k pages.  We would potentially
      need to be able to handle all the page sizes that we use for
      the kernel linear mapping (4k, 2M, 1G).
      
      In practice, on 32-bit NUMA systems, the percpu areas get stuck
      in the alloc_remap() area.  Any __pa() call on them will break
      and basically return garbage.
      
      This patch introduces a new function slow_virt_to_phys(), which
      walks the kernel page tables on x86 and should do precisely
      the same logical thing as __pa(), but actually work on a wider
      range of memory.  It should work on the normal linear mapping,
      vmalloc(), kmap(), etc...
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212433.4D1FCA62@kernel.stglabs.ibm.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
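      The helper itself is small.  This sketch reconstructs its logic from
      the description above, using lookup_address() and the page_level_*()
      helpers added earlier in this series; it approximates, rather than
      quotes, the actual diff:

      	/* Sketch, arch/x86/mm/pageattr.c */
      	phys_addr_t slow_virt_to_phys(void *__virt_addr)
      	{
      		unsigned long virt_addr = (unsigned long)__virt_addr;
      		unsigned long offset;
      		unsigned int level;
      		pte_t *pte;

      		/* Walk the kernel page tables; valid for the linear map,
      		 * vmalloc(), kmap(), alloc_remap(), etc. */
      		pte = lookup_address(virt_addr, &level);
      		BUG_ON(!pte);

      		/* Keep the offset bits below the page size at this level
      		 * (4k, 2M, or 1G) and take the frame from the pte. */
      		offset = virt_addr & ~page_level_mask(level);
      		return ((phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT) | offset;
      	}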
    • x86, mm: Use new pagetable helpers in try_preserve_large_page() · f3c4fbb6
      Dave Hansen authored
      try_preserve_large_page() can be slightly simplified by using
      the new page_level_*() helpers.  This also moves the 'level'
      over to the new pg_level enum type.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212432.14F3D993@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
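      The shape of the simplification, paraphrased rather than quoted from
      the diff:

      	/* Before: per-level size/mask constants picked by hand */
      	case PG_LEVEL_2M:
      		psize = PMD_PAGE_SIZE;
      		pmask = PMD_PAGE_MASK;
      		break;

      	/* After: the pg_level enum drives the generic helpers */
      	case PG_LEVEL_2M:
      #ifdef CONFIG_X86_64
      	case PG_LEVEL_1G:
      #endif
      		psize = page_level_size(level);
      		pmask = page_level_mask(level);
      		break;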
    • x86, mm: Pagetable level size/shift/mask helpers · 4cbeb51b
      Dave Hansen authored
      I plan to use lookup_address() to walk the kernel pagetables
      in a later patch.  It returns a "pte" and the level in the
      pagetables where the "pte" was found.  The level is just an
      enum and needs to be converted to a useful value in order to
      do address calculations with it.  These helpers will be used
      in at least two places.
      
      This also gives the anonymous enum a real name so that no one
      gets confused about what they should be passing in to these
      helpers.
      
      "PTE_SHIFT" was chosen for naming consistency with the other
      pagetable levels (PGD/PUD/PMD_SHIFT).
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212431.405D3A8C@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
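      Reconstructed for reference, the named enum and the three helpers,
      matching the mainline result of this patch (details may differ from
      the literal diff):

      	/* Sketch, arch/x86/include/asm/pgtable_types.h */
      	enum pg_level {
      		PG_LEVEL_NONE,
      		PG_LEVEL_4K,
      		PG_LEVEL_2M,
      		PG_LEVEL_1G,
      		PG_LEVEL_NUM
      	};

      	/* Address bits translated by one pagetable level */
      	#define PTE_SHIFT ilog2(PTRS_PER_PTE)

      	static inline unsigned long page_level_shift(enum pg_level level)
      	{
      		return (PAGE_SHIFT - PTE_SHIFT) + level * PTE_SHIFT;
      	}

      	static inline unsigned long page_level_size(enum pg_level level)
      	{
      		return 1UL << page_level_shift(level);
      	}

      	static inline unsigned long page_level_mask(enum pg_level level)
      	{
      		return ~(page_level_size(level) - 1);
      	}

      With 4k pages and PTE_SHIFT == 9, page_level_shift() yields 12, 21,
      and 30 for PG_LEVEL_4K, PG_LEVEL_2M, and PG_LEVEL_1G respectively,
      so page_level_size(PG_LEVEL_2M) is 2MB as expected.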
    • x86, mm: Make DEBUG_VIRTUAL work earlier in boot · a25b9316
      Dave Hansen authored
      The KVM code has some repeated bugs in it around use of __pa() on
      per-cpu data.  Those data are not in an area on which using
      __pa() is valid.  However, they are also called early enough in
      boot that __vmalloc_start_set is not set, and thus the
      CONFIG_DEBUG_VIRTUAL debugging does not catch them.
      
      This adds a check to also verify __pa() calls against max_low_pfn,
      which we can use earlier in boot than is_vmalloc_addr().  However,
      if we are super-early in boot, max_low_pfn=0 and this will trip
      on every call, so also make sure that max_low_pfn is set before
      we try to use it.
      
      With this patch applied, CONFIG_DEBUG_VIRTUAL will actually
      catch the bug I was chasing (and fix later in this series).
      
      I'd love to find a generic way so that any __pa() call on percpu
      areas could do a BUG_ON(), but there don't appear to be any nice
      and easy ways to check if an address is a percpu one.  Anybody
      have ideas on a way to do this?
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212430.F46F8159@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
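      A sketch of the strengthened check in the 32-bit __phys_addr(),
      paraphrased from the description above (arch/x86/mm/physaddr.c,
      under CONFIG_DEBUG_VIRTUAL):

      	unsigned long __phys_addr(unsigned long x)
      	{
      		unsigned long phys_addr = x - PAGE_OFFSET;

      		VIRTUAL_BUG_ON(x < PAGE_OFFSET);
      		/* Valid only once the vmalloc bounds are initialized */
      		VIRTUAL_BUG_ON(__vmalloc_start_set && is_vmalloc_addr((void *)x));
      		/*
      		 * max_low_pfn is set early, but not _that_ early: while
      		 * it is still 0 every call would trip the check, so
      		 * skip it until it has a real value.
      		 */
      		if (max_low_pfn)
      			VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);

      		return phys_addr;
      	}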
  4. 24 Jan, 2013 (2 commits)
  5. 23 Jan, 2013 (11 commits)
  6. 22 Jan, 2013 (4 commits)
  7. 21 Jan, 2013 (1 commit)
  8. 19 Jan, 2013 (5 commits)
  9. 18 Jan, 2013 (3 commits)
  10. 17 Jan, 2013 (5 commits)