1. 03 Apr 2013, 1 commit
    • xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen provided pagetables. · b2222794
      Authored by Konrad Rzeszutek Wilk
      Occasionally on a DL380 G4 the guest would crash quite early with this:
      
      (XEN) d244:v0: unhandled page fault (ec=0003)
      (XEN) Pagetable walk from ffffffff84dc7000:
      (XEN)  L4[0x1ff] = 00000000c3f18067 0000000000001789
      (XEN)  L3[0x1fe] = 00000000c3f14067 000000000000178d
      (XEN)  L2[0x026] = 00000000dc8b2067 0000000000004def
      (XEN)  L1[0x1c7] = 00100000dc8da067 0000000000004dc7
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 244 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.1.3OVM  x86_64  debug=n  Not tainted ]----
      (XEN) CPU:    3
      (XEN) RIP:    e033:[<ffffffff81263f22>]
      (XEN) RFLAGS: 0000000000000216   EM: 1   CONTEXT: pv guest
      (XEN) rax: 0000000000000000   rbx: ffffffff81785f88   rcx: 000000000000003f
      (XEN) rdx: 0000000000000000   rsi: 00000000dc8da063   rdi: ffffffff84dc7000
      
      The offending code shows it to be a loop writing the value zero
      (%rax) through the %rdi register (the L4 provided by Xen):
      
         0: 44 00 00             add    %r8b,(%rax)
         3: 31 c0                 xor    %eax,%eax
         5: b9 40 00 00 00       mov    $0x40,%ecx
         a: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
        11: 00 00
        13: ff c9                 dec    %ecx
        15:* 48 89 07             mov    %rax,(%rdi)     <-- trapping instruction
        18: 48 89 47 08           mov    %rax,0x8(%rdi)
        1c: 48 89 47 10           mov    %rax,0x10(%rdi)
      
      which fails. xen_setup_kernel_pagetable recycles some of Xen's
      page-table entries when it has switched over to its Linux page-tables.
      
      Right before trying to clear the page, we make a hypercall to change
      it from _RO to _RW, and that works (otherwise we would hit a BUG()).
      And the _RW flag is indeed set for that page:
      (XEN)  L1[0x1c7] = 001000004885f067 0000000000004dc7
      
      The error code is 3, i.e. PFEC_page_present and PFEC_write_access: the
      page is present (correct), and we tried to write to the page, but a
      violation occurred. The one theory is that the page entries in hardware
      (which are cached) are not up to date with what we just set, especially
      as we have just done a CR3 write and flushed the multicalls.
      
      This patch solves the problem by flushing out the TLB page
      entry after changing it from _RO to _RW, and we don't hit this
      issue anymore.
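      
      For reference, a minimal sketch of the idea (not the verbatim patch;
      the helper shown here is illustrative, though the hypercall and the
      UVMF_INVLPG flag are the standard Xen interfaces):
      
      	/*
      	 * When flipping a Xen-provided page-table page from RO to RW so
      	 * it can be recycled, ask the hypervisor to also invalidate the
      	 * stale TLB entry for that VA (UVMF_INVLPG) instead of leaving
      	 * the cached RO translation in place (UVMF_NONE).
      	 */
      	static void __init set_page_prot_flags(void *addr, pgprot_t prot,
      					       unsigned long flags)
      	{
      		unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
      		pte_t pte = pfn_pte(pfn, prot);
      
      		if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
      			BUG();
      	}
      
      	/* callers recycling a page-table page would then pass: */
      	/*   set_page_prot_flags(addr, PAGE_KERNEL, UVMF_INVLPG); */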
      
      Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO
      'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4]
      Reported-and-Tested-by: Saar Maoz <Saar.Maoz@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  2. 28 Mar 2013, 1 commit
  3. 23 Feb 2013, 1 commit
    • x86-64, xen, mmu: Provide an early version of write_cr3. · 0cc9129d
      Authored by Konrad Rzeszutek Wilk
      With commit 8170e6be ("x86, 64bit: Use a #PF handler to materialize
      early mappings on demand") we started hitting an early bootup crash
      where the Xen hypervisor would inform us that:
      
          (XEN) d7:v0: unhandled page fault (ec=0000)
          (XEN) Pagetable walk from ffffea000005b2d0:
          (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
          (XEN) domain_crash_sync called from entry.S
          (XEN) Domain 7 (vcpu#0) crashed on cpu#3:
          (XEN) ----[ Xen-4.2.0  x86_64  debug=n  Not tainted ]----
      
      .. that Xen was unable to context switch back to dom0.
      
      Looking at the calling stack we find:
      
          [<ffffffff8103feba>] xen_get_user_pgd+0x5a  <--
          [<ffffffff8103feba>] xen_get_user_pgd+0x5a
          [<ffffffff81042d27>] xen_write_cr3+0x77
          [<ffffffff81ad2d21>] init_mem_mapping+0x1f9
          [<ffffffff81ac293f>] setup_arch+0x742
          [<ffffffff81666d71>] printk+0x48
      
      We are trying to figure out whether we need to update the user PGD as
      well.  Please keep in mind that under 64-bit PV guests we have a limited
      number of rings: 0 for the hypervisor, and 1 for both the Linux kernel
      and user-space.  As such the Linux pvops'fied version of write_cr3
      checks if it has to update the user-space cr3 as well.
      
      That clearly is not needed during early bootup.  The recent changes (see
      above git commit) streamline the x86 page table allocation to be much
      simpler (And also incidentally the #PF handler ends up in spirit being
      similar to how the Xen toolstack sets up the initial page-tables).
      
      The fix is to have an early-bootup version of write_cr3 that just loads
      the kernel %cr3.  The later version - which also handles user-page
      modifications - will be used after the initial page tables have been
      set up.
      
      [ hpa: removed a redundant #ifdef and made the new function __init.
        Also note that x86-32 already has such an early xen_write_cr3. ]
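      
      A hedged sketch of what such an early variant looks like (the
      mmuext_op-based body here is illustrative; the real helper in
      arch/x86/xen/mmu.c may differ in detail):
      
      	/* Early-boot cr3 write: switch only the kernel base pointer and
      	 * never consult the user pgd, which does not exist yet. */
      	static void __init xen_write_cr3_init(unsigned long cr3)
      	{
      		struct mmuext_op op;
      
      		op.cmd = MMUEXT_NEW_BASEPTR;
      		op.arg1.mfn = pfn_to_mfn(PFN_DOWN(cr3));
      
      		if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
      			BUG();
      	}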
      Tested-by: "H. Peter Anvin" <hpa@zytor.com>
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 Nov 2012, 2 commits
  5. 18 Nov 2012, 1 commit
  6. 01 Nov 2012, 1 commit
  7. 09 Oct 2012, 1 commit
    • mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Authored by Konstantin Khlebnikov
      A long time ago, in v2.4, VM_RESERVED kept the swapout process off the
      VMA; it has since lost its original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes the reserved_vm counter from mm_struct.  It seems
      nobody cares about it: it is not exported to userspace directly, it only
      reduces the total_vm shown in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
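      
      A hedged illustration of the resulting conversion pattern in a driver's
      mmap hook (driver name and PFN value here are hypothetical):
      
      	static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
      	{
      		unsigned long pfn = 0x100;	/* hypothetical device frame */
      
      		/* before: vma->vm_flags |= VM_RESERVED; */
      		vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
      
      		/* remap_pfn_range() itself now sets
      		 * VM_IO | VM_DONTEXPAND | VM_DONTDUMP as well */
      		return remap_pfn_range(vma, vma->vm_start, pfn,
      				       vma->vm_end - vma->vm_start,
      				       vma->vm_page_prot);
      	}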
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 04 Oct 2012, 1 commit
    • xen pv-on-hvm: add pfn_is_ram helper for kdump · 34b6f01a
      Authored by Olaf Hering
      Register a pfn_is_ram helper to speed up reading /proc/vmcore in the kdump
      kernel. See commit message of 997c136f ("fs/proc/vmcore.c: add hook
      to read_from_oldmem() to check for non-ram pages") for details.
      
      It makes use of a new hvmop HVMOP_get_mem_type which was introduced in
      xen 4.2 (23298:26413986e6e0) and backported to 4.1.1.
      
      The new function is currently only enabled for reading /proc/vmcore.
      Later it will also be used for the kexec kernel. Since that requires
      more changes in the generic kernel, make it static for the time being.
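      
      A hedged sketch of the helper's shape (the struct and hypercall names
      follow the Xen public headers; details may differ from the actual patch):
      
      	static int xen_oldmem_pfn_is_ram(unsigned long pfn)
      	{
      		struct xen_hvm_get_mem_type a = {
      			.domid = DOMID_SELF,
      			.pfn = pfn,
      		};
      
      		if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
      			return -ENXIO;
      
      		/* anything the device model emulates (MMIO) is not RAM */
      		return a.mem_type != HVMMEM_mmio_dm;
      	}
      
      	/* registered once during setup: */
      	/*   register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram); */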
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  9. 12 Sep 2012, 4 commits
  10. 06 Sep 2012, 1 commit
  11. 05 Sep 2012, 1 commit
  12. 23 Aug 2012, 10 commits
  13. 20 Jul 2012, 2 commits
    • xen/mm: zero PTEs for non-present MFNs in the initial page table · 66a27dde
      Authored by David Vrabel
      When constructing the initial page tables, if the MFN for a usable PFN
      is missing in the p2m then that frame is initially ballooned out.  In
      this case, zero the PTE (as in decrease_reservation() in
      drivers/xen/balloon.c).
      
      This is obviously safe, as opposed to having a valid PTE with an MFN of
      INVALID_P2M_ENTRY (~0).
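      
      A hedged sketch of the check (the helper name is hypothetical; the real
      change lives in the initial page-table construction path in xen/mmu.c):
      
      	/* While building the initial page tables: a usable PFN whose MFN
      	 * is missing from the p2m is ballooned out, so zero its PTE. */
      	static pte_t __init sanitize_boot_pte(pte_t pte)
      	{
      		if (pte_mfn(pte) == INVALID_P2M_ENTRY)
      			pte = __pte_ma(0);
      		return pte;
      	}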
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable · d095d43e
      Authored by David Vrabel
      In xen_set_pte() if batching is unavailable (because the caller is in
      an interrupt context such as handling a page fault) it would fall back
      to using native_set_pte() and trapping and emulating the PTE write.
      
      On 32-bit guests this requires two traps for each PTE write (one for
      each dword of the PTE).  Instead, do one mmu_update hypercall
      directly.
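      
      A hedged sketch of that fallback path (the helper name is illustrative;
      the hypercall interface is the standard mmu_update one):
      
      	/* Outside of lazy-MMU batching: one mmu_update hypercall instead
      	 * of letting native_set_pte() trap once per dword on 32-bit. */
      	static void xen_set_pte_direct(pte_t *ptep, pte_t pteval)
      	{
      		struct mmu_update u;
      
      		u.ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
      		u.val = pte_val_ma(pteval);
      
      		if (HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0)
      			BUG();
      	}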
      
      During construction of the initial page tables, continue to use
      native_set_pte() because most of the PTEs being set are in writable
      and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
      using a hypercall for this is very expensive.
      
      This significantly improves page fault performance in 32-bit PV
      guests.
      
      lmbench3 test  Before    After     Improvement
      ----------------------------------------------
      lat_pagefault  3.18 us   2.32 us   27%
      lat_proc fork  356 us    313.3 us  11%
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  14. 28 Jun 2012, 1 commit
    • x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd
      Authored by Alex Shi
      x86 has no flush_tlb_range support at the instruction level. Currently
      flush_tlb_range is just implemented by flushing the whole page table (a
      full TLB flush). That is not the best solution for all scenarios. In
      fact, if we just use 'invlpg' to flush a few lines from the TLB, we can
      gain performance from the later accesses to the remaining TLB lines.
      
      But the 'invlpg' instruction costs a lot of time. Its execution time can
      compete with a cr3 rewrite, and is even a bit more on SNB CPUs.
      
      So, on a CPU with 512 4KB TLB entries, the balance point is at:
      	(512 - X) * 100ns(assumed TLB refill cost) =
      		X(TLB flush entries) * 100ns(assumed invlpg cost)
      
      Here, X is 256, that is 1/2 of 512 entries.
      
      But with the mysterious CPU prefetcher and page-miss handler unit, the
      actual TLB refill cost is far lower than the assumed 100ns for sequential
      access. And 2 HT siblings in one core make memory access even faster if
      they are accessing the same memory. So, in the patch, I just do the
      change when the number of target entries is less than 1/16 of the whole
      set of active TLB entries. Actually, I have no data to support the '1/16'
      figure, so any suggestions are welcome.
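      
      A hedged sketch of the resulting heuristic (function and variable names
      here are illustrative, not the patch's actual interface):
      
      	/* Flush page by page only when the range covers less than 1/16th
      	 * of the active TLB; otherwise a full flush (cr3 rewrite) wins. */
      	static void flush_range_sketch(unsigned long start, unsigned long end,
      				       unsigned long act_entries)
      	{
      		if (((end - start) >> PAGE_SHIFT) > act_entries / 16) {
      			local_flush_tlb();		/* full flush via cr3 */
      		} else {
      			unsigned long addr;
      
      			for (addr = start; addr < end; addr += PAGE_SIZE)
      				__flush_tlb_single(addr);	/* invlpg */
      		}
      	}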
      
      As to hugetlb: presumably due to the smaller page tables and fewer active
      TLB entries, I didn't see a benefit in my benchmark, so there is no
      optimization for it now.
      
      My micro-benchmark shows that in ideal scenarios the read performance
      improves by 70 percent. And in the worst scenario, reading/writing
      performance is similar to the unpatched 3.4-rc4 kernel.
      
      Here is the read data on my 2P * 4-core * HT NHM EP machine, with THP
      set to 'always':
      
      multi-thread testing, the '-t' parameter is the thread number:
      	       	        with patch   unpatched 3.4-rc4
      ./mprotect -t 1           14ns		24ns
      ./mprotect -t 2           13ns		22ns
      ./mprotect -t 4           12ns		19ns
      ./mprotect -t 8           14ns		16ns
      ./mprotect -t 16          28ns		26ns
      ./mprotect -t 32          54ns		51ns
      ./mprotect -t 128         200ns		199ns
      
      Single process with sequential flushing and memory accessing:
      
      		       	with patch   unpatched 3.4-rc4
      ./mprotect		    7ns			11ns
      ./mprotect -p 4096  -l 8 -n 10240
      			    21ns		21ns
      
      [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
        has additional performance numbers. ]
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  15. 08 May 2012, 1 commit
    • xen/setup: update VA mapping when releasing memory during setup · 83d51ab4
      Authored by David Vrabel
      In xen_memory_setup(), if a page that is being released has a VA
      mapping, this must also be updated.  Otherwise, the page will not be
      released completely -- it will still be referenced in Xen and won't be
      freed until the mapping is removed, and this prevents it from being
      reallocated at a different PFN.
      
      This was already being done for the ISA memory region in
      xen_ident_map_ISA(), but this omitted a few pages on many systems, which
      mark a few pages below the ISA memory region as
      reserved in the e820 map.
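      
      A hedged sketch of the combined release step (the helper name is
      hypothetical; the hypercall and p2m interfaces are the standard ones):
      
      	static void __init release_pfn_sketch(unsigned long pfn)
      	{
      		unsigned long mfn = pfn_to_mfn(pfn);
      		struct xen_memory_reservation r = {
      			.domid = DOMID_SELF,
      			.nr_extents = 1,
      		};
      
      		/* zap any direct-map VA mapping first, so Xen no longer
      		 * sees a reference to the frame */
      		if (HYPERVISOR_update_va_mapping(
      				(unsigned long)__va(pfn << PAGE_SHIFT),
      				__pte_ma(0), UVMF_INVLPG))
      			BUG();
      
      		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
      
      		set_xen_guest_handle(r.extent_start, &mfn);
      		WARN_ON(HYPERVISOR_memory_op(XENMEM_decrease_reservation,
      					     &r) != 1);
      	}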
      
      This fixes errors such as:
      
      (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
      (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  16. 07 May 2012, 1 commit
    • xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs · b7e5ffe5
      Authored by Konrad Rzeszutek Wilk
      If I try to do "cat /sys/kernel/debug/kernel_page_tables"
      I end up with:
      
      BUG: unable to handle kernel paging request at ffffc7fffffff000
      IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
      PGD 0
      Oops: 0000 [#1] SMP
      CPU 0
      .. snip..
      RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
      RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000
      
      which is due to the fact we are trying to access a PFN that is not
      accessible to us. The reason (at least in this case) was that
      PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
      hypervisor) to point to a read-only linear map of the MFN->PFN array.
      During our parsing we would get the MFN (a valid one), try to look
      it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
      Then pte_mfn_to_pfn would happily feed that in, attach the flags
      and return it back to the caller. 'ptdump_show' bitshifts it and
      gets an invalid value that it tries to dereference.
      
      Instead of doing all of that, we detect the ~0 case and just
      return !_PAGE_PRESENT.
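      
      A sketch of the guard, following the description above (the real code
      sits in pte_mfn_to_pfn() in arch/x86/xen/mmu.c; details may differ):
      
      	static pteval_t pte_mfn_to_pfn(pteval_t val)
      	{
      		if (val & _PAGE_PRESENT) {
      			unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
      			unsigned long pfn = mfn_to_pfn(mfn);
      			pteval_t flags = val & PTE_FLAGS_MASK;
      
      			/* MFN has no backing PFN for us: report a
      			 * non-present entry instead of a "PFN" of ~0. */
      			if (unlikely(pfn == ~0UL))
      				val = flags & ~_PAGE_PRESENT;
      			else
      				val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
      		}
      		return val;
      	}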
      
      This bug has been in existence .. at least until 2.6.37 (yikes!)
      
      CC: stable@kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  17. 02 May 2012, 1 commit
  18. 07 Apr 2012, 1 commit
    • xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' · 2531d64b
      Authored by Konrad Rzeszutek Wilk
      The above mentioned patch checks the IOAPIC and if it contains
      -1, then it unmaps said IOAPIC. But under Xen we get this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      PGD 0
      Oops: 0002 [#1] SMP
      CPU 0
      Modules linked in:
      
      Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
      1525                  /0U990C
      RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
      RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
      RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
      FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
      CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
      Stack:
       00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
       ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
       0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
      Call Trace:
       [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
       [<ffffffff8100a9b2>] ? check_events+0x12/0x20
       [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
       [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
       [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
       [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
       [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
       [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
       [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
       [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
       [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
      
      The reason we are dying is b/c the call acpi_get_override_irq() is used,
      which returns the polarity and trigger for the IRQs. That function calls
      mp_find_ioapics to get the 'struct ioapic' structure - which, along with
      the mp_irq[x], is used to figure out the default values and the
      polarity/trigger overrides. Since mp_find_ioapics now returns -1 [b/c the
      IOAPIC is filled with 0xffffffff], acpi_get_override_irq() stops trying
      to look up the proper INT_SRC_OVR in the mp_irq[x], and we can't install
      the SCI interrupt.
      
      The proper fix for this is going into v3.5 and adds an x86_io_apic_ops
      struct so that platforms can override it. But for v3.4, let's carry this
      work-around. This patch does that by providing a slightly different
      variant of the fake IOAPIC entries.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  19. 20 Feb 2012, 1 commit
    • xen/pat: Disable PAT support for now. · 8eaffa67
      Authored by Konrad Rzeszutek Wilk
      [Pls also look at https://lkml.org/lkml/2012/2/10/228]
      
      Using of PAT to change pages from WB to WC works quite nicely.
      Changing it back to WB - not so much. The crux of the matter is
      that the code that does this (__page_change_att_set_clr) has only
      limited information so when it tries to the change it gets
      the "raw" unfiltered information instead of the properly filtered one -
      and the "raw" one tell it that PSE bit is on (while infact it
      is not).  As a result when the PTE is set to be WB from WC, we get
      tons of:
      
      :WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
      :Hardware name: HP xw4400 Workstation
      .. snip..
      :Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
      :Call Trace:
      : [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
      : [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
      : [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
      : [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
      : [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
      : [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
      : [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
      : [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
      : [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
      : [<ffffffff8100a9b2>] ? check_events+0x12/0x20
      : [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
      : [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
      : [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
      : [<ffffffff8112ba04>] shrink_slab+0x154/0x310
      : [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
      : [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
      : [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
      : [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
      : [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
      : [<ffffffff8108fb6c>] kthread+0x8c/0xa0
      
      for every page. The proper fix for this has been posted:
      https://lkml.org/lkml/2012/2/10/228
      "x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.",
      along with a detailed description of the problem and solution.
      
      But since that posting has gone nowhere, I am proposing
      this band-aid solution so that at least users don't get
      the page corruption (pages that are WC don't get changed back to WB
      and end up being recycled for the filesystem or other things, causing
      mysterious crashes).
      
      The negative impact of this patch is that users of the WC flag
      (the InfiniBand, radeon, and nouveau drivers) won't be able
      to set that flag, so they are going to see a performance degradation.
      But stability is more important here.
      
      Fixes RH BZ# 742032, 787403, and 745574
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  20. 25 Jan 2012, 1 commit
  21. 10 Jan 2012, 1 commit
  22. 24 Sep 2011, 1 commit
  23. 15 Sep 2011, 1 commit
  24. 17 Aug 2011, 1 commit
    • xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Authored by Jan Beulich
      The order-based approach is not only less efficient (requiring a shift
      and a compare; typical generated code looks like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
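      
      In C, the equivalent before/after looks roughly like this (a sketch of
      the bounds check in mfn_to_pfn(); names follow the text above):
      
      	static unsigned long mfn_to_pfn_sketch(unsigned long mfn)
      	{
      		/* before: order-based check (shift + compare) */
      		/*   if (unlikely(mfn >> machine_to_phys_order))      */
      		/*           return ~0UL;                             */
      
      		/* after: linear limit check, a single compare against
      		 * the actual number of M2P entries */
      		if (unlikely(mfn >= machine_to_phys_nr))
      			return ~0UL;
      
      		return machine_to_phys_mapping[mfn];
      	}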
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  25. 10 Aug 2011, 1 commit
  26. 05 Aug 2011, 1 commit