1. 04 October 2012 (1 commit)
    • xen pv-on-hvm: add pfn_is_ram helper for kdump · 34b6f01a
      Committed by Olaf Hering
      Register a pfn_is_ram helper to speed up reading /proc/vmcore in the kdump
      kernel. See the commit message of 997c136f ("fs/proc/vmcore.c: add hook
      to read_from_oldmem() to check for non-ram pages") for details.
      
      It makes use of a new hvmop HVMOP_get_mem_type which was introduced in
      xen 4.2 (23298:26413986e6e0) and backported to 4.1.1.
      
      The new function is currently only enabled for reading /proc/vmcore.
      Later it will also be used for the kexec kernel. Since that requires
      more changes in the generic kernel, keep it static for the time being
      (a hedged sketch of the hook follows this entry).
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
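      A hedged sketch of what such a hook can look like, based on the HVMOP_get_mem_type
      interface described above (field and constant spellings follow the public Xen headers
      as I recall them; this is a sketch, not the actual patch):

      /* Sketch: ask the hypervisor what backs a PFN; report "is RAM" only for
       * normal memory so /proc/vmcore reads can skip ballooned or MMIO pages. */
      static int xen_oldmem_pfn_is_ram(unsigned long pfn)
      {
              struct xen_hvm_get_mem_type a = {
                      .domid = DOMID_SELF,
                      .pfn   = pfn,
              };

              if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
                      return -ENXIO;          /* hypercall not available */

              return a.mem_type == HVMMEM_ram_rw || a.mem_type == HVMMEM_ram_ro;
      }

      /* Registered once during PV-on-HVM setup so read_from_oldmem() consults it:
       *     register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);                  */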
  2. 24 September 2012 (2 commits)
  3. 18 September 2012 (2 commits)
  4. 12 September 2012 (5 commits)
  5. 06 September 2012 (1 commit)
  6. 05 September 2012 (2 commits)
    • xen: fix logical error in tlb flushing · ce7184bd
      Committed by Alex Shi
      While TLB_FLUSH_ALL gets passed as the 'end' argument to
      flush_tlb_others(), the Xen code was checking its 'start'
      parameter instead. That can yield an incorrect op.cmd of
      MMUEXT_INVLPG_MULTI instead of MMUEXT_TLB_FLUSH_MULTI, so some
      pages cannot be flushed from the TLB.
      
      This patch fixes the issue (a sketch of the corrected check follows this entry).
      Reported-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Yongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
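      A hedged sketch of the corrected check in the Xen flush_tlb_others path
      (simplified from the description above; not the literal hunk):

      /* Sketch: the "flush everything" request is signalled through 'end',
       * so that is the parameter the command selection must look at. */
      if (end == TLB_FLUSH_ALL) {
              args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;   /* flush the whole TLB */
      } else {
              args->op.cmd = MMUEXT_INVLPG_MULTI;      /* flush a single page */
              args->op.arg1.linear_addr = start;
      }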
    • xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Committed by Konrad Rzeszutek Wilk
      We would traverse the full P2M top directory (from 0 to MAX_DOMAIN_PAGES,
      inclusive) when trying to figure out whether we could re-use some of the
      P2M middle leafs (a minimal illustration of the off-by-one follows this entry).
      
      That meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us, p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
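      The bug is the classic inclusive loop bound; a minimal, self-contained
      illustration (hypothetical sizes, nothing here is the kernel code):

      #include <assert.h>
      #include <stdio.h>

      #define MAX_ENTRIES 512                 /* stand-in for the P2M top size */

      static void touch_entry(unsigned int idx)
      {
              assert(idx < MAX_ENTRIES);      /* plays the BUG_ON() role above */
      }

      int main(void)
      {
              unsigned int i;

              /* Buggy bound: '<=' walks one past the end and trips the assert:
               *     for (i = 0; i <= MAX_ENTRIES; i++) touch_entry(i);         */

              for (i = 0; i < MAX_ENTRIES; i++)   /* fixed: strictly less than */
                      touch_entry(i);

              printf("walked %u entries\n", i);
              return 0;
      }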
  7. 28 August 2012 (1 commit)
  8. 23 August 2012 (18 commits)
    • xen/mmu: If the revector fails, don't attempt to revector anything else. · 32873187
      Committed by Konrad Rzeszutek Wilk
      If the P2M revectoring would fail, we would try to continue on by
      cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap
      is greedy and erases the PMD on both boundaries. Since the P2M
      array can share the PMD, we would wipe out part of the __ka
      that is still used in the P2M tree to point to P2M leafs.
      
      This fixes it by bypassing the revectoring and continuing on.
      If the revector fails, a nice WARN is printed so we can still
      troubleshoot this.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Committed by Konrad Rzeszutek Wilk
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is, the one that was
      grafted onto the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      
      recycles said slots, replacing the P2M tree leafs that point at
      &mfn_list[xx] with p2m_identity or p2m_missing,
      
      and re-uses those P2M array sections for other P2M tree leafs.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      b/c when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except that when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated one. Since that is no longer the case, as we now have
      holes in the P2M that point to p2m_missing or p2m_identity, we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Release just the MFN list, not MFN list and part of pagetables. · 785f6231
      Committed by Konrad Rzeszutek Wilk
      We call memblock_reserve for [start of mfn list] -> [PMD-aligned end
      of mfn list] instead of [start of mfn list] -> [page-aligned end of mfn list].
      
      This has the disastrous effect that if at bootup the end of mfn_list is
      not PMD aligned, we end up returning to memblock parts of the region
      past the mfn_list array. And those parts are the PTE tables, so we
      see this at bootup (a sketch of the intended alignment follows this entry):
      
      Write protecting the kernel read-only data: 10240k
      Freeing unused kernel memory: 1860k freed
      Freeing unused kernel memory: 200k freed
      (XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26)
      ...
      (XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0
      .. and so on.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
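      A hedged illustration of the alignment point (not the actual hunk; whichever
      memblock call covers the array, the length must be page-aligned, not PMD-aligned):

      unsigned long start = __pa(xen_start_info->mfn_list);
      unsigned long len   = xen_start_info->nr_pages * sizeof(unsigned long);

      /* Right: cover exactly the array, rounded to page granularity. */
      memblock_free(start, PAGE_ALIGN(len));

      /* Wrong: rounding to a PMD (2 MB) boundary also covers the bootstrap
       * PTE pages that sit just past the array and hands them back too:
       *     memblock_free(start, ALIGN(len, PMD_SIZE));                    */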
    • xen/mmu: Remove from __ka space PMD entries for pagetables. · 3aca7fbc
      Committed by Konrad Rzeszutek Wilk
      Please first read the description in "xen/mmu: Copy and revector the
      P2M tree."
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      The xen_remove_p2m_tree and code around has ripped out the __ka for
      the old P2M array.
      
      Here we continue on doing it to where the Xen page-tables were.
      It is safe to do it, as the page-tables are addressed using __va.
      For good measure we delete anything that is within MODULES_VADDR
      and up to the end of the PMD.
      
      At this point the __ka only contains PMD entries for the start
      of the kernel up to __brk.
      
      [v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Copy and revector the P2M tree. · 7f914062
      Committed by Konrad Rzeszutek Wilk
      Please first read the description in "xen/p2m: Add logic to revector a
      P2M tree to use __va leafs" patch.
      
      The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
      copies the contents of the old one into it, and returns the new one.
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      We have revectored the P2M tree (and the one for save/restore as well)
      to use new shiny __va address to new MFNs. The xen_start_info
      has been taken care of already in 'xen_setup_kernel_pagetable()' and
      xen_start_info->shared_info in 'xen_setup_shared_info()', so
      we are free to roam and delete PMD entries - which is exactly what
      we are going to do. We rip out the __ka for the old P2M array.
      
      [v1: Fix smatch warnings]
      [v2: memset was doing 0 instead of 0xff]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Committed by Konrad Rzeszutek Wilk
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks as so:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables are taking
      up a chunk of the __ka address space. That is a problem, since
      MODULES_VADDR starts at 0xffffffffa0000000 and the P2M sits
      right in there! During bootup this results in the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as the initial domain on large machines.
      
      So we pick option 2) (a sketch of the revectoring idea follows this entry).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
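      A hedged sketch of what "revectoring" one leaf amounts to (helper name is
      illustrative; the real p2m.c walks the whole tree and handles the
      identity/missing holes described in the later patches):

      /* Illustrative only: copy a middle leaf out of the Xen-supplied array
       * (reached via __ka) into a freshly allocated page, and hand back the
       * __va alias so the __ka mapping can be torn down afterwards. */
      static unsigned long *revector_one_leaf(unsigned long *old_leaf)
      {
              unsigned long *copy = extend_brk(PAGE_SIZE, PAGE_SIZE);

              copy_page(copy, old_leaf);          /* preserve the MFN entries */
              return (unsigned long *)__va(__pa(copy));
      }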
    • xen/mmu: Recycle the Xen provided L4, L3, and L2 pages · 488f046d
      Committed by Konrad Rzeszutek Wilk
      As we are not using them. We end up only using the L1 pagetables
      and grafting those to our page-tables.
      
      [v1: Per Stefano's suggestion squashed two commits]
      [v2: Per Stefano's suggestion simplified loop]
      [v3: Fix smatch warnings]
      [v4: Add more comments]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: For 64-bit do not call xen_map_identity_early · caaf9ecf
      Committed by Konrad Rzeszutek Wilk
      Because we do not need it. During startup Xen provides
      us with all the initial memory mappings that we need to function.
      
      The initial mapping extends up to the bootstrap stack, which means
      we can reference, using __ka, everything up to 4.f):
      
      (from xen/interface/xen.h):
      
       4. This the order of bootstrap elements in the initial virtual region:
         a. relocated kernel image
         b. initial ram disk              [mod_start, mod_len]
         c. list of allocated page frames [mfn_list, nr_pages]
         d. start_info_t structure        [register ESI (x86)]
         e. bootstrap page tables         [pt_base, CR3 (x86)]
         f. bootstrap stack               [register ESP (x86)]
      
      (initial ram disk may be omitted).
      
      [v1: More comments in git commit]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: use copy_page instead of memcpy. · ae895ed7
      Committed by Konrad Rzeszutek Wilk
      After all, this is what it is there for.
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
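      The change is a mechanical substitution; an illustrative before/after
      (variable names are made up):

      /* Before: open-coded byte copy of exactly one page. */
      memcpy(dst_page, src_page, PAGE_SIZE);

      /* After: the arch-optimized helper that exists for exactly this case. */
      copy_page(dst_page, src_page);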
    • xen/mmu: Provide comments describing the _ka and _va aliasing issue · 4fac153a
      Committed by Konrad Rzeszutek Wilk
      Which is that the level2_kernel_pgt (__ka virtual addresses)
      and level2_ident_pgt (__va virtual address) contain the same
      PMD entries. So if you modify a PTE in __ka, it will be reflected
      in __va (and vice-versa).
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. · 3699aad0
      Committed by Konrad Rzeszutek Wilk
      We don't need to return the new PGD - as we do not use it.
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Committed by Konrad Rzeszutek Wilk
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear which region is reserved. Sadly we ran
      into the problem wherein, on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so let's just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. · 6d7083ee
      Committed by Konrad Rzeszutek Wilk
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: allow privcmd for HVM guests · 1a1d4331
      Committed by Stefano Stabellini
      This patch removes the "return -ENOSYS" for auto_translated_physmap
      guests from privcmd_mmap, thus allowing ARM guests to issue privcmd
      mmap calls. However, privcmd mmap calls are still going to fail for HVM
      and hybrid guests on x86, because the xen_remap_domain_mfn_range
      implementation is currently PV-only.
      
      Changes in v2:
      
      - better commit message;
      - return -EINVAL from xen_remap_domain_mfn_range if
        auto_translated_physmap.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: Introduce xen_pfn_t for pfn and mfn types · bd3f79b7
      Committed by Stefano Stabellini
      All the original Xen headers use xen_pfn_t as the mfn and pfn type; however,
      when they were imported into Linux, xen_pfn_t was replaced with
      unsigned long. That might work for x86 and ia64 but it does not for ARM.
      Bring back xen_pfn_t and let each architecture define it as it sees fit
      (an illustrative sketch follows this entry).
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
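      A hedged sketch of the per-architecture definitions this enables (widths as
      I understand the ports; exact header locations omitted):

      /* x86 / ia64: frame numbers fit in the native word. */
      typedef unsigned long xen_pfn_t;

      /* arm: a 32-bit guest with LPAE can see frames above 4 GB, so the
       * frame-number type must be 64-bit regardless of the word size. */
      typedef uint64_t xen_pfn_t;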
    • xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. · c96aae1f
      Committed by Konrad Rzeszutek Wilk
      When we are finished returning PFNs to the hypervisor, populating
      them back, and marking the E820 MMIO regions and E820 gaps
      as IDENTITY_FRAMEs, we then call into the P2M code to set the areas
      that can be used for ballooning. We were off by one, and ended up
      over-writing a P2M entry that most likely was an IDENTITY_FRAME.
      For example:
      
      1-1 mapping on 40000->40200
      1-1 mapping on bc558->bc5ac
      1-1 mapping on bc5b4->bc8c5
      1-1 mapping on bc8c6->bcb7c
      1-1 mapping on bcd00->100000
      Released 614 pages of unused memory
      Set 277889 page(s) to 1-1 mapping
      Populating 40200-40466 pfn range: 614 pages added
      
      => here we set the P2M tree from 40466 up to bc559 to
      INVALID_P2M_ENTRY. We should have stopped at bc558.
      
      The end result is that anybody trying to construct a PTE for
      PFN bc558 ends up with ~PAGE_PRESENT (a small illustration of
      the boundary follows this entry).
      
      CC: stable@vger.kernel.org
      Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
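      A hedged, purely illustrative rendering of the boundary (the helper and the
      variables are hypothetical, not the setup.c code):

      unsigned long populated_end  = 0x40466;  /* first PFN past what we re-populated */
      unsigned long identity_start = 0xbc558;  /* first PFN of the next 1-1 region    */
      unsigned long pfn;

      /* Wrong: '<=' also clobbers the identity entry at bc558:
       *     for (pfn = populated_end; pfn <= identity_start; pfn++) ...            */

      for (pfn = populated_end; pfn < identity_start; pfn++)   /* half-open range   */
              mark_pfn_invalid(pfn);                            /* hypothetical helper */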
    • x86, microcode, AMD: Fix broken ucode patch size check · 36bf50d7
      Committed by Andreas Herrmann
      This issue was recently observed on an AMD C-50 CPU where a patch of
      maximum size was applied.
      
      Commit be62adb4 ("x86, microcode, AMD: Simplify ucode verification")
      added current_size in get_matching_microcode(). This is calculated as
      the size of the ucode patch + 8 (i.e. the size of the header). Later this is
      compared against the maximum possible ucode patch size for a CPU family,
      and of course this check fails if the patch already has the maximum size.
      
      Cc: <stable@vger.kernel.org> [3.3+]
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • KVM: x86 emulator: use stack size attribute to mask rsp in stack ops · 5ad105e5
      Committed by Avi Kivity
      The sub-register used to access the stack (sp, esp, or rsp) is not
      determined by the address size attribute like other memory references,
      but by the stack segment's B bit (if not in x86_64 mode).
      
      Fix by using the existing stack_mask() to figure out the correct mask.
      
      This long-existing bug was exposed by a combination of a27685c3
      (emulate invalid guest state by default), which causes many more
      instructions to be emulated, and a seabios change (possibly a bug) which
      causes the high 16 bits of esp to become polluted across calls to real
      mode software interrupts.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
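      A hedged sketch of the masking idea (the real emulator reads SS through
      ctxt->ops; ss_is_big() below is a made-up stand-in for checking the SS
      descriptor's B bit, so this only shows the shape of the decision):

      /* Approximate: stack-pointer width comes from the stack segment, not
       * from the instruction's address-size attribute. */
      static unsigned long stack_ptr_mask(struct x86_emulate_ctxt *ctxt)
      {
              if (ctxt->mode == X86EMUL_MODE_PROT64)
                      return ~0UL;                         /* full 64-bit rsp  */
              return ss_is_big(ctxt) ? 0xffffffffUL        /* 32-bit esp       */
                                     : 0xffffUL;           /* 16-bit sp        */
      }

      /* Pushes and pops then adjust only the masked bits:
       *     rsp = (rsp & ~mask) | ((rsp + inc) & mask);                       */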
  9. 22 August 2012 (8 commits)
    • KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended · 35f2d16b
      Committed by Takuya Yoshikawa
      Although the possible race described in
      
        commit 85b70591
        KVM: MMU: fix shrinking page from the empty mmu
      
      was correct, the real cause of that issue was a more trivial bug of
      mmu_shrink() introduced by
      
        commit 19526396
        KVM: MMU: do not iterate over all VMs in mmu_shrink()
      
      Here is the bug:
      
      	if (kvm->arch.n_used_mmu_pages > 0) {
      		if (!nr_to_scan--)
      			break;
      		continue;
      	}
      
      We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
      in other words we try to shrink empty ones by mistake.
      
      This patch reverses the logic so that mmu_shrink() can free pages from
      the first VM whose n_used_mmu_pages is not zero (a sketch of the reversed
      check follows this entry).  Note that we also add comments explaining the
      role of nr_to_scan, which is not practically important now, hoping this
      will be improved in the future.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
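      A hedged sketch of the reversed check, paraphrasing the loop body quoted
      above (not the literal patch):

      	/* Skip VMs with nothing to free; only spend the nr_to_scan budget
      	 * on VMs that actually hold mmu pages. */
      	if (!kvm->arch.n_used_mmu_pages)
      		continue;

      	if (!nr_to_scan--)
      		break;

      	/* ... zap pages from this VM ... */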
    • x86/alternatives: Fix p6 nops on non-modular kernels · cb09cad4
      Committed by Avi Kivity
      Probably a leftover from the early days of self-patching, p6nops
      are marked __initconst_or_module, which causes them to be
      discarded in a non-modular kernel.  If something later triggers
      patching, it will overwrite kernel code with garbage.
      Reported-by: Tomas Racek <tracek@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Cc: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: qemu-devel@nongnu.org
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fixup_irq: Use cpu_online_mask instead of cpu_all_mask · 2530cd4f
      Committed by Liu, Chuansheng
      When one CPU is going down and that CPU is the last one in an irq's
      affinity mask, the current code sets cpu_all_mask as the new
      affinity for that irq.
      
      But for some systems (such as Medfield Android mobile devices) the
      firmware sends the interrupt to each CPU in the irq affinity
      mask, averaged, and cpu_all_mask includes all potential CPUs,
      i.e. offline ones as well.
      
      So replace cpu_all_mask with cpu_online_mask (a simplified sketch
      follows this entry).
      Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
      Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
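      A hedged, simplified rendering of the substitution in fixup_irqs()
      (surrounding logic trimmed):

      	/* If no online CPU is left in the irq's affinity mask, fall back to
      	 * the CPUs that are actually online, not every possible CPU. */
      	if (!cpumask_intersects(affinity, cpu_online_mask))
      		affinity = cpu_online_mask;        /* was: cpu_all_mask */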
    • x86/spinlocks: Fix comment in spinlock.h · 83be4ffa
      Committed by Richard Weinberger
      This comment is no longer true.  We support up to 2^16 CPUs
      because __ticket_t is a u16 if NR_CPUS is larger than 256.
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm: hugetlbfs: correctly populate shared pmd · eb48c071
      Committed by Michal Hocko
      Each page mapped in a process's address space must be correctly
      accounted for in _mapcount.  Normally the rules for this are
      straightforward but hugetlbfs page table sharing is different.  The page
      table pages at the PMD level are reference counted while the mapcount
      remains the same.
      
      If this accounting is wrong, it causes bugs like this one reported by
      Larry Woodman:
      
        kernel BUG at mm/filemap.c:135!
        invalid opcode: 0000 [#1] SMP
        CPU 22
        Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
        Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
        RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
        Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
        Call Trace:
          delete_from_page_cache+0x40/0x80
          truncate_hugepages+0x115/0x1f0
          hugetlbfs_evict_inode+0x18/0x30
          evict+0x9f/0x1b0
          iput_final+0xe3/0x1e0
          iput+0x3e/0x50
          d_kill+0xf8/0x110
          dput+0xe2/0x1b0
          __fput+0x162/0x240
      
      During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
      shared page tables with the check dst_pte == src_pte.  The logic is if
      the PMD page is the same, they must be shared.  This assumes that the
      sharing is between the parent and child.  However, if the sharing is
      with a different process entirely then this check fails as in this
      diagram:
      
        parent
          |
          ------------>pmd
                       src_pte----------> data page
                                              ^
        other--------->pmd--------------------|
                        ^
        child-----------|
                       dst_pte
      
      For this situation to occur, it must be possible for Parent and Other to
      have faulted and failed to share page tables with each other.  This is
      possible due to the following style of race.
      
        PROC A                                          PROC B
        copy_hugetlb_page_range                         copy_hugetlb_page_range
          src_pte == huge_pte_offset                      src_pte == huge_pte_offset
          !src_pte so no sharing                          !src_pte so no sharing
      
        (time passes)
      
        hugetlb_fault                                   hugetlb_fault
          huge_pte_alloc                                  huge_pte_alloc
            huge_pmd_share                                 huge_pmd_share
              LOCK(i_mmap_mutex)
              find nothing, no sharing
              UNLOCK(i_mmap_mutex)
                                                            LOCK(i_mmap_mutex)
                                                            find nothing, no sharing
                                                            UNLOCK(i_mmap_mutex)
            pmd_alloc                                       pmd_alloc
            LOCK(instantiation_mutex)
            fault
            UNLOCK(instantiation_mutex)
                                                        LOCK(instantiation_mutex)
                                                        fault
                                                        UNLOCK(instantiation_mutex)
      
      These two processes are now pointing to the same data page but are not
      sharing page tables, because the opportunity was missed.  When either
      process later forks, the src_pte == dst_pte check is potentially insufficient.
      As the check falls through, the wrong PTE information is copied in
      (harmless but wrong) and the mapcount is bumped for a page mapped by a
      shared page table, leading to the BUG_ON.
      
      This patch addresses the issue by moving pmd_alloc into huge_pmd_share,
      which guarantees that the shared pud is populated in the same critical
      section as the pmd.  This also means that the huge_pte_offset test in
      huge_pmd_share is now serialized correctly, which in turn means that
      sharing will succeed more often, as the racing tasks see the pud and pmd
      populated together (a simplified sketch follows this entry).
      
      Race identified and changelog written mostly by Mel Gorman.
      
      [akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
      Reported-by: Larry Woodman <lwoodman@redhat.com>
      Tested-by: Larry Woodman <lwoodman@redhat.com>
      Reviewed-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
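      A hedged, heavily simplified sketch of the restructuring (the real
      mm/hugetlb.c walks the mapping's VMA tree to find a sharing candidate;
      only the lock/alloc ordering is shown, assuming the enclosing
      huge_pte_alloc()-style parameters mm, vma, addr and pud):

      	pte_t *pte;
      	struct address_space *mapping = vma->vm_file->f_mapping;

      	mutex_lock(&mapping->i_mmap_mutex);

      	/* ... look for another VMA already mapping this range; if found,
      	 *     take a reference on its PMD page and pud_populate() with it ... */

      	/* Allocating the pmd here, under the same lock, is the point of the
      	 * patch: a racing task can no longer observe the pud half-populated. */
      	pte = (pte_t *)pmd_alloc(mm, pud, addr);

      	mutex_unlock(&mapping->i_mmap_mutex);
      	return pte;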
    • xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. · b8b0f559
      Committed by Konrad Rzeszutek Wilk
      There is no need for those functions/variables to be visible. Make them
      static and also fix the compile warnings of this sort:
      
      drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static?
      
      Some of them just require including the header file that
      declares the functions.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: missing includes · 4d9310e3
      Committed by Stefano Stabellini
      Changes in v2:
      - remove pvclock hack;
      - remove include linux/types.h from xen/interface/xen.h.
      v3:
      - Compile under IA64
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB. · fc2341df
      Committed by Konrad Rzeszutek Wilk
      If a PV guest is booted, the native SWIOTLB should not be
      turned on. It does not help us (we don't have any PCI devices)
      and it eats 64MB of good memory. In the case of PV guests
      with PCI devices we need the Xen-SWIOTLB instead (a sketch of
      the decision follows this entry).
      
      [v1: Rewrite comment per Stefano's suggestion]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
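      A hedged sketch of the decision being described (the function name, the
      structure, and the pci_passthrough flag are illustrative, not the actual
      pci-swiotlb-xen.c change):

      /* Illustrative only: on a PV guest the ">4GB on 64-bit" heuristic that
       * turns on the native SWIOTLB does not apply; either the Xen-aware
       * SWIOTLB is needed (dom0 / PCI passthrough) or none at all. */
      int __init xen_swiotlb_decide_sketch(void)
      {
              if (!xen_pv_domain())
                      return 0;                 /* HVM: leave the native logic alone */

              if (xen_initial_domain() || pci_passthrough /* hypothetical flag */)
                      xen_swiotlb = 1;          /* use the Xen-aware SWIOTLB          */

              swiotlb = 0;                      /* never enable the native one        */
              return xen_swiotlb;
      }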