1. 02 May 2017, 2 commits
  2. 27 Mar 2017, 2 commits
  3. 16 Mar 2017, 1 commit
    • x86: Remap GDT tables in the fixmap section · 69218e47
      Thomas Garnier authored
      Each processor holds a GDT in its per-cpu structure. The sgdt
      instruction gives the base address of the current GDT. This address can
      be used to bypass KASLR memory randomization. Combined with another
      bug, an attacker could target other per-cpu structures or deduce the
      base of the main memory section (PAGE_OFFSET).
      
      This patch relocates the GDT table for each processor inside the
      fixmap section. The space is reserved based on the number of
      supported processors.
      
      For consistency, the remapping is done by default on both 32-bit and
      64-bit kernels.
      
      Each processor switches to its remapped GDT at the end of
      initialization. For hibernation, the main processor returns with the
      original GDT and switches back to the remapping at completion.
      
      This patch was tested on both architectures. Hibernation and KVM were
      both tested specifically for their usage of the GDT.
      
      Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
      recommending changes for Xen support.
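      
      For illustration (not part of the patch): sgdt is unprivileged on x86
      without UMIP, so even userspace can read the GDT base, which is what
      makes the unremapped per-cpu address a KASLR leak. A minimal x86-64
      sketch:
      
        #include <stdint.h>
        #include <stdio.h>
        
        struct desc_ptr {
                uint16_t size;
                uint64_t address;
        } __attribute__((packed));
        
        int main(void)
        {
                struct desc_ptr gdtr;
        
                /* read the base address of the current CPU's GDT */
                asm volatile("sgdt %0" : "=m" (gdtr));
                printf("GDT base: %#llx, limit: %#x\n",
                       (unsigned long long)gdtr.address, gdtr.size);
                return 0;
        }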
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69218e47
  4. 02 Mar 2017, 1 commit
  5. 07 Feb 2017, 1 commit
  6. 28 Jan 2017, 1 commit
    • x86/boot/e820: Move asm/e820.h to asm/e820/api.h · 66441bd3
      Ingo Molnar authored
      In line with asm/e820/types.h, move the e820 API declarations to
      asm/e820/api.h and update all usage sites.
      
      This is just a mechanical, obviously correct move & replace patch;
      there will be subsequent changes to clean up the code and to make
      better use of the new header organization.
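      
      For illustration, the mechanical change at a hypothetical usage site
      amounts to:
      
        /* before */
        #include <asm/e820.h>
        
        /* after */
        #include <asm/e820/api.h>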
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66441bd3
  7. 14 Jul 2016, 1 commit
    • x86/xen: Audit and remove any unnecessary uses of module.h · 7a2463dc
      Paul Gortmaker authored
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h in code
      that is built obj-y in a Makefile or enabled by a bool Kconfig
      option. The advantage of doing so is that module.h itself sources
      about 15 other headers, adding significantly to what we feed cpp and
      obscuring which headers we are actually using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL), we check each obj-y/bool instance for
      the presence of either and replace as needed.
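      
      A sketch of the typical replacement in an obj-y file (illustrative,
      not any specific file):
      
        /* before: module.h pulls in roughly 15 other headers transitively */
        #include <linux/module.h>
        
        /* after: include only what the file actually uses */
        #include <linux/init.h>    /* for __init */
        #include <linux/export.h>  /* for EXPORT_SYMBOL */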
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Juergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7a2463dc
  8. 23 Jun 2016, 2 commits
  9. 24 Feb 2016, 1 commit
  10. 14 Dec 2015, 1 commit
  11. 26 Nov 2015, 1 commit
  12. 28 Oct 2015, 1 commit
  13. 09 Sep 2015, 1 commit
  14. 20 Aug 2015, 4 commits
    • xen: move p2m list if conflicting with e820 map · 70e61199
      Juergen Gross authored
      Check whether the hypervisor-supplied p2m list is placed at a
      location that conflicts with the target E820 map. If it is, relocate
      it to a new, previously unused area that is compliant with the E820
      map.
      
      As the p2m list might be huge (up to several GB) and is required to
      be mapped virtually, set up a temporary mapping for the copied list.
      
      For pvh domains, just delete the p2m-related information from the
      start info instead of reserving the p2m memory, as we don't need it
      at all.
      
      For 32-bit kernels, adjust the memblock_reserve() parameters to cover
      the page tables only. This requires reserving the start_info page
      with its own memblock_reserve() call.
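      
      A minimal sketch of the conflict test, assuming an e820-style table
      of (start, size) RAM entries; the names are illustrative, not the
      kernel's:
      
        #include <stdbool.h>
        #include <stdint.h>
        
        struct region { uint64_t start, size; };
        
        /* true if [p2m_start, p2m_start + p2m_size) overlaps any RAM
         * region of the target E820 map */
        static bool p2m_conflicts(const struct region *ram, int nr,
                                  uint64_t p2m_start, uint64_t p2m_size)
        {
                for (int i = 0; i < nr; i++) {
                        uint64_t s = ram[i].start, e = s + ram[i].size;
        
                        if (p2m_start < e && s < p2m_start + p2m_size)
                                return true; /* relocate the p2m list */
                }
                return false;
        }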
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      70e61199
    • xen: add explicit memblock_reserve() calls for special pages · 6c2681c8
      Juergen Gross authored
      Some special pages containing interfaces to xen are currently only
      reserved implicitly. The memblock_reserve() call meant to reserve the
      p2m list supplied by xen in fact reserves not only the p2m list
      itself, but also further pages up to the start of the xen-built page
      tables.
      
      To be able to move the p2m list to another pfn range, which is needed
      to support huge RAM sizes, this memblock_reserve() must be split up
      to cover all affected reserved pages explicitly (see the sketch after
      the list below).
      
      The affected pages are:
      - start_info page
      - xenstore ring (might be missing, mfn is 0 in this case)
      - console ring (not for initial domain)
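      
      A sketch of the explicit reservations (kernel-context fragment,
      assuming the usual Xen globals and helpers such as xen_start_info,
      mfn_to_pfn() and memblock_reserve()):
      
        /* start_info page */
        memblock_reserve(__pa(xen_start_info), PAGE_SIZE);
        
        /* xenstore ring (mfn may be 0, i.e. missing) */
        if (xen_start_info->store_mfn)
                memblock_reserve(PFN_PHYS(mfn_to_pfn(xen_start_info->store_mfn)),
                                 PAGE_SIZE);
        
        /* console ring (not present for the initial domain) */
        if (!xen_initial_domain())
                memblock_reserve(PFN_PHYS(mfn_to_pfn(xen_start_info->console.domU.mfn)),
                                 PAGE_SIZE);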
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      6c2681c8
    • xen: check pre-allocated page tables for conflict with memory map · 04414baa
      Juergen Gross authored
      Check whether the page tables built by the domain builder are at
      memory addresses that conflict with the target memory map. If they
      are, just panic instead of running into problems later.
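      
      A sketch of the check, assuming a helper along the lines of
      xen_is_e820_reserved() that tests a physical range against the
      target map:
      
        /* pt_base/nr_pt_frames describe the domain-builder page tables */
        if (xen_is_e820_reserved(__pa(xen_start_info->pt_base),
                                 xen_start_info->nr_pt_frames * PAGE_SIZE))
                panic("Xen-built page tables conflict with the E820 map");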
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      04414baa
    • xen: eliminate scalability issues from initial mapping setup · 8f5b0c63
      Juergen Gross authored
      Direct Xen to place the initial P->M table outside of the initial
      mapping, as otherwise the 1G (implementation) / 2G (theoretical)
      restriction on the size of the initial mapping limits the amount
      of memory a domain can be handed initially.
      
      As the initial P->M table is copied rather early during boot to
      domain-private memory and its initial virtual mapping is dropped, the
      easiest way to avoid virtual address conflicts with other addresses
      in the kernel is to use a user address area for the virtual address
      of the initial P->M table. This allows us to just throw away the page
      tables of the initial mapping after the copy without having to care
      about address invalidation.
      
      It should be noted that this patch won't enable a pv-domain to USE
      more than 512 GB of RAM. It just enables it to be started with a
      P->M table covering more memory. This is especially important for
      being able to boot a Dom0 on a system with more than 512 GB memory.
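      
      The 512 GB figure follows from the sizes involved: with 8-byte P->M
      entries and 4 KiB pages, a table that fills the whole 1G initial
      mapping covers 512 GB. A standalone check of that arithmetic:
      
        #include <stdio.h>
        
        int main(void)
        {
                unsigned long long mapping = 1ULL << 30;  /* 1G initial mapping  */
                unsigned long long entries = mapping / 8; /* 8-byte P->M entries */
                unsigned long long ram = entries * 4096;  /* one entry per page  */
        
                printf("max RAM covered: %llu GB\n", ram >> 30); /* prints 512 */
                return 0;
        }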
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      8f5b0c63
  15. 15 Apr 2015, 1 commit
  16. 16 Mar 2015, 2 commits
    • xen/privcmd: improve performance of MMAPBATCH_V2 · 4e8c0c8c
      David Vrabel authored
      Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and the older V1 version) map
      multiple frames at a time rather than one at a time, even when the
      frames are non-consecutive GFNs.
      
      xen_remap_foreign_mfn_array() is added, which maps an array of GFNs
      (instead of a consecutive range of GFNs).
      
      Since per-frame errors are returned in an array, privcmd must set the
      MMAPBATCH_V1 error bits as part of the "report errors" phase, after
      all the frames are mapped.
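      
      A sketch of that "report errors" phase; PRIVCMD_MMAPBATCH_MFN_ERROR
      and PRIVCMD_MMAPBATCH_PAGED_ERROR are the V1 error bits from
      include/uapi/xen/privcmd.h, while err_array/mfns/nr are illustrative
      names:
      
        /* after all frames were mapped: fold per-frame errors into the
         * top bits of each returned frame number, as V1 callers expect */
        for (int i = 0; i < nr; i++) {
                if (err_array[i] == -ENOENT)
                        mfns[i] |= PRIVCMD_MMAPBATCH_PAGED_ERROR;
                else if (err_array[i])
                        mfns[i] |= PRIVCMD_MMAPBATCH_MFN_ERROR;
        }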
      
      Migrate times are significantly improved (when using a PV toolstack
      domain).  For example, for an idle 12 GiB PV guest:
      
              Before     After
        real  0m38.179s  0m26.868s
        user  0m15.096s  0m13.652s
        sys   0m28.988s  0m18.732s
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      4e8c0c8c
    • xen: unify foreign GFN map/unmap for auto-xlated physmap guests · 628c28ee
      David Vrabel authored
      Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and
      unmap foreign GFNs using the same method (updating the physmap).
      Unify the arm and x86 implementations into one common implementation.
      
      Note that on arm and arm64, the correct error code will be returned
      (instead of always -EFAULT) and map/unmap failure warnings are no
      longer printed.  These changes are required if the foreign domain is
      paging (-ENOENT failures are expected and must be propagated up to the
      caller).
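      
      Both implementations boil down to asking the hypervisor to update the
      physmap; a hedged sketch of that shared mechanism
      (XENMEM_add_to_physmap_range is the real interface, the surrounding
      variable names are illustrative):
      
        struct xen_add_to_physmap_range xatp = {
                .domid = DOMID_SELF,
                .foreign_domid = foreign_domid,
                .space = XENMAPSPACE_gmfn_foreign,
                .size = nr,
        };
        
        set_xen_guest_handle(xatp.idxs, foreign_gfns); /* frames to map    */
        set_xen_guest_handle(xatp.gpfns, local_gfns);  /* where they land  */
        set_xen_guest_handle(xatp.errs, errs);         /* per-frame errors */
        
        rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);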
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      628c28ee
  17. 28 Jan 2015, 2 commits
  18. 11 Dec 2014, 1 commit
    • xen: switch to post-init routines in xen mmu.c earlier · cdfa0bad
      Juergen Gross authored
      With the virtually mapped linear p2m list the post-init mmu
      operations must be used for setting up the p2m mappings, as in case
      of CONFIG_FLATMEM the init routines may trigger BUGs.
      
      paging_init() sets up all the infrastructure needed to switch to the
      post-init mmu ops done by xen_post_allocator_init(). With the
      virtually mapped linear p2m list we need some mmu ops during setup of
      this list, so we have to switch to the correct mmu ops as soon as
      possible.
      
      The p2m list is usable from the beginning; only expanding it requires
      the new linear mapping to be established. This is why the call to
      xen_remap_memory() had to be introduced, not because the mmu ops
      require it.
      
      Summing it up: not calling xen_post_allocator_init() directly after
      paging_init() was conceptually wrong from the beginning; it just
      didn't matter until now because no function used between the two
      calls needed any critical mmu ops (e.g. alloc_pte). This has changed,
      so the ordering is corrected.
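      
      A simplified sketch of the corrected ordering (the exact call
      sequence in xen_pagetable_init() may differ):
      
        paging_init();
        xen_post_allocator_init(); /* switch mmu ops immediately ...      */
        xen_remap_memory();        /* ... before anything needs alloc_pte */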
      Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      cdfa0bad
  19. 04 Dec 2014, 4 commits
    • xen: switch to linear virtual mapped sparse p2m list · 054954eb
      Juergen Gross authored
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software-walked tree to access a specific mfn list
      entry, this patch creates a virtual address area for the entire
      possible mfn list, including memory holes. The holes are covered by
      mapping a pre-defined page consisting only of "invalid mfn" entries.
      Access to an mfn entry is possible by just using the virtual base
      address of the mfn list and the pfn as an index into that list. This
      speeds up the (hot) path of determining the mfn of a pfn.
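      
      The effect on the lookup path, sketched (P2M_PER_PAGE and
      P2M_MID_PER_PAGE are the p2m tree fan-out constants; xen_p2m_addr is
      the virtual base of the new list):
      
        /* before: three-level software walk */
        mfn = p2m_top[pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE)]
                     [(pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE]
                     [pfn % P2M_PER_PAGE];
        
        /* after: a single indexed load; holes read as "invalid mfn"
         * because they are backed by the shared pre-defined page */
        mfn = xen_p2m_addr[pfn];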
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64-bit Dom0
      showed the following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      054954eb
    • xen: Hide get_phys_to_machine() to be able to tune common path · 0aad5689
      Juergen Gross authored
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add an inline wrapper __pfn_to_mfn() so that, once
      the switch to a linearly mapped p2m list has been made, calling
      get_phys_to_machine() can be avoided where possible.
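      
      A hedged sketch of such a wrapper once the linear list is in place
      (xen_p2m_addr/xen_p2m_size as introduced by the linear p2m list
      work):
      
        static inline unsigned long __pfn_to_mfn(unsigned long pfn)
        {
                if (pfn < xen_p2m_size)
                        return xen_p2m_addr[pfn];  /* common fast path */
        
                return get_phys_to_machine(pfn);   /* slow path */
        }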
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      0aad5689
    • xen: Delay remapping memory of pv-domain · 1f3ac86b
      Juergen Gross authored
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor-supplied p2m list until a
      virtually mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended arbitrarily at physical memory
      initialization time due to its fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      1f3ac86b
    • xen: use common page allocation function in p2m.c · 7108c9ce
      Juergen Gross authored
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page(). Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine that dynamically selects which
      of these to call based on boot progress. This allows initialization
      steps to be moved around without having to adjust allocation calls.
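      
      A sketch of such a boot-stage-aware helper, close to (but not
      necessarily identical with) the routine the patch introduces:
      
        static void * __ref alloc_p2m_page(void)
        {
                /* before the slab allocator is up, fall back to bootmem */
                if (unlikely(!slab_is_available()))
                        return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
        
                return (void *)__get_free_page(GFP_KERNEL);
        }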
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      7108c9ce
  20. 16 Nov 2014, 1 commit
  21. 04 Nov 2014, 1 commit
  22. 23 Oct 2014, 1 commit
    • x86/xen: delay construction of mfn_list_list · 2c185687
      Juergen Gross authored
      The 3 level p2m tree for the Xen tools is constructed very early at
      boot by calling xen_build_mfn_list_list(). Memory needed for this tree
      is allocated via extend_brk().
      
      As this tree (unlike the kernel-internal p2m tree) is only needed
      for domain save/restore, live migration and crash dump analysis, it
      doesn't matter whether it is constructed very early or just some
      milliseconds later when memory allocation is possible by other means.
      
      This patch moves the call to xen_build_mfn_list_list() to just after
      xen_pagetable_p2m_copy(), simplifying that function, too, as it no
      longer has to bother with two parallel trees. The same applies to
      some other internal functions.
      
      While simplifying the code, make early_can_reuse_p2m_middle() static
      and drop its unused second parameter. p2m_mid_identity_mfn can be
      removed as well; it isn't used either.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      2c185687
  23. 23 Sep 2014, 1 commit
  24. 10 Sep 2014, 1 commit
    • x86/xen: don't copy bogus duplicate entries into kernel page tables · 0b5a5063
      Stefan Bader authored
      When RANDOMIZE_BASE (KASLR) is enabled, or the sum of all loaded
      modules exceeds 512 MiB, loading modules fails with a warning (and
      hence a vmalloc allocation failure) because the PTEs for the
      newly-allocated vmalloc address space are not zero.
      
        WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
                 vmap_page_range_noflush+0x2a1/0x360()
      
      This is caused by xen_setup_kernel_pagetables() copying
      level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
      entries.
      
      Without KASLR, the normal kernel image size only covers the first half
      of level2_kernel_pgt and module space starts after that.
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[  0..255]->kernel
                                                        [256..511]->module
                                [511]->level2_fixmap_pgt[  0..505]->module
      
      This allows 512 MiB of module vmalloc space to be used before having
      to use the corrupted level2_fixmap_pgt entries.
      
      With KASLR enabled, the kernel image uses the full PUD range of 1G and
      module space starts in the level2_fixmap_pgt. So basically:
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
                                [511]->level2_fixmap_pgt[0..505]->module
      
      And now no module vmalloc space can be used without using the corrupt
      level2_fixmap_pgt entries.
      
      Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
      and setting level1_fixmap_pgt as read-only.
      
      A number of comments were also using the wrong L3 offset for
      level2_kernel_pgt. These have been corrected.
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      0b5a5063
  25. 27 May 2014, 1 commit
  26. 15 May 2014, 1 commit
  27. 06 May 2014, 1 commit
  28. 25 Mar 2014, 1 commit
    • Revert "xen: properly account for _PAGE_NUMA during xen pte translations" · 5926f87f
      David Vrabel authored
      This reverts commit a9c8e4be.
      
      PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
      is set and pseudo-physical addresses if _PAGE_PRESENT is clear.
      
      This is because during a domain save/restore (migration) the page
      table entries are "canonicalised" and "uncanonicalised". I.e., MFNs
      are converted to PFNs during domain save so that on a restore the
      page table entries may be rewritten with the new MFNs on the
      destination. This canonicalisation is only done for PTEs that are
      present.
      
      This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
      _PAGE_NUMA) was set but _PAGE_PRESENT was clear.  These PTEs would be
      migrated as-is which would result in unexpected behaviour in the
      destination domain.  Either a) the MFN would be translated to the
      wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
      because the MFN is no longer owned by the domain; or c) the present
      bit would not get set.
      
      Symptoms include "Bad page" reports when munmapping after migrating a
      domain.
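      
      The invariant being restored, sketched after pte_mfn_to_pfn() in
      arch/x86/xen/mmu.c (simplified):
      
        static pteval_t pte_mfn_to_pfn(pteval_t val)
        {
                /* translate only present PTEs; _PAGE_PROTNONE (aka
                 * _PAGE_NUMA) alone must not trigger the MFN->PFN
                 * conversion */
                if (val & _PAGE_PRESENT) {
                        unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
        
                        val = ((pteval_t)mfn_to_pfn(mfn) << PAGE_SHIFT) |
                              (val & ~PTE_PFN_MASK);
                }
                return val;
        }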
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>        [3.12+]
      5926f87f
  29. 14 Mar 2014, 1 commit