1. 13 7月, 2017 1 次提交
  2. 13 6月, 2017 2 次提交
  3. 05 6月, 2017 2 次提交
    • A
      x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Andy Lutomirski 提交于
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3d28ebce
    • A
      x86/mm: Pass flush_tlb_info to flush_tlb_others() etc · a2055abe
      Andy Lutomirski 提交于
      Rather than passing all the contents of flush_tlb_info to
      flush_tlb_others(), pass a pointer to the structure directly. For
      consistency, this also removes the unnecessary cpu parameter from
      uv_flush_tlb_others() to make its signature match the other
      *flush_tlb_others() functions.
      
      This serves two purposes:
      
       - It will dramatically simplify future patches that change struct
         flush_tlb_info, which I'm planning to do.
      
       - struct flush_tlb_info is an adequate description of what to do
         for a local flush, too, so by reusing it we can remove duplicated
         code between local and remove flushes in a future patch.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      [ Fix build warning. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a2055abe
  4. 19 5月, 2017 1 次提交
  5. 11 5月, 2017 1 次提交
  6. 02 5月, 2017 3 次提交
  7. 27 3月, 2017 2 次提交
  8. 16 3月, 2017 1 次提交
    • T
      x86: Remap GDT tables in the fixmap section · 69218e47
      Thomas Garnier 提交于
      Each processor holds a GDT in its per-cpu structure. The sgdt
      instruction gives the base address of the current GDT. This address can
      be used to bypass KASLR memory randomization. With another bug, an
      attacker could target other per-cpu structures or deduce the base of
      the main memory section (PAGE_OFFSET).
      
      This patch relocates the GDT table for each processor inside the
      fixmap section. The space is reserved based on number of supported
      processors.
      
      For consistency, the remapping is done by default on 32 and 64-bit.
      
      Each processor switches to its remapped GDT at the end of
      initialization. For hibernation, the main processor returns with the
      original GDT and switches back to the remapping at completion.
      
      This patch was tested on both architectures. Hibernation and KVM were
      both tested specially for their usage of the GDT.
      
      Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
      recommending changes for Xen support.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69218e47
  9. 02 3月, 2017 1 次提交
  10. 07 2月, 2017 1 次提交
  11. 28 1月, 2017 1 次提交
    • I
      x86/boot/e820: Move asm/e820.h to asm/e820/api.h · 66441bd3
      Ingo Molnar 提交于
      In line with asm/e820/types.h, move the e820 API declarations to
      asm/e820/api.h and update all usage sites.
      
      This is just a mechanical, obviously correct move & replace patch,
      there will be subsequent changes to clean up the code and to make
      better use of the new header organization.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      66441bd3
  12. 14 7月, 2016 1 次提交
    • P
      x86/xen: Audit and remove any unnecessary uses of module.h · 7a2463dc
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace as needed.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7a2463dc
  13. 23 6月, 2016 2 次提交
  14. 24 2月, 2016 1 次提交
  15. 14 12月, 2015 1 次提交
  16. 26 11月, 2015 1 次提交
  17. 28 10月, 2015 1 次提交
  18. 09 9月, 2015 1 次提交
  19. 20 8月, 2015 4 次提交
    • J
      xen: move p2m list if conflicting with e820 map · 70e61199
      Juergen Gross 提交于
      Check whether the hypervisor supplied p2m list is placed at a location
      which is conflicting with the target E820 map. If this is the case
      relocate it to a new area unused up to now and compliant to the E820
      map.
      
      As the p2m list might by huge (up to several GB) and is required to be
      mapped virtually, set up a temporary mapping for the copied list.
      
      For pvh domains just delete the p2m related information from start
      info instead of reserving the p2m memory, as we don't need it at all.
      
      For 32 bit kernels adjust the memblock_reserve() parameters in order
      to cover the page tables only. This requires to memblock_reserve() the
      start_info page on it's own.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      70e61199
    • J
      xen: add explicit memblock_reserve() calls for special pages · 6c2681c8
      Juergen Gross 提交于
      Some special pages containing interfaces to xen are being reserved
      implicitly only today. The memblock_reserve() call to reserve them is
      meant to reserve the p2m list supplied by xen. It is just reserving
      not only the p2m list itself, but some more pages up to the start of
      the xen built page tables.
      
      To be able to move the p2m list to another pfn range, which is needed
      for support of huge RAM, this memblock_reserve() must be split up to
      cover all affected reserved pages explicitly.
      
      The affected pages are:
      - start_info page
      - xenstore ring (might be missing, mfn is 0 in this case)
      - console ring (not for initial domain)
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      6c2681c8
    • J
      xen: check pre-allocated page tables for conflict with memory map · 04414baa
      Juergen Gross 提交于
      Check whether the page tables built by the domain builder are at
      memory addresses which are in conflict with the target memory map.
      If this is the case just panic instead of running into problems
      later.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      04414baa
    • J
      xen: eliminate scalability issues from initial mapping setup · 8f5b0c63
      Juergen Gross 提交于
      Direct Xen to place the initial P->M table outside of the initial
      mapping, as otherwise the 1G (implementation) / 2G (theoretical)
      restriction on the size of the initial mapping limits the amount
      of memory a domain can be handed initially.
      
      As the initial P->M table is copied rather early during boot to
      domain private memory and it's initial virtual mapping is dropped,
      the easiest way to avoid virtual address conflicts with other
      addresses in the kernel is to use a user address area for the
      virtual address of the initial P->M table. This allows us to just
      throw away the page tables of the initial mapping after the copy
      without having to care about address invalidation.
      
      It should be noted that this patch won't enable a pv-domain to USE
      more than 512 GB of RAM. It just enables it to be started with a
      P->M table covering more memory. This is especially important for
      being able to boot a Dom0 on a system with more than 512 GB memory.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Based-on-patch-by: NJan Beulich <jbeulich@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      8f5b0c63
  20. 15 4月, 2015 1 次提交
  21. 16 3月, 2015 2 次提交
    • D
      xen/privcmd: improve performance of MMAPBATCH_V2 · 4e8c0c8c
      David Vrabel 提交于
      Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and older V1 version) map
      multiple frames at a time rather than one at a time, despite the pages
      being non-consecutive GFNs.
      
      xen_remap_foreign_mfn_array() is added which maps an array of GFNs
      (instead of a consecutive range of GFNs).
      
      Since per-frame errors are returned in an array, privcmd must set the
      MMAPBATCH_V1 error bits as part of the "report errors" phase, after
      all the frames are mapped.
      
      Migrate times are significantly improved (when using a PV toolstack
      domain).  For example, for an idle 12 GiB PV guest:
      
              Before     After
        real  0m38.179s  0m26.868s
        user  0m15.096s  0m13.652s
        sys   0m28.988s  0m18.732s
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      4e8c0c8c
    • D
      xen: unify foreign GFN map/unmap for auto-xlated physmap guests · 628c28ee
      David Vrabel 提交于
      Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and
      unmap foreign GFNs using the same method (updating the physmap).
      Unify the two arm and x86 implementations into one commont one.
      
      Note that on arm and arm64, the correct error code will be returned
      (instead of always -EFAULT) and map/unmap failure warnings are no
      longer printed.  These changes are required if the foreign domain is
      paging (-ENOENT failures are expected and must be propagated up to the
      caller).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      628c28ee
  22. 28 1月, 2015 2 次提交
  23. 11 12月, 2014 1 次提交
    • J
      xen: switch to post-init routines in xen mmu.c earlier · cdfa0bad
      Juergen Gross 提交于
      With the virtual mapped linear p2m list the post-init mmu operations
      must be used for setting up the p2m mappings, as in case of
      CONFIG_FLATMEM the init routines may trigger BUGs.
      
      paging_init() sets up all infrastructure needed to switch to the
      post-init mmu ops done by xen_post_allocator_init(). With the virtual
      mapped linear p2m list we need some mmu ops during setup of this list,
      so we have to switch to the correct mmu ops as soon as possible.
      
      The p2m list is usable from the beginning, just expansion requires to
      have established the new linear mapping. So the call of
      xen_remap_memory() had to be introduced, but this is not due to the
      mmu ops requiring this.
      
      Summing it up: calling xen_post_allocator_init() not directly after
      paging_init() was conceptually wrong in the beginning, it just didn't
      matter up to now as no functions used between the two calls needed
      some critical mmu ops (e.g. alloc_pte). This has changed now, so I
      corrected it.
      Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      cdfa0bad
  24. 04 12月, 2014 4 次提交
    • J
      xen: switch to linear virtual mapped sparse p2m list · 054954eb
      Juergen Gross 提交于
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      054954eb
    • J
      xen: Hide get_phys_to_machine() to be able to tune common path · 0aad5689
      Juergen Gross 提交于
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      0aad5689
    • J
      xen: Delay remapping memory of pv-domain · 1f3ac86b
      Juergen Gross 提交于
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      1f3ac86b
    • J
      xen: use common page allocation function in p2m.c · 7108c9ce
      Juergen Gross 提交于
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine selecting the to be called
      allocation routine dynamically based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      7108c9ce
  25. 16 11月, 2014 1 次提交
  26. 04 11月, 2014 1 次提交