1. 02 Nov, 2017 1 commit
  2. 10 Oct, 2017 1 commit
  3. 28 Sep, 2017 3 commits
    • xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping · 0d805ee7
      Authored by Zhenzhong Duan
      When booting a PV guest with large memory (e.g. 240GB), the Xen-provided initial
      mapping overlaps with the kernel module virtual space. When the mapping in this space
      is cleared by xen_cleanhighmap(), in certain cases a 2MB mapping can be left
      behind. This is because Xen initializes a 4MB-aligned mapping but xen_cleanhighmap()
      finishes at a 2MB boundary.
      
      When module loading lands just on top of this 2MB space, we get the warning below:
      
      WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
      Call Trace:
       [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
       [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff8103ca54>] module_alloc+0x64/0x70
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac9a7>] move_module+0x27/0x150
       [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
       [<ffffffff810af0a8>] load_module+0x78/0x640
       [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
       [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
       [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
      
      The 2MB mapping is then cleared, and finally an oops occurs when a page in that space is
      accessed:
      
      BUG: unable to handle kernel paging request at ffff880022600000
      IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
      PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
      Oops: 0002 [#1] SMP
      Call Trace:
       [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
       [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
       [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
       [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
       [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
       [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
       [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
       [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
       [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
       [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
       [<ffffffff81510c10>] do_page_fault+0x140/0x470
       [<ffffffff8150d7d5>] page_fault+0x25/0x30
      
      Fix this by calling xen_cleanhighmap() with a 4MB-aligned end address for the page
      table mapping (see the sketch below). The unnecessary call of xen_cleanhighmap() in
      DEBUG mode is also removed.
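      
      A minimal sketch of the idea, assuming the cleanup site passes an (addr, size)
      range and that PMD_SIZE (2MB on x86-64) is available, so PMD_SIZE * 2 is the
      4MB alignment Xen uses; this is an illustration, not the exact upstream hunk:
      
       /* Sketch: round the end of the range up to Xen's 4MB alignment so the
        * trailing 2MB mapping of the Xen-provided initial mapping is torn
        * down as well instead of being left behind. */
       xen_cleanhighmap(addr, roundup(addr + size, PMD_SIZE * 2));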
      
      -v2: add comment about XEN alignment from Juergen.
      
      References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      
      [boris: added 'xen/mmu' tag to commit subject]
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      0d805ee7
    • x86/xen: Add unwind hint annotations · abbe1cac
      Authored by Josh Poimboeuf
      Add unwind hint annotations to the xen head code so the ORC unwinder can
      read head_64.o.
      
      hypercall_page needs empty annotations at 32-byte intervals to match the
      'xen_hypercall_*' ELF functions at those locations.
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/70ed2eb516fe9266be766d953f93c2571bca88cc.1505764066.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      abbe1cac
    • x86/xen: Fix xen head ELF annotations · 2582d3df
      Authored by Josh Poimboeuf
      Mark the ends of the startup_xen and hypercall_page code sections.
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3a80a394d30af43d9cefa1a29628c45ed8420c97.1505764066.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2582d3df
  4. 16 Sep, 2017 1 commit
    • xen: x86: mark xen_find_pt_base as __init · 51ae2538
      Authored by Arnd Bergmann
      gcc-4.6 causes a harmless link-time warning:
      
      WARNING: vmlinux.o(.text.unlikely+0x48e): Section mismatch in reference from the function xen_find_pt_base() to the function .init.text:m2p()
      The function xen_find_pt_base() references
      the function __init m2p().
      This is often because xen_find_pt_base lacks a __init
      annotation or the annotation of m2p is wrong.
      
      Newer compilers inline this function, so it never shows up, but marking
      it __init is the right way to avoid the warning.
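      
      For illustration only, the change amounts to adding the annotation to the
      function definition; the signature shown here is assumed from context and may
      differ slightly in mmu_pv.c:
      
       /* Sketch: the fix is the added __init; the signature is an assumption. */
       static phys_addr_t __init xen_find_pt_base(pgd_t *pgd);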
      
      Fixes: 70e61199 ("xen: move p2m list if conflicting with e820 map")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      51ae2538
  5. 13 Sep, 2017 1 commit
  6. 01 Sep, 2017 1 commit
  7. 31 Aug, 2017 3 commits
  8. 29 Aug, 2017 2 commits
  9. 24 Aug, 2017 1 commit
    • x86/paravirt/xen: Remove xen_patch() · edcb5cf8
      Authored by Juergen Gross
      Xen's paravirt patch function xen_patch() does some special casing for
      irq_ops functions to apply relocations when those functions can be
      patched inline instead of being called.
      
      Unfortunately none of the special case function replacements is small
      enough to be patched inline, so the special case never applies.
      
      As xen_patch() calls paravirt_patch_default() in all cases, it can simply
      be dropped. xen-asm.h doesn't seem necessary without xen_patch(),
      as the only thing left in it would be the definition of XEN_EFLAGS_NMI,
      which is used only once. So move that definition and remove xen-asm.h.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: lguest@lists.ozlabs.org
      Cc: rusty@rustcorp.com.au
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      edcb5cf8
  10. 15 Aug, 2017 1 commit
  11. 11 Aug, 2017 2 commits
  12. 10 Aug, 2017 1 commit
  13. 23 Jul, 2017 2 commits
  14. 21 Jul, 2017 2 commits
  15. 18 Jul, 2017 1 commit
    • xen/x86: Remove SME feature in PV guests · f2f931c6
      Authored by Tom Lendacky
      Xen does not currently support SME for PV guests. Clear the SME CPU
      capability in order to avoid any ambiguity.
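      
      As a sketch, "clearing the SME CPU capability" comes down to hiding the feature
      bit during early Xen PV setup; placing it in xen_init_capabilities() is an
      assumption for illustration:
      
       /* Sketch: Xen PV guests cannot use SME, so hide the feature bit from
        * the rest of the kernel (exact call site assumed). */
       setup_clear_cpu_cap(X86_FEATURE_SME);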
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: <xen-devel@lists.xen.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/3b605622a9fae5e588e5a13967120a18ec18071b.1500319216.git.thomas.lendacky@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f2f931c6
  16. 13 Jul, 2017 1 commit
  17. 05 Jul, 2017 2 commits
    • x86/mm: Enable CR4.PCIDE on supported systems · 660da7c9
      Authored by Andy Lutomirski
      We can use PCID if the CPU has PCID and PGE and we're not on Xen.
      
      By itself, this has no effect. A followup patch will start using PCID.
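      
      A hedged sketch of the gating logic described above, not the exact upstream
      hunk (on Xen PV the PCID feature bit is assumed to have been cleared earlier,
      so this branch never fires there):
      
       /* Sketch of the gating described in the commit message. */
       if (boot_cpu_has(X86_FEATURE_PCID)) {
               if (boot_cpu_has(X86_FEATURE_PGE)) {
                       /* CPU has PCID and global pages: turn on CR4.PCIDE. */
                       cr4_set_bits(X86_CR4_PCIDE);
               } else {
                       /* Without PGE, don't pretend PCID is usable. */
                       setup_clear_cpu_cap(X86_FEATURE_PCID);
               }
       }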
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      660da7c9
    • x86/mm: Rework lazy TLB mode and TLB freshness tracking · 94b1b03b
      Authored by Andy Lutomirski
      x86's lazy TLB mode used to be fairly weak -- it would switch to
      init_mm the first time it tried to flush a lazy TLB.  This meant an
      unnecessary CR3 write and, if the flush was remote, an unnecessary
      IPI.
      
      Rewrite it entirely.  When we enter lazy mode, we simply remove the
      CPU from mm_cpumask.  This means that we need a way to figure out
      whether we've missed a flush when we switch back out of lazy mode.
      I use the tlb_gen machinery to track whether a context is up to
      date.
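      
      A hedged sketch of that "did we miss a flush while lazy?" check when switching
      back out of lazy mode; the field names follow the tlb_gen machinery but this is
      a simplified illustration, not the literal switch path:
      
       /* Sketch: compare the generation this CPU last flushed to with the mm's
        * current generation; names assumed, simplified from the real code. */
       u64 mm_gen  = atomic64_read(&next->context.tlb_gen);
       u64 cpu_gen = this_cpu_read(cpu_tlbstate.ctxs[0].tlb_gen);
      
       if (cpu_gen < mm_gen) {
               /* A flush happened while this CPU was out of mm_cpumask(); catch up. */
               local_flush_tlb();
               this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, mm_gen);
       }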
      
      Note to reviewers: this patch, by itself, looks a bit odd.  I'm
      using an array of length 1 containing (ctx_id, tlb_gen) rather than
      just storing tlb_gen, and making it an array isn't necessary yet.
      I'm doing this because the next few patches add PCID support, and,
      with PCID, we need ctx_id, and the array will end up with a length
      greater than 1.  Making it an array now means that there will be
      less churn and therefore less stress on your eyeballs.
      
      NB: This is dubious but, AFAICT, still correct on Xen and UV.
      xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
      patch changes the way that mm_cpumask() works.  This should be okay,
      since Xen *also* iterates all online CPUs to find all the CPUs it
      needs to twiddle.
      
      The UV tlbflush code is rather dated and should be changed.
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99, so all the
      bookkeeping overhead is gone.
      
      MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
      an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
      This is intended to be a nearly worst-case test.
      
        patched:         13.4µs
        unpatched:       21.6µs
      
      Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
      nrounds = 100, 256M data
      
        patched:         1.1 seconds or so
        unpatched:       1.9 seconds or so
      
      The speedup on Vitaly's test appears to be because it spends a lot
      of time blocked on mmap_sem, and this patch avoids sending IPIs to
      blocked CPUs.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      94b1b03b
  18. 03 Jul, 2017 1 commit
  19. 30 Jun, 2017 1 commit
  20. 25 Jun, 2017 1 commit
    • xen: allocate page for shared info page from low memory · a5d5f328
      Authored by Juergen Gross
      In an HVM guest the kernel currently allocates the page for mapping the shared
      info structure via extend_brk(). This leads to a performance drop, as the
      underlying EPT entry has to be split up into 4kB entries because the single
      shared info page is located in hypervisor memory.
      
      The issue has been detected by using the libmicro munmap test:
      unmapping 8kB of memory was faster by nearly a factor of two when no
      pv interfaces were active in the HVM guest.
      
      So instead of taking a page from memory which might be mapped via
      large EPT entries, use a page which is already mapped via a 4kB EPT
      entry: we can take a page from the first 1MB of memory, as the video
      memory at 640kB disallows using larger EPT entries.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      a5d5f328
  21. 23 Jun, 2017 2 commits
  22. 20 Jun, 2017 1 commit
  23. 13 Jun, 2017 7 commits
    • xen/vcpu: Handle xen_vcpu_setup() failure at boot · ae039001
      Authored by Ankur Arora
      On PVH and PVHVM, when the VCPUOP_register_vcpu_info hypercall fails
      we limit the number of CPUs to MAX_VIRT_CPUS. However, if this
      failure occurred for a CPU beyond MAX_VIRT_CPUS, we continue
      to function with more than MAX_VIRT_CPUS.
      
      This leads to problems at the next save/restore cycle: more than
      MAX_VIRT_CPUS threads go into stop_machine(), but on coming
      back up there is valid state for only the first MAX_VIRT_CPUS.
      
      This patch pulls the excess CPUs down via cpu_down().
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      ae039001
    • xen/vcpu: Handle xen_vcpu_setup() failure in hotplug · c9b5d98b
      Authored by Ankur Arora
      The hypercall VCPUOP_register_vcpu_info can fail. This failure is
      handled by making per_cpu(xen_vcpu, cpu) point to its shared_info
      slot, and by setting it to NULL for CPUs without one (cpu >= MAX_VIRT_CPUS).
      
      For PVH/PVHVM, this is not enough, because we also need to pull
      these VCPUs out of circulation.
      
      Fix for PVH/PVHVM: on registration failure in the cpuhp prepare
      callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp
      state-machine so it can fail the CPU init.
      
      Fix for PV: the registration happens before smp_init(), so in the
      failure case we clamp setup_max_cpus and limit the number of VCPUs
      that smp_init() will bring up to MAX_VIRT_CPUS.
      This is functionally correct, but the code becomes a bit simpler
      if we get rid of this explicit clamping: for VCPUs that don't have
      a valid xen_vcpu, fail the CPU init in the cpuhp prepare callback
      (xen_cpu_up_prepare_pv()).
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      c9b5d98b
    • xen/pv: Fix OOPS on restore for a PV, !SMP domain · 0e4d5837
      Authored by Ankur Arora
      If CONFIG_SMP is disabled, xen_setup_vcpu_info_placement() is called from
      xen_setup_shared_info(). This is fine as far as boot goes, but it means
      that we also call it in the restore path. This results in an OOPS
      because we assign to pv_mmu_ops.read_cr2 which is __ro_after_init.
      
      Also, though less problematically, this means we call xen_vcpu_setup()
      twice at restore -- once from the vcpu info placement call and the
      second time from xen_vcpu_restore().
      
      Fix by calling xen_setup_vcpu_info_placement() at boot only.
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      0e4d5837
    • xen/pvh*: Support > 32 VCPUs at domain restore · 0b64ffb8
      Authored by Ankur Arora
      When Xen restores a PVHVM or PVH guest, its shared_info only holds
      up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows
      us to set up per-page areas for VCPUs. This means we can boot
      PVH* guests with more than 32 VCPUs. During restore the per-cpu
      structure is allocated freshly by the hypervisor (vcpu_info_mfn is
      set to INVALID_MFN) so that the newly restored guest can make a
      VCPUOP_register_vcpu_info hypercall.
      
      However, we end up triggering this condition in Xen:
      /* Run this command on yourself or on other offline VCPUS. */
       if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
      
      which means we are unable to set up the per-cpu VCPU structures
      for running VCPUs. The Linux PV code path makes this work by
      iterating over cpu_possible in xen_vcpu_restore() with the following
      sequence (sketched in code after the list):
      
       1) is target CPU up (VCPUOP_is_up hypercall?)
       2) if yes, then VCPUOP_down to pause it
       3) VCPUOP_register_vcpu_info
       4) if it was down, then VCPUOP_up to bring it back up
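      
      A hedged sketch of that sequence as it would appear in xen_vcpu_restore(),
      simplified for illustration (error handling and PV-only details omitted):
      
       /* Sketch of the restore dance described above. */
       for_each_possible_cpu(cpu) {
               bool other_cpu = (cpu != smp_processor_id());
               bool is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up,
                                               xen_vcpu_nr(cpu), NULL) > 0;
      
               /* 2) pause a running remote VCPU so registration is allowed */
               if (other_cpu && is_up)
                       HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(cpu), NULL);
      
               /* 3) (re)register the per-cpu vcpu_info area */
               xen_vcpu_setup(cpu);
      
               /* 4) bring the VCPU back up if we paused it */
               if (other_cpu && is_up)
                       HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
       }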
      
      With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use
      hypercalls to bring up vCPUs") this is available for non-PV guests.
      As such, first check whether VCPUOP_is_up is actually possible before
      trying this dance.
      
      As most of this dance code is already done in xen_vcpu_restore(),
      let's make it callable on PV, PVH and PVHVM.
      Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      0b64ffb8
    • xen/vcpu: Simplify xen_vcpu related code · ad73fd59
      Authored by Ankur Arora
      Largely mechanical changes to aid unification of xen_vcpu_restore()
      logic for PV, PVH and PVHVM.
      
      xen_vcpu_setup(): the only change in logic is that clamp_max_cpus()
      is now handled inside the "if (!xen_have_vcpu_info_placement)" block.
      
      xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c.
      
      xen_vcpu_info_reset(): pulls together all the code where xen_vcpu
      is set to default.
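      
      A hedged sketch of what the consolidated helper looks like (close to, but not
      necessarily identical to, the final code; the shared_info slot is only valid
      for CPU numbers below MAX_VIRT_CPUS):
      
       /* Sketch of the consolidated default-pointer helper; details assumed. */
       void xen_vcpu_info_reset(int cpu)
       {
               if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS)
                       per_cpu(xen_vcpu, cpu) =
                               &HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)];
               else
                       /* No shared_info slot for this VCPU. */
                       per_cpu(xen_vcpu, cpu) = NULL;
       }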
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      ad73fd59
    • x86/boot/64: Rename init_level4_pgt and early_level4_pgt · 65ade2f8
      Authored by Kirill A. Shutemov
      With CONFIG_X86_5LEVEL=y, level 4 is no longer the top level of the page tables.
      
      Let's give these variables more generic names: init_top_pgt and
      early_top_pgt.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      65ade2f8
    • x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() · 6c690ee1
      Authored by Andy Lutomirski
      The kernel has several code paths that read CR3.  Most of them assume that
      CR3 contains the PGD's physical address, whereas some of them awkwardly
      use PHYSICAL_PAGE_MASK to mask off low bits.
      
      Add explicit mask macros for CR3 and convert all of the CR3 readers.
      This will keep them from breaking when PCID is enabled.
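      
      A sketch of the helper this introduces: read_cr3_pa() hides the masking so
      callers no longer open-code PHYSICAL_PAGE_MASK. The mask name and exact
      definition are assumed here for illustration:
      
       static inline unsigned long read_cr3_pa(void)
       {
               /* Sketch: strip the PCID and other non-address bits from CR3
                * (CR3_ADDR_MASK is the assumed name of the new mask macro). */
               return __read_cr3() & CR3_ADDR_MASK;
       }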
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6c690ee1
  24. 05 Jun, 2017 1 commit
    • x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Authored by Andy Lutomirski
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relevant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
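      
      As a hedged illustration of how small leave_mm() becomes under this model (a
      sketch, not the literal patch):
      
       /* Sketch of the simplified leave_mm() described above. */
       void leave_mm(int cpu)
       {
               struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
      
               /* Already lazy on init_mm: nothing to switch. */
               if (loaded_mm == &init_mm)
                       return;
      
               switch_mm(NULL, &init_mm, NULL);
       }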
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3d28ebce