1. 20 Mar 2019, 1 commit
2. 21 Feb 2019, 1 commit
• KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
Authored by Sean Christopherson
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
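
A minimal sketch of the intended ordering (helper names follow kvm_main.c, but the real generation bookkeeping and locking are elided here):

	static struct kvm_memslots *install_new_memslots(struct kvm *kvm, int as_id,
							 struct kvm_memslots *slots)
	{
		struct kvm_memslots *old = __kvm_memslots(kvm, as_id);
		u64 gen = old->generation + 1;

		/*
		 * Tell the architecture about the new generation *before* it
		 * becomes visible: x86 zaps its MMIO sptes here when the
		 * generation wraps, so no vCPU can pair a stale spte with the
		 * post-wrap generation.
		 */
		kvm_arch_memslots_updated(kvm, gen);

		slots->generation = gen;
		rcu_assign_pointer(kvm->memslots[as_id], slots);
		synchronize_srcu_expedited(&kvm->srcu);

		return old;
	}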
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3. 20 Feb 2019, 3 commits
4. 08 Feb 2019, 1 commit
5. 07 Feb 2019, 1 commit
6. 05 Jan 2019, 1 commit
• mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
Authored by Joel Fernandes (Google)
      Patch series "Add support for fast mremap".
      
This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems.  There is a concern that the
extra 'address' argument that mremap passes to pte_alloc may do
something subtle and architecture-related in the future that could make
the scheme not work.  We also find that there is no point in passing
the 'address' to pte_alloc since it is unused.  This patch therefore
removes the argument tree-wide, resulting in a nice negative diff.
Along the way, it also verifies that the enabled architectures do not do
anything funky with the 'address' argument that would go unnoticed by
the optimization.
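
For illustration, the tree-wide change amounts to dropping the trailing
parameter from the allocation helpers, e.g. for pte_alloc_one (the other
pte_alloc variants change the same way):

	/* before */
	pte_t *pte_alloc_one(struct mm_struct *mm, unsigned long address);

	/* after */
	pte_t *pte_alloc_one(struct mm_struct *mm);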
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
The changes were obtained by applying the following Coccinelle script
(thanks to Julia for answering all Coccinelle questions!).
The following fix-ups were done manually:
* Removal of the address argument from pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7. 21 Dec 2018, 1 commit
8. 20 Dec 2018, 1 commit
• KVM: arm/arm64: Fix unintended stage 2 PMD mappings · 6794ad54
Authored by Christoffer Dall
      There are two things we need to take care of when we create block
      mappings in the stage 2 page tables:
      
        (1) The alignment within a PMD between the host address range and the
        guest IPA range must be the same, since otherwise we end up mapping
        pages with the wrong offset.
      
        (2) The head and tail of a memory slot may not cover a full block
        size, and we have to take care to not map those with block
        descriptors, since we could expose memory to the guest that the host
        did not intend to expose.
      
      So far, we have been taking care of (1), but not (2), and our commentary
      describing (1) was somewhat confusing.
      
This commit attempts to factor out both checks into a common
function; if we don't pass the check, we won't attempt any PMD
mappings, for either hugetlbfs or THP.
      
      Note that we used to only check the alignment for THP, not for
      hugetlbfs, but as far as I can tell the check needs to be applied to
      both scenarios.
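
A rough sketch of such a common check (the function and field names here
are illustrative, not necessarily the ones used in the patch):

	static bool stage2_block_mapping_allowed(struct kvm_memory_slot *memslot,
						 unsigned long hva)
	{
		gpa_t gpa_start = memslot->base_gfn << PAGE_SHIFT;
		hva_t uaddr_start = memslot->userspace_addr;
		hva_t uaddr_end = uaddr_start + (memslot->npages << PAGE_SHIFT);
		hva_t block_start = hva & PMD_MASK;

		/* (1) host VA and guest IPA must share the same offset within a PMD */
		if ((gpa_start & ~PMD_MASK) != (uaddr_start & ~PMD_MASK))
			return false;

		/* (2) the PMD-sized block around hva must lie entirely inside the slot */
		return block_start >= uaddr_start &&
		       block_start + PMD_SIZE <= uaddr_end;
	}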
      
      Cc: Ralph Palutke <ralph.palutke@fau.de>
      Cc: Lukas Braun <koomi@moshbit.net>
Reported-by: Lukas Braun <koomi@moshbit.net>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
9. 18 Dec 2018, 9 commits
10. 03 Oct 2018, 1 commit
11. 01 Oct 2018, 3 commits
12. 28 Sep 2018, 1 commit
13. 07 Sep 2018, 2 commits
14. 13 Aug 2018, 2 commits
15. 09 Jul 2018, 4 commits
16. 21 Jun 2018, 1 commit
• KVM: arm/arm64: add WARN_ON if size is not PAGE_SIZE aligned in unmap_stage2_range · 47a91b72
Authored by Jia He
There is a panic on an ARMv8-A server (QDF2400) under memory pressure tests
(start 20 guests and run memhog in the host).
      
      ---------------------------------begin--------------------------------
      [35380.800950] BUG: Bad page state in process qemu-kvm  pfn:dd0b6
      [35380.805825] page:ffff7fe003742d80 count:-4871 mapcount:-2126053375
      mapping:          (null) index:0x0
      [35380.815024] flags: 0x1fffc00000000000()
      [35380.818845] raw: 1fffc00000000000 0000000000000000 0000000000000000
      ffffecf981470000
      [35380.826569] raw: dead000000000100 dead000000000200 ffff8017c001c000
      0000000000000000
      [35380.834294] page dumped because: nonzero _refcount
      [...]
      --------------------------------end--------------------------------------
      
The root cause might be what was fixed at [1]. But from the KVM point of
view, it would be better if the issue were caught earlier.

If the size is not PAGE_SIZE aligned, unmap_stage2_range might unmap the
wrong (larger or smaller) page range, which is what caused the "BUG: Bad
page state" above.
      
      Let's WARN in that case, so that the issue is obvious.
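
The check itself is a one-liner near the top of unmap_stage2_range()
(sketch; the surrounding function body is unchanged):

	/* catch callers passing a size that is not a multiple of PAGE_SIZE */
	WARN_ON(size & ~PAGE_MASK);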
      
[1] https://lkml.org/lkml/2018/5/3/1042
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: jia.he@hxt-semitech.com
      [maz: tidied up commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
17. 25 Apr 2018, 1 commit
• signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
Authored by Eric W. Biederman
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo is not used on other paths in the function in which
it is declared.

Instances of using memset to initialize siginfo have been replaced with
calls to clear_siginfo for clarity.
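
As an illustration of the pattern (the signal number and fields below are
examples only, not taken from a specific call site):

	siginfo_t info;

	/* zero every byte, including padding and unused union members,
	 * so nothing from the kernel stack leaks to userspace */
	clear_siginfo(&info);
	info.si_signo = SIGTRAP;
	info.si_code  = TRAP_BRKPT;
	info.si_addr  = (void __user *)instruction_pointer(regs);
	force_sig_info(SIGTRAP, &info, current);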
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
18. 19 Mar 2018, 6 commits
• arm/arm64: KVM: Introduce EL2-specific executable mappings · dc2e4633
Authored by Marc Zyngier
      Until now, all EL2 executable mappings were derived from their
      EL1 VA. Since we want to decouple the vectors mapping from
      the rest of the hypervisor, we need to be able to map some
      text somewhere else.
      
      The "idmap" region (for lack of a better name) is ideally suited
      for this, as we have a huge range that hardly has anything in it.
      
      Let's extend the IO allocator to also deal with executable mappings,
      thus providing the required feature.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• arm64: KVM: Introduce EL2 VA randomisation · ed57cac8
Authored by Marc Zyngier
      The main idea behind randomising the EL2 VA is that we usually have
      a few spare bits between the most significant bit of the VA mask
      and the most significant bit of the linear mapping.
      
      Those bits could be a bunch of zeroes, and could be useful
      to move things around a bit. Of course, the more memory you have,
      the less randomisation you get...
      
      Alternatively, these bits could be the result of KASLR, in which
      case they are already random. But it would be nice to have a
      *different* randomization, just to make the job of a potential
      attacker a bit more difficult.
      
      Inserting these random bits is a bit involved. We don't have a spare
      register (short of rewriting all the kern_hyp_va call sites), and
      the immediate we want to insert is too random to be used with the
      ORR instruction. The best option I could come up with is the following
      sequence:
      
      	and x0, x0, #va_mask
      	ror x0, x0, #first_random_bit
      	add x0, x0, #(random & 0xfff)
      	add x0, x0, #(random >> 12), lsl #12
      	ror x0, x0, #(63 - first_random_bit)
      
      making it a fairly long sequence, but one that a decent CPU should
      be able to execute without breaking a sweat. It is of course NOPed
      out on VHE. The last 4 instructions can also be turned into NOPs
if it appears that there are no free bits to use.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: arm/arm64: Move HYP IO VAs to the "idmap" range · e3f019b3
Authored by Marc Zyngier
      We so far mapped our HYP IO (which is essentially the GICv2 control
      registers) using the same method as for memory. It recently appeared
that this is a bit unsafe:
      
      We compute the HYP VA using the kern_hyp_va helper, but that helper
      is only designed to deal with kernel VAs coming from the linear map,
      and not from the vmalloc region... This could in turn cause some bad
      aliasing between the two, amplified by the upcoming VA randomisation.
      
      A solution is to come up with our very own basic VA allocator for
      MMIO. Since half of the HYP address space only contains a single
      page (the idmap), we have plenty to borrow from. Let's use the idmap
      as a base, and allocate downwards from it. GICv2 now lives on the
      other side of the great VA barrier.
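
A rough sketch of such a downward allocator (names here are illustrative;
the real code also takes a lock and checks against the rest of the HYP VA
layout):

	/* hand out HYP VAs for MMIO by walking downwards from the idmap page */
	static unsigned long io_map_base;	/* initialised to the idmap VA */

	static int hyp_alloc_io_va(size_t size, unsigned long *haddr)
	{
		unsigned long base = io_map_base - PAGE_ALIGN(size);

		/* bail out if we wrapped below the bottom of the VA space */
		if (base > io_map_base)
			return -ENOMEM;

		io_map_base = base;
		*haddr = base;
		return 0;
	}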
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: arm64: Fix HYP idmap unmap when using 52bit PA · 3ddd4556
Authored by Marc Zyngier
      Unmapping the idmap range using 52bit PA is quite broken, as we
      don't take into account the right number of PGD entries, and rely
      on PTRS_PER_PGD. The result is that pgd_index() truncates the
address, and we end up in the weeds.
      
      Let's introduce a new unmap_hyp_idmap_range() that knows about this,
      together with a kvm_pgd_index() helper, which hides a bit of the
      complexity of the issue.
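
In sketch form, the idea is to index the HYP/idmap pgd by its own number
of entries rather than the kernel's PTRS_PER_PGD (the real helper's name
and parameters may differ):

	static unsigned long hyp_pgd_index(unsigned long addr, unsigned long ptrs_per_pgd)
	{
		/* mask with the pgd's actual entry count so a 52-bit PA
		 * configuration does not truncate the address */
		return (addr >> PGDIR_SHIFT) & (ptrs_per_pgd - 1);
	}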
      
      Fixes: 98732d1b ("KVM: arm/arm64: fix HYP ID map extension to 52 bits")
Reported-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: arm/arm64: Fix idmap size and alignment · 46fef158
Authored by Marc Zyngier
      Although the idmap section of KVM can only be at most 4kB and
      must be aligned on a 4kB boundary, the rest of the code expects
      it to be page aligned. Things get messy when tearing down the
      HYP page tables when PAGE_SIZE is 64K, and the idmap section isn't
      64K aligned.
      
      Let's fix this by computing aligned boundaries that the HYP code
      will use.
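
In sketch form, the boundaries handed to the HYP teardown code become
page aligned (the expressions are illustrative, not the exact patch):

	/* round the idmap section out to page boundaries so 64K-page
	 * configurations can tear the HYP tables down cleanly */
	hyp_idmap_start = kvm_virt_to_phys(__hyp_idmap_text_start) & PAGE_MASK;
	hyp_idmap_end   = ALIGN(kvm_virt_to_phys(__hyp_idmap_text_end), PAGE_SIZE);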
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: James Morse <james.morse@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: arm/arm64: Keep GICv2 HYP VAs in kvm_vgic_global_state · 1bb32a44
Authored by Marc Zyngier
      As we're about to change the way we map devices at HYP, we need
      to move away from kern_hyp_va on an IO address.
      
      One way of achieving this is to store the VAs in kvm_vgic_global_state,
      and use that directly from the HYP code. This requires a small change
      to create_hyp_io_mappings so that it can also return a HYP VA.
      
      We take this opportunity to nuke the vctrl_base field in the emulated
      distributor, as it is not used anymore.
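
The resulting prototype looks roughly like this (sketch based on the
description above):

	/* map an MMIO region into HYP; return the kernel VA in *kaddr and
	 * the HYP VA in *haddr */
	int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
				   void __iomem **kaddr,
				   void __iomem **haddr);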
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>