1. 11 Oct, 2013 1 commit
  2. 10 Oct, 2013 2 commits
    • xen: Fix possible user space selector corruption · 7cde9b27
      Committed by Frediano Ziglio
      Because of the way the kernel is initialized under Xen, the ring1
      selector used by the kernel for the boot CPU can end up being copied
      to userspace, leading to a segmentation fault in userspace.
      
      The Xen code in the kernel initializes non-boot CPUs with the correct
      selectors (ds and es set to __USER_DS), but the boot CPU keeps the ring1
      selectors passed by Xen. On task context switch (switch_to) we assume
      that ds, es and cs already point to __USER_DS and __KERNEL_CS, so these
      selectors are not changed.
      
      If the processor is an Intel one that supports the sysenter instruction,
      sysenter/sysexit is used, so ds and es are not restored when switching
      back from kernel to userspace. If the selectors point to ring1 instead
      of __USER_DS, the userspace code will crash on its first memory access
      (to be precise, Xen, on the emulated iret used to do sysexit, will detect
      this and set ds and es to zero, which leads to a GPF anyway).
      
      Now if a userspace process calls into the kernel using sysenter and gets
      rescheduled (for me it happens on a specific init calling wait4), the
      ring1 selector can end up in ds and es.
      
      This is quite hard to detect because after a while these selectors are
      fixed (__USER_DS seems sticky).
      
      Bisecting the code, commit 7076aada appears to be the first one that
      has this issue.
      Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    • KVM: nVMX: fix shadow on EPT · d0d538b9
      Committed by Gleb Natapov
      72f85795 broke shadow on EPT. This patch reverts it and fixes PAE
      on nEPT (which the reverted commit fixed) in another way.
      
      Shadow on EPT is now broken because while L1 builds the shadow page table
      for L2 (which is PAE while L2 is in real mode) it never loads L2's
      GUEST_PDPTR[0-3].  They do not need to be loaded because without nested
      virtualization HW does this during guest entry if EPT is disabled,
      but in our case L0 emulates L2's vmentry while EPT is enabled, so we
      cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
      and need to re-read PDPTEs from L2 memory. This is what kvm_set_cr3()
      is doing, but by clearing cache bits during L2 vmentry we drop the values
      that kvm_set_cr3() read from memory.
      
      So why does the same code not work for PAE on nEPT? kvm_set_cr3()
      reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
      vcpu->arch.nested_mmu while the nested guest is running, but
      ept_load_pdptrs() uses vcpu->arch.mmu, which contains incorrect values.
      Fix that by using walk_mmu in ept_(load|save)_pdptrs.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 06 Oct, 2013 1 commit
  4. 05 Oct, 2013 2 commits
  5. 04 Oct, 2013 1 commit
  6. 03 Oct, 2013 1 commit
    • x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning · 29d274b8
      Committed by David Herrmann
      IORESOURCE_BUSY is used to mark temporary driver mem-resources
      instead of global regions. This suppresses warnings if regions
      overlap with a region marked as BUSY.
      
      This was always the case for VESA/VGA/EFI framebuffer regions so
      do the same for simplefb regions. The reason we do this is to
      allow device handover to real GPU drivers like
      i915/radeon/nouveau which get the same regions via PCI BARs.
      
      Maybe at some point we will be able to unregister platform
      devices properly during the handover. In this case the simplefb
      region would get removed before the new region is created.
      However, this is currently not the case and would require rather
      huge changes in remove_conflicting_framebuffers(). Add the BUSY
      marker now and try to eventually rewrite the handover for a next release.
      
      Also see kernel/resource.c for more information:
      
        /*
         * if a resource is "BUSY", it's not a hardware resource
         * but a driver mapping of such a resource; we don't want
         * to warn for those; some drivers legitimately map only
         * partial hardware resources. (example: vesafb)
         */
      
      This suppresses warnings like:
      
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
        Info: mapping multiple BARs. Your kernel is fine.
        Call Trace:
          dump_stack+0x54/0x8d
          warn_slowpath_common+0x7d/0xa0
          warn_slowpath_fmt+0x4c/0x50
          iomem_map_sanity_check+0xac/0xe0
          __ioremap_caller+0x2e3/0x390
          ioremap_wc+0x32/0x40
          i915_driver_load+0x670/0xf50 [i915]
          ...
      Reported-by: Tom Gundersen <teg@jklm.no>
      Tested-by: Tom Gundersen <teg@jklm.no>
      Tested-by: Pavel Roskin <proski@gnu.org>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
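The effect of the BUSY marker described in the simplefb commit above can be illustrated with a small standalone sketch. It is a simplified model, not the kernel's code: `struct region`, `RES_BUSY`, and `mapping_should_warn()` are hypothetical names, the ranges are inclusive, and the page-granularity rounding a real sanity check would perform is left out. A mapping that only partially overlaps a region triggers a warning unless that region is flagged busy.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IORESOURCE_BUSY flag. */
#define RES_BUSY 0x1u

/* Hypothetical, simplified resource: inclusive [start, end] plus flags. */
struct region {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

/*
 * Decide whether mapping [start, end] against region r deserves a warning.
 * Assumption: only partial overlaps are suspicious, and regions marked
 * busy are driver mappings that are exempt from the warning.
 */
static bool mapping_should_warn(const struct region *r,
                                unsigned long start, unsigned long end)
{
    if (end < r->start || start > r->end)
        return false;               /* no overlap at all */
    if (r->start <= start && r->end >= end)
        return false;               /* mapping lies fully inside the region */
    if (r->flags & RES_BUSY)
        return false;               /* busy regions never warn */
    return true;                    /* partial overlap with a non-busy region */
}
```

With this model, a GPU driver mapping a range that straddles a non-busy framebuffer region would warn, while the same mapping against a region flagged `RES_BUSY` would not.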
  7. 02 Oct, 2013 1 commit
  8. 28 Sep, 2013 1 commit
  9. 27 Sep, 2013 1 commit
  10. 25 Sep, 2013 4 commits
    • xen/p2m: check MFN is in range before using the m2p table · 0160676b
      Committed by David Vrabel
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that it cannot look up in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around, so the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault on a userspace address) tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existent parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p,
      mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
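The range check that the xen/p2m commit above describes can be sketched in isolation. This is only an illustration of the idea (check the index against the mapped portion before touching the table); `struct m2p_view`, `INVALID_PFN`, and `lookup_pfn_checked()` are hypothetical names and are not the kernel's actual definitions.

```c
/* Hypothetical sentinel returned when the MFN cannot be looked up. */
#define INVALID_PFN (~0UL)

/* Hypothetical view of the mapped portion of a machine-to-physical table. */
struct m2p_view {
    const unsigned long *table;   /* entries indexed by MFN */
    unsigned long nr_entries;     /* number of entries actually mapped */
};

/*
 * Look up the PFN for an MFN, refusing to index past the mapped portion.
 * Without the bounds check, an out-of-range MFN would read beyond the
 * mapping, which is the kind of access the commit message describes as
 * wrapping around and faulting.
 */
static unsigned long lookup_pfn_checked(const struct m2p_view *m2p,
                                        unsigned long mfn)
{
    if (mfn >= m2p->nr_entries)
        return INVALID_PFN;       /* caller falls back, e.g. to overrides */
    return m2p->table[mfn];
}
```

A caller that receives `INVALID_PFN` would then consult a separate override structure, in line with the commit's note that out-of-range MFNs must be foreign.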
    • KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined · bcd1c294
      Committed by Gleb Natapov
      Bit 12 is undefined in any of the following cases:
      - If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
        VM-execution control is 0.
      - If the VM exit sets the valid bit in the IDT-vectoring information field
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      [Add parentheses around & within && - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
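The two conditions listed in the commit above can be written as a small predicate. The sketch below is standalone and uses hypothetical names (`struct exit_ctx`, `bit12_is_defined()`, `nmi_unblocked_by_iret()`); it does not reproduce the KVM code, only the rule that bit 12 of the exit qualification is consulted when neither listed case applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the exit qualification. */
#define EXIT_QUAL_BIT12 (1ULL << 12)

/* Hypothetical snapshot of the state relevant to the check. */
struct exit_ctx {
    bool nmi_exiting;          /* "NMI exiting" VM-execution control */
    bool virtual_nmis;         /* "virtual NMIs" VM-execution control */
    bool idt_vectoring_valid;  /* valid bit of IDT-vectoring information */
    uint64_t exit_qualification;
};

/* Bit 12 is undefined in either of the two cases listed in the commit. */
static bool bit12_is_defined(const struct exit_ctx *c)
{
    if (c->nmi_exiting && !c->virtual_nmis)
        return false;
    if (c->idt_vectoring_valid)
        return false;
    return true;
}

/* Only trust bit 12 when it is defined; otherwise treat it as clear. */
static bool nmi_unblocked_by_iret(const struct exit_ctx *c)
{
    return bit12_is_defined(c) &&
           (c->exit_qualification & EXIT_QUAL_BIT12) != 0;
}
```

Note the parentheses around the `&` expression inside the `&&`, which matches the bracketed remark in the commit trailer.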
    • x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround · 7a20c2fa
      Committed by Dave Jones
      This seems to have been copied from the Optiplex 990 entry
      above, but someone forgot to change the ident text.
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • xen: Do not enable spinlocks before jump_label_init() has executed · a945928e
      Committed by Konrad Rzeszutek Wilk
      xen_init_spinlocks() currently calls static_key_slow_inc() before
      jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
      the case) the effect of this static_key_slow_inc() is deferred until after
      jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
      which case the key is set immediately. Thus, depending on the value of the
      config option, we may observe different behavior.
      
      In addition, when we come to __jump_label_transform() from jump_label_init(),
      the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
      ideal_nop is not the same as default_nop this will cause a BUG() since it is
      expected that before a key is enabled the latter is replaced by the former
      during initialization.
      
      To address this problem we need to move
      static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
      after jump_label_init(). We also need to make sure that this is done before
      other cpus start to boot. early_initcall appears to be a good place to do so.
      (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
      need to be set before alternative_instructions() runs.)
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v2: Added extra comments in the code]
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
  11. 23 Sep, 2013 2 commits
  12. 20 Sep, 2013 2 commits
  13. 18 Sep, 2013 2 commits
    • x86, efi: Don't map Boot Services on i386 · 70087011
      Committed by Josh Boyer
      Add patch to fix 32bit EFI service mapping (rhbz 726701)
      
      Multiple people are reporting hitting the following WARNING on i386,
      
        WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
        Modules linked in:
        Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
        Call Trace:
         [<c102b6af>] warn_slowpath_common+0x5f/0x80
         [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
         [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
         [<c102b6ed>] warn_slowpath_null+0x1d/0x20
         [<c1023fb3>] __ioremap_caller+0x3d3/0x440
         [<c106007b>] ? get_usage_chars+0xfb/0x110
         [<c102d937>] ? vprintk_emit+0x147/0x480
         [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
         [<c102406a>] ioremap_cache+0x1a/0x20
         [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
         [<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de
         [<c1407984>] start_kernel+0x286/0x2f4
         [<c1407535>] ? repair_env_string+0x51/0x51
         [<c1407362>] i386_start_kernel+0x12c/0x12f
      
      Due to the workaround described in commit 916f676f ("x86, efi: Retain
      boot service code until after switching to virtual mode") EFI Boot
      Service regions are mapped for a period during boot. Unfortunately, with
      the limited size of the i386 direct kernel map it's possible that some
      of the Boot Service regions will not be directly accessible, which
      causes them to be ioremap()'d, triggering the above warning as the
      regions are marked as E820_RAM in the e820 memmap.
      
      There are currently only two situations where we need to map EFI Boot
      Service regions,
      
        1. To work around the firmware bug described in 916f676f
        2. To access the ACPI BGRT image
      
      but since we haven't seen an i386 implementation that requires either,
      this simple fix should suffice for now.
      
      [ Added to changelog - Matt ]
      Reported-by: Bryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie>
      Acked-by: Tom Zanussi <tom.zanussi@intel.com>
      Acked-by: Darren Hart <dvhart@linux.intel.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • KVM: VMX: set "blocked by NMI" flag if EPT violation happens during IRET from NMI · 0be9c7a8
      Committed by Gleb Natapov
      Set the "blocked by NMI" flag if an EPT violation happens during IRET from
      NMI; otherwise the NMI can be called recursively, causing stack corruption.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  14. 17 Sep, 2013 3 commits
  15. 14 Sep, 2013 2 commits
  16. 13 Sep, 2013 4 commits
  17. 12 Sep, 2013 6 commits
  18. 11 Sep, 2013 1 commit
    • shrinker: convert remaining shrinkers to count/scan API · 70534a73
      Committed by Dave Chinner
      Convert the remaining couple of random shrinkers in the tree to the new
      API.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 10 Sep, 2013 3 commits