1. 22 4月, 2016 3 次提交
    • L
      x86, drivers/pnpbios: Replace paravirt_enabled() check with legacy device check · 80dfd83d
      Luis R. Rodriguez 提交于
      Since we are removing paravirt_enabled() replace it with a
      logical equivalent. Even though PNPBIOS is x86 specific we
      add an arch-specific type call, which can be implemented by
      any architecture to show how other legacy attribute devices
      can later be also checked for with other ACPI legacy attribute
      flags.
      
      This implicates the first ACPI 5.2.9.3 IA-PC Boot Architecture
      ACPI_FADT_LEGACY_DEVICES flag device, and shows how to add more.
      
      The reason pnpbios gets a defined structure and as such uses
      a different approach than the RTC legacy quirk is that ACPI
      has a respective RTC flag, while pnpbios does not. We fold
      the pnpbios quirk under ACPI_FADT_LEGACY_DEVICES ACPI flag
      use case, and use a struct of possible devices to enable
      future extensions of this.
      
      As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
      follows:
      
      TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
      +32     +28    +28         +28
      
      That's 4 byte overhead total, the rest is cleared out on init
      as its all __init text.
      
      v2: split out subarch handlng on switch to make it easier
          later to add other subarchs. The 'fall-through' switch
          handling can be confusing and we'll remove it later
          when we add handling for X86_SUBARCH_CE4100.
      v3: document vmlinux size impact as per 0-day, and also
          explain why pnpbios is treated differently than the
          RTC legacy feature.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jgross@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-12-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      80dfd83d
    • L
      x86/init: Use a platform legacy quirk for EBDA · 1330e3bc
      Luis R. Rodriguez 提交于
      This replaces the paravirt_enabled() check with a
      proper x86 legacy platform quirk.
      
      As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
      follows:
      
      TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
      +39     +35    +35         +25
      
      That's a 4 byte total overhead, the rest is all cleared out
      upon init as its all __init text.
      
      v2: document 0-day vmlinux size impact
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jgross@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-7-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1330e3bc
    • L
      x86/rtc: Replace paravirt rtc check with platform legacy quirk · 8d152e7a
      Luis R. Rodriguez 提交于
      We have 4 types of x86 platforms that disable RTC:
      
        * Intel MID
        * Lguest - uses paravirt
        * Xen dom-U - uses paravirt
        * x86 on legacy systems annotated with an ACPI legacy flag
      
      We can consolidate all of these into a platform specific legacy
      quirk set early in boot through i386_start_kernel() and through
      x86_64_start_reservations(). This deals with the RTC quirks which
      we can rely on through the hardware subarch, the ACPI check can
      be dealt with separately.
      
      For Xen things are bit more complex given that the @X86_SUBARCH_XEN
      x86_hardware_subarch is shared on for Xen which uses the PV path for
      both domU and dom0. Since the semantics for differentiating between
      the two are Xen specific we provide a platform helper to help override
      default legacy features -- x86_platform.set_legacy_features(). Use
      of this helper is highly discouraged, its only purpose should be
      to account for the lack of semantics available within your given
      x86_hardware_subarch.
      
      As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
      follows:
      
      TOTAL   TEXT   init.text    x86_early_init_platform_quirks()
      +70     +62    +62          +43
      
      Only 8 bytes overhead total, as the main increase in size is
      all removed via __init.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8d152e7a
  2. 04 12月, 2015 1 次提交
    • K
      x86/mm: Fix regression with huge pages on PAE · 70f15287
      Kirill A. Shutemov 提交于
      Recent PAT patchset has caused issue on 32-bit PAE machines:
      
        page:eea45000 count:0 mapcount:-128 mapping:  (null) index:0x0 flags: 0x40000000()
        page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
        ------------[ cut here ]------------
        kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
        invalid opcode: 0000 [#1] SMP
        [...]
        Call Trace:
         unmap_single_vma
         ? __wake_up
         unmap_vmas
         unmap_region
         do_munmap
         vm_munmap
         SyS_munmap
         do_fast_syscall_32
         ? __do_page_fault
         sysenter_past_esp
        Code: ...
        EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98
      
      The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
      helpers use PMD_PAGE_MASK to calculate resulting mask.
      PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
      phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
      result, the upper bits of resulting mask get truncated.
      
      pud_pfn_mask() and pud_flags_mask() aren't problematic since we
      don't have PUD page table level on 32-bit systems, but it's
      reasonable to keep them consistent with PMD counterpart.
      
      Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
      addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
      use them.
      Reported-and-Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      [ Fix -Woverflow warnings from the realmode code. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jürgen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Fixes: f70abb0f ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
      Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      70f15287
  3. 19 11月, 2015 1 次提交
  4. 24 4月, 2015 8 次提交
  5. 12 11月, 2014 1 次提交
  6. 13 12月, 2013 1 次提交
  7. 07 11月, 2013 1 次提交
    • K
      PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() · 0e4ccb15
      Konrad Rzeszutek Wilk 提交于
      Certain platforms do not allow writes in the MSI-X BARs to setup or tear
      down vector values.  To combat against the generic code trying to write to
      that and either silently being ignored or crashing due to the pagetables
      being marked R/O this patch introduces a platform override.
      
      Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
      and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
      and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
      code.
      
      For Xen, which does not allow the guest to write to MSI-X tables - as the
      hypervisor is solely responsible for setting the vector values - we
      implement two nops.
      
      This fixes a Xen guest crash when passing a PCI device with MSI-X to the
      guest.  See the bugzilla for more details.
      
      [bhelgaas: add bugzilla info]
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      0e4ccb15
  8. 29 5月, 2013 1 次提交
    • D
      x86: Increase precision of x86_platform.get/set_wallclock() · 3565184e
      David Vrabel 提交于
      All the virtualized platforms (KVM, lguest and Xen) have persistent
      wallclocks that have more than one second of precision.
      
      read_persistent_wallclock() and update_persistent_wallclock() allow
      for nanosecond precision but their implementation on x86 with
      x86_platform.get/set_wallclock() only allows for one second precision.
      This means guests may see a wallclock time that is off by up to 1
      second.
      
      Make set_wallclock() and get_wallclock() take a struct timespec
      parameter (which allows for nanosecond precision) so KVM and Xen
      guests may start with a more accurate wallclock time and a Xen dom0
      can maintain a more accurate wallclock for guests.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      3565184e
  9. 28 1月, 2013 7 次提交
  10. 18 11月, 2012 1 次提交
  11. 12 9月, 2012 4 次提交
  12. 06 6月, 2012 1 次提交
  13. 25 5月, 2012 1 次提交
  14. 02 5月, 2012 1 次提交
  15. 17 4月, 2012 1 次提交
  16. 20 3月, 2012 1 次提交
    • M
      x86: kvmclock: abstract save/restore sched_clock_state · b74f05d6
      Marcelo Tosatti 提交于
      Upon resume from hibernation, CPU 0's hvclock area contains the old
      values for system_time and tsc_timestamp. It is necessary for the
      hypervisor to update these values with uptodate ones before the CPU uses
      them.
      
      Abstract TSC's save/restore sched_clock_state functions and use
      restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
      
      Also move restore_sched_clock_state before __restore_processor_state,
      since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
      Thanks to Igor Mammedov for tracking it down.
      
      Fixes suspend-to-disk with kvmclock.
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      b74f05d6
  17. 05 3月, 2012 1 次提交
    • I
      x86: Introduce x86_cpuinit.early_percpu_clock_init hook · df156f90
      Igor Mammedov 提交于
      When kvm guest uses kvmclock, it may hang on vcpu hot-plug.
      This is caused by an overflow in pvclock_get_nsec_offset,
      
          u64 delta = tsc - shadow->tsc_timestamp;
      
      which in turn is caused by an undefined values from percpu
      hv_clock that hasn't been initialized yet.
      Uninitialized clock on being booted cpu is accessed from
         start_secondary
          -> smp_callin
            ->  smp_store_cpu_info
              -> identify_secondary_cpu
                -> mtrr_ap_init
                  -> mtrr_restore
                    -> stop_machine_from_inactive_cpu
                      -> queue_stop_cpus_work
                        ...
                          -> sched_clock
                            -> kvm_clock_read
      which is well before x86_cpuinit.setup_percpu_clockev call in
      start_secondary, where percpu clock is initialized.
      
      This patch introduces a hook that allows to setup/initialize
      per_cpu clock early and avoid overflow due to reading
        - undefined values
        - old values if cpu was offlined and then onlined again
      
      Another possible early user of this clock source is ftrace that
      accesses it to get timestamps for ring buffer entries. So if
      mtrr_ap_init is moved from identify_secondary_cpu to past
      x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace
      may cause the same overflow/hang on cpu hot-plug anyway.
      
      More complete description of the problem:
        https://lkml.org/lkml/2012/2/2/101
      
      Credits to Marcelo Tosatti <mtosatti@redhat.com> for hook idea.
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      df156f90
  18. 07 1月, 2012 1 次提交
  19. 06 12月, 2011 1 次提交
  20. 10 11月, 2011 2 次提交
  21. 13 5月, 2011 1 次提交
    • S
      x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Stefano Stabellini 提交于
      Introduce a new x86_init hook called pagetable_reserve that at the end
      of init_memory_mapping is used to reserve a range of memory addresses for
      the kernel pagetable pages we used and free the other ones.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native it just means calling memblock_x86_reserve_range, on Xen it
      also means marking RW the pagetable pages that we allocated before but
      that haven't been used before.
      
      Another way to fix this is without using the hook is by adding a 'if
      (xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
      counterpart, but that is just nasty.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      279b706b