1. 29 3月, 2012 1 次提交
  2. 11 3月, 2012 1 次提交
    • K
      xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. · 73c154c6
      Konrad Rzeszutek Wilk 提交于
      For the hypervisor to take advantage of the MWAIT support it needs
      to extract from the ACPI _CST the register address. But the
      hypervisor does not have the support to parse DSDT so it relies on
      the initial domain (dom0) to parse the ACPI Power Management information
      and push it up to the hypervisor. The pushing of the data is done
      by the processor_harveset_xen module which parses the information that
      the ACPI parser has graciously exposed in 'struct acpi_processor'.
      
      For the ACPI parser to also expose the Cx states for MWAIT, we need
      to expose the MWAIT capability (leaf 1). Furthermore we also need to
      expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
      function.
      
      The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
      operations, but it can't do it since it needs to be backwards compatible.
      Instead we choose to use the native CPUID to figure out if the MWAIT
      capability exists and use the XEN_SET_PDC query hypercall to figure out
      if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
      
      Note: The XEN_SET_PDC query was implemented in c/s 23783:
      "ACPI: add _PDC input override mechanism".
      
      With this in place, instead of
       C3 ACPI IOPORT 415
      we get now
       C3:ACPI FFH INTEL MWAIT 0x20
      
      Note: The cpu_idle which would be calling the mwait variants for idling
      never gets set b/c we set the default pm_idle to be the hypercall variant.
      Acked-by: NJan Beulich <JBeulich@suse.com>
      [v2: Fix missing header file include and #ifdef]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73c154c6
  3. 20 2月, 2012 2 次提交
    • K
      xen/pat: Disable PAT support for now. · 8eaffa67
      Konrad Rzeszutek Wilk 提交于
      [Pls also look at https://lkml.org/lkml/2012/2/10/228]
      
      Using of PAT to change pages from WB to WC works quite nicely.
      Changing it back to WB - not so much. The crux of the matter is
      that the code that does this (__page_change_att_set_clr) has only
      limited information so when it tries to the change it gets
      the "raw" unfiltered information instead of the properly filtered one -
      and the "raw" one tell it that PSE bit is on (while infact it
      is not).  As a result when the PTE is set to be WB from WC, we get
      tons of:
      
      :WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
      :Hardware name: HP xw4400 Workstation
      .. snip..
      :Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
      :Call Trace:
      : [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
      : [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
      : [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
      : [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
      : [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
      : [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
      : [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
      : [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
      : [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
      : [<ffffffff8100a9b2>] ? check_events+0x12/0x20
      : [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
      : [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
      : [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
      : [<ffffffff8112ba04>] shrink_slab+0x154/0x310
      : [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
      : [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
      : [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
      : [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
      : [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
      : [<ffffffff8108fb6c>] kthread+0x8c/0xa0
      
      for every page. The proper fix for this is has been posted
      and is https://lkml.org/lkml/2012/2/10/228
      "x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
      along with a detailed description of the problem and solution.
      
      But since that posting has gone nowhere I am proposing
      this band-aid solution so that at least users don't get
      the page corruption (the pages that are WC don't get changed to WB
      and end up being recycled for filesystem or other things causing
      mysterious crashes).
      
      The negative impact of this patch is that users of WC flag
      (which are InfiniBand, radeon, nouveau drivers) won't be able
      to set that flag - so they are going to see performance degradation.
      But stability is more important here.
      
      Fixes RH BZ# 742032, 787403, and 745574
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8eaffa67
    • K
      xen/setup: Remove redundant filtering of PTE masks. · 416d7214
      Konrad Rzeszutek Wilk 提交于
      commit 7347b408 "xen: Allow
      unprivileged Xen domains to create iomap pages" added a redundant
      line in the early bootup code to filter out the PTE. That
      filtering is already done a bit earlier so this extra processing
      is not required.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      416d7214
  4. 25 1月, 2012 1 次提交
  5. 09 12月, 2011 1 次提交
    • T
      memblock: Kill memblock_init() · fe091c20
      Tejun Heo 提交于
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
  6. 17 11月, 2011 1 次提交
  7. 20 10月, 2011 1 次提交
  8. 17 8月, 2011 1 次提交
    • J
      xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Jan Beulich 提交于
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ccbcdf7c
  9. 05 8月, 2011 1 次提交
  10. 19 7月, 2011 1 次提交
  11. 16 6月, 2011 1 次提交
  12. 06 6月, 2011 1 次提交
  13. 13 5月, 2011 1 次提交
  14. 06 4月, 2011 2 次提交
  15. 26 2月, 2011 2 次提交
  16. 12 2月, 2011 1 次提交
    • I
      xen: annotate functions which only call into __init at start of day · 44b46c3e
      Ian Campbell 提交于
      Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be
      called at resume time as well as at start of day but only reference
      __init functions (extend_brk) at start of day. Hence annotate with
      __ref.
      
          WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference
              from the function xen_hvm_init_shared_info() to the function
              .init.text:extend_brk()
          The function xen_hvm_init_shared_info() references
          the function __init extend_brk().
          This is often because xen_hvm_init_shared_info lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and
      initialises shared_info_page with the result. This happens at start of
      day only.
      
          WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference
              from the function xen_build_mfn_list_list() to the function
              .init.text:extend_brk()
          The function xen_build_mfn_list_list() references
          the function __init extend_brk().
          This is often because xen_build_mfn_list_list lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      (this warning occurs multiple times)
      
      xen_build_mfn_list_list only calls extend_brk() at boot time, while
      building the initial mfn list list
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      44b46c3e
  17. 20 1月, 2011 1 次提交
    • T
      lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo 提交于
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ce802f6
  18. 10 1月, 2011 1 次提交
  19. 07 1月, 2011 1 次提交
  20. 17 12月, 2010 1 次提交
  21. 30 11月, 2010 2 次提交
    • I
      xen: x86/32: perform initial startup on initial_page_table · 805e3f49
      Ian Campbell 提交于
      Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
      (which also starts on initial_page_table) switches to it.  This helps ensure
      that the generic setup paths work on Xen unmodified. In particular
      clone_pgd_range writes directly to the destination pgd entries and is used to
      initialise swapper_pg_dir so we need to ensure that it remains writeable until
      the last possible moment during bring up.
      
      This is complicated slightly by the need to avoid sharing kernel PMD entries
      when running under Xen, therefore the Xen implementation must make a copy of
      the kernel PMD (which is otherwise referred to by both intial_page_table and
      swapper_pg_dir) before switching to swapper_pg_dir.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      805e3f49
    • J
      xen: don't bother to stop other cpus on shutdown/reboot · 31e323cc
      Jeremy Fitzhardinge 提交于
      Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
      no need to do it manually.
      
      In any case it will fail because all the IPI irqs have been pulled
      down by this point, so the cross-CPU calls will simply hang forever.
      
      Until change 76fac077 the function calls
      were not synchronously waited for, so this wasn't apparent.  However after
      that change the calls became synchronous leading to a hang on shutdown
      on multi-VCPU guests.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      Cc: Alok Kataria <akataria@vmware.com>
      31e323cc
  22. 25 11月, 2010 1 次提交
    • I
      xen: x86/32: perform initial startup on initial_page_table · 5b5c1af1
      Ian Campbell 提交于
      Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
      (which also starts on initial_page_table) switches to it.  This helps ensure
      that the generic setup paths work on Xen unmodified. In particular
      clone_pgd_range writes directly to the destination pgd entries and is used to
      initialise swapper_pg_dir so we need to ensure that it remains writeable until
      the last possible moment during bring up.
      
      This is complicated slightly by the need to avoid sharing kernel PMD entries
      when running under Xen, therefore the Xen implementation must make a copy of
      the kernel PMD (which is otherwise referred to by both intial_page_table and
      swapper_pg_dir) before switching to swapper_pg_dir.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5b5c1af1
  23. 23 11月, 2010 1 次提交
    • K
      xen: set IO permission early (before early_cpu_init()) · ec35a69c
      Konrad Rzeszutek Wilk 提交于
      This patch is based off "xen dom0: Set up basic IO permissions for dom0."
      by Juan Quintela <quintela@redhat.com>.
      
      On AMD machines when we boot the kernel as Domain 0 we get this nasty:
      
      mapping kernel into physical memory
      Xen: setup ISA identity maps
      about to get started...
      (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1-101116  x86_64  debug=y  Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff8130271b>]
      (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
      (XEN) rax: 000000008000c068   rbx: ffffffff8186c680   rcx: 0000000000000068
      (XEN) rdx: 0000000000000cf8   rsi: 000000000000c000   rdi: 0000000000000000
      (XEN) rbp: ffffffff81801e98   rsp: ffffffff81801e50   r8:  ffffffff81801eac
      (XEN) r9:  ffffffff81801ea8   r10: ffffffff81801eb4   r11: 00000000ffffffff
      (XEN) r12: ffffffff8186c694   r13: ffffffff81801f90   r14: ffffffffffffffff
      (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000221803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801e50:
      
      RIP points to read_pci_config() function.
      
      The issue is that we don't set IO permissions for the Linux kernel early enough.
      
      The call sequence used to be:
      
          xen_start_kernel()
      	x86_init.oem.arch_setup = xen_setup_arch;
              setup_arch:
                 - early_cpu_init
                     - early_init_amd
                        - read_pci_config
                 - x86_init.oem.arch_setup [ xen_arch_setup ]
                     - set IO permissions.
      
      We need to set the IO permissions earlier on, which this patch does.
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ec35a69c
  24. 13 11月, 2010 1 次提交
  25. 28 10月, 2010 1 次提交
  26. 23 10月, 2010 5 次提交
  27. 22 10月, 2010 1 次提交
    • A
      x86, kexec: Make sure to stop all CPUs before exiting the kernel · 76fac077
      Alok Kataria 提交于
      x86 smp_ops now has a new op, stop_other_cpus which takes a parameter
      "wait" this allows the caller to specify if it wants to stop until all
      the cpus have processed the stop IPI.  This is required specifically
      for the kexec case where we should wait for all the cpus to be stopped
      before starting the new kernel.  We now wait for the cpus to stop in
      all cases except for panic/kdump where we expect things to be broken
      and we are doing our best to make things work anyway.
      
      This patch fixes a legitimate regression, which was introduced during
      2.6.30, by commit id 4ef702c1.
      Signed-off-by: NAlok N Kataria <akataria@vmware.com>
      LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: <stable@kernel.org> v2.6.30-36
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      76fac077
  28. 18 10月, 2010 1 次提交
  29. 12 10月, 2010 1 次提交
  30. 05 8月, 2010 3 次提交