1. 20 Jul, 2012 7 commits
  2. 01 Jun, 2012 1 commit
    • xen/setup: filter APERFMPERF cpuid feature out · 5e626254
      Andre Przywara committed
      Xen PV kernels allow access to the APERF/MPERF registers to read the
      effective frequency. Access to the MSRs is however redirected to the
      currently scheduled physical CPU, making consecutive read and
      compares unreliable. In addition each rdmsr traps into the hypervisor.
      So to avoid bogus readouts and expensive traps, disable the kernel
      internal feature flag for APERF/MPERF if running under Xen.
      This will
      a) remove the aperfmperf flag from /proc/cpuinfo
      b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
         use the feature to improve scheduling (by default disabled)
      c) not mislead the cpufreq driver to use the MSRs
      
      This does not cover userland programs which access the MSRs via the
      device file interface, but this will be addressed separately.
      Signed-off-by: Andre Przywara <andre.przywara@amd.com>
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5e626254
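The filtering described in this commit can be modelled in user space as masking one CPUID bit. The sketch below is only an illustration: `filter_cpuid_ecx` is a hypothetical helper, not the kernel's function, and it assumes the feature is advertised in CPUID leaf 6, ECX bit 0.

```c
#include <stdint.h>

/* CPUID leaf 6 covers thermal and power management.
 * ECX bit 0 advertises the APERF/MPERF MSR pair. */
#define CPUID_THERM_POWER_LEAF 0x6u
#define APERFMPERF_PRESENT     (1u << 0)

/* Hypothetical helper: given a leaf number and the ECX value the
 * hardware reported, return the ECX value the guest kernel should see. */
static uint32_t filter_cpuid_ecx(uint32_t leaf, uint32_t native_ecx)
{
    if (leaf == CPUID_THERM_POWER_LEAF)
        return native_ecx & ~APERFMPERF_PRESENT; /* hide APERF/MPERF */
    return native_ecx;                           /* other leaves untouched */
}
```

With the bit hidden at the CPUID level, every in-kernel consumer that tests the feature flag sees it as absent without needing its own Xen check.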
  3. 31 May, 2012 1 commit
  4. 08 May, 2012 4 commits
  5. 07 May, 2012 1 commit
    • xen/apic: Return the APIC ID (and version) for CPU 0. · 558daa28
      Konrad Rzeszutek Wilk committed
      On x86_64 on AMD machines where the first APIC_ID is not zero, we get:
      
      ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
      BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10
      BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10
      
      which means that when the ACPI processor driver loads and
      tries to parse the _Pxx states it fails to do so, as it
      ends up calling acpi_get_cpuid which does this:
      
      for_each_possible_cpu(i) {
              if (cpu_physical_id(i) == apic_id)
                      return i;
      }
      
      The bootup CPU has not been found, so the lookup fails and returns -1
      for the first CPU. The loop in "acpi_processor_get_info" then
      returns an error, which means that "acpi_processor_add" fails and
      per_cpu(processor) is never set (and is NULL).
      
      That means that when xen-acpi-processor tries to load (much much
      later on) and parse the P-states it gets -ENODEV from
      acpi_processor_register_performance() (which tries to read
      the per_cpu(processor)) and fails to parse the data.
      Reported-by-and-Tested-by: Stefan Bader <stefan.bader@canonical.com>
      Suggested-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
      [v2: Bit-shift APIC ID by 24 bits]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      558daa28
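The failure mode in this commit can be reproduced with a small user-space model of the lookup loop. Everything here is an assumption made for illustration: the table `cpu_to_apicid`, the value `BAD_APICID`, and the helper names are hypothetical. The 24-bit shift mirrors the v2 note and assumes the xAPIC layout, where the ID occupies bits 31:24 of the APIC ID register.

```c
#include <stdint.h>

#define NR_CPUS    4
#define BAD_APICID 0xFFFFu

/* Hypothetical stand-in for the per-CPU physical APIC ID table.
 * Slot 0 (the boot CPU) starts out unset, as in the bug report. */
static uint16_t cpu_to_apicid[NR_CPUS] = { BAD_APICID, 0x11, 0x12, 0x13 };

/* Same shape as the loop quoted in the commit message:
 * return the logical CPU whose physical ID matches, or -1. */
static int lookup_cpu(uint32_t apic_id)
{
    for (int i = 0; i < NR_CPUS; i++) {
        if (cpu_to_apicid[i] == apic_id)
            return i;
    }
    return -1;
}

/* In xAPIC mode the ID lives in bits 31:24 of the APIC ID register. */
static uint32_t apic_id_to_reg(uint32_t id)  { return id << 24; }
static uint32_t apic_reg_to_id(uint32_t reg) { return (reg >> 24) & 0xFFu; }
```

Looking up 0x10 fails until slot 0 is filled with the boot CPU's real APIC ID, which is the situation the patch addresses by returning that ID for CPU 0.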
  6. 02 May, 2012 1 commit
  7. 27 Apr, 2012 1 commit
    • xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded. · df88b2d9
      Konrad Rzeszutek Wilk committed
      There are exactly four users of __monitor and __mwait:
      
       - cstate.c (which allows acpi_processor_ffh_cstate_enter to be called
         when the cpuidle API drivers are used). However, patch
         "cpuidle: replace xen access to x86 pm_idle and default_idle"
         provides a mechanism to disable the cpuidle and use safe_halt.
       - smpboot (which allows mwait_play_dead to be called). However,
         safe_halt is always used so we skip that.
       - intel_idle (same deal as above).
       - acpi_pad.c. This is the one that we do not want to run as we
         will hit the crash below.
      
      Why do we want to expose MWAIT_LEAF in the first place?
      We want it for the xen-acpi-processor driver - which uploads
      C-states to the hypervisor. If MWAIT_LEAF is set, the cstate.c
      sets the proper address in the C-states so that the hypervisor
      can benefit from using the MWAIT functionality. And that is
      the sole reason for using it.
      
      Without this patch, if a module performs mwait or monitor we
      get this:
      
      invalid opcode: 0000 [#1] SMP
      CPU 2
      .. snip..
      Pid: 5036, comm: insmod Tainted: G           O 3.4.0-rc2upstream-dirty #2 Intel Corporation S2600CP/S2600CP
      RIP: e030:[<ffffffffa000a017>]  [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check]
      RSP: e02b:ffff8801c298bf18  EFLAGS: 00010282
      RAX: ffff8801c298a010 RBX: ffffffffa03b2000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff8801c29800d8 RDI: ffff8801ff097200
      RBP: ffff8801c298bf18 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: ffffffffa000a000 R14: 0000005148db7294 R15: 0000000000000003
      FS:  00007fbb364f2700(0000) GS:ffff8801ff08c000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 000000000179f038 CR3: 00000001c9469000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process insmod (pid: 5036, threadinfo ffff8801c298a000, task ffff8801c29cd7e0)
      Stack:
       ffff8801c298bf48 ffffffff81002124 ffffffffa03b2000 00000000000081fd
       000000000178f010 000000000178f030 ffff8801c298bf78 ffffffff810c41e6
       00007fff3fb30db9 00007fff3fb30db9 00000000000081fd 0000000000010000
      Call Trace:
       [<ffffffff81002124>] do_one_initcall+0x124/0x170
       [<ffffffff810c41e6>] sys_init_module+0xc6/0x220
       [<ffffffff815b15b9>] system_call_fastpath+0x16/0x1b
      Code: <0f> 01 c8 31 c0 0f 01 c9 c9 c3 00 00 00 00 00 00 00 00 00 00 00 00
      RIP  [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check]
       RSP <ffff8801c298bf18>
      ---[ end trace 16582fc8a3d1e29a ]---
      Kernel panic - not syncing: Fatal exception
      
      With this module (which is what acpi_pad.c would hit):
      
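      /* Includes assumed; the '#' lines are absent from the commit message. */
      #include <linux/init.h>
      #include <linux/module.h>
      #include <linux/thread_info.h>
      #include <asm/processor.h>	/* __monitor(), __mwait() in 3.4-era kernels */

      MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>");
      MODULE_DESCRIPTION("mwait_check_and_back");
      MODULE_LICENSE("GPL");
      MODULE_VERSION();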
      
      static int __init mwait_check_init(void)
      {
      	__monitor((void *)&current_thread_info()->flags, 0, 0);
      	__mwait(0, 0);
      	return 0;
      }
      static void __exit mwait_check_exit(void)
      {
      }
      module_init(mwait_check_init);
      module_exit(mwait_check_exit);
      Reported-by: Liu, Jinsong <jinsong.liu@intel.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      df88b2d9
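The "Code:" line of the oops above can be read by hand: the bytes `0f 01 c8` encode MONITOR and `0f 01 c9` encode MWAIT, so the faulting instruction (the one in angle brackets) is the MONITOR issued by the test module. The decoder below is a minimal sketch that recognises only these two opcodes; the function name and the byte array are illustrative and not part of the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a mnemonic if the bytes at p start with MONITOR (0f 01 c8)
 * or MWAIT (0f 01 c9); return NULL for anything else. */
static const char *decode_monitor_mwait(const uint8_t *p, size_t len)
{
    if (len < 3 || p[0] != 0x0f || p[1] != 0x01)
        return NULL;
    if (p[2] == 0xc8)
        return "monitor";
    if (p[2] == 0xc9)
        return "mwait";
    return NULL;
}

/* First ten bytes of the "Code:" line from the oops. */
static const uint8_t oops_code[] = {
    0x0f, 0x01, 0xc8,   /* monitor (faulting instruction) */
    0x31, 0xc0,         /* xor %eax,%eax */
    0x0f, 0x01, 0xc9,   /* mwait */
    0xc9,               /* leave */
    0xc3                /* ret */
};
```

Calling the decoder at offset 0 identifies MONITOR, and at offset 5 it identifies MWAIT, matching the two calls in `mwait_check_init`.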
  8. 29 Mar, 2012 1 commit
  9. 11 Mar, 2012 1 commit
    • xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. · 73c154c6
      Konrad Rzeszutek Wilk committed
      For the hypervisor to take advantage of the MWAIT support it needs
      to extract from the ACPI _CST the register address. But the
      hypervisor does not have the support to parse DSDT so it relies on
      the initial domain (dom0) to parse the ACPI Power Management information
      and push it up to the hypervisor. The pushing of the data is done
      by the processor_harveset_xen module which parses the information that
      the ACPI parser has graciously exposed in 'struct acpi_processor'.
      
      For the ACPI parser to also expose the Cx states for MWAIT, we need
      to expose the MWAIT capability (leaf 1). Furthermore we also need to
      expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
      function.
      
      The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
      operations, but it can't do it since it needs to be backwards compatible.
      Instead we choose to use the native CPUID to figure out if the MWAIT
      capability exists and use the XEN_SET_PDC query hypercall to figure out
      if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
      
      Note: The XEN_SET_PDC query was implemented in c/s 23783:
      "ACPI: add _PDC input override mechanism".
      
      With this in place, instead of
       C3 ACPI IOPORT 415
      we get now
       C3:ACPI FFH INTEL MWAIT 0x20
      
      Note: The cpu_idle which would be calling the mwait variants for idling
      never gets set b/c we set the default pm_idle to be the hypercall variant.
      Acked-by: Jan Beulich <JBeulich@suse.com>
      [v2: Fix missing header file include and #ifdef]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73c154c6
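The two-step decision described in this commit (native CPUID first, then the hypervisor's answer) can be sketched as a pure function. This is a simplified model under stated assumptions: the hypercall result is reduced to a boolean, and both helper names are hypothetical. The only hardware fact relied on is that CPUID leaf 1, ECX bit 3 reports MONITOR/MWAIT support.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 reports MONITOR/MWAIT support. */
#define CPUID_LEAF1_ECX_MWAIT (1u << 3)

/* Hypothetical decision helper. native_leaf1_ecx is what the real
 * CPUID instruction returned; hypervisor_ok stands in for the result
 * of the XEN_SET_PDC query mentioned in the commit message. */
static bool should_expose_mwait(uint32_t native_leaf1_ecx, bool hypervisor_ok)
{
    if (!(native_leaf1_ecx & CPUID_LEAF1_ECX_MWAIT))
        return false;       /* hardware lacks MWAIT: nothing to expose */
    return hypervisor_ok;   /* otherwise defer to the hypervisor */
}

/* Compute the leaf 1 ECX value the guest sees, starting from a
 * filtered value in which the MWAIT bit has already been cleared. */
static uint32_t guest_leaf1_ecx(uint32_t filtered_ecx,
                                uint32_t native_leaf1_ecx,
                                bool hypervisor_ok)
{
    if (should_expose_mwait(native_leaf1_ecx, hypervisor_ok))
        return filtered_ecx | CPUID_LEAF1_ECX_MWAIT;
    return filtered_ecx;
}
```

Keeping the decision separate from the masking makes it easy to see that the bit is only ever added back when both the hardware and the hypervisor agree.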
  10. 20 Feb, 2012 2 commits
    • xen/pat: Disable PAT support for now. · 8eaffa67
      Konrad Rzeszutek Wilk committed
      [Pls also look at https://lkml.org/lkml/2012/2/10/228]
      
      Using PAT to change pages from WB to WC works quite nicely.
      Changing them back to WB - not so much. The crux of the matter is
      that the code that does this (__change_page_attr_set_clr) has only
      limited information, so when it tries to make the change it gets
      the "raw" unfiltered information instead of the properly filtered one -
      and the "raw" one tells it that the PSE bit is on (while in fact it
      is not). As a result, when the PTE is set to be WB from WC, we get
      tons of:
      
      :WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
      :Hardware name: HP xw4400 Workstation
      .. snip..
      :Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
      :Call Trace:
      : [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
      : [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
      : [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
      : [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
      : [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
      : [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
      : [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
      : [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
      : [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
      : [<ffffffff8100a9b2>] ? check_events+0x12/0x20
      : [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
      : [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
      : [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
      : [<ffffffff8112ba04>] shrink_slab+0x154/0x310
      : [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
      : [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
      : [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
      : [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
      : [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
      : [<ffffffff8108fb6c>] kthread+0x8c/0xa0
      
      for every page. The proper fix for this has been posted
      at https://lkml.org/lkml/2012/2/10/228
      "x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
      along with a detailed description of the problem and solution.
      
      But since that posting has gone nowhere I am proposing
      this band-aid solution so that at least users don't get
      the page corruption (the pages that are WC don't get changed to WB
      and end up being recycled for filesystem or other things causing
      mysterious crashes).
      
      The negative impact of this patch is that users of WC flag
      (which are InfiniBand, radeon, nouveau drivers) won't be able
      to set that flag - so they are going to see performance degradation.
      But stability is more important here.
      
      Fixes RH BZ# 742032, 787403, and 745574
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8eaffa67
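One way to see how a "raw" entry can appear to have PSE set is that bit 7 of an x86 page-table entry is overloaded: in a page-directory level entry it is the page-size (PSE) bit, while in a 4 KiB PTE the same bit selects the PAT index. The sketch below only illustrates that aliasing; the helper names are hypothetical, and reading the commit's symptom this way is an interpretation, not something the message states outright.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 is PSE ("large page") at the PMD level and PAT in a 4 KiB PTE. */
#define PAGE_BIT7 (1ull << 7)

enum pt_level { LEVEL_PTE_4K, LEVEL_PMD };

/* Level-aware checks: bit 7 is interpreted according to the level. */
static bool entry_is_large_page(uint64_t entry, enum pt_level level)
{
    return level == LEVEL_PMD && (entry & PAGE_BIT7) != 0;
}

static bool pte_has_pat_bit(uint64_t entry, enum pt_level level)
{
    return level == LEVEL_PTE_4K && (entry & PAGE_BIT7) != 0;
}

/* Naive check that ignores the level: a 4 KiB PTE with the PAT bit
 * set is misread as a large page. */
static bool naive_pse_check(uint64_t entry)
{
    return (entry & PAGE_BIT7) != 0;
}
```

A 4 KiB PTE with bit 7 set passes `naive_pse_check` even though `entry_is_large_page` correctly rejects it, which is the kind of misreading that filtered flag accessors are meant to prevent.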
    • xen/setup: Remove redundant filtering of PTE masks. · 416d7214
      Konrad Rzeszutek Wilk committed
      commit 7347b408 "xen: Allow
      unprivileged Xen domains to create iomap pages" added a redundant
      line in the early bootup code to filter out the PTE. That
      filtering is already done a bit earlier so this extra processing
      is not required.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      416d7214
  11. 25 Jan, 2012 1 commit
  12. 09 Dec, 2011 1 commit
    • memblock: Kill memblock_init() · fe091c20
      Tejun Heo committed
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
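The idea of replacing a runtime init function with a static struct initializer can be sketched as follows. The types and names (`struct memmap`, `INIT_REGIONS`, and so on) are simplified, hypothetical stand-ins rather than the kernel's memblock definitions.

```c
#include <stddef.h>

#define INIT_REGIONS 128   /* hypothetical size of the static arrays */

struct region {
    unsigned long base;
    unsigned long size;
};

struct region_type {
    unsigned long cnt;        /* number of regions in use */
    unsigned long max;        /* capacity of the regions array */
    struct region *regions;
};

struct memmap {
    struct region_type memory;
    struct region_type reserved;
};

static struct region memory_init_regions[INIT_REGIONS];
static struct region reserved_init_regions[INIT_REGIONS];

/* Everything an init function would have assigned at runtime is
 * expressed as a designated initializer, so the structure is valid
 * before any code has run. */
static struct memmap memmap = {
    .memory = {
        .cnt = 1,                          /* one empty dummy entry */
        .max = INIT_REGIONS,
        .regions = memory_init_regions,
    },
    .reserved = {
        .cnt = 1,
        .max = INIT_REGIONS,
        .regions = reserved_init_regions,
    },
};
```

Because the addresses of the static arrays are constant expressions, the compiler can lay out the whole structure at build time, and callers no longer depend on an init call having happened first.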
  13. 17 Nov, 2011 1 commit
  14. 20 Oct, 2011 1 commit
  15. 17 Aug, 2011 1 commit
    • xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Jan Beulich committed
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ccbcdf7c
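The difference between the two range checks is easiest to see with a table whose size is not a power of two. The sketch below is a user-space illustration with hypothetical helper names; it does not reproduce the kernel's M2P code.

```c
#include <stdbool.h>

/* Smallest order such that (1 << order) >= nr. */
static unsigned int order_for(unsigned long nr)
{
    unsigned int order = 0;

    while ((1ul << order) < nr)
        order++;
    return order;
}

/* Order-based check: accepts any index below the next power of two. */
static bool in_range_by_order(unsigned long mfn, unsigned int order)
{
    return (mfn >> order) == 0;
}

/* Linear check: accepts only indices that are really in the table. */
static bool in_range_linear(unsigned long mfn, unsigned long nr)
{
    return mfn < nr;
}
```

For a table of 5000 entries the order is 13, so the order-based check accepts every index up to 8191. Index 6000 therefore passes the order-based check although it lies past the end of the table, whereas the linear check rejects it.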
  16. 05 Aug, 2011 1 commit
  17. 19 Jul, 2011 1 commit
  18. 16 Jun, 2011 1 commit
  19. 06 Jun, 2011 1 commit
  20. 13 May, 2011 1 commit
  21. 06 Apr, 2011 2 commits
  22. 26 Feb, 2011 2 commits
  23. 12 Feb, 2011 1 commit
    • xen: annotate functions which only call into __init at start of day · 44b46c3e
      Ian Campbell committed
      Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be
      called at resume time as well as at start of day but only reference
      __init functions (extend_brk) at start of day. Hence annotate with
      __ref.
      
          WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference
              from the function xen_hvm_init_shared_info() to the function
              .init.text:extend_brk()
          The function xen_hvm_init_shared_info() references
          the function __init extend_brk().
          This is often because xen_hvm_init_shared_info lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and
      initialises shared_info_page with the result. This happens at start of
      day only.
      
          WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference
              from the function xen_build_mfn_list_list() to the function
              .init.text:extend_brk()
          The function xen_build_mfn_list_list() references
          the function __init extend_brk().
          This is often because xen_build_mfn_list_list lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      (this warning occurs multiple times)
      
      xen_build_mfn_list_list only calls extend_brk() at boot time, while
      building the initial mfn list list.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      44b46c3e
  24. 20 Jan, 2011 1 commit
    • lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo committed
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2ce802f6
  25. 10 Jan, 2011 1 commit
  26. 07 Jan, 2011 1 commit
  27. 17 Dec, 2010 1 commit
  28. 30 Nov, 2010 1 commit
    • xen: x86/32: perform initial startup on initial_page_table · 805e3f49
      Ian Campbell committed
      Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
      (which also starts on initial_page_table) switches to it.  This helps ensure
      that the generic setup paths work on Xen unmodified. In particular
      clone_pgd_range writes directly to the destination pgd entries and is used to
      initialise swapper_pg_dir so we need to ensure that it remains writeable until
      the last possible moment during bring up.
      
      This is complicated slightly by the need to avoid sharing kernel PMD entries
      when running under Xen, therefore the Xen implementation must make a copy of
      the kernel PMD (which is otherwise referred to by both initial_page_table and
      swapper_pg_dir) before switching to swapper_pg_dir.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      805e3f49