1. 05 8月, 2013 1 次提交
    • J
      x86: Correctly detect hypervisor · 9df56f19
      Jason Wang 提交于
      We try to handle the hypervisor compatibility mode by detecting hypervisor
      through a specific order. This is not robust, since hypervisors may implement
      each others features.
      
      This patch tries to handle this situation by always choosing the last one in the
      CPUID leaves. This is done by letting .detect() return a priority instead of
      true/false and just re-using the CPUID leaf where the signature were found as
      the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
      has the highest priority. Other sophisticated detection method could also be
      implemented on top.
      
      Suggested by H. Peter Anvin and Paolo Bonzini.
      Acked-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Doug Covelli <dcovelli@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9df56f19
  2. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  3. 07 6月, 2013 1 次提交
  4. 08 5月, 2013 2 次提交
    • Z
      xen: mask x2APIC feature in PV · 4ea9b9ac
      Zhenzhong Duan 提交于
      On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below.
      
          SysRq : Show backtrace of all active CPUs
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120
          Call Trace:
           [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160
           [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20
           [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0
           [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50
           [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10
           [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140
           [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140
           [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60
           [<ffffffff811ca996>] proc_reg_write+0x86/0xc0
           [<ffffffff8116dd8e>] vfs_write+0xce/0x190
           [<ffffffff8116e3e5>] sys_write+0x55/0x90
           [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b
      
      That's because apic points to apic_x2apic_cluster or apic_x2apic_phys
      but the basic element like cpumask isn't initialized.
      
      Mask x2APIC feature in pvm to avoid overwrite of apic pointer,
      update commit message per Konrad's suggestion.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Tested-by: NTamon Shiose <tamon.shiose@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4ea9b9ac
    • K
      xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info · d5b17dbf
      Konrad Rzeszutek Wilk 提交于
      As it will point to some data, but not event channel data (the
      shared_info has an array limited to 32).
      
      This means that for PVHVM guests with more than 32 VCPUs without
      the usage of VCPUOP_register_info any interrupts to VCPUs
      larger than 32 would have gone unnoticed during early bootup.
      
      That is OK, as during early bootup, in smp_init we end up calling
      the hotplug mechanism (xen_hvm_cpu_notify) which makes the
      VCPUOP_register_vcpu_info call for all VCPUs and we can receive
      interrupts on VCPUs 33 and further.
      
      This is just a cleanup.
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5b17dbf
  5. 07 5月, 2013 1 次提交
  6. 06 5月, 2013 1 次提交
    • K
      xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268
      Konrad Rzeszutek Wilk 提交于
      If a user did:
      
      	echo 0 > /sys/devices/system/cpu/cpu1/online
      	echo 1 > /sys/devices/system/cpu/cpu1/online
      
      we would (this a build with DEBUG enabled) get to:
      smpboot: ++++++++++++++++++++=_---CPU UP  1
      .. snip..
      smpboot: Stack at about ffff880074c0ff44
      smpboot: CPU1: has booted.
      
      and hang. The RCU mechanism would kick in an try to IPI the CPU1
      but the IPIs (and all other interrupts) would never arrive at the
      CPU1. At first glance at least. A bit digging in the hypervisor
      trace shows that (using xenanalyze):
      
      [vla] d4v1 vec 243 injecting
         0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043163639 --|x d4v1 vmentry cycles 1468
      ]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043164913 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
         0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043165526 --|x d4v1 vmentry cycles 1472
      ]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043166800 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
      
      there is a pending event (subsequent debugging shows it is the IPI
      from the VCPU0 when smpboot.c on VCPU1 has done
      "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
      interrupted with the callback IPI (0xf3 aka 243) which ends up calling
      __xen_evtchn_do_upcall.
      
      The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
      the pending events. And the moment the guest does a 'cli' (that is the
      ffffffff81673254 in the log above) the hypervisor is invoked again to
      inject the IPI (0xf3) to tell the guest it has pending interrupts.
      This repeats itself forever.
      
      The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
      we set each per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
      to register per-CPU  structures (xen_vcpu_setup).
      This is used to allow events for more than 32 VCPUs and for performance
      optimizations reasons.
      
      When the user performs the VCPU hotplug we end up calling the
      the xen_vcpu_setup once more. We make the hypercall which returns
      -EINVAL as it does not allow multiple registration calls (and
      already has re-assigned where the events are being set). We pick
      the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
      However the hypervisor is still setting events in the register
      per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
      
      As such when the events are set by the hypervisor (such as timer one),
      and when we iterate in __xen_evtchn_do_upcall we end up reading stale
      events from the shared_info->vcpu_info[vcpu] instead of the
      per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
      events that the hypervisor has set and the hypervisor keeps on reminding
      us to ack the events which we never do.
      
      The fix is simple. Don't on the second time when xen_vcpu_setup is
      called over-write the per_cpu(xen_vcpu, cpu) if it points to
      per_cpu(xen_vcpu_info).
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7f1fc268
  7. 17 4月, 2013 2 次提交
    • K
      xen/time: Fix kasprintf splat when allocating timer%d IRQ line. · 7918c92a
      Konrad Rzeszutek Wilk 提交于
      When we online the CPU, we get this splat:
      
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      installing Xen timer for CPU 1
      BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
      Call Trace:
       [<ffffffff810c1fea>] __might_sleep+0xda/0x100
       [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff813036eb>] kvasprintf+0x5b/0x90
       [<ffffffff81303758>] kasprintf+0x38/0x40
       [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
       [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
       [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
      
      The solution to that is use kasprintf in the CPU hotplug path
      that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
      and remove the call to in xen_hvm_setup_cpu_clockevents.
      
      Unfortunatly the later is not a good idea as the bootup path
      does not use xen_hvm_cpu_notify so we would end up never allocating
      timer%d interrupt lines when booting. As such add the check for
      atomic() to continue.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7918c92a
    • D
      x86/xen: populate boot_params with EDD data · 96f28bc6
      David Vrabel 提交于
      During early setup of a dom0 kernel, populate boot_params with the
      Enhanced Disk Drive (EDD) and MBR signature data.  This makes
      information on the BIOS boot device available in /sys/firmware/edd/.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      96f28bc6
  8. 12 4月, 2013 1 次提交
  9. 28 2月, 2013 1 次提交
    • K
      xen/pat: Disable PAT using pat_enabled value. · c79c4982
      Konrad Rzeszutek Wilk 提交于
      The git commit 8eaffa67
      (xen/pat: Disable PAT support for now) explains in details why
      we want to disable PAT for right now. However that
      change was not enough and we should have also disabled
      the pat_enabled value. Otherwise we end up with:
      
      mmap-example:3481 map pfn expected mapping type write-back for
      [mem 0x00010000-0x00010fff], got uncached-minus
       ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0()
      mem 0x00010000-0x00010fff], got uncached-minus
      ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774
      untrack_pfn+0xb8/0xd0()
      ...
      Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu
      Call Trace:
       [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0
       [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0
       [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100
       [<ffffffff81157459>] unmap_vmas+0x49/0x90
       [<ffffffff8115f808>] exit_mmap+0x98/0x170
       [<ffffffff810559a4>] mmput+0x64/0x100
       [<ffffffff810560f5>] dup_mm+0x445/0x660
       [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510
       [<ffffffff81057931>] do_fork+0x91/0x350
       [<ffffffff81057c76>] sys_clone+0x16/0x20
       [<ffffffff816ccbf9>] stub_clone+0x69/0x90
       [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f
      ---[ end trace 4918cdd0a4c9fea4 ]---
      
      (a similar message shows up if you end up launching 'mcelog')
      
      The call chain is (as analyzed by Liu, Jinsong):
      do_fork
        --> copy_process
          --> dup_mm
            --> dup_mmap
             	--> copy_page_range
                --> track_pfn_copy
                  --> reserve_pfn_range
                    --> line 624: flags != want_flags
      It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR
      (_PAGE_CACHE_UC_MINUS).
      
      Stefan Bader dug in this deep and found out that:
      "That makes it clearer as this will do
      
      reserve_memtype(...)
      --> pat_x_mtrr_type
        --> mtrr_type_lookup
          --> __mtrr_type_lookup
      
      And that can return -1/0xff in case of MTRR not being enabled/initialized. Which
      is not the case (given there are no messages for it in dmesg). This is not equal
      to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS.
      
      It looks like the problem starts early in reserve_memtype:
      
             	if (!pat_enabled) {
                      /* This is identical to page table setting without PAT */
                      if (new_type) {
                              if (req_type == _PAGE_CACHE_WC)
                                      *new_type = _PAGE_CACHE_UC_MINUS;
                              else
                                     	*new_type = req_type & _PAGE_CACHE_MASK;
                     	}
                      return 0;
              }
      
      This would be what we want, that is clearing the PWT and PCD flags from the
      supported flags - if pat_enabled is disabled."
      
      This patch does that - disabling PAT.
      
      CC: stable@vger.kernel.org # 3.3 and further
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Reported-and-Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reported-and-Tested-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c79c4982
  10. 15 2月, 2013 2 次提交
  11. 24 1月, 2013 1 次提交
  12. 18 12月, 2012 1 次提交
  13. 01 12月, 2012 1 次提交
  14. 29 11月, 2012 1 次提交
  15. 27 11月, 2012 1 次提交
  16. 02 11月, 2012 1 次提交
    • O
      xen PVonHVM: use E820_Reserved area for shared_info · 9d02b43d
      Olaf Hering 提交于
      This is a respin of 00e37bdb
      ("xen PVonHVM: move shared_info to MMIO before kexec").
      
      Currently kexec in a PVonHVM guest fails with a triple fault because the
      new kernel overwrites the shared info page. The exact failure depends on
      the size of the kernel image. This patch moves the pfn from RAM into an
      E820 reserved memory area.
      
      The pfn containing the shared_info is located somewhere in RAM. This will
      cause trouble if the current kernel is doing a kexec boot into a new
      kernel. The new kernel (and its startup code) can not know where the pfn
      is, so it can not reserve the page. The hypervisor will continue to update
      the pfn, and as a result memory corruption occours in the new kernel.
      
      The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
      E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
      starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
      will usually not allocate areas up to FE700000, so FE700000 is expected
      to work also with older toolstacks.
      
      In Xen3 there is no reserved area at a fixed location. If the guest is
      started on such old hosts the shared_info page will be placed in RAM. As
      a result kexec can not be used.
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9d02b43d
  17. 20 10月, 2012 1 次提交
  18. 12 10月, 2012 2 次提交
    • K
      xen/bootup: allow {read|write}_cr8 pvops call. · 1a7bbda5
      Konrad Rzeszutek Wilk 提交于
      We actually do not do anything about it. Just return a default
      value of zero and if the kernel tries to write anything but 0
      we BUG_ON.
      
      This fixes the case when an user tries to suspend the machine
      and it blows up in save_processor_state b/c 'read_cr8' is set
      to NULL and we get:
      
      kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
      invalid opcode: 0000 [#1] SMP
      Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
      RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270
      
      .. snip..
      Call Trace:
       [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
       [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
       [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1a7bbda5
    • K
      xen/bootup: allow read_tscp call for Xen PV guests. · cd0608e7
      Konrad Rzeszutek Wilk 提交于
      The hypervisor will trap it. However without this patch,
      we would crash as the .read_tscp is set to NULL. This patch
      fixes it and sets it to the native_read_tscp call.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cd0608e7
  19. 24 9月, 2012 1 次提交
  20. 20 9月, 2012 1 次提交
    • K
      xen/boot: Disable BIOS SMP MP table search. · bd49940a
      Konrad Rzeszutek Wilk 提交于
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Lets follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (b/c ioremap tries to get page aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP
      bypasses the P2M lookup we would happily set the PTE to 1000461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      CC: stable@vger.kernel.org
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bd49940a
  21. 23 8月, 2012 2 次提交
    • K
      xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. · 3699aad0
      Konrad Rzeszutek Wilk 提交于
      We don't need to return the new PGD - as we do not use it.
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3699aad0
    • K
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk 提交于
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      in the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value which is not
      neccessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so lets just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
  22. 22 8月, 2012 3 次提交
    • K
      xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. · b8b0f559
      Konrad Rzeszutek Wilk 提交于
      There is no need for those functions/variables to be visible. Make them
      static and also fix the compile warnings of this sort:
      
      drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static?
      
      Some of them just require including the header file that
      declares the functions.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b8b0f559
    • K
      xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. · 806c312e
      Konrad Rzeszutek Wilk 提交于
      If a 64-bit hypervisor is booted with a 32-bit initial domain,
      the hypervisor deals with the initial domain as "compat" and
      does some extra adjustments (like pagetables are 4 bytes instead
      of 8). It also adjusts the xen_start_info->pt_base incorrectly.
      
      When booted with a 32-bit hypervisor (32-bit initial domain):
      ..
      (XEN)  Start info:    cf831000->cf83147c
      (XEN)  Page tables:   cf832000->cf8b5000
      ..
      [    0.000000] PT: cf832000 (f832000)
      [    0.000000] Reserving PT: f832000->f8b5000
      
      And with a 64-bit hypervisor:
      (XEN)  Start info:    00000000cf831000->00000000cf8314b4
      (XEN)  Page tables:   00000000cf832000->00000000cf8b6000
      
      [    0.000000] PT: cf834000 (f834000)
      [    0.000000] Reserving PT: f834000->f8b8000
      
      To deal with this, we keep keep track of the highest physical
      address we have reserved via memblock_reserve. If that address
      does not overlap with pt_base, we have a gap which we reserve.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      806c312e
    • K
      xen/x86: Use memblock_reserve for sensitive areas. · 59b29440
      Konrad Rzeszutek Wilk 提交于
      instead of a big memblock_reserve. This way we can be more
      selective in freeing regions (and it also makes it easier
      to understand where is what).
      
      [v1: Move the auto_translate_physmap to proper line]
      [v2: Per Stefano suggestion add more comments]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      59b29440
  23. 17 8月, 2012 1 次提交
  24. 14 9月, 2012 1 次提交
  25. 20 7月, 2012 7 次提交
  26. 08 6月, 2012 1 次提交
  27. 01 6月, 2012 1 次提交
    • A
      xen/setup: filter APERFMPERF cpuid feature out · 5e626254
      Andre Przywara 提交于
      Xen PV kernels allow access to the APERF/MPERF registers to read the
      effective frequency. Access to the MSRs is however redirected to the
      currently scheduled physical CPU, making consecutive read and
      compares unreliable. In addition each rdmsr traps into the hypervisor.
      So to avoid bogus readouts and expensive traps, disable the kernel
      internal feature flag for APERF/MPERF if running under Xen.
      This will
      a) remove the aperfmperf flag from /proc/cpuinfo
      b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
         use the feature to improve scheduling (by default disabled)
      c) not mislead the cpufreq driver to use the MSRs
      
      This does not cover userland programs which access the MSRs via the
      device file interface, but this will be addressed separately.
      Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5e626254