- 14 Apr 2012 (2 commits)
By Will Drewry

This change enables SIGSYS, defines _sigfields._sigsys, and adds x86 (compat) arch support. _sigsys defines fields which allow a signal handler to receive the triggering system call number, the relevant AUDIT_ARCH_* value for that number, and the address of the call site. SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for setup_frame() to be called for it; the goal is to ensure that ucontext_t reflects the machine state from the time of the syscall and not from another signal handler.

The first consumer of SIGSYS would be seccomp filter. In particular, a filter program could specify a new return value, SECCOMP_RET_TRAP, which would result in the system call being denied and the calling thread signaled. This also means that implementing arch-specific support can be made dependent upon HAVE_ARCH_SECCOMP_FILTER.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - added acked-by, rebase
v17: - rebase and reviewed-by addition
v14: - rebase/no changes
v13: - rebase onto 88ebdda6
v12: - reworded changelog (oleg@redhat.com)
v11: - fix dropped words in the change description
     - added fallback copy_siginfo support
     - added __ARCH_SIGSYS define to allow stepped arch support
v10: - first version based on suggestion

Signed-off-by: James Morris <james.l.morris@oracle.com>
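As a rough illustration of where this lands for userspace: a minimal SIGSYS handler sketch, assuming a libc that exposes the si_syscall/si_arch/si_call_addr accessors for the new _sigsys fields (they were not yet in glibc when this was merged):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of a consumer of the new _sigfields._sigsys data.  A
 * SECCOMP_RET_TRAP seccomp filter would deliver SIGSYS here.
 * (fprintf is not async-signal-safe; fine for a demo only.) */
static void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
{
	/* ucontext reflects the machine state at time-of-syscall,
	 * because SIGSYS is in SYNCHRONOUS_MASK and gets setup_frame(). */
	fprintf(stderr, "denied syscall %d (arch 0x%x) at %p\n",
		info->si_syscall, info->si_arch, info->si_call_addr);
	_exit(1);
}

int main(void)
{
	struct sigaction act = {
		.sa_sigaction = sigsys_handler,
		.sa_flags = SA_SIGINFO,
	};

	sigaction(SIGSYS, &act, NULL);
	pause();
	return 0;
}
```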
By Will Drewry

Add syscall_get_arch() to export the current AUDIT_ARCH_* value based on the system call entry path.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

v18: - update comment about x32 tasks
     - rebase to v3.4-rc2
v17: - rebase and reviewed-by
v14: - rebase/no changes
v13: - rebase onto 88ebdda6

Signed-off-by: James Morris <james.l.morris@oracle.com>
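The arch value matters because syscall numbers are only meaningful relative to it, so a seccomp filter is expected to validate it before dispatching on the number. A sketch of that check in classic BPF, using the seccomp_data layout and return values from the (then in-flight) seccomp filter series:

```c
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>

/* BPF fragment: kill the task unless seccomp_data.arch matches the
 * AUDIT_ARCH_* value the filter was written against.  Without this
 * check, an ia32 or x32 task could alias syscall numbers. */
static struct sock_filter arch_check[] = {
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, arch)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```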
- 07 Apr 2012 (1 commit)
By Linus Torvalds

I have a new optimized x86 "strncpy_from_user()" that will use these same helper functions for all the same reasons the name lookup code uses them. This is preparation for that.

This moves them into an architecture-specific header file. It's architecture-specific for two reasons:

- some of the functions are likely to want architecture-specific implementations. Even if the current code happens to be "generic" in the sense that it should work on any little-endian machine, it's likely that the "multiply by a big constant and shift" implementation is less than optimal for an architecture that has a guaranteed fast bit-count instruction, for example.

- I expect that if architectures like sparc want to start playing around with this, we'll need to abstract out a few more details (in particular the actual unaligned accesses). So we're likely to have more architecture-specific stuff if non-x86 architectures start using this.

(and if it turns out that non-x86 architectures don't start using this, then having it in an architecture-specific header is still the right thing to do, of course)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
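For context, the helpers in question are the word-at-a-time zero-byte tricks; a sketch of the core predicate, assuming 64-bit little-endian:

```c
#include <stdint.h>

#define ONEBYTES 0x0101010101010101ull
#define HIGHBITS 0x8080808080808080ull

/* Nonzero iff some byte of v is 0x00: the subtraction borrows through
 * a zero byte and leaves its high bit set, and "& ~v" rejects bytes
 * that had their own high bit set to begin with. */
static inline uint64_t has_zero(uint64_t v)
{
	return (v - ONEBYTES) & ~v & HIGHBITS;
}
```

The index of the first zero byte is then derived from (a masked form of) this value; on 64-bit x86 that is done with a multiply by a spreading constant followed by a shift right by 56, which is the "multiply by a big constant and shift" mentioned above, while architectures with a fast bit-count instruction can extract it directly.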
- 30 Mar 2012 (1 commit)
By Len Brown

The X86_32-only disable_hlt()/enable_hlt() mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT.

This workaround was commented in the code as: "disable hlt during certain critical i/o operations" and "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove."

H. Peter Anvin adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much higher noise levels you got when Linux used HLT, some of these systems failed. They were by far in the minority even back then."

Alan Cox further says: "Also for the Cyrix 5510, which tended to go castors up if a HLT occurred during a DMA cycle, and on a few other boxes HLT during DMA tended to go astray. Do we care? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use."

So, let's finally drop this.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative-instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- 29 Mar 2012 (3 commits)
By David Howells

Delete all instances of asm/system.h, as they should be redundant by this point.

Signed-off-by: David Howells <dhowells@redhat.com>
By David Howells

Move all declarations of free_initmem() to linux/mm.h so that there's only one and it's used by everything.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-c6x-dev@linux-c6x.org
cc: microblaze-uclinux@itee.uq.edu.au
cc: linux-sh@vger.kernel.org
cc: sparclinux@vger.kernel.org
cc: x86@kernel.org
cc: linux-mm@kvack.org
By David Howells

Disintegrate asm/system.h for X86.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org
- 28 Mar 2012 (2 commits)
By Andrzej Pietrasiewicz

Adapt core x86 and IA64 architecture code for the dma_map_ops changes: replace alloc_coherent/free_coherent with the generic alloc/free methods.

Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
[removed swiotlb-related changes and replaced them with wrappers, merged with the IA64 patch to avoid inter-patch dependencies in intel-iommu code]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Luck <tony.luck@intel.com>
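For reference, the shape of the new hooks: a sketch of the affected dma_map_ops members as they looked after the attributes rework (member names from the generic DMA-mapping series; treat the exact signatures as illustrative):

```c
#include <linux/types.h>

struct device;
struct dma_attrs;

/* Sketch: alloc_coherent/free_coherent become alloc/free and carry an
 * extra attributes argument, so one pair of hooks can serve coherent
 * and attribute-qualified allocations alike. */
struct dma_map_ops_sketch {
	void *(*alloc)(struct device *dev, size_t size,
		       dma_addr_t *dma_handle, gfp_t gfp,
		       struct dma_attrs *attrs);
	void  (*free)(struct device *dev, size_t size, void *vaddr,
		      dma_addr_t dma_handle, struct dma_attrs *attrs);
	/* ... map_page/unmap_page, map_sg/unmap_sg, etc. unchanged ... */
};
```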
By Jeremy Fitzhardinge

Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add an io_apic_ops for it to intercept. Do this as an ops structure because there's at least some chance that another paravirtualized environment may want to intercept these.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: jwboyer@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1332385090-18056-2-git-send-email-konrad.wilk@oracle.com
[ Made all the affected code easier on the eyes ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
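A sketch of what such an ops table plausibly looks like (hook names modeled on the merged io_apic_ops; the defaults point at the native register accessors and a Xen dom0 installs hypercall-backed replacements at boot):

```c
/* Sketch of the interception point: every IO-APIC register access in
 * common code goes through these hooks instead of touching the
 * memory-mapped registers directly. */
struct io_apic_ops {
	void         (*init)(void);
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void         (*write)(unsigned int apic, unsigned int reg,
			      unsigned int value);
	void         (*modify)(unsigned int apic, unsigned int reg,
			       unsigned int value);
};

/* Hypothetical setter a paravirt guest would call early in boot: */
void set_io_apic_ops(const struct io_apic_ops *ops);
```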
- 26 Mar 2012 (1 commit)
By Richard Weinberger

Both functions are mostly identical. The differences are:

- x86_32's cpu_idle() makes use of check_pgt_cache(), which is a nop on both x86_32 and x86_64.
- x86_64's cpu_idle() uses enter_idle()/__exit_idle(); on x86_32 these functions are nops.
- In contrast to x86_32, x86_64 calls rcu_idle_enter()/rcu_idle_exit() in the innermost loop, because idle notifications need RCU. Calling these functions in the innermost loop on x86_32 as well does not hurt.

So we can merge both functions.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: paulmck@linux.vnet.ibm.com
Cc: josh@joshtriplett.org
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1332709204-22496-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- 24 Mar 2012 (1 commit)
By Andy Lutomirski

We used to store the wall-to-monotonic offset and the realtime base. It's faster to precompute the monotonic base. This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC. It's much more impressive for CLOCK_MONOTONIC_COARSE.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
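The trade-off is easy to see in data-structure form; a sketch with illustrative field names (the update path, which runs once per timekeeping update, pays one extra addition so that every reader saves it):

```c
/* Before: every CLOCK_MONOTONIC read did
 *     mono = wall_base + wall_to_monotonic;   (plus ns carry handling)
 * After: the timekeeping update precomputes that sum once, so the
 * reader costs the same as CLOCK_REALTIME. */
struct vgtod_sketch {
	u64 wall_time_sec;
	u64 wall_time_snsec;		/* shifted nanoseconds */
	u64 monotonic_time_sec;		/* precomputed: wall + wtm */
	u64 monotonic_time_snsec;
};
```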
- 23 Mar 2012 (2 commits)
By Steffen Persvold

As suggested by Suresh Siddha and Yinghai Lu:

For x2apic pre-enabled systems, the apic driver is already set early through the early_acpi_boot_init()/early_acpi_process_madt()/acpi_parse_madt()/default_acpi_madt_oem_check() path, so apic_id_valid() checking will be sufficient during MADT and SRAT parsing. For non-x2apic pre-enabled systems, all apic ids should be less than 255.

This allows us to substitute the checks in arch/x86/kernel/acpi/boot.c::acpi_parse_x2apic() and arch/x86/mm/srat.c::acpi_numa_x2apic_affinity_init() with apic->apic_id_valid(). In addition we can avoid feigning the x2apic cpu feature in the NumaChip apic code.

The following apic drivers have separate apic_id_valid() functions which will accept x2apic-type IDs: x2apic_phys, x2apic_cluster, x2apic_uv_x, apic_numachip.

Signed-off-by: Steffen Persvold <sp@numascale.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1331925935-13372-1-git-send-email-sp@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
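The split is easy to picture: the default implementation keeps the 8-bit limit, and x2apic-capable drivers lift it. A sketch following the merged helpers:

```c
/* Default APIC drivers: IDs must fit the classic 8-bit space
 * (0xff is reserved as the broadcast ID). */
static int default_apic_id_valid(int apicid)
{
	return apicid < 255;
}

/* x2apic-capable drivers (x2apic_phys, x2apic_cluster, x2apic_uv_x,
 * apic_numachip): any 32-bit ID from MADT/SRAT is acceptable. */
static int x2apic_apic_id_valid(int apicid)
{
	return 1;
}

/* MADT/SRAT parsing then simply asks the installed driver:
 *	if (!apic->apic_id_valid(apic_id))
 *		return;
 */
```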
By Jan Kiszka

Even if the content is always 0, gdb expects us to also return ds, es, fs, and gs while in x86-64 mode. Do this to avoid ugly errors on "info registers".

[jason.wessel@windriver.com: adjust NUMREGBYTES for two new regs]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
- 20 Mar 2012 (2 commits)
By Cong Wang

[swarren@nvidia.com: highmem: fix ARM build break due to __kmap_atomic rename]
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
By Marcelo Tosatti

Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with up-to-date ones before the CPU uses them.

Abstract the TSC save/restore sched_clock_state functions and use restore_state to write to the KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also TSC-based).

Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
- 16 Mar 2012 (1 commit)
By Thomas Gleixner

The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
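The reader side becomes the standard seqcount retry loop; a sketch of the pattern as it appears in the vDSO readers (function and field names illustrative):

```c
/* Lock-free reader: sample the sequence count, read the data, and
 * retry if the writer (running under xtime_lock) raced with us. */
static u64 vdso_read_sketch(const struct vsyscall_gtod_data *gtod)
{
	unsigned seq;
	u64 sec, ns;

	do {
		seq = read_seqcount_begin(&gtod->seq);
		sec = gtod->wall_time_sec;
		ns  = gtod->wall_time_snsec;
	} while (read_seqcount_retry(&gtod->seq, seq));

	return sec * NSEC_PER_SEC + ns;	/* snsec normalization elided */
}
```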
- 15 Mar 2012 (1 commit)
By H. Peter Anvin

Adding struct _sigchld_x32 caused a misalignment cascade in struct siginfo, because union _sifields is located on a 4-byte boundary (8-byte misaligned). Adding new fields that are 8-byte aligned caused the intermediate structures to also be aligned to 8 bytes, thereby adding padding in unexpected places.

Thus, change s64 to compat_s64 here, which makes it "misaligned on paper". In reality these fields *are* actually aligned (there are 3 preceding ints outside the union and 3 inside struct _sigchld_x32), but because of the intervening union and struct it is not possible for gcc to avoid padding without breaking the ABI.

Reported-and-tested-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com
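The trick rests on how compat_s64 is declared; a sketch (the x32 sigchld layout shown is illustrative of the shape, not copied from the header):

```c
/* 64-bit value, but only 4-byte alignment: gcc may not insert padding
 * before it, which is exactly what the i386/x32 siginfo ABI needs. */
typedef long long __attribute__((aligned(4))) compat_s64_sketch;

struct sigchld_x32_sketch {
	int _pid;
	unsigned int _uid;
	int _status;			/* 3 ints: _utime lands 8-byte aligned anyway */
	compat_s64_sketch _utime;	/* a plain s64 here would force padding */
	compat_s64_sketch _stime;
};
```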
- 14 Mar 2012 (1 commit)
By Daniel J Blueman

Move the APIC ID validity check into platform APIC code, so it can be overridden when needed. For NumaChip systems, always trust the MADT, as it's constructed with high APIC IDs.

Behaviour is verified with this on standard x86 systems and on NumaChip systems, and compile-tested with allyesconfig.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Reviewed-by: Steffen Persvold <sp@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1331709454-27966-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 13 Mar 2012 (3 commits)
By Srikar Dronamraju

is_ia32_task() is useful even in !CONFIG_COMPAT cases - utrace will use it, for example. Hence move it to a more generic file: asm/thread_info.h. Also, is_ia32_task() now returns true if CONFIG_X86_32 is defined.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313140303.17134.1401.sendpatchset@srdronam.in.ibm.com
[ Performed minor cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
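After the move, the helper reads roughly like this (a sketch matching the behaviour described above):

```c
/* True for 32-bit x86 kernels, and for 32-bit (ia32) tasks running
 * under a 64-bit kernel with IA32 emulation. */
static inline bool is_ia32_task(void)
{
#ifdef CONFIG_X86_32
	return true;
#endif
#ifdef CONFIG_IA32_EMULATION
	if (current_thread_info()->status & TS_COMPAT)
		return true;
#endif
	return false;
}
```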
By Salman Qazi

When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value is carried over from the previous kernel. The computation of cyc2ns_offset in set_cyc2ns_scale is prone to overflow if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply by *scale, even though there is enough room to store the final answer.

We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor the code to share the calculation with the previous fix in __cycles_2_ns().

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
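The decomposition is exact integer algebra; a sketch (CYC2NS_SCALE_FACTOR is a power-of-two shift in the real code, which is what makes the split cheap):

```c
/* Computes (cyc * scale) >> CYC2NS_SCALE_FACTOR without the 64-bit
 * intermediate product overflowing for very large cyc values: split
 * cyc = quot * 2^F + rem, so the big term multiplies only after the
 * shift has already been cancelled out. */
#define CYC2NS_SCALE_FACTOR 10

static unsigned long long cycles_2_ns_sketch(unsigned long long cyc,
					     unsigned long long scale)
{
	unsigned long long quot = cyc >> CYC2NS_SCALE_FACTOR;
	unsigned long long rem  = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);

	return quot * scale + ((rem * scale) >> CYC2NS_SCALE_FACTOR);
}
```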
By Srikar Dronamraju

There are precedents of the trap number being referred to as trap_nr. However, thread_struct refers to the trap number as trap_no. Change it to trap_nr. Also use an enum instead of left-over literals for trap values. This is pure cleanup; no functional change intended.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com
[ Fixed the math-emu build ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 11 Mar 2012 (1 commit)
By Konrad Rzeszutek Wilk

For the hypervisor to take advantage of MWAIT support, it needs to extract the register address from the ACPI _CST. But the hypervisor does not have the support to parse the DSDT, so it relies on the initial domain (dom0) to parse the ACPI power management information and push it up to the hypervisor. The pushing of the data is done by the processor_harveset_xen module, which parses the information that the ACPI parser has graciously exposed in 'struct acpi_processor'.

For the ACPI parser to also expose the Cx states for MWAIT, we need to expose the MWAIT capability (leaf 1). Furthermore we also need to expose the MWAIT_LEAF capability (leaf 5) for cstate.c to function properly.

The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX operations, but it can't do that since it needs to be backwards compatible. Instead we choose to use the native CPUID to figure out if the MWAIT capability exists, and use the XEN_SET_PDC query hypercall to figure out if the hypervisor wants us to expose the MWAIT_LEAF capability or not.

Note: the XEN_SET_PDC query was implemented in c/s 23783: "ACPI: add _PDC input override mechanism". With this in place, instead of "C3 ACPI IOPORT 415" we now get "C3:ACPI FFH INTEL MWAIT 0x20".

Note: the mwait idle variants never actually get called, because we set the default pm_idle to be the hypercall variant.

Acked-by: Jan Beulich <JBeulich@suse.com>
[v2: fix missing header file include and #ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- 10 Mar 2012 (1 commit)
By Kees Cook

The traps are referred to by their numbers, and it can be difficult to understand them while reading the code without context. This patch adds an enumeration of the trap numbers and replaces the numbers with the correct enum for x86.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20120310000710.GA32667@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
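An excerpt of what the enumeration looks like (the values are the architectural x86 vector numbers; the name set shown here is partial):

```c
/* Excerpt: x86 exception vectors by name instead of bare literals. */
enum {
	X86_TRAP_DE = 0,	/*  0: divide error */
	X86_TRAP_DB,		/*  1: debug */
	X86_TRAP_NMI,		/*  2: non-maskable interrupt */
	X86_TRAP_BP,		/*  3: breakpoint */
	X86_TRAP_UD = 6,	/*  6: invalid opcode */
	X86_TRAP_NM,		/*  7: device not available */
	X86_TRAP_DF,		/*  8: double fault */
	X86_TRAP_GP = 13,	/* 13: general protection fault */
	X86_TRAP_PF,		/* 14: page fault */
	X86_TRAP_MC = 18,	/* 18: machine check */
};
```

Call sites then read as, e.g., do_trap(X86_TRAP_UD, SIGILL, ...) rather than do_trap(6, ...).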
- 08 Mar 2012 (10 commits)
By Gleb Natapov

Print a warning once if the pin control bit is set in the eventsel MSR, since emulation does not support it yet.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Kevin Wolf

Task switches can switch between protected mode and VM86. The current mode must be updated during the task switch emulation so that the new segment selectors are interpreted correctly. In order to let privilege checks succeed, rflags needs to be updated in the vcpu struct, as this causes a CPL update.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Kevin Wolf

Currently, all task switches check privileges against the DPL of the TSS. This is only correct for jmp/call to a TSS. If a task gate is used, the DPL of this task gate is used for the check instead. Exceptions, external interrupts and iret shouldn't perform any check.

[avi: kill kvm-kmod remnants]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Takuya Yoshikawa

Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
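The resulting split, roughly (x86 shown; architectures that don't need it get an empty kvm_arch_memory_slot):

```c
/* Arch-specific pocket inside the generic memslot; on x86 it now owns
 * the large-page accounting that used to live in common code. */
struct kvm_arch_memory_slot {
	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
};

struct kvm_memory_slot {
	gfn_t base_gfn;
	unsigned long npages;
	/* ... other generic members ... */
	struct kvm_arch_memory_slot arch;	/* new */
};
```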
By Zachary Amsden

This allows us to track the original nanosecond and counter values at each phase of TSC writing by the guest. This gets us perfect offset matching for stable-TSC systems, and perfect software-computed TSC matching for machines with unstable TSC.

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Zachary Amsden

During a host suspend, the TSC may go backwards, which KVM interprets as an unstable TSC. Technically, KVM should not be marking the TSC unstable, which causes the TSC clocksource to go bad, but we need to be adjusting the TSC offsets in such a case.

Dealing with this issue is a little tricky, as the only place we can reliably do it is before much of the timekeeping infrastructure is up and running. On top of this, we are not in a KVM thread context, so we may not be able to safely access VCPU fields. Instead, we compute our best known hardware offset at power-up and stash it to be applied to all VCPUs when they actually start running.

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Marcelo Tosatti

Redefine the API to take a parameter indicating whether an adjustment is in host or guest cycles.

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Zachary Amsden

The variable last_host_tsc was removed from upstream code. I am adding it back for two reasons. First, it is unnecessary to use guest TSC computation to conclude information about the host TSC. The guest may set the TSC backwards (this case is handled by the previous patch), but the computation of the guest TSC (and fetching an MSR) is significantly more work and complexity than simply reading the hardware counter. In addition, we don't actually need the guest TSC for any part of the computation; by always recomputing the offset, we can eliminate the need to deal with the current offset and any scaling factors that may apply.

The second reason is that later on, we are going to be using the host TSC value to restore TSC offsets after a host S4 suspend, so we need to be reading the host values, not the guest values, here.

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Zachary Amsden

There are a few improvements that can be made to the TSC offset matching code. First, we don't need to call the 128-bit multiply (especially on a constant number); the code works much more nicely doing the computation in nanosecond units.

Second, the way everything is set up with software TSC rate scaling, we currently have per-cpu rates. Obviously this isn't too desirable to use in practice, but if for some reason we do change the rate of all VCPUs at runtime and then reset the TSCs, we will only want to match offsets for VCPUs running at the same rate.

Finally, for the case where we have an unstable host TSC but rate scaling is being done in hardware, we should call the platform code to compute the TSC offset, so the math is reorganized to recompute the base instead, then transform the base into an offset using the existing API.

[avi: fix 64-bit division on i386]

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix 64-bit division in kvm_write_tsc()
Breaks the i386 build.
Signed-off-by: Avi Kivity <avi@redhat.com>
By Zachary Amsden

This requires some restructuring; rather than using 'virtual_tsc_khz' to indicate whether hardware rate scaling is in effect, we consider each VCPU to always have a virtual TSC rate. Instead, there is new logic above the vendor-specific hardware scaling that decides whether it is even necessary to use it, and updates all rate variables used by common code. This means we can simply query the virtual rate at any point, which is needed for software rate scaling.

There is also now a threshold added to the TSC rate scaling; minor differences and variations of the measured TSC rate can accidentally provoke rate scaling to be used when it is not needed. Instead, we have a tolerance variable called tsc_tolerance_ppm, which is the maximum variation from the user-requested rate at which scaling will be used. The default is 250 ppm, which is half the threshold for NTP adjustment, allowing for some hardware variation.

In the event that hardware rate scaling is not available, we can kludge a bit by forcing TSC catchup to turn on when a faster-than-hardware speed has been requested, but there is nothing available yet for the reverse case; this requires a trap-and-emulate software implementation for RDTSC, which is still forthcoming.

[avi: fix 64-bit division on i386]

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
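The tolerance gate itself is simple; a sketch of the ppm check (helper names illustrative, logic as described above):

```c
static u32 tsc_tolerance_ppm = 250;	/* half the NTP threshold */

static u32 adjust_khz(u32 khz, s32 ppm)
{
	u64 v = (u64)khz * (1000000 + ppm);

	do_div(v, 1000000);
	return (u32)v;
}

/* Engage rate scaling only if the requested rate falls outside the
 * tolerance window around the host TSC rate. */
static bool needs_rate_scaling(u32 host_tsc_khz, u32 user_tsc_khz)
{
	u32 lo = adjust_khz(host_tsc_khz, -tsc_tolerance_ppm);
	u32 hi = adjust_khz(host_tsc_khz,  tsc_tolerance_ppm);

	return user_tsc_khz < lo || user_tsc_khz > hi;
}
```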
- 06 Mar 2012 (2 commits)
By H. Peter Anvin

clock_t is used mainly to give the number of jiffies a certain process has burned. It is entirely feasible for a long-running process to consume more than 2^32 jiffies, especially in a multiprocess system. As such, switch to a 64-bit clock_t for x32, just as we already switched to a 64-bit time_t.

clock_t is only used in a handful of places, and as such it is really not a very significant change. The one that has the biggest impact is in struct siginfo, but since the *size* of struct siginfo doesn't change (it is padded to the hilt) it is fairly easy to make this a localized change.

This also gets rid of sys_x32_times. However, since this is a pretty late change, don't compactify the system call numbers; we can reuse system call slot 521 next time we need an x32 system call.

Reported-by: Gregory M. Lueck <gregory.m.lueck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com
By H. Peter Anvin

The is_compat_task() test is composed of two predicates already, so make each of them available separately.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com
- 05 Mar 2012 (5 commits)
By Stephane Eranian

This patch adds the LBR definitions for NHM/WSM/SNB and Core. It also adds the definitions for the architected LBR MSRs: LBR_SELECT, LBR_TOS.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
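For orientation, the gist of such definitions (addresses as documented for these MSRs; treat the exact macro names as illustrative):

```c
/* Architected LBR control MSRs */
#define MSR_LBR_SELECT		0x000001c8	/* branch-type filter */
#define MSR_LBR_TOS		0x000001c9	/* top-of-stack index */

/* LBR stack base addresses (from/to address pairs per entry) */
#define MSR_LBR_NHM_FROM	0x00000680	/* NHM/WSM/SNB */
#define MSR_LBR_NHM_TO		0x000006c0
#define MSR_LBR_CORE_FROM	0x00000040	/* Core */
#define MSR_LBR_CORE_TO		0x00000060
```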
By Marcelo Tosatti

Increase the recommended max vcpus from 64 to 160 (tested internally at Red Hat).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Igor Mammedov

When a KVM guest uses kvmclock, it may hang on vcpu hot-plug. This is caused by an overflow in pvclock_get_nsec_offset:

  u64 delta = tsc - shadow->tsc_timestamp;

which in turn is caused by undefined values from the percpu hv_clock that hasn't been initialized yet. The uninitialized clock on the booting cpu is accessed from:

  start_secondary
   -> smp_callin
     -> smp_store_cpu_info
       -> identify_secondary_cpu
         -> mtrr_ap_init
           -> mtrr_restore
             -> stop_machine_from_inactive_cpu
               -> queue_stop_cpus_work
                 ...
                   -> sched_clock
                     -> kvm_clock_read

which is well before the x86_cpuinit.setup_percpu_clockev call in start_secondary, where the percpu clock is initialized.

This patch introduces a hook that allows to setup/initialize the per_cpu clock early and avoid the overflow due to reading:
- undefined values
- old values, if the cpu was offlined and then onlined again

Another possible early user of this clock source is ftrace, which accesses it to get timestamps for ring buffer entries. So if mtrr_ap_init were moved from identify_secondary_cpu to past x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace could cause the same overflow/hang on cpu hot-plug anyway.

More complete description of the problem: https://lkml.org/lkml/2012/2/2/101

Credits to Marcelo Tosatti <mtosatti@redhat.com> for the hook idea.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
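A sketch of the hook's shape (a new member of x86_cpuinit_ops defaulting to a no-op; the wiring shown is illustrative of the description above):

```c
/* New early hook next to the existing per-cpu clockevent setup. */
struct x86_cpuinit_ops {
	void (*early_percpu_clock_init)(void);	/* new; default: no-op */
	void (*setup_percpu_clockev)(void);
};

/* kvmclock registers the booting CPU's hv_clock area in this hook, so
 * that sched_clock() reads in smp_callin() see initialized data:
 *
 *	x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock;
 *
 * start_secondary() invokes the hook before smp_callin(). */
```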
By Takuya Yoshikawa

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By Christian Borntraeger

On some cpus the overhead for virtualization instructions is in the same range as a system call. Having to call multiple ioctls to get and set registers will make certain userspace-handled exits more expensive than necessary. Let's provide a section in kvm_run that works as a shared save area for guest registers.

We also provide two 64-bit flags fields (architecture-specific) that will specify:
1. which parts of these fields are valid.
2. which registers were modified by userspace.

Each bit for these flag fields will define a group of registers (like general purpose) or a single register.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
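The shape of the shared area, roughly (the padding size and exact field names are recalled from the era's ABI, so treat them as illustrative):

```c
/* Tail of struct kvm_run: a register save area shared between kernel
 * and userspace, plus two arch-defined bitmaps describing it. */
struct kvm_run_tail_sketch {
	__u64 kvm_valid_regs;	/* which s.regs groups the kernel filled  */
	__u64 kvm_dirty_regs;	/* which s.regs groups userspace modified */
	union {
		struct kvm_sync_regs regs;	/* arch-defined contents */
		char padding[1024];		/* fixes the union's size */
	} s;
};
```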