Commits · be7b263ea925324e54e48c3558d4719be5374053 · openanolis / cloud-kernel

10 Nov, 2015 7 commits

KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC · be7b263e

Committed by Haozhong Zhang on Oct 20, 2015

This patch makes kvm-intel return a scaled host TSC plus the TSC
offset when handling guest reads of MSR_IA32_TSC.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

be7b263e
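
The arithmetic described in this commit can be sketched as follows. This is only an illustration, not the code from the patch: the function names are hypothetical, the ratio is assumed to be a fixed-point number with 48 fractional bits, and `unsigned __int128` is a GCC/Clang extension used here to avoid overflow in the multiplication.

```c
#include <stdint.h>

/* Assumption for this sketch: the ratio has 48 fractional bits. */
#define TSC_RATIO_FRAC_BITS 48

/* Multiply a 64-bit TSC value by a fixed-point ratio (hypothetical helper). */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio)
{
    unsigned __int128 product = (unsigned __int128)tsc * ratio;

    return (uint64_t)(product >> TSC_RATIO_FRAC_BITS);
}

/* Value a guest would observe: scaled host TSC plus the TSC offset. */
static uint64_t guest_read_tsc(uint64_t host_tsc, uint64_t ratio,
                               uint64_t tsc_offset)
{
    /* Unsigned addition wraps, so a "negative" offset also works. */
    return scale_tsc(host_tsc, ratio) + tsc_offset;
}
```

With a ratio of `1ULL << 48` (that is, 1.0) the guest simply sees the host TSC plus the offset; a ratio of `1ULL << 47` would halve the apparent TSC frequency.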

KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded · ff2c3a18

Committed by Haozhong Zhang on Oct 20, 2015

This patch makes the kvm-intel module load the TSC scaling ratio into the
TSC multiplier field of the VMCS when a vcpu is loaded, so that the TSC
scaling ratio takes effect if VMX TSC scaling is enabled.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ff2c3a18

KVM: VMX: Enable and initialize VMX TSC scaling · 64903d61

Committed by Haozhong Zhang on Oct 20, 2015

This patch enhances the kvm-intel module to enable VMX TSC scaling and
collect information about the TSC scaling ratio during initialization.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

64903d61

KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() · 58ea6767

Committed by Haozhong Zhang on Oct 20, 2015

For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and renames the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

58ea6767

KVM: x86: Replace call-back compute_tsc_offset() with a common function · 07c1419a

Committed by Haozhong Zhang on Oct 20, 2015

Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

07c1419a

KVM: x86: Replace call-back set_tsc_khz() with a common function · 381d585c

Committed by Haozhong Zhang on Oct 20, 2015

Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

381d585c

KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common · 450869d6

Committed by Paolo Bonzini on Nov 04, 2015

They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

450869d6

05 Nov, 2015 1 commit

KVM: VMX: Fix commit which broke PML · a3eaa864

Committed by Kai Huang on Nov 04, 2015

I found that PML has been broken since the commit below:

	commit feda805f
	Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	Date:   Wed Sep 9 14:05:55 2015 +0800

	KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

	Unify the update in vmx_cpuid_update()

	Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
	[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The reason is that, in the above commit, vmx_cpuid_update calls
vmx_secondary_exec_control, in which the SECONDARY_EXEC_ENABLE_PML bit is
currently cleared unconditionally (as PML is enabled when creating the vcpu).
Therefore, if vmx_cpuid_update is called after the vcpu is created, PML will
be disabled unexpectedly while the log-dirty code still thinks PML is in use.

Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
only when PML is not supported or not enabled (!enable_pml). This is more
reasonable, as PML is currently either always enabled or always disabled. With
this change, explicitly updating SECONDARY_EXEC_ENABLE_PML in
vmx_enable{disable}_pml is no longer needed, so also rename
vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer.

Fixes: feda805f
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
[While at it, change a wrong ASSERT to an "if".  The condition can happen
 if creating the VCPU fails with ENOMEM. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3eaa864
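
The fix described above amounts to making the PML bit conditional rather than always cleared. The following is a simplified, stand-alone sketch of that idea, not the kernel function itself: `secondary_exec_control_sketch` and its parameters are hypothetical, and the "not supported" case is modelled by the bit simply being absent from the supported mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_ENABLE_PML 0x00020000u

/*
 * Hypothetical stand-in for the control computation: start from the set of
 * controls the hardware supports and drop PML only when it is disabled.
 */
static uint32_t secondary_exec_control_sketch(uint32_t supported,
                                              bool enable_pml)
{
    uint32_t exec_control = supported;

    /* Before the fix, the equivalent of this clear was unconditional. */
    if (!enable_pml)
        exec_control &= ~SECONDARY_EXEC_ENABLE_PML;

    return exec_control;
}
```

Because the result no longer loses the PML bit when PML is enabled, recomputing the controls after vcpu creation leaves PML intact.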

04 Nov, 2015 1 commit

KVM: VMX: fix SMEP and SMAP without EPT · 656ec4a4

Committed by Radim Krčmář on Nov 02, 2015

The comment in the code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

656ec4a4
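
As a rough illustration of the rule in this commit message, the sketch below masks SMEP and SMAP out of the CR4 value used by hardware when real mode is being emulated and the guest has not enabled paging. The function and parameter names are hypothetical and the condition is simplified; only the CR4 bit positions (SMEP is bit 20, SMAP is bit 21) are architectural.

```c
#include <stdbool.h>

#define X86_CR4_SMEP (1ul << 20)
#define X86_CR4_SMAP (1ul << 21)

/*
 * Hypothetical helper: derive the CR4 value loaded into hardware from the
 * CR4 value the guest asked for. When the hypervisor emulates real mode,
 * paging is on in hardware even though the guest believes it is off, so
 * SMEP/SMAP must not be enforced behind the guest's back.
 */
static unsigned long hw_cr4_for_guest(unsigned long guest_cr4,
                                      bool emulates_real_mode,
                                      bool guest_paging)
{
    unsigned long hw_cr4 = guest_cr4;

    if (emulates_real_mode && !guest_paging)
        hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);

    return hw_cr4;
}
```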

19 Oct, 2015 1 commit

kvm: x86: zero EFER on INIT · 5690891b

Committed by Paolo Bonzini on Oct 19, 2015

Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
without clearing EFER.LME first (which it should not know about).
Yang Zhang from Intel confirmed that the manual is wrong and EFER is
cleared to zero on INIT.

Fixes: d28bc9dd
Cc: stable@vger.kernel.org
Cc: Yang Z Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5690891b

16 Oct, 2015 3 commits

KVM: nVMX: expose VPID capability to L1 · 089d7b6e

Committed by Wanpeng Li on Oct 13, 2015

Expose VPID capability to L1. For nested guests, we don't do anything
specific for single context invalidation. Hence, only advertise support
for global context invalidation. The major benefit of nested VPID comes
from having separate vpids when switching between L1 and L2, and also
when L2's vCPUs are not scheduled in/out on L1.
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

089d7b6e

KVM: nVMX: nested VPID emulation · 5c614b35

Committed by Wanpeng Li on Oct 13, 2015

VPID is used to tag address spaces and avoid a TLB flush. Currently L0 uses
the same VPID to run L1 and all its guests. KVM flushes VPID when switching
between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces
of L1 and L2 can be treated separately, avoiding a TLB flush when switching
between L1 and L2. For each nested vmentry, if vpid12 is changed, reuse the
shadow vpid with an invvpid.

Performance:

Run lmbench on L2 with a 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5c614b35

KVM: nVMX: emulate the INVVPID instruction · 99b83ac8

Committed by Wanpeng Li on Oct 13, 2015

Add the INVVPID instruction emulation.
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

99b83ac8

14 Oct, 2015 3 commits

KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid · dd5f5341

Committed by Wanpeng Li on Sep 23, 2015

Introduce __vmx_flush_tlb() to handle a specific vpid.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dd5f5341

KVM: VMX: adjust interface to allocate/free_vpid · 991e7a0e

Committed by Wanpeng Li on Sep 16, 2015

Adjust allocate/free_vpid so that they can be reused for the nested vpid.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

991e7a0e

KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region · 1d8007bd

Committed by Paolo Bonzini on Oct 12, 2015

The next patch will make x86_set_memory_region fill the
userspace_addr.  Since the struct is not used untouched
anymore, it makes sense to build it in x86_set_memory_region
directly; it also simplifies the callers.
Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1d8007bd

01 Oct, 2015 20 commits

KVM: Update Posted-Interrupts Descriptor when vCPU is blocked · bf9f6ac8

Committed by Feng Wu on Sep 18, 2015

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list
Signed-off-by: Feng Wu <feng.wu@intel.com>
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bf9f6ac8

KVM: Update Posted-Interrupts Descriptor when vCPU is preempted · 28b835d6

Committed by Feng Wu on Sep 18, 2015

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR
Signed-off-by: Feng Wu <feng.wu@intel.com>
[Include asm/cpu.h to fix !CONFIG_SMP compilation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

28b835d6
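
The sched out / sched in steps listed in this commit can be sketched on a simplified descriptor. Everything below is an assumption made for illustration: the struct is a reduced stand-in for the real Posted-Interrupts Descriptor, the vector value is a placeholder, the function names are hypothetical, and the atomic updates that real code would need are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder vector number, chosen only for this sketch. */
#define POSTED_INTR_VECTOR_SKETCH 0xf2

/* Reduced stand-in for the Posted-Interrupts Descriptor control fields. */
struct pi_desc_sketch {
    bool sn;        /* SN: suppress notification */
    uint8_t nv;     /* NV: notification vector */
    uint32_t ndst;  /* NDST: notification destination (target CPU) */
};

/* vCPU is preempted: suppress future non-urgent posted interrupts. */
static void pi_sched_out(struct pi_desc_sketch *pi)
{
    pi->sn = true;
}

/* vCPU is scheduled in on the CPU identified by dest. */
static void pi_sched_in(struct pi_desc_sketch *pi, uint32_t dest)
{
    pi->sn = false;                      /* Clear 'SN' */
    if (pi->ndst != dest)
        pi->ndst = dest;                 /* vCPU moved to a different CPU */
    pi->nv = POSTED_INTR_VECTOR_SKETCH;  /* Set 'NV' to the posted vector */
}
```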

KVM: x86: Update IRTE for posted-interrupts · efc64404

Committed by Feng Wu on Sep 18, 2015

This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Squashed in automatically generated patch from the build robot
 "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

efc64404

KVM: Add some helper functions for Posted-Interrupts · ebbfc765

Committed by Feng Wu on Sep 18, 2015

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[Make the new functions inline. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ebbfc765

KVM: Extend struct pi_desc for VT-d Posted-Interrupts · 6ef1522f

Committed by Feng Wu on Sep 18, 2015

Extend struct pi_desc for VT-d Posted-Interrupts.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6ef1522f

KVM: VMX: drop rdtscp_enabled field · 1cea0ce6

Committed by Xiao Guangrong on Sep 09, 2015

Check the CPUID bit instead of the rdtscp_enabled field.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1cea0ce6

KVM: VMX: clean up bit operation on SECONDARY_VM_EXEC_CONTROL · 7ec36296

Committed by Xiao Guangrong on Sep 09, 2015

Use vmcs_set_bits() and vmcs_clear_bits() to clean up the code.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7ec36296

KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update · feda805f

Committed by Xiao Guangrong on Sep 09, 2015

Unify the update in vmx_cpuid_update().
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

feda805f

KVM: VMX: align vmx->nested.nested_vmx_secondary_ctls_high to vmx->rdtscp_enabled · 8b97265a

Committed by Paolo Bonzini on Sep 15, 2015

The SECONDARY_EXEC_RDTSCP must be available iff RDTSCP is enabled in the
guest.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b97265a

KVM: VMX: simplify invpcid handling in vmx_cpuid_update() · 29541bb8

Committed by Xiao Guangrong on Sep 09, 2015

If vmx_invpcid_supported() is true, the secondary execution control
field must be supported and SECONDARY_EXEC_ENABLE_INVPCID
must have already been set in the current vmcs by
vmx_secondary_exec_control().

If vmx_invpcid_supported() is false, there is no need to clear
SECONDARY_EXEC_ENABLE_INVPCID.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29541bb8

KVM: VMX: simplify rdtscp handling in vmx_cpuid_update() · f36201e5

Committed by Xiao Guangrong on Sep 09, 2015

If vmx_rdtscp_supported() is true, SECONDARY_EXEC_RDTSCP must
have already been set in the current vmcs by
vmx_secondary_exec_control().
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f36201e5

KVM: VMX: drop rdtscp_enabled check in prepare_vmcs02() · e2821620

Committed by Xiao Guangrong on Sep 09, 2015

The SECONDARY_EXEC_RDTSCP setting for the L2 guest comes from vmcs12.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e2821620

KVM: x86: add pcommit support · 8b3e34e4

Committed by Xiao Guangrong on Sep 09, 2015

Pass the PCOMMIT CPU feature to the guest to enable the PCOMMIT instruction.

Currently we do not catch the pcommit instruction for the L1 guest, and we
allow L1 to catch this instruction for L2 if, as required by the spec,
L1 can enumerate the PCOMMIT instruction via CPUID:
| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the
| 1-setting of PCOMMIT exiting) is always the same as
| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting
| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID

The spec can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b3e34e4

KVM: vmx: disable posted interrupts if no local APIC · d6a858d1

Committed by Paolo Bonzini on Sep 28, 2015

Uniprocessor 32-bit randconfigs can disable the local APIC, and posted
interrupts require reserving a vector on the LAPIC, so they are
incompatible.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d6a858d1

kvm: add tracepoint for fast mmio · 931c33b1

Committed by Jason Wang on Sep 15, 2015

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

931c33b1

KVM: x86: unify handling of interrupt window · 4ca7dd8c

Committed by Paolo Bonzini on Jul 30, 2015

The interrupt window is currently checked twice, once in vmx.c/svm.c and
once in dm_request_for_irq_injection. The only difference is the extra
check for kvm_arch_interrupt_allowed in dm_request_for_irq_injection,
and the different return value (EINTR/KVM_EXIT_INTR for vmx.c/svm.c vs.
0/KVM_EXIT_IRQ_WINDOW_OPEN for dm_request_for_irq_injection).

However, dm_request_for_irq_injection is basically dead code! Revive it
by removing the checks in vmx.c and svm.c's vmexit handlers, and
fixing the returned values for the dm_request_for_irq_injection case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4ca7dd8c

KVM: x86: introduce lapic_in_kernel · 35754c98

Committed by Paolo Bonzini on Jul 29, 2015

Avoid pointer chasing and memory barriers, and simplify the code
when split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace)
is introduced.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

35754c98

KVM: x86: replace vm_has_apicv hook with cpu_uses_apicv · d50ab6c1

Committed by Paolo Bonzini on Jul 29, 2015

This will avoid an unnecessary trip to ->kvm and from there to the VPIC.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d50ab6c1

KVM: x86: store IOAPIC-handled vectors in each VCPU · 3bb345f3

Committed by Paolo Bonzini on Jul 29, 2015

We can reuse the algorithm that computes the EOI exit bitmap to figure
out which vectors are handled by the IOAPIC.  The only difference
between the two is for edge-triggered interrupts other than IRQ8
that have no notifiers active; however, the IOAPIC does not have to
do anything special for these interrupts anyway.

This again limits the interactions between the IOAPIC and the LAPIC,
making it easier to move the former to userspace.

Inspired by a patch from Steve Rutherford.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3bb345f3

Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" · 606decd6

Committed by Paolo Bonzini on Oct 01, 2015

This reverts commit fd717f11.
It was reported to cause Machine Check Exceptions (bug 104091).

Reported-by: harn-solo@gmx.de
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

606decd6

16 Sep, 2015 1 commit

KVM: vmx: fix VPID is 0000H in non-root operation · 04bb92e4

Committed by Wanpeng Li on Sep 16, 2015

Reference SDM 28.1:

The current VPID is 0000H in the following situations:
- Outside VMX operation. (This includes operation in system-management
  mode under the default treatment of SMIs and SMM with VMX operation;
  see Section 34.14.)
- In VMX root operation.
- In VMX non-root operation when the “enable VPID” VM-execution control
  is 0.

The VPID should never be 0000H in non-root operation when "enable VPID"
VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx:
move some vmx setting from vmx_init() to hardware_setup()") removed the
code that reserved 0000H for VMX root operation.

This patch fixes it by again reserving 0000H for VMX root operation.

Cc: stable@vger.kernel.org # 3.19+
Fixes: 34a1cd60
Reported-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

04bb92e4
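
Reserving VPID 0000H can be pictured with a small allocator in which slot 0 is marked as taken before any vCPU asks for a VPID. This is a user-space sketch with hypothetical names and a deliberately tiny VPID space, not the kernel's allocator, and it ignores locking.

```c
#include <string.h>

/* Tiny VPID space for illustration; real hardware has far more VPIDs. */
#define NR_VPIDS_SKETCH 16

static unsigned char vpid_in_use[NR_VPIDS_SKETCH];

/* Mark every VPID free, then reserve 0 for VMX root operation. */
static void vpid_init(void)
{
    memset(vpid_in_use, 0, sizeof(vpid_in_use));
    vpid_in_use[0] = 1;
}

/* Return the lowest free VPID, or 0 if none is left (0 is never handed out
 * as a real VPID, so it can double as the "no VPID" result). */
static int allocate_vpid_sketch(void)
{
    for (int i = 1; i < NR_VPIDS_SKETCH; i++) {
        if (!vpid_in_use[i]) {
            vpid_in_use[i] = 1;
            return i;
        }
    }
    return 0;
}

/* Release a VPID; the reserved value 0 is never released. */
static void free_vpid_sketch(int vpid)
{
    if (vpid > 0 && vpid < NR_VPIDS_SKETCH)
        vpid_in_use[vpid] = 0;
}
```

Without the reservation step in `vpid_init()`, the first caller would be handed VPID 0 in non-root operation, which is the situation the commit describes.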

11 Sep, 2015 1 commit

kexec: split kexec_load syscall from kexec core code · 2965faa5

Committed by Dave Young on Sep 09, 2015

There are two kexec load syscalls, kexec_load and kexec_file_load.
kexec_file_load has already been split out as kernel/kexec_file.c.  In this
patch I split the kexec_load syscall code out to kernel/kexec.c.

Also add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice versa.

The original requirement is from Ted Ts'o, who wants the kexec kernel
signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But
kexec-tools uses the kexec_load syscall, which can bypass the check.

Vivek Goyal proposed creating a common kconfig option so that the user can
compile in only one syscall for loading the kexec kernel.  KEXEC/KEXEC_FILE
selects KEXEC_CORE so that old config files still work.

Because there is general code that needs CONFIG_KEXEC_CORE, I updated all
the architecture Kconfig files with a new option KEXEC_CORE, and let KEXEC
select KEXEC_CORE in the arch Kconfig.  Also updated the general kernel code
related to the kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2965faa5

09 Sep, 2015 1 commit

mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f

Committed by Vlastimil Babka on Sep 08, 2015

alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fall back to current node for nid == NUMA_NO_NODE.  Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise.  In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has led to mistakes in the past, see for example
commits 5265047a ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb4 ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage.  Both functions get described in comments.

It has also been considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly.  The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead.  This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers.  Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

96db800f

15 Aug, 2015 1 commit

x86/kvm: Rename VMX's segment access rights defines · 4d283ec9

Committed by Andy Lutomirski on Aug 13, 2015

VMX encodes access rights differently from LAR, and the latter is
most likely what x86 people think of when they think of "access
rights".

Rename them to avoid confusion.

Cc: kvm@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4d283ec9

openanolis / cloud-kernel synced successfully more than 1 year ago