Commits · a547c6db4d2f16ba5ce8e7054bffad6acc248d40 · openanolis / cloud-kernel

18 Mar, 2013 1 commit

perf,x86: fix wrmsr_on_cpu() warning on suspend/resume · 2a6e06b2

Committed by Linus Torvalds on Mar 17, 2013

Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA MSR it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2a6e06b2

16 Mar, 2013 1 commit

perf,x86: fix kernel crash with PEBS/BTS after suspend/resume · 1d9d8639

Committed by Stephane Eranian on Mar 15, 2013

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. It turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps its power-on/resume value of 0, causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS_AREA MSR value. It is invoked for all CPUs. So for all but CPU0,
the DS_AREA will be restored twice, but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1d9d8639

07 Mar, 2013 1 commit

x86: Fix 32-bit *_cpu_data initializers · 015221fe

Committed by Krzysztof Mazur on Mar 03, 2013

Commit 27be4570 ('x86 idle: remove 32-bit-only "no-hlt" parameter,
hlt_works_ok flag') removed the hlt_works_ok flag from struct
cpuinfo_x86, but the boot_cpu_data and new_cpu_data initializers were
not changed. As a result, the initializers set the f00f_bug flag
instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set, the f00f_bug flag is never
cleared.

To avoid such problems in the future, C99-style initialization is now
used.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

015221fe
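
The failure mode described in this commit can be reproduced in a few lines of user-space C. The sketch below is an illustration only: `struct cpu_flags` is a hypothetical, cut-down stand-in for `struct cpuinfo_x86` (its member list and order are assumptions), and the initializer values are chosen just to make the shift visible.

```c
#include <assert.h>

/*
 * Hypothetical, cut-down stand-in for struct cpuinfo_x86 AFTER the
 * hlt_works_ok member was removed. The member list is an assumption
 * made for illustration; the real structure is much larger.
 */
struct cpu_flags {
	signed char wp_works_ok;
	/* signed char hlt_works_ok;   <- the removed member */
	signed char hard_math;
	signed char rfu;
	signed char fdiv_bug;
	signed char f00f_bug;
};

/*
 * Positional initializer that was written for the OLD layout
 * (wp_works_ok, hlt_works_ok, hard_math, rfu, fdiv_bug).
 * With one member gone, every later value shifts by one slot,
 * so the -1 meant for fdiv_bug lands in f00f_bug.
 */
struct cpu_flags positional = { -1, 1, 0, 0, -1 };

/*
 * C99 designated initializer: each value is tied to a member name,
 * so removing or reordering other members cannot misplace it.
 */
struct cpu_flags designated = {
	.wp_works_ok = -1,
	.fdiv_bug    = -1,
};
```

With the positional form the compiler accepts the stale initializer silently, which is why the bug went unnoticed; the designated form would have kept working, and a reference to a removed member would have failed to compile.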

06 Mar, 2013 1 commit

x86, smpboot: Remove unused variable · 576cfb40

Committed by Borislav Petkov on Mar 04, 2013

The cpuinfo_x86 ptr is unused now. Drop it. It was made obsolete by 69fb3676
("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param"),
which removed its only user.

[ hpa: fixes gcc warning ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-2-git-send-email-bp@alien8.de
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

576cfb40

03 Mar, 2013 1 commit

x86, ACPI, mm: Revert movablemem_map support · 20e6926d

Committed by Yinghai Lu on Mar 01, 2013

Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things:

1. numa_init is called several times, NOT just for srat, so these calls
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   cannot simply be removed.  The sequence to consider is: numaq, srat, amd, dummy,
   and the fallback path must keep working.

2. It simply splits acpi_numa_init to early_parse_srat.
   a. early_parse_srat is NOT called for ia64, so ia64 is broken.
   b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     is still left in numa_init, so it just clears the result from early_parse_srat.
     It should be moved before that....
   c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
       early, before the override from INITRD is settled.

3. The patch TITLE is totally misleading: there is NO x86 in the title,
   but it changes critical x86 code. As a result, the x86 guys did not
   pay attention and did not find the problem early. Those patches really
   should be routed via tip/x86/mm.

4. After that commit, the following ranges cannot use movable ram:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
	anymore.
  d. init_mem_mapping: can not put page table high anymore.
  e. initmem_init: vmemmap can not be high local node anymore. That is
     not good.

If a node is hotpluggable, the mem related ranges like page table and
vmemmap could be on that node without problem, and should be on that
node.

We have a workaround patch that could fix some problems, but some cannot
be fixed.

So just remove that offending commit and related ones including:

 f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e ("page_alloc: bootmem limit with movablecore_map")

 42f47e27 ("page_alloc: make movablemem_map have higher priority")

 6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e ("page_alloc: add movable_memmap kernel parameter")

 4d59a751 ("x86: get pg_data_t's memory from other node")

Later we should have patches that make sure the kernel puts page table
and vmemmap on local node ram instead of pushing them down to node0.  We
also need to find a way to put other kernel-used ram on local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20e6926d

28 Feb, 2013 4 commits

x86/kvm: Fix pvclock vsyscall fixmap · 3d2a80a2

Committed by Peter Hurley on Feb 27, 2013

The physical memory fixmapped for the pvclock clock_gettime vsyscall
was allocated, and thus is not a kernel symbol. __pa() is the proper
method to use in this case.

Fixes the crash below when booting a next-20130204+ smp guest on a
3.8-rc5+ KVM host.

[    0.666410] udevd[97]: starting version 175
[    0.674043] udevd[97]: udevd:[97]: segfault at ffffffffff5fd020
     ip 00007fff069e277f sp 00007fff068c9ef8 error d
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

3d2a80a2

hlist: drop the node parameter from iterators · b67bfe0d

Committed by Sasha Levin on Feb 27, 2013

I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones. The list iterators take three parameters:

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
 were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b67bfe0d
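
To illustrate why the extra node cursor is not needed, here is a minimal user-space sketch of a three-argument hlist iterator. It is not the kernel's implementation: `struct hlist_node` is simplified to a single `next` pointer (the kernel's version also carries `pprev`), `struct item`, `hlist_add_head_simple()` and `sum_items()` are hypothetical names, and the macro relies on the GNU `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified hash-list types; the kernel's hlist_node also has pprev. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Recover the enclosing object from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like container_of(), but maps a NULL node to a NULL entry. */
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/*
 * Three-argument iterator: the entry pointer 'pos' is the only cursor.
 * The next node is reached through pos->member.next, so no separate
 * 'struct hlist_node *' variable has to be passed in.
 */
#define hlist_for_each_entry(pos, head, member)                                \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos;                                                               \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical payload type used only for this example. */
struct item {
	int value;
	struct hlist_node link;
};

/* Push a node at the front of the list (simplified helper). */
void hlist_add_head_simple(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	h->first = n;
}

/* Walk the list with the three-argument iterator and add up the values. */
int sum_items(struct hlist_head *head)
{
	struct item *it;
	int sum = 0;

	hlist_for_each_entry(it, head, link)
		sum += it->value;
	return sum;
}
```

The call site now has the same shape as `list_for_each_entry(pos, head, member)`, which is the point the commit message makes.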

more file_inode() open-coded instances · 6131ffaa

Committed by Al Viro on Feb 27, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6131ffaa

x86: Make sure we can boot in the case the BDA contains pure garbage · 7c100936

Committed by H. Peter Anvin on Feb 27, 2013

On non-BIOS platforms it is possible that the BIOS data area contains
garbage instead of being zeroed or something equivalent (firmware
people: we are talking of 1.5K here, so please do the sane thing.)

We need on the order of 20-30K of low memory in order to boot, which
may grow up to < 64K in the future.  We probably want to avoid the
lowest of the low memory.  At the same time, it seems extremely
unlikely that a legitimate EBDA would ever reach down to the 128K
(which would require it to be over half a megabyte in size.)  Thus,
pick 128K as the cutoff for "this is insane, ignore."  We may still
end up reserving a bunch of extra memory on the low megabyte, but that
is not really a major issue these days.  In the worst case we lose
512K of RAM.

This code really should be merged with trim_bios_range() in
arch/x86/kernel/setup.c, but that is a bigger patch for a later merge
window.
Reported-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org

7c100936
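
The cutoff rule described in this commit can be sketched as a pure function. This is an illustration under assumptions, not the kernel's code: the function name and the 0x9f000 upper cap are hypothetical and not stated in the commit message; only the 128K "insane" cutoff comes from the text.

```c
#include <assert.h>

#define INSANE_CUTOFF 0x20000U	/* 128K: anything below is treated as garbage */
#define LOWMEM_CAP    0x9f000U	/* assumed upper bound for usable low memory */

/*
 * Return the address at which the reserved region of the low megabyte
 * would start, given the EBDA address and the BIOS-reported low memory
 * size (both in bytes). Values below the 128K cutoff are ignored.
 */
unsigned int lowmem_reserve_start(unsigned int ebda_addr, unsigned int lowmem)
{
	if (ebda_addr < INSANE_CUTOFF)	/* "this is insane, ignore" */
		ebda_addr = LOWMEM_CAP;
	if (lowmem < INSANE_CUTOFF)
		lowmem = LOWMEM_CAP;

	/* Use the lower of the two values, but never go above the cap. */
	if (lowmem > ebda_addr)
		lowmem = ebda_addr;
	if (lowmem > LOWMEM_CAP)
		lowmem = LOWMEM_CAP;
	return lowmem;
}
```

With garbage inputs the function falls back to the cap, and a plausible EBDA at exactly 128K is still honoured, which corresponds to the worst case mentioned above.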

27 Feb, 2013 1 commit

x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock · fe1140cc

Committed by Jan Kiszka on Feb 23, 2013

This fixes boot lockups with "no-kvmclock", when the host is not
exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
the kvmclock initialization failed for whatever reason.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

fe1140cc

26 Feb, 2013 1 commit

x86, doc: Fix incorrect comment about 64-bit code segment descriptors · 1256276c

Committed by Konrad Rzeszutek Wilk on Feb 25, 2013

The AMD64 Architecture Programmer's Manual Volume 2, on page
89, mentions: "If the processor is running in 64-bit mode (L=1),
the only valid setting of the D bit is 0." This matches
what the code does.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361825650-14031-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

1256276c
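
The L/D rule quoted from the manual can be checked mechanically on a raw 8-byte descriptor. A minimal sketch, assuming the usual x86 descriptor layout in which L is bit 53 and D is bit 54 of the 64-bit value; the function name and macros are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_L_BIT (1ULL << 53)	/* long-mode (64-bit code) flag */
#define DESC_D_BIT (1ULL << 54)	/* default operand size flag */

/*
 * A code segment descriptor is usable for 64-bit mode only when
 * L=1 and D=0; L=1 together with D=1 is not a valid combination.
 */
int desc_is_valid_64bit_code(uint64_t desc)
{
	return (desc & DESC_L_BIT) && !(desc & DESC_D_BIT);
}
```

For example, a flat descriptor with the flags nibble 0xa (G=1, D=0, L=1) passes the check, while one with 0xc (G=1, D=1, L=0) is a 32-bit code segment and does not.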

24 Feb, 2013 2 commits

acpi, memory-hotplug: parse SRAT before memblock is ready · e8d19552

Committed by Tang Chen on Feb 22, 2013

On Linux, pages used by the kernel cannot be migrated.  As a result,
if a memory range is used by the kernel, it cannot be hot-removed.  So if we
want to hot-remove memory, we should prevent the kernel from using it.

The current way to prevent this is to specify a memory range with the
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory and
reserve it for the kernel.  memblock is already working before we parse
SRAT and know the node memory ranges, so it may allocate memory in
ranges that are to be set as ZONE_MOVABLE.  This memory can be used by
the kernel and will never be freed.

So, let's parse SRAT before memblock is called for the first time.  That
is early enough.

The first call of memblock_find_in_range_node() is in:

  setup_arch()
    |-->setup_real_mode()

so this patch adds a function early_parse_srat() to parse SRAT, and calls
it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized
   numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
   NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
   numa info.

2) I don't know why the original acpi_numa_init() uses the count of memory
   affinities parsed from SRAT as its return value.  So I added a static
   variable srat_mem_cnt to remember this count and use it as the return
   value of the new acpi_numa_init().

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e8d19552

cpu_hotplug: clear apicid to node when the cpu is hotremoved · c4c60524

Committed by Wen Congyang on Feb 22, 2013

When a cpu is hotplugged, we call acpi_map_cpu2node() in
_acpi_map_lsapic() to store the cpu's node and the apicid's node.  But we
don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is
hotremoved.  If the node is also hotremoved, we will get the following
messages:

  kernel BUG at include/linux/gfp.h:329!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB
  RIP: 0010:[<ffffffff811bc3fd>]  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
  RSP: 0018:ffff88078a049cf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246
  RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0
  R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003
  FS:  00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650)
  Call Trace:
    new_slab+0x30/0x1b0
    __slab_alloc+0x358/0x4c0
    kmem_cache_alloc_node_trace+0xb4/0x1e0
    alloc_fair_sched_group+0xd0/0x1b0
    sched_create_group+0x3e/0x110
    sched_autogroup_create_attach+0x4d/0x180
    sys_setsid+0xd4/0xf0
    system_call_fastpath+0x16/0x1b
  Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44
  RIP  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
   RSP <ffff88078a049cf8>
  ---[ end trace adf84c90f3fea3e5 ]---

The reason is that the cpu's node is not NUMA_NO_NODE, so we call
alloc_pages_exact_node() to allocate memory on the node, but the node is
offlined.

If the node is onlined, we still need the cpu's node.  For example: a task
on the cpu is sleeping when the cpu is hotremoved.  We will choose
another cpu to run this task when it is woken up.  If we know the cpu's
node, we will choose a cpu on the same node first.  So we should clear
the cpu-to-node mapping when the node is offlined.

This patch only clears the apicid-to-node mapping when the cpu is
hotremoved.

[akpm@linux-foundation.org: fix section error]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c4c60524
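
The idea of resetting a stale mapping on hot-remove can be sketched with a plain lookup table. This is a user-space illustration only: the table size and all function names are hypothetical, and only the NUMA_NO_NODE sentinel is taken from the commit text.

```c
#include <assert.h>

#define NUMA_NO_NODE   (-1)
#define MAX_LOCAL_APIC 8	/* tiny illustrative bound, not the kernel's value */

/* apicid -> node table, standing in for the kernel's per-apicid mapping. */
static int apicid_to_node[MAX_LOCAL_APIC];

/* Mark every apicid as not belonging to any node. */
void reset_apicid_map(void)
{
	for (int i = 0; i < MAX_LOCAL_APIC; i++)
		apicid_to_node[i] = NUMA_NO_NODE;
}

/* Hot-add: remember which node the cpu's apicid belongs to. */
void map_apicid(int apicid, int node)
{
	apicid_to_node[apicid] = node;
}

/*
 * Hot-remove: forget the node again. Without this step a later lookup
 * would still return the old node even if that node has gone offline.
 */
void unmap_apicid(int apicid)
{
	apicid_to_node[apicid] = NUMA_NO_NODE;
}

/* Lookup used when deciding where to allocate memory for the cpu. */
int apicid_node(int apicid)
{
	return apicid_to_node[apicid];
}
```

A caller that sees NUMA_NO_NODE can fall back to any online node instead of targeting a node that no longer exists.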

23 Feb, 2013 2 commits

new helper: file_inode(file) · 496ad9aa

Committed by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

496ad9aa

x86-64: don't set the early IDT to point directly to 'early_idt_handler' · ac630dd9

Committed by Linus Torvalds on Feb 22, 2013

The code requires the use of the proper per-exception-vector stub
functions (set up as the early_idt_handlers[] array - note the 's') that
make sure to set up the error vector number.  This is true regardless of
whether CONFIG_EARLY_PRINTK is set or not.

Why? The stack offset for the comparison of __KERNEL_CS won't be right
otherwise, nor will the new check (from commit 8170e6be: "x86,
64bit: Use a #PF handler to materialize early mappings on demand") for
the page fault exception vector.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ac630dd9

20 Feb, 2013 3 commits

x86/apic: Fix parsing of the 'lapic' cmdline option · 27cf9298

Committed by Mathias Krause on Feb 19, 2013

Including " lapic " in the kernel cmdline on an x86-64 kernel
makes it panic while parsing early params -- i.e. with no user
visible output.

Fix this bug by ensuring arg is non-NULL before passing it to
strncmp().
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1361303227-13174-1-git-send-email-minipli@googlemail.com
Cc: stable@vger.kernel.org	# v3.8
Signed-off-by: Ingo Molnar <mingo@kernel.org>

27cf9298
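
The shape of the fix is a NULL guard in front of `strncmp()`. A bare `lapic` on the command line has no `=value` part, so the parser receives a NULL argument. The sketch below is a user-space approximation: the function name is hypothetical, and the `"notscdeadline"` value is assumed for illustration since the commit message does not name the option value being compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the value given to "lapic=" asks for the (assumed)
 * "notscdeadline" behaviour, 0 otherwise. 'arg' is NULL when the
 * option was given without a value, so it must be checked before
 * being handed to strncmp().
 */
int lapic_arg_is_notscdeadline(const char *arg)
{
	return arg != NULL && strncmp(arg, "notscdeadline", 13) == 0;
}
```

Because `&&` short-circuits, `strncmp()` is never reached with a NULL pointer.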

perf/x86: Add Intel IvyBridge event scheduling constraints · 69943182

Committed by Stephane Eranian on Feb 20, 2013

The Intel IvyBridge processor has different constraints compared
to SandyBridge. Therefore it needs its own constraint table.
This patch adds the constraint table.

Without this patch, the events listed in the patch may not be
scheduled correctly and bogus counts may be collected.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1361355312-3323-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

69943182

x86, cpu, amd: Fix WC+ workaround for older virtual hosts · 52d3d06e

Committed by Borislav Petkov on Feb 19, 2013

The WC+ workaround for F10h introduces a new MSR and kvm host #GPs
on accesses to unknown MSRs if paravirt is not compiled in. Use the
exception-handling MSR accessors so as not to break 3.8 and later guests
booting on older hosts.

Remove a redundant family check while at it.

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1361298793-31834-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

52d3d06e

19 Feb, 2013 1 commit

x86: pvclock kvm: align allocation size to page size · ed55705d

Committed by Marcelo Tosatti on Feb 18, 2013

To match what's mapped via vsyscalls to userspace.
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ed55705d
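
Rounding an allocation up to a whole number of pages is a one-line computation. A minimal sketch, assuming 4K pages; the function name and its parameters are hypothetical and only illustrate the alignment step named in the commit title.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumed page size for this example */

/* Round x up to the next multiple of PAGE_SIZE (a power of two). */
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/*
 * Size of a per-cpu clock area when it must cover whole pages,
 * because pages (not bytes) are what gets mapped into userspace.
 */
size_t pvclock_area_size(size_t entry_size, size_t ncpus)
{
	return PAGE_ALIGN(entry_size * ncpus);
}
```

For instance, 65 entries of 64 bytes need 4160 bytes, which is rounded up to two full pages.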

18 Feb, 2013 2 commits

x86 idle: rename global pm_idle to static x86_idle · a476bda3

Committed by Len Brown on Feb 09, 2013

(pm_idle)() is being removed from linux/pm.h
because Linux does not have such a cross-architecture concept.

x86 uses an idle function pointer in its architecture
specific code as a backup to cpuidle.  So we re-name
x86 use of pm_idle to x86_idle, and make it static to x86.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org

a476bda3

APM idle: register apm_cpu_idle via cpuidle · dd8af076

Committed by Len Brown on Feb 09, 2013

Update APM to register its local idle routine with cpuidle.

This allows us to stop exporting pm_idle to modules on x86.

The Kconfig sub-option, APM_CPU_IDLE, now depends on CPU_IDLE.

Compile-tested only.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Jiri Kosina <jkosina@suse.cz>

dd8af076

16 Feb, 2013 1 commit

perf/x86/amd: Enable northbridge performance counters on AMD family 15h · e259514e

Committed by Jacob Shin on Feb 06, 2013

On AMD family 15h processors, there are 4 new performance
counters (in addition to 6 core performance counters) that can
be used for counting northbridge events (e.g. DRAM accesses).

Their bit fields are almost identical to the core performance
counters. However, unlike the core performance counters, these
MSRs are shared between multiple cores (that share the same
northbridge).

We will reuse the same code path as existing family 10h
northbridge event constraints handler logic to enforce
this sharing.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-7-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

e259514e

15 Feb, 2013 2 commits

x86, mm: Move reserving low memory later in initialization · 95c96084

Committed by H. Peter Anvin on Feb 14, 2013

Move the reservation of low memory, except for the 4K which actually
does belong to the BIOS, later in the initialization; in particular,
after we have already reserved the trampoline.

The current code locates the trampoline as high as possible, so by
deferring the allocation we will still be able to reserve as much
memory as is possible.  This allows us to run with reservelow=640k
without getting a crash on system startup.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-0y9dqmmsousf69wutxwl3kkf@git.kernel.org

95c96084

x86: ptrace.c only needs export.h and not the full module.h · 19348e74

Committed by Paul Gortmaker on Feb 14, 2013

Commit cb57a2b4 ("x86-32: Export kernel_stack_pointer() for modules")
added an include of the module.h header in conjunction with adding an
EXPORT_SYMBOL_GPL of kernel_stack_pointer.

But module.h should be avoided for simple exports, since it in turn
includes the world.  Swap the module.h for export.h instead.

Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1360872842-28417-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

19348e74

14 Feb, 2013 2 commits

x86: convert to ksignal · 235b8022

Committed by Al Viro on Nov 09, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

235b8022

x86, efi: remove duplicate code in setup_arch() by using efi_is_native() · 6b59e366

Committed by Satoru Takeuchi on Feb 14, 2013

The check, "IS_ENABLED(CONFIG_X86_64) != efi_enabled(EFI_64BIT)",
in setup_arch() can be replaced by !efi_is_native(). This change
removes duplicate code and improves readability.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Olof Johansson <olof@lixom.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

6b59e366
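
The equivalence behind this cleanup is simple boolean logic: the firmware is "native" when its bitness matches the kernel's, so the open-coded inequality is just the negation of that helper. A minimal sketch with plain booleans standing in for `IS_ENABLED(CONFIG_X86_64)` and `efi_enabled(EFI_64BIT)`; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Helper form: true when kernel and firmware have the same bitness. */
bool efi_is_native_sketch(bool kernel_is_64bit, bool firmware_is_64bit)
{
	return kernel_is_64bit == firmware_is_64bit;
}

/* Open-coded form of the mismatch check that the helper replaces. */
bool efi_mismatch_open_coded(bool kernel_is_64bit, bool firmware_is_64bit)
{
	return kernel_is_64bit != firmware_is_64bit;
}
```

For every combination of inputs, the open-coded mismatch check equals the negated helper, which is why the replacement does not change behaviour.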

13 Feb, 2013 7 commits

X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts · bc2b0331

Committed by K. Y. Srinivasan on Feb 03, 2013

Starting with win8, vmbus interrupts can be delivered on any VCPU in the guest
and furthermore can be concurrently active on multiple VCPUs. Support this
interrupt delivery model by setting up a separate IDT entry for Hyper-V vmbus
interrupts. I would like to thank Jan Beulich <JBeulich@suse.com> and
Thomas Gleixner <tglx@linutronix.de> for their help.

In this version of the patch, based on the feedback, I have merged the IDT
vector for Xen and Hyper-V and made the necessary adjustments. Furthermore,
based on Jan's feedback I have added the necessary compilation switches.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1359940959-32168-3-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

bc2b0331

X86: Add a check to catch Xen emulation of Hyper-V · db34bbb7

Committed by K. Y. Srinivasan on Feb 03, 2013

Xen emulates Hyper-V to host enlightened Windows. It looks like this
emulation may be turned on by default even for Linux guests. Check and
fail Hyper-V detection if we are on Xen.

[ hpa: the problem here is that Xen doesn't emulate Hyper-V well
  enough, and if the Xen support isn't compiled in, we end up stumbling
  over the Hyper-V emulation and try to activate it -- and it fails. ]
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1359940959-32168-2-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

db34bbb7

x86: Hyper-V: register clocksource only if it's advertised · 32068f65

Committed by Olaf Hering on Feb 03, 2013

Enable hyperv_clocksource only if it's advertised as a feature.
XenServer 6 returns the signature which is checked in
ms_hyperv_platform(), but it does not offer all features. Currently the
clocksource is enabled unconditionally in ms_hyperv_init_platform(), and
the result is a hanging guest.

Hyper-V spec Bit 1 indicates the availability of Partition Reference
Counter.  Register the clocksource only if this bit is set.

The guest in question prints this in dmesg:
 [    0.000000] Hypervisor detected: Microsoft HyperV
 [    0.000000] HyperV: features 0x70, hints 0x0

This bug can be reproduced easily by setting 'viridian=1' in a HVM domU
.cfg file. A workaround without this patch is to boot the HVM guest with
'clocksource=jiffies'.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: http://lkml.kernel.org/r/1359940959-32168-1-git-send-email-kys@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

32068f65
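
The feature test described in this commit is a single bit check. The sketch below applies it to the `features 0x70` value from the dmesg excerpt; the macro and function names are hypothetical, and only "Bit 1 indicates the availability of Partition Reference Counter" comes from the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 of the feature word: Partition Reference Counter available. */
#define HV_REF_COUNTER_AVAILABLE (1u << 1)

/*
 * Decide whether the Hyper-V clocksource should be registered,
 * based on the feature bits reported by the hypervisor.
 */
int hv_should_register_clocksource(uint32_t features)
{
	return (features & HV_REF_COUNTER_AVAILABLE) != 0;
}
```

With the reported value 0x70, bit 1 is clear, so the clocksource would be skipped on the affected guest.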

x86, head_32: Give the 6 label a real name · 5e2a044d

Committed by Borislav Petkov on Feb 11, 2013

When we jump here, we are about to enable paging, so rename the label
accordingly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

5e2a044d

x86, head_32: Remove second CPUID detection from default_entry · c3a22a26

Committed by Borislav Petkov on Feb 11, 2013

We do that once earlier now and cache it into new_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

c3a22a26

x86: Detect CPUID support early at boot · 9efb58de

Committed by Borislav Petkov on Feb 11, 2013

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

9efb58de

x86, head_32: Remove i386 pieces · 166df91d

Committed by Borislav Petkov on Feb 11, 2013

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

166df91d

12 Feb, 2013 1 commit

x86, uv, uv3: Update x2apic Support for SGI UV3 · b15cc4a1

Committed by Mike Travis on Feb 11, 2013

This patch adds support for the SGI UV3 hub to the common x2apic
functions. The primary changes are to account for the similarities
between UV2 and UV3 which are encompassed within the "UVX" nomenclature.

One significant difference within UV3 is the handling of the MMIOH
regions which are redirected to the target blade (with the device) in
a different manner. It also now has two MMIOH regions for both small and
large BARs. This aids in limiting the amount of physical address space
removed from real memory that's used for I/O in the max config of 64TB.
Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.752924185@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Steffen Persvold <sp@numascale.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

b15cc4a1

11 Feb, 2013 2 commits

x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems · cb214ede

Committed by Stoney Wang on Feb 07, 2013

When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.

The BIOS sets ACPI_FADT_APIC_PHYSICAL in the FADT table.
As all apicids are smaller than 255, the BIOS needs to pass
control to the OS in xapic mode, according to the x2apic spec,
chapter 2.9.

How the current code handles x2apic when the BIOS passes control
with xapic mode enabled:

When user specifies x2apic_phys, or FADT indicates PHYSICAL:

1. During madt oem check, apic driver is set with xapic logical
   or xapic phys driver at first.

2. enable_IR_x2apic() will enable x2apic_mode.

3. If the user specifies x2apic_phys on the boot line, x2apic_phys_probe()
   will install the correct x2apic phys driver and use x2apic phys mode.
   Otherwise it will skip that driver and let x2apic_cluster_probe() take
   over and install the x2apic cluster driver (the wrong one) even though
   the FADT indicates PHYSICAL, because x2apic_phys_probe() does not check
   FADT PHYSICAL.

Add a check of x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.
Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

cb214ede
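
The probe decision described in cb214ede can be modelled in user-space C
roughly as follows. This is only a sketch: the struct, the enum, and all
function names are hypothetical stand-ins rather than kernel identifiers,
and it covers nothing beyond the check that the changelog describes.

```c
#include <stdbool.h>

/* Hypothetical inputs to the probe; field names are illustrative only. */
struct probe_inputs {
	bool x2apic_mode;         /* x2apic has been enabled */
	bool x2apic_phys_cmdline; /* "x2apic_phys" given on the boot line */
	bool fadt_apic_physical;  /* FADT has ACPI_FADT_APIC_PHYSICAL set */
};

enum x2apic_driver { X2APIC_DRIVER_CLUSTER, X2APIC_DRIVER_PHYS };

/* Before the fix: only the boot option selects the phys driver. */
static bool phys_probe_before(const struct probe_inputs *in)
{
	return in->x2apic_mode && in->x2apic_phys_cmdline;
}

/* After the fix: the FADT PHYSICAL flag is honoured as well. */
static bool phys_probe_after(const struct probe_inputs *in)
{
	return in->x2apic_mode &&
	       (in->x2apic_phys_cmdline || in->fadt_apic_physical);
}

/* If the phys probe declines, the cluster probe takes over. */
static enum x2apic_driver select_driver(bool phys_probe_accepted)
{
	return phys_probe_accepted ? X2APIC_DRIVER_PHYS : X2APIC_DRIVER_CLUSTER;
}
```

With x2apic enabled, no boot option, and the FADT flag set (the DL980 G7
case in the changelog), phys_probe_before() declines and the cluster driver
is selected, while phys_probe_after() accepts and the phys driver is used.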

x86/kvm: Fix compile warning in kvm_register_steal_time() · 136867f5

Committed by Shuah Khan on Feb 05, 2013

Fix the following compile warning in kvm_register_steal_time():

  CC      arch/x86/kernel/kvm.o
  arch/x86/kernel/kvm.c: In function ‘kvm_register_steal_time’:
  arch/x86/kernel/kvm.c:302:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Wformat]

Introduced via:

  5dfd486c x86, kvm: Fix kvm's use of __pa() on percpu areas
  d7656534 x86, mm: Create slow_virt_to_phys()
  f3c4fbb6 x86, mm: Use new pagetable helpers in try_preserve_large_page()
  4cbeb51b x86, mm: Pagetable level size/shift/mask helpers
  a25b9316 x86, mm: Make DEBUG_VIRTUAL work earlier in boot
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: shuahkhan@gmail.com
Cc: avi@redhat.com
Cc: gleb@redhat.com
Cc: mst@redhat.com
Link: http://lkml.kernel.org/r/1360119442.8356.8.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>

136867f5
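
The -Wformat warning quoted in 136867f5 arises because the width of
phys_addr_t depends on the kernel configuration, so no single "%lx"-style
specifier matches it everywhere. One common way to avoid this class of
warning is to cast the value to unsigned long long and print it with
"%llx". The user-space sketch below illustrates that idea; the typedef and
the helper name are assumptions made for the example, and the actual patch
may differ in detail.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's phys_addr_t (fixed at 64 bits here). */
typedef uint64_t phys_addr_t;

/*
 * Format a physical address as hex into buf.
 * The explicit cast makes the argument type match "%llx" regardless of
 * how phys_addr_t is defined. Returns snprintf()'s return value.
 */
static int format_phys_addr(char *buf, size_t len, phys_addr_t addr)
{
	return snprintf(buf, len, "%llx", (unsigned long long)addr);
}
```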

10 Feb, 2013 3 commits

x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag · 27be4570

Committed by Len Brown on Feb 10, 2013

Remove the 32-bit x86 cmdline param "no-hlt",
and the cpuinfo_x86.hlt_works_ok flag that it sets.

If a user wants to avoid HLT, then "idle=poll"
is much more useful, as it avoids invocation of HLT
in idle, while "no-hlt" failed to do so.

Indeed, hlt_works_ok was consulted in only 3 places.

First, in /proc/cpuinfo where "hlt_bug yes"
would be printed if and only if the user booted
the system with "no-hlt" -- as there was no other code
to set that flag.

Second, check_hlt() would not invoke halt() if "no-hlt"
were on the cmdline.

Third, it was consulted in stop_this_cpu(), which is invoked
by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
all cases where the machine is being shutdown/reset.
The flag was not consulted in the more frequently invoked
play_dead()/hlt_play_dead() used in processor offline and suspend.

Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
indicating that it would be removed in 2012.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org

27be4570

x86 idle: remove mwait_idle() and "idle=mwait" cmdline param · 69fb3676

Committed by Len Brown on Feb 10, 2013

mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT, starting on Pentium-4 HT-enabled processors.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
and no longer use mwait_idle().

Here we simplify the x86 native idle code by removing mwait_idle(),
and the "idle=mwait" bootparam used to invoke it.

Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
was invoked saying it would be removed in 2012.  This removal
was also noted in the (now removed:-) feature-removal-schedule.txt.

After this change, kernels configured with
(CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
that supports MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be enabled.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org

69fb3676

xen idle: make xen-specific macro xen-specific · 6a377ddc

Committed by Len Brown on Feb 09, 2013

This macro is only invoked by Xen,
so make its definition specific to Xen.

> set_pm_idle_to_default()
< xen_set_default_idle()
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: xen-devel@lists.xensource.com

6a377ddc

09 Feb, 2013 1 commit

uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain() · 74e59dfc

Committed by Oleg Nesterov on Dec 30, 2012

Change handle_swbp() to set regs->ip = bp_vaddr in advance. This is
what consumer->handler() needs, but uprobe_get_swbp_addr() is not
exported.

This also simplifies the code and makes it more consistent across
the supported architectures. handle_swbp() becomes the only caller
of uprobe_get_swbp_addr().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

74e59dfc
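
The effect of 74e59dfc can be sketched in user-space C as follows. The
sketch assumes the x86 case, where the software breakpoint is a one-byte
instruction and the saved instruction pointer therefore points just past
it when the trap is taken. The struct and function names are hypothetical
and only mirror the roles of regs->ip, uprobe_get_swbp_addr() and
handle_swbp() named in the changelog.

```c
#include <stdint.h>

/* Assumed size of the breakpoint instruction (one byte on x86). */
#define SWBP_INSN_SIZE 1

/* Hypothetical, minimal stand-in for the saved register state. */
struct fake_regs {
	uint64_t ip; /* instruction pointer at trap time */
};

/* Breakpoint address: the trap leaves ip just past the breakpoint. */
static uint64_t get_swbp_addr(const struct fake_regs *regs)
{
	return regs->ip - SWBP_INSN_SIZE;
}

/*
 * Rewind ip to the breakpoint address before any handler runs, so that
 * handlers can read the probed address directly from the registers.
 */
static uint64_t handle_swbp_sketch(struct fake_regs *regs)
{
	uint64_t bp_vaddr = get_swbp_addr(regs);

	regs->ip = bp_vaddr;
	return bp_vaddr;
}
```

Because the rewind happens once, up front, a handler in this model never
needs to call the address helper itself.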

openanolis / cloud-kernel synced successfully more than 1 year ago