Commits · 0bb8aeee7b73b21e09d3ea12f2120d974f70b669 · openanolis / cloud-kernel

02 Nov, 2013 7 commits

x86/mm/pageattr: Add a PUD error unwinding path · 0bb8aeee

Committed by Borislav Petkov on Oct 31, 2013

In case we encounter an error during the mapping of a region, we want to
unwind what we've established so far exactly the way we did the mapping.
This is the PUD part kept deliberately small for easier review.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

0bb8aeee

x86/mm/pageattr: Add a PTE pagetable populating function · c6b6f363

Committed by Borislav Petkov on Oct 31, 2013

Handle last level by unconditionally writing the PTEs into the PTE page
while paying attention to the NX bit.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

c6b6f363

x86/mm/pageattr: Add a PMD pagetable populating function · f900a4b8

Committed by Borislav Petkov on Oct 31, 2013

Handle PMD-level mappings the same as PUD ones.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

f900a4b8

x86/mm/pageattr: Add a PUD pagetable populating function · 4b23538d

Committed by Borislav Petkov on Oct 31, 2013

Add the next level of the pagetable populating function. We handle
chunks around a 1G boundary by mapping them with the lower level
functions; otherwise we use 1G pages for the mappings, thus using as
few pagetable pages as possible.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

4b23538d
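
The 1G-boundary handling described in the PUD populating commit above can be illustrated with a small userspace sketch. This is not the kernel's code: the struct and function names are hypothetical, the start address is assumed to be 4K-aligned, and the sketch ignores whether the CPU supports 1G pages at all. It only shows how a range splits into an unaligned head, whole 1G pages, and a tail.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PUD_SIZE	(1ULL << 30)	/* 1G, the span of one PUD entry */

/* Hypothetical result type: how a range is carved up (4K units for head/tail). */
struct pud_split {
	uint64_t head_pages;	/* 4K pages before the first 1G boundary */
	uint64_t gig_pages;	/* whole 1G pages */
	uint64_t tail_pages;	/* 4K pages after the last whole 1G page */
};

/* Hypothetical helper; 'start' is assumed to be 4K-aligned. */
static struct pud_split split_at_1g(uint64_t start, uint64_t numpages)
{
	struct pud_split s = { 0, 0, 0 };
	uint64_t end = start + (numpages << PAGE_SHIFT);
	uint64_t next = (start + PUD_SIZE - 1) & ~(PUD_SIZE - 1);
	uint64_t cur = start;

	/* Unaligned head: left to the lower-level (PMD/PTE) functions. */
	if (next > end)
		next = end;
	s.head_pages = (next - cur) >> PAGE_SHIFT;
	cur = next;

	/* Aligned middle: one PUD entry per 1G. */
	s.gig_pages = (end - cur) / PUD_SIZE;
	cur += s.gig_pages * PUD_SIZE;

	/* Whatever is left is smaller than 1G. */
	s.tail_pages = (end - cur) >> PAGE_SHIFT;
	return s;
}
```

For a range that starts 1M below a 1G boundary and spans 1M + 2G + 8K, the sketch yields 256 head pages, two 1G pages, and 2 tail pages.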

x86/mm/pageattr: Add a PGD pagetable populating function · f3f72966

Committed by Borislav Petkov on Oct 31, 2013

This allocates, if necessary, and populates the corresponding PGD entry
with a PUD page. The next population level is a dummy macro which will
be removed by the next patch and it is added here to keep the patch
small and easily reviewable but not break bisection, at the same time.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

f3f72966

x86/mm/pageattr: Lookup address in an arbitrary PGD · 0fd64c23

Committed by Borislav Petkov on Oct 31, 2013

This is preparatory work in order to be able to map pages into a
specified PGD and not implicitly and only into init_mm.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

0fd64c23

x86/efi: Simplify EFI_DEBUG · f4fccac0

Committed by Borislav Petkov on Oct 31, 2013

... and lose one #ifdef .. #endif sandwich.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

f4fccac0

29 Oct, 2013 1 commit

x86/efi: Add EFI framebuffer earlyprintk support · 72548e83

Committed by Matt Fleming on Oct 04, 2013

It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.

Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.

Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

72548e83

05 Oct, 2013 1 commit

x86/efi: Fix config_table_type array termination · 722da9d2

Committed by Leif Lindholm on Oct 03, 2013

Incorrect use of 0 in terminating entry of arch_tables[] causes the
following sparse warning,

  arch/x86/platform/efi/efi.c:74:27: sparse: Using plain integer as NULL pointer

Replace with NULL.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
[ Included sparse warning in commit message. ]
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

722da9d2
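
The array-termination fix above follows a common C pattern: a table whose last entry is a sentinel with NULL pointer members. The sketch below is a userspace illustration only; the struct layout, names, and the single example entry are hypothetical and do not reproduce the kernel's table type.

```c
#include <stddef.h>

/* Hypothetical, simplified table entry: a name and a place to store an address. */
struct table_type {
	const char *name;
	unsigned long *ptr;
};

static unsigned long example_addr;

static const struct table_type example_tables[] = {
	{ "EXAMPLE", &example_addr },
	{ NULL, NULL },	/* terminator: NULL, not a plain 0, for pointer members */
};

/* Walk the table until the NULL-named sentinel and count the real entries. */
static size_t count_tables(const struct table_type *t)
{
	size_t n = 0;

	while (t[n].name)
		n++;
	return n;
}
```

Writing 0 in place of NULL compiles to the same thing, which is why only a static checker such as sparse complains about it.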

30 Sep, 2013 2 commits

x86 efi: bugfix interrupt disabling sequence · 0ce6cda2

Committed by Bart Kuivenhoven on Sep 23, 2013

The problem in efi_main was that the idt was cleared before the
interrupts were disabled.

The UEFI spec states that interrupts aren't used so this shouldn't be
too much of a problem. Peripherals however don't necessarily know about
this and thus might cause interrupts to happen anyway. Even if
ExitBootServices() has been called.

This means there is a risk of an interrupt being triggered while the IDT
register is nullified and the interrupt bit hasn't been cleared,
allowing for a triple fault.

This patch disables the interrupt flag, while leaving the existing IDT
in place. The CPU won't care about the IDT at all as long as the
interrupt bit is off, so it's safe to leave it in place as nothing will
ever happen to it.

[ Removed the now unused 'idt' variable - Matt ]
Signed-off-by: Bart Kuivenhoven <bemk@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

0ce6cda2

x86: EFI stub support for large memory maps · d2078d5a

Committed by Linn Crosetto on Sep 22, 2013

This patch fixes a problem with EFI memory maps larger than 128 entries
when booting using the EFI stub, which results in overflowing e820_map
in boot_params and an eventual halt when checking the map size in
sanitize_e820_map().

If the number of map entries is greater than what can fit in e820_map,
add the extra entries to the setup_data list using type SETUP_E820_EXT.
These extra entries are then picked up when the setup_data list is
parsed in parse_e820_ext().
Signed-off-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

d2078d5a
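
The overflow handling in the large-memory-map commit above comes down to a simple split: entries that fit go into the fixed-size e820_map, and the remainder is chained onto the setup_data list as SETUP_E820_EXT. The following userspace sketch only computes that split; the constant name, struct, and function are hypothetical, and the capacity of 128 is taken from the commit text.

```c
/* Capacity of the fixed table, per the commit message ("larger than 128 entries"). */
#define E820_MAP_CAPACITY	128U

/* Hypothetical result type describing where the entries end up. */
struct e820_split {
	unsigned int in_boot_params;	/* entries stored in e820_map */
	unsigned int in_setup_data;	/* entries passed via SETUP_E820_EXT */
};

/* Hypothetical helper: decide how many entries overflow the fixed table. */
static struct e820_split split_e820(unsigned int nr_entries)
{
	struct e820_split s;

	s.in_boot_params = nr_entries < E820_MAP_CAPACITY ?
			   nr_entries : E820_MAP_CAPACITY;
	s.in_setup_data = nr_entries - s.in_boot_params;
	return s;
}
```

With 200 entries, 128 stay in the fixed table and 72 move to the extension list; with 100 entries nothing overflows.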

25 Sep, 2013 11 commits

efi: Generalize handle_ramdisks() and rename to handle_cmdline_files(). · 46f4582e

Committed by Roy Franz on Sep 22, 2013

The handle_cmdline_files now takes the option to handle as a string,
and returns the loaded data through parameters, rather than taking
an x86 specific setup_header structure.  For ARM, this will be used
to load a device tree blob in addition to initrd images.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

46f4582e

efi: Allow efi_free() to be called with size of 0 · 0e1cadb0

Committed by Roy Franz on Sep 22, 2013

Make efi_free() safely callable with size of 0, similar to free() being
callable with NULL pointers, and do nothing in that case.
Remove size checks that this makes redundant.  This also avoids some
size checks in the ARM EFI stub code that will be added as well.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

0e1cadb0
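
The efi_free() behaviour described above (a size of 0 is a no-op, much like free(NULL)) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the stub's code: fake_free_pages() is a hypothetical stand-in for the firmware call, and the function signature is simplified.

```c
#define EFI_PAGE_SIZE	4096UL	/* EFI pages are 4 KiB */

static unsigned long pages_released;	/* bookkeeping for the stand-in below */

/* Hypothetical stand-in for the boot-services page release call. */
static void fake_free_pages(unsigned long addr, unsigned long nr_pages)
{
	(void)addr;
	pages_released += nr_pages;
}

/* Sketch: do nothing for size 0, otherwise round up to whole pages. */
static void efi_free_sketch(unsigned long size, unsigned long addr)
{
	unsigned long nr_pages;

	if (!size)
		return;

	nr_pages = (size + EFI_PAGE_SIZE - 1) / EFI_PAGE_SIZE;
	fake_free_pages(addr, nr_pages);
}
```

Putting the check inside the function is what lets callers drop their own `if (size)` guards, which is the redundancy the commit removes.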

efi: use efi_get_memory_map() to get final map for x86 · ae8e9060

Committed by Roy Franz on Sep 22, 2013

Replace the open-coded memory map getting with the
efi_get_memory_map() that is now general enough to use.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

ae8e9060

efi: Move unicode to ASCII conversion to shared function. · 5fef3870

Committed by Roy Franz on Sep 22, 2013

Move the open-coded conversion to a shared function for
use by all architectures.  Change the allocation to prefer
a high address for ARM, as this is required to avoid conflicts
with reserved regions in low memory.  We don't know the specifics
of these regions until after we process the command line and
device tree.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

5fef3870

efi: Generalize relocate_kernel() for use by other architectures. · 4a9f3a7c

Committed by Roy Franz on Sep 22, 2013

Rename relocate_kernel() to efi_relocate_kernel(), and take
parameters rather than x86 specific structure.  Add max_addr
argument as for ARM we have some address constraints that we
need to enforce when relocating the kernel.  Add alloc_size
parameter for use by ARM64 which uses an uncompressed kernel,
and needs to allocate space for BSS.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

4a9f3a7c

efi: Move relocate_kernel() to shared file. · c6866d72

Committed by Roy Franz on Sep 22, 2013

The relocate_kernel() function will be generalized and used
by all architectures, as they all have similar requirements.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

c6866d72

efi: Rename memory allocation/free functions · 40e4530a

Committed by Roy Franz on Sep 22, 2013

Rename them to be more similar, as low_free() could be used to free
memory allocated by both high_alloc() and low_alloc().
high_alloc() -> efi_high_alloc()
low_alloc()  -> efi_low_alloc()
low_free()   -> efi_free()
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

40e4530a

efi: Add system table pointer argument to shared functions. · 876dc36a

Committed by Roy Franz on Sep 22, 2013

Add system table pointer argument to shared EFI stub related functions
so they no longer use a global system table pointer as they did when part
of eboot.c.  For the ARM EFI stub this allows us to avoid global
variables completely and thereby not have to deal with GOT fixups.
Not having the EFI stub fixup its GOT, which is shared with the
decompressor, simplifies the relocating of the zImage to a
bootable address.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

876dc36a

efi: Move common EFI stub code from x86 arch code to common location · 7721da4c

Committed by Roy Franz on Sep 22, 2013

No code changes made, just moving functions and #define from x86 arch
directory to common location.  Code is shared using #include, similar
to how decompression code is shared among architectures.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

7721da4c

efi: Add proper definitions for some EFI function pointers. · ed37ddff

Committed by Roy Franz on Sep 22, 2013

The x86/AMD64 EFI stubs must use a call wrapper to convert between
the Linux and EFI ABIs, so void pointers are sufficient.  For ARM,
the ABIs are compatible, so we can directly invoke the function
pointers.  The functions that are used by the ARM stub are updated
to match the EFI definitions.
Also add some EFI types used by EFI functions.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

ed37ddff

EFI stub documentation updates · 4172fe2f

Committed by Roy Franz on Sep 22, 2013

Move efi-stub.txt out of x86 directory and into common directory
in preparation for adding ARM EFI stub support.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

4172fe2f

05 Sep, 2013 2 commits

efi: x86: make efi_lookup_mapped_addr() a common function · 258f6fd7

Committed by Leif Lindholm on Sep 05, 2013

efi_lookup_mapped_addr() is a handy utility for platforms other than
x86. Move it from arch/x86 to drivers/firmware. Add memmap pointer
to global efi structure, and initialise it on x86.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

258f6fd7

efi: x86: ia64: provide a generic efi_config_init() · 272686bf

Committed by Leif Lindholm on Sep 05, 2013

Common to (U)EFI support on all platforms is the global "efi" data
structure, and the code that parses the System Table to locate
addresses to populate that structure with.

This patch adds both of these to the global EFI driver code and
removes the local definition of the global "efi" data structure from
the x86 and ia64 code.

Squashed into one big patch to avoid breaking bisection.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

272686bf

23 Aug, 2013 2 commits

x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member · 41aacc1e

Committed by Radu Caragea on Aug 21, 2013

This is the updated version of df54d6fa ("x86 get_unmapped_area():
use proper mmap base for bottom-up direction") that only randomizes the
mmap base address once.
Signed-off-by: Radu Caragea <sinaelgl@gmail.com>
Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

41aacc1e

Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction" · 5ea80f76

Committed by Linus Torvalds on Aug 22, 2013

This reverts commit df54d6fa.

The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.

In particular, the MATLAB Java runtime seems to be unhappy. See

  https://bugzilla.kernel.org/show_bug.cgi?id=60774

So we'll want to apply the random offset only once, and Radu has a patch
for that.  Revert this older commit in order to apply the other one.
Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5ea80f76

20 Aug, 2013 3 commits

xen/smp: initialize IPI vectors before marking CPU online · fc78d343

Committed by Chuck Anderson on Aug 06, 2013

An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

	kernel BUG at drivers/xen/events.c:1328!

RCU has detected that a CPU has not entered a quiescent state within the
grace period.  It needs to send the CPU a reschedule IPI if it is not
offline.  rcu_implicit_offline_qs() does this check:

	/*
	 * If the CPU is offline, it is in a quiescent state.  We can
	 * trust its state not to change because interrupts are disabled.
	 */
	if (cpu_is_offline(rdp->cpu)) {
		rdp->offline_fqs++;
		return 1;
	}

	Else the CPU is online.  Send it a reschedule IPI.

The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()).  See start_secondary():

	set_cpu_online(smp_processor_id(), true);
	...
	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:

	/*
	 * Wait until the cpu which brought this one up marked it
	 * online before enabling interrupts. If we don't do that then
	 * we can end up waking up the softirq thread before this cpu
	 * reached the active state, which makes the scheduler unhappy
	 * and schedule the softirq thread on the wrong cpu. This is
	 * only observable with forced threaded interrupts, but in
	 * theory it could also happen w/o them. It's just way harder
	 * to achieve.
	 */
	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
		cpu_relax();

	/* enable local interrupts */
	local_irq_enable();

The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.

The hot-plugging CPU is marked online, is not yet marked active, and does
not have its IPI vectors set up.  rcu_implicit_offline_qs() sees that the
hot-plugging CPU is !cpu_is_offline() and tries to send it a reschedule IPI.
This leads to:

	kernel BUG at drivers/xen/events.c:1328!

	xen_send_IPI_one()
	xen_smp_send_reschedule()
	rcu_implicit_offline_qs()
	rcu_implicit_dynticks_qs()
	force_qs_rnp()
	force_quiescent_state()
	__rcu_process_callbacks()
	rcu_process_callbacks()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()

because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.

There is at least one other place that has caused the same crash:

	xen_smp_send_reschedule()
	wake_up_idle_cpu()
	add_timer_on()
	clocksource_watchdog()
	call_timer_fn()
	run_timer_softirq()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()
	xen_hvm_callback_vector()

clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:

	/*
	 * Cycle through CPUs to check if the CPUs stay synchronized
	 * to each other.
	 */
	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(cpu_online_mask);
	watchdog_timer.expires += WATCHDOG_INTERVAL;
	add_timer_on(&watchdog_timer, next_cpu);

This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check for CPU active instead of CPU offline, as becoming
active is done after a CPU is online (in older kernels).

But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling into the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."

The conclusion was that:
"
1. At the IPI sender side:

   It is incorrect to send an IPI to an offline CPU (cpu not present in
   the cpu_online_mask). There are numerous places where we check this
   and warn/complain.

2. At the IPI receiver side:

   It is incorrect to let the world know of our presence (by setting
   ourselves in global bitmasks) until our initialization steps are complete
   to such an extent that we can handle the consequences (such as
   receiving interrupts without crashing the sender etc.)
" (from Srivatsa)

As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.

It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.

This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.

Orabug 13823853
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

fc78d343

x86/xen: do not identity map UNUSABLE regions in the machine E820 · 3bc38cbc

Committed by David Vrabel on Aug 16, 2013

If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.

There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.

We only do this for dom0, as that is where the tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.

This fixes a boot failure on hosts with tboot.

tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.

  (XEN)  0000000000000000 - 0000000000060000 (usable)
  (XEN)  0000000000060000 - 0000000000068000 (reserved)
  (XEN)  0000000000068000 - 000000000009e000 (usable)
  (XEN)  0000000000100000 - 0000000000800000 (usable)
  (XEN)  0000000000800000 - 0000000000972000 (unusable)

tboot marked this region as unusable.

  (XEN)  0000000000972000 - 00000000cf200000 (usable)
  (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
  (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
  (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
  (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
  (XEN)  00000000fe000000 - 0000000100000000 (reserved)
  (XEN)  0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

3bc38cbc

x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM · 527bf129

Committed by Yinghai Lu on Aug 12, 2013

Dave Hansen reported that systems between 500G and 600G RAM
crash early if DEBUG_PAGEALLOC is selected.

 > [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
 > [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
 > [    0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
 > [    0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
 > [    0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
 > [    0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
 > [    0.000000]  [mem 0xe80ee00000-0xe80effffff] page 4k
 > [    0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
 > [    0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
 > [    0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory

It turns out that we forgot to increase the number of pages needed in
BRK to map the initial 2M and [0,1M) when we switched to using the #PF
handler to set up memory mappings:

 > commit 8170e6be
 > Author: H. Peter Anvin <hpa@zytor.com>
 > Date:   Thu Jan 24 12:19:52 2013 -0800
 >
 >     x86, 64bit: Use a #PF handler to materialize early mappings on demand

Before that, we had the mapping from [0,512M) in head_64.S, and we
could spare two pages for [0,1M).  After that change, we can no longer
reuse those pages.

When we have more than 512G of RAM, we need an extra page for the pgd
entry covering [512G, 1024G).

Increase pages in BRK for page table to solve the boot crash.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Bisected-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> # v3.9 and later
Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

527bf129

14 Aug, 2013 3 commits

x86 get_unmapped_area(): use proper mmap base for bottom-up direction · df54d6fa

Committed by Radu Caragea on Aug 13, 2013

When the stack is set to unlimited, the bottomup direction is used for
mmap-ings but the mmap_base is not used and thus effectively renders
ASLR for mmapings along with PIE useless.

Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

df54d6fa

mm: save soft-dirty bits on file pages · 41bb3476

Committed by Cyrill Gorcunov on Aug 13, 2013

Andy reported that if a file page gets reclaimed we lose the soft-dirty bit
if it was set, so save the _PAGE_BIT_SOFT_DIRTY bit when the page address gets
encoded into the pte entry.  Then, when a #PF happens on such a non-present
pte, we can restore it.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

41bb3476

mm: save soft-dirty bits on swapped pages · 179ef71c

Committed by Cyrill Gorcunov on Aug 13, 2013

Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY bit set
gets swapped out, the bit is lost and is no longer available when the
pte is read back.

To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is saved in
the pte entry for the page being swapped out.  When such a page is read
back from the swap cache we check whether the bit is present and, if so,
clear it and restore the former _PAGE_SOFT_DIRTY bit.

One of the problems was finding a place in the pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.  _PAGE_PSE was
chosen for that, as it doesn't intersect with the swap entry format stored
in the pte.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

179ef71c

13 Aug, 2013 3 commits

sched: fix the theoretical signal_wake_up() vs schedule() race · e0acd0a6

Committed by Oleg Nesterov on Aug 12, 2013

This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.

The patch is actually a one-liner: it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does, for the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.

While at it, kill smp_mb__after_lock(), it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e0acd0a6

x86, microcode, AMD: Fix early microcode loading · 84516098

Committed by Torsten Kaiser on Aug 08, 2013

load_microcode_amd() (and the helper it is using) should not have a
cpu parameter. The microcode loading does not depend on the CPU wrt the
patches loaded since they will end up in a global list for all CPUs
anyway.

The change from cpu to x86family in load_microcode_amd()
now allows dropping the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet set up (these values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because it is only used in one place and, without the cpuinfo_x86
accesses, not much of it was left.
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
[ Fengguang: build fix ]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Boris: adapt it to current tree. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

84516098

x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 · 8c6b79bb

Committed by Torsten Kaiser on Jul 23, 2013

cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still X86_VENDOR_INTEL resulting in
no errata fixes getting applied and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
caller init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

[ Boris: Drop WARN_ON() since we're called only from init_amd() ]
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

8c6b79bb

12 Aug, 2013 1 commit

perf/x86: Add Haswell ULT model number used in Macbook Air and other systems · 0499bd86

Committed by Andi Kleen on Aug 08, 2013

This one was missed earlier.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

0499bd86

10 Aug, 2013 1 commit

x86: Don't clear olpc_ofw_header when sentinel is detected · d55e37bb

Committed by Daniel Drake on Aug 09, 2013

OpenFirmware wasn't quite following the protocol described in boot.txt
and the kernel has detected this through use of the sentinel value
in boot_params. OFW does zero out almost all of the stuff that it should
do, but not the sentinel.

This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
support.

OpenFirmware has now been fixed. However, it would be nice if we could
maintain Linux compatibility with old firmware versions. To do that, we just
have to avoid zeroing out olpc_ofw_header.

OFW does not write to any other parts of the header that are being zapped
by the sentinel-detection code, and all users of olpc_ofw_header are
somewhat protected through checking for the OLPC_OFW_SIG magic value
before using it. So this should not cause any problems for anyone.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.9+

d55e37bb

05 Aug, 2013 1 commit

perf/x86: Fix intel QPI uncore event definitions · c9601247

Committed by Vince Weaver on Aug 02, 2013

John McCalpin reports that the "drs_data" and "ncb_data" QPI
uncore events are missing the "extra bit" and always return zero
values unless the bit is properly set.

More details from him:

 According to the Xeon E5-2600 Product Family Uncore Performance
 Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
 (including the ones that "perf" calls "drs_data" and "ncb_data") require
 that the "extra bit" be set.

 This was confusing for a while -- a note at the bottom of page 94 says
 that the "extra bit" is bit 16 of the control register.
 Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
 be zero.  Looking around a bit, I found that bit 21 appears to be the
 correct "extra bit", and further investigation shows that "perf" actually
 agrees with me:
	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
	config:0-7,21

 So the command
	# perf -e "uncore_qpi_0/event=drs_data/"
 Is the same as
	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
 While it should be
	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"

 I confirmed that this last version gives results that agree with the
 amount of data that I expected the STREAM benchmark to move across the QPI
 link in the second (cross-chip) test of the original script.
Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: zheng.z.yan@intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>

c9601247
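
The "extra bit" discussion in the QPI commit above can be made concrete with a small encoding sketch. The event field layout comes from the sysfs format line quoted in the message ("config:0-7,21"); placing the umask at config bits 8-15 is an assumption, and the helper name is hypothetical. This is a userspace illustration, not the perf implementation.

```c
#include <stdint.h>

/* From the quoted format line "config:0-7,21": event bit 8 lives at config bit 21. */
#define QPI_EVENT_EXT_BIT	21

/* Hypothetical helper: pack a 9-bit event code and an 8-bit umask into config. */
static uint64_t qpi_encode(unsigned int event, unsigned int umask)
{
	uint64_t config = event & 0xffu;		/* event bits 0-7 -> config 0-7 */

	if (event & 0x100u)				/* event bit 8 -> config bit 21 */
		config |= (uint64_t)1 << QPI_EVENT_EXT_BIT;
	config |= (uint64_t)(umask & 0xffu) << 8;	/* umask -> config 8-15 (assumed) */
	return config;
}
```

Under these assumptions, event=0x102 with umask=0x08 encodes to 0x200802, whereas the broken definition event=0x02 yields 0x802 with bit 21 clear, which matches the report that the counter always read zero.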

01 Aug, 2013 1 commit

arch/x86/platform/ce4100/ce4100.c: include reboot.h · 31a1b26f

Committed by Andrew Morton on Jul 31, 2013

Fix the build:

  arch/x86/platform/ce4100/ce4100.c: In function 'x86_ce4100_early_setup':
  arch/x86/platform/ce4100/ce4100.c:165:2: error: 'reboot_type' undeclared (first use in this function)
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31a1b26f

31 Jul, 2013 1 commit

x86, amd, microcode: Fix error path in apply_microcode_amd() · d982057f

Committed by Torsten Kaiser on Jul 23, 2013

Return -1 (like Intel's apply_microcode) when the loading fails, and
do not set the active microcode level on failure.
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Link: http://lkml.kernel.org/r/20130723225823.2e4e7588@googlemail.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

d982057f

openanolis / cloud-kernel: synced successfully more than 1 year ago