1. 23 Jan 2015 (4 commits)
    • x86, tls: Interpret an all-zero struct user_desc as "no segment" · 3669ef9f
      Andy Lutomirski committed
      The Witcher 2 did something like this to allocate a TLS segment index:
      
              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
      
              syscall(SYS_set_thread_area, &u_info);
      
      Strictly speaking, this code was never correct.  It should have set
      read_exec_only and seg_not_present to 1 to indicate that it wanted
      to find a free slot without putting anything there, or it should
      have put something sensible in the TLS slot if it wanted to allocate
      a TLS entry for real.  The actual effect of this code was to
      allocate a bogus segment that could be used to exploit espfix.
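
      For illustration, a minimal sketch of the strictly correct request
      described above, i.e. asking for a free slot without installing a real
      segment (userspace C; assumes <asm/ldt.h>, <strings.h>, <stdint.h>,
      <sys/syscall.h> and <unistd.h>; error handling omitted):

              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
              u_info.read_exec_only = 1;    /* mark the slot as...       */
              u_info.seg_not_present = 1;   /* ...deliberately empty     */

              syscall(SYS_set_thread_area, &u_info);
              /* On success, u_info.entry_number holds the allocated slot. */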
      
      The set_thread_area hardening patches changed the behavior, causing
      set_thread_area to return -EINVAL and crashing the game.
      
      This changes set_thread_area to interpret this as a request to find
      a free slot and to leave it empty, which isn't *quite* what the game
      expects but should be close enough to keep it working.  In
      particular, using the code above to allocate two segments will
      allocate the same segment both times.
      
      According to FrostbittenKing on GitHub, this fixes The Witcher 2.
      
      If this somehow still causes problems, we could instead allocate
      a limit==0 32-bit data segment, but that seems rather ugly to me.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: stable@vger.kernel.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, tls, ldt: Stop checking lm in LDT_empty · e30ab185
      Andy Lutomirski committed
      32-bit programs don't have an lm bit in their ABI, so they can't
      reliably cause LDT_empty to return true without resorting to memset.
      They shouldn't need to do this.
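
      A hedged sketch of an lm-agnostic emptiness check, using the
      struct user_desc field names; the exact macro in the patch may differ:

              /* Sketch: "empty" deliberately does not look at info->lm. */
              #define LDT_empty(info)                                 \
                      ((info)->base_addr              == 0    &&      \
                       (info)->limit                  == 0    &&      \
                       (info)->contents               == 0    &&      \
                       (info)->read_exec_only         == 1    &&      \
                       (info)->seg_32bit              == 0    &&      \
                       (info)->limit_in_pages         == 0    &&      \
                       (info)->seg_not_present        == 1    &&      \
                       (info)->useable                == 0)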
      
      This should fix a longstanding, if minor, issue in all 64-bit kernels
      as well as a potential regression in the TLS hardening code.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Fix potential performance issue on unmaps · c922228e
      Dave Hansen committed
      The 3.19 merge window saw some TLB modifications merged which caused a
      performance regression. They were fixed in commit 045bbb9fa.
      
      Once that fix was applied, I also noticed that there was a small
      but intermittent regression still present.  It was not present
      consistently enough to bisect reliably, but I'm fairly confident
      that it came from (my own) MPX patches.  The source was reading
      a relatively unused field in the mm_struct via arch_unmap.
      
      I also noted that this code was in the main instruction flow of
      do_munmap() and probably had more icache impact than we want.
      
      This patch does two things:
      1. Adds a static (via Kconfig) and dynamic (via cpuid) check
         for MPX with cpu_feature_enabled().  This keeps us from
         reading that cacheline in the mm and trades it for a check
         of the global CPUID variables at least on CPUs without MPX.
      2. Adds an unlikely() to ensure that the MPX call ends up out
         of the main instruction flow in do_munmap().  I've added
         a detailed comment about why this was done and why we want
         it even on systems where MPX is present (see the sketch below).
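
      A hedged sketch of the shape of the resulting arch_unmap() hook; the
      helper name mpx_notify_unmap() comes from the MPX series and the
      details may differ:

              static inline void arch_unmap(struct mm_struct *mm,
                                            struct vm_area_struct *vma,
                                            unsigned long start, unsigned long end)
              {
                      /*
                       * cpu_feature_enabled() is compile-time false when MPX
                       * is not configured and otherwise checks the global
                       * CPUID data, so the mm_struct cacheline is not read
                       * on CPUs without MPX.  unlikely() moves the call out
                       * of the hot do_munmap() path.
                       */
                      if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
                              mpx_notify_unmap(mm, vma, start, end);
              }
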
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: luto@amacapital.net
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels · 814564a0
      Dave Hansen committed
      We had originally planned on submitting MPX support in one patch
      set.  We eventually broke it up in to two pieces for easier
      review.  One of the features that didn't make the first round
      was supporting 32-bit binaries on 64-bit kernels.
      
      Once we split the set up, we never added code to restrict 32-bit
      binaries from _using_ MPX on 64-bit kernels.
      
      The 32-bit bounds tables are a different format than the 64-bit
      ones.  Without this patch, the kernel will try to read a 32-bit
      binary's tables as if they were the 64-bit version.  They will
      likely be noticed as being invalid rather quickly and the app
      will get killed, but that's kinda mean.
      
      This patch adds an explicit check, and will make a 64-bit kernel
      essentially behave as if it has no MPX support when called from
      a 32-bit binary.
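
      A hedged sketch of such a check; mpx_supported_for() is a hypothetical
      helper name used only for illustration:

              static inline bool mpx_supported_for(struct task_struct *tsk)
              {
                      /* 32-bit task on a 64-bit kernel: pretend MPX is absent. */
                      if (IS_ENABLED(CONFIG_X86_64) &&
                          test_tsk_thread_flag(tsk, TIF_IA32))
                              return false;
                      return true;
              }
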
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20150108223020.9E9AA511@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 20 Jan 2015 (6 commits)
  3. 16 Jan 2015 (2 commits)
  4. 15 Jan 2015 (1 commit)
    • ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing · 237d28db
      Steven Rostedt (Red Hat) committed
      If the function graph tracer traces a jprobe callback, the system will
      crash. This can easily be demonstrated by compiling the jprobe
      sample module that is in the kernel tree, loading it and running the
      function graph tracer.
      
       # modprobe jprobe_example.ko
       # echo function_graph > /sys/kernel/debug/tracing/current_tracer
       # ls
      
      These commands end up in a nice crash after the first fork
      (do_fork has a jprobe attached to it, so "ls" just triggers that fork).
      
      The problem is caused by the jprobe_return() that all jprobe callbacks
      must end with. The way jprobes works is that the function a jprobe
      is attached to has a breakpoint placed at the start of it (or it uses
      ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
      will copy the stack frame and change the ip address to return to the
      jprobe handler instead of the function. The jprobe handler must end
      with jprobe_return() which swaps the stack and does an int3 (breakpoint).
      This breakpoint handler will then put back the saved stack frame,
      simulate the instruction at the beginning of the function it added
      a breakpoint to, and then continue on.
      
      For function graph tracing to work, the tracer hijacks the return
      address in the stack frame and replaces it with a hook function that
      will trace the end of the call. This hook function later restores the
      original return address of the function call.
      
      If the function tracer traces the jprobe handler, the hook function
      for that handler will not be called, and its saved return address
      will be used for the next function. This will result in a kernel crash.
      
      To solve this, pause function tracing before the jprobe handler is called
      and unpause it before it returns back to the function it probed.
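
      A hedged sketch of that pairing, using the existing
      pause_graph_tracing()/unpause_graph_tracing() helpers; the surrounding
      jprobe code is heavily abridged:

              int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
              {
                      struct jprobe *jp = container_of(p, struct jprobe, kp);

                      /* ... save the stack frame and registers ... */

                      /* Keep the graph tracer away from the jprobe handler. */
                      pause_graph_tracing();
                      regs->ip = (unsigned long)(jp->entry);
                      return 1;
              }

              int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
              {
                      /* ... restore the saved stack frame ... */

                      /* Safe to trace again once we head back to the probed function. */
                      unpause_graph_tracing();
                      return 1;
              }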
      
      Some other updates:
      
      Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
      code look a bit cleaner and easier to understand (various tries to fix
      this bug required this change).
      
      Note, if fentry is being used, jprobes will change the ip address before
      the function graph tracer runs and it will not be able to trace the
      function that the jprobe is probing.
      
      Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org
      
      Cc: stable@vger.kernel.org # 2.6.30+
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 13 Jan 2015 (1 commit)
    • x86/xen: properly retrieve NMI reason · f221b04f
      Jan Beulich committed
      Using the native code here can't work properly, as the hypervisor would
      normally have cleared the two reason bits by the time Dom0 gets to see
      the NMI (if passed to it at all). There's a shared info field for this,
      and there's an existing hook to use - just fit the two together. This
      is particularly relevant so that NMIs intended to be handled by APEI /
      GHES actually make it to the respective handler.
      
      Note that the hook can (and should) be used irrespective of whether
      being in Dom0, as accessing port 0x61 in a DomU would be even worse,
      while the shared info field would just hold zero all the time. Note
      further that hardware NMI handling for PVH doesn't currently work
      anyway due to missing code in the hypervisor (but it is expected to
      work the native rather than the PV way).
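
      A hedged sketch of wiring the existing hook to the shared info field;
      the bit and constant names follow the Xen and x86 headers and may not
      match the patch exactly:

              static unsigned char xen_get_nmi_reason(void)
              {
                      unsigned char reason = 0;

                      /* Construct a value resembling a read of port 0x61. */
                      if (test_bit(_XEN_NMIREASON_io_error,
                                   &HYPERVISOR_shared_info->arch.nmi_reason))
                              reason |= NMI_REASON_IOCHK;
                      if (test_bit(_XEN_NMIREASON_pci_serr,
                                   &HYPERVISOR_shared_info->arch.nmi_reason))
                              reason |= NMI_REASON_SERR;

                      return reason;
              }

              /* Install it via the existing hook. */
              x86_platform.get_nmi_reason = xen_get_nmi_reason;
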
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  6. 12 Jan 2015 (4 commits)
  7. 09 Jan 2015 (4 commits)
  8. 08 Jan 2015 (4 commits)
  9. 06 Jan 2015 (1 commit)
  10. 05 Jan 2015 (2 commits)
    • crypto: sha-mb - Add avx2_supported check. · 0b8c960c
      Vinson Lee committed
      This patch fixes this allyesconfig target build error with older
      binutils.
      
        LD      arch/x86/crypto/built-in.o
      ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory
      
      Cc: stable@vger.kernel.org # 3.18+
      Signed-off-by: Vinson Lee <vlee@twitter.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: aesni - fix "by8" variant for 128 bit keys · 0b1e95b2
      Mathias Krause committed
      The "by8" counter mode optimization is broken for 128 bit keys with
      input data longer than 128 bytes. It uses the wrong key material for
      en- and decryption.
      
      The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
      in case we're handling more than 128 bytes of input data -- they won't
      get reloaded after the initial load. They must therefore be (a) loaded
      on the first iteration and (b) be preserved for the later ones. The
      implementation for 128 bit keys satisfies neither (a) nor (b).
      
      Fix this by bringing the implementation back to its original source
      and correctly load the key registers and preserve their values by
      *not* re-using the registers for other purposes.
      
      Kudos to James for reporting the issue and providing a test case
      showing the discrepancies.
      Reported-by: James Yonan <james@openvpn.net>
      Cc: Chandramouli Narayanan <mouli@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.18
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 04 Jan 2015 (1 commit)
    • x86, um: actually mark system call tables readonly · b485342b
      Daniel Borkmann committed
      Commit a074335a ("x86, um: Mark system call tables readonly") was
      supposed to mark the sys_call_table in UML as read-only by adding
      const, but it doesn't have the desired effect: the table still ends up
      in the data section, because __cacheline_aligned forces sys_call_table
      into .data..cacheline_aligned. Use the ____cacheline_aligned variant
      instead to fix this.
      
      Before:
      
      $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                       U sys_writev
      0000000000000000 D sys_call_table
      0000000000000000 D syscall_table_size
      
      After:
      
      $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                       U sys_writev
      0000000000000000 R sys_call_table
      0000000000000000 D syscall_table_size
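
      A hedged sketch of the declaration change behind the section move shown
      above, assuming the sys_call_ptr_t element type used by the x86 syscall
      tables; the real file lists every entry:

              /* Before: const, yet __cacheline_aligned still lands the table
               * in .data..cacheline_aligned. */
              const sys_call_ptr_t sys_call_table[] __cacheline_aligned = {
                      /* ... */
              };

              /* After: ____cacheline_aligned lets the const table go to .rodata. */
              const sys_call_ptr_t sys_call_table[] ____cacheline_aligned = {
                      /* ... */
              };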
      
      Fixes: a074335a ("x86, um: Mark system call tables readonly")
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  12. 28 Dec 2014 (2 commits)
  13. 24 Dec 2014 (1 commit)
    • x86, vdso: Use asm volatile in __getcpu · 1ddf0b1b
      Andy Lutomirski committed
      In Linux 3.18 and below, GCC hoists the lsl instructions in the
      pvclock code all the way to the beginning of __vdso_clock_gettime,
      slowing the non-paravirt case significantly.  For unknown reasons,
      presumably related to the removal of a branch, the performance issue
      is gone as of
      
      e76b027e x86,vdso: Use LSL unconditionally for vgetcpu
      
      but I don't trust GCC enough to expect the problem to stay fixed.
      
      There should be no correctness issue, because the __getcpu calls in
      __vdso_clock_gettime were never necessary in the first place.
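
      A hedged sketch of the change, approximated from the vdso headers:
      marking the inline asm volatile so GCC cannot hoist the LSL away from
      its call site:

              static inline unsigned int __getcpu(void)
              {
                      unsigned int p;

                      /*
                       * "volatile" keeps GCC from hoisting the LSL above the
                       * branch that decides whether the result is needed.
                       */
                      asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));

                      return p;
              }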
      
      Note to stable maintainers: In 3.18 and below, depending on
      configuration, gcc 4.9.2 generates code like this:
      
           9c3:       44 0f 03 e8             lsl    %ax,%r13d
           9c7:       45 89 eb                mov    %r13d,%r11d
           9ca:       0f 03 d8                lsl    %ax,%ebx
      
      This patch won't apply as is to any released kernel, but I'll send a
      trivial backported version if needed.
      
      Fixes: 51c19b4f ("x86: vdso: pvclock gettime support")
      Cc: stable@vger.kernel.org # 3.8+
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
  14. 23 Dec 2014 (4 commits)
  15. 21 Dec 2014 (1 commit)
    • x86_64, vdso: Fix the vdso address randomization algorithm · 394f56fe
      Andy Lutomirski committed
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the mean time, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
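
      A hedged sketch of the overall shape of such an address picker; helper
      names like align_vdso_addr() are assumptions and the real patch differs
      in detail:

              static unsigned long vdso_addr(unsigned long start, unsigned int len)
              {
                      unsigned long addr, end;
                      unsigned int offset;

                      /*
                       * Highest end address that keeps the whole vdso inside
                       * the PMD holding the area just above the stack top.
                       */
                      end = (start + PMD_SIZE - 1) & PMD_MASK;
                      if (end >= TASK_SIZE_MAX)
                              end = TASK_SIZE_MAX;
                      end -= len;

                      if (end > start) {
                              /* Uniform choice over every page-aligned slot. */
                              offset = get_random_int() %
                                      (((end - start) >> PAGE_SHIFT) + 1);
                              addr = start + (offset << PAGE_SHIFT);
                      } else {
                              addr = start;
                      }

                      return align_vdso_addr(addr);   /* assumed helper */
              }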
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
  16. 18 Dec 2014 (2 commits)