提交 · c6e9d6f38894798696f23c8084ca7edbf16ee895 · openeuler / raspberrypi-kernel

06 8月, 2014 1 次提交

random: introduce getrandom(2) system call · c6e9d6f3

由 Theodore Ts'o 提交于 7月 17, 2014

The getrandom(2) system call was requested by the LibreSSL Portable
developers.  It is analoguous to the getentropy(2) system call in
OpenBSD.

The rationale of this system call is to provide resiliance against
file descriptor exhaustion attacks, where the attacker consumes all
available file descriptors, forcing the use of the fallback code where
/dev/[u]random is not available.  Since the fallback code is often not
well-tested, it is better to eliminate this potential failure mode
entirely.

The other feature provided by this new system call is the ability to
request randomness from the /dev/urandom entropy pool, but to block
until at least 128 bits of entropy has been accumulated in the
/dev/urandom entropy pool.  Historically, the emphasis in the
/dev/urandom development has been to ensure that urandom pool is
initialized as quickly as possible after system boot, and preferably
before the init scripts start execution.

This is because changing /dev/urandom reads to block represents an
interface change that could potentially break userspace which is not
acceptable.  In practice, on most x86 desktop and server systems, in
general the entropy pool can be initialized before it is needed (and
in modern kernels, we will printk a warning message if not).  However,
on an embedded system, this may not be the case.  And so with this new
interface, we can provide the functionality of blocking until the
urandom pool has been initialized.  Any userspace program which uses
this new functionality must take care to assure that if it is used
during the boot process, that it will not cause the init scripts or
other portions of the system startup to hang indefinitely.

SYNOPSIS
	#include <linux/random.h>

	int getrandom(void *buf, size_t buflen, unsigned int flags);

DESCRIPTION
	The system call getrandom() fills the buffer pointed to by buf
	with up to buflen random bytes which can be used to seed user
	space random number generators (i.e., DRBG's) or for other
	cryptographic uses.  It should not be used for Monte Carlo
	simulations or other programs/algorithms which are doing
	probabilistic sampling.

	If the GRND_RANDOM flags bit is set, then draw from the
	/dev/random pool instead of the /dev/urandom pool.  The
	/dev/random pool is limited based on the entropy that can be
	obtained from environmental noise, so if there is insufficient
	entropy, the requested number of bytes may not be returned.
	If there is no entropy available at all, getrandom(2) will
	either block, or return an error with errno set to EAGAIN if
	the GRND_NONBLOCK bit is set in flags.

	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
	will be used.  Unlike using read(2) to fetch data from
	/dev/urandom, if the urandom pool has not been sufficiently
	initialized, getrandom(2) will block (or return -1 with the
	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).

	The getentropy(2) system call in OpenBSD can be emulated using
	the following function:

            int getentropy(void *buf, size_t buflen)
            {
                    int     ret;

                    if (buflen > 256)
                            goto failure;
                    ret = getrandom(buf, buflen, 0);
                    if (ret < 0)
                            return ret;
                    if (ret == buflen)
                            return 0;
            failure:
                    errno = EIO;
                    return -1;
            }

RETURN VALUE
       On success, the number of bytes that was filled in the buf is
       returned.  This may not be all the bytes requested by the
       caller via buflen if insufficient entropy was present in the
       /dev/random pool, or if the system call was interrupted by a
       signal.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
	EINVAL		An invalid flag was passed to getrandom(2)

	EFAULT		buf is outside the accessible address space.

	EAGAIN		The requested entropy was not available, and
			getentropy(2) would have blocked if the
			GRND_NONBLOCK flag was not set.

	EINTR		While blocked waiting for entropy, the call was
			interrupted by a signal handler; see the description
			of how interrupted read(2) calls on "slow" devices
			are handled with and without the SA_RESTART flag
			in the signal(7) man page.

NOTES
	For small requests (buflen <= 256) getrandom(2) will not
	return EINTR when reading from the urandom pool once the
	entropy pool has been initialized, and it will return all of
	the bytes that have been requested.  This is the recommended
	way to use getrandom(2), and is designed for compatibility
	with OpenBSD's getentropy() system call.

	However, if you are using GRND_RANDOM, then getrandom(2) may
	block until the entropy accounting determines that sufficient
	environmental noise has been gathered such that getrandom(2)
	will be operating as a NRBG instead of a DRBG for those people
	who are working in the NIST SP 800-90 regime.  Since it may
	block for a long time, these guarantees do *not* apply.  The
	user may want to interrupt a hanging process using a signal,
	so blocking until all of the requested bytes are returned
	would be unfriendly.

	For this reason, the user of getrandom(2) MUST always check
	the return value, in case it returns some error, or if fewer
	bytes than requested was returned.  In the case of
	!GRND_RANDOM and small request, the latter should never
	happen, but the careful userspace code (and all crypto code
	should be careful) should check for this anyway!

	Finally, unless you are doing long-term key generation (and
	perhaps not even then), you probably shouldn't be using
	GRND_RANDOM.  The cryptographic algorithms used for
	/dev/urandom are quite conservative, and so should be
	sufficient for all purposes.  The disadvantage of GRND_RANDOM
	is that it can block, and the increased complexity required to
	deal with partially fulfilled getrandom(2) requests.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NZach Brown <zab@zabbo.net>

c6e9d6f3

11 7月, 2014 2 次提交

x86-32, vdso: Fix vDSO build error due to missing align_vdso_addr() · d093601b

由 Jan Beulich 提交于 7月 03, 2014

Relying on static functions used just once to get inlined (and
subsequently have dead code paths eliminated) is wrong: Compilers are
free to decide whether they do this, regardless of optimization level.
With this not happening for vdso_addr() (observed with gcc 4.1.x), an
unresolved reference to align_vdso_addr() causes the build to fail.

[ hpa: vdso_addr() is never actually used on x86-32, as calculate_addr
  in map_vdso() is always false.  It ought to be possible to clean
  this up further, but this fixes the immediate problem. ]
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/53B5863B02000078000204D5@mail.emea.novell.comAcked-by: NAndy Lutomirski <luto@amacapital.net>
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

d093601b

x86-64, vdso: Fix vDSO build breakage due to empty .rela.dyn · 9f88b906

由 Jan Beulich 提交于 7月 03, 2014

Certain ld versions (observed with 2.20.0) put an empty .rela.dyn
section into shared object files, breaking the assumption on the number
of sections to be copied to the final output. Simply discard any empty
SHT_REL and SHT_RELA sections to address this.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/53B5861E02000078000204D1@mail.emea.novell.comAcked-by: NAndy Lutomirski <luto@amacapital.net>
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9f88b906

04 7月, 2014 1 次提交

ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de

由 Tejun Heo 提交于 7月 03, 2014

The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9cd18de

30 6月, 2014 1 次提交

KVM: SVM: Fix CPL export via SS.DPL · 33b458d2

由 Jan Kiszka 提交于 6月 29, 2014

We import the CPL via SS.DPL since ae9fedc7. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

33b458d2

25 6月, 2014 3 次提交

crypto: sha512_ssse3 - fix byte count to bit count conversion · cfe82d4f

由 Jussi Kivilinna 提交于 6月 23, 2014

Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:

CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long

Cc: <stable@vger.kernel.org>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

cfe82d4f

x86/vdso: Error out in vdso2c if DT_RELA is present · 6a89d710

由 Andy Lutomirski 提交于 6月 24, 2014

vdso2c was checking for various types of relocations to detect when
the vdso had undefined symbols or was otherwise dependent on
relocation at load time.  Undefined symbols in the vdso would fail if
accessed at runtime, and certain implementation errors (e.g. branch
profiling or incorrect symbol visibilities) could result in data
access through the GOT that requires relocations.  This could be
as simple as:

    extern char foo;
    return foo;

Without some kind of visibility control, the compiler would assume
that foo could be interposed at load time and would generate a
relocation.

x86-64 and x32 (as opposed to i386) use explicit-addent (RELA) instead
of implicit-addent (REL) relocations for data access, and vdso2c
forgot to detect those.

Whether these bad relocations would actually fail at runtime depends
on what the linker sticks in the unrelocated references.  Nonetheless,
these relocations have no business existing in the vDSO and should be
fixed rather than silently ignored.

This error could trigger on some configurations due to branch
profiling.  The previous patch fixed that.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/74ef0c00b4d2a3b573e00a4113874e62f772e348.1403642755.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6a89d710

x86/vdso: Move DISABLE_BRANCH_PROFILING into the vdso makefile · 46b57a76

由 Andy Lutomirski 提交于 6月 24, 2014

DISABLE_BRANCH_PROFILING turns off branch profiling (i.e. a
redefinition of 'if'). Branch profiling depends on a bunch of
kernel-internal symbols and generates extra output sections, none of
which are useful or functional in the vDSO.

It's currently turned off for vclock_gettime.c, but vgetcpu.c also
triggers branch profiling, so just turn it off in the makefile.

This fixes the build on some configurations: the vdso could contain
undefined symbols, and the fake section table overflowed due to
ftrace's added sections.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/bf1ec29e03b2bbc081f6dcaefa64db1c3a83fb21.1403642755.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

46b57a76

24 6月, 2014 3 次提交

nmi: provide the option to issue an NMI back trace to every cpu but current · f3aca3d0

由 Aaron Tomlin 提交于 6月 23, 2014

Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current.  For
instance if one was previously captured recently.

This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.

Patch x86 and sparc to support new routine.

[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3aca3d0

x86_32, signal: Fix vdso rt_sigreturn · 6ba19a67

由 Andy Lutomirski 提交于 6月 21, 2014

This commit:

    commit 6f121e54
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Mon May 5 12:19:34 2014 -0700

        x86, vdso: Reimplement vdso.so preparation in build-time C

Contained this obvious typo:

-               restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn);
+               restorer = current->mm->context.vdso +
+                       selected_vdso32->sym___kernel_sigreturn;

Note the missing 'rt_' in the new code.  Fix it.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1eb40ad923acde2e18357ef2832867432e70ac42.1403361010.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6ba19a67

x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508) · 554086d8

由 Andy Lutomirski 提交于 6月 23, 2014

The bad syscall nr paths are their own incomprehensible route
through the entry control flow.  Rearrange them to work just like
syscalls that return -ENOSYS.

This fixes an OOPS in the audit code when fast-path auditing is
enabled and sysenter gets a bad syscall nr (CVE-2014-4508).

This has probably been broken since Linux 2.6.27:
af0575bb i386 syscall audit fast-path

Cc: stable@vger.kernel.org
Cc: Roland McGrath <roland@redhat.com>
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

554086d8

21 6月, 2014 1 次提交

x86/vdso: Create .build-id links for unstripped vdso files · dda1e95c

由 Andy Lutomirski 提交于 6月 20, 2014

With this change, doing 'make vdso_install' and telling gdb:

set debug-file-directory /lib/modules/KVER/vdso

will enable vdso debugging with symbols.  This is useful for
testing, but kernel RPM builds will probably want to manually delete
these symlinks or otherwise do something sensible when they strip
the vdso/*.so files.

If ld does not support --build-id, then the symlinks will not be
created.

Note that kernel packagers that use vdso_install may need to adjust
their packaging scripts to accomdate this change.  For example,
Fedora's scripts create build-id symlinks themselves in a different
location, so the spec should probably be updated to remove the
symlinks created by make vdso_install.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a424b189ce3ced85fe1e82d032a20e765e0fe0d3.1403291930.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

dda1e95c

20 6月, 2014 4 次提交

x86/vdso: Remove some redundant in-memory section headers · 0e3727a8

由 Andy Lutomirski 提交于 6月 18, 2014

.data doesn't need to be separate from .rodata: they're both readonly.

.altinstructions and .altinstr_replacement aren't needed by anything
except vdso2c; strip them from the final image.

While we're at it, rather than aligning the actual executable text,
just shove some unused-at-runtime data in between real data and
text.

My vdso image is still above 4k, but I'm disinclined to try to
trim it harder for 3.16.  For future trimming, I suspect that these
sections could be moved to later in the file and dropped from
the in-memory image:

.gnu.version and .gnu.version_d   (this may lose versions in gdb)
.eh_frame                         (should be harmless)
.eh_frame_hdr                     (I'm not really sure)
.hash                             (AFAIK nothing needs this section header)
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2e96d0c49016ea6d026a614ae645e93edd325961.1403129369.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

0e3727a8

x86/vdso: Improve the fake section headers · bfad381c

由 Andy Lutomirski 提交于 6月 18, 2014

Fully stripping the vDSO has other unfortunate side effects:

 - binutils is unable to find ELF notes without a SHT_NOTE section.

 - Even elfutils has trouble: it can find ELF notes without a section
   table at all, but if a section table is present, it won't look for
   PT_NOTE.

 - gdb wants section names to match between stripped DSOs and their
   symbols; otherwise it will corrupt symbol addresses.

We're also breaking the rules: section 0 is supposed to be SHT_NULL.

Fix these problems by building a better fake section table.  While
we're at it, we might as well let buggy Go versions keep working well
by giving the SHT_DYNSYM entry the correct size.

This is a bit unfortunate: it adds quite a bit of size to the vdso
image.

If/when binutils improves and the improved versions become widespread,
it would be worth considering dropping most of this.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/0e546a5eeaafdf1840e6ee654a55c1e727c26663.1403129369.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

bfad381c

x86/vdso2c: Use better macros for ELF bitness · c1979c37

由 Andy Lutomirski 提交于 6月 18, 2014

Rather than using a separate macro for each replacement, use generic
macros.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/d953cd2e70ceee1400985d091188cdd65fba2f05.1403129369.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

c1979c37

x86/vdso: Discard the __bug_table section · 5f56e716

由 Andy Lutomirski 提交于 6月 18, 2014

It serves no purpose in user code.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2a5bebff42defd8a5e81d96f7dc00f21143c80e8.1403129369.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

5f56e716

19 6月, 2014 3 次提交

KVM: x86: preserve the high 32-bits of the PAT register · 7cb060a9

由 Paolo Bonzini 提交于 6月 19, 2014

KVM does not really do much with the PAT, so this went unnoticed for a
long time.  It is exposed however if you try to do rdmsr on the PAT
register.
Reported-by: NValentine Sinitsyn <valentine.sinitsyn@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cb060a9

kvm: fix wrong address when writing Hyper-V tsc page · e1fa108d

由 Xiaoming Gao 提交于 6月 19, 2014

When kvm_write_guest writes the tsc_ref structure to the guest, or it will lead
the low HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT bits of the TSC page address
must be cleared, or the guest can see a non-zero sequence number.

Otherwise Windows guests would not be able to get a correct clocksource
(QueryPerformanceCounter will always return 0) which causes serious chaos.
Signed-off-by: NXiaoming Gao <newtongao@tencnet.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e1fa108d

KVM: x86: Increase the number of fixed MTRR regs to 10 · 682367c4

由 Nadav Amit 提交于 6月 18, 2014

Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

682367c4

18 6月, 2014 1 次提交

x86/xen: no need to explicitly register an NMI callback · ea9f9274

由 David Vrabel 提交于 6月 16, 2014

Remove xen_enable_nmi() to fix a 64-bit guest crash when registering
the NMI callback on Xen 3.1 and earlier.

It's not needed since the NMI callback is set by a set_trap_table
hypercall (in xen_load_idt() or xen_write_idt_entry()).

It's also broken since it only set the current VCPU's callback.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>

ea9f9274

17 6月, 2014 1 次提交

x86, kaslr: boot-time selectable with hibernation · 24f2e027

由 Kees Cook 提交于 6月 13, 2014

Changes kASLR from being compile-time selectable (blocked by
CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
available by default) via the "kaslr" kernel command line.
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

24f2e027

14 6月, 2014 2 次提交

x86/kprobes: Fix build errors and blacklist context_track_user · 4cdf77a8

由 Masami Hiramatsu 提交于 6月 14, 2014

This essentially reverts commit:

  ecd50f71 ("kprobes, x86: Call exception_enter after kprobes handled")

since it causes build errors with CONFIG_CONTEXT_TRACKING and
that has been made from misunderstandings;
context_track_user_*() don't involve much in interrupt context,
it just returns if in_interrupt() is true.

Instead of changing the do_debug/int3(), this just adds
context_track_user_*() to kprobes blacklist, since those are
still can be called right before kprobes handles int3 and debug
exceptions, and probing those will cause an infinite loop.
Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140614064711.7865.45957.stgit@kbuild-fedora.novalocalSigned-off-by: NIngo Molnar <mingo@kernel.org>

4cdf77a8

x86/vdso: Fix vdso_install · a934fb5b

由 Andy Lutomirski 提交于 6月 12, 2014

"make vdso_install" installs unstripped versions of the vdso objects
for the benefit of the debugger. This was broken by checkin:

6f121e54 x86, vdso: Reimplement vdso.so preparation in build-time C

The filenames are different now, so update the Makefile to cope.

This still installs the 64-bit vdso as vdso64.so. We believe this
will be okay, as the only known user is a patched gdb which is known
to use build-ids, but if it turns out to be a problem we may have to
add a link.

Inspired by a patch from Sam Ravnborg.
Acked-by: NSam Ravnborg <sam@ravnborg.org>
Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
Tested-by: NJosh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

a934fb5b

13 6月, 2014 2 次提交

x86/vdso: Hack to keep 64-bit Go programs working · e0bf7b86

由 Andy Lutomirski 提交于 6月 12, 2014

The Go runtime has a buggy vDSO parser that currently segfaults.
This writes an empty SHT_DYNSYM entry that causes Go's runtime to
malfunction by thinking that the vDSO is empty rather than
malfunctioning by running off the end and segfaulting.

This affects x86-64 only as far as we know, so we do not need this for
the i386 and x32 vdsos.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/d10618176c4bd39b457a5e85c497295c90cab1bc.1402620737.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

e0bf7b86

x86/vdso: Add PUT_LE to store little-endian values · b4b31f61

由 Andy Lutomirski 提交于 6月 12, 2014

Add PUT_LE() by analogy with GET_LE() to write littleendian values in
addition to reading them.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/3d9b27e92745b27b6fda1b9a98f70dc9c1246c7a.1402620737.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

b4b31f61

11 6月, 2014 3 次提交

net: filter: cleanup A/X name usage · e430f34e

由 Alexei Starovoitov 提交于 6月 06, 2014

The macro 'A' used in internal BPF interpreter:
 #define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.

This patch is trying to clean up the naming and clarify its usage in the
following way:

- A and X are names of two classic BPF registers

- BPF_REG_A denotes internal BPF register R0 used to map classic register A
  in internal BPF programs generated from classic

- BPF_REG_X denotes internal BPF register R7 used to map classic register X
  in internal BPF programs generated from classic

- internal BPF instruction format:
struct sock_filter_int {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
  BPF_X - means use register X as source operand
  BPF_K - means use 32-bit immediate as source operand
In internal:
  BPF_X - means use 'src_reg' register as source operand
  BPF_K - means use 32-bit immediate as source operand
Suggested-by: NChema Gonzalez <chema@google.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NChema Gonzalez <chema@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e430f34e

x86, vdso: Remove one final use of htole16() · 4d048b02

由 H. Peter Anvin 提交于 6月 10, 2014

One final use of the macros from <endian.h> which are not available on
older system.  In this case we had one sole case of *writing* a
littleendian number, but the number is SHN_UNDEF which is the constant
zero, so rather than dealing with the general case of littleendian
puts here, just document that the constant is zero and be done with
it.
Reported-and-Tested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/20140610135051.c3c34165f73d67d218b62bd9@linux-foundation.org

4d048b02

x86: intel-mid: add watchdog platform code for Merrifield · 78a3bb9e

由 David Cohen 提交于 4月 15, 2014

This patch adds platform code for Intel Merrifield.
Since the watchdog is not part of SFI table, we have no other option but
to manually register watchdog's platform device (argh!).
Signed-off-by: NDavid Cohen <david.a.cohen@linux.intel.com>
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NWim Van Sebroeck <wim@iguana.be>

78a3bb9e

09 6月, 2014 1 次提交

Revert "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" · bb077d60

由 Linus Torvalds 提交于 6月 08, 2014

This reverts commit 3e1a878b.

It came in very late, and already has one reported failure: Sitsofe
reports that the current tree fails to boot on his EeePC, and bisected
it down to this.  Rather than waste time trying to figure out what's
wrong, just revert it.
Reported-by: NSitsofe Wheeler <sitsofe@gmail.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bb077d60

08 6月, 2014 1 次提交

x86/boot: EFI_MIXED should not prohibit loading above 4G · 745c5167

由 Matt Fleming 提交于 6月 07, 2014

commit 7d453eee ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
regression for the functionality to load kernels above 4G. The relevant
(incorrect) reasoning behind this change can be seen in the commit
message,

  "The xloadflags field in the bzImage header is also updated to reflect
  that the kernel supports both entry points by setting both of
  XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
  XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
  guaranteed to be addressable with 32-bits."

This is obviously bogus since 32-bit EFI loaders will never place the
kernel above the 4G mark. So this restriction is entirely unnecessary.

But things are worse than that - since we want to encourage people to
always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
the box for both 32-bit and 64-bit firmware, commit 7d453eee
effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.

Remove the overzealous and superfluous restriction and restore the
XLF_CAN_BE_LOADED_ABOVE_4G functionality.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

745c5167

07 6月, 2014 2 次提交

signals: kill sigfindinword() · 36fac0a2

由 Oleg Nesterov 提交于 6月 06, 2014

It has no users and it doesn't look useful.  I do not know why/when it was
introduced, I can't even find any user in the git history.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

36fac0a2

x86, vdso: Use <tools/le_byteshift.h> for littleendian access · bdfb9bcc

由 H. Peter Anvin 提交于 6月 06, 2014

There are no standard functions for littleendian data (unlike
bigendian data.)  Thus, use <tools/le_byteshift.h> to access
littleendian data members.  Those are fairly inefficient, but it
doesn't matter for this purpose (and can be optimized later.)  This
avoids portability problems.
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Tested-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/20140606140017.afb7f91142f66cb3dd13c186@linux-foundation.org

bdfb9bcc

06 6月, 2014 1 次提交

x86, locking/rwlocks: Enable qrwlocks on x86 · bd01ec1a

由 Waiman Long 提交于 2月 03, 2014

Make x86 use the fair rwlock_t.

Implement the custom queue_write_unlock() for best performance.
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
[peterz: near complete rewrite]
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-r1xuzmdysvuhl3h86n5fbxi7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

bd01ec1a

05 6月, 2014 7 次提交

x86/smpboot: Initialize secondary CPU only if master CPU will wait for it · 3e1a878b

由 Igor Mammedov 提交于 6月 05, 2014

Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

   https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.
To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Acked-by: NToshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3e1a878b

x86/smpboot: Log error on secondary CPU wakeup failure at ERR level · feef1e8e

由 Igor Mammedov 提交于 6月 05, 2014

If system is running without debug level logging,
it will not log error if do_boot_cpu() failed to
wakeup AP. It may lead to silent AP bringup
failures at boot time.
Change message level to KERN_ERR to make error
visible to user as it's done on other architectures.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Acked-by: NToshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

feef1e8e

x86: Fix list/memory corruption on CPU hotplug · 89f898c1

由 Igor Mammedov 提交于 6月 05, 2014

currently if AP wake up is failed, master CPU marks AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
Which is caused by the fact that generic_processor_info() allocates
logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns id of previously failed to wake up CPU, since its
bit is cleared by do_boot_cpu() and as result register_cpu()
tries to register another CPU with the same id as already
present but failed to be onlined CPU.

Taking in account that AP will not do anything if master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break next cpu hotplug attempts. As a side effect of
not marking AP as not present, user would be allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic()

if during CPU hotplug master CPU failed to wake up AP
it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.

However following attempt to unplug that CPU will lead to
out of bound write access to __apicid_to_node[] which is
32768 items long on x86_64 kernel.

So with above fix of cpu_present_mask make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure and allow
acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Acked-by: NToshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

89f898c1

uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates · 5cdb76d6

由 Oleg Nesterov 提交于 6月 01, 2014

Purely cosmetic, no changes in .o,

1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to
   ->defparam.

2. Add the comment into default_post_xol_op() to explain "regs->sp +=".

3. Remove the stale part of the comment in arch_uprobe_analyze_insn().
Suggested-by: NJim Keniston <jkenisto@us.ibm.com>
Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>

5cdb76d6

Revert "xen/pvh: Update E820 to work with PVH (v2)" · 562658f3

由 David Vrabel 提交于 6月 02, 2014

This reverts commit 9103bb0f.

Now than xen_memory_setup() is not called for auto-translated guests,
we can remove this commit.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Tested-by: NRoger Pau Monné <roger.pau@citrix.com>

562658f3

x86/xen: fix memory setup for PVH dom0 · abacaadc

由 David Vrabel 提交于 6月 02, 2014

Since af06d66ee32b (x86: fix setup of PVH Dom0 memory map) in Xen, PVH
dom0 need only use the memory memory provided by Xen which has already
setup all the correct holes.

xen_memory_setup() then ends up being trivial for a PVH guest so
introduce a new function (xen_auto_xlated_memory_setup()).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Tested-by: NRoger Pau Monné <roger.pau@citrix.com>

abacaadc

perf/x86: Add conditional branch filtering support · 37548914

由 Anshuman Khandual 提交于 5月 22, 2014

This patch adds conditional branch filtering support,
enabling it for PERF_SAMPLE_BRANCH_COND in perf branch
stack sampling framework by utilizing an available
software filter X86_BR_JCC.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: NStephane Eranian <eranian@google.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: mpe@ellerman.id.au
Cc: benh@kernel.crashing.org
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400743210-32289-3-git-send-email-khandual@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

37548914