提交 · 69048176078adda4087a648c9b1812ddd800fad1 · openeuler / Kernel

24 5月, 2016 1 次提交

vdso: make arch_setup_additional_pages wait for mmap_sem for write killable · 69048176

由 Michal Hocko 提交于 5月 23, 2016

most architectures are relying on mmap_sem for write in their
arch_setup_additional_pages.  If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving.  Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>	[x86 vdso]
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69048176

13 4月, 2016 1 次提交

x86/vdso: Remove direct HPET access through the vDSO · 1ed95e52

由 Andy Lutomirski 提交于 4月 07, 2016

Allowing user code to map the HPET is problematic.  HPET
implementations are notoriously buggy, and there are probably many
machines on which even MMIO reads from bogus HPET addresses are
problematic.

We have a report that the Dell Precision M2800 with:

  ACPI: HPET 0x00000000C8FE6238 000038 (v01 DELL   CBX3  01072009 AMI. 00000005)

is either so slow when accessing the HPET or actually hangs in some
regard, causing soft lockups to be reported if users do unexpected
things to the HPET.

The vclock HPET code has also always been a questionable speedup.
Accessing an HPET is exceedingly slow (on the order of several
microseconds), so the added overhead in requiring a syscall to read
the HPET is a small fraction of the total code of accessing it.

To avoid future problems, let's just delete the code entirely.

In the long run, this could actually be a speedup.  Waiman Long as a
patch to optimize the case where multiple CPUs contend for the HPET,
but that won't help unless all the accesses are mediated by the
kernel.
Reported-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NBorislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Waiman Long <waiman.long@hpe.com>
Link: http://lkml.kernel.org/r/d2f90bba98db9905041cff294646d290d378f67a.1460074438.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

1ed95e52

30 1月, 2016 2 次提交

x86/vdso: Use static_cpu_has() · 8c725306

由 Borislav Petkov 提交于 1月 26, 2016

... and simplify and speed up a tad.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-10-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

8c725306

x86/cpufeature: Carve out X86_FEATURE_* · cd4d09ec

由 Borislav Petkov 提交于 1月 26, 2016

Move them to a separate header and have the following
dependency:

  x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.
Suggested-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

cd4d09ec

12 1月, 2016 4 次提交

x86/vdso: Disallow vvar access to vclock IO for never-used vclocks · bd902c53

由 Andy Lutomirski 提交于 12月 29, 2015

It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track
all the mappings), we can do almost as well by tracking which
vclocks have ever been used and only allowing pages associated
with used vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to
get SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated
with the currently selected vclock due to a race: a process
could start to access the HPET for the first time and race
against a switch away from the HPET as the current clocksource.
We can't segfault the process trying to peek at the HPET in this
case, even though the process isn't going to do anything useful
with the data.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

bd902c53

x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping · a48a7042

由 Andy Lutomirski 提交于 12月 29, 2015

This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c19c2909e5ee3c3d8742f916586676bb7c40345f.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

a48a7042

x86/vdso: Use .fault for the vDSO text mapping · 05ef76b2

由 Andy Lutomirski 提交于 12月 29, 2015

The old scheme for mapping the vDSO text is rather complicated.
vdso2c generates a struct vm_special_mapping and a blank .pages
array of the correct size for each vdso image.  Init code in
vdso/vma.c populates the .pages array for each vDSO image, and
the mapping code selects the appropriate struct
vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault()
just returns the appropriate page for the selected vDSO image.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

05ef76b2

x86/vdso: Track each mm's loaded vDSO image as well as its base · 352b78c6

由 Andy Lutomirski 提交于 12月 29, 2015

As we start to do more intelligent things with the vDSO at
runtime (as opposed to just at mm initialization time), we'll
need to know which vDSO is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone.  Instead, just track it in the
mmu context.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

352b78c6

11 12月, 2015 2 次提交

x86/vdso: Remove pvclock fixmap machinery · cc1e24fd

由 Andy Lutomirski 提交于 12月 10, 2015

Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

cc1e24fd

x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap · dac16fba

由 Andy Lutomirski 提交于 12月 10, 2015

Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

dac16fba

07 10月, 2015 1 次提交

x86/vdso: Remove runtime 32-bit vDSO selection · 0a6d1fa0

由 Andy Lutomirski 提交于 10月 05, 2015

32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO.  Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0a6d1fa0

06 7月, 2015 1 次提交

x86/compat: Don't build the 32-bit VDSO if not needed · ab8b82ee

由 Brian Gerst 提交于 6月 22, 2015

Build the 32-bit vdso only for native 32-bit or 32-bit compat is
enabled.  x32 should not force it to build.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-7-git-send-email-brgerst@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ab8b82ee

04 6月, 2015 1 次提交

x86/asm/entry, x86/vdso: Move the vDSO code to arch/x86/entry/vdso/ · d603c8e1

由 Ingo Molnar 提交于 6月 03, 2015

Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

d603c8e1

21 12月, 2014 1 次提交

x86_64, vdso: Fix the vdso address randomization algorithm · 394f56fe

由 Andy Lutomirski 提交于 12月 19, 2014

The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack.  To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit.  Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD.  The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.

This fixes the implementation.  The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning.  Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.

In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.

As an easy test, doing this:

for i in `seq 10000`
  do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:

7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000

Note the suspicious fe000 endings.  With the fix, I get a much more
palatable 76 repeated addresses.
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>

394f56fe

02 11月, 2014 1 次提交

x86: vdso: Fix build with older gcc · a92f101b

由 Andrew Morton 提交于 11月 01, 2014

gcc-4.4.4:

arch/x86/vdso/vma.c: In function 'vgetcpu_cpu_init':
arch/x86/vdso/vma.c:247: error: unknown field 'limit0' specified in initializer
arch/x86/vdso/vma.c:247: warning: missing braces around initializer
arch/x86/vdso/vma.c:247: warning: (near initialization for '(anonymous).<anonymous>')
arch/x86/vdso/vma.c:248: error: unknown field 'limit' specified in initializer
arch/x86/vdso/vma.c:248: warning: excess elements in struct initializer
arch/x86/vdso/vma.c:248: warning: (near initialization for '(anonymous)')
....

I couldn't find any way of tricking it into accepting an initializer
format :(
Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Fixes: 25880156 ("x86/vdso: Change the PER_CPU segment to use struct desc_struct")
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a92f101b

28 10月, 2014 5 次提交

x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls · 1c0c1b93

由 Andy Lutomirski 提交于 9月 23, 2014

Now vdso/vma.c has a single initcall and no references to
"vsyscall".
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/945c463e2804fedd8b08d63a040cbe85d55195aa.1411494540.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

1c0c1b93

x86/vdso: Make the PER_CPU segment 32 bits · 287e0131

由 Andy Lutomirski 提交于 9月 23, 2014

IMO users ought not to be able to use 16-bit segments without
using modify_ldt. Fortunately, it's impossible to break
espfix64 by loading the PER_CPU segment into SS because it's
PER_CPU is marked read-only and SS cannot contain an RO segment,
but marking PER_CPU as 32-bit is less fragile.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/179f490d659307873eefd09206bebd417e2ab5ad.1411494540.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

287e0131

x86/vdso: Make the PER_CPU segment start out accessed · 9c0080ef

由 Andy Lutomirski 提交于 9月 23, 2014

The first userspace attempt to read or write the PER_CPU segment
will write the accessed bit to the GDT. This is visible to
userspace using the LAR instruction, and it also pointlessly
dirties a cache line.

Set the segment's accessed bit at boot to prevent userspace
access to segments from having side effects.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/ac63814ca4c637a08ec2fd0360d67ca67560a9ee.1411494540.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

9c0080ef

x86/vdso: Change the PER_CPU segment to use struct desc_struct · 25880156

由 Andy Lutomirski 提交于 9月 23, 2014

This makes it easier to see what's going on. It produces
exactly the same segment descriptor as the old code.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/d492f7b55136cbc60f016adae79160707b2e03b7.1411494540.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

25880156

x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c · d4f829dd

由 Andy Lutomirski 提交于 9月 23, 2014

This is pure cut-and-paste. At this point, vsyscall_64.c
contains only code needed for vsyscall emulation, but some of
the comments and function names are still confused.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a244daf7d3cbe71afc08ad09fdfe1866ca1f1978.1411494540.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

d4f829dd

26 7月, 2014 1 次提交

x86/vdso: Set VM_MAYREAD for the vvar vma · ac379835

由 Andy Lutomirski 提交于 7月 25, 2014

The VVAR area can, obviously, be read; that is kind of the point.

AFAIK this has no effect whatsoever unless x86 suddenly turns into a
nommu architecture. Nonetheless, not setting it is suspicious.
Reported-by: NNathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e4c8bf4bc2725bda22c4a4b7d0c82adcd8f8d9b8.1406330779.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

ac379835

12 7月, 2014 1 次提交

x86, vdso: Move the vvar area before the vdso text · e6577a7c

由 Andy Lutomirski 提交于 7月 10, 2014

Putting the vvar area after the vdso text is rather complicated: it
only works of the total length of the vdso text mapping is known at
vdso link time, and the linker doesn't allow symbol addresses to
depend on the sizes of non-allocatable data after the PT_LOAD
segment.

Moving the vvar area before the vdso text will allow is to safely
map non-allocatable data after the vdso text, which is a nice
simplification.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/156c78c0d93144ff1055a66493783b9e56813983.1405040914.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

e6577a7c

11 7月, 2014 1 次提交

x86-32, vdso: Fix vDSO build error due to missing align_vdso_addr() · d093601b

由 Jan Beulich 提交于 7月 03, 2014

Relying on static functions used just once to get inlined (and
subsequently have dead code paths eliminated) is wrong: Compilers are
free to decide whether they do this, regardless of optimization level.
With this not happening for vdso_addr() (observed with gcc 4.1.x), an
unresolved reference to align_vdso_addr() causes the build to fail.

[ hpa: vdso_addr() is never actually used on x86-32, as calculate_addr
  in map_vdso() is always false.  It ought to be possible to clean
  this up further, but this fixes the immediate problem. ]
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/53B5863B02000078000204D5@mail.emea.novell.comAcked-by: NAndy Lutomirski <luto@amacapital.net>
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

d093601b

21 5月, 2014 2 次提交

x86, mm: Improve _install_special_mapping and fix x86 vdso naming · a62c34bd

由 Andy Lutomirski 提交于 5月 19, 2014

Using arch_vma_name to give special mappings a name is awkward.  x86
currently implements it by comparing the start address of the vma to
the expected address of the vdso.  This requires tracking the start
address of special mappings and is probably buggy if a special vma
is split or moved.

Improve _install_special_mapping to just name the vma directly.  Use
it to give the x86 vvar area a name, which should make CRIU's life
easier.

As a side effect, the vvar area will show up in core dumps.  This
could be considered weird and is fixable.

[hpa: I say we accept this as-is but be prepared to deal with knocking
 out the vvars from core dumps if this becomes a problem.]

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

a62c34bd

x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET · 1e844fb4

由 Andy Lutomirski 提交于 5月 19, 2014

The oops can be triggered in qemu using -no-hpet (but not nohpet) by
reading a couple of pages past the end of the vdso text.  This
should send SIGBUS instead of OOPSing.

The bug was introduced by:

commit 7a59ed41
Author: Stefani Seibold <stefani@seibold.net>
Date:   Mon Mar 17 23:22:09 2014 +0100

    x86, vdso: Add 32 bit VDSO time support for 32 bit kernel

which is new in 3.15.

This will be fixed separately in 3.15, but that patch will not apply
to tip/x86/vdso.  This is the equivalent fix for tip/x86/vdso and,
presumably, 3.16.

Cc: Stefani Seibold <stefani@seibold.net>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/c8b0a9a0b8d011a8b273cbb2de88d37190ed2751.1400538962.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

1e844fb4

06 5月, 2014 3 次提交

x86, vdso: Move the 32-bit vdso special pages after the text · 18d0a6fd

由 Andy Lutomirski 提交于 5月 05, 2014

This unifies the vdso mapping code and teaches it how to map special
pages at addresses corresponding to symbols in the vdso image. The
new code is used for all vdso variants, but so far only the 32-bit
variants use the new vvar page position.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

18d0a6fd

x86, vdso: Reimplement vdso.so preparation in build-time C · 6f121e54

由 Andy Lutomirski 提交于 5月 05, 2014

Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel. Replace all of that with plain C code that
runs at build time.

All five vdso images now generate .c files that are compiled and
linked in to the kernel image.

This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be. Everything
outside the loadable segment is dropped. In particular, this causes
the section table and section name strings to be missing. This
should be fine: real dynamic loaders don't load or inspect these
tables anyway. The result is roughly equivalent to eu-strip's
--strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment. Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page. This happens whenever the load segment is just under a
multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall. That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation. Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work. vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.

(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6f121e54

x86, vdso: Clean up 32-bit vs 64-bit vdso params · 3d7ee969

由 Andy Lutomirski 提交于 5月 05, 2014

Rather than using 'vdso_enabled' and an awful #define, just call the
parameters vdso32_enabled and vdso64_enabled.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

3d7ee969

21 3月, 2014 2 次提交

x86, vdso: Move more vdso definitions into vdso.h · 9e6f450f

由 Andy Lutomirski 提交于 3月 20, 2014

This fixes the Xen build and gets rid of a silly header file.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1df77311795aff75f5742c787d277518314a38d3.1395366931.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

9e6f450f

x86: Load the 32-bit vdso in place, just like the 64-bit vdsos · b67e612c

由 Andy Lutomirski 提交于 3月 20, 2014

This replaces a decent amount of incomprehensible and buggy code
with much more straightforward code. It also brings the 32-bit vdso
more in line with the 64-bit vdsos, so maybe someday they can share
even more code.

This wastes a small amount of kernel .data and .text space, but it
avoids a couple of allocations on startup, so it should be more or
less a wash memory-wise.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/b8093933fad09ce181edb08a61dcd5d2592e9814.1395352498.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

b67e612c

19 3月, 2014 1 次提交

x86, vdso: Patch alternatives in the 32-bit VDSO · b4b541a6

由 Andy Lutomirski 提交于 3月 17, 2014

We need the alternatives mechanism for rdtsc_barrier() to work.
Signed-off-by: NStefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-9-git-send-email-stefani@seibold.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

b4b541a6

12 12月, 2012 1 次提交

mm: use vm_unmapped_area() on x86_64 architecture · f9902472

由 Michel Lespinasse 提交于 12月 11, 2012

Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
of vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f9902472

24 3月, 2012 1 次提交

coredump: remove VM_ALWAYSDUMP flag · 909af768

由 Jason Baron 提交于 3月 23, 2012

The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large.  There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).

Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
the kernel to mark vdso and vsyscall pages.  However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.

The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.

The qemu code which implements this features is at:

  http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch

In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.

I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.

This patch:

The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section.  However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name().  Thus, freeing a valuable vma bit.
Signed-off-by: NJason Baron <jbaron@redhat.com>
Acked-by: NRoland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

909af768

22 2月, 2012 1 次提交

x32: Fix coding style violations in the x32 VDSO code · 22e842d4

由 H. Peter Anvin 提交于 2月 21, 2012

Move the prototype for x32_setup_additional_pages() to a header file,
and adjust the coding style to match standard.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>

22e842d4

21 2月, 2012 1 次提交

x32: Add x32 VDSO support · 1a21d4e0

由 H. J. Lu 提交于 2月 19, 2012

Add support for the x32 VDSO.  The x32 VDSO takes advantage of the
similarity between the x86-64 and the x32 ABIs to contain the same
content, only the container is different, as the x32 VDSO obviously is
an x32 shared object.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

1a21d4e0

06 8月, 2011 1 次提交

x86, amd: Avoid cache aliasing penalties on AMD family 15h · dfb09f9b

由 Borislav Petkov 提交于 8月 05, 2011

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.

This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits the same for all mappings of a single shared library
across processes and, in doing so, avoids instruction cache aliases.

It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.

This change leaves virtual region address allocation on other families
and/or vendors unaffected.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

dfb09f9b

22 7月, 2011 1 次提交

x86-64, vdso: Do not allocate memory for the vDSO · aafade24

由 Andy Lutomirski 提交于 7月 21, 2011

We can map the vDSO straight from kernel data, saving a few page
allocations. As an added bonus, the deleted code contained a memory
leak.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/2c4ed5c2c2e93603790229e0c3403ae506ccc0cb.1311277573.git.luto@mit.eduSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

aafade24

14 7月, 2011 1 次提交

x86-64: Allow alternative patching in the vDSO · 1b3f2a72

由 Andy Lutomirski 提交于 7月 13, 2011

This code is short enough and different enough from the module
loader that it's not worth trying to share anything.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/e73112e4381fff29e31b882c2d0856822edaea53.1310563276.git.luto@mit.eduSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

1b3f2a72

24 5月, 2011 1 次提交

x86-64: Clean up vdso/kernel shared variables · 8c49d9a7

由 Andy Lutomirski 提交于 5月 23, 2011

Variables that are shared between the vdso and the kernel are
currently a bit of a mess.  They are each defined with their own
magic, they are accessed differently in the kernel, the vsyscall page,
and the vdso, and one of them (vsyscall_clock) doesn't even really
exist.

This changes them all to use a common mechanism.  All of them are
delcared in vvar.h with a fixed address (validated by the linker
script).  In the kernel (as before), they look like ordinary
read-write variables.  In the vsyscall page and the vdso, they are
accessed through a new macro VVAR, which gives read-only access.

The vdso is now loaded verbatim into memory without any fixups.  As a
side bonus, access from the vdso is faster because a level of
indirection is removed.

While we're at it, pack jiffies and vgetcpu_mode into the same
cacheline.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3ESigned-off-by: NThomas Gleixner <tglx@linutronix.de>

8c49d9a7

03 8月, 2010 1 次提交

x86, vdso: Unmap vdso pages · be783a47

由 Shaohua Li 提交于 8月 02, 2010

We mapped vdso pages but never unmapped them and the virtual address
is lost after exiting from the function, so unmap vdso pages here.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100802004934.GA2505@sli10-desk.sh.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

be783a47

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功