Commits · 9026843952adac5b123c7b8dc961e5c15828d9e1 · openeuler / raspberrypi-kernel

20 Dec, 2012 · 1 commit

generic compat_sys_sigaltstack() · 90268439

Committed by Al Viro on Dec 14, 2012

Again, conditional on CONFIG_GENERIC_SIGALTSTACK
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

29 Nov, 2012 · 2 commits

x86, um: switch to generic fork/vfork/clone · 1d4b4b29

Committed by Al Viro on Oct 22, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

get rid of pt_regs argument of ->load_binary() · 71613c3b

Committed by Al Viro on Oct 20, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

01 Oct, 2012 · 1 commit

x86, um/x86: switch to generic sys_execve and kernel_execve · 6783eaa2

Committed by Al Viro on Aug 02, 2012

32bit wrapper is lost on that; 64bit one is *not*, since
we need to arrange for full pt_regs on stack when we call
sys_execve() and we need to load callee-saved ones from
there afterwards.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Sep, 2012 · 2 commits

x86, smap: Reduce the SMAP overhead for signal handling · 5e88353d

Committed by H. Peter Anvin on Sep 21, 2012

Signal handling contains a bunch of accesses to individual user space
items, which causes an excessive number of STAC and CLAC
instructions.  Instead, let get/put_user_try ... get/put_user_catch()
contain the STAC and CLAC instructions.

This means that get/put_user_try no longer nests, and furthermore that
it is no longer legal to use user space access functions other than
__get/put_user_ex() inside those blocks.  However, these macros are
x86-specific anyway and are only used in the signal-handling paths; a
simple reordering of moving the larger subroutine calls out of the
try...catch blocks resolves that problem.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-12-git-send-email-hpa@linux.intel.com

x86, smap: Add STAC and CLAC instructions to control user space access · 63bcff2a

Committed by H. Peter Anvin on Sep 21, 2012

When Supervisor Mode Access Prevention (SMAP) is enabled, access to
userspace from the kernel is controlled by the AC flag.  To make the
performance of manipulating that flag acceptable, there are two new
instructions, STAC and CLAC, to set and clear it.

This patch adds those instructions, via alternative(), when the SMAP
feature is enabled.  It also adds X86_EFLAGS_AC unconditionally to the
SYSCALL entry mask; there is simply no reason to make that one
conditional.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com

19 Sep, 2012 · 1 commit

x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels · 72a671ce

Committed by Suresh Siddha on Jul 24, 2012

Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.

And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.

Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is a lot
of code duplication for support of new features like xsave etc.

Unify signal handling code paths for x86 and x86_64 kernels.

New strategy is as follows:

Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64 bytes as needed by xsave (this is where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.

Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.

"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
within the noise range with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

05 Sep, 2012 · 2 commits

x86/signals: ia32_signal.c: add __user casts to fix sparse warnings · 0ff8fef4

Committed by Mathias Krause on Sep 02, 2012

Fix the following sparse warnings by adding appropriate __user
casts and annotations:

  ia32_signal.c:165:38: warning: incorrect type in argument 1 (different address spaces)
  ia32_signal.c:165:38:    expected struct sigaltstack const [noderef] [usertype] <asn:1>*<noident>
  ia32_signal.c:165:38:    got struct sigaltstack *
  [...]
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1346621506-30857-4-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86: Fix __user annotations in asm/sys_ia32.h · f0002627

Committed by Mathias Krause on Sep 02, 2012

Fix the following sparse warnings:

sys_ia32.c:293:38: warning: incorrect type in argument 2 (different address spaces)
sys_ia32.c:293:38: expected unsigned int [noderef] [usertype] <asn:1>*stat_addr
sys_ia32.c:293:38: got unsigned int *stat_addr

Ironically, sys_ia32.h was introduced to fix sparse warnings but
missed that one.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/1346621506-30857-2-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

15 Jun, 2012 · 1 commit

x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery · 0b91f45b

Committed by Suresh Siddha on Jun 14, 2012

Signal delivery compat path may not have the 'TS_COMPAT' flag (that
flag indicates how we entered the kernel).  So use
test_thread_flag(TIF_IA32) instead of is_ia32_task(): one of the
functions of TIF_IA32 is just what kind of signal frame we want.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1339722435.3475.57.camel@sbsiddha-desk.sc.intel.com
Cc: stable@kernel.org	# v3.4
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

02 Jun, 2012 · 1 commit

most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set · 77097ae5

Committed by Al Viro on Apr 27, 2012

Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

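The masking rule described in this commit can be illustrated in user space. The following is a minimal sketch, not the kernel code: `sigmask64_t`, `sig_bit()` and `blockable_only()` are hypothetical names, and a plain 64-bit integer stands in for the kernel's `sigset_t`.

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical stand-in for sigset_t: bit (n - 1) represents signal n. */
typedef uint64_t sigmask64_t;

static inline sigmask64_t sig_bit(int sig)
{
    return (sigmask64_t)1 << (sig - 1);
}

/*
 * Model of the new set_current_blocked() behaviour: whatever the caller
 * asks for, SIGKILL and SIGSTOP are dropped from the mask before it is
 * applied, because those two signals can never be blocked.
 */
static inline sigmask64_t blockable_only(sigmask64_t requested)
{
    return requested & ~(sig_bit(SIGKILL) | sig_bit(SIGSTOP));
}
```

With this shape, a caller can pass a mask straight from user space without filtering it first, which is the convenience the commit message attributes to the 60 callers that wanted the filtering anyway.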

22 May, 2012 · 1 commit

new helper: sigsuspend() · 68f3f16d

Committed by Al Viro on May 21, 2012

guts of saved_sigmask-based sigsuspend/rt_sigsuspend.  Takes
kernel sigset_t *.

Open-coded instances replaced with calling it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16 May, 2012 · 1 commit

userns: Convert stat to return values mapped from kuids and kgids · a7c1938e

Committed by Eric W. Biederman on Feb 09, 2012

- Store uids and gids with kuid_t and kgid_t in struct kstat
- Convert uid and gids to userspace usable values with
  from_kuid and from_kgid
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

08 May, 2012 · 1 commit

x86-64: Eliminate dead ia32 syscall handlers · fba60c62

Committed by Jan Beulich on May 08, 2012

None of the three routines being removed here was actually
hooked up anywhere, so they all represented dead code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FA947FE020000780008247F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

07 May, 2012 · 1 commit

x86: fix broken TASK_SIZE for ia32_aout · ce7e5d2d

Committed by Al Viro on May 06, 2012

Setting TIF_IA32 in load_aout_binary() used to be enough; these days
TASK_SIZE is controlled by TIF_ADDR32 and that one doesn't get set
there.  Switch to use of set_personality_ia32()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Apr, 2012 · 4 commits

VM: add "vm_mmap()" helper function · 6be5ceb0

Committed by Linus Torvalds on Apr 20, 2012

This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.

This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c.  But that way we don't have
to export our internal do_mmap_pgoff() function.

Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead.  We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

VM: add "vm_brk()" helper function · e4eb1ff6

Committed by Linus Torvalds on Apr 20, 2012

It does the same thing as "do_brk()", except it handles the VM locking
too.

It turns out that all external callers want that anyway, so we can make
do_brk() static to just mm/mmap.c while at it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S · a3e859fe

Committed by H. Peter Anvin on Apr 20, 2012

Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S,
and replace them with _ASM_EXTABLE() macros; this will allow us to
change the format and type of the exception table entries.

This one was missed from the previous patch to this file.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: David Daney <david.daney@cavium.com>
Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com

x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S · 1ce6f868

Committed by H. Peter Anvin on Apr 20, 2012

Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S,
and replace them with _ASM_EXTABLE() macros; this will allow us to
change the format and type of the exception table entries.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: David Daney <david.daney@cavium.com>
Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com

14 Apr, 2012 · 1 commit

signal, x86: add SIGSYS info and make it synchronous. · a0727e8c

Committed by Will Drewry on Apr 12, 2012

This change enables SIGSYS, defines _sigfields._sigsys, and adds
x86 (compat) arch support.  _sigsys defines fields which allow
a signal handler to receive the triggering system call number,
the relevant AUDIT_ARCH_* value for that number, and the address
of the callsite.

SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
to have setup_frame() called for it. The goal is to ensure that
ucontext_t reflects the machine state from the time-of-syscall and not
from another signal handler.

The first consumer of SIGSYS would be seccomp filter.  In particular,
a filter program could specify a new return value, SECCOMP_RET_TRAP,
which would result in the system call being denied and the calling
thread signaled.  This also means that implementing arch-specific
support can be dependent upon HAVE_ARCH_SECCOMP_FILTER.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - added acked by, rebase
v17: - rebase and reviewed-by addition
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6
v12: - reworded changelog (oleg@redhat.com)
v11: - fix dropped words in the change description
     - added fallback copy_siginfo support.
     - added __ARCH_SIGSYS define to allow stepped arch support.
v10: - first version based on suggestion
Signed-off-by: James Morris <james.l.morris@oracle.com>

29 Mar, 2012 · 1 commit

Disintegrate asm/system.h for X86 · f05e798a

Committed by David Howells on Mar 28, 2012

Disintegrate asm/system.h for X86.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org

21 Mar, 2012 · 2 commits

take removal of PF_FORKNOEXEC to flush_old_exec() · 19e5109f

Committed by Al Viro on Feb 23, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

__register_binfmt() made void · 8fc3dc5a

Committed by Al Viro on Mar 17, 2012

Just don't pass NULL to it - nobody does, anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14 Mar, 2012 · 1 commit

x32: Fix stupid ia32/x32 inversion in the siginfo format · bb6fa8b2

Committed by H. Peter Anvin on Mar 13, 2012

Fix a stray ! which flipped the sense if we were generating a signal
frame for ia32 vs. x32.

Introduced in:

e7084fd5 x32: Switch to a 64-bit clock_t
Reported-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gregory M. Lueck <gregory.m.lueck@intel.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com

13 Mar, 2012 · 1 commit

x86: Rename trap_no to trap_nr in thread_struct · 51e7dc70

Committed by Srikar Dronamraju on Mar 12, 2012

There are precedents of trap number being referred to as
trap_nr. However thread struct refers trap number as trap_no.
Change it to trap_nr.

Also use enum instead of left-over literals for trap values.

This is pure cleanup, no functional change intended.
Suggested-by: Ingo Molnar <mingo@eltu.hu>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com
[ Fixed the math-emu build ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

06 Mar, 2012 · 2 commits

x32: Switch to a 64-bit clock_t · e7084fd5

Committed by H. Peter Anvin on Mar 05, 2012

clock_t is used mainly to give the number of jiffies a certain process
has burned.  It is entirely feasible for a long-running process to
consume more than 2^32 jiffies especially in a multiprocess system.
As such, switch to a 64-bit clock_t for x32, just as we already
switched to a 64-bit time_t.

clock_t is only used in a handful of places, and as such it is really
not a very significant change.  The one that has the biggest impact is
in struct siginfo, but since the *size* of struct siginfo doesn't
change (it is padded to the hilt) it is fairly easy to make this a
localized change.

This also gets rid of sys_x32_times, however since this is a pretty
late change don't compactify the system call numbers; we can reuse
system call slot 521 next time we need an x32 system call.
Reported-by: Gregory M. Lueck <gregory.m.lueck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com

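The overflow concern in this commit comes down to simple arithmetic, sketched below. The helper name `days_until_32bit_wrap()` is hypothetical, and the tick rate is an assumption supplied by the caller; 100 ticks per second is used in the note that follows because it is a common value for user-visible clock ticks.

```c
#include <stdint.h>

/*
 * How many whole days pass before an unsigned 32-bit tick counter wraps,
 * given a tick rate in ticks per second. Illustration only; the kernel
 * does not contain this helper.
 */
static inline uint64_t days_until_32bit_wrap(uint64_t ticks_per_second)
{
    const uint64_t ticks = (uint64_t)1 << 32;   /* 2^32 ticks */
    const uint64_t seconds_per_day = 86400;

    return ticks / ticks_per_second / seconds_per_day;
}
```

At 100 ticks per second a single busy thread would need roughly 497 days to wrap a 32-bit counter, but CPU time is summed across threads, so a process keeping 16 CPUs busy would reach the same total in about a month.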

aout: move setup_arg_pages() prior to reading/mapping the binary · 6414fa6a

Committed by Al Viro on Mar 05, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Feb, 2012 · 1 commit

i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a

Committed by Linus Torvalds on Feb 21, 2012

While various modules include <asm/i387.h> to get access to things we
actually *intend* for them to use, most of that header file was really
pretty low-level internal stuff that we really don't want to expose to
others.

So split the header file into two: the small exported interfaces remain
in <asm/i387.h>, while the internal definitions that are only used by
core architecture code are now in <asm/fpu-internal.h>.

The guiding principle for this was to expose functions that we export to
modules, and leave them in <asm/i387.h>, while stuff that is used by
task switching or was marked GPL-only is in <asm/fpu-internal.h>.

The fpu-internal.h file could be further split up too, especially since
arch/x86/kvm/ uses some of the remaining stuff for its module. But that
kvm usage should probably be abstracted out a bit, and at least now the
internal FPU accessor functions are much more contained. Even if it
isn't perhaps as contained as it _could_ be.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

21 Feb, 2012 · 2 commits

x86: Move some signal-handling definitions to a common header · f28f0c23

Committed by H. Peter Anvin on Feb 19, 2012

There are some definitions which are duplicated between
kernel/signal.c and ia32/ia32_signal.c; move them to a common header
file.

Rather than adding stuff to existing header files which contain data
structures, create a new header file; hence the slightly odd name
("all the good ones were taken.")

Note: nothing relied on signal_fault() being defined in
<asm/ptrace.h>.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86-64, ia32: Drop sys32_rt_sigprocmask · 2c73ce73

Committed by H. Peter Anvin on Feb 19, 2012

On x86, the only difference between sys_rt_sigprocmask and
sys32_rt_sigprocmask is the alignment of the data structures.
However, x86 allows data accesses with arbitrary alignment, and
therefore there is no reason for this code to be different.
Reported-by: Gregory M. Lueck <gregory.m.lueck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

18 Jan, 2012 · 3 commits

audit: inline audit_syscall_entry to reduce burden on archs · b05d8447

Committed by Eric Paris on Jan 03, 2012

Every arch calls:

if (unlikely(current->audit_context))
	audit_syscall_entry()

which requires knowledge about audit (the existence of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: Eric Paris <eparis@redhat.com>

audit: ia32entry.S sign extend error codes when calling 64 bit code · f031cd25

Committed by Eric Paris on Jan 03, 2012

In the ia32entry syscall exit audit fastpath we have assembly code which calls
__audit_syscall_exit directly.  This code, however, zeroes the upper 32
bits of the return code.  It then proceeded to call code which expects longs
to be 64bits long.  In order to handle code which expects longs to be 64bit we
sign extend the return code if that code is an error.  Thus the
__audit_syscall_exit function can correctly handle using the values in
snprintf("%ld").  This fixes the regression introduced in 5cbf1565.

Old record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=4294967283
New record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=-13
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

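The difference between the old and new audit records can be reproduced in user space. This is a sketch under stated assumptions rather than the kernel's assembly: `audit_exit_value()` and `zero_extended()` are hypothetical helpers, and the error range (the top 4095 values) is assumed to follow the kernel's usual `MAX_ERRNO` convention.

```c
#include <stdint.h>

#define MAX_ERRNO 4095  /* assumed error range, as in the kernel convention */

/* Old behaviour: the upper 32 bits are simply zeroed. */
static inline int64_t zero_extended(uint32_t eax)
{
    return (int64_t)eax;
}

/*
 * New behaviour as described above: a 32-bit return value that looks like
 * an error code is sign-extended to 64 bits; anything else is left
 * zero-extended.
 */
static inline int64_t audit_exit_value(uint32_t eax)
{
    if (eax >= (uint32_t)-MAX_ERRNO)
        return (int64_t)eax - ((int64_t)1 << 32);  /* sign-extend */
    return (int64_t)eax;
}
```

Feeding the value from the old record, 4294967283, through `audit_exit_value()` gives -13, matching the new record, while `zero_extended()` leaves it as the large positive number that the old record printed.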

Audit: push audit success and retcode into arch ptrace.h · d7e7528b

Committed by Eric Paris on Jan 03, 2012

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_regs and it
in turn calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
The only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

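The success test described in this commit can be modelled in user space. The sketch below is an approximation: `is_err_value()` mirrors the range check used by the kernel's `IS_ERR_VALUE()`, while `syscall_succeeded()` is a hypothetical stand-in that takes the raw return value rather than a register structure. The comparison is done on the unsigned representation, which is how the "< -MAX_ERRNO" wording above should be read.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_ERRNO 4095  /* same value the kernel uses */

/* True when x falls in the top MAX_ERRNO values of the unsigned range. */
static inline bool is_err_value(unsigned long x)
{
    return x >= (unsigned long)-MAX_ERRNO;
}

/*
 * Hypothetical model of is_syscall_success(): a return value counts as
 * success unless it lies in the error range. Large values such as valid
 * pointers are therefore not mistaken for failures.
 */
static inline bool syscall_succeeded(long ret)
{
    return !is_err_value((unsigned long)ret);
}
```

A value like `(long)(ULONG_MAX - 8192)` has its top bit set and would look negative to a naive signed check, yet it is outside the error range and is classified as success here, which is the case the old AUDITSC_RESULT helper got wrong.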

06 Dec, 2011 · 2 commits

x86-64: Cleanup some assembly entry points · f6b2bc84

Committed by Jan Beulich on Nov 29, 2011

system_call_after_swapgs doesn't really benefit from forcing
alignment from it - quite the opposite, native code needlessly
so far got a big NOP instruction inserted in front of it. Xen
being the only user of the separate entry point can well live
with the branch going to three bytes into a cache line.

The compatibility mode ptregs entry points for one can make use
of the GLOBAL() macro, and should be suitably aligned. Their
shared continuation point (ia32_ptregs_common) otoh doesn't need
to be global at all, but should continue to be properly aligned.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86-64: Slightly shorten line system call entry and exit paths · 46db09d3

Committed by Jan Beulich on Nov 29, 2011

GET_THREAD_INFO() involves a memory read immediately followed by
an "sub" on the value read, in turn (in several cases)
immediately followed by a use of the calculated value as the
base address of a memory access. This combination of
instructions has a non-negligible potential for stalls.

In the system call entry point code, however, the (fixed) offset
of the stack pointer from the end of the stack is generally
known, and hence we can instead avoid the memory load and
subtract, and instead do the memory reference using %rsp as the
base register. To do so in a legible fashion, introduce a
THREAD_INFO() macro which, provided a register (generally %rsp)
and the known offset from the end of the stack, produces a
suitable memory access operand.

The patch attempts to only touch the fast paths (no auditing and
alike), but manages to do so only in the 64-bit entry point
case; the compatibility mode entry points have so many
interdependencies between their various branch targets that it
was necessary to also adjust the slow paths to eliminate the
risk of having missed some register dependency during code
analysis.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

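The address arithmetic behind this change can be sketched in plain C. This is a simplified model, not the kernel's assembly macros: both function names are hypothetical, `THREAD_SIZE` is an assumed stack size, and the sketch ignores any fixed bias the kernel may keep in its saved stack-end value.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u  /* assumed kernel stack size for this sketch */

/*
 * Old style: load the saved end-of-stack value from memory, then subtract
 * the stack size. The load and the subtract form a dependency chain.
 */
static inline uintptr_t thread_info_via_load(const uintptr_t *saved_stack_end)
{
    return *saved_stack_end - THREAD_SIZE;
}

/*
 * New style: at a point where the distance from the stack pointer to the
 * end of the stack is a known constant, fold everything into one address
 * computation based on the stack pointer, with no memory load.
 */
static inline uintptr_t thread_info_via_sp(uintptr_t sp, uintptr_t off_to_end)
{
    return sp + off_to_end - THREAD_SIZE;
}
```

Both functions yield the same address whenever `sp + off_to_end` equals the saved end of the stack; the second form only works where that offset is known at assembly time, which is why the commit limits it to the system call entry paths.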

19 Nov, 2011 · 1 commit

x86, syscall: Re-fix typo in comment · 61f1e7e2

Committed by H. Peter Anvin on Nov 18, 2011

Fix the same typo as was fixed in:

b7641d2c x86-64, syscall: Adjust comment spacing and remove typo

... for the new versions of this file (32-bit and IA32 compat).
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1321569446-20433-4-git-send-email-hpa@linux.intel.com

18 Nov, 2011 · 2 commits

x86: Generate system call tables and unistd_*.h from tables · 303395ac

Committed by H. Peter Anvin on Nov 11, 2011

Generate system call tables and unistd_*.h automatically from the
tables in arch/x86/syscalls.  All other information, like NR_syscalls,
is auto-generated, some of which is in asm-offsets_*.c.

This allows us to keep all the system call information in one place,
and allows for kernel space and user space to see different
information; this is currently used for the ia32 system call numbers
when building the 64-bit kernel, but will be used by the x32 ABI in
the near future.

This also removes some gratuitous differences between i386, x86-64
and ia32; in particular, now all system call tables are generated with
the same mechanism.

Cc: H. J. Lu <hjl.tools@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86-64, ia32: Move compat_ni_syscall into C and its own file · e79a7fcc

Committed by H. Peter Anvin on Nov 11, 2011

Move compat_ni_syscall out of ia32entry.S and into its own .c file.
Although this is a trivial function, it is not performance-critical,
and this will simplify further cleanups.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

01 Nov, 2011 · 1 commit

Cross Memory Attach · fcf63409

Committed by Christopher Yeoh on Oct 31, 2011

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but it's not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

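For readers who want to try the interface from user space, the following is a minimal sketch of a single-copy read. It assumes a Linux system whose C library provides the `process_vm_readv()` wrapper in `<sys/uio.h>`; the helper name `read_from_process()` is hypothetical, and real callers would also need to handle partial reads and permission errors.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Copy len bytes from address remote_addr in process pid into local_buf
 * using one process_vm_readv() call. Returns the number of bytes copied,
 * or -1 with errno set (for example when the caller lacks permission to
 * access the target process).
 */
static inline ssize_t read_from_process(pid_t pid, void *local_buf,
                                        void *remote_addr, size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    /* One local segment, one remote segment; flags is passed as 0. */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

Because both sides are described by iovec arrays, the same call shape extends to scattered buffers by passing longer arrays and counts, which is the capability the message says /proc/pid/mem could not offer. Calling the helper with the caller's own pid is a convenient way to check that the syscall is available.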

27 Aug, 2011 · 1 commit

All Arch: remove linkage for sys_nfsservctl system call · f5b94099

Committed by NeilBrown on Aug 26, 2011

The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>