1. 20 December 2012, 1 commit
  2. 29 November 2012, 2 commits
  3. 01 October 2012, 1 commit
  4. 22 September 2012, 2 commits
  5. 19 September 2012, 1 commit
    • x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels · 72a671ce
      Committed by Suresh Siddha
      Currently, for x86 and x86_32 binaries, the fpstate in the user sigframe is
      copied to/from the fpstate in the task struct.
      
      In the case of signal delivery for x86_64 binaries, if the fpstate is live
      in the CPU registers, the live state is copied directly to the user
      sigframe; otherwise, the fpstate in the task struct is copied to the user
      sigframe. During restore, the fpstate in the user sigframe is restored
      directly to the live CPU registers.
      
      Historically, the different code paths led to different bugs. For example,
      the x86_64 code path was not preemption-safe until recently. There is also
      a lot of code duplication in the support for new features such as xsave.
      
      Unify signal handling code paths for x86 and x86_64 kernels.
      
      The new strategy is as follows:
      
      Signal delivery: For both 32- and 64-bit frames, align the core math frame
      area to 64 bytes as required by xsave (this is where the main fpu/extended
      state gets copied to; it excludes the legacy compatibility fsave header for
      the 32-bit [f]xsave frames). If the state is live, copy the register state
      directly to the user frame. If not, copy the state in the thread struct to
      the user frame. For 32-bit [f]xsave frames, construct the fsave header
      separately, before the actual [f]xsave area.
      
      Signal return: As the 32-bit frames with [f]xstate have an additional
      'fsave' header, copy everything back from the user sigframe to the
      fpstate in the task structure and reconstruct the fxstate from the 'fsave'
      header (also, user-passed pointers may not be correctly aligned for
      any attempt to directly restore partial state). At the next fpstate usage,
      everything will be restored to the live CPU registers.
      For all the 64-bit frames and the 32-bit fsave frame, restore the state
      from the user sigframe directly to the live CPU registers. 64-bit signals
      have always restored the math frame directly, so we can expect the math
      frame pointer to be correctly aligned. For 32-bit fsave frames, there are
      no alignment requirements, so we can restore the state directly.
      
      "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
      with in the noise range with this change.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
      [ Merged in compilation fix ]
      Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
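
      A minimal, runnable userspace sketch of the frame-carving arithmetic the
      delivery path performs. The sizes and the helper name (carve_math_frame)
      are illustrative, not the kernel's actual identifiers; only the 64-byte
      alignment requirement and the fsave-header layout come from the commit
      above.

          #include <stdint.h>
          #include <stdio.h>

          #define XSTATE_SIZE 832   /* hypothetical extended-state size */
          #define FSAVE_HDR   112   /* legacy fsave header, 32-bit frames only */

          /* Carve the math frame out of the signal stack: the [f]xsave area
           * itself must be 64-byte aligned; for 32-bit [f]xsave frames the
           * legacy fsave header is placed just before it. */
          static uint64_t carve_math_frame(uint64_t sp, int is_32bit_fx,
                                           uint64_t *xsave_area)
          {
              sp -= XSTATE_SIZE;
              sp &= ~63ULL;            /* xsave demands 64-byte alignment */
              *xsave_area = sp;
              if (is_32bit_fx)
                  sp -= FSAVE_HDR;     /* fsave header precedes the xsave area */
              return sp;
          }

          int main(void)
          {
              uint64_t xs;
              uint64_t base = carve_math_frame(0xffffd00cULL, 1, &xs);
              printf("frame base %#llx, xsave area %#llx, aligned=%d\n",
                     (unsigned long long)base, (unsigned long long)xs,
                     (int)((xs & 63) == 0));
              return 0;
          }
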
  6. 05 September 2012, 2 commits
  7. 15 June 2012, 1 commit
  8. 02 June 2012, 1 commit
  9. 22 May 2012, 1 commit
  10. 16 May 2012, 1 commit
  11. 08 May 2012, 1 commit
  12. 07 May 2012, 1 commit
  13. 21 April 2012, 4 commits
  14. 14 April 2012, 1 commit
    • signal, x86: add SIGSYS info and make it synchronous. · a0727e8c
      Committed by Will Drewry
      This change enables SIGSYS, defines _sigfields._sigsys, and adds
      x86 (compat) arch support.  _sigsys defines fields which allow
      a signal handler to receive the triggering system call number,
      the relevant AUDIT_ARCH_* value for that number, and the address
      of the callsite.
      
      SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
      to have setup_frame() called for it. The goal is to ensure that
      ucontext_t reflects the machine state from the time-of-syscall and not
      from another signal handler.
      
      The first consumer of SIGSYS would be the seccomp filter.  In particular,
      a filter program could specify a new return value, SECCOMP_RET_TRAP,
      which would result in the system call being denied and the calling
      thread being signaled.  This also means that implementing arch-specific
      support can be made dependent on HAVE_ARCH_SECCOMP_FILTER.
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: - added acked by, rebase
      v17: - rebase and reviewed-by addition
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
      v12: - reworded changelog (oleg@redhat.com)
      v11: - fix dropped words in the change description
           - added fallback copy_siginfo support.
           - added __ARCH_SIGSYS define to allow stepped arch support.
      v10: - first version based on suggestion
      Signed-off-by: James Morris <james.l.morris@oracle.com>
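
      A hedged userspace sketch of consuming the new fields: si_syscall,
      si_arch and si_call_addr are the glibc names for the _sigsys members
      described above. Note that raise() below only exercises the handler;
      only a seccomp SECCOMP_RET_TRAP delivery populates these fields
      meaningfully, so treat this as a demo of the API shape, not of seccomp.

          #define _GNU_SOURCE
          #include <signal.h>
          #include <stdio.h>
          #include <unistd.h>

          static void sigsys_handler(int sig, siginfo_t *info, void *uctx)
          {
              (void)sig; (void)uctx;
              /* fprintf is not async-signal-safe; acceptable for a demo only. */
              fprintf(stderr, "SIGSYS: syscall=%d arch=%#x call_addr=%p\n",
                      info->si_syscall, info->si_arch, info->si_call_addr);
              _exit(1);
          }

          int main(void)
          {
              struct sigaction sa = {0};
              sa.sa_sigaction = sigsys_handler;
              sa.sa_flags = SA_SIGINFO;
              sigemptyset(&sa.sa_mask);
              sigaction(SIGSYS, &sa, NULL);

              /* A seccomp filter returning SECCOMP_RET_TRAP would deliver
               * SIGSYS with the fields filled in; raise() just triggers the
               * handler, so the printed fields are not meaningful here. */
              raise(SIGSYS);
              return 0;
          }
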
  15. 29 March 2012, 1 commit
  16. 21 March 2012, 2 commits
  17. 14 March 2012, 1 commit
  18. 13 March 2012, 1 commit
  19. 06 March 2012, 2 commits
  20. 22 February 2012, 1 commit
    • i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Committed by Linus Torvalds
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  21. 21 February 2012, 2 commits
  22. 18 January 2012, 3 commits
    • audit: inline audit_syscall_entry to reduce burden on archs · b05d8447
      Committed by Eric Paris
      Every arch calls:
      
      if (unlikely(current->audit_context))
      	audit_syscall_entry()
      
      which requires knowledge about audit (the existence of audit_context) in
      the arch code.  Just do it all in a static inline in audit.h so that archs
      can remain blissfully ignorant.
      Signed-off-by: Eric Paris <eparis@redhat.com>
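
      A self-contained sketch of the pattern this commit applies. The argument
      list is reduced for brevity (the kernel's audit_syscall_entry takes the
      syscall arguments too), and `current` is stubbed so the sketch compiles
      and runs in userspace.

          #include <stdio.h>

          struct audit_context;                 /* opaque, as in the kernel */
          struct task_struct { struct audit_context *audit_context; };

          static struct task_struct demo_task;  /* stands in for `current` */
          #define current (&demo_task)

          static void __audit_syscall_entry(int major)
          {
              printf("audit: syscall %d entered\n", major);
          }

          /* The arch-facing wrapper: archs call this unconditionally and need
           * no knowledge of audit_context. */
          static inline void audit_syscall_entry(int major)
          {
              if (current->audit_context)
                  __audit_syscall_entry(major);
          }

          int main(void)
          {
              audit_syscall_entry(192);  /* no-op: no audit context attached */
              return 0;
          }
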
    • audit: ia32entry.S sign extend error codes when calling 64 bit code · f031cd25
      Committed by Eric Paris
      In the ia32entry syscall-exit audit fastpath we have assembly code which
      calls __audit_syscall_exit directly.  This code, however, zeroed the upper
      32 bits of the return code and then called code which expects longs to be
      64 bits.  To handle that, we now sign-extend the return code if it is an
      error value, so the __audit_syscall_exit function can correctly format the
      value with snprintf("%ld").  This fixes the regression introduced in
      5cbf1565.
      
      Old record:
      type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=4294967283
      New record:
      type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=-13
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
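
      The old/new records above come down to zero- versus sign-extension of a
      32-bit error code. A tiny demo, assuming a 64-bit long:

          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
              uint32_t eax = (uint32_t)-13;      /* -EACCES, as the 32-bit path returns it */
              long wrong = (long)(uint64_t)eax;  /* zero-extended: 4294967283 */
              long right = (long)(int32_t)eax;   /* sign-extended: -13 */
              printf("exit=%ld  (zero-extended, the old broken record)\n", wrong);
              printf("exit=%ld  (sign-extended, the fixed record)\n", right);
              return 0;
          }
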
    • Audit: push audit success and retcode into arch ptrace.h · d7e7528b
      Committed by Eric Paris
      The audit system previously expected arches calling audit_syscall_exit to
      supply as arguments whether the syscall was a success and what the return
      code was.  Audit also provided a helper, AUDITSC_RESULT, which was
      supposed to simplify things by converting from negative retcodes to an
      audit-internal magic value stating success or failure.  This helper was
      wrong and could indicate that a valid pointer returned to userspace was a
      failed syscall.  The fix is to fix the layering foolishness.  We now pass
      audit_syscall_exit a struct pt_regs, and it in turn calls back into arch
      code to collect the return value and to determine if the syscall was a
      success or failure.  We also define a generic is_syscall_success() macro,
      which treats the syscall as failed only when the return value falls in the
      errno window [-MAX_ERRNO, -1].  This works for arches like x86 which do
      not use a separate mechanism to indicate syscall failure.
      
      We make both is_syscall_success() and regs_return_value() static inlines
      instead of macros.  The reason is that the audit function must take a
      void* for the regs (uml calls theirs struct uml_pt_regs instead of just
      struct pt_regs, so audit_syscall_exit can't take a struct pt_regs).
      Since the audit function takes a void*, we need static inlines to cast it
      back to the arch-correct structure before dereferencing it.
      
      The other major change is that on some arches, like ia64, MIPS and ppc,
      we change regs_return_value() to give us the negative value on syscall
      failure.  The only other user of this macro, kretprobe_example.c, won't
      notice, and it makes the value signed consistently for the audit
      functions across all archs.
      
      In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
      audit code as the return value, but the ptrace_64.h code defined the
      macro regs_return_value() as regs[3].  I have no idea which one is
      correct, but this patch now uses the regs_return_value() function, so it
      now uses regs[3].

      For powerpc we previously used regs->result but now use the
      regs_return_value() function, which uses regs->gpr[3].  regs->gpr[3] is
      always positive, so regs_return_value(), much like on ia64, makes it
      negative before calling the audit code when appropriate.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
      Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
      Acked-by: Richard Weinberger <richard@nod.at> [for uml]
      Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
      Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
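
      A runnable sketch of the success test described above, using the kernel's
      MAX_ERRNO value of 4095. is_syscall_success() here is a simplified
      stand-in for the kernel macro (which takes a pt_regs and goes through
      regs_return_value()):

          #include <stdbool.h>
          #include <stdio.h>

          #define MAX_ERRNO 4095  /* as in the kernel's <linux/err.h> */

          /* Failure only when the value lands in [-MAX_ERRNO, -1]; a large
           * valid pointer returned by e.g. mmap() thus counts as success. */
          static inline bool is_syscall_success(long ret)
          {
              return !((unsigned long)ret >= (unsigned long)-MAX_ERRNO);
          }

          int main(void)
          {
              printf("-13            -> %d (error)\n", is_syscall_success(-13));
              printf("0              -> %d (success)\n", is_syscall_success(0));
              printf("0x7f0000000000 -> %d (success: a valid pointer)\n",
                     is_syscall_success(0x7f0000000000L));
              return 0;
          }
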
  23. 06 December 2011, 2 commits
    • x86-64: Cleanup some assembly entry points · f6b2bc84
      Committed by Jan Beulich
      system_call_after_swapgs doesn't really benefit from forced alignment;
      quite the opposite, native code so far needlessly got a big NOP
      instruction inserted in front of it. Xen, being the only user of the
      separate entry point, can well live with the branch going to three bytes
      into a cache line.
      
      The compatibility-mode ptregs entry points, for one, can make use
      of the GLOBAL() macro and should be suitably aligned. Their
      shared continuation point (ia32_ptregs_common), on the other hand,
      doesn't need to be global at all, but should continue to be properly
      aligned.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-64: Slightly shorten system call entry and exit paths · 46db09d3
      Committed by Jan Beulich
      GET_THREAD_INFO() involves a memory read immediately followed by
      a "sub" on the value read, in turn (in several cases)
      immediately followed by a use of the calculated value as the
      base address of a memory access. This combination of
      instructions has a non-negligible potential for stalls.
      
      In the system call entry point code, however, the (fixed) offset
      of the stack pointer from the end of the stack is generally
      known, and hence we can instead avoid the memory load and
      subtract, and instead do the memory reference using %rsp as the
      base register. To do so in a legible fashion, introduce a
      THREAD_INFO() macro which, provided a register (generally %rsp)
      and the known offset from the end of the stack, produces a
      suitable memory access operand.
      
      The patch attempts to only touch the fast paths (no auditing and
      the like), but manages to do so only in the 64-bit entry point
      case; the compatibility-mode entry points have so many
      interdependencies between their various branch targets that it
      was necessary to also adjust the slow paths to eliminate the
      risk of having missed some register dependency during code
      analysis.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 19 November 2011, 1 commit
  25. 18 November 2011, 2 commits
    • x86: Generate system call tables and unistd_*.h from tables · 303395ac
      Committed by H. Peter Anvin
      Generate the system call tables and unistd_*.h automatically from the
      tables in arch/x86/syscalls.  All other information, like NR_syscalls, is
      auto-generated as well; some of it ends up in asm-offsets_*.c.
      
      This allows us to keep all the system call information in one place,
      and allows for kernel space and user space to see different
      information; this is currently used for the ia32 system call numbers
      when building the 64-bit kernel, but will be used by the x32 ABI in
      the near future.
      
      This also removes some gratuitous differences between i386, x86-64
      and ia32; in particular, all system call tables are now generated with
      the same mechanism.
      
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
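
      The table format is one line per syscall: number, ABI, name, entry
      point, and an optional compat entry point. An illustrative excerpt
      (treat this as a sketch of the format, not the exact file contents):

          # <number> <abi> <name> <entry point> [<compat entry point>]
          0    common   read     sys_read
          1    common   write    sys_write
          59   64       execve   stub_execve
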
    • x86-64, ia32: Move compat_ni_syscall into C and its own file · e79a7fcc
      Committed by H. Peter Anvin
      Move compat_ni_syscall out of ia32entry.S and into its own .c file.
      Although this is a trivial function, it is not performance-critical,
      and this will simplify further cleanups.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
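
      The moved function is essentially a one-liner; a hedged sketch (the
      kernel's exact signature and linkage may differ), with a trivial driver
      so it runs standalone:

          #include <errno.h>
          #include <stdio.h>

          /* Sketch: a "not implemented" compat syscall just returns -ENOSYS. */
          long compat_ni_syscall(void)
          {
              return -ENOSYS;
          }

          int main(void)
          {
              printf("compat_ni_syscall() = %ld\n", compat_ni_syscall());
              return 0;
          }
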
  26. 01 November 2011, 1 commit
    • Cross Memory Attach · fcf63409
      Committed by Christopher Yeoh
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - It does not allow specifying iovecs for both src and dest; assuming
          preadv or pwritev were implemented, either the area read from or
          the area written to would need to be contiguous.
        - Currently mem_read only lets processes that are ptrace'ing the
          target (and are still able to ptrace it) read from the target.
          This check could possibly be moved to the open call, but it's not
          clear exactly what race this restriction is stopping (the reason
          appears to have been lost).
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix
          domain socket is a bit ugly from a userspace point of view,
          especially when you may have hundreds if not (eventually) thousands
          of processes that all need to do this with each other.
        - It doesn't allow for some future uses of the interface we would
          like to consider adding (see below).
        - Interestingly, reading from /proc/pid/mem currently actually
          involves two copies! (But this could be fixed pretty easily.)
      
      As mentioned previously, use of vmsplice instead was considered, but it
      has problems.  Since you need the reader and writer working
      co-operatively, if the pipe is not drained then you block, which requires
      some wrapping to do non-blocking on the send side or polling on the
      receive side.  In all-to-all communication it requires ordering,
      otherwise you can deadlock.  And in the example of many MPI tasks writing
      to one MPI task, vmsplice serialises the copying.
      
      There are some cases of MPI collectives where even a single-copy
      interface does not get us the performance gain we could have.  For
      example, in an MPI_Reduce, rather than copy the data from the source we
      would like to instead use it directly in a math op (say the reduce is
      doing a sum), as this would save us doing a copy.  We don't need to keep
      a copy of the data from the source.  I haven't implemented this, but I
      think this interface could in the future do all this through the use of
      the flags - e.g. you could specify the math operation and type, and the
      kernel, rather than just copying the data, would apply the specified
      operation between the source and destination and store it in the
      destination.
      
      We don't yet have a "second user" of the interface (though I've had
      some nibbles from people who may be interested in using it for
      intra-process messaging which is not MPI), but this interface is
      something which hardware vendors are already doing in their custom
      drivers to implement fast local communication.  So in addition to being
      useful for OpenMPI, it would mean the driver maintainers don't have to
      fix things up when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc; other architectures
      should mainly (I think) just need to add syscall numbers for
      process_vm_readv and process_vm_writev.  There are 32-bit compatibility
      versions for 64-bit kernels.
      
      For arch maintainers, there are some simple tests available here to
      quickly verify that the syscalls are working correctly:
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
      Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
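
      A minimal usage example of the syscall as later exposed by glibc
      (process_vm_readv, declared in <sys/uio.h> with _GNU_SOURCE). Reading
      from our own pid keeps the demo self-contained while still exercising
      the single-copy path:

          #define _GNU_SOURCE
          #include <stdio.h>
          #include <sys/uio.h>
          #include <unistd.h>

          int main(void)
          {
              char src[64] = "hello via cross memory attach";
              char dst[64] = {0};
              struct iovec local  = { .iov_base = dst, .iov_len = sizeof dst };
              struct iovec remote = { .iov_base = src, .iov_len = sizeof src };

              /* Copy straight from the "remote" process's address space (our
               * own pid here) into the local buffer, in one syscall and with
               * no shared-memory bounce buffer. */
              ssize_t n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
              if (n < 0) {
                  perror("process_vm_readv");
                  return 1;
              }
              printf("copied %zd bytes: %s\n", n, dst);
              return 0;
          }
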
  27. 27 August 2011, 1 commit