1. 05 5月, 2016 1 次提交
  2. 24 4月, 2016 1 次提交
  3. 02 12月, 2015 1 次提交
    • Z
      vfs: add copy_file_range syscall and vfs helper · 29732938
      Zach Brown 提交于
      Add a copy_file_range() system call for offloading copies between
      regular files.
      
      This gives an interface to underlying layers of the storage stack which
      can copy without reading and writing all the data.  There are a few
      candidates that should support copy offloading in the nearer term:
      
      - btrfs shares extent references with its clone ioctl
      - NFS has patches to add a COPY command which copies on the server
      - SCSI has a family of XCOPY commands which copy in the device
      
      This system call avoids the complexity of also accelerating the creation
      of the destination file by operating on an existing destination file
      descriptor, not a path.
      
      Currently the high level vfs entry point limits copy offloading to files
      on the same mount and super (and not in the same file).  This can be
      relaxed if we get implementations which can copy between file systems
      safely.
      Signed-off-by: NZach Brown <zab@redhat.com>
      [Anna Schumaker: Change -EINVAL to -EBADF during file verification,
                       Change flags parameter from int to unsigned int,
                       Add function to include/linux/syscalls.h,
                       Check copy len after file open mode,
                       Don't forbid ranges inside the same file,
                       Use rw_verify_area() to veriy ranges,
                       Use file_out rather than file_in,
                       Add COPY_FR_REFLINK flag]
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      29732938
  4. 06 11月, 2015 1 次提交
  5. 23 9月, 2015 1 次提交
  6. 12 9月, 2015 1 次提交
    • M
      sys_membarrier(): system-wide memory barrier (generic, x86) · 5b25b13a
      Mathieu Desnoyers 提交于
      Here is an implementation of a new system call, sys_membarrier(), which
      executes a memory barrier on all threads running on the system.  It is
      implemented by calling synchronize_sched().  It can be used to
      distribute the cost of user-space memory barriers asymmetrically by
      transforming pairs of memory barriers into pairs consisting of
      sys_membarrier() and a compiler barrier.  For synchronization primitives
      that distinguish between read-side and write-side (e.g.  userspace RCU
      [1], rwlocks), the read-side can be accelerated significantly by moving
      the bulk of the memory barrier overhead to the write-side.
      
      The existing applications of which I am aware that would be improved by
      this system call are as follows:
      
      * Through Userspace RCU library (http://urcu.so)
        - DNS server (Knot DNS) https://www.knot-dns.cz/
        - Network sniffer (http://netsniff-ng.org/)
        - Distributed object storage (https://sheepdog.github.io/sheepdog/)
        - User-space tracing (http://lttng.org)
        - Network storage system (https://www.gluster.org/)
        - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
        - Financial software (https://lkml.org/lkml/2015/3/23/189)
      
      Those projects use RCU in userspace to increase read-side speed and
      scalability compared to locking.  Especially in the case of RCU used by
      libraries, sys_membarrier can speed up the read-side by moving the bulk of
      the memory barrier cost to synchronize_rcu().
      
      * Direct users of sys_membarrier
        - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
      
      Microsoft core dotnet GC developers are planning to use the mprotect()
      side-effect of issuing memory barriers through IPIs as a way to implement
      Windows FlushProcessWriteBuffers() on Linux.  They are referring to
      sys_membarrier in their github thread, specifically stating that
      sys_membarrier() is what they are looking for.
      
      To explain the benefit of this scheme, let's introduce two example threads:
      
      Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
      Thread B (frequent, e.g. executing liburcu
      rcu_read_lock()/rcu_read_unlock())
      
      In a scheme where all smp_mb() in thread A are ordering memory accesses
      with respect to smp_mb() present in Thread B, we can change each
      smp_mb() within Thread A into calls to sys_membarrier() and each
      smp_mb() within Thread B into compiler barriers "barrier()".
      
      Before the change, we had, for each smp_mb() pairs:
      
      Thread A                    Thread B
      previous mem accesses       previous mem accesses
      smp_mb()                    smp_mb()
      following mem accesses      following mem accesses
      
      After the change, these pairs become:
      
      Thread A                    Thread B
      prev mem accesses           prev mem accesses
      sys_membarrier()            barrier()
      follow mem accesses         follow mem accesses
      
      As we can see, there are two possible scenarios: either Thread B memory
      accesses do not happen concurrently with Thread A accesses (1), or they
      do (2).
      
      1) Non-concurrent Thread A vs Thread B accesses:
      
      Thread A                    Thread B
      prev mem accesses
      sys_membarrier()
      follow mem accesses
                                  prev mem accesses
                                  barrier()
                                  follow mem accesses
      
      In this case, thread B accesses will be weakly ordered. This is OK,
      because at that point, thread A is not particularly interested in
      ordering them with respect to its own accesses.
      
      2) Concurrent Thread A vs Thread B accesses
      
      Thread A                    Thread B
      prev mem accesses           prev mem accesses
      sys_membarrier()            barrier()
      follow mem accesses         follow mem accesses
      
      In this case, thread B accesses, which are ensured to be in program
      order thanks to the compiler barrier, will be "upgraded" to full
      smp_mb() by synchronize_sched().
      
      * Benchmarks
      
      On Intel Xeon E5405 (8 cores)
      (one thread is calling sys_membarrier, the other 7 threads are busy
      looping)
      
      1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call.
      
      * User-space user of this system call: Userspace RCU library
      
      Both the signal-based and the sys_membarrier userspace RCU schemes
      permit us to remove the memory barrier from the userspace RCU
      rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
      accelerating them. These memory barriers are replaced by compiler
      barriers on the read-side, and all matching memory barriers on the
      write-side are turned into an invocation of a memory barrier on all
      active threads in the process. By letting the kernel perform this
      synchronization rather than dumbly sending a signal to every process
      threads (as we currently do), we diminish the number of unnecessary wake
      ups and only issue the memory barriers on active threads. Non-running
      threads do not need to execute such barrier anyway, because these are
      implied by the scheduler context switches.
      
      Results in liburcu:
      
      Operations in 10s, 6 readers, 2 writers:
      
      memory barriers in reader:    1701557485 reads, 2202847 writes
      signal-based scheme:          9830061167 reads,    6700 writes
      sys_membarrier:               9952759104 reads,     425 writes
      sys_membarrier (dyn. check):  7970328887 reads,     425 writes
      
      The dynamic sys_membarrier availability check adds some overhead to
      the read-side compared to the signal-based scheme, but besides that,
      sys_membarrier slightly outperforms the signal-based scheme. However,
      this non-expedited sys_membarrier implementation has a much slower grace
      period than signal and memory barrier schemes.
      
      Besides diminishing the number of wake-ups, one major advantage of the
      membarrier system call over the signal-based scheme is that it does not
      need to reserve a signal. This plays much more nicely with libraries,
      and with processes injected into for tracing purposes, for which we
      cannot expect that signals will be unused by the application.
      
      An expedited version of this system call can be added later on to speed
      up the grace period. Its implementation will likely depend on reading
      the cpu_curr()->mm without holding each CPU's rq lock.
      
      This patch adds the system call to x86 and to asm-generic.
      
      [1] http://urcu.so
      
      membarrier(2) man page:
      
      MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)
      
      NAME
             membarrier - issue memory barriers on a set of threads
      
      SYNOPSIS
             #include <linux/membarrier.h>
      
             int membarrier(int cmd, int flags);
      
      DESCRIPTION
             The cmd argument is one of the following:
      
             MEMBARRIER_CMD_QUERY
                    Query  the  set  of  supported commands. It returns a bitmask of
                    supported commands.
      
             MEMBARRIER_CMD_SHARED
                    Execute a memory barrier on all threads running on  the  system.
                    Upon  return from system call, the caller thread is ensured that
                    all running threads have passed through a state where all memory
                    accesses  to  user-space  addresses  match program order between
                    entry to and return from the system  call  (non-running  threads
                    are de facto in such a state). This covers threads from all pro=E2=80=90
                    cesses running on the system.  This command returns 0.
      
             The flags argument needs to be 0. For future extensions.
      
             All memory accesses performed  in  program  order  from  each  targeted
             thread is guaranteed to be ordered with respect to sys_membarrier(). If
             we use the semantic "barrier()" to represent a compiler barrier forcing
             memory  accesses  to  be performed in program order across the barrier,
             and smp_mb() to represent explicit memory barriers forcing full  memory
             ordering  across  the barrier, we have the following ordering table for
             each pair of barrier(), sys_membarrier() and smp_mb():
      
             The pair ordering is detailed as (O: ordered, X: not ordered):
      
                                    barrier()   smp_mb() sys_membarrier()
                    barrier()          X           X            O
                    smp_mb()           X           O            O
                    sys_membarrier()   O           O            O
      
      RETURN VALUE
             On success, these system calls return zero.  On error, -1 is  returned,
             and errno is set appropriately. For a given command, with flags
             argument set to 0, this system call is guaranteed to always return the
             same value until reboot.
      
      ERRORS
             ENOSYS System call is not implemented.
      
             EINVAL Invalid arguments.
      
      Linux                             2015-04-15                     MEMBARRIER(2)
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nicholas Miell <nmiell@comcast.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b25b13a
  7. 14 12月, 2014 1 次提交
    • D
      syscalls: implement execveat() system call · 51f39a1f
      David Drysdale 提交于
      This patchset adds execveat(2) for x86, and is derived from Meredydd
      Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
      
      The primary aim of adding an execveat syscall is to allow an
      implementation of fexecve(3) that does not rely on the /proc filesystem,
      at least for executables (rather than scripts).  The current glibc version
      of fexecve(3) is implemented via /proc, which causes problems in sandboxed
      or otherwise restricted environments.
      
      Given the desire for a /proc-free fexecve() implementation, HPA suggested
      (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
      an appropriate generalization.
      
      Also, having a new syscall means that it can take a flags argument without
      back-compatibility concerns.  The current implementation just defines the
      AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
      added in future -- for example, flags for new namespaces (as suggested at
      https://lkml.org/lkml/2006/7/11/474).
      
      Related history:
       - https://lkml.org/lkml/2006/12/27/123 is an example of someone
         realizing that fexecve() is likely to fail in a chroot environment.
       - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
         documenting the /proc requirement of fexecve(3) in its manpage, to
         "prevent other people from wasting their time".
       - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
         problem where a process that did setuid() could not fexecve()
         because it no longer had access to /proc/self/fd; this has since
         been fixed.
      
      This patch (of 4):
      
      Add a new execveat(2) system call.  execveat() is to execve() as openat()
      is to open(): it takes a file descriptor that refers to a directory, and
      resolves the filename relative to that.
      
      In addition, if the filename is empty and AT_EMPTY_PATH is specified,
      execveat() executes the file to which the file descriptor refers.  This
      replicates the functionality of fexecve(), which is a system call in other
      UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
      so relies on /proc being mounted).
      
      The filename fed to the executed program as argv[0] (or the name of the
      script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
      (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
      reflecting how the executable was found.  This does however mean that
      execution of a script in a /proc-less environment won't work; also, script
      execution via an O_CLOEXEC file descriptor fails (as the file will not be
      accessible after exec).
      
      Based on patches by Meredydd Luff.
      Signed-off-by: NDavid Drysdale <drysdale@google.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Rich Felker <dalias@aerifal.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51f39a1f
  8. 27 9月, 2014 1 次提交
  9. 19 8月, 2014 1 次提交
  10. 06 8月, 2014 1 次提交
    • T
      random: introduce getrandom(2) system call · c6e9d6f3
      Theodore Ts'o 提交于
      The getrandom(2) system call was requested by the LibreSSL Portable
      developers.  It is analoguous to the getentropy(2) system call in
      OpenBSD.
      
      The rationale of this system call is to provide resiliance against
      file descriptor exhaustion attacks, where the attacker consumes all
      available file descriptors, forcing the use of the fallback code where
      /dev/[u]random is not available.  Since the fallback code is often not
      well-tested, it is better to eliminate this potential failure mode
      entirely.
      
      The other feature provided by this new system call is the ability to
      request randomness from the /dev/urandom entropy pool, but to block
      until at least 128 bits of entropy has been accumulated in the
      /dev/urandom entropy pool.  Historically, the emphasis in the
      /dev/urandom development has been to ensure that urandom pool is
      initialized as quickly as possible after system boot, and preferably
      before the init scripts start execution.
      
      This is because changing /dev/urandom reads to block represents an
      interface change that could potentially break userspace which is not
      acceptable.  In practice, on most x86 desktop and server systems, in
      general the entropy pool can be initialized before it is needed (and
      in modern kernels, we will printk a warning message if not).  However,
      on an embedded system, this may not be the case.  And so with this new
      interface, we can provide the functionality of blocking until the
      urandom pool has been initialized.  Any userspace program which uses
      this new functionality must take care to assure that if it is used
      during the boot process, that it will not cause the init scripts or
      other portions of the system startup to hang indefinitely.
      
      SYNOPSIS
      	#include <linux/random.h>
      
      	int getrandom(void *buf, size_t buflen, unsigned int flags);
      
      DESCRIPTION
      	The system call getrandom() fills the buffer pointed to by buf
      	with up to buflen random bytes which can be used to seed user
      	space random number generators (i.e., DRBG's) or for other
      	cryptographic uses.  It should not be used for Monte Carlo
      	simulations or other programs/algorithms which are doing
      	probabilistic sampling.
      
      	If the GRND_RANDOM flags bit is set, then draw from the
      	/dev/random pool instead of the /dev/urandom pool.  The
      	/dev/random pool is limited based on the entropy that can be
      	obtained from environmental noise, so if there is insufficient
      	entropy, the requested number of bytes may not be returned.
      	If there is no entropy available at all, getrandom(2) will
      	either block, or return an error with errno set to EAGAIN if
      	the GRND_NONBLOCK bit is set in flags.
      
      	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
      	will be used.  Unlike using read(2) to fetch data from
      	/dev/urandom, if the urandom pool has not been sufficiently
      	initialized, getrandom(2) will block (or return -1 with the
      	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).
      
      	The getentropy(2) system call in OpenBSD can be emulated using
      	the following function:
      
                  int getentropy(void *buf, size_t buflen)
                  {
                          int     ret;
      
                          if (buflen > 256)
                                  goto failure;
                          ret = getrandom(buf, buflen, 0);
                          if (ret < 0)
                                  return ret;
                          if (ret == buflen)
                                  return 0;
                  failure:
                          errno = EIO;
                          return -1;
                  }
      
      RETURN VALUE
             On success, the number of bytes that was filled in the buf is
             returned.  This may not be all the bytes requested by the
             caller via buflen if insufficient entropy was present in the
             /dev/random pool, or if the system call was interrupted by a
             signal.
      
             On error, -1 is returned, and errno is set appropriately.
      
      ERRORS
      	EINVAL		An invalid flag was passed to getrandom(2)
      
      	EFAULT		buf is outside the accessible address space.
      
      	EAGAIN		The requested entropy was not available, and
      			getentropy(2) would have blocked if the
      			GRND_NONBLOCK flag was not set.
      
      	EINTR		While blocked waiting for entropy, the call was
      			interrupted by a signal handler; see the description
      			of how interrupted read(2) calls on "slow" devices
      			are handled with and without the SA_RESTART flag
      			in the signal(7) man page.
      
      NOTES
      	For small requests (buflen <= 256) getrandom(2) will not
      	return EINTR when reading from the urandom pool once the
      	entropy pool has been initialized, and it will return all of
      	the bytes that have been requested.  This is the recommended
      	way to use getrandom(2), and is designed for compatibility
      	with OpenBSD's getentropy() system call.
      
      	However, if you are using GRND_RANDOM, then getrandom(2) may
      	block until the entropy accounting determines that sufficient
      	environmental noise has been gathered such that getrandom(2)
      	will be operating as a NRBG instead of a DRBG for those people
      	who are working in the NIST SP 800-90 regime.  Since it may
      	block for a long time, these guarantees do *not* apply.  The
      	user may want to interrupt a hanging process using a signal,
      	so blocking until all of the requested bytes are returned
      	would be unfriendly.
      
      	For this reason, the user of getrandom(2) MUST always check
      	the return value, in case it returns some error, or if fewer
      	bytes than requested was returned.  In the case of
      	!GRND_RANDOM and small request, the latter should never
      	happen, but the careful userspace code (and all crypto code
      	should be careful) should check for this anyway!
      
      	Finally, unless you are doing long-term key generation (and
      	perhaps not even then), you probably shouldn't be using
      	GRND_RANDOM.  The cryptographic algorithms used for
      	/dev/urandom are quite conservative, and so should be
      	sufficient for all purposes.  The disadvantage of GRND_RANDOM
      	is that it can block, and the increased complexity required to
      	deal with partially fulfilled getrandom(2) requests.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NZach Brown <zab@zabbo.net>
      c6e9d6f3
  11. 19 7月, 2014 1 次提交
    • K
      seccomp: add "seccomp" syscall · 48dc92b9
      Kees Cook 提交于
      This adds the new "seccomp" syscall with both an "operation" and "flags"
      parameter for future expansion. The third argument is a pointer value,
      used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
      be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
      
      In addition to the TSYNC flag later in this patch series, there is a
      non-zero chance that this syscall could be used for configuring a fixed
      argument area for seccomp-tracer-aware processes to pass syscall arguments
      in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
      for this syscall. Additionally, this syscall uses operation, flags,
      and user pointer for arguments because strictly passing arguments via
      a user pointer would mean seccomp itself would be unable to trivially
      filter the seccomp syscall itself.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
      48dc92b9
  12. 20 5月, 2014 1 次提交
    • J
      asm-generic: Add renameat2 syscall · 63ba6000
      James Hogan 提交于
      Add the renameat2 syscall to the generic syscall list, which is used by the
      following architectures: arc, arm64, c6x, hexagon, metag, openrisc, score,
      tile, unicore32.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-metag@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      63ba6000
  13. 04 3月, 2014 1 次提交
    • H
      compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64 · 0473c9b5
      Heiko Carstens 提交于
      For architecture dependent compat syscalls in common code an architecture
      must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
      code.
      This however is not true for compat_sys_getdents64 for which architectures
      must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.
      
      This leads to the situation where all architectures, except mips, get the
      compat code but only x86_64, arm64 and the generic syscall architectures
      actually use it.
      
      So invert the logic, so that architectures actively must do something to
      get the compat code.
      
      This way a couple of architectures get rid of otherwise dead code.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      0473c9b5
  14. 24 2月, 2014 1 次提交
    • J
      asm-generic: add sched_setattr/sched_getattr syscalls · e6cfc029
      James Hogan 提交于
      Add the sched_setattr and sched_getattr syscalls to the generic syscall
      list, which is used by the following architectures: arc, arm64, c6x,
      hexagon, metag, openrisc, score, tile, unicore32.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-metag@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: linux@lists.openrisc.net
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      e6cfc029
  15. 19 6月, 2013 1 次提交
  16. 04 2月, 2013 1 次提交
  17. 14 12月, 2012 1 次提交
  18. 05 10月, 2012 1 次提交
  19. 04 10月, 2012 1 次提交
    • D
      UAPI: Fix the guards on various asm/unistd.h files · 89013952
      David Howells 提交于
      asm-generic/unistd.h and a number of asm/unistd.h files have been given
      reinclusion guards that allow the guard to be overridden if __SYSCALL is
      defined.  Unfortunately, these files define __SYSCALL and don't undefine it
      when they've finished with it, thus rendering the guard ineffective.
      
      The reason for this override is to allow the file to be #included multiple
      times with different settings on __SYSCALL for purposes like generating syscall
      tables.
      
      The following guards are problematic:
      
      arch/arm64/include/asm/unistd.h:#if !defined(__ASM_UNISTD_H) || defined(__SYSCALL)
      arch/arm64/include/asm/unistd32.h:#if !defined(__ASM_UNISTD32_H) || defined(__SYSCALL)
      arch/c6x/include/asm/unistd.h:#if !defined(_ASM_C6X_UNISTD_H) || defined(__SYSCALL)
      arch/hexagon/include/asm/unistd.h:#if !defined(_ASM_HEXAGON_UNISTD_H) || defined(__SYSCALL)
      arch/openrisc/include/asm/unistd.h:#if !defined(__ASM_OPENRISC_UNISTD_H) || defined(__SYSCALL)
      arch/score/include/asm/unistd.h:#if !defined(_ASM_SCORE_UNISTD_H) || defined(__SYSCALL)
      arch/tile/include/asm/unistd.h:#if !defined(_ASM_TILE_UNISTD_H) || defined(__SYSCALL)
      arch/unicore32/include/asm/unistd.h:#if !defined(__UNICORE_UNISTD_H__) || defined(__SYSCALL)
      include/asm-generic/unistd.h:#if !defined(_ASM_GENERIC_UNISTD_H) || defined(__SYSCALL)
      
      On the assumption that the guards' ineffectiveness has passed unnoticed, just
      remove these guards entirely.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      89013952
  20. 27 9月, 2012 1 次提交
    • M
      syscalls: add __NR_kcmp syscall to generic unistd.h · 11ef4cfa
      Mark Salter 提交于
      Commit d97b46a6 ("syscalls, x86: add __NR_kcmp syscall" ) added a new
      syscall to support checkpoint restore. It is currently x86-only, but
      that restriction will be removed in a subsequent patch. Unfortunately,
      the kernel checksyscalls script had a bug which suppressed any warning
      to other architectures that the kcmp syscall was not implemented. A
      patch to checksyscalls is being tested in linux-next and other
      architectures are seeing warnings about kcmp being unimplemented.
      
      This patch adds __NR_kcmp to <asm-generic/unistd.h> so that kcmp is
      wired in for architectures using the generic syscall list.
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      11ef4cfa
  21. 28 3月, 2012 1 次提交
    • C
      compat: use sys_sendfile64() implementation for sendfile syscall · 1631fcea
      Chris Metcalf 提交于
      <asm-generic/unistd.h> was set up to use sys_sendfile() for the 32-bit
      compat API instead of sys_sendfile64(), but in fact the right thing to
      do is to use sys_sendfile64() in all cases.  The 32-bit sendfile64() API
      in glibc uses the sendfile64 syscall, so it has to be capable of doing
      full 64-bit operations.  But the sys_sendfile() kernel implementation
      has a MAX_NON_LFS test in it which explicitly limits the offset to 2^32.
      So, we need to use the sys_sendfile64() implementation in the kernel
      for this case.
      
      Cc: <stable@kernel.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      1631fcea
  22. 04 12月, 2011 1 次提交
  23. 27 8月, 2011 1 次提交
  24. 03 6月, 2011 1 次提交
  25. 29 5月, 2011 1 次提交
    • E
      ns: Wire up the setns system call · 7b21fddd
      Eric W. Biederman 提交于
      32bit and 64bit on x86 are tested and working.  The rest I have looked
      at closely and I can't find any problems.
      
      setns is an easy system call to wire up.  It just takes two ints so I
      don't expect any weird architecture porting problems.
      
      While doing this I have noticed that we have some architectures that are
      very slow to get new system calls.  cris seems to be the slowest where
      the last system calls wired up were preadv and pwritev.  avr32 is weird
      in that recvmmsg was wired up but never declared in unistd.h.  frv is
      behind with perf_event_open being the last syscall wired up.  On h8300
      the last system call wired up was epoll_wait.  On m32r the last system
      call wired up was fallocate.  mn10300 has recvmmsg as the last system
      call wired up.  The rest seem to at least have syncfs wired up which was
      new in the 2.6.39.
      
      v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
      v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
      v4: Moved wiring up of the system call to another patch
      v5: ported to v2.6.39-rc6
      v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
      v7: ported to Linus's latest post 2.6.39 tree.
      
      >  arch/blackfin/include/asm/unistd.h     |    3 ++-
      >  arch/blackfin/mach-common/entry.S      |    1 +
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      
      Oh - ia64 wiring looks good.
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b21fddd
  26. 13 5月, 2011 1 次提交
    • C
      compat: fixes to allow working with tile arch · be84cb43
      Chris Metcalf 提交于
      The existing <asm-generic/unistd.h> mechanism doesn't really provide
      enough to create the 64-bit "compat" ABI properly in a generic way,
      since the compat ABI is a mix of things were you can re-use the 64-bit
      versions of syscalls and things where you need a compat wrapper.
      
      To provide this in the most direct way possible, I added two new macros
      to go along with the existing __SYSCALL and __SC_3264 macros: __SC_COMP
      and SC_COMP_3264.  These macros take an additional argument, typically a
      "compat_sys_xxx" function, which is passed to __SYSCALL if you define
      __SYSCALL_COMPAT when including the header, resulting in a pointer to
      the compat function being placed in the generated syscall table.
      
      The change also adds some missing definitions to <linux/compat.h> so that
      it actually has declarations for all the compat syscalls, since the
      "[nr] = ##call" approach requires proper C declarations for all the
      functions included in the syscall table.
      
      Finally, compat.c defines compat_sys_sigpending() and
      compat_sys_sigprocmask() even if the underlying architecture doesn't
      request it, which tries to pull in undefined compat_old_sigset_t defines.
      We need to guard those compat syscall definitions with appropriate
      __ARCH_WANT_SYS_xxx ifdefs.
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      be84cb43
  27. 23 3月, 2011 1 次提交
  28. 21 3月, 2011 1 次提交
    • S
      introduce sys_syncfs to sync a single file system · b7ed78f5
      Sage Weil 提交于
      It is frequently useful to sync a single file system, instead of all
      mounted file systems via sync(2):
      
       - On machines with many mounts, it is not at all uncommon for some of
         them to hang (e.g. unresponsive NFS server).  sync(2) will get stuck on
         those and may never get to the one you do care about (e.g., /).
       - Some applications write lots of data to the file system and then
         want to make sure it is flushed to disk.  Calling fsync(2) on each
         file introduces unnecessary ordering constraints that result in a large
         amount of sub-optimal writeback/flush/commit behavior by the file
         system.
      
      There are currently two ways (that I know of) to sync a single super_block:
      
       - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
         mapping, which isn't usually desirable, and doesn't work for non-block
         file systems.
       - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
         current implemention.  Relying on this little-known side effect for
         something like data safety sounds foolish.
      
      Both of these approaches require root privileges, which some applications
      do not have (nor should they need?) given that sync(2) is an unprivileged
      operation.
      
      This patch introduces a new system call syncfs(2) that takes an fd and
      syncs only the file system it references.  Maybe someday we can
      
       $ sync /some/path
      
      and not get
      
       sync: ignoring all arguments
      
      The syscall is motivated by comments by Al and Christoph at the last LSF.
      syncfs(2) seems like an appropriate name given statfs(2).
      
      A similar ioctl was also proposed a while back, see
      	http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b7ed78f5
  29. 20 3月, 2011 1 次提交
  30. 15 3月, 2011 1 次提交
  31. 13 8月, 2010 1 次提交
  32. 16 7月, 2010 1 次提交
  33. 26 6月, 2010 1 次提交
    • C
      Add wait4() back to the set of <asm-generic/unistd.h> syscalls. · b51cae21
      Chris Metcalf 提交于
      The initial pass at the generic ABI assumed that wait4() could be
      easily expressed using waitid().  Although it's true that wait4()
      can be built on waitid(), it's awkward enough that it makes more
      sense to continue to include wait4 in the generic syscall ABI.
      
      Since there is already a deprecated wait4 in the ABI, this change
      converts that wait4 into old_wait, and puts wait4 in the next
      available slot for new supported syscalls, after the platform-specific
      syscalls at number 260.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      b51cae21
  34. 05 6月, 2010 1 次提交
    • C
      Fix up the "generic" unistd.h ABI to be more useful. · 5360bd77
      Chris Metcalf 提交于
      Reserve 16 "architecture-specific" syscall numbers starting at 244.
      
      Allow use of the sys_sync_file_range2() API with the generic unistd.h
      by specifying __ARCH_WANT_SYNC_FILE_RANGE2 before including it.
      
      Allow using the generic unistd.h to create the "compat" syscall table
      by specifying __SYSCALL_COMPAT before including it.
      
      Use sys_fadvise64_64 for __NR3264_fadvise64 in both 32- and 64-bit mode.
      
      Request the appropriate __ARCH_WANT_COMPAT_SYS_xxx values when
      some deprecated syscall modes are selected.
      
      As part of this change to fix up the syscalls, also provide a couple
      of missing signal-related syscall prototypes in <linux/syscalls.h>.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      5360bd77
  35. 11 12月, 2009 2 次提交
  36. 04 12月, 2009 1 次提交
  37. 03 11月, 2009 1 次提交
  38. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  39. 19 6月, 2009 1 次提交
    • A
      asm-generic: hook up new system calls · fcec9bf1
      Arnd Bergmann 提交于
      sys_rt_tgsigqueueinfo and sys_perf_counter_open
      have been added in 2.6.31, so hook them up in the
      generic unistd.h file.
      
      Since the file is now in the mainline kernel, we
      are no longer reordering the numbers but just add
      system calls at the end.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      fcec9bf1