1. 09 10月, 2014 1 次提交
    • T
      fs: namespace: suppress 'may be used uninitialized' warnings · b8850d1f
      Tim Gardner 提交于
      The gcc version 4.9.1 compiler complains Even though it isn't possible for
      these variables to not get initialized before they are used.
      
      fs/namespace.c: In function ‘SyS_mount’:
      fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
              ^
      fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here
        char *kernel_dev;
              ^
      fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
              ^
      fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here
        char *kernel_type;
              ^
      
      Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b8850d1f
  2. 22 4月, 2014 1 次提交
    • J
      locks: rename file-private locks to "open file description locks" · 0d3f7a2d
      Jeff Layton 提交于
      File-private locks have been merged into Linux for v3.15, and *now*
      people are commenting that the name and macro definitions for the new
      file-private locks suck.
      
      ...and I can't even disagree. The names and command macros do suck.
      
      We're going to have to live with these for a long time, so it's
      important that we be happy with the names before we're stuck with them.
      The consensus on the lists so far is that they should be rechristened as
      "open file description locks".
      
      The name isn't a big deal for the kernel, but the command macros are not
      visually distinct enough from the traditional POSIX lock macros. The
      glibc and documentation folks are recommending that we change them to
      look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
      programmer will typo one of the commands wrong, and also makes it easier
      to spot this difference when reading code.
      
      This patch makes the following changes that I think are necessary before
      v3.15 ships:
      
      1) rename the command macros to their new names. These end up in the uapi
         headers and so are part of the external-facing API. It turns out that
         glibc doesn't actually use the fcntl.h uapi header, but it's hard to
         be sure that something else won't. Changing it now is safest.
      
      2) make the the /proc/locks output display these as type "OFDLCK"
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Carlos O'Donell <carlos@redhat.com>
      Cc: Stefan Metzmacher <metze@samba.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Frank Filz <ffilzlnx@mindspring.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      0d3f7a2d
  3. 31 3月, 2014 1 次提交
    • J
      locks: add new fcntl cmd values for handling file private locks · 5d50ffd7
      Jeff Layton 提交于
      Due to some unfortunate history, POSIX locks have very strange and
      unhelpful semantics. The thing that usually catches people by surprise
      is that they are dropped whenever the process closes any file descriptor
      associated with the inode.
      
      This is extremely problematic for people developing file servers that
      need to implement byte-range locks. Developers often need a "lock
      management" facility to ensure that file descriptors are not closed
      until all of the locks associated with the inode are finished.
      
      Additionally, "classic" POSIX locks are owned by the process. Locks
      taken between threads within the same process won't conflict with one
      another, which renders them useless for synchronization between threads.
      
      This patchset adds a new type of lock that attempts to address these
      issues. These locks conflict with classic POSIX read/write locks, but
      have semantics that are more like BSD locks with respect to inheritance
      and behavior on close.
      
      This is implemented primarily by changing how fl_owner field is set for
      these locks. Instead of having them owned by the files_struct of the
      process, they are instead owned by the filp on which they were acquired.
      Thus, they are inherited across fork() and are only released when the
      last reference to a filp is put.
      
      These new semantics prevent them from being merged with classic POSIX
      locks, even if they are acquired by the same process. These locks will
      also conflict with classic POSIX locks even if they are acquired by
      the same process or on the same file descriptor.
      
      The new locks are managed using a new set of cmd values to the fcntl()
      syscall. The initial implementation of this converts these values to
      "classic" cmd values at a fairly high level, and the details are not
      exposed to the underlying filesystem. We may eventually want to push
      this handing out to the lower filesystem code but for now I don't
      see any need for it.
      
      Also, note that with this implementation the new cmd values are only
      available via fcntl64() on 32-bit arches. There's little need to
      add support for legacy apps on a new interface like this.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      5d50ffd7
  4. 06 3月, 2014 2 次提交
    • H
      fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types · 932602e2
      Heiko Carstens 提交于
      Some fs compat system calls have unsigned long parameters instead of
      compat_ulong_t.
      In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that
      performs proper zero and sign extension convert all 64 bit parameters
      their corresponding 32 bit counterparts.
      
      compat_sys_io_getevents() is a bit different: the non-compat version
      has signed parameters for the "min_nr" and "nr" parameters while the
      compat version has unsigned parameters.
      So change this as well. For all practical purposes this shouldn't make
      any difference (doesn't fix a real bug).
      Also introduce a generic compat_aio_context_t type which can be used
      everywhere.
      The access_ok() check within compat_sys_io_getevents() got also removed
      since the non-compat sys_io_getevents() should be able to handle
      everything anyway.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      932602e2
    • H
      fs/compat: convert to COMPAT_SYSCALL_DEFINE · 625b1d7e
      Heiko Carstens 提交于
      Convert all compat system call functions where all parameter types
      have a size of four or less than four bytes, or are pointer types
      to COMPAT_SYSCALL_DEFINE.
      The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
      zero and sign extension to 64 bit of all parameters if needed.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      625b1d7e
  5. 04 3月, 2014 1 次提交
    • H
      compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64 · 0473c9b5
      Heiko Carstens 提交于
      For architecture dependent compat syscalls in common code an architecture
      must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
      code.
      This however is not true for compat_sys_getdents64 for which architectures
      must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.
      
      This leads to the situation where all architectures, except mips, get the
      compat code but only x86_64, arm64 and the generic syscall architectures
      actually use it.
      
      So invert the logic, so that architectures actively must do something to
      get the compat code.
      
      This way a couple of architectures get rid of otherwise dead code.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      0473c9b5
  6. 03 2月, 2014 1 次提交
    • H
      compat: Get rid of (get|put)_compat_time(val|spec) · 81993e81
      H. Peter Anvin 提交于
      We have two APIs for compatiblity timespec/val, with confusingly
      similar names.  compat_(get|put)_time(val|spec) *do* handle the case
      where COMPAT_USE_64BIT_TIME is set, whereas
      (get|put)_compat_time(val|spec) do not.  This is an accident waiting
      to happen.
      
      Clean it up by favoring the full-service version; the limited version
      is replaced with double-underscore versions static to kernel/compat.c.
      
      A common pattern is to convert a struct timespec to kernel format in
      an allocation on the user stack.  Unfortunately it is open-coded in
      several places.  Since this allocation isn't actually needed if
      COMPAT_USE_64BIT_TIME is true (since user format == kernel format)
      encapsulate that whole pattern into the function
      compat_convert_timespec().  An equivalent function should be written
      for struct timeval if it is needed in the future.
      
      Finally, get rid of compat_(get|put)_timeval_convert(): each was only
      used once, and the latter was not even doing what the function said
      (no conversion actually was being done.)  Moving the conversion into
      compat_sys_settimeofday() itself makes the code much more similar to
      sys_settimeofday() itself.
      
      v3: Remove unused compat_convert_timeval().
      
      v2: Drop bogus "const" in the destination argument for
          compat_convert_time*().
      
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Tested-by: NH.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      81993e81
  7. 29 6月, 2013 3 次提交
  8. 08 5月, 2013 1 次提交
  9. 05 5月, 2013 1 次提交
  10. 10 4月, 2013 2 次提交
  11. 13 3月, 2013 1 次提交
    • M
      Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d
      Mathieu Desnoyers 提交于
      Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
      compat_process_vm_rw() shows that the compatibility code requires an
      explicit "access_ok()" check before calling
      compat_rw_copy_check_uvector(). The same difference seems to appear when
      we compare fs/read_write.c:do_readv_writev() to
      fs/compat.c:compat_do_readv_writev().
      
      This subtle difference between the compat and non-compat requirements
      should probably be debated, as it seems to be error-prone. In fact,
      there are two others sites that use this function in the Linux kernel,
      and they both seem to get it wrong:
      
      Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
      also ends up calling compat_rw_copy_check_uvector() through
      aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
      be missing. Same situation for
      security/keys/compat.c:compat_keyctl_instantiate_key_iov().
      
      I propose that we add the access_ok() check directly into
      compat_rw_copy_check_uvector(), so callers don't have to worry about it,
      and it therefore makes the compat call code similar to its non-compat
      counterpart. Place the access_ok() check in the same location where
      copy_from_user() can trigger a -EFAULT error in the non-compat code, so
      the ABI behaviors are alike on both compat and non-compat.
      
      While we are here, fix compat_do_readv_writev() so it checks for
      compat_rw_copy_check_uvector() negative return values.
      
      And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
      handling.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8aec0f5d
  12. 04 3月, 2013 4 次提交
  13. 04 2月, 2013 2 次提交
  14. 13 10月, 2012 1 次提交
    • J
      vfs: define struct filename and have getname() return it · 91a27b2a
      Jeff Layton 提交于
      getname() is intended to copy pathname strings from userspace into a
      kernel buffer. The result is just a string in kernel space. It would
      however be quite helpful to be able to attach some ancillary info to
      the string.
      
      For instance, we could attach some audit-related info to reduce the
      amount of audit-related processing needed. When auditing is enabled,
      we could also call getname() on the string more than once and not
      need to recopy it from userspace.
      
      This patchset converts the getname()/putname() interfaces to return
      a struct instead of a string. For now, the struct just tracks the
      string in kernel space and the original userland pointer for it.
      
      Later, we'll add other information to the struct as it becomes
      convenient.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      91a27b2a
  15. 03 10月, 2012 1 次提交
    • C
      compat: fs: Generic compat_sys_sendfile implementation · 8f9c0119
      Catalin Marinas 提交于
      This function is used by sparc, powerpc and arm64 for compat support.
      The patch adds a generic implementation which calls do_sendfile()
      directly and avoids set_fs().
      
      The sparc architecture has wrappers for the sign extensions while
      powerpc relies on the compiler to do the this. The patch adds wrappers
      for powerpc to handle the u32->int type conversion.
      
      compat_sys_sendfile64() can be replaced by a sys_sendfile() call since
      compat_loff_t has the same size as off_t on a 64-bit system.
      
      On powerpc, the patch also changes the 64-bit sendfile call from
      sys_sendile64 to sys_sendfile.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8f9c0119
  16. 27 9月, 2012 1 次提交
  17. 21 8月, 2012 1 次提交
  18. 02 6月, 2012 1 次提交
  19. 01 6月, 2012 1 次提交
  20. 30 5月, 2012 1 次提交
  21. 16 5月, 2012 1 次提交
  22. 29 2月, 2012 1 次提交
  23. 21 2月, 2012 1 次提交
  24. 14 2月, 2012 1 次提交
  25. 04 1月, 2012 2 次提交
  26. 01 11月, 2011 1 次提交
    • C
      Cross Memory Attach · fcf63409
      Christopher Yeoh 提交于
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow for specifying iovecs for both src and dest, assuming
          preadv or pwritev was implemented either the area read from or
        written to would need to be contiguous.
        - Currently mem_read allows only processes who are currently
        ptrace'ing the target and are still able to ptrace the target to read
        from the target. This check could possibly be moved to the open call,
        but its not clear exactly what race this restriction is stopping
        (reason  appears to have been lost)
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
        domain socket is a bit ugly from a userspace point of view,
        especially when you may have hundreds if not (eventually) thousands
        of processes  that all need to do this with each other
        - Doesn't allow for some future use of the interface we would like to
        consider adding in the future (see below)
        - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)
      
      As mentioned previously use of vmsplice instead was considered, but has
      problems.  Since you need the reader and writer working co-operatively if
      the pipe is not drained then you block.  Which requires some wrapping to
      do non blocking on the send side or polling on the receive.  In all to all
      communication it requires ordering otherwise you can deadlock.  And in the
      example of many MPI tasks writing to one MPI task vmsplice serialises the
      copying.
      
      There are some cases of MPI collectives where even a single copy interface
      does not get us the performance gain we could.  For example in an
      MPI_Reduce rather than copy the data from the source we would like to
      instead use it directly in a mathops (say the reduce is doing a sum) as
      this would save us doing a copy.  We don't need to keep a copy of the data
      from the source.  I haven't implemented this, but I think this interface
      could in the future do all this through the use of the flags - eg could
      specify the math operation and type and the kernel rather than just
      copying the data would apply the specified operation between the source
      and destination and store it in the destination.
      
      Although we don't have a "second user" of the interface (though I've had
      some nibbles from people who may be interested in using it for intra
      process messaging which is not MPI).  This interface is something which
      hardware vendors are already doing for their custom drivers to implement
      fast local communication.  And so in addition to this being useful for
      OpenMPI it would mean the driver maintainers don't have to fix things up
      when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc, other architecture should
      mainly (I think) just need to add syscall numbers for the process_vm_readv
      and process_vm_writev. There are 32 bit compatibility versions for
      64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcf63409
  27. 28 10月, 2011 1 次提交
    • E
      compat: sync compat_stats with statfs. · 1448c721
      Eric W. Biederman 提交于
      This was found by inspection while tracking a similar
      bug in compat_statfs64, that has been fixed in mainline
      since decemeber.
      
      - This fixes a bug where not all of the f_spare fields
        were cleared on mips and s390.
      - Add the f_flags field to struct compat_statfs
      - Copy f_flags to userspace in case someone cares.
      - Use __clear_user to copy the f_spare field to userspace
        to ensure that all of the elements of f_spare are cleared.
        On some architectures f_spare is has 5 ints and on some
        architectures f_spare only has 4 ints.  Which makes
        the previous technique of clearing each int individually
        broken.
      
      I don't expect anyone actually uses the old statfs system
      call anymore but if they do let them benefit from having
      the compat and the native version working the same.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      1448c721
  28. 31 8月, 2011 1 次提交
  29. 27 8月, 2011 1 次提交
  30. 16 7月, 2011 1 次提交
    • N
      nfsd: Remove deprecated nfsctl system call and related code. · 49b28684
      NeilBrown 提交于
      As promised in feature-removal-schedule.txt it is time to
      remove the nfsctl system call.
      
      Userspace has perferred to not use this call throughout 2.6 and it has been
      excluded in the default configuration since 2.6.36 (9 months ago).
      
      So this patch removes all the code that was being compiled out.
      
      There are still references to sys_nfsctl in various arch systemcall tables
      and related code.  These should be cleaned out too, probably in the next
      merge window.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      49b28684
  31. 09 4月, 2011 1 次提交