1. 25 7月, 2008 40 次提交
    • I
      autofs4: check kernel communication pipe is valid for write · e64be33c
      Ian Kent 提交于
      It is possible for an autofs mount to become catatonic (and for the daemon
      communication pipe to become NULL) after a wait has been initiallized but
      before the request has been sent to the daemon.  We need to check for this
      before sending the request packet.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e64be33c
    • I
      autofs4: add missing kfree · f4c7da02
      Ian Kent 提交于
      It see that the patch tittled "autofs4 - fix pending mount race" is
      missing a change that I had recently made.
      
      It's missing a kfree for the case mutex_lock_interruptible() fails
      to aquire the wait queue mutex.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4c7da02
    • I
      autofs4: fix pending mount race · a1362fe9
      Ian Kent 提交于
      Close a race between a pending mount that is about to finish and a new
      lookup for the same directory.
      
      Process P1 triggers a mount of directory foo.  It sets
      DCACHE_AUTOFS_PENDING in the ->lookup routine, creates a waitq entry for
      'foo', and calls out to the daemon to perform the mount.  The autofs
      daemon will then create the directory 'foo', using a new dentry that will
      be hashed in the dcache.
      
      Before the mount completes, another process, P2, tries to walk into the
      'foo' directory.  The vfs path walking code finds an entry for 'foo' and
      calls the revalidate method.  Revalidate finds that the entry is not
      PENDING (because PENDING was never set on the dentry created by the
      mkdir), but it does find the directory is empty.  Revalidate calls
      try_to_fill_dentry, which sets the PENDING flag and then calls into the
      autofs4 wait code to trigger or wait for a mount of 'foo'.  The wait code
      finds the entry for 'foo' and goes to sleep waiting for the completion of
      the mount.
      
      Yet another process, P3, tries to walk into the 'foo' directory.  This
      process again finds a dentry in the dcache for 'foo', and calls into the
      autofs revalidate code.
      
      The revalidate code finds that the PENDING flag is set, and so calls
      try_to_fill_dentry.
      
      a) try_to_fill_dentry sets the PENDING flag redundantly for this
         dentry, then calls into the autofs4 wait code.
      
      b) the autofs4 wait code takes the waitq mutex and searches for an
         entry for 'foo'
      
      Between a and b, P1 is woken up because the mount completed.  P1 takes the
      wait queue mutex, clears the PENDING flag from the dentry, and removes the
      waitqueue entry for 'foo' from the list.
      
      When it releases the waitq mutex, P3 (eventually) acquires it.  At this
      time, it looks for an existing waitq for 'foo', finds none, and so creates
      a new one and calls out to the daemon to mount the 'foo' directory.
      
      Now, the reason that three processes are required to trigger this race is
      that, because the PENDING flag is not set on the dentry created by mkdir,
      the window for the race would be way to slim for it to ever occur.
      Basically, between the testing of d_mountpoint(dentry) and the taking of
      the waitq mutex, the mount would have to complete and the daemon would
      have to be woken up, and that in turn would have to wake up P1.  This is
      simply impossible.  Add the third process, though, and it becomes slightly
      more likely.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1362fe9
    • I
      autofs4: fix waitq locking · 5a11d4d0
      Ian Kent 提交于
      The autofs4_catatonic_mode() function accesses the wait queue without any
      locking but can be called at any time.  This could lead to a possible
      double free of the name field of the wait and a double fput of the daemon
      communication pipe or an fput of a NULL file pointer.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a11d4d0
    • J
      autofs4: use struct qstr in waitq.c · 70b52a0a
      Jeff Moyer 提交于
      The autofs_wait_queue already contains all of the fields of the
      struct qstr, so change it into a qstr.
      
      This patch, from Jeff Moyer, has been modified a liitle by myself.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70b52a0a
    • I
      autofs4: use lookup intent flags to trigger mounts · 6d5cb926
      Ian Kent 提交于
      When an open(2) call is made on an autofs mount point directory that
      already exists and the O_DIRECTORY flag is not used the needed mount
      callback to the daemon is not done. This leads to the path walk
      continuing resulting in a callback to the daemon with an incorrect
      key. open(2) is called without O_DIRECTORY by the "find" utility but
      this should be handled properly anyway.
      
      This happens because autofs needs to use the lookup flags to decide
      when to callback to the daemon to perform a mount to prevent mount
      storms. For example, an autofs indirect mount map that has the "browse"
      option will have the mount point directories are pre-created and the
      stat(2) call made by a color ls against each directory will cause all
      these directories to be mounted. It is unfortunate we need to resort
      to this but mount maps can be quite large. Additionally, if a user
      manually umounts an autofs indirect mount the directory isn't removed
      which also leads to this situation.
      
      To resolve this autofs needs to use the lookup intent flags to enable
      it to make this decision. This patch adds this check and triggers a
      call back if any of the lookup intent flags are set as all these calls
      warrant a mount attempt be requested.
      
      I know that external VFS code which uses the lookup flags is something
      that the VFS would like to eliminate but I have no choice as I can't
      see any other way to do this. A VFS dentry or inode operation callback
      which returns the lookup "type" (requires a definition) would be
      sufficient. But this change is needed now and I'm not aware of the form
      that coming VFS changes will take so I'm not willing to propose anything
      along these lines.
      
      If anyone can provide an alternate method I would be happy to use it.
      
      [akpm@linux-foundation.org: fix build for concurrent VFS changes]
      Signed-off-by: NIan Kent <raven@themaw.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d5cb926
    • I
      autofs4: don't release directory mutex if called in oz_mode · c432c258
      Ian Kent 提交于
      Since we now delay hashing of dentrys until the ->mkdir() call, droping
      and re-taking the directory mutex within the ->lookup() function when we
      are being called by user space is not needed.  This can lead to a race
      when other processes are attempting to access the same directory during
      mount point directory creation.
      
      In this case we need to hang onto the mutex to ensure we don't get user
      processes trying to create a mount request for a newly created dentry
      after the mount point entry has already been created.  This ensures that
      when we need to check a dentry passed to autofs4_wait(), if it is hashed,
      it is always the mount point dentry and not a new dentry created by
      another lookup during directory creation.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c432c258
    • I
      autofs4: fix symlink name allocation · ef581a74
      Ian Kent 提交于
      The length of the symlink name has been moved but it needs to be set
      before allocating space for it in the dentry info struct.  This corrects a
      mistake in a recent patch.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef581a74
    • I
      autofs4: use look aside list for lookups · 25767378
      Ian Kent 提交于
      A while ago a patch to resolve a deadlock during directory creation was
      merged.  This delayed the hashing of lookup dentrys until the ->mkdir()
      (or ->symlink()) operation completed to ensure we always went through
      ->lookup() instead of also having processes go through ->revalidate() so
      our VFS locking remained consistent.
      
      Now we are seeing a couple of side affects of that change in situations
      with heavy mount activity.
      
      Two cases have been identified:
      
      1) When a mount request is triggered, due to the delayed hashing, the
         directory created by user space for the mount point doesn't have the
         DCACHE_AUTOFS_PENDING flag set.  In the case of an autofs multi-mount
         where a tree of mount point directories are created this can lead to
         the path walk continuing rather than the dentry being sent to the wait
         queue to wait for request completion.  This is because, if the pending
         flag isn't set, the criteria for deciding this is a mount in progress
         fails to hold, namely that the dentry is not a mount point and has no
         subdirectories.
      
      2) A mount request dentry is initially created negative and unhashed.
         It remains this way until the ->mkdir() callback completes.  Since it
         is unhashed a fresh dentry is used when the user space mount request
         creates the mount point directory.  This leaves the original dentry
         negative and unhashed.  But revalidate has no way to tell the VFS that
         the dentry has changed, other than to force another ->lookup() by
         returning false, which is at best wastefull and at worst not possible.
         This results in an -ENOENT return from the original path walk when in
         fact the mount succeeded.
      
      To resolve this we need to ensure that the same dentry is used in all
      calls to ->lookup() during the course of a mount request.  This patch
      achieves that by adding the initial dentry to a look aside list and
      removes it at ->mkdir() or ->symlink() completion (or when the dentry is
      released), since these are the only create operations autofs4 supports.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25767378
    • I
      autofs4: revert - redo lookup in ttfd · caf7da3d
      Ian Kent 提交于
      This patch series enables the use of a single dentry for lookups prior to
      the dentry being hashed and so we no longer need to redo the lookup.  This
      patch reverts the patch of commit
      03379044.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      caf7da3d
    • I
      autofs4: don't make expiring dentry negative · 5f6f4f28
      Ian Kent 提交于
      Correct the error of making a positive dentry negative after it has been
      instantiated.
      
      The code that makes this error attempts to re-use the dentry from a
      concurrent expire and mount to resolve a race and the dentry used for the
      lookup must be negative for mounts to trigger in the required cases.  The
      fact is that the dentry doesn't need to be re-used because all that is
      needed is to preserve the flag that indicates an expire is still
      incomplete at the time of the mount request.
      
      This change uses the the dentry to check the flag and wait for the expire
      to complete then discards it instead of attempting to re-use it.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f6f4f28
    • M
      eCryptfs: Make all persistent file opens delayed · 391b52f9
      Michael Halcrow 提交于
      There is no good reason to immediately open the lower file, and that can
      cause problems with files that the user does not intend to immediately
      open, such as device nodes.
      
      This patch removes the persistent file open from the interpose step and
      pushes that to the locations where eCryptfs really does need the lower
      persistent file, such as just before reading or writing the metadata
      stored in the lower file header.
      
      Two functions are jumping to out_dput when they should just be jumping to
      out on error paths.  This patch also fixes these.
      Signed-off-by: NMichael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      391b52f9
    • M
      eCryptfs: do not try to open device files on mknod · 72b55fff
      Michael Halcrow 提交于
      When creating device nodes, eCryptfs needs to delay actually opening the lower
      persistent file until an application tries to open.  Device handles may not be
      backed by anything when they first come into existence.
      
      [Valdis.Kletnieks@vt.edu: build fix]
      Signed-off-by: NMichael Halcrow <mhalcrow@us.ibm.com>
      Cc: <Valdis.Kletnieks@vt.edu}
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72b55fff
    • H
      ecryptfs: inode.c mmap.c use unaligned byteorder helpers · 0a688ad7
      Harvey Harrison 提交于
      Fixe sparse warnings:
      fs/ecryptfs/inode.c:368:15: warning: cast to restricted __be64
      fs/ecryptfs/mmap.c:385:12: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/mmap.c:385:12:    expected unsigned long long [unsigned] [assigned] [usertype] file_size
      fs/ecryptfs/mmap.c:385:12:    got restricted __be64 [usertype] <noident>
      fs/ecryptfs/mmap.c:428:12: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/mmap.c:428:12:    expected unsigned long long [unsigned] [assigned] [usertype] file_size
      fs/ecryptfs/mmap.c:428:12:    got restricted __be64 [usertype] <noident>
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a688ad7
    • H
      ecryptfs: crypto.c use unaligned byteorder helpers · 29335c6a
      Harvey Harrison 提交于
      Fixes the following sparse warnings:
      fs/ecryptfs/crypto.c:1036:8: warning: cast to restricted __be32
      fs/ecryptfs/crypto.c:1038:8: warning: cast to restricted __be32
      fs/ecryptfs/crypto.c:1077:10: warning: cast to restricted __be32
      fs/ecryptfs/crypto.c:1103:6: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/crypto.c:1105:6: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/crypto.c:1124:8: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/crypto.c:1241:21: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/crypto.c:1244:30: warning: incorrect type in assignment (different base types)
      fs/ecryptfs/crypto.c:1414:23: warning: cast to restricted __be32
      fs/ecryptfs/crypto.c:1417:32: warning: cast to restricted __be16
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29335c6a
    • M
      ecryptfs: string copy cleanup · 8f236809
      Miklos Szeredi 提交于
      Clean up overcomplicated string copy, which also gets rid of this
      bogus warning:
      
      fs/ecryptfs/main.c: In function 'ecryptfs_parse_options':
      include/asm/arch/string_32.h:75: warning: array subscript is above array bounds
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f236809
    • E
      ecryptfs: propagate key errors up at mount time · 982363c9
      Eric Sandeen 提交于
      Mounting with invalid key signatures should probably fail, if they were
      specifically requested but not available.
      
      Also fix case checks in process_request_key_err() for the right sign of
      the errnos, as spotted by Jan Tluka.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NJan Tluka <jtluka@redhat.com>
      Acked-by: NMichael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      982363c9
    • T
      ecryptfs: discard ecryptfsd registration messages in miscdev · 6c4c17b0
      Tyler Hicks 提交于
      The userspace eCryptfs daemon sends HELO and QUIT messages to the kernel
      for per-user daemon (un)registration.  These messages are required when
      netlink is used as the transport, but (un)registration is handled by
      opening and closing the device file when miscdev is the transport.  These
      messages should be discarded in the miscdev transport so that a daemon
      isn't registered twice.
      Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c4c17b0
    • M
      eCryptfs: Privileged kthread for lower file opens · 746f1e55
      Michael Halcrow 提交于
      eCryptfs would really like to have read-write access to all files in the
      lower filesystem.  Right now, the persistent lower file may be opened
      read-only if the attempt to open it read-write fails.  One way to keep
      from having to do that is to have a privileged kthread that can open the
      lower persistent file on behalf of the user opening the eCryptfs file;
      this patch implements this functionality.
      
      This patch will properly allow a less-privileged user to open the eCryptfs
      file, followed by a more-privileged user opening the eCryptfs file, with
      the first user only being able to read and the second user being able to
      both read and write.  eCryptfs currently does this wrong; it will wind up
      calling vfs_write() on a file that was opened read-only.  This is fixed in
      this patch.
      Signed-off-by: NMichael Halcrow <mhalcrow@us.ibm.com>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      746f1e55
    • U
      flag parameters add-on: remove epoll_create size param · 9fe5ad9c
      Ulrich Drepper 提交于
      Remove the size parameter from the new epoll_create syscall and renames the
      syscall itself.  The updated test program follows.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9fe5ad9c
    • U
      flag parameters: check magic constants · e38b36f3
      Ulrich Drepper 提交于
      This patch adds test that ensure the boundary conditions for the various
      constants introduced in the previous patches is met.  No code is generated.
      
      [akpm@linux-foundation.org: fix alpha]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e38b36f3
    • U
      flag parameters: NONBLOCK in inotify_init · 510df2dd
      Ulrich Drepper 提交于
      This patch adds non-blocking support for inotify_init1.  The
      additional changes needed are minimal.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_NONBLOCK O_NONBLOCK
      
      int
      main (void)
      {
        int fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("inotify_init1(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_NONBLOCK);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("inotify_init1(IN_NONBLOCK) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      510df2dd
    • U
      flag parameters: NONBLOCK in pipe · be61a86d
      Ulrich Drepper 提交于
      This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
      than the patches for eventfd et.al but still trivial.  The interfaces of the
      create_write_pipe and create_read_pipe helper functions were changed and the
      one other caller as well.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fds[2];
        if (syscall (__NR_pipe2, fds, 0) == -1)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
          {
            puts ("pipe2(O_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be61a86d
    • U
      flag parameters: NONBLOCK in timerfd_create · 6b1ef0e6
      Ulrich Drepper 提交于
      This patch adds support for the TFD_NONBLOCK flag to timerfd_create.  The
      additional changes needed are minimal.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_timerfd_create
      # ifdef __x86_64__
      #  define __NR_timerfd_create 283
      # elif defined __i386__
      #  define __NR_timerfd_create 322
      # else
      #  error "need __NR_timerfd_create"
      # endif
      #endif
      
      #define TFD_NONBLOCK O_NONBLOCK
      
      int
      main (void)
      {
        int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0);
        if (fd == -1)
          {
            puts ("timerfd_create(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("timerfd_create(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_NONBLOCK);
        if (fd == -1)
          {
            puts ("timerfd_create(TFD_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("timerfd_create(TFD_NONBLOCK) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b1ef0e6
    • U
      flag parameters: NONBLOCK in eventfd · e7d476df
      Ulrich Drepper 提交于
      This patch adds support for the EFD_NONBLOCK flag to eventfd2.  The
      additional changes needed are minimal.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_eventfd2
      # ifdef __x86_64__
      #  define __NR_eventfd2 290
      # elif defined __i386__
      #  define __NR_eventfd2 328
      # else
      #  error "need __NR_eventfd2"
      # endif
      #endif
      
      #define EFD_NONBLOCK O_NONBLOCK
      
      int
      main (void)
      {
        int fd = syscall (__NR_eventfd2, 1, 0);
        if (fd == -1)
          {
            puts ("eventfd2(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("eventfd2(0) sets non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_eventfd2, 1, EFD_NONBLOCK);
        if (fd == -1)
          {
            puts ("eventfd2(EFD_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("eventfd2(EFD_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7d476df
    • U
      flag parameters: NONBLOCK in signalfd · 5fb5e049
      Ulrich Drepper 提交于
      This patch adds support for the SFD_NONBLOCK flag to signalfd4.  The
      additional changes needed are minimal.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_signalfd4
      # ifdef __x86_64__
      #  define __NR_signalfd4 289
      # elif defined __i386__
      #  define __NR_signalfd4 327
      # else
      #  error "need __NR_signalfd4"
      # endif
      #endif
      
      #define SFD_NONBLOCK O_NONBLOCK
      
      int
      main (void)
      {
        sigset_t ss;
        sigemptyset (&ss);
        sigaddset (&ss, SIGUSR1);
        int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
        if (fd == -1)
          {
            puts ("signalfd4(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("signalfd4(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_NONBLOCK);
        if (fd == -1)
          {
            puts ("signalfd4(SFD_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("signalfd4(SFD_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fb5e049
    • U
      flag parameters: NONBLOCK in anon_inode_getfd · 99829b83
      Ulrich Drepper 提交于
      Building on the previous change to anon_inode_getfd, this patch introduces
      support for handling of O_NONBLOCK in addition to the already supported
      O_CLOEXEC.  Following patches will take advantage of this support.  As can be
      seen, the additional support for supporting this functionality is minimal.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99829b83
    • U
      flag parameters: inotify_init · 4006553b
      Ulrich Drepper 提交于
      This patch introduces the new syscall inotify_init1 (note: the 1 stands for
      the one parameter the syscall takes, as opposed to no parameter before).  The
      values accepted for this parameter are function-specific and defined in the
      inotify.h header.  Here the values must match the O_* flags, though.  In this
      patch CLOEXEC support is introduced.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("inotify_init1(0) set close-on-exit");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4006553b
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper 提交于
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
    • U
      flag parameters: dup2 · 336dd1f7
      Ulrich Drepper 提交于
      This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
      parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
      is added in this patch.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_dup3
      # ifdef __x86_64__
      #  define __NR_dup3 292
      # elif defined __i386__
      #  define __NR_dup3 330
      # else
      #  error "need __NR_dup3"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd = syscall (__NR_dup3, 1, 4, 0);
        if (fd == -1)
          {
            puts ("dup3(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("dup3(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
        if (fd == -1)
          {
            puts ("dup3(O_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("dup3(O_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      336dd1f7
    • U
      flag parameters: epoll_create · a0998b50
      Ulrich Drepper 提交于
      This patch adds the new epoll_create2 syscall.  It extends the old epoll_create
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EPOLL_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 1, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0998b50
    • U
      flag parameters: timerfd_create · 11fcb6c1
      Ulrich Drepper 提交于
      The timerfd_create syscall already has a flags parameter.  It just is
      unused so far.  This patch changes this by introducing the TFD_CLOEXEC
      flag to set the close-on-exec flag for the returned file descriptor.
      
      A new name TFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_timerfd_create
      # ifdef __x86_64__
      #  define __NR_timerfd_create 283
      # elif defined __i386__
      #  define __NR_timerfd_create 322
      # else
      #  error "need __NR_timerfd_create"
      # endif
      #endif
      
      #define TFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0);
        if (fd == -1)
          {
            puts ("timerfd_create(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("timerfd_create(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("timerfd_create(TFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("timerfd_create(TFD_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11fcb6c1
    • U
      flag parameters: eventfd · b087498e
      Ulrich Drepper 提交于
      This patch adds the new eventfd2 syscall.  It extends the old eventfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_eventfd2
      # ifdef __x86_64__
      #  define __NR_eventfd2 290
      # elif defined __i386__
      #  define __NR_eventfd2 328
      # else
      #  error "need __NR_eventfd2"
      # endif
      #endif
      
      #define EFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_eventfd2, 1, 0);
        if (fd == -1)
          {
            puts ("eventfd2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("eventfd2(0) sets close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("eventfd2(EFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b087498e
    • U
      flag parameters: signalfd · 9deb27ba
      Ulrich Drepper 提交于
      This patch adds the new signalfd4 syscall.  It extends the old signalfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name SFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_signalfd4
      # ifdef __x86_64__
      #  define __NR_signalfd4 289
      # elif defined __i386__
      #  define __NR_signalfd4 327
      # else
      #  error "need __NR_signalfd4"
      # endif
      #endif
      
      #define SFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        sigset_t ss;
        sigemptyset (&ss);
        sigaddset (&ss, SIGUSR1);
        int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
        if (fd == -1)
          {
            puts ("signalfd4(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("signalfd4(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("signalfd4(SFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9deb27ba
    • U
      flag parameters: anon_inode_getfd extension · 7d9dbca3
      Ulrich Drepper 提交于
      This patch just extends the anon_inode_getfd interface to take an additional
      parameter with a flag value.  The flag value is passed on to
      get_unused_fd_flags in anticipation for a use with the O_CLOEXEC flag.
      
      No actual semantic changes here, the changed callers all pass 0 for now.
      
      [akpm@linux-foundation.org: KVM fix]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d9dbca3
    • A
    • J
      fs: check for statfs overflow · f4a67cce
      Jon Tollefson 提交于
      Adds a check for an overflow in the filesystem size so if someone is
      checking with statfs() on a 16G blocksize hugetlbfs in a 32bit binary that
      it will report back EOVERFLOW instead of a size of 0.
      Acked-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NJon Tollefson <kniht@linux.vnet.ibm.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4a67cce
    • A
      hugetlbfs: per mount huge page sizes · a137e1cc
      Andi Kleen 提交于
      Add the ability to configure the hugetlb hstate used on a per mount basis.
      
      - Add a new pagesize= option to the hugetlbfs mount that allows setting
        the page size
      - This option causes the mount code to find the hstate corresponding to the
        specified size, and sets up a pointer to the hstate in the mount's
        superblock.
      - Change the hstate accessors to use this information rather than the
        global_hstate they were using (requires a slight change in mm/memory.c
        so we don't NULL deref in the error-unmap path -- see comments).
      
      [np: take hstate out of hugetlbfs inode and vma->vm_private_data]
      Acked-by: NAdam Litke <agl@us.ibm.com>
      Acked-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a137e1cc
    • A
      hugetlb: modular state for hugetlb page size · a5516438
      Andi Kleen 提交于
      The goal of this patchset is to support multiple hugetlb page sizes.  This
      is achieved by introducing a new struct hstate structure, which
      encapsulates the important hugetlb state and constants (eg.  huge page
      size, number of huge pages currently allocated, etc).
      
      The hstate structure is then passed around the code which requires these
      fields, they will do the right thing regardless of the exact hstate they
      are operating on.
      
      This patch adds the hstate structure, with a single global instance of it
      (default_hstate), and does the basic work of converting hugetlb to use the
      hstate.
      
      Future patches will add more hstate structures to allow for different
      hugetlbfs mounts to have different page sizes.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Acked-by: NAdam Litke <agl@us.ibm.com>
      Acked-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5516438
    • E
      vmallocinfo: add NUMA information · a47a126a
      Eric Dumazet 提交于
      Christoph recently added /proc/vmallocinfo file to get information about
      vmalloc allocations.
      
      This patch adds NUMA specific information, giving number of pages
      allocated on each memory node.
      
      This should help to check that vmalloc() is able to respect NUMA policies.
      
      Example of output on a four nodes machine (one cpu per node)
      
      1) network hash tables are evenly spreaded on four nodes (OK) (Same
         point for inodes and dentries hash tables)
      
      2) iptables tables (x_tables) are correctly allocated on each cpu node
         (OK).
      
      3) sys_swapon() allocates its memory from one node only.
      
      4) each loaded module is using memory on one node.
      
      Sysadmins could tune their setup to change points 3) and 4) if necessary.
      
      grep "pages="  /proc/vmallocinfo
      0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
      0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
      0xffffc2000031a000-0xffffc2000031d000   12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
      0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
      0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2
      0xffffc20000341000-0xffffc20000344000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
      0xffffc20000344000-0xffffc20000347000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
      0xffffc20000347000-0xffffc2000034a000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
      0xffffc2000034a000-0xffffc2000034d000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
      0xffffc20004381000-0xffffc20004402000  528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32
      0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256
      0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
      0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743
      0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
      0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
      0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
      0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
      0xffffffffa0022000-0xffffffffa0028000   24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
      0xffffffffa0028000-0xffffffffa0050000  163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
      0xffffffffa0050000-0xffffffffa0052000    8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
      0xffffffffa0052000-0xffffffffa0056000   16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
      0xffffffffa0056000-0xffffffffa0081000  176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
      0xffffffffa0081000-0xffffffffa00ae000  184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
      0xffffffffa00ae000-0xffffffffa00b1000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      0xffffffffa00b1000-0xffffffffa00b9000   32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
      0xffffffffa00b9000-0xffffffffa00c4000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
      0xffffffffa00c6000-0xffffffffa00e0000  106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
      0xffffffffa00e0000-0xffffffffa00f1000   69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
      0xffffffffa00f1000-0xffffffffa00f4000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      0xffffffffa00f4000-0xffffffffa00f7000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a47a126a