1. 17 Oct, 2007: 2 commits
  2. 01 Aug, 2007: 1 commit
  3. 25 Jul, 2007: 1 commit
    • fallocate syscall interface deficiency · 0d786d4a
      Ulrich Drepper committed
      The fallocate syscall returns ENOSYS in case the filesystem does not support
      the operation and expects the userlevel code to fill in.  This is good in
      concept.
      
      The problem is that the libc code for old kernels should be able to
      distinguish the case where the syscall is not at all available vs not
      functioning for a specific mount point.  As it is, this is not possible and we
      always have to invoke the syscall even if the kernel doesn't support it.
      
      I suggest the following patch.  Using EOPNOTSUPP is IMO the right thing to do.
      
      Cc: Amit Arora <aarora@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
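A userspace sketch of the distinction this commit draws. `preallocate()` and `zero_fill_tail()` are hypothetical names, not glibc's code, and the sketch assumes glibc's `fallocate()` wrapper (`_GNU_SOURCE`, `<fcntl.h>`). ENOSYS is remembered, so a kernel without the syscall is asked only once; EOPNOTSUPP falls back for the current file only. As a simplification, the fallback writes zeroes only past the current end of file.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set once the kernel reports ENOSYS: the system call does not exist at
   all, so there is no point in trying it again for any other file.  */
static bool fallocate_missing = false;

/* Fallback (simplified): write zeroes over the part of
   [offset, offset + len) that lies past the current end of file.
   Holes inside the existing file are left unallocated.  */
static int zero_fill_tail (int fd, off_t offset, off_t len)
{
  static const char zeros[4096];
  struct stat st;
  if (fstat (fd, &st) < 0)
    return -1;
  off_t end = offset + len;
  off_t pos = offset > st.st_size ? offset : st.st_size;
  while (pos < end)
    {
      off_t left = end - pos;
      size_t chunk = left < (off_t) sizeof zeros ? (size_t) left : sizeof zeros;
      ssize_t n = pwrite (fd, zeros, chunk, pos);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      pos += n;
    }
  return 0;
}

/* Hypothetical wrapper: try fallocate(2) first, then fall back.  */
int preallocate (int fd, off_t offset, off_t len)
{
  if (!fallocate_missing)
    {
      if (fallocate (fd, 0, offset, len) == 0)
        return 0;
      if (errno == ENOSYS)
        fallocate_missing = true;  /* old kernel: never try again */
      else if (errno != EOPNOTSUPP)
        return -1;                 /* genuine failure, e.g. ENOSPC */
      /* EOPNOTSUPP: only this filesystem lacks support; try again next time. */
    }
  return zero_fill_tail (fd, offset, len);
}
```

With the pre-patch behaviour (ENOSYS for both cases), the `fallocate_missing` shortcut could not be taken safely, which is exactly the complaint above.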
  4. 18 Jul, 2007: 1 commit
    • sys_fallocate() implementation on i386, x86_64 and powerpc · 97ac7350
      Amit Arora committed
      fallocate() is a new system call being proposed here which will allow
      applications to preallocate space to any file(s) in a file system.
      Each file system implementation that wants to use this feature will need
      to support an inode operation called ->fallocate().
      Applications can use this feature to reduce fragmentation to a certain
      extent and thus get faster access. With preallocation, applications
      also get a guarantee of space for particular file(s), even if the
      system later becomes full.
      
      Currently, glibc provides an interface called posix_fallocate() which
      can be used for a similar purpose. Although this has the advantage of
      working on all file systems, it is quite slow (since it writes zeroes to
      each block that has to be preallocated). Without a doubt, file systems
      can do this more efficiently within the kernel, by implementing
      the proposed fallocate() system call. It is expected that
      posix_fallocate() will be modified to call this new system call first
      and, in case the kernel/filesystem does not implement it, to fall
      back to the current implementation of writing zeroes to the new blocks.
      ToDos:
      1. Implementation on other architectures (other than i386, x86_64,
         and ppc). Patches for s390(x) and ia64 are already available from
         previous posts, but it was decided that they should be added later
         once fallocate is in the mainline. Hence not including those patches
         in this take.
      2. Changes to glibc,
         a) to support fallocate() system call
         b) to make posix_fallocate() and posix_fallocate64() call fallocate()
      Signed-off-by: Amit Arora <aarora@in.ibm.com>
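posix_fallocate() reports failure differently from most calls in this log: it returns an error number instead of returning -1 and setting errno. A minimal sketch; `reserve_space()` is a hypothetical helper, and whether glibc services the call with fallocate(2) or by writing zeroes depends on the glibc and kernel versions in use.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: reserve LEN bytes starting at offset 0 of FD.
   posix_fallocate() returns the error number itself (e.g. ENOSPC or
   EBADF); errno is not the place to look.  */
int reserve_space (int fd, off_t len)
{
  int err = posix_fallocate (fd, 0, len);
  if (err != 0)
    fprintf (stderr, "posix_fallocate: %s\n", strerror (err));
  return err;
}
```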
  5. 17 Jul, 2007: 2 commits
    • O_CLOEXEC for SCM_RIGHTS · 4a19542e
      Ulrich Drepper committed
      Part two in the O_CLOEXEC saga: adding support for file descriptors received
      through Unix domain sockets.
      
      The patch is once again pretty minimal; it introduces a new flag for recvmsg
      and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
      is not used otherwise but the networking people will know better.
      
      This new flag is not recognized by recvfrom and recv.  These functions cannot
      be used for that purpose and the asymmetry this introduces is not worse than
      the already existing MSG_CMSG_COMPAT situations.
      
      The patch must be applied on top of the patch which introduced O_CLOEXEC.  It has to
      remove static from the new get_unused_fd_flags function, but since scm.c cannot
      live in a module the function still doesn't have to be exported.
      
      Here's a test program to make sure the code works.  It's so much longer than
      the actual patch...
      
      #include <errno.h>
      #include <error.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <sys/un.h>
      
      #ifndef O_CLOEXEC
      # define O_CLOEXEC 02000000
      #endif
      #ifndef MSG_CMSG_CLOEXEC
      # define MSG_CMSG_CLOEXEC 0x40000000
      #endif
      
      int
      main (int argc, char *argv[])
      {
        if (argc > 1)
          {
            int fd = atol (argv[1]);
            printf ("child: fd = %d\n", fd);
            if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
              {
                puts ("file descriptor valid in child");
                return 1;
              }
            return 0;
          }
      
        struct sockaddr_un sun;
        strcpy (sun.sun_path, "./testsocket");
        sun.sun_family = AF_UNIX;
      
        char databuf[] = "hello";
        struct iovec iov[1];
        iov[0].iov_base = databuf;
        iov[0].iov_len = sizeof (databuf);
      
        union
        {
          struct cmsghdr hdr;
          char bytes[CMSG_SPACE (sizeof (int))];
        } buf;
        struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                              .msg_control = buf.bytes,
                              .msg_controllen = sizeof (buf) };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
      
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN (sizeof (int));
      
        msg.msg_controllen = cmsg->cmsg_len;
      
        pid_t child = fork ();
        if (child == -1)
          error (1, errno, "fork");
        if (child == 0)
          {
            int sock = socket (PF_UNIX, SOCK_STREAM, 0);
            if (sock < 0)
              error (1, errno, "socket");
      
            if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
              error (1, errno, "bind");
            if (listen (sock, SOMAXCONN) < 0)
              error (1, errno, "listen");
      
            int conn = accept (sock, NULL, NULL);
            if (conn == -1)
              error (1, errno, "accept");
      
            *(int *) CMSG_DATA (cmsg) = sock;
            if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
              error (1, errno, "sendmsg");
      
            return 0;
          }
      
        /* For a test suite this should be more robust like a
           barrier in shared memory.  */
        sleep (1);
      
        int sock = socket (PF_UNIX, SOCK_STREAM, 0);
        if (sock < 0)
          error (1, errno, "socket");
      
        if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
          error (1, errno, "connect");
        unlink (sun.sun_path);
      
        *(int *) CMSG_DATA (cmsg) = -1;
      
        if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
          error (1, errno, "recvmsg");
      
        int fd = *(int *) CMSG_DATA (cmsg);
        if (fd == -1)
          error (1, 0, "no descriptor received");
      
        char fdname[20];
        snprintf (fdname, sizeof (fdname), "%d", fd);
        execl ("/proc/self/exe", argv[0], fdname, (char *) NULL);
        puts ("execl failed");
        return 1;
      }
      
      [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
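The test program above needs fork(), a named socket and a second exec. The narrower property, that MSG_CMSG_CLOEXEC sets FD_CLOEXEC on a received descriptor, can also be checked within one process using socketpair(). A sketch; `send_fd()` and `recv_fd()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send FD over the Unix domain socket SOCK as SCM_RIGHTS ancillary data,
   attached to a single payload byte.  */
int send_fd (int sock, int fd)
{
  char byte = 'x';
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf;
  memset (&buf, 0, sizeof buf);
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = buf.bytes,
                        .msg_controllen = sizeof buf.bytes };
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN (sizeof (int));
  memcpy (CMSG_DATA (cmsg), &fd, sizeof (int));
  return sendmsg (sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one descriptor; FLAGS may include MSG_CMSG_CLOEXEC.  */
int recv_fd (int sock, int flags)
{
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf;
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = buf.bytes,
                        .msg_controllen = sizeof buf.bytes };
  if (recvmsg (sock, &msg, flags) <= 0)
    return -1;
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
  if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
      || cmsg->cmsg_type != SCM_RIGHTS)
    return -1;
  int fd;
  memcpy (&fd, CMSG_DATA (cmsg), sizeof fd);
  return fd;
}
```

Passing `MSG_CMSG_CLOEXEC` to `recv_fd()` should yield a descriptor with FD_CLOEXEC set; passing 0 should not.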
    • Introduce O_CLOEXEC · f23513e8
      Ulrich Drepper committed
      The problem is as follows: in multi-threaded code (or more correctly: all
      code using clone() with CLONE_FILES) we have a race when exec'ing.
      
         thread #1                       thread #2
      
         fd=open()
      
                                         fork + exec
      
        fcntl(fd,F_SETFD,FD_CLOEXEC)
      
      In some applications this can happen frequently.  Take a web browser.  One
      thread opens a file and another thread starts, say, an external PDF viewer.
      The result can even be a security issue if that open file descriptor
      refers to a sensitive file and the external program can somehow be tricked
      into using that descriptor.
      
      Just adding O_CLOEXEC support to open() doesn't solve the whole set of
      problems.  There are other ways to create file descriptors (socket,
      epoll_create, Unix domain socket transfer, etc).  These can and should be
      addressed separately though.  open() is such an easy case that it makes little
      sense to put the fix off.
      
      The test program:
      
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      #ifndef O_CLOEXEC
      # define O_CLOEXEC 02000000
      #endif
      
      int
      main (int argc, char *argv[])
      {
        int fd;
        if (argc > 1)
          {
            fd = atol (argv[1]);
            printf ("child: fd = %d\n", fd);
            if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
              {
                puts ("file descriptor valid in child");
                return 1;
              }
            return 0;
          }
      
        fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
        printf ("in parent: new fd = %d\n", fd);
        char buf[20];
        snprintf (buf, sizeof (buf), "%d", fd);
        execl ("/proc/self/exe", argv[0], buf, (char *) NULL);
        puts ("execl failed");
        return 1;
      }
      
      [kyle@parisc-linux.org: parisc fix]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
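The race in the diagram comes from the gap between creating the descriptor and marking it. A minimal sketch contrasting the two approaches; both helper names are hypothetical, and /dev/null is used in the check only as a convenient file to open.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Atomic: the descriptor is created with FD_CLOEXEC already set, so a
   fork()+exec() in another thread can never inherit it.  */
int open_cloexec (const char *path, int flags)
{
  return open (path, flags | O_CLOEXEC);
}

/* The racy two-step pattern from the commit message: between open() and
   fcntl() another thread may fork and exec, leaking the descriptor.  */
int open_then_set_cloexec (const char *path, int flags)
{
  int fd = open (path, flags);
  if (fd >= 0 && fcntl (fd, F_SETFD, FD_CLOEXEC) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

Both functions end with FD_CLOEXEC set; only the first has no window during which it is clear.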
  6. 09 May, 2007: 2 commits
  7. 11 Dec, 2006: 1 commit
    • [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Vadim Lobanov committed
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
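A userspace model of the single-capacity idea: one `max_fds` governs the descriptor array and both bitmaps, so every growth step reallocates all three to the same size. This is an illustrative sketch, not kernel code; `struct toy_fdtable` and `toy_fdtable_expand()` are hypothetical, with field names loosely following the kernel's fdtable.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Hypothetical model: a single capacity shared by all three arrays.  */
struct toy_fdtable
{
  unsigned int max_fds;          /* capacity of fd[] and of both bitmaps */
  void **fd;                     /* one pointer per descriptor */
  unsigned long *open_fds;       /* bit set = descriptor in use */
  unsigned long *close_on_exec;  /* bit set = FD_CLOEXEC */
};

static size_t bitmap_words (unsigned int nbits)
{
  return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Grow every array to NEW_MAX in one step.  Returns 0, or -1 if memory
   runs out (the old table is then left untouched).  */
int toy_fdtable_expand (struct toy_fdtable *t, unsigned int new_max)
{
  if (new_max <= t->max_fds)
    return 0;
  size_t words = bitmap_words (new_max);
  void **fd = calloc (new_max, sizeof *fd);
  unsigned long *open_fds = calloc (words, sizeof *open_fds);
  unsigned long *cloexec = calloc (words, sizeof *cloexec);
  if (fd == NULL || open_fds == NULL || cloexec == NULL)
    {
      free (fd);
      free (open_fds);
      free (cloexec);
      return -1;
    }
  if (t->max_fds > 0)
    {
      size_t old_words = bitmap_words (t->max_fds);
      memcpy (fd, t->fd, t->max_fds * sizeof *fd);
      memcpy (open_fds, t->open_fds, old_words * sizeof *open_fds);
      memcpy (cloexec, t->close_on_exec, old_words * sizeof *cloexec);
    }
  free (t->fd);
  free (t->open_fds);
  free (t->close_on_exec);
  t->fd = fd;
  t->open_fds = open_fds;
  t->close_on_exec = cloexec;
  t->max_fds = new_max;
  return 0;
}
```

Because there is only one limit, no array ever holds capacity the others cannot use, which is the waste the commit removes.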
  8. 09 Dec, 2006: 2 commits
    • [PATCH] VFS: change struct file to use struct path · 0f7fc9e4
      Josef "Jeff" Sipek committed
      This patch changes struct file to use struct path instead of having
      independent pointers to struct dentry and struct vfsmount, and converts all
      users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.
      
      Additionally, it adds two #defines to make the transition easier for users of
      f_dentry and f_vfsmnt.
      Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
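A sketch of the transition technique: the two pointers move into an embedded `struct path`, and compatibility macros keep the old spellings compiling. The structures below are simplified stand-ins, not the kernel's definitions; the macro bodies are assumed to map f_dentry to f_path.dentry and f_vfsmnt to f_path.mnt, as the description implies, and `file_name()` is a hypothetical accessor.

```c
#include <stddef.h>

/* Simplified stand-ins, not the kernel's real definitions.  */
struct dentry { const char *name; };
struct vfsmount { int id; };

/* The pair that used to live in struct file as two separate pointers.  */
struct path
{
  struct vfsmount *mnt;
  struct dentry *dentry;
};

struct file
{
  struct path f_path;
};

/* Transition macros: old code spelling file->f_dentry or file->f_vfsmnt
   keeps compiling while users are converted to f_path.  */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* New-style accessor written against f_path directly.  */
const char *file_name (const struct file *file)
{
  return file->f_path.dentry != NULL ? file->f_path.dentry->name : NULL;
}
```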
    • [PATCH] tty: ->signal->tty locking · 24ec839c
      Peter Zijlstra committed
      Fix the locking of signal->tty.
      
      Use ->sighand->siglock to protect ->signal->tty; this lock is already used
      by most other members of ->signal/->sighand.  And unless we are 'current'
      or the tasklist_lock is held we need ->siglock to access ->signal anyway.
      
      (NOTE: sys_unshare() is broken wrt ->sighand locking rules)
      
      Note that tty_mutex is held over tty destruction, so while holding
      tty_mutex any tty pointer remains valid.  Otherwise the lifetime of ttys
      is governed by their open file handles.  This leaves some holes for tty
      access from signal->tty (or any other non file related tty access).
      
      It solves the tty SLAB scribbles we were seeing.
      
      (NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
             be examined by someone familiar with the security framework, I think
             it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
             invocations)
      
      [schwidefsky@de.ibm.com: 3270 fix]
      [akpm@osdl.org: various post-viro fixes]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Alan Cox <alan@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 01 Oct, 2006: 2 commits
  10. 30 Sep, 2006: 2 commits
    • [PATCH] fix wrong error code on interrupted close syscalls · ee731f4f
      Ernie Petrides committed
      The problem is that close() syscalls can call a file system's flush
      handler, which in turn might sleep interruptibly and ultimately pass back
      an -ERESTARTSYS return value.  This happens for files backed by an
      interruptible NFS mount under nfs_file_flush() when a large file has just
      been written and nfs_wait_bit_interruptible() detects that there is a
      signal pending.
      
      I have a test case where the "strace" command is used to attach to a
      process sleeping in such a close().  Since the SIGSTOP is forced onto the
      victim process (removing it from the thread's "blocked" mask in
      force_sig_info()), the RPC wait is interrupted and the close() is
      terminated early.
      
      But the file table entry has already been cleared before the flush handler
      was called.  Thus, when the syscall is restarted, the file descriptor
      appears closed and an EBADF error is returned (which is wrong).  What's
      worse, there is the hypothetical case where another thread of a
      multi-threaded application might have reused the file descriptor, in which
      case that file would be mistakenly closed.
      
      The bottom line is that close() syscalls are not restartable, and thus
      -ERESTARTSYS return values should be mapped to -EINTR.  This is consistent
      with the close(2) manual page.  The fix is below.
      Signed-off-by: Ernie Petrides <petrides@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
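From the caller's side, the consequence is that close() must not be retried on EINTR on Linux: the file table entry is cleared before the flush handler runs, so the descriptor is already gone. A sketch with hypothetical helper names contrasting a common but unsafe retry loop with a single call.

```c
#include <errno.h>
#include <unistd.h>

/* Unsafe on Linux: if close() fails with EINTR the descriptor has already
   been released, so the retry either fails with EBADF or closes a
   descriptor that another thread has just been given.  */
int close_retry_wrong (int fd)
{
  int ret;
  do
    ret = close (fd);
  while (ret < 0 && errno == EINTR);
  return ret;
}

/* Call close() exactly once.  An EINTR result may mean a flush did not
   complete, but the descriptor is gone either way: report, never retry. */
int close_once (int fd)
{
  return close (fd);
}
```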
    • [PATCH] vfs: define new lookup flag for chdir · 650a8983
      Miklos Szeredi committed
      In the "operation does permission checking" model used by fuse, chdir
      permission is not checked, since there's no chdir method.
      
      For this case set a lookup flag, which will be passed to ->permission(), so
      fuse can distinguish it from permission checks for other operations.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 26 Jun, 2006: 1 commit
    • [PATCH] ftruncate does not always update m/ctime · 6e656be8
      Peter Staubach committed
      In the course of trying to track down a bug where a file mtime was not
      being updated correctly, it was discovered that the m/ctime updates were
      not quite being handled correctly for ftruncate() calls.
      
      Quoth SUSv3:
      
      open(2):
      
              If O_TRUNC is set and the file did previously exist, upon
              successful completion, open() shall mark for update the st_ctime
              and st_mtime fields of the file.
      
      truncate(2):
      
              Upon successful completion, if the file size is changed, this
              function shall mark for update the st_ctime and st_mtime fields
              of the file, and the S_ISUID and S_ISGID bits of the file mode
              may be cleared.
      
      ftruncate(2):
      
              Upon successful completion, if fildes refers to a regular file,
              the ftruncate() function shall mark for update the st_ctime and
              st_mtime fields of the file and the S_ISUID and S_ISGID bits of
              the file mode may be cleared. If the ftruncate() function is
              unsuccessful, the file is unaffected.
      
      The open(O_TRUNC) and truncate cases were being handled correctly, but the
      ftruncate case was being handled like the truncate case.  The semantics of
      truncate and ftruncate don't quite match, so ftruncate needs to be handled
      slightly differently.
      
      The attached patch addresses this issue for ftruncate(2).
      
      My thanks to Stephen Tweedie and Trond Myklebust for their help in
      understanding the situation and semantics.
      Signed-off-by: Peter Staubach <staubach@redhat.com>
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 23 Jun, 2006: 2 commits
  13. 20 Jun, 2006: 1 commit
    • [PATCH] log more info for directory entry change events · 9c937dcc
      Amy Griffis committed
      When an audit event involves changes to a directory entry, include
      a PATH record for the directory itself.  A few other notable changes:
      
          - fixed audit_inode_child() hooks in fsnotify_move()
          - removed unused flags arg from audit_inode()
          - added audit log routines for logging a portion of a string
      
      Here's some sample output.
      
      before patch:
      type=SYSCALL msg=audit(1149821605.320:26): arch=40000003 syscall=39 success=yes exit=0 a0=bf8d3c7c a1=1ff a2=804e1b8 a3=bf8d3c7c items=1 ppid=739 pid=800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
      type=CWD msg=audit(1149821605.320:26):  cwd="/root"
      type=PATH msg=audit(1149821605.320:26): item=0 name="foo" parent=164068 inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0
      
      after patch:
      type=SYSCALL msg=audit(1149822032.332:24): arch=40000003 syscall=39 success=yes exit=0 a0=bfdd9c7c a1=1ff a2=804e1b8 a3=bfdd9c7c items=2 ppid=714 pid=777 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
      type=CWD msg=audit(1149822032.332:24):  cwd="/root"
      type=PATH msg=audit(1149822032.332:24): item=0 name="/root" inode=164068 dev=03:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_dir_t:s0
      type=PATH msg=audit(1149822032.332:24): item=1 name="foo" inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0
      Signed-off-by: Amy Griffis <amy.griffis@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. 16 May, 2006: 1 commit
  15. 19 Apr, 2006: 2 commits
    • x86: be careful about tailcall breakage for sys_open[at] too · 385910f2
      Linus Torvalds committed
      Came up through a quick grep for other cases similar to the ftruncate()
      one in commit 0a489cb3.
      
      Also, add a comment, so that people who read the code understand why we
      do what looks like a no-op.
      
      (Again, this won't actually matter to any sane user, since libc will
      save and restore the register gcc stomps on, but it's still wrong to
      stomp on it)
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • x86: don't allow tail-calls in sys_ftruncate[64]() · 0a489cb3
      Linus Torvalds committed
      Gcc thinks it owns the incoming argument stack, but that's not true for
      "asmlinkage" functions, and it corrupts the caller-set-up argument stack
      when it pushes the third argument onto the stack.  Which can result in
      %ebx getting corrupted in user space.
      
      Now, normally nobody sane would ever notice, since libc will save and
      restore %ebx anyway over the system call, but it's still wrong.
      
      I'd much rather have "asmlinkage" tell gcc directly that it doesn't own
      the stack, but no such attribute exists, so we're stuck with our hacky
      manual "prevent_tail_call()" macro once more (we've had the same issue
      before with sys_waitpid() and sys_wait4()).
      
      Thanks to Hans-Werner Hilse <hilse@sub.uni-goettingen.de> for reporting
      the issue and testing the fix.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 26 Mar, 2006: 1 commit
  17. 23 Mar, 2006: 1 commit
    • [PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd
      Eric Dumazet committed
      1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
         platforms, lowering kmalloc() allocated space by 50%.
      
      2) Reduce the size of (files_struct), using a special 32-bit (or
         64-bit) embedded_fd_set, instead of a 1024-bit fd_set for the
         close_on_exec_init and open_fds_init fields.  This saves some RAM (248
         bytes per task) as most tasks don't open more than 32 files.  D-Cache
         footprint for such tasks is also reduced to the minimum.
      
      3) Reduce size of allocated fdset.  Currently two full pages are
         allocated, that is 32768 bits on x86 for example, which is way too much.  The
         minimum is now L1_CACHE_BYTES.
      
      UP and SMP should benefit from this patch, because most tasks will touch
      only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
      (next_fd, close_on_exec_init, open_fds_init, fd_array[0 ..  2] being in the
      same cache line)
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
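The embedded_fd_set idea in point 2 can be sketched as a one-word bitmap covering the first 32 or 64 descriptors. The layout and the `alloc_fd()`/`free_fd()` helpers are assumptions for illustration, not the kernel's code; a real table switches to a larger allocation once the word fills up.

```c
#include <limits.h>

/* One machine word of descriptor bits: 32 on 32-bit, 64 on 64-bit.  */
#define EMBEDDED_FDS (sizeof (unsigned long) * CHAR_BIT)

struct embedded_fd_set
{
  unsigned long fds_bits[1];
};

/* Return the lowest free descriptor >= START and mark it used, or -1 if
   the embedded word is full.  */
int alloc_fd (struct embedded_fd_set *set, unsigned int start)
{
  for (unsigned int fd = start; fd < EMBEDDED_FDS; fd++)
    {
      unsigned long bit = 1UL << fd;
      if ((set->fds_bits[0] & bit) == 0)
        {
          set->fds_bits[0] |= bit;
          return (int) fd;
        }
    }
  return -1;
}

void free_fd (struct embedded_fd_set *set, unsigned int fd)
{
  if (fd < EMBEDDED_FDS)
    set->fds_bits[0] &= ~(1UL << fd);
}
```

A task that only ever uses stdin, stdout and stderr touches just this one word.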
  18. 21 Mar, 2006: 1 commit
    • [PATCH] Collect more inode information during syscall processing. · 73241ccc
      Amy Griffis committed
      This patch augments the collection of inode info during syscall
      processing. It represents part of the functionality that was provided
      by the auditfs patch included in RHEL4.
      
      Specifically, it:
      
      - Collects information for target inodes created or removed during
        syscalls.  Previous code only collects information for the target
        inode's parent.
      
      - Adds the audit_inode() hook to syscalls that operate on a file
        descriptor (e.g. fchown), enabling audit to do inode filtering for
        these calls.
      
      - Modifies filtering code to check audit context for either an inode #
        or a parent inode # matching a given rule.
      
      - Modifies logging to provide inode # for both parent and child.
      
      - Protects debug info from a NULL audit_names.name.
      
      [AV: folded a later typo fix from the same author]
      Signed-off-by: Amy Griffis <amy.griffis@hp.com>
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 19 Jan, 2006: 1 commit
    • [PATCH] vfs: *at functions: core · 5590ff0d
      Ulrich Drepper committed
      Here is a series of patches which introduce in total 13 new system calls
      which take a file descriptor/filename pair instead of a single file
      name.  These functions, openat etc, have been discussed on numerous
      occasions.  They are needed to implement race-free filesystem traversal,
      they are necessary to implement a virtual per-thread current working
      directory (think multi-threaded backup software), etc.
      
      We have in glibc today implementations of the interfaces which use the
      /proc/self/fd magic.  But this code is rather expensive.  Here are some
      results (similar to what Jim Meyering posted before).
      
      The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
      rm -fr is used to remove all directories.  Without syscall support I get
      this:
      
      real    0m31.921s
      user    0m0.688s
      sys     0m31.234s
      
      With syscall support the results are much better:
      
      real    0m20.699s
      user    0m0.536s
      sys     0m20.149s
      
      The interfaces are for obvious reasons currently not much used.  But they'll
      be used.  coreutils (and Jeff's posixutils) are already using them.
      Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
      them.  I expect a patch to follow soon.  Every program which is walking
      the filesystem tree will benefit.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
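A minimal sketch of the descriptor/filename pairing: every lookup starts at an open directory descriptor rather than at the process-wide current directory. It uses openat(), fstatat() and unlinkat() as exposed by glibc; `create_in()` and `remove_in()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create NAME inside the directory open as DIR_FD.  The lookup starts at
   DIR_FD, not at the cwd, so it is unaffected by another thread calling
   chdir() or by renames of the directory's ancestors.  */
int create_in (int dir_fd, const char *name)
{
  int fd = openat (dir_fd, name, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  if (fd < 0)
    return -1;
  return close (fd);
}

/* Remove NAME from DIR_FD, choosing rmdir or unlink semantics from
   fstatat() without following a final symlink.  */
int remove_in (int dir_fd, const char *name)
{
  struct stat st;
  if (fstatat (dir_fd, name, &st, AT_SYMLINK_NOFOLLOW) < 0)
    return -1;
  return unlinkat (dir_fd, name, S_ISDIR (st.st_mode) ? AT_REMOVEDIR : 0);
}
```

A tree walker such as rm -fr can hold one descriptor per level and never build or re-resolve long absolute paths.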
  20. 12 Jan, 2006: 1 commit
  21. 10 Jan, 2006: 1 commit
  22. 09 Jan, 2006: 2 commits
    • [PATCH] tiny: Uninline some open.c functions · b01ec0ef
      Matt Mackall committed
      uninline some open.c functions
      
      add/remove: 3/0 grow/shrink: 0/6 up/down: 679/-1166 (-487)
      function                                     old     new   delta
      do_sys_truncate                                -     336    +336
      do_sys_ftruncate                               -     317    +317
      __put_unused_fd                                -      26     +26
      put_unused_fd                                 57      49      -8
      sys_close                                    150     119     -31
      sys_ftruncate64                              260      26    -234
      sys_ftruncate                                272      24    -248
      sys_truncate                                 339      25    -314
      sys_truncate64                               336       5    -331
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix some problems with truncate and mtime semantics. · 4a30131e
      NeilBrown committed
      SUS requires that when truncating a file to the size that it currently
      is:
        truncate and ftruncate should NOT modify ctime or mtime
        O_TRUNC SHOULD modify ctime and mtime.
      
      Currently mtime and ctime are always modified on most local
      filesystems (side effect of ->truncate) or never modified (on NFS).
      
      With this patch:
        ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
          an update of these times is required whether size changes or not
          (via a new argument to do_truncate).  This allows NFS to do
          the right thing for O_TRUNC.
        inode_setattr no longer forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
          sets the size to its current value.  This allows local filesystems
          to do the right thing for f?truncate.
      
      Also, the logic in inode_setattr is changed a bit so there are two return
      points.  One returns the error from vmtruncate if it failed, the other
      returns 0 (there can be no other failure).
      
      Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
      requested, we now fall-through and mark_inode_dirty.  If a filesystem did
      not have a ->truncate function, then vmtruncate will have changed i_size,
      without marking the inode as 'dirty', and I think this is wrong.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 09 Nov, 2005: 2 commits
  24. 07 Nov, 2005: 2 commits
  25. 19 Oct, 2005: 1 commit
  26. 13 Sep, 2005: 1 commit
    • [PATCH] open returns ENFILE but creates file anyway · a1a5b3d9
      Peter Staubach committed
      When open(O_CREAT) is called and the error, ENFILE, is returned, the file
      may be created anyway.  This is counterintuitive, against the SUS V3
      specification, and may cause applications to misbehave if they are not
      coded correctly to handle this semantic.  The SUS V3 specification
      explicitly states "No files shall be created or modified if the function
      returns -1.".
      
      The error, ENFILE, is used to indicate the system wide open file table is
      full and no more file structs can be allocated.
      
      This is due to an ordering problem.  The entry in the directory is created
      before the file struct is allocated.  If the allocation for the file struct
      fails, then the system call must return an error, but the directory entry
      was already created and cannot be safely removed.
      
      The solution to this situation is relatively easy.  The file struct should
      be allocated before the directory entry is created.  If the allocation
      fails, then the error can be returned directly.  If the creation of the
      directory entry fails, then the file struct can be easily freed.
      Signed-off-by: Peter Staubach <staubach@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  27. 10 Sep, 2005: 2 commits
    • [PATCH] files: files struct with RCU · ab2af1f5
      Dipankar Sarma committed
      Patch to eliminate struct files_struct.file_lock spinlock on the reader side
      and use rcu refcounting rcuref_xxx api for the f_count refcounter.  The
      updates to the fdtable are done by allocating a new fdtable structure and
      setting files->fdt to point to the new structure.  The fdtable structure is
      protected by RCU thereby allowing lock-free lookup.  For fd arrays/sets that
      are vmalloced, we use keventd to free them since RCU callbacks can't sleep.  A
      global list of fdtable to be freed is not scalable, so we use a per-cpu list.
      If keventd is already handling the current cpu's work, we use a timer to defer
      queueing of that work.
      
      Since the last publication, this patch has been re-written to avoid using
      explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
      primitives instead.  This required that the fd information is kept in a
      separate structure (fdtable) and updated atomically.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
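The publish-a-new-table pattern can be sketched in userspace with C11 atomics: an acquire load stands in for rcu_dereference() and a release store for rcu_assign_pointer(). This is a simplified model, not the kernel's implementation: writers are assumed to be serialized externally (the role files->file_lock plays), there is no grace-period mechanism, and `lookup()`, `install()` and `grow_and_publish()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: array and size change together, as in fdtable.  */
struct table
{
  unsigned int max_fds;
  int *slots;
};

/* The published pointer; readers take no lock.  */
static _Atomic (struct table *) current;

/* Reader: the acquire load pairs with the writer's release store, so a
   reader sees a fully initialised table together with its size.  */
int lookup (unsigned int fd)
{
  struct table *t = atomic_load_explicit (&current, memory_order_acquire);
  if (t == NULL || fd >= t->max_fds)
    return -1;
  return t->slots[fd];
}

/* Writer: store into the current table.  A truly concurrent version
   would also need atomic slot stores.  */
int install (unsigned int fd, int value)
{
  struct table *t = atomic_load_explicit (&current, memory_order_relaxed);
  if (t == NULL || fd >= t->max_fds)
    return -1;
  t->slots[fd] = value;
  return 0;
}

/* Writer: copy into a larger table, then publish it.  The previous table
   is returned in *OLD; it may be freed only once no reader can still be
   using it, which RCU guarantees and this sketch does not.  */
int grow_and_publish (unsigned int new_max, struct table **old)
{
  struct table *cur = atomic_load_explicit (&current, memory_order_relaxed);
  if (cur != NULL && new_max < cur->max_fds)
    return -1;
  struct table *t = malloc (sizeof *t);
  if (t == NULL)
    return -1;
  t->slots = calloc (new_max, sizeof *t->slots);
  if (t->slots == NULL)
    {
      free (t);
      return -1;
    }
  t->max_fds = new_max;
  if (cur != NULL)
    memcpy (t->slots, cur->slots, cur->max_fds * sizeof *cur->slots);
  atomic_store_explicit (&current, t, memory_order_release);
  *old = cur;
  return 0;
}
```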
    • [PATCH] files: break up files struct · badf1662
      Dipankar Sarma committed
      In order for the RCU to work, the file table array, sets and their sizes must
      be updated atomically.  Instead of ensuring this through too many memory
      barriers, we put the arrays and their sizes in a separate structure.  This
      patch takes the first step of putting the file table elements in a separate
      structure fdtable that is embedded within files_struct.  It also changes all
      the users to refer to the file table using the files_fdtable() macro.  Subsequent
      application of RCU becomes easier after this.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  28. 08 Sep, 2005: 1 commit