1. 05 Jun, 2010: 1 commit
  2. 22 May, 2010: 1 commit
  3. 22 Apr, 2010: 1 commit
    • fasync: RCU and fine grained locking · 989a2979
      Committed by Eric Dumazet
      kill_fasync() uses a central rwlock, candidate for RCU conversion, to
      avoid cache line ping pongs on SMP.
      
      fasync_remove_entry() and fasync_add_entry() can disable IRQs for a short
      section instead of during the whole list scan.
      
      Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
      fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
      doesn't need its own implementation and can use fasync_helper(), to
      reduce code size and complexity.
      
      We can remove __kill_fasync() direct use in net/socket.c, and rename it
      to kill_fasync_rcu().
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 07 Mar, 2010: 1 commit
  5. 08 Feb, 2010: 1 commit
    • Fix race in tty_fasync() properly · 80e1e823
      Committed by Linus Torvalds
      This reverts commit 70362511 ("tty: fix race in tty_fasync") and
      commit b04da8bf ("fnctl: f_modown should call write_lock_irqsave/
      restore") that tried to fix up some of the fallout but was incomplete.
      
      It turns out that we really cannot hold 'tty->ctrl_lock' over calling
      __f_setown, because not only did that cause problems with interrupt
      disables (which the second commit fixed), it also causes a potential
      ABBA deadlock due to lock ordering.
      
      Thanks to Tetsuo Handa for following up on the issue, and running
      lockdep to show the problem.  It goes roughly like this:
      
       - f_getown gets filp->f_owner.lock for reading without interrupts
         disabled, so an interrupt that happens while that lock is held can
         cause a lockdep chain from f_owner.lock -> sighand->siglock.
      
       - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
         commit 70362511 introduced, together with the pre-existing
         sighand->siglock -> tty->ctrl_lock chain means that we have a lock
         dependency the other way too.
      
      So instead of extending tty->ctrl_lock over the whole __f_setown() call,
      we now just take a reference to the 'pid' structure while holding the
      lock, and then release it after having done the __f_setown.  That still
      guarantees that 'struct pid' won't go away from under us, which is all
      we really ever needed.
      Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
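The fix above follows a general locking pattern: pin the object with a reference while holding the lock, drop the lock, do the work that may take other locks, then drop the reference. The following is a minimal userspace sketch of that pattern only, under stated assumptions: `struct toy_pid`, `struct toy_tty`, `toy_setown()` and the other `toy_*` names are hypothetical stand-ins for `struct pid`, `tty->ctrl_lock` and `__f_setown()`, not kernel APIs, and a pthread mutex replaces the kernel's IRQ-safe spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted 'struct pid'. */
struct toy_pid {
    atomic_int count;
    int nr;
};

static struct toy_pid *toy_get_pid(struct toy_pid *p)
{
    if (p)
        atomic_fetch_add(&p->count, 1);
    return p;
}

static void toy_put_pid(struct toy_pid *p)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref. */
    if (p && atomic_fetch_sub(&p->count, 1) == 1)
        free(p);
}

/* Hypothetical stand-in for the tty: ctrl_lock protects pgrp. */
struct toy_tty {
    pthread_mutex_t ctrl_lock;
    struct toy_pid *pgrp;
};

/* Stand-in for __f_setown(): in the kernel it takes f_owner.lock, so it
 * must not be called with ctrl_lock held. Here it only records the owner. */
static int toy_owner_nr = -1;

static void toy_setown(struct toy_pid *p)
{
    toy_owner_nr = p ? p->nr : 0;
}

void toy_tty_fasync_on(struct toy_tty *tty)
{
    assert(tty != NULL);

    pthread_mutex_lock(&tty->ctrl_lock);
    struct toy_pid *pid = toy_get_pid(tty->pgrp);  /* pin while locked */
    pthread_mutex_unlock(&tty->ctrl_lock);

    toy_setown(pid);   /* runs unlocked: no ctrl_lock -> owner-lock edge */
    toy_put_pid(pid);  /* drop the pin; the object may be freed here */
}
```

Because the owner update runs after the unlock, `ctrl_lock` never nests around the owner lock, which was the edge that closed the ABBA cycle; the reference alone keeps the object alive for the duration of the call.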
  6. 27 Jan, 2010: 1 commit
  7. 17 Dec, 2009: 1 commit
    • fasync: split 'fasync_helper()' into separate add/remove functions · 53281b6d
      Committed by Linus Torvalds
      Yes, the add and remove cases do share the same basic loop and the
      locking, but the compiler can inline and then CSE some of the end result
      anyway.  And splitting it up makes the code way easier to follow,
      and makes it clearer exactly what the semantics are.
      
      In particular, we must make sure that the FASYNC flag in file->f_flags
      exactly matches the state of "is this file on any fasync list", since
      not only is that flag visible to user space (F_GETFL), but we also use
      that flag to check whether we need to remove any fasync entries on file
      close.
      
      We got that wrong for the case of a mixed use of file locking (which
      tries to remove any fasync entries for file leases) and fasync.
      
      Splitting the function up also makes it possible to do some future
      optimizations without making the function even messier.  In particular,
      since the FASYNC flag has to match the state of "is this on a list", we
      can do the following future optimizations:
      
       - on remove, we don't even need to get the locks and traverse the list
         if FASYNC isn't set, since we can know a priori that there is no
         point (this is effectively the same optimization that we already do
         in __fput() wrt removing fasync on file close)
      
       - on add, we can use the FASYNC flag to decide whether we are changing
         an existing entry or need to allocate a new one.
      
      but this is just the cleanup + fix for the FASYNC flag.
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Tested-by: Tavis Ormandy <taviso@google.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
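From user space, the invariant described above is visible as the `O_ASYNC` bit (the exported name for `FASYNC`) in the result of `F_GETFL`. The sketch below assumes Linux with glibc and `_GNU_SOURCE`; it toggles the bit on a pipe and reads it back. `set_async()` is a hypothetical helper, not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set or clear O_ASYNC on fd via F_SETFL (which calls the file's ->fasync()
 * when the bit changes), then report what F_GETFL shows.
 * Returns 1 if O_ASYNC is now set, 0 if clear, -1 on error. */
int set_async(int fd, int on)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_ASYNC) : (flags & ~O_ASYNC);
    if (fcntl(fd, F_SETFL, flags) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : (flags & O_ASYNC) != 0;
}
```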
  8. 18 Nov, 2009: 1 commit
  9. 24 Sep, 2009: 2 commits
    • fcntl: add F_[SG]ETOWN_EX · ba0a6c9f
      Committed by Peter Zijlstra
      In order to direct the SIGIO signal to a particular thread of a
      multi-threaded application we cannot, like suggested by the manpage, put a
      TID into the regular fcntl(F_SETOWN) call.  It will still be sent to the
      whole process of which that thread is part.
      
      Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
      
      The need to direct SIGIO comes from self-monitoring profiling such as with
      perf-counters.  Perf-counters uses SIGIO to notify that new sample data is
      available.  If the signal is delivered to the same task that generated the
      new sample it can augment that data by inspecting the task's user-space
      state right after it returns from the kernel.  This is esp.  convenient
      for interpreted or virtual machine driven environments.
      
      Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
      as argument:
      
      struct f_owner_ex {
      	int   type;
      	pid_t pid;
      };
      
      Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: stephane eranian <eranian@googlemail.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
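A minimal sketch of the new interface from user space, assuming Linux with glibc and `_GNU_SOURCE`, which exposes `F_SETOWN_EX`, `F_GETOWN_EX` and `struct f_owner_ex` (glibc spells the group type `F_OWNER_PGRP` and keeps `F_OWNER_GID` as an alias). The helper name `own_fd_by_thread()` is hypothetical. It names the calling thread by its TID from `syscall(SYS_gettid)`; per the commit message, passing that same TID to plain `F_SETOWN` would still signal the whole process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Route SIGIO for fd to the calling thread only (F_OWNER_TID) rather than
 * to the whole thread group, then read the owner back into *out.
 * Returns 0 on success, -1 with errno set on failure. */
int own_fd_by_thread(int fd, struct f_owner_ex *out)
{
    assert(out != NULL);

    struct f_owner_ex owner = {
        .type = F_OWNER_TID,
        .pid  = (pid_t) syscall(SYS_gettid),  /* kernel thread ID, not getpid() */
    };

    if (fcntl(fd, F_SETOWN_EX, &owner) == -1)
        return -1;
    return fcntl(fd, F_GETOWN_EX, out) == -1 ? -1 : 0;
}
```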
    • signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() · 06f1631a
      Committed by Oleg Nesterov
      group_send_sig_info()->check_kill_permission() assumes that current is the
      sender and uses current_cred().
      
      This is not true in send_sigio_to_task() case.  From the security pov the
      sender is not current, but the task which did fcntl(F_SETOWN), that is why
      we have sigio_perm() which uses the right creds to check.
      
      Fortunately, send_sigio() always sends either SEND_SIG_PRIV or
      SI_FROMKERNEL() signal, so check_kill_permission() does nothing.  But
      still it would be tidier to avoid this bogus security check and save a
      couple of cycles.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 13 Jul, 2009: 1 commit
  11. 17 Jun, 2009: 2 commits
  12. 12 May, 2009: 1 commit
  13. 30 Mar, 2009: 1 commit
  14. 16 Mar, 2009: 3 commits
    • Rationalize fasync return values · 60aa4924
      Committed by Jonathan Corbet
      Most fasync implementations do something like:
      
           return fasync_helper(...);
      
      But fasync_helper() will return a positive value at times - a feature used
      in at least one place.  Thus, a number of other drivers do:
      
           err = fasync_helper(...);
           if (err < 0)
                   return err;
           return 0;
      
      In the interests of consistency and more concise code, it makes sense to
      map positive return values onto zero where ->fasync() is called.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    • Move FASYNC bit handling to f_op->fasync() · 76398425
      Committed by Jonathan Corbet
      Removing the BKL from FASYNC handling ran into the challenge of keeping the
      setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
      the underlying fasync() function.  Andi Kleen suggested moving the handling
      of that bit into fasync(); this patch does exactly that.  As a result, we
      have a couple of internal API changes: fasync() must now manage the FASYNC
      bit, and it will be called without the BKL held.
      
      As it happens, every fasync() implementation in the kernel with one
      exception calls fasync_helper().  So, if we make fasync_helper() set the
      FASYNC bit, we can avoid making any changes to the other fasync()
      functions - as long as those functions, themselves, have proper locking.
      Most fasync() implementations do nothing but call fasync_helper() - which
      has its own lock - so they are easily verified as correct.  The BKL had
      already been pushed down into the rest.
      
      The networking code has its own version of fasync_helper(), so that code
      has been augmented with explicit FASYNC bit handling.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    • Use f_lock to protect f_flags · db1dd4d3
      Committed by Jonathan Corbet
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
  15. 14 Jan, 2009: 1 commit
  16. 06 Dec, 2008: 1 commit
  17. 14 Nov, 2008: 4 commits
  18. 01 Aug, 2008: 2 commits
  19. 27 Jul, 2008: 3 commits
    • [PATCH] fix RLIM_NOFILE handling · 4e1e018e
      Committed by Al Viro
      * dup2() should return -EBADF on exceeded sysctl_nr_open
      * dup() should *not* return -EINVAL even if you have rlimit set to 0;
        it should get -EMFILE instead.
      
      Check for orig_start exceeding rlimit taken to sys_fcntl().
      Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
      Consequently, remaining checks for rlimit are taken to expand_files().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
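The resulting error codes can be observed from user space. This is a sketch assuming Linux and POSIX `getrlimit()`/`setrlimit()`; `errnos_with_zero_nofile()` is a hypothetical helper that lowers only the soft `RLIMIT_NOFILE` to 0, records what `dup()` and `dup2()` report, and then restores the limit. Following the commit, `dup()` is expected to fail with `EMFILE` rather than `EINVAL`, and `dup2()` onto a descriptor at or above the limit with `EBADF`.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* With the soft RLIMIT_NOFILE temporarily set to 0, store the errno seen
 * by dup(fd) and by dup2(fd, newfd) (0 if a call unexpectedly succeeds).
 * The original soft limit is restored before returning.
 * Returns 0 on success, -1 if the limit could not be changed. */
int errnos_with_zero_nofile(int fd, int newfd, int *dup_err, int *dup2_err)
{
    struct rlimit saved, zero;

    assert(dup_err != NULL && dup2_err != NULL);
    if (getrlimit(RLIMIT_NOFILE, &saved) == -1)
        return -1;
    zero = saved;
    zero.rlim_cur = 0;  /* lower the soft limit only; hard limit untouched */
    if (setrlimit(RLIMIT_NOFILE, &zero) == -1)
        return -1;

    *dup_err  = dup(fd) == -1 ? errno : 0;          /* expected: EMFILE */
    *dup2_err = dup2(fd, newfd) == -1 ? errno : 0;  /* expected: EBADF  */

    /* Raising the soft limit back up to the hard limit is always allowed. */
    return setrlimit(RLIMIT_NOFILE, &saved);
}
```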
    • [PATCH] get rid of corner case in dup3() entirely · 6c5d0512
      Committed by Al Viro
      Since Ulrich is OK with getting rid of dup3(fd, fd, flags) completely,
      to hell the damn thing goes.  Corner case for dup2() is handled in
      sys_dup2() (complete with -EBADF if dup2(fd, fd) is called with fd
      that is not open), the rest is done in dup3().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
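The user-visible outcome, sketched under the assumption of Linux with glibc's `dup3()` wrapper (`_GNU_SOURCE`): `dup3(fd, fd, flags)` is rejected with `EINVAL`, while `dup2(fd, fd)` keeps its traditional special case and returns `fd` if it is open, else fails with `EBADF`. Both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* dup3() refuses oldfd == newfd outright. Returns the errno it set,
 * or 0 if the call unexpectedly succeeded. */
int dup3_same_fd_errno(int fd)
{
    assert(fd >= 0);
    return dup3(fd, fd, O_CLOEXEC) == -1 ? errno : 0;
}

/* dup2() with oldfd == newfd only checks validity: returns fd if it is
 * open, otherwise -1 with errno == EBADF. */
int dup2_same_fd(int fd)
{
    return dup2(fd, fd);
}
```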
    • [PATCH] dup3 fix · 3c333937
      Committed by Ulrich Drepper
      Al Viro noticed one corner case in the new dup3() code.  The dup2()
      function, as a special case, handles dup-ing to the same file
      descriptor.  In this case the current dup3() code does nothing at
      all.  I.e., it ignores the flags parameter.  This shouldn't happen;
      the close-on-exec flag should be set if requested.
      
      In case the O_CLOEXEC bit in the flags parameter is not set, the
      dup3() function should behave in this respect identically to dup2().
      This means dup3(fd, fd, 0) should not actively reset the c-o-e
      flag.
      
      The patch below implements this minor change.
      
      [AV: credits to Artur Grabowski for bringing that up as potential subtle point
      in dup2() behaviour]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  20. 25 Jul, 2008: 1 commit
    • flag parameters: dup2 · 336dd1f7
      Committed by Ulrich Drepper
      This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
      parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
      is added in this patch.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_dup3
      # ifdef __x86_64__
      #  define __NR_dup3 292
      # elif defined __i386__
      #  define __NR_dup3 330
      # else
      #  error "need __NR_dup3"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd = syscall (__NR_dup3, 1, 4, 0);
        if (fd == -1)
          {
            puts ("dup3(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("dup3(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
        if (fd == -1)
          {
            puts ("dup3(O_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
          puts ("dup3(O_CLOEXEC) did not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 03 Jul, 2008: 1 commit
  22. 02 May, 2008: 1 commit
  23. 25 Apr, 2008: 1 commit
    • [PATCH] sanitize locate_fd() · f8f95702
      Committed by Al Viro
      * 'file' argument is unused; lose it.
      * move setting flags from the caller (dupfd()) to locate_fd();
        pass cloexec flag as new argument.  Note that files_fdtable()
        that used to be in dupfd() isn't needed in the place in
        locate_fd() where the moved code ends up - we know that ->file_lock
        hadn't been dropped since the last time we calculated fdt because
        we can get there only if expand_files() returns 0 and it doesn't
        drop/reacquire in that case.
      * move getting/dropping ->file_lock into locate_fd().  Now the caller
        doesn't need to do anything with files_struct *files anymore and
        we can move that inside locate_fd() as well, killing the
        struct files_struct * argument.
      
      At that point locate_fd() is extremely similar to get_unused_fd_flags()
      and the next patches will merge those two.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  24. 09 Feb, 2008: 2 commits
  25. 20 Oct, 2007: 1 commit
    • pid namespaces: changes to show virtual ids to user · b488893a
      Committed by Pavel Emelyanov
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or got from the user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one needs to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 17 Oct, 2007: 1 commit
    • F_DUPFD_CLOEXEC implementation · 22d2b35b
      Committed by Ulrich Drepper
      One more small change to extend the availability of creation of file
      descriptors with FD_CLOEXEC set.  Adding a new command to fcntl() requires
      no new system call and the overall impact on code size is minimal.
      
      If this patch gets accepted we will also add this change to the next
      revision of the POSIX spec.
      
      To test the patch, use the following little program.  Adjust the value of
      F_DUPFD_CLOEXEC appropriately.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      #ifndef F_DUPFD_CLOEXEC
      # define F_DUPFD_CLOEXEC 12
      #endif
      
      int
      main (int argc, char *argv[])
      {
        if  (argc > 1)
          {
            if (fcntl (3, F_GETFD) == 0)
      	{
      	  puts ("descriptor not closed");
      	  exit (1);
      	}
            if (errno != EBADF)
      	{
      	  puts ("error not EBADF");
      	  exit (1);
      	}
      
            exit (0);
          }
        int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0);
        if (fd == -1 && errno == EINVAL)
          {
            puts ("F_DUPFD_CLOEXEC not supported");
            return 0;
          }
        if (fd != 3)
          {
            puts ("program called with descriptors other than 0,1,2");
            return 1;
          }
      
        execl ("/proc/self/exe", "/proc/self/exe", "1", (char *) NULL);
        puts ("execl failed");
        return 1;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
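Besides the exec-based test, the flag can be checked directly with `F_GETFD`. This is a minimal sketch, assuming a Linux/glibc build where `<fcntl.h>` already defines `F_DUPFD_CLOEXEC` (it was later standardized in POSIX.1-2008); `dup_from()` is a hypothetical helper that contrasts it with plain `F_DUPFD`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate fd onto the lowest free descriptor >= minfd using F_DUPFD or
 * F_DUPFD_CLOEXEC, and store in *has_cloexec whether the copy carries
 * FD_CLOEXEC. Returns the new descriptor (caller closes it) or -1. */
int dup_from(int fd, int minfd, int cloexec, int *has_cloexec)
{
    assert(has_cloexec != NULL);

    int newfd = fcntl(fd, cloexec ? F_DUPFD_CLOEXEC : F_DUPFD, minfd);
    if (newfd == -1)
        return -1;

    int fdflags = fcntl(newfd, F_GETFD);
    *has_cloexec = fdflags != -1 && (fdflags & FD_CLOEXEC);
    return newfd;
}
```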
  27. 20 Jul, 2007: 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been BUGs for both slab and slub, and
      slob never supported them either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  28. 18 Jul, 2007: 1 commit
    • Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check · 3bd858ab
      Committed by Satyam Sharma
      Satyam Sharma 提交于
      Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
      users to it. This is done because we want to avoid bugs in the future
      where we check for only effective fsuid of the current task against a
      file's owning uid, without simultaneously checking for CAP_FOWNER as
      well, thus violating its semantics.
      [ XFS uses special macros and structures, and in general looked ...
      untouchable, so we leave it alone -- but it has been looked over. ]
      
      The (current->fsuid != inode->i_uid) check in generic_permission() and
      exec_permission_lite() is left alone, because those operations are
      covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
      falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.
      Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Acked-by: Serge E. Hallyn <serge@hallyn.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 11 Dec, 2006: 1 commit
    • [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Committed by Vadim Lobanov
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
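To illustrate the single-capacity layout, here is a hypothetical userspace model. `struct toy_fdtable`, `toy_expand()` and `toy_alloc_fd()` are invented for illustration and deliberately ignore the kernel's RCU and locking. One `max_fds` sizes the pointer array and both bitmaps, so whenever the table grows, all three grow together and none carries capacity the others cannot use.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model: one capacity (max_fds) for all three arrays. */
struct toy_fdtable {
    unsigned int max_fds;
    void **fd;                     /* fdarray: slot -> open file */
    unsigned long *open_fds;       /* fdset: 1 bit per slot */
    unsigned long *close_on_exec;  /* fdset: 1 bit per slot */
};

static size_t toy_words(unsigned int nr)
{
    return (nr + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;
}

/* Reallocate all three arrays to new_max slots, copying old contents. */
static int toy_expand(struct toy_fdtable *t, unsigned int new_max)
{
    assert(new_max > t->max_fds);
    size_t old_w = toy_words(t->max_fds), new_w = toy_words(new_max);
    void **fd = calloc(new_max, sizeof(*fd));
    unsigned long *open = calloc(new_w, sizeof(*open));
    unsigned long *coe = calloc(new_w, sizeof(*coe));

    if (!fd || !open || !coe) {
        free(fd);
        free(open);
        free(coe);
        return -1;
    }
    if (t->max_fds) {
        memcpy(fd, t->fd, t->max_fds * sizeof(*fd));
        memcpy(open, t->open_fds, old_w * sizeof(*open));
        memcpy(coe, t->close_on_exec, old_w * sizeof(*coe));
    }
    free(t->fd);
    free(t->open_fds);
    free(t->close_on_exec);
    t->fd = fd;
    t->open_fds = open;
    t->close_on_exec = coe;
    t->max_fds = new_max;
    return 0;
}

/* Claim the lowest free slot, doubling the table when every slot is used.
 * Returns the slot number or -1 on allocation failure. */
int toy_alloc_fd(struct toy_fdtable *t, int cloexec)
{
    unsigned int i;

    for (i = 0; i < t->max_fds; i++)
        if (!(t->open_fds[i / TOY_BITS_PER_LONG] & (1UL << (i % TOY_BITS_PER_LONG))))
            break;
    if (i == t->max_fds &&
        toy_expand(t, t->max_fds ? 2 * t->max_fds
                                 : (unsigned int) TOY_BITS_PER_LONG) == -1)
        return -1;

    unsigned long bit = 1UL << (i % TOY_BITS_PER_LONG);
    t->open_fds[i / TOY_BITS_PER_LONG] |= bit;
    if (cloexec)
        t->close_on_exec[i / TOY_BITS_PER_LONG] |= bit;
    return (int) i;
}
```

Once every slot is taken, the next allocation doubles `max_fds`, and both bitmaps are resized in the same step instead of being tracked by a separate `max_fdset`.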