1. 01 Jul, 2015 2 commits
    • fs/file.c: __fget() and dup2() atomicity rules · 5ba97d28
      Eric Dumazet committed
      __fget() does lockless fetch of pointer from the descriptor
      table, attempts to grab a reference and treats "it was already
      zero" as "it's already gone from the table, we just hadn't
      seen the store, let's fail".  Unfortunately, that breaks the
      atomicity of dup2() - __fget() might see the old pointer,
      notice that it's been already dropped and treat that as
      "it's closed".  What we should be getting is either the
      old file or the new one, depending on whether we come before or after
      dup2().
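
The either-old-or-new rule described above can be illustrated with a small userspace analogy. This is a minimal sketch using C11 atomics, not the kernel code: `struct obj`, `obj_get_unless_zero()` and `slot_get()` are hypothetical names, and the sketch ignores memory reclamation (the kernel relies on RCU to keep the object valid while it is inspected). It shows a failed "increment unless zero" leading to a re-read of the slot instead of a "closed" result.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted file object. */
struct obj {
    atomic_long refcount;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_get_unless_zero(struct obj *o)
{
    long old = atomic_load(&o->refcount);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Lockless lookup in a single table slot. Returns the object with a
 * reference held, or NULL only when the slot itself is empty.
 */
static struct obj *slot_get(_Atomic(struct obj *) *slot)
{
    assert(slot != NULL);
    for (;;) {
        struct obj *o = atomic_load(slot);
        if (o == NULL)
            return NULL;            /* genuinely closed */
        if (obj_get_unless_zero(o))
            return o;               /* got the old or the new object */
        /* Count was zero: the slot is being replaced, so re-read it. */
    }
}
```

In this model a reader can only see NULL if the slot was really empty; an object whose count has hit zero makes the reader loop until the replacement pointer becomes visible.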
      
      Dmitry had the following test failing sometimes:
      
      #include <errno.h>
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      int fd;
      
      /* Reader thread: races read(fd) against dup2(fd2, fd) in main(). */
      void *Thread(void *x) {
        char buf;
        int n = read(fd, &buf, 1);
        if (n != 1)
          exit(printf("read failed: n=%d errno=%d\n", n, errno));
        return 0;
      }
      
      int main()
      {
        fd = open("/dev/urandom", O_RDONLY);
        int fd2 = open("/dev/urandom", O_RDONLY);
        if (fd == -1 || fd2 == -1)
          exit(printf("open failed\n"));
        pthread_t th;
        pthread_create(&th, 0, Thread, 0);
        /* fd must refer to either the old or the new file at all times. */
        if (dup2(fd2, fd) == -1)
          exit(printf("dup2 failed\n"));
        pthread_join(th, 0);
        if (close(fd) == -1)
          exit(printf("close failed\n"));
        if (close(fd2) == -1)
          exit(printf("close failed\n"));
        printf("DONE\n");
        return 0;
      }
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/file.c: don't acquire files->file_lock in fd_install() · 8a81252b
      Eric Dumazet committed
      Mateusz Guzik reported:
      
       Currently obtaining a new file descriptor results in locking fdtable
       twice - once in order to reserve a slot and second time to fill it.
      
      Holding the spinlock in __fd_install() is needed in case a resize is
      done, or to prevent a resize.
      
      Mateusz provided an RFC patch and a micro benchmark:
        http://people.redhat.com/~mguzik/pipebench.c
      
      A resize is an unlikely operation in a process lifetime,
      as table size is at least doubled at every resize.
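
To make the "resizes are rare" argument concrete, here is a minimal sketch that counts how many doublings a table needs to reach a given size. The function name `resizes_needed()` and the starting size of 64 used in the note below are assumptions for illustration, not values taken from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of doublings needed for a table of 'start' slots to hold at
 * least 'needed' slots. Stops early rather than overflowing size_t.
 */
static unsigned resizes_needed(size_t start, size_t needed)
{
    assert(start > 0);
    unsigned n = 0;
    size_t cap = start;
    while (cap < needed && cap <= SIZE_MAX / 2) {
        cap *= 2;
        n++;
    }
    return n;
}
```

Under these assumptions, growing from 64 slots to 1048576 slots takes `resizes_needed(64, 1048576)` doublings, which is 14. The number of resizes therefore grows only logarithmically with the number of descriptors.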
      
      We can use RCU instead of the spinlock.
      
      __fd_install() must wait if a resize is in progress.
      
      The resize must block new __fd_install() callers from starting,
      and wait until ongoing installs have finished (synchronize_sched()).
      
      A resize should be attempted by a single thread so as not to waste
      resources.
      
      The rcu_sched variant is used, as __fd_install() and expand_fdtable() run
      from process context.
      
      It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
      CPU E5-2696 v2 @ 2.50GHz.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mateusz Guzik <mguzik@redhat.com>
      Acked-by: Mateusz Guzik <mguzik@redhat.com>
      Tested-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 17 Apr, 2015 1 commit
  3. 11 Dec, 2014 1 commit
  4. 09 Oct, 2014 1 commit
  5. 08 Sep, 2014 1 commit
  6. 07 May, 2014 1 commit
  7. 02 Apr, 2014 1 commit
  8. 23 Mar, 2014 1 commit
    • vfs: Don't let __fdget_pos() get FMODE_PATH files · 99aea681
      Eric Biggers committed
      Commit bd2a31d5 ("get rid of fget_light()") introduced the
      __fdget_pos() function, which returns the resulting file pointer and
      fdput flags combined in an 'unsigned long'.  However, it also changed the
      behavior to return files with FMODE_PATH set, which shouldn't happen
      because read(), write(), lseek(), etc. aren't allowed on such files.
      This commit restores the old behavior.
      
      This regression actually had no effect on read() and write() since
      FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
      O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
      to fail with ESPIPE rather than EBADF.
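
The user-visible behaviour described above can be checked from userspace. The following is a minimal sketch, assuming Linux with glibc (where `O_PATH` requires `_GNU_SOURCE`); the helper name `lseek_errno_on_o_path()` is hypothetical. On a kernel with this fix, the expected value is `EBADF`.

```c
#define _GNU_SOURCE   /* needed for O_PATH on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Opens 'path' with O_PATH and calls lseek() on the descriptor.
 * Returns the errno from lseek(), 0 if lseek() unexpectedly
 * succeeded, or -1 if the open itself failed.
 */
static int lseek_errno_on_o_path(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_PATH);
    if (fd == -1)
        return -1;
    errno = 0;
    off_t pos = lseek(fd, 0, SEEK_CUR);
    int err = (pos == (off_t)-1) ? errno : 0;
    close(fd);
    return err;
}
```

Calling `lseek_errno_on_o_path("/")` and comparing the result with `EBADF` and `ESPIPE` distinguishes the restored behaviour from the regression.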
      
      Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. 10 Mar, 2014 1 commit
  10. 18 Feb, 2014 1 commit
  11. 11 Feb, 2014 1 commit
    • fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem · 96c7a2ff
      Eric W. Biederman committed
      Recently, due to a spike in connections per second, memcached on 3
      separate boxes triggered the OOM killer from accept.  At the time the
      OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
      problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
      hold a bitmap, and there was sufficient fragmentation that the largest
      page available was 8KiB.
      
      I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
      but I do agree that order 3 allocations are very likely to succeed.
      
      There are always pathologies where order > 0 allocations can fail when
      there are copious amounts of free memory available.  Using the pigeon
      hole principle it is easy to show that it requires 1 page more than 50%
      of the pages being free to guarantee an order 1 (8KiB) allocation will
      succeed, 1 page more than 75% of the pages being free to guarantee an
      order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
      the pages being free to guarantee an order 3 (32KiB) allocation will succeed.
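
The pigeonhole bound above can be written out as arithmetic. This is a minimal sketch under the assumption that memory is split into aligned blocks of 2^order pages and that the total page count is a multiple of the block size; `min_free_for_order()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimum number of free pages that guarantees at least one fully
 * free, aligned block of 2^order pages.
 */
static size_t min_free_for_order(size_t total_pages, unsigned order)
{
    size_t block = (size_t)1 << order;
    assert(total_pages % block == 0);
    /* Worst case: exactly one used page inside every aligned block. */
    size_t worst_case_free = total_pages - total_pages / block;
    /* One more free page forces some block to be entirely free. */
    return worst_case_free + 1;
}
```

For 1024 pages this gives 513, 769 and 897 free pages for orders 1, 2 and 3, which matches "1 page more than" 50%, 75% and 87.5% of the pages.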
      
      A server churning memory with a lot of small requests and replies, like
      memcached, is a common case that, if anything, will skew the odds
      against large pages being available.
      
      Therefore let's not give external applications a practical way to kill
      Linux server applications, and specify __GFP_NORETRY to the kmalloc in
      alloc_fdmem.  Unless I am misreading the code, by the time the code
      reaches should_alloc_retry in __alloc_pages_slowpath (where
      __GFP_NORETRY becomes significant), we have already tried everything
      reasonable to allocate a page and the only thing left to do is wait.  So
      not waiting and falling back to vmalloc immediately seems like the
      reasonable thing to do even if there wasn't a chance of triggering the
      OOM killer.
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 25 Jan, 2014 5 commits
  13. 02 May, 2013 1 commit
  14. 19 Feb, 2013 1 commit
  15. 04 Jan, 2013 1 commit
    • misc: remove __dev* attributes. · 6ae14171
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the last of the __dev* markings from the kernel from
      a variety of different, tiny, places.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 30 Nov, 2012 1 commit
  17. 29 Nov, 2012 1 commit
  18. 12 Nov, 2012 1 commit
  19. 31 Oct, 2012 1 commit
  20. 10 Oct, 2012 1 commit
    • dup3: Return an error when oldfd == newfd. · aed97647
      Richard W.M. Jones committed
      I have tested the attached patch to fix the dup3 regression.
      
      Rich.
      
      From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
      From: "Richard W.M. Jones" <rjones@redhat.com>
      Date: Tue, 9 Oct 2012 14:42:45 +0100
      Subject: [PATCH] dup3: Return an error when oldfd == newfd.
      
      The following commit:
      
        commit fe17f22d
        Author: Al Viro <viro@zeniv.linux.org.uk>
        Date:   Tue Aug 21 11:48:11 2012 -0400
      
          take purely descriptor-related stuff from fcntl.c to file.c
      
      was supposed to be just code motion, but it dropped the following two
      lines:
      
        if (unlikely(oldfd == newfd))
                return -EINVAL;
      
      from the dup3 system call.  dup3 is not specified by POSIX, so Linux
      can do what it likes.  However the POSIX proposal for dup3 [1] states
      that it should return an error if oldfd == newfd.
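
The restored check can be observed from userspace. The following is a minimal sketch, assuming Linux with glibc (where `dup3()` requires `_GNU_SOURCE`); the helper name `dup3_same_fd_errno()` is hypothetical. With the two dropped lines back in place, the expected result is `EINVAL`.

```c
#define _GNU_SOURCE   /* needed for dup3() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Calls dup3(fd, fd, 0) and returns the resulting errno, or 0 if the
 * call unexpectedly succeeded.
 */
static int dup3_same_fd_errno(int fd)
{
    assert(fd >= 0);
    errno = 0;
    if (dup3(fd, fd, 0) == -1)
        return errno;
    return 0;
}
```

For contrast, dup2() with identical valid descriptors is specified to succeed and return the descriptor unchanged, so the equality check applies to dup3 only.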
      
      [1] http://austingroupbugs.net/view.php?id=411
      
      Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  21. 27 Sep, 2012 15 commits