1. 03 May, 2016 1 commit
  2. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov committed
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 07 Dec, 2015 1 commit
  4. 06 Nov, 2015 1 commit
  5. 01 Nov, 2015 2 commits
    • vfs: conditionally clear close-on-exec flag · fc90888d
      Linus Torvalds committed
      We clear the close-on-exec flag when opening and closing files, and the
      bit was almost always already clear before.  Avoid dirtying the
      cacheline if the clearing isn't necessary.  That avoids unnecessary
      cacheline dirtying and bouncing in multi-socket environments.
      
      Eric Dumazet has a file descriptor benchmark that goes 4% faster from
      this on his two-socket machine.  It's probably a partly superlinear
      improvement due to getting slightly less spinlock contention on the
      file_lock spinlock due to less work in the critical section.
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
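
The "test before you clear" idea from this commit can be sketched in plain userspace C. This is only an illustration, not the kernel code: the helper names `bit_is_set` and `clear_bit_if_set` are hypothetical, the bitmap is an ordinary array, and nothing here is atomic (the kernel does this work under `file_lock`). The point it shows is that the common case, where the bit is already clear, becomes a pure read and never stores to the cacheline.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: report whether bit `nr` is set in `map`. */
static bool bit_is_set(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Hypothetical helper: clear bit `nr` only if it is currently set.
 * Returns true when a store was actually performed, false when the
 * bit was already clear and the memory was only read.
 */
static bool clear_bit_if_set(unsigned long *map, size_t nr)
{
    assert(map != NULL);
    if (!bit_is_set(map, nr))
        return false;   /* read-only path: the cacheline is not dirtied */
    map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
    return true;
}
```

An unconditional clear would write the word back even when nothing changes, which is what forces the cacheline into a modified state on every open and close.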
    • vfs: Fix pathological performance case for __alloc_fd() · f3f86e33
      Linus Torvalds committed
      Al Viro points out that:
      > >     * [Linux-specific aside] our __alloc_fd() can degrade quite badly
      > > with some use patterns.  The cacheline pingpong in the bitmap is probably
      > > inevitable, unless we accept considerably heavier memory footprint,
      > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
      > > to trigger - close(3);open(...); will have the next open() after that
      > > scanning the entire in-use bitmap.
      
      And Eric Dumazet has a somewhat realistic multithreaded microbenchmark
      that opens and closes a lot of sockets with minimal work per socket.
      
      This patch largely fixes it.  We keep a 2nd-level bitmap of the open
      file bitmaps, showing which words are already full.  So then we can
      traverse that second-level bitmap to efficiently skip already allocated
      file descriptors.
      
      On his benchmark, this improves performance by up to an order of
      magnitude, by avoiding the excessive open file bitmap scanning.
      Tested-and-acked-by: Eric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
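
A minimal userspace sketch of the second-level bitmap described above, assuming a fixed-size table small enough that a single summary word covers it. The struct and function names (`fd_bitmap`, `mark_used`, `mark_free`, `find_free`) are made up for illustration and are not the kernel's; the kernel keeps the summary in `fdtable->full_fds_bits` and sizes it dynamically.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MAX_FDS   (WORD_BITS * WORD_BITS)  /* what one summary word can cover */

struct fd_bitmap {
    unsigned long open_fds[WORD_BITS]; /* bit set: descriptor is in use */
    unsigned long full_words;          /* bit i set: open_fds[i] has no free bit */
};

static void mark_used(struct fd_bitmap *b, size_t fd)
{
    assert(fd < MAX_FDS);
    size_t word = fd / WORD_BITS;
    b->open_fds[word] |= 1UL << (fd % WORD_BITS);
    if (b->open_fds[word] == ~0UL)     /* word just became full */
        b->full_words |= 1UL << word;
}

static void mark_free(struct fd_bitmap *b, size_t fd)
{
    assert(fd < MAX_FDS);
    size_t word = fd / WORD_BITS;
    b->open_fds[word] &= ~(1UL << (fd % WORD_BITS));
    b->full_words &= ~(1UL << word);   /* word now has at least one free bit */
}

/* Return the lowest free descriptor >= start, or MAX_FDS if there is none. */
static size_t find_free(const struct fd_bitmap *b, size_t start)
{
    for (size_t word = start / WORD_BITS; word < WORD_BITS; word++) {
        if ((b->full_words >> word) & 1UL)
            continue;                  /* skip a full word with one test */
        size_t bit = (word == start / WORD_BITS) ? start % WORD_BITS : 0;
        for (; bit < WORD_BITS; bit++)
            if (!((b->open_fds[word] >> bit) & 1UL))
                return word * WORD_BITS + bit;
    }
    return MAX_FDS;
}
```

With the first few words completely in use, `find_free(&b, 0)` inspects one summary bit per full word instead of every descriptor bit, which is the scan the commit message says the patch avoids.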
  6. 01 Jul, 2015 2 commits
    • fs/file.c: __fget() and dup2() atomicity rules · 5ba97d28
      Eric Dumazet committed
      __fget() does lockless fetch of pointer from the descriptor
      table, attempts to grab a reference and treats "it was already
      zero" as "it's already gone from the table, we just hadn't
      seen the store, let's fail".  Unfortunately, that breaks the
      atomicity of dup2() - __fget() might see the old pointer,
      notice that it's been already dropped and treat that as
      "it's closed".  What we should be getting is either the
      old file or new one, depending whether we come before or after
      dup2().
      
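      Dmitry had the following test failing sometimes:
      
      #include <errno.h>
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      int fd;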
      void *Thread(void *x) {
        char buf;
        int n = read(fd, &buf, 1);
        if (n != 1)
          exit(printf("read failed: n=%d errno=%d\n", n, errno));
        return 0;
      }
      
      int main()
      {
        fd = open("/dev/urandom", O_RDONLY);
        int fd2 = open("/dev/urandom", O_RDONLY);
        if (fd == -1 || fd2 == -1)
          exit(printf("open failed\n"));
        pthread_t th;
        pthread_create(&th, 0, Thread, 0);
        if (dup2(fd2, fd) == -1)
          exit(printf("dup2 failed\n"));
        pthread_join(th, 0);
        if (close(fd) == -1)
          exit(printf("close failed\n"));
        if (close(fd2) == -1)
          exit(printf("close failed\n"));
        printf("DONE\n");
        return 0;
      }
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/file.c: don't acquire files->file_lock in fd_install() · 8a81252b
      Eric Dumazet committed
      Mateusz Guzik reported :
      
       Currently obtaining a new file descriptor results in locking fdtable
       twice - once in order to reserve a slot and second time to fill it.
      
      Holding the spinlock in __fd_install() is needed in case a resize is
      done, or to prevent a resize.
      
      Mateusz provided an RFC patch and a micro benchmark :
        http://people.redhat.com/~mguzik/pipebench.c
      
      A resize is an unlikely operation in a process lifetime,
      as table size is at least doubled at every resize.
      
      We can use RCU instead of the spinlock.
      
      __fd_install() must wait if a resize is in progress.
      
      The resize must block new __fd_install() callers from starting,
      and wait until ongoing installs have finished (synchronize_sched()).
      
      A resize should be attempted by a single thread so as not to waste resources.
      
      The rcu_sched variant is used, as __fd_install() and expand_fdtable() run
      from process context.
      
      It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
      CPU E5-2696 v2 @ 2.50GHz
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mateusz Guzik <mguzik@redhat.com>
      Acked-by: Mateusz Guzik <mguzik@redhat.com>
      Tested-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 17 Apr, 2015 1 commit
  8. 11 Dec, 2014 1 commit
  9. 09 Oct, 2014 1 commit
  10. 08 Sep, 2014 1 commit
  11. 07 May, 2014 1 commit
  12. 02 Apr, 2014 1 commit
  13. 23 Mar, 2014 1 commit
    • vfs: Don't let __fdget_pos() get FMODE_PATH files · 99aea681
      Eric Biggers committed
      Commit bd2a31d5 ("get rid of fget_light()") introduced the
      __fdget_pos() function, which returns the resulting file pointer and
      fdput flags combined in an 'unsigned long'.  However, it also changed the
      behavior to return files with FMODE_PATH set, which shouldn't happen
      because read(), write(), lseek(), etc. aren't allowed on such files.
      This commit restores the old behavior.
      
      This regression actually had no effect on read() and write() since
      FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
      O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
      to fail with ESPIPE rather than EBADF.
      Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
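
The user-visible behavior this commit restores can be checked from userspace with a short sketch like the one below. It assumes Linux with glibc (hence `_GNU_SOURCE` for `O_PATH`) and a kernel that contains this fix; the helper name `lseek_errno_on_opath` is hypothetical. It opens a path with `O_PATH`, calls `lseek()` on the descriptor, and reports the resulting errno, which the commit message says should be EBADF rather than ESPIPE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with O_PATH and try to lseek() on it.
 * Returns the errno set by lseek(), 0 if lseek() unexpectedly succeeded,
 * or -1 if the open() itself failed.
 */
static int lseek_errno_on_opath(const char *path)
{
    int fd = open(path, O_PATH);
    if (fd == -1)
        return -1;

    errno = 0;
    off_t pos = lseek(fd, 0, SEEK_CUR);
    int err = (pos == (off_t)-1) ? errno : 0;

    close(fd);
    return err;
}
```

For example, `lseek_errno_on_opath("/")` is expected to yield `EBADF` on a fixed kernel; per the commit message, the regressed kernels returned ESPIPE instead.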
  14. 10 Mar, 2014 1 commit
  15. 18 Feb, 2014 1 commit
  16. 11 Feb, 2014 1 commit
    • fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem · 96c7a2ff
      Eric W. Biederman committed
      Recently, due to a spike in connections per second, memcached on 3
      separate boxes triggered the OOM killer from accept.  At the time the
      OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
      problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
      hold a bitmap, and there was sufficient fragmentation that the largest
      page available was 8KiB.
      
      I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
      but I do agree that order 3 allocations are very likely to succeed.
      
      There are always pathologies where order > 0 allocations can fail when
      there are copious amounts of free memory available.  Using the pigeon
      hole principle it is easy to show that it requires 1 page more than 50%
      of the pages being free to guarantee an order 1 (8KiB) allocation will
      succeed, 1 page more than 75% of the pages being free to guarantee an
      order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
      the pages being free to guarantee an order 3 allocation will succeed.
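
The pigeonhole arithmetic above can be written out as a small helper. This is a sketch under stated assumptions: memory is treated as `total_pages` pages divided into aligned blocks of 2^order pages (as a buddy-style allocator would), and `total_pages` is a multiple of the block size. The function name `min_free_pages_for_order` is hypothetical. In the worst case one page in every block is in use, leaving total - total/2^order pages free with no whole block available; one more free page forces at least one block to be entirely free.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: smallest number of free pages that guarantees an
 * aligned block of 2^order contiguous free pages exists, no matter how
 * the used pages are spread out.
 */
static size_t min_free_pages_for_order(size_t total_pages, unsigned order)
{
    size_t block = (size_t)1 << order;

    /* Assumption of this sketch: memory divides evenly into blocks. */
    assert(total_pages % block == 0);

    /* Worst case leaves one used page per block; add one page to beat it. */
    return total_pages - total_pages / block + 1;
}
```

With 1024 pages this gives 513 for order 1 (one page more than 50%), 769 for order 2 (one more than 75%), and 897 for order 3 (one more than 87.5%), matching the figures in the commit message.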
      
      A server churning memory with a lot of small requests and replies, like
      memcached, is a common case that, if anything, will skew the odds
      against large pages being available.
      
      Therefore let's not give external applications a practical way to kill
      linux server applications, and specify __GFP_NORETRY to the kmalloc in
      alloc_fdmem.  Unless I am misreading the code, by the time the code
      reaches should_alloc_retry in __alloc_pages_slowpath (where
      __GFP_NORETRY becomes significant), we have already tried everything
      reasonable to allocate a page and the only thing left to do is wait.  So
      not waiting and falling back to vmalloc immediately seems like the
      reasonable thing to do even if there wasn't a chance of triggering the
      OOM killer.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 25 Jan, 2014 5 commits
  18. 02 May, 2013 1 commit
  19. 19 Feb, 2013 1 commit
  20. 04 Jan, 2013 1 commit
    • misc: remove __dev* attributes. · 6ae14171
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the last of the __dev* markings from the kernel from
      a variety of different, tiny, places.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  21. 30 Nov, 2012 1 commit
  22. 29 Nov, 2012 1 commit
  23. 12 Nov, 2012 1 commit
  24. 31 Oct, 2012 1 commit
  25. 10 Oct, 2012 1 commit
    • dup3: Return an error when oldfd == newfd. · aed97647
      Richard W.M. Jones committed
      I have tested the attached patch to fix the dup3 regression.
      
      Rich.
      
      From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
      From: "Richard W.M. Jones" <rjones@redhat.com>
      Date: Tue, 9 Oct 2012 14:42:45 +0100
      Subject: [PATCH] dup3: Return an error when oldfd == newfd.
      
      The following commit:
      
        commit fe17f22d
        Author: Al Viro <viro@zeniv.linux.org.uk>
        Date:   Tue Aug 21 11:48:11 2012 -0400
      
          take purely descriptor-related stuff from fcntl.c to file.c
      
      was supposed to be just code motion, but it dropped the following two
      lines:
      
        if (unlikely(oldfd == newfd))
                return -EINVAL;
      
      from the dup3 system call.  dup3 is not specified by POSIX, so Linux
      can do what it likes.  However the POSIX proposal for dup3 [1] states
      that it should return an error if oldfd == newfd.
      
      [1] http://austingroupbugs.net/view.php?id=411
      
      Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
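
A short userspace check of the behavior this commit restores, assuming Linux with glibc (where `dup3()` is declared under `_GNU_SOURCE`). The helper name `dup3_same_fd_errno` is hypothetical. It calls `dup3()` with the same descriptor for both arguments and returns the errno, which should be EINVAL on a kernel that has the two dropped lines back.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: call dup3(fd, fd, 0) and return the errno it sets,
 * or 0 if the call unexpectedly succeeded.
 */
static int dup3_same_fd_errno(int fd)
{
    errno = 0;
    if (dup3(fd, fd, 0) == -1)
        return errno;
    return 0;
}
```

A caller could open any file, pass the descriptor to `dup3_same_fd_errno()`, and compare the result against `EINVAL`.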
  26. 27 Sep, 2012 9 commits