1. 09 May, 2017 1 commit
  2. 25 Feb, 2017 1 commit
  3. 13 Feb, 2017 1 commit
    • proc/sysctl: prune stale dentries during unregistering · d6cffbbe
      Konstantin Khlebnikov committed
      Currently, unregistering a sysctl table does not prune its dentries.
      Stale dentries can slow down sysctl operations significantly.
      
      For example, command:
      
       # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
      
      creates millions of stale dentries around the sysctls of the loopback interface:
      
       # sysctl fs.dentry-state
       fs.dentry-state = 25812579  24724135        45      0       0       0
      
      All of them have matching names, so lookup has to scan through the whole
      hash chain and call d_compare (proc_sys_compare), which checks them
      under a system-wide spinlock (sysctl_lock).
      
       # time sysctl -a > /dev/null
       real    1m12.806s
       user    0m0.016s
       sys     1m12.400s
      
      Currently only the memory reclaimer can remove this garbage,
      but without significant memory pressure that never happens.
      
      This patch collects sysctl inodes into a list on the sysctl table header and
      prunes all their dentries once that table is unregistered.
      
      Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
      > On 10.02.2017 10:47, Al Viro wrote:
      >> how about the matching stats *after* that patch?
      >
      > dcache size doesn't grow endlessly, so stats are fine
      >
      > # sysctl fs.dentry-state
      > fs.dentry-state = 92712	58376	45	0	0	0
      >
      > # time sysctl -a &>/dev/null
      >
      > real	0m0.013s
      > user	0m0.004s
      > sys	0m0.008s
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
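
The bookkeeping described in d6cffbbe (collect the inodes on the table header, prune their dentries at unregister time) can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name below (model_header, model_inode, model_track_inode, model_unregister) is hypothetical, none of it is kernel API, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not kernel structures. */
struct model_inode {
	struct model_inode *next;	/* link in the owning header's inode list */
	int cached_dentries;		/* dentries still cached for this inode */
};

struct model_header {
	struct model_inode *inodes;	/* every inode created for this table */
	int registered;
};

/* Called whenever an inode is created for an entry of this table. */
static void model_track_inode(struct model_header *head, struct model_inode *inode)
{
	assert(head != NULL && inode != NULL);
	inode->next = head->inodes;
	head->inodes = inode;
}

/* Unregister: walk the list and drop every cached dentry.
 * Returns how many dentries were pruned. */
static int model_unregister(struct model_header *head)
{
	int pruned = 0;

	assert(head != NULL);
	for (struct model_inode *i = head->inodes; i != NULL; i = i->next) {
		pruned += i->cached_dentries;
		i->cached_dentries = 0;
	}
	head->inodes = NULL;
	head->registered = 0;
	return pruned;
}
```

The point of the per-header list is that unregistering no longer depends on memory pressure: the table knows exactly which inodes it produced and can clean up after them directly.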
  4. 25 Dec, 2016 1 commit
  5. 13 Dec, 2016 4 commits
  6. 09 Dec, 2016 1 commit
  7. 28 Sep, 2016 2 commits
  8. 24 Jun, 2016 3 commits
  9. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov committed
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 31 Dec, 2015 1 commit
  11. 09 Dec, 2015 1 commit
    • replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro committed
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
        * inode and dentry are passed separately
        * might be called both in RCU and non-RCU mode;
          the former is indicated by passing it a NULL dentry.
        * when called that way it isn't allowed to block
          and should return ERR_PTR(-ECHILD) if it needs to be called
          in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
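
The ->get_link() contract in 6b255391 (a NULL dentry signals RCU mode, where the method must not block and returns ERR_PTR(-ECHILD) to request a non-RCU retry) can be modelled in userspace. The sketch below is an assumption-laden illustration: err_ptr(), ptr_err() and is_err() are local stand-ins for the kernel helpers, model_get_link() takes a simplified argument list rather than the real method signature, and the "blocking" path is just a pointer copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static const char *err_ptr(long err) { return (const char *)(intptr_t)err; }
static long ptr_err(const char *p) { return (long)(intptr_t)p; }
static int is_err(const char *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

struct model_dentry { int unused; };

struct model_inode {
	const char *cached_body;	/* symlink body already in memory, or NULL */
	const char *slow_body;		/* body that would need blocking I/O to fetch */
};

/* Hypothetical ->get_link() instance: dentry == NULL means RCU mode. */
static const char *model_get_link(struct model_dentry *dentry, struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->cached_body)
		return inode->cached_body;	/* no blocking needed in either mode */
	if (!dentry)
		return err_ptr(-ECHILD);	/* RCU mode: may not block, ask for a retry */
	inode->cached_body = inode->slow_body;	/* non-RCU mode: "blocking" fetch */
	return inode->cached_body;
}

/* Caller side: try RCU mode first, fall back to non-RCU mode on -ECHILD. */
static const char *model_resolve(struct model_dentry *dentry, struct model_inode *inode)
{
	const char *body = model_get_link(NULL, inode);

	if (is_err(body) && ptr_err(body) == -ECHILD)
		body = model_get_link(dentry, inode);
	return body;
}
```

Once the non-RCU call has populated the cache, a later RCU-mode call succeeds without bailing out, which is the behaviour the commit message says most instances did not yet have.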
  12. 01 Jul, 2015 1 commit
  13. 11 May, 2015 3 commits
    • switch ->put_link() from dentry to inode · 5f2c4179
      Al Viro committed
      only one instance looks at that argument at all; that sole
      exception wants inode rather than dentry.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • don't pass nameidata to ->follow_link() · 6e77137b
      Al Viro committed
      its only use is getting passed to nd_jump_link(), which can obtain
      it from current->nameidata
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new ->follow_link() and ->put_link() calling conventions · 680baacb
      Al Viro committed
      a) instead of storing the symlink body (via nd_set_link()) and returning
      an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
      that opaque pointer (into void * passed by address by caller) and returns
      the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
      symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
      is ignored in all cases except the last one.
      
      Storing NULL for opaque pointer (or not storing it at all) means no call
      of ->put_link().
      
      b) the body used to be passed to ->put_link() implicitly (via nameidata).
      Now only the opaque pointer is.  In the cases when we used the symlink body
      to free stuff, ->follow_link() now should store it as opaque pointer in addition
      to returning it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
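
The calling convention from 680baacb can be illustrated with a userspace model: ->follow_link() returns the symlink body and stores an opaque pointer through a caller-supplied void **, and ->put_link() runs only when a normal body was returned and the stored pointer is non-NULL. Everything here is a hedged sketch: the model_* names are hypothetical, the argument lists are simplified relative to the real methods, and the instance shown is the case from point (b), where the body itself is stored as the opaque pointer so it can be freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static const char *err_ptr(long err) { return (const char *)(intptr_t)err; }
static long ptr_err(const char *p) { return (long)(intptr_t)p; }
static int is_err(const char *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

struct model_dentry { const char *target; };

static int model_live_allocs;	/* outstanding bodies, to show nothing leaks */

/* Hypothetical instance: allocates the body and stores it as the cookie. */
static const char *model_follow_link(struct model_dentry *dentry, void **cookie)
{
	size_t len = strlen(dentry->target) + 1;
	char *body = malloc(len);

	if (!body)
		return err_ptr(-ENOMEM);	/* errors come back as ERR_PTR() */
	memcpy(body, dentry->target, len);
	model_live_allocs++;
	*cookie = body;				/* body doubles as the opaque pointer */
	return body;
}

static void model_put_link(struct model_dentry *dentry, void *cookie)
{
	(void)dentry;
	free(cookie);
	model_live_allocs--;
}

/* Caller side: error, jump (NULL) and normal body are handled separately. */
static int model_walk(struct model_dentry *dentry, char *out, size_t outlen)
{
	void *cookie = NULL;
	const char *body = model_follow_link(dentry, &cookie);

	assert(out != NULL && outlen > 0);
	if (is_err(body))
		return (int)ptr_err(body);
	if (!body)
		return 0;			/* jump: stored pointer is ignored */
	snprintf(out, outlen, "%s", body);
	if (cookie)
		model_put_link(dentry, cookie);	/* only for a non-NULL cookie */
	return 0;
}
```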
  14. 16 Apr, 2015 1 commit
  15. 23 Feb, 2015 1 commit
  16. 13 Feb, 2015 1 commit
  17. 11 Dec, 2014 2 commits
    • kill proc_ns completely · 3d3d35b1
      Al Viro committed
      procfs inodes need only the ns_ops part; nsfs inodes don't need it at all
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b
      Al Viro committed
      New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
      It's not mountable (not even registered, so it's not in /proc/filesystems,
      etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
      
      This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
      get_proc_ns() is a macro now (it's simply returning ->i_private; would
      have been an inline, if not for header ordering headache).
      proc_ns_inode() is an ex-parrot.  The interface used in procfs is
      ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
      
      Dentries and inodes are never hashed; a non-counting reference to dentry
      is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
      if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
      of that mechanism.
      
      As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
      it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
      from ns_get_path().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  18. 05 Dec, 2014 2 commits
  19. 05 Aug, 2014 1 commit
    • proc: Implement /proc/thread-self to point at the directory of the current thread · 0097875b
      Eric W. Biederman committed
      /proc/thread-self is derived from /proc/self.  /proc/thread-self
      points to the directory in proc containing information about the
      current thread.
      
      This functionality has been missing for a long time, and is tricky to
      implement in userspace as gettid() is not exported by glibc.  More
      importantly this allows fixing defects in /proc/mounts and /proc/net
      where in a threaded application today they wind up being empty files
      when only the initial pthread has exited, causing problems for other
      threads.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
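
From userspace, the symlink added by 0097875b can be inspected with readlink(2). The sketch below assumes a Linux system whose kernel includes this commit and has /proc mounted; the helper names are hypothetical. It also assumes that the link target has the form "<tgid>/task/<tid>", which is an assumption about the link format rather than something stated in the commit message, and that the process and the /proc mount share a PID namespace.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads the target of /proc/thread-self into buf.
 * Returns 0 on success, -1 if the link cannot be read. */
static int read_thread_self(char *buf, size_t len)
{
	ssize_t n;

	if (len == 0)
		return -1;
	n = readlink("/proc/thread-self", buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/* Builds the target we expect, "<tgid>/task/<tid>", using the raw gettid
 * system call (the commit notes glibc did not export gettid() then). */
static void expected_thread_self(char *buf, size_t len)
{
	long tid = syscall(SYS_gettid);

	snprintf(buf, len, "%d/task/%ld", (int)getpid(), tid);
}

/* Returns 1 if the link matches the expectation, 0 if it differs,
 * -1 if /proc/thread-self is unavailable. */
static int thread_self_matches(void)
{
	char actual[64], expected[64];

	if (read_thread_self(actual, sizeof actual) != 0)
		return -1;
	expected_thread_self(expected, sizeof expected);
	return strcmp(actual, expected) == 0;
}
```

Calling thread_self_matches() from a secondary thread shows why this matters: /proc/self would still resolve to the process directory, while /proc/thread-self resolves to the calling thread's own task directory.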
  20. 08 Apr, 2014 1 commit
  21. 04 Apr, 2014 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner committed
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
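
The ordering rule from 91b0abe3 (the freeing side sets a flag before the final truncate, and reclaim checks that flag before installing a shadow entry) can be shown with a deliberately simplified, single-threaded model. All names are hypothetical, the boolean merely stands in for AS_EXITING, and the tree lock that makes the real check safe against concurrent freeing is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address_space; not a kernel structure. */
struct model_mapping {
	bool exiting;		/* stand-in for the AS_EXITING flag */
	int shadow_entries;	/* shadow entries left behind by reclaim */
	int pages;		/* real pages still in the tree */
};

/* Inode-freeing side, step 1: announce that the mapping is going away.
 * In the kernel this happens under the tree lock. */
static void model_mark_exiting(struct model_mapping *m)
{
	assert(m != NULL);
	m->exiting = true;
}

/* Inode-freeing side, step 2: the final truncate empties the tree. */
static void model_final_truncate(struct model_mapping *m)
{
	assert(m != NULL && m->exiting);
	m->pages = 0;
	m->shadow_entries = 0;
}

/* Reclaim side: evict one page. A shadow entry is installed only while
 * the mapping is not exiting. Returns true if a shadow entry was left. */
static bool model_reclaim_page(struct model_mapping *m)
{
	assert(m != NULL);
	if (m->pages == 0)
		return false;
	m->pages--;
	if (m->exiting)
		return false;	/* tree must end up really empty */
	m->shadow_entries++;
	return true;
}
```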
  22. 13 Dec, 2013 1 commit
    • procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Jan Beulich committed
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
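
The fallback quoted from the fad1a86e changelog (use the file's own get_unmapped_area method if it has one, otherwise the default one on MMU-present architectures) is a plain function-pointer dispatch. The sketch below models only that MMU-side fallback; the !MMU behaviour fixed by ae5758a1 is not modelled. All names, the two-argument signature and the returned addresses are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical signature; the real method takes more arguments. */
typedef unsigned long (*model_gua_fn)(unsigned long addr, unsigned long len);

struct model_fops {
	model_gua_fn get_unmapped_area;	/* may be NULL */
};

/* Stand-in for the default (mm-provided) implementation. */
static unsigned long model_default_gua(unsigned long addr, unsigned long len)
{
	(void)len;
	return addr ? addr : 0x10000UL;	/* arbitrary placeholder address */
}

/* Stand-in for a procfs file that supplies its own method. */
static unsigned long model_custom_gua(unsigned long addr, unsigned long len)
{
	(void)addr;
	(void)len;
	return 0x20000UL;		/* arbitrary placeholder address */
}

/* Wrapper: prefer the file's own method, else fall back to the default
 * instead of failing with an I/O error. */
static unsigned long model_proc_reg_gua(const struct model_fops *fops,
					unsigned long addr, unsigned long len)
{
	model_gua_fn fn;

	assert(fops != NULL);
	fn = fops->get_unmapped_area ? fops->get_unmapped_area : model_default_gua;
	return fn(addr, len);
}
```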
  23. 13 Nov, 2013 1 commit
  24. 17 Oct, 2013 2 commits
  25. 06 Sep, 2013 1 commit
  26. 02 May, 2013 1 commit
  27. 30 Apr, 2013 2 commits
  28. 10 Apr, 2013 1 commit