1. 01 Feb, 2020 1 commit
  2. 18 Jan, 2020 3 commits
  3. 09 Nov, 2019 1 commit
  4. 02 Jul, 2019 1 commit
  5. 01 Jun, 2019 1 commit
    • mm: fix page cache convergence regression · 7b785645
      Johannes Weiner committed
      Since a2833486 ("page cache: Finish XArray conversion"), on most
      major Linux distributions, the page cache doesn't correctly transition
      when the hot data set is changing, and leaves the new pages thrashing
      indefinitely instead of kicking out the cold ones.
      
      On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
      running stock Arch Linux:
      
      [root@ham ~]# ./reclaimtest.sh
      + dd of=workingset-a bs=1M count=0 seek=600
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + ./mincore workingset-a
      153600/153600 workingset-a
      + dd of=workingset-b bs=1M count=0 seek=600
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      104029/153600 workingset-a
      120086/153600 workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      104029/153600 workingset-a
      120268/153600 workingset-b
      
      workingset-b is a 600M file on a 1G host that is otherwise entirely
      idle. No matter how often it's being accessed, it won't get cached.
      
      While investigating, I noticed that the non-resident information gets
      aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
      a problem because a workingset transition like this relies on the
      non-resident information tracked in the page cache tree of evicted
      file ranges: when the cache faults are refaults of recently evicted
      cache, we challenge the existing active set, and that allows a new
      workingset to establish itself.
      
      Tracing the shrinker that maintains this memory revealed that all page
      cache tree nodes were allocated to the root cgroup. This is a problem,
      because 1) the shrinker sizes the amount of non-resident information
      it keeps to the size of the cgroup's other memory and 2) on most major
      Linux distributions, only kernel threads live in the root cgroup and
      everything else gets put into services or session groups:
      
      [root@ham ~]# cat /proc/self/cgroup
      0::/user.slice/user-0.slice/session-c1.scope
      
      As a result, we basically maintain no non-resident information for the
      workloads running on the system, thus breaking the caching algorithm.
      
      Looking through the code, I found the culprit in the above-mentioned
      patch: when switching from the radix tree to xarray, it dropped the
      __GFP_ACCOUNT flag from the tree node allocations - the flag that
      makes sure the allocated memory gets charged to and tracked by the
      cgroup of the calling process - in this case, the one doing the fault.
      
      To fix this, allow xarray users to specify a per-tree flag that makes
      xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache
      tree annotation to request such cgroup tracking for the cache nodes.
      
      With this patch applied, the page cache correctly converges on new
      workingsets again after just a few iterations:
      
      [root@ham ~]# ./reclaimtest.sh
      + dd of=workingset-a bs=1M count=0 seek=600
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + ./mincore workingset-a
      153600/153600 workingset-a
      + dd of=workingset-b bs=1M count=0 seek=600
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      124607/153600 workingset-a
      87876/153600 workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      81313/153600 workingset-a
      133321/153600 workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      63036/153600 workingset-a
      153600/153600 workingset-b
      
      Cc: stable@vger.kernel.org # 4.20+
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
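The `./mincore` helper used in the traces above is not included in the commit message. A minimal sketch of what its core might look like, assuming it maps the file and asks mincore(2) which pages are resident in the page cache, is shown below. The function name `resident_pages` is hypothetical. With 4 KiB pages, a 600 MiB file has 153600 pages, which matches the totals printed in the traces.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of 'path' are resident in
 * the page cache. Stores the file's total page count in *total.
 * Returns the resident page count, or -1 on error.
 */
long resident_pages(const char *path, long *total)
{
    assert(total != NULL);

    long page = sysconf(_SC_PAGESIZE);
    long resident = -1;
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    *total = (long)((st.st_size + page - 1) / page);
    if (*total == 0) {          /* empty file: nothing to map */
        close(fd);
        return 0;
    }

    /* Mapping alone does not fault pages in, so residency is unchanged. */
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    unsigned char *vec = malloc((size_t)*total);

    if (map != MAP_FAILED && vec != NULL &&
        mincore(map, (size_t)st.st_size, vec) == 0) {
        resident = 0;
        for (long i = 0; i < *total; i++)
            resident += vec[i] & 1;   /* bit 0 set: page is resident */
    }

    free(vec);
    if (map != MAP_FAILED)
        munmap(map, (size_t)st.st_size);
    close(fd);
    return resident;
}
```

A caller would print the result as `resident/total path` to reproduce the output format seen in the traces.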
  6. 22 Feb, 2019 2 commits
  7. 21 Feb, 2019 2 commits
  8. 07 Feb, 2019 4 commits
    • XArray: Add cyclic allocation · 2fa044e5
      Matthew Wilcox committed
      This differs slightly from the IDR equivalent in five ways.
      
      1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
         like xa_alloc().  Also like xa_alloc(), it will write to the 'id'
         pointer before placing the entry in the XArray.
      2. The 'next' cursor is allocated separately from the XArray instead
         of being part of the IDR.  This saves memory for all the users which
         do not use the cyclic allocation API and suits some users better.
      3. It returns -EBUSY instead of -ENOSPC.
      4. It will attempt to wrap back to the minimum value on memory allocation
         failure as well as on an -EBUSY error, assuming that a user would
         rather allocate a small ID than suffer an ID allocation failure.
      5. It reports whether it has wrapped, which is important to some users.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
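The wrap-around behaviour described in points 2 to 5 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: `toy_ids`, `toy_limit`, `toy_alloc` and `toy_alloc_cyclic` are hypothetical names, the ID space is a fixed 8-slot bitmap, and there is no memory allocation, so the retry-on-allocation-failure path from point 4 is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 8u                      /* hypothetical ID space: 0..7 */

struct toy_ids { bool used[TOY_SLOTS]; };
struct toy_limit { uint32_t min, max; };  /* stand-in for a min/max limit */

/* Claim the lowest free ID in [lo, hi]; returns -EBUSY if none is free. */
static int toy_alloc(struct toy_ids *t, uint32_t *id, uint32_t lo, uint32_t hi)
{
    for (uint32_t i = lo; i < TOY_SLOTS && i <= hi; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            *id = i;
            return 0;
        }
    }
    return -EBUSY;
}

/*
 * Cyclic allocation: search from the caller-owned cursor '*next' first,
 * then wrap back to limit.min. Returns 0 on success, 1 on success after
 * wrapping, and -EBUSY when every ID in the range is taken.
 */
int toy_alloc_cyclic(struct toy_ids *t, uint32_t *id,
                     struct toy_limit limit, uint32_t *next)
{
    assert(limit.min <= limit.max);

    uint32_t start = *next > limit.min ? *next : limit.min;
    int ret = toy_alloc(t, id, start, limit.max);

    if (ret < 0 && start > limit.min) {
        /* Nothing free at or above the cursor: retry from the minimum. */
        ret = toy_alloc(t, id, limit.min, limit.max);
        if (ret == 0)
            ret = 1;                      /* report that we wrapped */
    }
    if (ret >= 0)
        *next = *id + 1;                  /* advance the cursor */
    return ret;
}
```

Keeping the cursor outside `struct toy_ids` mirrors point 2: callers that never allocate cyclically pay nothing for it.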
    • XArray: Redesign xa_alloc API · a3e4d3f9
      Matthew Wilcox committed
      It was too easy to forget to initialise the start index.  Add an
      xa_limit data structure which can be used to pass min & max, and
      define a couple of special values for common cases.  Also add some
      more tests cribbed from the IDR test suite.  Change the return value
      from -ENOSPC to -EBUSY to match xa_insert().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • XArray: Add support for 1s-based allocation · 3ccaf57a
      Matthew Wilcox committed
      A lot of places want to allocate IDs starting at 1 instead of 0.
      While the xa_alloc() API supports this, it's not very efficient if lots
      of IDs are allocated, due to having to walk down to the bottom of the
      tree to see if ID 1 is available, then all the way over to the next
      non-allocated ID.  This method marks ID 0 as occupied, which wastes
      one slot in the XArray but keeps xa_empty() working.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • XArray: Change xa_insert to return -EBUSY · fd9dc93e
      Matthew Wilcox committed
      Userspace translates EEXIST to "File exists" which isn't a very good
      error message for the problem.  "Device or resource busy" is a better
      indication of what went wrong.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  9. 05 Feb, 2019 1 commit
  10. 07 Jan, 2019 3 commits
  11. 14 Dec, 2018 1 commit
  12. 17 Nov, 2018 1 commit
  13. 06 Nov, 2018 8 commits
  14. 21 Oct, 2018 11 commits
    • xarray: Add range store functionality · 0e9446c3
      Matthew Wilcox committed
      This version of xa_store_range() supports only load and store.  Its
      sole user needs nothing more, so there is no need to do the extra
      work to support marking and overlapping stores correctly yet.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Track free entries in an XArray · 371c752d
      Matthew Wilcox committed
      Add the optional ability to track which entries in an XArray are free
      and provide xa_alloc() to replace most of the functionality of the IDR.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Add xa_reserve and xa_release · 9f14d4f1
      Matthew Wilcox committed
      xa_reserve() reserves a slot in the XArray for users that need
      to acquire multiple locks before storing their entry in the tree and
      so cannot use a plain xa_store().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Add xas_create_range · 2264f513
      Matthew Wilcox committed
      This hopefully temporary function is useful for users who have not yet
      been converted to multi-index entries.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Add xas_for_each_conflict · 4e99d4e9
      Matthew Wilcox committed
      This iterator iterates over each entry that is stored in the index or
      indices specified by the xa_state.  This is intended for a
      conditional store of a multi-index entry, or to allow entries which are
      about to be removed from the xarray to be disposed of properly.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Step through an XArray · 64d3e9a9
      Matthew Wilcox committed
      The xas_next and xas_prev functions move the xas index by one position,
      and adjust the rest of the iterator state to match it.  This is more
      efficient than calling xas_set() as it keeps the iterator at the leaves
      of the tree instead of walking the iterator from the root each time.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Destroy an XArray · 687149fc
      Matthew Wilcox committed
      This function frees all the internal memory allocated to the xarray
      and reinitialises it to be empty.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Extract entries from an XArray · 80a0a1a9
      Matthew Wilcox committed
      The xa_extract function combines the functionality of
      radix_tree_gang_lookup() and radix_tree_gang_lookup_tagged().
      It extracts entries matching the specified filter into a normal array.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Add XArray iterators · b803b428
      Matthew Wilcox committed
      The xa_for_each iterator allows the user to efficiently walk a range
      of the array, executing the loop body once for each entry in that
      range that matches the filter.  This commit also includes xa_find()
      and xa_find_after() which are helper functions for xa_for_each() but
      may also be useful in their own right.
      
      In the xas family of functions, we have xas_for_each(), xas_find(),
      xas_next_entry(), xas_for_each_tagged(), xas_find_tagged(),
      xas_next_tagged() and xas_pause().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • xarray: Add XArray conditional store operations · 41aec91f
      Matthew Wilcox committed
      Like cmpxchg(), xa_cmpxchg will only store to the index if the current
      entry matches the old entry.  It returns the current entry, which is
      usually more useful than the errno returned by radix_tree_insert().
      For the users who really only want the errno, the xa_insert() wrapper
      provides a more convenient calling convention.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
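The relationship between a compare-and-exchange store and an insert wrapper can be sketched with a flat array standing in for the tree. This is a userspace model only, with no locking or memory allocation; `toy_array`, `toy_cmpxchg` and `toy_insert` are hypothetical names, and the `-EBUSY` return value follows the later commit fd9dc93e listed above rather than this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS 16                 /* hypothetical fixed capacity */

struct toy_array { void *slot[TOY_SLOTS]; };

/*
 * Store 'entry' at 'index' only if the slot currently holds 'old'.
 * Always returns the entry that was present before the call, so the
 * caller can tell whether the store happened by comparing it to 'old'.
 */
void *toy_cmpxchg(struct toy_array *a, size_t index, void *old, void *entry)
{
    assert(index < TOY_SLOTS);

    void *curr = a->slot[index];
    if (curr == old)
        a->slot[index] = entry;
    return curr;
}

/*
 * Convenience wrapper for callers that only want an errno: succeed if
 * the slot was empty, fail with -EBUSY if something was already there.
 */
int toy_insert(struct toy_array *a, size_t index, void *entry)
{
    return toy_cmpxchg(a, index, NULL, entry) == NULL ? 0 : -EBUSY;
}
```

Returning the current entry lets a caller that loses the race use the existing entry directly instead of doing a second lookup.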
    • xarray: Add XArray unconditional store operations · 58d6ea30
      Matthew Wilcox committed
      xa_store() differs from radix_tree_insert() in that it will overwrite an
      existing element in the array rather than returning an error.  This is
      the behaviour which most users want, and those that want more complex
      behaviour generally want to use the xas family of routines anyway.
      
      For memory allocation, xa_store() will first attempt to request memory
      from the slab allocator; if memory is not immediately available, it will
      drop the xa_lock and allocate memory, keeping a pointer in the xa_state.
      It does not use the per-CPU caches, although those will continue to exist
      until all radix tree users are converted to the xarray.
      
      This patch also includes xa_erase() and __xa_erase() for a streamlined
      way to store NULL.  Since there is no need to allocate memory in order
      to store a NULL in the XArray, we do not need to trouble the user with
      deciding what memory allocation flags to use.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
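The overwrite semantics and the "erase is a store of NULL" idea can be shown with a flat array standing in for the tree. This is a simplified userspace sketch, not kernel code: `toy_array`, `toy_store` and `toy_erase` are hypothetical names, and the locking and memory allocation described above are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 16                 /* hypothetical fixed capacity */

struct toy_array { void *slot[TOY_SLOTS]; };

/*
 * Unconditional store: overwrite whatever is at 'index' and hand the
 * previous entry back to the caller instead of failing.
 */
void *toy_store(struct toy_array *a, size_t index, void *entry)
{
    assert(index < TOY_SLOTS);

    void *old = a->slot[index];
    a->slot[index] = entry;
    return old;
}

/*
 * Erase is a store of NULL. It takes no allocation flags because
 * storing NULL never needs new memory.
 */
void *toy_erase(struct toy_array *a, size_t index)
{
    return toy_store(a, index, NULL);
}
```

In this model, a caller that wants insert-only behaviour would check that the returned previous entry is NULL.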