1. 12 Apr, 2018 1 commit
  2. 06 Apr, 2018 1 commit
  3. 07 Feb, 2018 1 commit
  4. 18 Nov, 2017 1 commit
  5. 04 Oct, 2017 2 commits
  6. 07 Sep, 2017 1 commit
    • z3fold: use per-cpu unbuddied lists · d30561c5
      Authored by Vitaly Wool
      It's been noted that z3fold doesn't scale well when it's run in a large
      number of threads on many cores, which can be easily reproduced with fio
      'randrw' test with --numjobs=32.  E.g.  the result for 1 cluster (4 cores)
      is:
      
      Run status group 0 (all jobs):
         READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ...
        WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ...
      
      While for 8 cores (2 clusters) the result is:
      
      Run status group 0 (all jobs):
         READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ...
        WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ...
      
       The bottleneck here is the pool lock, which many threads end up
       waiting on.  To reduce that spin lock contention, z3fold can operate
       only on the lists local to the current CPU whenever possible.  Due to
       the nature of z3fold unbuddied list handling (it only takes the first
       entry off the list on a hot path), if the z3fold pool is big enough
       and balanced well enough, limiting the search to the local unbuddied
       list does not lead to a significant degradation of the compression
       ratio (2.57x vs 2.65x in our measurements).
      
      This patch also introduces two worker threads: one for async in-page
      object layout optimization and one for releasing freed pages.  This is
      done to speed up z3fold_free() which is often on a hot path.
      
      The fio results for 8-core case are now the following:
      
      Run status group 0 (all jobs):
         READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ...
        WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ...
      
       So we're in for an almost 6x performance increase.
      
       Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com
       Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d30561c5
  7. 14 Apr, 2017 1 commit
    • z3fold: fix page locking in z3fold_alloc() · 76e32a2a
      Authored by Vitaly Wool
       Stress testing of the current z3fold implementation on an 8-core
       system revealed that a z3fold page deleted from its unbuddied list in
       z3fold_alloc() could be put on another unbuddied list by
       z3fold_free() while z3fold_alloc() was still processing it.  This was
       introduced by commit 5a27aa82 ("z3fold: add kref refcounting"), which
       removed the special handling in z3fold_free() of a z3fold page not on
       any list.
      
      To fix this, the z3fold page lock should be taken in z3fold_alloc()
      before the pool lock is released.  To avoid deadlocking, we just try to
      lock the page as soon as we get a hold of it, and if trylock fails, we
      drop this page and take the next one.
       Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: <Oleksiy.Avramchenko@sony.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76e32a2a
  8. 17 Mar, 2017 1 commit
  9. 25 Feb, 2017 5 commits
  10. 23 Feb, 2017 1 commit
  11. 04 Jun, 2016 1 commit
  12. 21 May, 2016 1 commit