提交 · 3eba0c6a56c04f2b017b43641a821f1ebfb7fb4c · openeuler / Kernel

13 2月, 2015 1 次提交

mm/zpool: add name argument to create zpool · 3eba0c6a

由 Ganesh Mahendran 提交于 2月 12, 2015

Currently the underlay of zpool: zsmalloc/zbud, do not know who creates
them.  There is not a method to let zsmalloc/zbud find which caller they
belong to.

Now we want to add statistics collection in zsmalloc.  We need to name the
debugfs dir for each pool created.  The way suggested by Minchan Kim is to
use a name passed by caller(such as zram) to create the zsmalloc pool.

    /sys/kernel/debug/zsmalloc/zram0

This patch adds an argument `name' to zs_create_pool() and other related
functions.
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3eba0c6a

19 12月, 2014 1 次提交

mm/zsmalloc: adjust order of functions · 66cdef66

由 Ganesh Mahendran 提交于 12月 18, 2014

Currently functions in zsmalloc.c does not arranged in a readable and
reasonable sequence.  With the more and more functions added, we may
meet below inconvenience.  For example:

Current functions:

    void zs_init()
    {
    }

    static void get_maxobj_per_zspage()
    {
    }

Then I want to add a func_1() which is called from zs_init(), and this
new added function func_1() will used get_maxobj_per_zspage() which is
defined below zs_init().

    void func_1()
    {
        get_maxobj_per_zspage()
    }

    void zs_init()
    {
        func_1()
    }

    static void get_maxobj_per_zspage()
    {
    }

This will cause compiling issue. So we must add a declaration:

    static void get_maxobj_per_zspage();

before func_1() if we do not put get_maxobj_per_zspage() before
func_1().

In addition, puting module_[init|exit] functions at the bottom of the
file conforms to our habit.

So, this patch ajusts function sequence as:

    /* helper functions */
    ...
    obj_location_to_handle()
    ...

    /* Some exported functions */
    ...

    zs_map_object()
    zs_unmap_object()

    zs_malloc()
    zs_free()

    zs_init()
    zs_exit()
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

66cdef66

14 12月, 2014 6 次提交

mm/zsmalloc: allocate exactly size of struct zs_pool · 18136656

由 Ganesh Mahendran 提交于 12月 12, 2014

In zs_create_pool(), we allocate memory more then sizeof(struct zs_pool)
  ovhd_size = roundup(sizeof(*pool), PAGE_SIZE);

This patch allocate memory of exactly needed size.
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

18136656

mm/zsmalloc: avoid duplicate assignment of prev_class · df8b5bb9

由 Ganesh Mahendran 提交于 12月 12, 2014

In zs_create_pool(), prev_class is assigned (ZS_SIZE_CLASSES - 1) times.
And the prev_class only references to the previous size_class.  So we do
not need unnecessary assignement.

This patch assigns *prev_class* when a new size_class structure is
allocated and uses prev_class to check whether the first class has been
allocated.

[akpm@linux-foundation.org: remove now-unused ZS_SIZE_CLASSES]
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: NDan Streetman <ddstreet@ieee.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df8b5bb9

mm/zsmalloc: support allocating obj with size of ZS_MAX_ALLOC_SIZE · 40f9fb8c

由 Mahendran Ganesh 提交于 12月 12, 2014

I sent a patch [1] for unnecessary check in zsmalloc.  And Minchan Kim
found zsmalloc even does not support allocating an obj with the size of
ZS_MAX_ALLOC_SIZE in some situations.

For example:
   In system with 64KB PAGE_SIZE and 32 bit of physical addr. Then:
   ZS_MIN_ALLOC_SIZE is 32 bytes which is calculated by:
      MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
   ZS_MAX_ALLOC_SIZE is 64KB(in current code, is PAGE_SIZE)
   ZS_SIZE_CLASS_DELTA is 256 bytes
   So, ZS_SIZE_CLASSES = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) /
                          ZS_SIZE_CLASS_DELTA + 1
                       = 256

   In zs_create_pool(), the max size obj which can be allocated will be:
      ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA = 32 + 255*256 = 65312

   We can see that 65312 < 65536 (ZS_MAX_ALLOC_SIZE). So we can NOT
   allocate objs with size ZS_MAX_ALLOC_SIZE(65536) which we promise upper
   users we can do.

 [1]  http://lkml.iu.edu/hypermail/linux/kernel/1411.2/03835.html
 [2]  http://lkml.iu.edu/hypermail/linux/kernel/1411.2/04534.html

This patch fixes this issue by dynamiclly calculating zs_size_classes when
module is loaded, allocates buffer with size ZS_MAX_ALLOC_SIZE.  Then the
max obj(size is ZS_MAX_ALLOC_SIZE) can be stored in it.

[akpm@linux-foundation.org: restore ZS_SIZE_CLASSES to fix bisectability]
Signed-off-by: NMahendran Ganesh <opensource.ganesh@gmail.com>
Suggested-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

40f9fb8c

zsmalloc: correct fragile [kmap|kunmap]_atomic use · af4ee5e9

由 Minchan Kim 提交于 12月 12, 2014

The kunmap_atomic should use virtual address getting by kmap_atomic.
However, some pieces of code in zsmalloc uses modified address, not the
one got by kmap_atomic for kunmap_atomic.

It's okay for working because zsmalloc modifies the address inner
PAGE_SIZE bounday so it works with current kmap_atomic's implementation.
But it's still fragile with potential changing of kmap_atomic so let's
correct it.

I got a subtle bug when I implemented a new feature of zsmalloc
(compaction) due to a link's mishandling (the link was over page
boundary).  Although it was totally my mistake, it took a while to find
the cause because an unpredictable kmapped address was unmapped causing an
almost random crash.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af4ee5e9

zsmalloc: fix zs_init cpu notifier error handling · b1b00a5b

由 Sergey Senozhatsky 提交于 12月 12, 2014

Mahendran Ganesh reported that zpool-enabled zsmalloc should not call
zpool_unregister_driver() from zs_init() if cpu notifier registration has
failed, because error handling is performed before we register the driver
via zpool_register_driver() call.

Factor out cpu notifier registration and unregistration code and fix
zs_init() error handling.

link: http://lkml.iu.edu//hypermail/linux/kernel/1411.1/04156.html
[akpm@linux-foundation.org: squash bogus gcc warning]
[akpm@linux-foundation.org: use __init and __exit]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NMahendran Ganesh <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b1b00a5b

zsmalloc: merge size_class to reduce fragmentation · 9eec4cd5

由 Joonsoo Kim 提交于 12月 12, 2014

zsmalloc has many size_classes to reduce fragmentation and they are in 16
bytes unit, for example, 16, 32, 48, etc., if PAGE_SIZE is 4096.  And,
zsmalloc has constraint that each zspage has 4 pages at maximum.

In this situation, we can see interesting aspect.  Let's think about
size_class for 1488, 1472, ..., 1376.  To prevent external fragmentation,
they uses 4 pages per zspage and so all they can contain 11 objects at
maximum.

16384 (4096 * 4) = 1488 * 11 + remains
16384 (4096 * 4) = 1472 * 11 + remains
16384 (4096 * 4) = ...
16384 (4096 * 4) = 1376 * 11 + remains

It means that they have same characteristics and classification between
them isn't needed.  If we use one size_class for them, we can reduce
fragementation and save some memory since both the 1488 and 1472 sized
classes can only fit 11 objects into 4 pages, and an object that's 1472
bytes can fit into an object that's 1488 bytes, merging these classes to
always use objects that are 1488 bytes will reduce the total number of
size classes.  And reducing the total number of size classes reduces
overall fragmentation, because a wider range of compressed pages can fit
into a single size class, leaving less unused objects in each size class.

For this purpose, this patch implement size_class merging.  If there is
size_class that have same pages_per_zspage and same number of objects per
zspage with previous size_class, we don't create new size_class.  Instead,
we use previous, same characteristic size_class.  With this way, above
example sizes (1488, 1472, ..., 1376) use just one size_class so we can
get much more memory utilization.

Below is result of my simple test.

TEST ENV: EXT4 on zram, mount with discard option WORKLOAD: untar kernel
source code, remove directory in descending order in size.  (drivers arch
fs sound include net Documentation firmware kernel tools)

Each line represents orig_data_size, compr_data_size, mem_used_total,
fragmentation overhead (mem_used - compr_data_size) and overhead ratio
(overhead to compr_data_size), respectively, after untar and remove
operation is executed.

* untar-nomerge.out

orig_size compr_size used_size overhead overhead_ratio
525.88MB 199.16MB 210.23MB  11.08MB 5.56%
288.32MB  97.43MB 105.63MB   8.20MB 8.41%
177.32MB  61.12MB  69.40MB   8.28MB 13.55%
146.47MB  47.32MB  56.10MB   8.78MB 18.55%
124.16MB  38.85MB  48.41MB   9.55MB 24.58%
103.93MB  31.68MB  40.93MB   9.25MB 29.21%
 84.34MB  22.86MB  32.72MB   9.86MB 43.13%
 66.87MB  14.83MB  23.83MB   9.00MB 60.70%
 60.67MB  11.11MB  18.60MB   7.49MB 67.48%
 55.86MB   8.83MB  16.61MB   7.77MB 88.03%
 53.32MB   8.01MB  15.32MB   7.31MB 91.24%

* untar-merge.out

orig_size compr_size used_size overhead overhead_ratio
526.23MB 199.18MB 209.81MB  10.64MB 5.34%
288.68MB  97.45MB 104.08MB   6.63MB 6.80%
177.68MB  61.14MB  66.93MB   5.79MB 9.47%
146.83MB  47.34MB  52.79MB   5.45MB 11.51%
124.52MB  38.87MB  44.30MB   5.43MB 13.96%
104.29MB  31.70MB  36.83MB   5.13MB 16.19%
 84.70MB  22.88MB  27.92MB   5.04MB 22.04%
 67.11MB  14.83MB  19.26MB   4.43MB 29.86%
 60.82MB  11.10MB  14.90MB   3.79MB 34.17%
 55.90MB   8.82MB  12.61MB   3.79MB 42.97%
 53.32MB   8.01MB  11.73MB   3.73MB 46.53%

As you can see above result, merged one has better utilization (overhead
ratio, 5th column) and uses less memory (mem_used_total, 3rd column).
Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: NDan Streetman <ddstreet@ieee.org>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: <juno.choi@lge.com>
Cc: "seungho1.park" <seungho1.park@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9eec4cd5

10 10月, 2014 4 次提交

zsmalloc: simplify init_zspage free obj linking · 5538c562

由 Dan Streetman 提交于 10月 09, 2014

Change zsmalloc init_zspage() logic to iterate through each object on each
of its pages, checking the offset to verify the object is on the current
page before linking it into the zspage.

The current zsmalloc init_zspage free object linking code has logic that
relies on there only being one page per zspage when PAGE_SIZE is a
multiple of class->size.  It calculates the number of objects for the
current page, and iterates through all of them plus one, to account for
the assumed partial object at the end of the page.  While this currently
works, the logic can be simplified to just link the object at each
successive offset until the offset is larger than PAGE_SIZE, which does
not rely on PAGE_SIZE being a multiple of class->size.
Signed-off-by: NDan Streetman <ddstreet@ieee.org>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5538c562

mm/zsmalloc.c: correct comment for fullness group computation · 6dd9737e

由 Wang Sheng-Hui 提交于 10月 09, 2014

The letter 'f' in "n <= N/f" stands for fullness_threshold_frac, not
1/fullness_threshold_frac.
Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6dd9737e

zsmalloc: change return value unit of zs_get_total_size_bytes · 722cdc17

由 Minchan Kim 提交于 10月 09, 2014

zs_get_total_size_bytes returns a amount of memory zsmalloc consumed with
*byte unit* but zsmalloc operates *page unit* rather than byte unit so
let's change the API so benefit we could get is that reduce unnecessary
overhead (ie, change page unit with byte unit) in zsmalloc.

Since return type is pages, "zs_get_total_pages" is better than
"zs_get_total_size_bytes".
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Reviewed-by: NDan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

722cdc17

zsmalloc: move pages_allocated to zs_pool · 13de8933

由 Minchan Kim 提交于 10月 09, 2014

Currently, zram has no feature to limit memory so theoretically zram can
deplete system memory.  Users have asked for a limit several times as even
without exhaustion zram makes it hard to control memory usage of the
platform.  This patchset adds the feature.

Patch 1 makes zs_get_total_size_bytes faster because it would be used
frequently in later patches for the new feature.

Patch 2 changes zs_get_total_size_bytes's return unit from bytes to page
so that zsmalloc doesn't need unnecessary operation(ie, << PAGE_SHIFT).

Patch 3 adds new feature.  I added the feature into zram layer, not
zsmalloc because limiation is zram's requirement, not zsmalloc so any
other user using zsmalloc(ie, zpool) shouldn't affected by unnecessary
branch of zsmalloc.  In future, if every users of zsmalloc want the
feature, then, we could move the feature from client side to zsmalloc
easily but vice versa would be painful.

Patch 4 adds news facility to report maximum memory usage of zram so that
this avoids user polling frequently via /sys/block/zram0/ mem_used_total
and ensures transient max are not missed.

This patch (of 4):

pages_allocated has counted in size_class structure and when user of
zsmalloc want to see total_size_bytes, it should gather all of count from
each size_class to report the sum.

It's not bad if user don't see the value often but if user start to see
the value frequently, it would be not a good deal for performance pov.

This patch moves the count from size_class to zs_pool so it could reduce
memory footprint (from [255 * 8byte] to [sizeof(atomic_long_t)]).
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Reviewed-by: NDan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Reviewed-by: NDavid Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

13de8933

30 8月, 2014 1 次提交

mm/zpool: use prefixed module loading · 137f8cff

由 Kees Cook 提交于 8月 29, 2014

To avoid potential format string expansion via module parameters, do not
use the zpool type directly in request_module() without a format string.
Additionally, to avoid arbitrary modules being loaded via zpool API
(e.g.  via the zswap_zpool_type module parameter) add a "zpool-" prefix
to the requested module, as well as module aliases for the existing
zpool types (zbud and zsmalloc).
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: NDan Streetman <ddstreet@ieee.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

137f8cff

07 8月, 2014 3 次提交

mm/zpool: zbud/zsmalloc implement zpool · c795779d

由 Dan Streetman 提交于 8月 06, 2014

Update zbud and zsmalloc to implement the zpool api.

[fengguang.wu@intel.com: make functions static]
Signed-off-by: NDan Streetman <ddstreet@ieee.org>
Tested-by: NSeth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c795779d

mm/zpool: implement common zpool api to zbud/zsmalloc · af8d417a

由 Dan Streetman 提交于 8月 06, 2014

Add zpool api.

zpool provides an interface for memory storage, typically of compressed
memory.  Users can select what backend to use; currently the only
implementations are zbud, a low density implementation with up to two
compressed pages per storage page, and zsmalloc, a higher density
implementation with multiple compressed pages per storage page.
Signed-off-by: NDan Streetman <ddstreet@ieee.org>
Tested-by: NSeth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af8d417a

mm/vmalloc.c: clean up map_vm_area third argument · f6f8ed47

由 WANG Chao 提交于 8月 06, 2014

Currently map_vm_area() takes (struct page *** pages) as third argument,
and after mapping, it moves (*pages) to point to (*pages +
nr_mappped_pages).

It looks like this kind of increment is useless to its caller these
days.  The callers don't care about the increments and actually they're
trying to avoid this by passing another copy to map_vm_area().

The caller can always guarantee all the pages can be mapped into vm_area
as specified in first argument and the caller only cares about whether
map_vm_area() fails or not.

This patch cleans up the pointer movement in map_vm_area() and updates
its callers accordingly.
Signed-off-by: NWANG Chao <chaowang@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6f8ed47

05 6月, 2014 2 次提交

zsmalloc: fixup trivial zs size classes value in comments · 7eb52512

由 Weijie Yang 提交于 6月 04, 2014

According to calculation, ZS_SIZE_CLASSES value is 255 on systems with 4K
page size, not 254.  The old value may forget count the ZS_MIN_ALLOC_SIZE
in.

This patch fixes this trivial issue in the comments.
Signed-off-by: NWeijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7eb52512

mm: replace __get_cpu_var uses with this_cpu_ptr · 7c8e0181

由 Christoph Lameter 提交于 6月 04, 2014

Replace places where __get_cpu_var() is used for an address calculation
with this_cpu_ptr().
Signed-off-by: NChristoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c8e0181

20 3月, 2014 1 次提交

zsmalloc: Fix CPU hotplug callback registration · f0e71fcd

由 Srivatsa S. Bhat 提交于 3月 11, 2014

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the zsmalloc code by using this latter form of callback registration.

Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f0e71fcd

31 1月, 2014 2 次提交

zsmalloc: add copyright · 31fc00bb

由 Minchan Kim 提交于 1月 30, 2014

Add my copyright to the zsmalloc source code which I maintain.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

31fc00bb

zsmalloc: move it under mm · bcf1647d

由 Minchan Kim 提交于 1月 30, 2014

This patch moves zsmalloc under mm directory.

Before that, description will explain why we have needed custom
allocator.

Zsmalloc is a new slab-based memory allocator for storing compressed
pages.  It is designed for low fragmentation and high allocation success
rate on large object, but <= PAGE_SIZE allocations.

zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.

zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms.  Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab.  This allows for higher allocation success rate under memory
pressure.

Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
kernel slab allocator, if a page compresses to 60% of it original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.

This ability to span pages results in zsmalloc allocations not being
directly addressable by the user.  The user is given an
non-dereferencable handle in response to an allocation request.  That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used.  The mapping is necessary since the
object data may reside in two different noncontigious pages.

The zsmalloc fulfills the allocation needs for zram perfectly

[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bcf1647d

18 12月, 2013 1 次提交

staging: delete non-required instances of include <linux/init.h> · 885a947e

由 Paul Gortmaker 提交于 12月 10, 2013

None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

885a947e

11 12月, 2013 2 次提交

zsmalloc: add more comment · c3e3e88a

由 Nitin Cupta 提交于 12月 11, 2013

This patch adds lots of comments and it will help others
to review and enhance.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c3e3e88a

zsmalloc: add Kconfig for enabling page table method · 1b945aee

由 Minchan Kim 提交于 12月 11, 2013

Zsmalloc has two methods 1) copy-based and 2) pte based to
access objects that span two pages.
You can see history why we supported two approach from [1].

But it was bad choice that adding hard coding to select arch
which want to use pte based method because there are lots of
SoC in an architecure and they can have different cache size,
CPU speed and so on so it would be better to expose it to user
as selectable Kconfig option like Andrew Morton suggested.

[1] https://lkml.org/lkml/2012/7/11/58Acked-by: NNitin Gupta <ngupta@vflare.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1b945aee

26 11月, 2013 1 次提交

staging: zsmalloc: Ensure handle is never 0 on success · 67296874

由 Olav Haugan 提交于 11月 22, 2013

zsmalloc encodes a handle using the pfn and an object
index. On hardware platforms with physical memory starting
at 0x0 the pfn can be 0. This causes the encoded handle to be
0 and is incorrectly interpreted as an allocation failure.

This issue affects all current and future SoCs with physical
memory starting at 0x0. All MSM8974 SoCs which includes
Google Nexus 5 devices are affected.

To prevent this false error we ensure that the encoded handle
will not be 0 when allocation succeeds.
Signed-off-by: NOlav Haugan <ohaugan@codeaurora.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

67296874

24 7月, 2013 1 次提交
- S
  staging: zsmalloc: access page->private by using page_private macro · e842b976
  由 Sunghan Suh 提交于 7月 12, 2013
```
Signed-off-by: NSunghan Suh <sunghan.suh@samsung.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
```
  e842b976
22 5月, 2013 1 次提交

staging/zsmalloc: Fixed up incorrect formatted comments · 396b7fd6

由 Sara Bird 提交于 5月 20, 2013

The existing comments are using an odd style. Fixed them up to adhere
to the StyleGuide. No code changes.
Signed-off-by: NSara Bird <sara.bird.iar@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

396b7fd6

21 5月, 2013 1 次提交

Staging: Fixes string split across lines in zsmalloc zsmalloc-main · 93ad5ab5

由 Marlies Ruck 提交于 5月 15, 2013

Fixes the following checkpatch warning:
WARNING: quoted string split across lines
Signed-off-by: NMarlies Ruck <marlies.ruck@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

93ad5ab5

24 4月, 2013 1 次提交

staging/zsmalloc: don't use pgtable-mapping from modules · 796ce5a7

由 Arnd Bergmann 提交于 4月 23, 2013

Building zsmalloc as a module does not work on ARM because it uses
an interface that is not exported:

ERROR: "flush_tlb_kernel_range" [drivers/staging/zsmalloc/zsmalloc.ko] undefined!

Since this is only used as a performance optimization and only on ARM,
we can avoid the problem simply by not using that optimization when
building zsmalloc it is a loadable module.

flush_tlb_kernel_range is often an inline function, but out of the
architectures that use an extern function, only powerpc exports
it.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

796ce5a7

29 3月, 2013 1 次提交

staging: zsmalloc: Fix link error on ARM · d95abbbb

由 Joerg Roedel 提交于 3月 27, 2013

Testing the arm chromebook config against the upstream
kernel produces a linker error for the zsmalloc module from
staging. The symbol flush_tlb_kernel_range is not available
there. Fix this by removing the reimplementation of
unmap_kernel_range in the zsmalloc module and using the
function directly. The unmap_kernel_range function is not
usable by modules, so also disallow building the driver as a
module for now.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: NJoerg Roedel <joro@8bytes.org>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d95abbbb

24 2月, 2013 1 次提交

mm: rename page struct field helpers · 22b751c3

由 Mel Gorman 提交于 2月 22, 2013

The function names page_xchg_last_nid(), page_last_nid() and
reset_page_last_nid() were judged to be inconsistent so rename them to a
struct_field_op style pattern.  As it looked jarring to have
reset_page_mapcount() and page_nid_reset_last() beside each other in
memmap_init_zone(), this patch also renames reset_page_mapcount() to
page_mapcount_reset().  There are others like init_page_count() but as
it is used throughout the arch code a rename would likely cause more
conflicts than it is worth.

[akpm@linux-foundation.org: fix zcache]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

22b751c3

31 1月, 2013 1 次提交

staging: zsmalloc: remove unused pool name · 0d145a50

由 Seth Jennings 提交于 1月 30, 2013

zs_create_pool() currently takes a name argument which is
never used in any useful way.

This patch removes it.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0d145a50

30 1月, 2013 2 次提交

staging: zsmalloc: Fix TLB coherency and build problem · 99155188

由 Minchan Kim 提交于 1月 28, 2013

Recently, Matt Sealey reported he fail to build zsmalloc caused by
using of local_flush_tlb_kernel_range which are architecture dependent
function so !CONFIG_SMP in ARM couldn't implement it so it ends up
build error following as.

  MODPOST 216 modules
  LZMA    arch/arm/boot/compressed/piggy.lzma
  AS      arch/arm/boot/compressed/lib1funcs.o
ERROR: "v7wbi_flush_kern_tlb_range"
[drivers/staging/zsmalloc/zsmalloc.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

The reason we used that function is copy method by [1]
was really slow in ARM but at that time.

More severe problem is ARM can prefetch speculatively on other CPUs
so under us, other TLBs can have an entry only if we do flush local
CPU. Russell King pointed that. Thanks!
We don't have many choices except using flush_tlb_kernel_range.

My experiment in ARMv7 processor 4 core didn't make any difference with
zsmapbench[2] between local_flush_tlb_kernel_range and flush_tlb_kernel_range
but still page-table based is much better than copy-based.

* bigger is better.

1. local_flush_tlb_kernel_range: 3918795 mappings
2. flush_tlb_kernel_range : 3989538 mappings
3. copy-based: 635158 mappings

This patch replace local_flush_tlb_kernel_range with
flush_tlb_kernel_range which are avaialbe in all architectures
because we already have used it in vmalloc allocator which are
generic one so build problem should go away and performane loss
shoud be void.

[1] f553646a, zsmalloc: add page table mapping method
[2] https://github.com/spartacus06/zsmapbench

Cc: stable@vger.kernel.org
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Reported-by: NMatt Sealey <matt@genesi-usa.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

99155188

staging: zsmalloc: make CLASS_DELTA relative to PAGE_SIZE · d662b8eb

由 Seth Jennings 提交于 1月 25, 2013

Right now ZS_SIZE_CLASS_DELTA is hardcoded to be 16.  This
creates 254 classes for systems with 4k pages. However, on
PPC64 with 64k pages, it creates 4095 classes which is far
too many.

This patch makes ZS_SIZE_CLASS_DELTA relative to PAGE_SIZE
so that regardless of the page size, there will be the same
number of classes.
Acked-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NDan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d662b8eb

16 1月, 2013 1 次提交

staging: zsmalloc: comment zs_create_pool function · 4bbc0bc0

由 Davidlohr Bueso 提交于 1月 04, 2013

Just as with zs_malloc() and zs_map_object(), it is worth
formally commenting the zs_create_pool() function.
Signed-off-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4bbc0bc0

14 8月, 2012 4 次提交

zsmalloc: collapse internal .h into .c · 0959c63f

由 Seth Jennings 提交于 8月 08, 2012

The patch collapses in the internal zsmalloc_int.h into
the zsmalloc-main.c file.

This is done in preparation for the promotion to mm/ where
separate internal headers are discouraged.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0959c63f

staging: zsmalloc: add page table mapping method · f553646a

由 Seth Jennings 提交于 7月 18, 2012

This patchset provides page mapping via the page table.
On some archs, most notably ARM, this method has been
demonstrated to be faster than copying.

The logic controlling the method selection (copy vs page table)
is controlled by the definition of USE_PGTABLE_MAPPING which
is/can be defined for any arch that performs better with page
table mapping.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f553646a

staging: zsmalloc: prevent mappping in interrupt context · c60369f0

由 Seth Jennings 提交于 7月 18, 2012

Because we use per-cpu mapping areas shared among the
pools/users, we can't allow mapping in interrupt context
because it can corrupt another users mappings.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c60369f0

staging: zsmalloc: s/firstpage/page in new copy map funcs · 6539a36c

由 Seth Jennings 提交于 7月 18, 2012

firstpage already has precedent and meaning the first page
of a zspage.  In the case of the copy mapping functions,
it is the first of a pair of pages needing to be mapped.

This patch just renames the firstpage argument to "page" to
avoid confusion.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6539a36c

10 7月, 2012 1 次提交

staging: zsmalloc: add mapping modes · b7418510

由 Seth Jennings 提交于 7月 02, 2012

This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b7418510

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功