提交 · cb8f2eec3c5c87e31219c5e58625b8e890004e48 · OpenHarmony / kernel_linux

07 8月, 2014 1 次提交

zram: rename struct `table' to `zram_table_entry' · cb8f2eec

由 Sergey Senozhatsky 提交于 8月 06, 2014

Andrew Morton has recently noted that `struct table' actually represents
table entry and, thus, should be renamed.  Rename to `zram_table_entry'.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb8f2eec

08 4月, 2014 7 次提交

zram: make compression algorithm selection possible · e46b8a03

由 Sergey Senozhatsky 提交于 4月 07, 2014

Add and document `comp_algorithm' device attribute.  This attribute allows
to show supported compression and currently selected compression
algorithms:

	cat /sys/block/zram0/comp_algorithm
	[lzo] lz4

and change selected compression algorithm:
	echo lzo > /sys/block/zram0/comp_algorithm
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e46b8a03

zram: add multi stream functionality · beca3ec7

由 Sergey Senozhatsky 提交于 4月 07, 2014

Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released.  This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr).  Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel.  See TEST section later in commit message for
performance data.

Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access.  zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.

The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
  find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
  create and destroy zcomp_strm_multi

zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.

Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations performed:

- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
  unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to wait queue. it will be
  awaken by zcomp_strm_multi_release() caller.

zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper

Minchan Kim reported that spinlock-based locking scheme has demonstrated
a severe perfomance regression for single compression stream case,
comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)

base                      spinlock                    mutex

==Initial write           ==Initial write             ==Initial  write
records:  5               records:  5                 records:   5
avg:      1642424.35      avg:      699610.40         avg:       1655583.71
std:      39890.95(2.43%) std:      232014.19(33.16%) std:       52293.96
max:      1690170.94      max:      1163473.45        max:       1697164.75
min:      1568669.52      min:      573429.88         min:       1553410.23
==Rewrite                 ==Rewrite                   ==Rewrite
records:  5               records:  5                 records:   5
avg:      1611775.39      avg:      501406.64         avg:       1684419.11
std:      17144.58(1.06%) std:      15354.41(3.06%)   std:       18367.42
max:      1641800.95      max:      531356.78         max:       1706445.84
min:      1593515.27      min:      488817.78         min:       1655335.73

When only one compression stream available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up().  This
is why single stream implemented as a special case with mutex locking.

Introduce and document zram device attribute max_comp_streams.  This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm).  Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).

max_comp_streams used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).

default max_comp_streams value is 1, meaning that zram with single stream
will be initialised.

Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.

TEST
iozone -t 3 -R -r 16K -s 60M -I +Z

       test           base       1 strm (mutex)     3 strm (spinlock)
-----------------------------------------------------------------------
 Initial write      589286.78       583518.39          718011.05
       Rewrite      604837.97       596776.38         1515125.72
  Random write      584120.11       595714.58         1388850.25
        Pwrite      535731.17       541117.38          739295.27
        Fwrite     1418083.88      1478612.72         1484927.06

Usage example:
set max_comp_streams to 4
        echo 4 > /sys/block/zram0/max_comp_streams

show current max_comp_streams (default value is 1).
        cat /sys/block/zram0/max_comp_streams
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

beca3ec7

zram: use zcomp compressing backends · b7ca232e

由 Sergey Senozhatsky 提交于 4月 07, 2014

Do not perform direct LZO compress/decompress calls, initialise
and use zcomp LZO backend (single compression stream) instead.

[akpm@linux-foundation.org: resolve conflicts with zram-delete-zram_init_device-fix.patch]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7ca232e

zram: drop not used table `count' member · 59fc86a4

由 Sergey Senozhatsky 提交于 4月 07, 2014

struct table `count' member is not used.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

59fc86a4

zram: use atomic64_t for all zram stats · 90a7806e

由 Sergey Senozhatsky 提交于 4月 07, 2014

This is a preparation patch for stats code duplication removal.

1) use atomic64_t for `pages_zero' and `pages_stored' zram stats.

2) `compr_size' and `pages_zero' struct zram_stats members did not
   follow the existing device attr naming scheme: zram_stats.ATTR has
   ATTR_show() function.  rename them:

   -- compr_size -> compr_data_size
   -- pages_zero -> zero_pages

Minchan Kim's note:
 If we really have trouble with atomic stat operation, we could
 change it with percpu_counter so that it could solve atomic overhead and
 unnecessary memory space by introducing unsigned long instead of 64bit
 atomic_t.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90a7806e

zram: remove good and bad compress stats · b7cccf8b

由 Sergey Senozhatsky 提交于 4月 07, 2014

Remove `good' and `bad' compressed sub-requests stats.  RW request may
cause a number of RW sub-requests.  zram used to account `good' compressed
sub-queries (with compressed size less than 50% of original size), `bad'
compressed sub-queries (with compressed size greater that 75% of original
size), leaving sub-requests with compression size between 50% and 75% of
original size not accounted and not reported.  zram already accounts each
sub-request's compression size so we can calculate real device compression
ratio.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7cccf8b

zram: drop `init_done' struct zram member · be2d1d56

由 Sergey Senozhatsky 提交于 4月 07, 2014

Introduce init_done() helper function which allows us to drop `init_done'
struct zram member.  init_done() uses the fact that ->init_done == 1
equals to ->meta != NULL.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be2d1d56

31 1月, 2014 8 次提交

zram: remove zram->lock in read path and change it with mutex · e46e3315

由 Minchan Kim 提交于 1月 30, 2014

Finally, we separated zram->lock dependency from 32bit stat/ table
handling so there is no reason to use rw_semaphore between read and
write path so this patch removes the lock from read path totally and
changes rw_semaphore with mutex.  So, we could do

old:

  read-read: OK
  read-write: NO
  write-write: NO

Now:

  read-read: OK
  read-write: OK
  write-write: NO

The below data proves mixed workload performs well 11 times and there is
also enhance on write-write path because current rw-semaphore doesn't
support SPIN_ON_OWNER.  It's side effect but anyway good thing for us.

Write-related tests perform better (from 61% to 1058%) but read path has
good/bad(from -2.22% to 1.45%) but they are all marginal within stddev.

  CPU 12
  iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

  ==Initial write                ==Initial write
  records: 10                    records: 10
  avg:  516189.16                avg:  839907.96
  std:   22486.53 (4.36%)        std:   47902.17 (5.70%)
  max:  546970.60                max:  909910.35
  min:  481131.54                min:  751148.38
  ==Rewrite                      ==Rewrite
  records: 10                    records: 10
  avg:  509527.98                avg: 1050156.37
  std:   45799.94 (8.99%)        std:   40695.44 (3.88%)
  max:  611574.27                max: 1111929.26
  min:  443679.95                min:  980409.62
  ==Read                         ==Read
  records: 10                    records: 10
  avg: 4408624.17                avg: 4472546.76
  std:  281152.61 (6.38%)        std:  163662.78 (3.66%)
  max: 4867888.66                max: 4727351.03
  min: 4058347.69                min: 4126520.88
  ==Re-read                      ==Re-read
  records: 10                    records: 10
  avg: 4462147.53                avg: 4363257.75
  std:  283546.11 (6.35%)        std:  247292.63 (5.67%)
  max: 4912894.44                max: 4677241.75
  min: 4131386.50                min: 4035235.84
  ==Reverse Read                 ==Reverse Read
  records: 10                    records: 10
  avg: 4565865.97                avg: 4485818.08
  std:  313395.63 (6.86%)        std:  248470.10 (5.54%)
  max: 5232749.16                max: 4789749.94
  min: 4185809.62                min: 3963081.34
  ==Stride read                  ==Stride read
  records: 10                    records: 10
  avg: 4515981.80                avg: 4418806.01
  std:  211192.32 (4.68%)        std:  212837.97 (4.82%)
  max: 4889287.28                max: 4686967.22
  min: 4210362.00                min: 4083041.84
  ==Random read                  ==Random read
  records: 10                    records: 10
  avg: 4410525.23                avg: 4387093.18
  std:  236693.22 (5.37%)        std:  235285.23 (5.36%)
  max: 4713698.47                max: 4669760.62
  min: 4057163.62                min: 3952002.16
  ==Mixed workload               ==Mixed workload
  records: 10                    records: 10
  avg:  243234.25                avg: 2818677.27
  std:   28505.07 (11.72%)       std:  195569.70 (6.94%)
  max:  288905.23                max: 3126478.11
  min:  212473.16                min: 2484150.69
  ==Random write                 ==Random write
  records: 10                    records: 10
  avg:  555887.07                avg: 1053057.79
  std:   70841.98 (12.74%)       std:   35195.36 (3.34%)
  max:  683188.28                max: 1096125.73
  min:  437299.57                min:  992481.93
  ==Pwrite                       ==Pwrite
  records: 10                    records: 10
  avg:  501745.93                avg:  810363.09
  std:   16373.54 (3.26%)        std:   19245.01 (2.37%)
  max:  518724.52                max:  833359.70
  min:  464208.73                min:  765501.87
  ==Pread                        ==Pread
  records: 10                    records: 10
  avg: 4539894.60                avg: 4457680.58
  std:  197094.66 (4.34%)        std:  188965.60 (4.24%)
  max: 4877170.38                max: 4689905.53
  min: 4226326.03                min: 4095739.72
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e46e3315

zram: remove workqueue for freeing removed pending slot · f614a9f4

由 Minchan Kim 提交于 1月 30, 2014

Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
introduced free request pending code to avoid scheduling by mutex under
spinlock and it was a mess which made code lenghty and increased
overhead.

Now, we don't need zram->lock any more to free slot so this patch
reverts it and then, tb_lock should protect it.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f614a9f4

zram: introduce zram->tb_lock · 92967471

由 Minchan Kim 提交于 1月 30, 2014

Currently, the zram table is protected by zram->lock but it's rather
coarse-grained lock and it makes hard for scalibility.

Let's use own rwlock instead of depending on zram->lock.  This patch
adds new locking so obviously, it would make slow but this patch is just
prepartion for removing coarse-grained rw_semaphore(ie, zram->lock)
which is hurdle about zram scalability.

Final patch in this patchset series will remove the lock from read-path
and change rw_semaphore with mutex in write path.  With bonus, we could
drop pending slot free mess in next patch.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92967471

zram: use atomic operation for stat · deb0bdeb

由 Minchan Kim 提交于 1月 30, 2014

Some of fields in zram->stats are protected by zram->lock which is
rather coarse-grained so let's use atomic operation without explict
locking.

This patch is ready for removing dependency of zram->lock in read path
which is very coarse-grained rw_semaphore.  Of course, this patch adds
new atomic operation so it might make slow but my 12CPU test couldn't
spot any regression.  All gain/lose is marginal within stddev.

  iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

  ==Initial write                ==Initial write
  records: 50                    records: 50
  avg:  412875.17                avg:  415638.23
  std:   38543.12 (9.34%)        std:   36601.11 (8.81%)
  max:  521262.03                max:  502976.72
  min:  343263.13                min:  351389.12
  ==Rewrite                      ==Rewrite
  records: 50                    records: 50
  avg:  416640.34                avg:  397914.33
  std:   60798.92 (14.59%)       std:   46150.42 (11.60%)
  max:  543057.07                max:  522669.17
  min:  304071.67                min:  316588.77
  ==Read                         ==Read
  records: 50                    records: 50
  avg: 4147338.63                avg: 4070736.51
  std:  179333.25 (4.32%)        std:  223499.89 (5.49%)
  max: 4459295.28                max: 4539514.44
  min: 3753057.53                min: 3444686.31
  ==Re-read                      ==Re-read
  records: 50                    records: 50
  avg: 4096706.71                avg: 4117218.57
  std:  229735.04 (5.61%)        std:  171676.25 (4.17%)
  max: 4430012.09                max: 4459263.94
  min: 2987217.80                min: 3666904.28
  ==Reverse Read                 ==Reverse Read
  records: 50                    records: 50
  avg: 4062763.83                avg: 4078508.32
  std:  186208.46 (4.58%)        std:  172684.34 (4.23%)
  max: 4401358.78                max: 4424757.22
  min: 3381625.00                min: 3679359.94
  ==Stride read                  ==Stride read
  records: 50                    records: 50
  avg: 4094933.49                avg: 4082170.22
  std:  185710.52 (4.54%)        std:  196346.68 (4.81%)
  max: 4478241.25                max: 4460060.97
  min: 3732593.23                min: 3584125.78
  ==Random read                  ==Random read
  records: 50                    records: 50
  avg: 4031070.04                avg: 4074847.49
  std:  192065.51 (4.76%)        std:  206911.33 (5.08%)
  max: 4356931.16                max: 4399442.56
  min: 3481619.62                min: 3548372.44
  ==Mixed workload               ==Mixed workload
  records: 50                    records: 50
  avg:  149925.73                avg:  149675.54
  std:    7701.26 (5.14%)        std:    6902.09 (4.61%)
  max:  191301.56                max:  175162.05
  min:  133566.28                min:  137762.87
  ==Random write                 ==Random write
  records: 50                    records: 50
  avg:  404050.11                avg:  393021.47
  std:   58887.57 (14.57%)       std:   42813.70 (10.89%)
  max:  601798.09                max:  524533.43
  min:  325176.99                min:  313255.34
  ==Pwrite                       ==Pwrite
  records: 50                    records: 50
  avg:  411217.70                avg:  411237.96
  std:   43114.99 (10.48%)       std:   33136.29 (8.06%)
  max:  530766.79                max:  471899.76
  min:  320786.84                min:  317906.94
  ==Pread                        ==Pread
  records: 50                    records: 50
  avg: 4154908.65                avg: 4087121.92
  std:  151272.08 (3.64%)        std:  219505.04 (5.37%)
  max: 4459478.12                max: 4435857.38
  min: 3730512.41                min: 3101101.67
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

deb0bdeb

zram: add copyright · 7bfb3de8

由 Minchan Kim 提交于 1月 30, 2014

Add my copyright to the zram source code which I maintain.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7bfb3de8

zram: remove old private project comment · 49061236

由 Minchan Kim 提交于 1月 30, 2014

Remove the old private compcache project address so upcoming patches
should be sent to LKML because we Linux kernel community will take care.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49061236

zram: promote zram from staging · cd67e10a

由 Minchan Kim 提交于 1月 30, 2014

Zram has lived in staging for a LONG LONG time and have been
fixed/improved by many contributors so code is clean and stable now.  Of
course, there are lots of product using zram in real practice.

The major TV companys have used zram as swap since two years ago and
recently our production team released android smart phone with zram
which is used as swap, too and recently Android Kitkat start to use zram
for small memory smart phone.  And there was a report Google released
their ChromeOS with zram, too and cyanogenmod have been used zram long
time ago.  And I heard some disto have used zram block device for tmpfs.
In addition, I saw many report from many other peoples.  For example,
Lubuntu start to use it.

The benefit of zram is very clear.  With my experience, one of the
benefit was to remove jitter of video application with backgroud memory
pressure.  It would be effect of efficient memory usage by compression
but more issue is whether swap is there or not in the system.  Recent
mobile platforms have used JAVA so there are many anonymous pages.  But
embedded system normally are reluctant to use eMMC or SDCard as swap
because there is wear-leveling and latency issues so if we do not use
swap, it means we can't reclaim anoymous pages and at last, we could
encounter OOM kill.  :(

Although we have real storage as swap, it was a problem, too.  Because
it sometime ends up making system very unresponsible caused by slow swap
storage performance.

Quote from Luigi on Google
 "Since Chrome OS was mentioned: the main reason why we don't use swap
  to a disk (rotating or SSD) is because it doesn't degrade gracefully
  and leads to a bad interactive experience.  Generally we prefer to
  manage RAM at a higher level, by transparently killing and restarting
  processes.  But we noticed that zram is fast enough to be competitive
  with the latter, and it lets us make more efficient use of the
  available RAM.  " and he announced.
http://www.spinics.net/lists/linux-mm/msg57717.html

Other uses case is to use zram for block device.  Zram is block device
so anyone can format the block device and mount on it so some guys on
the internet start zram as /var/tmp.
http://forums.gentoo.org/viewtopic-t-838198-start-0.html

Let's promote zram and enhance/maintain it instead of removing.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NPekka Enberg <penberg@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cd67e10a

zsmalloc: move it under mm · bcf1647d

由 Minchan Kim 提交于 1月 30, 2014

This patch moves zsmalloc under mm directory.

Before that, description will explain why we have needed custom
allocator.

Zsmalloc is a new slab-based memory allocator for storing compressed
pages.  It is designed for low fragmentation and high allocation success
rate on large object, but <= PAGE_SIZE allocations.

zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.

zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms.  Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab.  This allows for higher allocation success rate under memory
pressure.

Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
kernel slab allocator, if a page compresses to 60% of it original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.

This ability to span pages results in zsmalloc allocations not being
directly addressable by the user.  The user is given an
non-dereferencable handle in response to an allocation request.  That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used.  The mapping is necessary since the
object data may reside in two different noncontigious pages.

The zsmalloc fulfills the allocation needs for zram perfectly

[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bcf1647d

13 8月, 2013 1 次提交

zram: don't grab mutex in zram_slot_free_noity · a0c516cb

由 Minchan Kim 提交于 8月 12, 2013

[1] introduced down_write in zram_slot_free_notify to prevent race
between zram_slot_free_notify and zram_bvec_[read|write]. The race
could happen if somebody who has right permission to open swap device
is reading swap device while it is used by swap in parallel.

However, zram_slot_free_notify is called with holding spin_lock of
swap layer so we shouldn't avoid holing mutex. Otherwise, lockdep
warns it.

This patch adds new list to handle free slot and workqueue
so zram_slot_free_notify just registers slot index to be freed and
registers the request to workqueue. If workqueue is expired,
it holds mutex_lock so there is no problem any more.

If any I/O is issued, zram handles pending slot-free request
caused by zram_slot_free_notify right before handling issued
request because workqueue wouldn't be expired yet so zram I/O
request handling function can miss it.

Lastly, when zram is reset, flush_work could handle all of pending
free request so we shouldn't have memory leak.

NOTE: If zram_slot_free_notify's kmalloc with GFP_ATOMIC would be
failed, the slot will be freed when next write I/O write the slot.

[1] [57ab0485, zram: use zram->lock to protect zram_free_page()
    in swap free notify path]

* from v2
  * refactoring

* from v1
  * totally redesign

Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

a0c516cb

25 6月, 2013 1 次提交

zram: remove zram_sysfs file (v2) · 9b3bb7ab

由 Sergey Senozhatsky 提交于 6月 22, 2013

Move zram sysfs code to zram drv and remove zram_sysfs.c
file. This gives ability to make static a number of previously
exported zram functions, used from zram sysfs, e.g. internal zram
zram_meta_alloc/free(). We also can drop zram_drv wrapper
functions, used from zram sysfs:
e.g. zram_reset_device()/__zram_reset_device() pair.

v2: as suggested by Greg K-H, move MODULE description to the
bottom of the file.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9b3bb7ab

07 6月, 2013 3 次提交

zram: use atomic64_xxx() to replace zram_stat64_xxx() · da5cc7d3

由 Jiang Liu 提交于 6月 07, 2013

Use atomic64_xxx() to replace open-coded zram_stat64_xxx().
Some architectures have native support of atomic64 operations,
so we can get rid of the spin_lock() in zram_stat64_xxx().
On the other hand, for platforms use generic version of atomic64
implement, it may cause an extra save/restore of the interrupt
flag.  So it's a tradeoff.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

da5cc7d3

zram: kill unused zram_get_num_devices() · 0f0e3ba3

由 Jiang Liu 提交于 6月 07, 2013

Now there's no caller of zram_get_num_devices(), so kill it.
And change zram_devices to static because it's only used in zram_drv.c.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0f0e3ba3

zram: use zram->lock to protect zram_free_page() in swap free notify path · 57ab0485

由 Jiang Liu 提交于 6月 07, 2013

zram_slot_free_notify() is free-running without any protection from
concurrent operations. So there are race conditions between
zram_bvec_read()/zram_bvec_write() and zram_slot_free_notify(),
and possible consequences include:
1) Trigger BUG_ON(!handle) on zram_bvec_write() side.
2) Access to freed pages on zram_bvec_read() side.
3) Break some fields (bad_compress, good_compress, pages_stored)
   in zram->stats if the swap layer makes concurrently call to
   zram_slot_free_notify().

So enhance zram_slot_free_notify() to acquire writer lock on zram->lock
before calling zram_free_page().
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

57ab0485

06 2月, 2013 1 次提交

zram: get rid of lockdep warning · 8b3cc3ed

由 Minchan Kim 提交于 2月 06, 2013

Lockdep complains about recursive deadlock of zram->init_lock.
[1] made it false positive because we can't request IO to zram
before setting disksize. Anyway, we should shut lockdep up to
avoid many reporting from user.

[1] : zram: force disksize setting before using zram
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8b3cc3ed

04 2月, 2013 1 次提交

zram: force disksize setting before using zram · 0231c403

由 Minchan Kim 提交于 1月 30, 2013

Now zram document syas "set disksize is optional"
but partly it's wrong. When you try to use zram firstly after
booting, you must set disksize, otherwise zram can't work because
zram gendisk's size is 0. But once you do it, you can use zram freely
after reset because reset doesn't reset to zero paradoxically.
So in this time, disksize setting is optional.:(
It's inconsitent for user behavior and not straightforward.

This patch forces always setting disksize firstly before using zram.
Yes. It changes current behavior so someone could complain when
he upgrades zram. Apparently it could be a problem if zram is mainline
but it still lives in staging so behavior could be changed for right
way to go. Let them excuse.
Acked-by: NJerome Marchand <jmarchand@redhat.com>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NDan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0231c403

23 10月, 2012 1 次提交

staging: zram: correct obsolete comment on max_zpage_size · 55dcbbb1

由 Minchan Kim 提交于 10月 10, 2012

Zram doesn't use xv_malloc any more so it doesn't have
limitation about zobj_header.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

55dcbbb1

12 6月, 2012 2 次提交

staging: zram: conventions, __aligned() attribute · 80677c25

由 Sam Hansen 提交于 6月 07, 2012

Using the __aligned() attribute in favor of __attribute__((aligned(size)))
Signed-off-by: NSam Hansen <solid.se7en@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

80677c25

staging: zram: remove special handle of uncompressed page · 130f315a

由 Minchan Kim 提交于 6月 08, 2012

xvmalloc can't handle PAGE_SIZE page so that zram have to
handle it specially but zsmalloc can do it so let's remove
unnecessary special handling code.

Quote from Nitin
"I think page vs handle distinction was added since xvmalloc could not
handle full page allocation. Now that zsmalloc allows full page
allocation, we can just use it for both cases. This would also allow
removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
slower code path for full page allocation but this event is anyways
supposed to be rare, so should be fine."

1. This patch reduces code very much.

 drivers/staging/zram/zram_drv.c   |  104 +++++--------------------------------
 drivers/staging/zram/zram_drv.h   |   17 +-----
 drivers/staging/zram/zram_sysfs.c |    6 +--
 3 files changed, 15 insertions(+), 112 deletions(-)

2. change pages_expand with bad_compress so it can count
   bad compression(above 75%) ratio.

3. remove zobj_header which is for back-reference for defragmentation
   because firstly, it's not used at the moment and zsmalloc can't handle
   bigger size than PAGE_SIZE so zram can't do it any more without redesign.

Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

130f315a

11 6月, 2012 1 次提交

staging: zsmalloc: zsmalloc: use unsigned long instead of void * · c2344348

由 Minchan Kim 提交于 6月 08, 2012

We should use unsigned long as handle instead of void * to avoid any
confusion. Without this, users may just treat zs_malloc return value as
a pointer and try to deference it.

This patch passed compile test(zram, zcache and ramster) and zram is
tested on qemu.

changelog
  * from v2
	- remove hval pointed out by Nitin
	- based on next-20120607
  * from v1
	- change zcache's zv_create return value
	- baesd on next-20120604

Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c2344348

14 2月, 2012 1 次提交

staging: zram: Rename module parameter · 5fa5a901

由 Nitin Gupta 提交于 2月 12, 2012

zram accepts number of devices to be created
as a module parameter. This was renamed from
num_devices to zram_num_devices (without updating
the documentation!) since num_devices was declared
as a non-static global variable, polluting the global
namespace. Now, we declare it as a static variable
and revert back the name change.

The documentation (zram.txt) already mentions
num_devices as the module parameter name.
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5fa5a901

09 2月, 2012 1 次提交

staging: zram: replace xvmalloc with zsmalloc · fd1a30de

由 Nitin Gupta 提交于 1月 09, 2012

Replaces xvmalloc with zsmalloc as the compressed page allocator
for zram
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fd1a30de

10 9月, 2011 1 次提交

zram: Fix sparse warnings · 2ccbec05

由 Nitin Gupta 提交于 9月 09, 2011

Fixes sparse warning:
zram_drv.c:666:6: warning: symbol 'zram_slot_free_notify' was not
declared. Should it be static?

Also, max_zpage_size is now size_t just to be consistent with data-type
of other variables maintaining sizes of various kinds.
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

2ccbec05

07 9月, 2011 1 次提交

staging: zram: fix zram locking · 0900beae

由 Jerome Marchand 提交于 9月 06, 2011

Currently init_lock only prevents concurrent execution of zram_init_device()
and zram_reset_device() but not zram_make_request() nor sysfs store functions.

This patch changes init_lock into a rw_semaphore. A write lock is taken by
init, reset and store functions, a read lock is taken by zram_make_request().
Also, avoids to release the lock before calling __zram_reset_device() for
cleaning after a failed init, thus preventing any concurrent task to see an
inconsistent state of zram.
Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

0900beae

24 8月, 2011 2 次提交

staging: zram: make global var "num_devices" use unique name · efd54f43

由 Noah Watkins 提交于 7月 20, 2011

The global variable "num_devices" is too general to be
global. This patch switches the name to be "zram_num_devices".
Signed-off-by: NNoah Watkins <noahwatkins@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

efd54f43

staging: zram: make global var "devices" use unique name · 43801f6e

由 Noah Watkins 提交于 7月 20, 2011

The global variable "devices" is too general to be global.
This patch switches the name to be "zram_devices".
Signed-off-by: NNoah Watkins <noahwatkins@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

43801f6e

06 7月, 2011 2 次提交

Staging: zram: Replace mutex lock by a R/W semaphore · c5bde238

由 Jerome Marchand 提交于 6月 10, 2011

Currently, nothing protects zram table from concurrent access.
For instance, ZRAM_UNCOMPRESSED bit can be cleared by zram_free_page()
called from a concurrent write between the time ZRAM_UNCOMPRESSED has
been set and the time it is tested to unmap KM_USER0 in
zram_bvec_write(). This ultimately leads to kernel panic.

Also, a read request can occurs when the page has been freed by a
running write request and before it has been updated, leading to
zero filled block being incorrectly read and "Read before write"
error message.

This patch replace the current mutex by a rw_semaphore. It extends
the protection to zram table (currently, only compression buffers are
protected) and read requests (currently, only write requests are
protected).
Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

c5bde238

Staging: zram: allow partial page operations · 924bd88d

由 Jerome Marchand 提交于 6月 10, 2011

Commit 7b19b8d4 (zram: Prevent overflow
in logical block size) introduced ZRAM_LOGICAL_BLOCK_SIZE constant to
prevent overflow of logical block size on 64k page kernel.
However, the current implementation of zram only allow operation on block
of the same size as a page. That makes theorically legit 4k requests fail
on 64k page kernel.

This patch makes zram allow operation on partial pages. Basically, it
means we still do operations on full pages internally, but only copy the
relevent segments from/to the user memory.
Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

924bd88d

05 2月, 2011 1 次提交

zram: Prevent overflow in logical block size · 7b19b8d4

由 Robert Jennings 提交于 1月 28, 2011

On a 64K page kernel, the value PAGE_SIZE passed to
blk_queue_logical_block_size would overflow the logical block size
argument (resulting in setting it to 0).

This patch sets the logical block size to 4096, using a new
ZRAM_LOGICAL_BLOCK_SIZE constant.
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

7b19b8d4

01 9月, 2010 2 次提交

Staging: zram: Remove need for explicit device initialization · 484875ad

由 Nitin Gupta 提交于 8月 09, 2010

Currently, the user has to explicitly write a positive value to
initstate sysfs node before the device can be used. This event
triggers allocation of per-device metadata like memory pool,
table array and so on.

We do not pre-initialize all zram devices since the 'table' array,
mapping disk blocks to compressed chunks, takes considerable amount
of memory (8 bytes per page). So, pre-initializing all devices will
be quite wasteful if only few or none of the devices are actually
used.

This explicit device initialization from user is an odd requirement and
can be easily avoided. We now initialize the device when first write is
done to the device.
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

484875ad

Staging: zram: Replace ioctls with sysfs interface · 33863c21

由 Nitin Gupta 提交于 8月 09, 2010

Creates per-device sysfs nodes in /sys/block/zram<id>/
Currently following stats are exported:
 - disksize
 - num_reads
 - num_writes
 - invalid_io
 - zero_pages
 - orig_data_size
 - compr_data_size
 - mem_used_total

By default, disksize is set to 0. So, to start using
a zram device, fist write a disksize value and then
initialize device by writing any positive value to
initstate. For example:

        # initialize /dev/zram0 with 50MB disksize
        echo 50*1024*1024 | bc > /sys/block/zram0/disksize
        echo 1 > /sys/block/zram0/initstate

When done using a disk, issue reset to free its memory
by writing any positive value to reset node:

        echo 1 > /sys/block/zram0/reset

This change also obviates the need for 'rzscontrol' utility.
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

33863c21

19 6月, 2010 2 次提交

Staging: zram: Rename ramzswap to zram in code · f1e3cfff

由 Nitin Gupta 提交于 6月 01, 2010

Automated renames in code:
 - rzs* -> zram*
 - RZS* -> ZRAM*
 - ramzswap* -> zram*

Manual changes:
 - Edited comments/messages mentioning "swap"
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

f1e3cfff

Staging: Rename ramzswap files to zram · 16a4bfb9

由 Nitin Gupta 提交于 6月 01, 2010

Related changes:
 - Modify revelant Kconfig and Makefile accordingly.
 - Change include filenames in code.
 - Remove dependency on CONFIG_SWAP in Kconfig as zram usage
is no longer limited to swap disks.
Signed-off-by: NNitin Gupta <ngupta@vflare.org>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

16a4bfb9

OpenHarmony / kernel_linux 上一次同步 接近 4 年

OpenHarmony / kernel_linux
上一次同步接近 4 年