1. 07 8月, 2014 1 次提交
  2. 08 4月, 2014 7 次提交
    • S
      zram: make compression algorithm selection possible · e46b8a03
      Sergey Senozhatsky 提交于
      Add and document `comp_algorithm' device attribute.  This attribute allows
      to show supported compression and currently selected compression
      algorithms:
      
      	cat /sys/block/zram0/comp_algorithm
      	[lzo] lz4
      
      and change selected compression algorithm:
      	echo lzo > /sys/block/zram0/comp_algorithm
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e46b8a03
    • S
      zram: add multi stream functionality · beca3ec7
      Sergey Senozhatsky 提交于
      Existing zram (zcomp) implementation has only one compression stream
      (buffer and algorithm private part), so in order to prevent data
      corruption only one write (compress operation) can use this compression
      stream, forcing all concurrent write operations to wait for stream lock
      to be released.  This patch changes zcomp to keep a compression streams
      list of user-defined size (via sysfs device attr).  Each write operation
      still exclusively holds compression stream, the difference is that we
      can have N write operations (depending on size of streams list)
      executing in parallel.  See TEST section later in commit message for
      performance data.
      
      Introduce struct zcomp_strm_multi and a set of functions to manage
      zcomp_strm stream access.  zcomp_strm_multi has a list of idle
      zcomp_strm structs, spinlock to protect idle list and wait queue, making
      it possible to perform parallel compressions.
      
      The following set of functions added:
      - zcomp_strm_multi_find()/zcomp_strm_multi_release()
        find and release a compression stream, implement required locking
      - zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
        create and destroy zcomp_strm_multi
      
      zcomp ->strm_find() and ->strm_release() callbacks are set during
      initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
      correspondingly.
      
      Each time zcomp issues a zcomp_strm_multi_find() call, the following set
      of operations performed:
      
      - spin lock strm_lock
      - if idle list is not empty, remove zcomp_strm from idle list, spin
        unlock and return zcomp stream pointer to caller
      - if idle list is empty, current adds itself to wait queue. it will be
        awaken by zcomp_strm_multi_release() caller.
      
      zcomp_strm_multi_release():
      - spin lock strm_lock
      - add zcomp stream to idle list
      - spin unlock, wake up sleeper
      
      Minchan Kim reported that spinlock-based locking scheme has demonstrated
      a severe perfomance regression for single compression stream case,
      comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)
      
      base                      spinlock                    mutex
      
      ==Initial write           ==Initial write             ==Initial  write
      records:  5               records:  5                 records:   5
      avg:      1642424.35      avg:      699610.40         avg:       1655583.71
      std:      39890.95(2.43%) std:      232014.19(33.16%) std:       52293.96
      max:      1690170.94      max:      1163473.45        max:       1697164.75
      min:      1568669.52      min:      573429.88         min:       1553410.23
      ==Rewrite                 ==Rewrite                   ==Rewrite
      records:  5               records:  5                 records:   5
      avg:      1611775.39      avg:      501406.64         avg:       1684419.11
      std:      17144.58(1.06%) std:      15354.41(3.06%)   std:       18367.42
      max:      1641800.95      max:      531356.78         max:       1706445.84
      min:      1593515.27      min:      488817.78         min:       1655335.73
      
      When only one compression stream available, mutex with spin on owner
      tends to perform much better than frequent wait_event()/wake_up().  This
      is why single stream implemented as a special case with mutex locking.
      
      Introduce and document zram device attribute max_comp_streams.  This
      attr shows and stores current zcomp's max number of zcomp streams
      (max_strm).  Extend zcomp's zcomp_create() with `max_strm' parameter.
      `max_strm' limits the number of zcomp_strm structs in compression
      backend's idle list (max_comp_streams).
      
      max_comp_streams used during initialisation as follows:
      -- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
      using single compression stream zcomp_strm_single (mutex-based locking).
      -- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
      using multi compression stream zcomp_strm_multi (spinlock-based locking).
      
      default max_comp_streams value is 1, meaning that zram with single stream
      will be initialised.
      
      Later patch will introduce configuration knob to change max_comp_streams
      on already initialised and used zcomp.
      
      TEST
      iozone -t 3 -R -r 16K -s 60M -I +Z
      
             test           base       1 strm (mutex)     3 strm (spinlock)
      -----------------------------------------------------------------------
       Initial write      589286.78       583518.39          718011.05
             Rewrite      604837.97       596776.38         1515125.72
        Random write      584120.11       595714.58         1388850.25
              Pwrite      535731.17       541117.38          739295.27
              Fwrite     1418083.88      1478612.72         1484927.06
      
      Usage example:
      set max_comp_streams to 4
              echo 4 > /sys/block/zram0/max_comp_streams
      
      show current max_comp_streams (default value is 1).
              cat /sys/block/zram0/max_comp_streams
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      beca3ec7
    • S
      zram: use zcomp compressing backends · b7ca232e
      Sergey Senozhatsky 提交于
      Do not perform direct LZO compress/decompress calls, initialise
      and use zcomp LZO backend (single compression stream) instead.
      
      [akpm@linux-foundation.org: resolve conflicts with zram-delete-zram_init_device-fix.patch]
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7ca232e
    • S
      zram: drop not used table `count' member · 59fc86a4
      Sergey Senozhatsky 提交于
      struct table `count' member is not used.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59fc86a4
    • S
      zram: use atomic64_t for all zram stats · 90a7806e
      Sergey Senozhatsky 提交于
      This is a preparation patch for stats code duplication removal.
      
      1) use atomic64_t for `pages_zero' and `pages_stored' zram stats.
      
      2) `compr_size' and `pages_zero' struct zram_stats members did not
         follow the existing device attr naming scheme: zram_stats.ATTR has
         ATTR_show() function.  rename them:
      
         -- compr_size -> compr_data_size
         -- pages_zero -> zero_pages
      
      Minchan Kim's note:
       If we really have trouble with atomic stat operation, we could
       change it with percpu_counter so that it could solve atomic overhead and
       unnecessary memory space by introducing unsigned long instead of 64bit
       atomic_t.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90a7806e
    • S
      zram: remove good and bad compress stats · b7cccf8b
      Sergey Senozhatsky 提交于
      Remove `good' and `bad' compressed sub-requests stats.  RW request may
      cause a number of RW sub-requests.  zram used to account `good' compressed
      sub-queries (with compressed size less than 50% of original size), `bad'
      compressed sub-queries (with compressed size greater that 75% of original
      size), leaving sub-requests with compression size between 50% and 75% of
      original size not accounted and not reported.  zram already accounts each
      sub-request's compression size so we can calculate real device compression
      ratio.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7cccf8b
    • S
      zram: drop `init_done' struct zram member · be2d1d56
      Sergey Senozhatsky 提交于
      Introduce init_done() helper function which allows us to drop `init_done'
      struct zram member.  init_done() uses the fact that ->init_done == 1
      equals to ->meta != NULL.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be2d1d56
  3. 31 1月, 2014 8 次提交
    • M
      zram: remove zram->lock in read path and change it with mutex · e46e3315
      Minchan Kim 提交于
      Finally, we separated zram->lock dependency from 32bit stat/ table
      handling so there is no reason to use rw_semaphore between read and
      write path so this patch removes the lock from read path totally and
      changes rw_semaphore with mutex.  So, we could do
      
      old:
      
        read-read: OK
        read-write: NO
        write-write: NO
      
      Now:
      
        read-read: OK
        read-write: OK
        write-write: NO
      
      The below data proves mixed workload performs well 11 times and there is
      also enhance on write-write path because current rw-semaphore doesn't
      support SPIN_ON_OWNER.  It's side effect but anyway good thing for us.
      
      Write-related tests perform better (from 61% to 1058%) but read path has
      good/bad(from -2.22% to 1.45%) but they are all marginal within stddev.
      
        CPU 12
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 10                    records: 10
        avg:  516189.16                avg:  839907.96
        std:   22486.53 (4.36%)        std:   47902.17 (5.70%)
        max:  546970.60                max:  909910.35
        min:  481131.54                min:  751148.38
        ==Rewrite                      ==Rewrite
        records: 10                    records: 10
        avg:  509527.98                avg: 1050156.37
        std:   45799.94 (8.99%)        std:   40695.44 (3.88%)
        max:  611574.27                max: 1111929.26
        min:  443679.95                min:  980409.62
        ==Read                         ==Read
        records: 10                    records: 10
        avg: 4408624.17                avg: 4472546.76
        std:  281152.61 (6.38%)        std:  163662.78 (3.66%)
        max: 4867888.66                max: 4727351.03
        min: 4058347.69                min: 4126520.88
        ==Re-read                      ==Re-read
        records: 10                    records: 10
        avg: 4462147.53                avg: 4363257.75
        std:  283546.11 (6.35%)        std:  247292.63 (5.67%)
        max: 4912894.44                max: 4677241.75
        min: 4131386.50                min: 4035235.84
        ==Reverse Read                 ==Reverse Read
        records: 10                    records: 10
        avg: 4565865.97                avg: 4485818.08
        std:  313395.63 (6.86%)        std:  248470.10 (5.54%)
        max: 5232749.16                max: 4789749.94
        min: 4185809.62                min: 3963081.34
        ==Stride read                  ==Stride read
        records: 10                    records: 10
        avg: 4515981.80                avg: 4418806.01
        std:  211192.32 (4.68%)        std:  212837.97 (4.82%)
        max: 4889287.28                max: 4686967.22
        min: 4210362.00                min: 4083041.84
        ==Random read                  ==Random read
        records: 10                    records: 10
        avg: 4410525.23                avg: 4387093.18
        std:  236693.22 (5.37%)        std:  235285.23 (5.36%)
        max: 4713698.47                max: 4669760.62
        min: 4057163.62                min: 3952002.16
        ==Mixed workload               ==Mixed workload
        records: 10                    records: 10
        avg:  243234.25                avg: 2818677.27
        std:   28505.07 (11.72%)       std:  195569.70 (6.94%)
        max:  288905.23                max: 3126478.11
        min:  212473.16                min: 2484150.69
        ==Random write                 ==Random write
        records: 10                    records: 10
        avg:  555887.07                avg: 1053057.79
        std:   70841.98 (12.74%)       std:   35195.36 (3.34%)
        max:  683188.28                max: 1096125.73
        min:  437299.57                min:  992481.93
        ==Pwrite                       ==Pwrite
        records: 10                    records: 10
        avg:  501745.93                avg:  810363.09
        std:   16373.54 (3.26%)        std:   19245.01 (2.37%)
        max:  518724.52                max:  833359.70
        min:  464208.73                min:  765501.87
        ==Pread                        ==Pread
        records: 10                    records: 10
        avg: 4539894.60                avg: 4457680.58
        std:  197094.66 (4.34%)        std:  188965.60 (4.24%)
        max: 4877170.38                max: 4689905.53
        min: 4226326.03                min: 4095739.72
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e46e3315
    • M
      zram: remove workqueue for freeing removed pending slot · f614a9f4
      Minchan Kim 提交于
      Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
      introduced free request pending code to avoid scheduling by mutex under
      spinlock and it was a mess which made code lenghty and increased
      overhead.
      
      Now, we don't need zram->lock any more to free slot so this patch
      reverts it and then, tb_lock should protect it.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f614a9f4
    • M
      zram: introduce zram->tb_lock · 92967471
      Minchan Kim 提交于
      Currently, the zram table is protected by zram->lock but it's rather
      coarse-grained lock and it makes hard for scalibility.
      
      Let's use own rwlock instead of depending on zram->lock.  This patch
      adds new locking so obviously, it would make slow but this patch is just
      prepartion for removing coarse-grained rw_semaphore(ie, zram->lock)
      which is hurdle about zram scalability.
      
      Final patch in this patchset series will remove the lock from read-path
      and change rw_semaphore with mutex in write path.  With bonus, we could
      drop pending slot free mess in next patch.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92967471
    • M
      zram: use atomic operation for stat · deb0bdeb
      Minchan Kim 提交于
      Some of fields in zram->stats are protected by zram->lock which is
      rather coarse-grained so let's use atomic operation without explict
      locking.
      
      This patch is ready for removing dependency of zram->lock in read path
      which is very coarse-grained rw_semaphore.  Of course, this patch adds
      new atomic operation so it might make slow but my 12CPU test couldn't
      spot any regression.  All gain/lose is marginal within stddev.
      
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 50                    records: 50
        avg:  412875.17                avg:  415638.23
        std:   38543.12 (9.34%)        std:   36601.11 (8.81%)
        max:  521262.03                max:  502976.72
        min:  343263.13                min:  351389.12
        ==Rewrite                      ==Rewrite
        records: 50                    records: 50
        avg:  416640.34                avg:  397914.33
        std:   60798.92 (14.59%)       std:   46150.42 (11.60%)
        max:  543057.07                max:  522669.17
        min:  304071.67                min:  316588.77
        ==Read                         ==Read
        records: 50                    records: 50
        avg: 4147338.63                avg: 4070736.51
        std:  179333.25 (4.32%)        std:  223499.89 (5.49%)
        max: 4459295.28                max: 4539514.44
        min: 3753057.53                min: 3444686.31
        ==Re-read                      ==Re-read
        records: 50                    records: 50
        avg: 4096706.71                avg: 4117218.57
        std:  229735.04 (5.61%)        std:  171676.25 (4.17%)
        max: 4430012.09                max: 4459263.94
        min: 2987217.80                min: 3666904.28
        ==Reverse Read                 ==Reverse Read
        records: 50                    records: 50
        avg: 4062763.83                avg: 4078508.32
        std:  186208.46 (4.58%)        std:  172684.34 (4.23%)
        max: 4401358.78                max: 4424757.22
        min: 3381625.00                min: 3679359.94
        ==Stride read                  ==Stride read
        records: 50                    records: 50
        avg: 4094933.49                avg: 4082170.22
        std:  185710.52 (4.54%)        std:  196346.68 (4.81%)
        max: 4478241.25                max: 4460060.97
        min: 3732593.23                min: 3584125.78
        ==Random read                  ==Random read
        records: 50                    records: 50
        avg: 4031070.04                avg: 4074847.49
        std:  192065.51 (4.76%)        std:  206911.33 (5.08%)
        max: 4356931.16                max: 4399442.56
        min: 3481619.62                min: 3548372.44
        ==Mixed workload               ==Mixed workload
        records: 50                    records: 50
        avg:  149925.73                avg:  149675.54
        std:    7701.26 (5.14%)        std:    6902.09 (4.61%)
        max:  191301.56                max:  175162.05
        min:  133566.28                min:  137762.87
        ==Random write                 ==Random write
        records: 50                    records: 50
        avg:  404050.11                avg:  393021.47
        std:   58887.57 (14.57%)       std:   42813.70 (10.89%)
        max:  601798.09                max:  524533.43
        min:  325176.99                min:  313255.34
        ==Pwrite                       ==Pwrite
        records: 50                    records: 50
        avg:  411217.70                avg:  411237.96
        std:   43114.99 (10.48%)       std:   33136.29 (8.06%)
        max:  530766.79                max:  471899.76
        min:  320786.84                min:  317906.94
        ==Pread                        ==Pread
        records: 50                    records: 50
        avg: 4154908.65                avg: 4087121.92
        std:  151272.08 (3.64%)        std:  219505.04 (5.37%)
        max: 4459478.12                max: 4435857.38
        min: 3730512.41                min: 3101101.67
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      deb0bdeb
    • M
      zram: add copyright · 7bfb3de8
      Minchan Kim 提交于
      Add my copyright to the zram source code which I maintain.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bfb3de8
    • M
      zram: remove old private project comment · 49061236
      Minchan Kim 提交于
      Remove the old private compcache project address so upcoming patches
      should be sent to LKML because we Linux kernel community will take care.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49061236
    • M
      zram: promote zram from staging · cd67e10a
      Minchan Kim 提交于
      Zram has lived in staging for a LONG LONG time and have been
      fixed/improved by many contributors so code is clean and stable now.  Of
      course, there are lots of product using zram in real practice.
      
      The major TV companys have used zram as swap since two years ago and
      recently our production team released android smart phone with zram
      which is used as swap, too and recently Android Kitkat start to use zram
      for small memory smart phone.  And there was a report Google released
      their ChromeOS with zram, too and cyanogenmod have been used zram long
      time ago.  And I heard some disto have used zram block device for tmpfs.
      In addition, I saw many report from many other peoples.  For example,
      Lubuntu start to use it.
      
      The benefit of zram is very clear.  With my experience, one of the
      benefit was to remove jitter of video application with backgroud memory
      pressure.  It would be effect of efficient memory usage by compression
      but more issue is whether swap is there or not in the system.  Recent
      mobile platforms have used JAVA so there are many anonymous pages.  But
      embedded system normally are reluctant to use eMMC or SDCard as swap
      because there is wear-leveling and latency issues so if we do not use
      swap, it means we can't reclaim anoymous pages and at last, we could
      encounter OOM kill.  :(
      
      Although we have real storage as swap, it was a problem, too.  Because
      it sometime ends up making system very unresponsible caused by slow swap
      storage performance.
      
      Quote from Luigi on Google
       "Since Chrome OS was mentioned: the main reason why we don't use swap
        to a disk (rotating or SSD) is because it doesn't degrade gracefully
        and leads to a bad interactive experience.  Generally we prefer to
        manage RAM at a higher level, by transparently killing and restarting
        processes.  But we noticed that zram is fast enough to be competitive
        with the latter, and it lets us make more efficient use of the
        available RAM.  " and he announced.
      http://www.spinics.net/lists/linux-mm/msg57717.html
      
      Other uses case is to use zram for block device.  Zram is block device
      so anyone can format the block device and mount on it so some guys on
      the internet start zram as /var/tmp.
      http://forums.gentoo.org/viewtopic-t-838198-start-0.html
      
      Let's promote zram and enhance/maintain it instead of removing.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd67e10a
    • M
      zsmalloc: move it under mm · bcf1647d
      Minchan Kim 提交于
      This patch moves zsmalloc under mm directory.
      
      Before that, description will explain why we have needed custom
      allocator.
      
      Zsmalloc is a new slab-based memory allocator for storing compressed
      pages.  It is designed for low fragmentation and high allocation success
      rate on large object, but <= PAGE_SIZE allocations.
      
      zsmalloc differs from the kernel slab allocator in two primary ways to
      achieve these design goals.
      
      zsmalloc never requires high order page allocations to back slabs, or
      "size classes" in zsmalloc terms.  Instead it allows multiple
      single-order pages to be stitched together into a "zspage" which backs
      the slab.  This allows for higher allocation success rate under memory
      pressure.
      
      Also, zsmalloc allows objects to span page boundaries within the zspage.
      This allows for lower fragmentation than could be had with the kernel
      slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
      kernel slab allocator, if a page compresses to 60% of it original size,
      the memory savings gained through compression is lost in fragmentation
      because another object of the same size can't be stored in the leftover
      space.
      
      This ability to span pages results in zsmalloc allocations not being
      directly addressable by the user.  The user is given an
      non-dereferencable handle in response to an allocation request.  That
      handle must be mapped, using zs_map_object(), which returns a pointer to
      the mapped region that can be used.  The mapping is necessary since the
      object data may reside in two different noncontigious pages.
      
      The zsmalloc fulfills the allocation needs for zram perfectly
      
      [sjenning@linux.vnet.ibm.com: borrow Seth's quote]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcf1647d
  4. 13 8月, 2013 1 次提交
    • M
      zram: don't grab mutex in zram_slot_free_noity · a0c516cb
      Minchan Kim 提交于
      [1] introduced down_write in zram_slot_free_notify to prevent race
      between zram_slot_free_notify and zram_bvec_[read|write]. The race
      could happen if somebody who has right permission to open swap device
      is reading swap device while it is used by swap in parallel.
      
      However, zram_slot_free_notify is called with holding spin_lock of
      swap layer so we shouldn't avoid holing mutex. Otherwise, lockdep
      warns it.
      
      This patch adds new list to handle free slot and workqueue
      so zram_slot_free_notify just registers slot index to be freed and
      registers the request to workqueue. If workqueue is expired,
      it holds mutex_lock so there is no problem any more.
      
      If any I/O is issued, zram handles pending slot-free request
      caused by zram_slot_free_notify right before handling issued
      request because workqueue wouldn't be expired yet so zram I/O
      request handling function can miss it.
      
      Lastly, when zram is reset, flush_work could handle all of pending
      free request so we shouldn't have memory leak.
      
      NOTE: If zram_slot_free_notify's kmalloc with GFP_ATOMIC would be
      failed, the slot will be freed when next write I/O write the slot.
      
      [1] [57ab0485, zram: use zram->lock to protect zram_free_page()
          in swap free notify path]
      
      * from v2
        * refactoring
      
      * from v1
        * totally redesign
      
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0c516cb
  5. 25 6月, 2013 1 次提交
  6. 07 6月, 2013 3 次提交
  7. 06 2月, 2013 1 次提交
  8. 04 2月, 2013 1 次提交
  9. 23 10月, 2012 1 次提交
  10. 12 6月, 2012 2 次提交
    • S
      staging: zram: conventions, __aligned() attribute · 80677c25
      Sam Hansen 提交于
      Using the __aligned() attribute in favor of __attribute__((aligned(size)))
      Signed-off-by: NSam Hansen <solid.se7en@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80677c25
    • M
      staging: zram: remove special handle of uncompressed page · 130f315a
      Minchan Kim 提交于
      xvmalloc can't handle PAGE_SIZE page so that zram have to
      handle it specially but zsmalloc can do it so let's remove
      unnecessary special handling code.
      
      Quote from Nitin
      "I think page vs handle distinction was added since xvmalloc could not
      handle full page allocation. Now that zsmalloc allows full page
      allocation, we can just use it for both cases. This would also allow
      removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
      slower code path for full page allocation but this event is anyways
      supposed to be rare, so should be fine."
      
      1. This patch reduces code very much.
      
       drivers/staging/zram/zram_drv.c   |  104 +++++--------------------------------
       drivers/staging/zram/zram_drv.h   |   17 +-----
       drivers/staging/zram/zram_sysfs.c |    6 +--
       3 files changed, 15 insertions(+), 112 deletions(-)
      
      2. change pages_expand with bad_compress so it can count
         bad compression(above 75%) ratio.
      
      3. remove zobj_header which is for back-reference for defragmentation
         because firstly, it's not used at the moment and zsmalloc can't handle
         bigger size than PAGE_SIZE so zram can't do it any more without redesign.
      
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      130f315a
  11. 11 6月, 2012 1 次提交
  12. 14 2月, 2012 1 次提交
    • N
      staging: zram: Rename module parameter · 5fa5a901
      Nitin Gupta 提交于
      zram accepts number of devices to be created
      as a module parameter. This was renamed from
      num_devices to zram_num_devices (without updating
      the documentation!) since num_devices was declared
      as a non-static global variable, polluting the global
      namespace. Now, we declare it as a static variable
      and revert back the name change.
      
      The documentation (zram.txt) already mentions
      num_devices as the module parameter name.
      Signed-off-by: NNitin Gupta <ngupta@vflare.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fa5a901
  13. 09 2月, 2012 1 次提交
  14. 10 9月, 2011 1 次提交
  15. 07 9月, 2011 1 次提交
    • J
      staging: zram: fix zram locking · 0900beae
      Jerome Marchand 提交于
      Currently init_lock only prevents concurrent execution of zram_init_device()
      and zram_reset_device() but not zram_make_request() nor sysfs store functions.
      
      This patch changes init_lock into a rw_semaphore. A write lock is taken by
      init, reset and store functions, a read lock is taken by zram_make_request().
      Also, avoids to release the lock before calling __zram_reset_device() for
      cleaning after a failed init, thus preventing any concurrent task to see an
      inconsistent state of zram.
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      0900beae
  16. 24 8月, 2011 2 次提交
  17. 06 7月, 2011 2 次提交
    • J
      Staging: zram: Replace mutex lock by a R/W semaphore · c5bde238
      Jerome Marchand 提交于
      Currently, nothing protects zram table from concurrent access.
      For instance, ZRAM_UNCOMPRESSED bit can be cleared by zram_free_page()
      called from a concurrent write between the time ZRAM_UNCOMPRESSED has
      been set and the time it is tested to unmap KM_USER0 in
      zram_bvec_write(). This ultimately leads to kernel panic.
      
      Also, a read request can occurs when the page has been freed by a
      running write request and before it has been updated, leading to
      zero filled block being incorrectly read and "Read before write"
      error message.
      
      This patch replace the current mutex by a rw_semaphore. It extends
      the protection to zram table (currently, only compression buffers are
      protected) and read requests (currently, only write requests are
      protected).
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      c5bde238
    • J
      Staging: zram: allow partial page operations · 924bd88d
      Jerome Marchand 提交于
      Commit 7b19b8d4 (zram: Prevent overflow
      in logical block size) introduced ZRAM_LOGICAL_BLOCK_SIZE constant to
      prevent overflow of logical block size on 64k page kernel.
      However, the current implementation of zram only allow operation on block
      of the same size as a page. That makes theorically legit 4k requests fail
      on 64k page kernel.
      
      This patch makes zram allow operation on partial pages. Basically, it
      means we still do operations on full pages internally, but only copy the
      relevent segments from/to the user memory.
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      924bd88d
  18. 05 2月, 2011 1 次提交
  19. 01 9月, 2010 2 次提交
    • N
      Staging: zram: Remove need for explicit device initialization · 484875ad
      Nitin Gupta 提交于
      Currently, the user has to explicitly write a positive value to
      initstate sysfs node before the device can be used. This event
      triggers allocation of per-device metadata like memory pool,
      table array and so on.
      
      We do not pre-initialize all zram devices since the 'table' array,
      mapping disk blocks to compressed chunks, takes considerable amount
      of memory (8 bytes per page). So, pre-initializing all devices will
      be quite wasteful if only few or none of the devices are actually
      used.
      
      This explicit device initialization from user is an odd requirement and
      can be easily avoided. We now initialize the device when first write is
      done to the device.
      Signed-off-by: NNitin Gupta <ngupta@vflare.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      484875ad
    • N
      Staging: zram: Replace ioctls with sysfs interface · 33863c21
      Nitin Gupta 提交于
      Creates per-device sysfs nodes in /sys/block/zram<id>/
      Currently following stats are exported:
       - disksize
       - num_reads
       - num_writes
       - invalid_io
       - zero_pages
       - orig_data_size
       - compr_data_size
       - mem_used_total
      
      By default, disksize is set to 0. So, to start using
      a zram device, fist write a disksize value and then
      initialize device by writing any positive value to
      initstate. For example:
      
              # initialize /dev/zram0 with 50MB disksize
              echo 50*1024*1024 | bc > /sys/block/zram0/disksize
              echo 1 > /sys/block/zram0/initstate
      
      When done using a disk, issue reset to free its memory
      by writing any positive value to reset node:
      
              echo 1 > /sys/block/zram0/reset
      
      This change also obviates the need for 'rzscontrol' utility.
      Signed-off-by: NNitin Gupta <ngupta@vflare.org>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      33863c21
  20. 19 6月, 2010 2 次提交