1. 01 Feb 2020, 2 commits
  2. 19 Oct 2019, 1 commit
  3. 15 Jul 2019, 2 commits
  4. 31 May 2019, 1 commit
  5. 21 May 2019, 1 commit
  6. 27 Apr 2019, 1 commit
  7. 30 Mar 2019, 1 commit
    • drivers/block/zram/zram_drv.c: fix idle/writeback string compare · 0bc9f5d1
      Committed by Minchan Kim
      Makoto reported the KASAN error below: zram does an out-of-bounds read,
      because strscpy copies up to count bytes from the source unconditionally
      and can therefore read past the end of the source into the next object in
      the slab.

      To prevent this, use strlcpy, which checks the source's length automatically.
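
      A minimal sketch of the pattern the fix describes (the buffer name, its
      size and the helper below are illustrative, not the exact zram code):
      bound the copy by the destination and let strlcpy stop at the source's
      terminator.

      	char mode_buf[8];

      	/*
      	 * strlcpy() checks the source's length and stops at its NUL
      	 * terminator instead of reading a fixed count of bytes from the
      	 * source unconditionally, and it always NUL-terminates mode_buf.
      	 */
      	strlcpy(mode_buf, buf, sizeof(mode_buf));
      	if (sysfs_streq(mode_buf, "all"))
      		mark_all_slots_idle();	/* hypothetical helper for illustration */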
      
         BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
         Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
         ..
         Call trace:
           strscpy+0x68/0x154
           idle_store+0xc4/0x34c
           dev_attr_store+0x50/0x6c
           sysfs_kf_write+0x98/0xb4
           kernfs_fop_write+0x198/0x260
           __vfs_write+0x10c/0x338
           vfs_write+0x114/0x238
           SyS_write+0xc8/0x168
           __sys_trace_return+0x0/0x4
      
         Allocated by task 1314:
          __kmalloc+0x280/0x318
          kernfs_fop_write+0xac/0x260
          __vfs_write+0x10c/0x338
          vfs_write+0x114/0x238
          SyS_write+0xc8/0x168
          __sys_trace_return+0x0/0x4
      
         Freed by task 2855:
          kfree+0x138/0x630
          kernfs_put_open_node+0x10c/0x124
          kernfs_fop_release+0xd8/0x114
          __fput+0x130/0x2a4
          ____fput+0x1c/0x28
          task_work_run+0x16c/0x1c8
          do_notify_resume+0x2bc/0x107c
          work_pending+0x8/0x10
      
         The buggy address belongs to the object at ffffffc0c3495a00
          which belongs to the cache kmalloc-128 of size 128
         The buggy address is located 0 bytes inside of
          128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
         The buggy address belongs to the page:
         page:ffffffbf030d2500 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
         flags: 0x4000000000010200(slab|head)
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                            ^
          ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org
      Cc: <stable@vger.kernel.org>	[5.0]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Makoto Wu <makotowu@google.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0bc9f5d1
  8. 15 Mar 2019, 1 commit
  9. 08 Mar 2019, 1 commit
  10. 09 Jan 2019, 1 commit
    • zram: idle writeback fixes and cleanup · 1d69a3f8
      Committed by Minchan Kim
      This patch includes some fixes and cleanup for idle-page writeback.
      
      1. writeback_limit interface
      
      The writeback_limit interface is currently rather confusing.  For example,
      once the writeback limit budget is exhausted, the admin sees 0 in
      /sys/block/zramX/writeback_limit, which at the moment has the same meaning
      as a disabled writeback limit.  IOW, the admin cannot tell whether the zero
      means the limit is disabled or the budget is exhausted.

      To make the interface clear, let's separate enabling the writeback limit
      into another knob: /sys/block/zram0/writeback_limit_enable
      
      * before:
        while true :
          # to re-enable writeback limit once previous one is used up
          echo 0 > /sys/block/zram0/writeback_limit
          echo $((200<<20)) > /sys/block/zram0/writeback_limit
          ..
          .. # used up the writeback limit budget
      
      * new
        # To use the writeback limit, the admin should enable it
        # from the beginning.
        echo $((200<<20)) > /sys/block/zram0/writeback_limit
        echo 1 > /sys/block/zram0/writeback_limit_enable
        while true :
          echo $((200<<20)) > /sys/block/zram0/writeback_limit
          ..
          .. # used up the writeback limit budget
      
      It's much more straightforward.
      
      2. fix the idle/huge writeback mode condition check

      The mode in writeback_store is no longer a bit flag, so there is no need
      for bit operations.  Furthermore, the current condition check is broken in
      that it writes back every page regardless of huge/idle.
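
      A hedged sketch of the corrected check (the flag helpers and mode
      constants below follow the wording above and may not match the driver's
      exact identifiers): with the mode no longer a bit mask, plain equality
      tests suffice, and a slot is skipped unless it matches the requested mode.

      	/* illustrative names; skip slots that don't match the requested mode */
      	if (mode == IDLE_WRITEBACK &&
      	    !zram_test_flag(zram, index, ZRAM_IDLE))
      		continue;
      	if (mode == HUGE_WRITEBACK &&
      	    !zram_test_flag(zram, index, ZRAM_HUGE))
      		continue;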
      
      3. clean up idle_store
      
      No need to use goto.
      
      [minchan@kernel.org: missed spin_lock_init]
        Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com
      Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Suggested-by: John Dias <joaodias@google.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Srinivas Paladugu <srnvs@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d69a3f8
  11. 29 Dec 2018, 7 commits
    • zram: writeback throttle · bb416d18
      Committed by Minchan Kim
      If there is a lot of write I/O to a flash device, the storage can suffer
      from wearout.  To overcome that, the admin needs to impose a write
      limitation that guarantees flash health for the entire product life.
      
      This patch creates a new knob "writeback_limit" for zram.
      
      writeback_limit's default value is 0, so it does not limit any writeback.
      If the admin wants to measure the writeback count over a certain period,
      it can be read from the 3rd column of /sys/block/zram0/bd_stat.

      If the admin wants to limit writeback to 400M per day, it can be done
      as below.
      
      	MB_SHIFT=20
      	PAGE_4K_SHIFT=12
      	echo $((400<<MB_SHIFT>>PAGE_4K_SHIFT)) > \
      		/sys/block/zram0/writeback_limit
      
      If the admin wants to allow further writeback again, it can be done as below
      
      	echo 0 > /sys/block/zram0/writeback_limit
      
      If the admin wants to see the remaining writeback budget,
      
      	cat /sys/block/zram0/writeback_limit
      
      The writeback_limit count resets whenever zram is reset (e.g., system
      reboot, echo 1 > /sys/block/zramX/reset), so keeping track of how much
      writeback happened before the reset, in order to allocate extra writeback
      budget in the next setting, is the user's job.
      
      [minchan@kernel.org: v4]
        Link: http://lkml.kernel.org/r/20181203024045.153534-8-minchan@kernel.org
      Link: http://lkml.kernel.org/r/20181127055429.251614-8-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bb416d18
    • zram: add bd_stat statistics · 23eddf39
      Committed by Minchan Kim
      bd_stat represents things that happened in the backing device.  Currently
      it reports bd_count, bd_reads and bd_writes, which are helpful for
      understanding flash wearout and memory savings.
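
      For illustration, a small userspace sketch that reads the new file (the
      three-column layout follows the description above; treat it and the
      device path as assumptions):

      	#include <stdio.h>

      	int main(void)
      	{
      		unsigned long long bd_count, bd_reads, bd_writes;
      		FILE *f = fopen("/sys/block/zram0/bd_stat", "r");

      		if (!f) {
      			perror("bd_stat");
      			return 1;
      		}
      		if (fscanf(f, "%llu %llu %llu",
      			   &bd_count, &bd_reads, &bd_writes) != 3) {
      			fclose(f);
      			fprintf(stderr, "unexpected bd_stat format\n");
      			return 1;
      		}
      		fclose(f);
      		/* raw counters; deltas over time show backing-device wear */
      		printf("bd_count=%llu bd_reads=%llu bd_writes=%llu\n",
      		       bd_count, bd_reads, bd_writes);
      		return 0;
      	}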
      
      [minchan@kernel.org: v4]
        Link: http://lkml.kernel.org/r/20181203024045.153534-7-minchan@kernel.org
      Link: http://lkml.kernel.org/r/20181127055429.251614-7-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23eddf39
    • zram: support idle/huge page writeback · a939888e
      Committed by Minchan Kim
      Add a new feature, "zram idle/huge page writeback".  In the zram-swap use
      case, zram usually has many idle/huge swap pages.  It's pointless to keep
      them in memory (i.e., in zram).

      To solve this problem, this feature introduces idle/huge page writeback to
      the backing device; the goal is to save more memory space on embedded
      systems.

      The normal sequence for using the idle/huge page writeback feature is as
      follows:
      
      while (1) {
              # mark allocated zram slot to idle
              echo all > /sys/block/zram0/idle
              # leave system working for several hours
              # Blocks on zram that have seen no access in the meantime
              # are still marked as IDLE pages.
      
              echo "idle" > /sys/block/zram0/writeback
      	or/and
      	echo "huge" > /sys/block/zram0/writeback
              # write the IDLE or/and huge marked slot into backing device
      	# and free the memory.
      }
      
      Per the discussion at
      https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
      
      This patch removes the direct incompressible page writeback feature
      (d2afd25114f4 ("zram: write incompressible pages to backing device")).
      
      Below concerns from Sergey:
      == &< ==
      
      "IDLE writeback" is superior to "incompressible writeback".
      
      "incompressible writeback" is completely unpredictable and uncontrollable;
      it depends on data patterns and compression algorithms, while "IDLE
      writeback" is predictable.

      I even suspect that, *ideally*, we can remove "incompressible writeback":
      "IDLE pages" is a superset which also includes "incompressible" pages.
      So, technically, we can still do "incompressible writeback" from the "IDLE
      writeback" path, but a much more reasonable one, based on a page idling
      period.

      I understand that you want to keep "direct incompressible writeback"
      around.  ZRAM is especially popular on devices which do suffer from flash
      wearout, so I can see the "incompressible writeback" path becoming dead
      code, long term.
      
      == &< ==
      
      Below concerns from Minchan:
      == &< ==
      
      My concern is that if we enable CONFIG_ZRAM_WRITEBACK in this
      implementation, both hugepage and idlepage writeback turn on.  However,
      some users want to enable only idlepage writeback, so we would need to
      introduce an on/off knob for hugepage or a new
      CONFIG_ZRAM_IDLEPAGE_WRITEBACK for that use case.  I don't want to make it
      complicated *if possible*.
      
      Long term, I imagine we need to make the VM aware of a new swap hierarchy,
      a little different from what we have now.  For example, if the first,
      high-priority swap device returns -EIO or -ENOCOMP, swap falls back to the
      next lower-priority swap device.  With that, hugepage writeback will work
      transparently.

      So we could regard it as a regression because incompressible pages don't
      go to backing storage automatically.  Instead, the user has to trigger it
      manually via "echo huge > /sys/block/zram0/writeback".
      
      == &< ==
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a939888e
    • zram: introduce ZRAM_IDLE flag · e82592c4
      Committed by Minchan Kim
      To support idle page writeback with upcoming patches, this patch
      introduces a new ZRAM_IDLE flag.
      
      Userspace can mark zram slots as "idle" via
      	"echo all > /sys/block/zramX/idle"
      which marks every allocated zram slot as ZRAM_IDLE.
      Users can see it via /sys/kernel/debug/zram/zram0/block_state.
      
                300    75.033841 ...i
                301    63.806904 s..i
                302    63.806919 ..hi
      
      Once there is I/O for the slot, the mark disappears.
      
      	  300    75.033841 ...
                301    63.806904 s..i
                302    63.806919 ..hi
      
      Therefore, the 300th block is an idle zpage.  With this feature, users can
      see how many idle pages, which are a waste of memory, zram is holding.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-5-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e82592c4
    • zram: refactor flags and writeback stuff · 7e529283
      Committed by Minchan Kim
      Rename some variables and restructure some code for better readability in
      writeback and zs_free_page.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-4-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e529283
    • zram: fix double free backing device · 5547932d
      Committed by Minchan Kim
      If blkdev_get() fails, we shouldn't call blkdev_put().  Otherwise, the
      kernel emits the log below.  This patch fixes it.
      
        WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120
        Modules linked in:
        CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        RIP: 0010:blkdev_put+0x105/0x120
        Call Trace:
          __x64_sys_swapoff+0x46d/0x490
          do_syscall_64+0x5a/0x190
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
        irq event stamp: 4466
        hardirqs last  enabled at (4465):  __free_pages_ok+0x1e3/0x490
        hardirqs last disabled at (4466):  trace_hardirqs_off_thunk+0x1a/0x1c
        softirqs last  enabled at (3420):  __do_softirq+0x333/0x446
        softirqs last disabled at (3407):  irq_exit+0xd1/0xe0
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5547932d
    • zram: fix lockdep warning of free block handling · 3c9959e0
      Committed by Minchan Kim
      Patch series "zram idle page writeback", v3.
      
      Inherently, a swap device has many idle pages which are rarely touched
      after they are allocated.  That is never a problem if we use a storage
      device as swap, but it is just a waste for zram-swap.
      
      This patchset supports zram idle page writeback feature.
      
      * Admin can define what an idle page is: "no access since X time ago"
      * Admin can define when zram should write them back
      * Admin can define when zram should stop writeback to prevent wearout
      
      Details are in each patch's description.
      
      This patch (of 7):
      
        ================================
        WARNING: inconsistent lock state
        4.19.0+ #390 Not tainted
        --------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
        00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
        {SOFTIRQ-ON-W} state was registered at:
          _raw_spin_lock+0x2c/0x40
          zram_make_request+0x755/0xdc9
          generic_make_request+0x373/0x6a0
          submit_bio+0x6c/0x140
          __swap_writepage+0x3a8/0x480
          shrink_page_list+0x1102/0x1a60
          shrink_inactive_list+0x21b/0x3f0
          shrink_node_memcg.constprop.99+0x4f8/0x7e0
          shrink_node+0x7d/0x2f0
          do_try_to_free_pages+0xe0/0x300
          try_to_free_pages+0x116/0x2b0
          __alloc_pages_slowpath+0x3f4/0xf80
          __alloc_pages_nodemask+0x2a2/0x2f0
          __handle_mm_fault+0x42e/0xb50
          handle_mm_fault+0x55/0xb0
          __do_page_fault+0x235/0x4b0
          page_fault+0x1e/0x30
        irq event stamp: 228412
        hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
        hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
        softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
        softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&zram->bitmap_lock)->rlock);
          <Interrupt>
            lock(&(&zram->bitmap_lock)->rlock);
      
         *** DEADLOCK ***
      
        no locks held by zram_verify/2095.
      
        stack backtrace:
        CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack+0x67/0x9b
         print_usage_bug+0x1bd/0x1d3
         mark_lock+0x4aa/0x540
         __lock_acquire+0x51d/0x1300
         lock_acquire+0x90/0x180
         _raw_spin_lock+0x2c/0x40
         put_entry_bdev+0x1e/0x50
         zram_free_page+0xf6/0x110
         zram_slot_free_notify+0x42/0xa0
         end_swap_bio_read+0x5b/0x170
         blk_update_request+0x8f/0x340
         scsi_end_request+0x2c/0x1e0
         scsi_io_completion+0x98/0x650
         blk_done_softirq+0x9e/0xd0
         __do_softirq+0xcc/0x427
         irq_exit+0xd1/0xe0
         do_IRQ+0x93/0x120
         common_interrupt+0xf/0xf
         </IRQ>
      
      With the writeback feature, zram_slot_free_notify() can be called in
      softirq context by end_swap_bio_read().  However, bitmap_lock is not aware
      of that, so lockdep yells out:
      
        get_entry_bdev
        spin_lock(bitmap->lock);
        irq
        softirq
        end_swap_bio_read
        zram_slot_free_notify
        zram_slot_lock <-- deadlock prone
        zram_free_page
        put_entry_bdev
        spin_lock(bitmap->lock); <-- deadlock prone
      
      Per akpm's suggestion (i.e., the bitmap operations are already atomic), we
      can remove bitmap_lock.  The allocation might fail to find an empty slot
      if there is serious contention, but that is not a severe problem because
      huge page writeback can already fail under severe memory pressure.  The
      worst case is just keeping the incompressible page in memory rather than
      on storage.

      The other problem is taking zram_slot_lock in zram_slot_free_notify.  To
      make that safe, this patch introduces zram_slot_trylock(), which
      zram_slot_free_notify() uses.  Although contention is rare, this patch
      adds a new debug stat, "miss_free", to keep monitoring how often it
      happens.
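
      A hedged sketch of the two ideas (helper and field names follow the
      description above; the exact driver code may differ): take the slot lock
      with a trylock in the softirq path and count a miss instead of spinning,
      and rely on atomic bitmap operations instead of a separate spinlock when
      claiming a block in the backing-device bitmap.

      	/* softirq path: never spin, just account a missed free */
      	static void slot_free_notify_sketch(struct zram *zram, u32 index)
      	{
      		if (!zram_slot_trylock(zram, index)) {
      			atomic64_inc(&zram->stats.miss_free);
      			return;
      		}
      		zram_free_page(zram, index);
      		zram_slot_unlock(zram, index);
      	}

      	/* lockless slot allocation in the backing-device bitmap */
      	static unsigned long alloc_block_bdev_sketch(struct zram *zram)
      	{
      		unsigned long blk_idx = 1;
      retry:
      		blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, blk_idx);
      		if (blk_idx == zram->nr_pages)
      			return 0;	/* no free block; writeback simply fails */
      		if (test_and_set_bit(blk_idx, zram->bitmap))
      			goto retry;	/* lost the race, try the next bit */
      		return blk_idx;
      	}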
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c9959e0
  12. 11 Oct 2018, 1 commit
    • drivers/block: remove redundant 'default n' from Kconfig-s · 486c6fba
      Committed by Bartlomiej Zolnierkiewicz
      'default n' is the default value for any bool or tristate Kconfig
      setting so there is no need to write it explicitly.
      
      Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO
      is not set' for visible symbols") the Kconfig behavior is the same
      regardless of 'default n' being present or not:
      
          ...
          One side effect of (and the main motivation for) this change is making
          the following two definitions behave exactly the same:
      
              config FOO
                      bool
      
              config FOO
                      bool
                      default n
      
          With this change, neither of these will generate a
          '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
          That might make it clearer to people that a bare 'default n' is
          redundant.
          ...
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      486c6fba
  13. 28 Sep 2018, 1 commit
  14. 23 Aug 2018, 1 commit
  15. 11 Aug 2018, 1 commit
  16. 18 Jul 2018, 2 commits
    • block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Committed by Michael Callahan
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should get updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function.  They are now
      indexed by op_is_write().
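
      A hedged sketch of the new calling convention from a driver's request
      path (argument order follows the description above; treat the details as
      an approximation rather than exact driver code):

      	/* start accounting: pass the REQ_OP, not just a read/write bit */
      	generic_start_io_acct(q, op, bvec.bv_len >> SECTOR_SHIFT,
      			      &zram->disk->part0);

      	/* ... perform the I/O ... */

      	/* complete accounting for the same REQ_OP */
      	generic_end_io_acct(q, op, &zram->disk->part0, start_time);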
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ddcf35d3
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Committed by Tejun Heo
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
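
      A hedged sketch of the resulting shape of the hook (the prototype is
      reconstructed from the description above, and do_transfer() is a
      hypothetical helper standing in for the real driver work):

      	/* ->rw_page() now receives the REQ_OP instead of a bool is_write */
      	static int sketch_rw_page(struct block_device *bdev, sector_t sector,
      				  struct page *page, unsigned int op)
      	{
      		int err = do_transfer(bdev, sector, page, op);

      		/* mm-facing completion still only needs a read/write bool */
      		page_endio(page, op_is_write(op), err);
      		return err;
      	}
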
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f289dcb
  17. 13 Jun 2018, 1 commit
    • treewide: Use array_size() in vzalloc() · fad953ce
      Committed by Kees Cook
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
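
      For example, an allocation of a table of n entries changes from the
      open-coded multiplication to the overflow-checked helper (a generic
      illustration, not a specific call site from this tree):

      	struct entry *tbl;

      	/* before: silent wraparound if n * sizeof(*tbl) overflows */
      	tbl = vzalloc(n * sizeof(*tbl));

      	/* after: array_size() saturates on overflow, so vzalloc() fails cleanly */
      	tbl = vzalloc(array_size(n, sizeof(*tbl)));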
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      fad953ce
  18. 08 Jun 2018, 4 commits
  19. 06 Apr 2018, 1 commit
    • zram: drop max_zpage_size and use zs_huge_class_size() · 60f5921a
      Committed by Sergey Senozhatsky
      Remove ZRAM's enforced "huge object" value and use zsmalloc huge-class
      watermark instead, which makes more sense.
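
      A hedged sketch of how the driver side changes (variable names are
      illustrative): instead of comparing the compressed length against a
      hard-coded max_zpage_size, zram asks zsmalloc for its huge-class
      watermark once and stores anything at or above it as a full page.

      	/* queried once, e.g. when the zsmalloc pool is created */
      	huge_class_size = zs_huge_class_size(pool);

      	/*
      	 * In the store path: objects at or above the watermark gain nothing
      	 * from compression, so keep them as uncompressed PAGE_SIZE objects.
      	 */
      	if (comp_len >= huge_class_size)
      		comp_len = PAGE_SIZE;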
      
      TEST
      - I used a 1G zram device with the LZO compression back-end; the original
        data set size was 444MB.  Looking at the zsmalloc class stats, the test
        ended up being pretty fair.
      
      BASE ZRAM/ZSMALLOC
      =====================
      zram mm_stat
      
      498978816 191482495 199831552        0 199831552    15634        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0           380        380        304                4        0
         254  4096           0            0         10620      10620      10620                1        0
      
       Total                 7           46        106982     106187      48787                         0
      
      PATCHED ZRAM/ZSMALLOC
      =====================
      
      zram mm_stat
      
      498978816 182579184 194248704        0 194248704    15628        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0          7180       7180       5744                4        0
         254  4096           0            0          3820       3820       3820                1        0
      
       Total                 8           45        106959     106193      47424                         0
      
      As we can see, we reduced the number of objects stored in class-4096,
      because a huge number of objects which we previously forcibly stored in
      class-4096 are now stored in the non-huge class-3264.  This results in
      lower memory consumption:
      
      - zsmalloc now uses 47424 physical pages, which is less than 48787 pages
        zsmalloc used before.
      
      - objects that we store in class-3264 share zspages.  That's why overall
        the number of pages that both class-4096 and class-3264 consumed went
        down from 10924 to 9564.
      
      [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
        Link: http://lkml.kernel.org/r/20180314081833.1096-3-sergey.senozhatsky@gmail.com
      Link: http://lkml.kernel.org/r/20180306070639.7389-3-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      60f5921a
  20. 18 Mar 2018, 1 commit
    • block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Committed by Bart Van Assche
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use these constants? To avoid this confusion,
      move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that these become available for all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional so that including that header file after
      <linux/blkdev.h> does not cause the compiler to complain about a
      SECTOR_SIZE redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
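
      For block drivers the payoff is being able to write conversions like the
      following without a private definition (a generic illustration, not a
      call site from this patch):

      	#include <linux/blkdev.h>

      	/* bytes to 512-byte sectors and back, using the shared constants */
      	sector_t nr_sects = size_in_bytes >> SECTOR_SHIFT;
      	u64 size_in_bytes_again = (u64)nr_sects * SECTOR_SIZE;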
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      233bde21
  21. 09 Mar 2018, 1 commit
  22. 01 Mar 2018, 1 commit
  23. 07 Jan 2018, 1 commit
  24. 16 Nov 2017, 5 commits
    • drivers/block/zram/zram_drv.c: make zram_page_end_io() static · 384bc41f
      Committed by Colin Ian King
      zram_page_end_io() is local to the source and does not need to be in
      global scope, so make it static.
      
      Cleans up sparse warning:
      
        symbol 'zram_page_end_io' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      384bc41f
    • zram: remove zlib from the list of recommended algorithms · 0b07ff39
      Committed by Sergey Senozhatsky
      ZSTD tends to outperform deflate/inflate, thus we remove zlib from the
      list of recommended algorithms and recommend zstd instead.
      
      Link: http://lkml.kernel.org/r/20170912050005.3247-2-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b07ff39
    • zram: add zstd to the supported algorithms list · 5ef3a8b1
      Committed by Sergey Senozhatsky
      Add ZSTD to the list of supported compression algorithms.
      
      ZRAM fio perf test:
      
                            LZO         DEFLATE         ZSTD
      
      #jobs1
      WRITE:              (2180MB/s)   (77.2MB/s)      (1429MB/s)
      WRITE:              (1617MB/s)   (77.7MB/s)      (1202MB/s)
      READ:                (426MB/s)   (595MB/s)       (1181MB/s)
      READ:                (422MB/s)   (572MB/s)       (1020MB/s)
      READ:                (318MB/s)   (67.8MB/s)      (563MB/s)
      WRITE:               (318MB/s)   (67.9MB/s)      (564MB/s)
      READ:                (336MB/s)   (68.3MB/s)      (583MB/s)
      WRITE:               (335MB/s)   (68.2MB/s)      (582MB/s)
      #jobs2
      WRITE:              (3441MB/s)   (152MB/s)       (2141MB/s)
      WRITE:              (2507MB/s)   (147MB/s)       (1888MB/s)
      READ:                (801MB/s)   (1146MB/s)      (1890MB/s)
      READ:                (767MB/s)   (1096MB/s)      (2073MB/s)
      READ:                (621MB/s)   (126MB/s)       (1009MB/s)
      WRITE:               (621MB/s)   (126MB/s)       (1009MB/s)
      READ:                (656MB/s)   (125MB/s)       (1075MB/s)
      WRITE:               (657MB/s)   (126MB/s)       (1077MB/s)
      #jobs3
      WRITE:              (4772MB/s)   (225MB/s)       (3394MB/s)
      WRITE:              (3905MB/s)   (211MB/s)       (2939MB/s)
      READ:               (1216MB/s)   (1608MB/s)      (3218MB/s)
      READ:               (1159MB/s)   (1431MB/s)      (2981MB/s)
      READ:                (906MB/s)   (156MB/s)       (1457MB/s)
      WRITE:               (907MB/s)   (156MB/s)       (1458MB/s)
      READ:                (953MB/s)   (158MB/s)       (1595MB/s)
      WRITE:               (952MB/s)   (157MB/s)       (1593MB/s)
      #jobs4
      WRITE:              (6036MB/s)   (265MB/s)       (4469MB/s)
      WRITE:              (5059MB/s)   (263MB/s)       (3951MB/s)
      READ:               (1618MB/s)   (2066MB/s)      (4276MB/s)
      READ:               (1573MB/s)   (1942MB/s)      (3830MB/s)
      READ:               (1202MB/s)   (227MB/s)       (1971MB/s)
      WRITE:              (1200MB/s)   (227MB/s)       (1968MB/s)
      READ:               (1265MB/s)   (226MB/s)       (2116MB/s)
      WRITE:              (1264MB/s)   (226MB/s)       (2114MB/s)
      #jobs5
      WRITE:              (5339MB/s)   (233MB/s)       (3781MB/s)
      WRITE:              (4298MB/s)   (234MB/s)       (3276MB/s)
      READ:               (1626MB/s)   (2048MB/s)      (4081MB/s)
      READ:               (1567MB/s)   (1929MB/s)      (3758MB/s)
      READ:               (1174MB/s)   (205MB/s)       (1747MB/s)
      WRITE:              (1173MB/s)   (204MB/s)       (1746MB/s)
      READ:               (1214MB/s)   (208MB/s)       (1890MB/s)
      WRITE:              (1215MB/s)   (208MB/s)       (1892MB/s)
      #jobs6
      WRITE:              (5666MB/s)   (270MB/s)       (4338MB/s)
      WRITE:              (4828MB/s)   (267MB/s)       (3772MB/s)
      READ:               (1803MB/s)   (2058MB/s)      (4946MB/s)
      READ:               (1805MB/s)   (2156MB/s)      (4711MB/s)
      READ:               (1334MB/s)   (235MB/s)       (2135MB/s)
      WRITE:              (1335MB/s)   (235MB/s)       (2137MB/s)
      READ:               (1364MB/s)   (236MB/s)       (2268MB/s)
      WRITE:              (1365MB/s)   (237MB/s)       (2270MB/s)
      #jobs7
      WRITE:              (5474MB/s)   (270MB/s)       (4300MB/s)
      WRITE:              (4666MB/s)   (266MB/s)       (3817MB/s)
      READ:               (2022MB/s)   (2319MB/s)      (5472MB/s)
      READ:               (1924MB/s)   (2260MB/s)      (5031MB/s)
      READ:               (1369MB/s)   (242MB/s)       (2153MB/s)
      WRITE:              (1370MB/s)   (242MB/s)       (2155MB/s)
      READ:               (1499MB/s)   (246MB/s)       (2310MB/s)
      WRITE:              (1497MB/s)   (246MB/s)       (2307MB/s)
      #jobs8
      WRITE:              (5558MB/s)   (273MB/s)       (4439MB/s)
      WRITE:              (4763MB/s)   (271MB/s)       (3918MB/s)
      READ:               (2201MB/s)   (2599MB/s)      (6062MB/s)
      READ:               (2105MB/s)   (2463MB/s)      (5413MB/s)
      READ:               (1490MB/s)   (252MB/s)       (2238MB/s)
      WRITE:              (1488MB/s)   (252MB/s)       (2236MB/s)
      READ:               (1566MB/s)   (254MB/s)       (2434MB/s)
      WRITE:              (1568MB/s)   (254MB/s)       (2437MB/s)
      #jobs9
      WRITE:              (5120MB/s)   (264MB/s)       (4035MB/s)
      WRITE:              (4531MB/s)   (267MB/s)       (3740MB/s)
      READ:               (1940MB/s)   (2258MB/s)      (4986MB/s)
      READ:               (2024MB/s)   (2387MB/s)      (4871MB/s)
      READ:               (1343MB/s)   (246MB/s)       (2038MB/s)
      WRITE:              (1342MB/s)   (246MB/s)       (2037MB/s)
      READ:               (1553MB/s)   (238MB/s)       (2243MB/s)
      WRITE:              (1552MB/s)   (238MB/s)       (2242MB/s)
      #jobs10
      WRITE:              (5345MB/s)   (271MB/s)       (3988MB/s)
      WRITE:              (4750MB/s)   (254MB/s)       (3668MB/s)
      READ:               (1876MB/s)   (2363MB/s)      (5150MB/s)
      READ:               (1990MB/s)   (2256MB/s)      (5080MB/s)
      READ:               (1355MB/s)   (250MB/s)       (2019MB/s)
      WRITE:              (1356MB/s)   (251MB/s)       (2020MB/s)
      READ:               (1490MB/s)   (252MB/s)       (2202MB/s)
      WRITE:              (1488MB/s)   (252MB/s)       (2199MB/s)
      
      jobs1                              perfstat
      instructions                 52,065,555,710 (    0.79)    855,731,114,587 (    2.64)       54,280,709,944 (    1.40)
      branches                     14,020,427,116 ( 725.847)    101,733,449,582 (1074.521)       11,170,591,067 ( 992.869)
      branch-misses                    22,626,174 (   0.16%)        274,197,885 (   0.27%)           25,915,805 (   0.23%)
      jobs2                              perfstat
      instructions                103,633,110,402 (    0.75)  1,710,822,100,914 (    2.59)      107,879,874,104 (    1.28)
      branches                     27,931,237,282 ( 679.203)    203,298,267,479 (1037.326)       22,185,350,842 ( 884.427)
      branch-misses                    46,103,811 (   0.17%)        533,747,204 (   0.26%)           49,682,483 (   0.22%)
      jobs3                              perfstat
      instructions                154,857,283,657 (    0.76)  2,565,748,974,197 (    2.57)      161,515,435,813 (    1.31)
      branches                     41,759,490,355 ( 670.529)    304,905,605,277 ( 978.765)       33,215,805,907 ( 888.003)
      branch-misses                    74,263,293 (   0.18%)        759,746,240 (   0.25%)           76,841,196 (   0.23%)
      jobs4                              perfstat
      instructions                206,215,849,076 (    0.75)  3,420,169,460,897 (    2.60)      215,003,061,664 (    1.31)
      branches                     55,632,141,739 ( 666.501)    406,394,977,433 ( 927.241)       44,214,322,251 ( 883.532)
      branch-misses                   102,287,788 (   0.18%)      1,098,617,314 (   0.27%)          103,891,040 (   0.23%)
      jobs5                              perfstat
      instructions                258,711,315,588 (    0.67)  4,275,657,533,244 (    2.23)      269,332,235,685 (    1.08)
      branches                     69,802,821,166 ( 588.823)    507,996,211,252 ( 797.036)       55,450,846,129 ( 735.095)
      branch-misses                   129,217,214 (   0.19%)      1,243,284,991 (   0.24%)          173,512,278 (   0.31%)
      jobs6                              perfstat
      instructions                312,796,166,008 (    0.61)  5,133,896,344,660 (    2.02)      323,658,769,588 (    1.04)
      branches                     84,372,488,583 ( 520.541)    610,310,494,402 ( 697.642)       66,683,292,992 ( 693.939)
      branch-misses                   159,438,978 (   0.19%)      1,396,368,563 (   0.23%)          174,406,934 (   0.26%)
      jobs7                              perfstat
      instructions                363,211,372,930 (    0.56)  5,988,205,600,879 (    1.75)      377,824,674,156 (    0.93)
      branches                     98,057,013,765 ( 463.117)    711,841,255,974 ( 598.762)       77,879,009,954 ( 600.443)
      branch-misses                   199,513,153 (   0.20%)      1,507,651,077 (   0.21%)          248,203,369 (   0.32%)
      jobs8                              perfstat
      instructions                413,960,354,615 (    0.52)  6,842,918,558,378 (    1.45)      431,938,486,581 (    0.83)
      branches                    111,812,574,884 ( 414.224)    813,299,084,518 ( 491.173)       89,062,699,827 ( 517.795)
      branch-misses                   233,584,845 (   0.21%)      1,531,593,921 (   0.19%)          286,818,489 (   0.32%)
      jobs9                              perfstat
      instructions                465,976,220,300 (    0.53)  7,698,467,237,372 (    1.47)      486,352,600,321 (    0.84)
      branches                    125,931,456,162 ( 424.063)    915,207,005,715 ( 498.192)      100,370,404,090 ( 517.439)
      branch-misses                   256,992,445 (   0.20%)      1,782,809,816 (   0.19%)          345,239,380 (   0.34%)
      jobs10                             perfstat
      instructions                517,406,372,715 (    0.53)  8,553,527,312,900 (    1.48)      540,732,653,094 (    0.84)
      branches                    139,839,780,676 ( 427.732)  1,016,737,699,389 ( 503.172)      111,696,557,638 ( 516.750)
      branch-misses                   259,595,561 (   0.19%)      1,952,570,279 (   0.19%)          357,818,661 (   0.32%)
      
      seconds elapsed        20.630411534     96.084546565    12.743373571
      seconds elapsed        22.292627625     100.984155001   14.407413560
      seconds elapsed        22.396016966     110.344880848   14.032201392
      seconds elapsed        22.517330949     113.351459170   14.243074935
      seconds elapsed        28.548305104     156.515193765   19.159286861
      seconds elapsed        30.453538116     164.559937678   19.362492717
      seconds elapsed        33.467108086     188.486827481   21.492612173
      seconds elapsed        35.617727591     209.602677783   23.256422492
      seconds elapsed        42.584239509     243.959902566   28.458540338
      seconds elapsed        47.683632526     269.635248851   31.542404137
      
      Overall, ZSTD has slower WRITE but much faster READ (perhaps a static
      compression buffer used during the test helped ZSTD a lot), which results
      in shorter overall test times.
      
      Memory consumption (zram mm_stat file):
      
      zram LZO mm_stat
      mm_stat (jobs1): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs2): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs3): 2147483648 23068672 33558528        0 33562624        0        0
      mm_stat (jobs4): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs5): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs6): 2147483648 23068672 33558528        0 33562624        0        0
      mm_stat (jobs7): 2147483648 23068672 33558528        0 33566720        0        0
      mm_stat (jobs8): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs9): 2147483648 23068672 33558528        0 33558528        0        0
      mm_stat (jobs10): 2147483648 23068672 33558528        0 33562624        0        0
      
      zram DEFLATE mm_stat
      mm_stat (jobs1): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs2): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs3): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs4): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs5): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs6): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs7): 2147483648 16252928 25178112        0 25190400        0        0
      mm_stat (jobs8): 2147483648 16252928 25178112        0 25190400        0        0
      mm_stat (jobs9): 2147483648 16252928 25178112        0 25178112        0        0
      mm_stat (jobs10): 2147483648 16252928 25178112        0 25178112        0        0
      
      zram ZSTD mm_stat
      mm_stat (jobs1): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs2): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs3): 2147483648 11010048 16781312        0 16785408        0        0
      mm_stat (jobs4): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs5): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs6): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs7): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs8): 2147483648 11010048 16781312        0 16781312        0        0
      mm_stat (jobs9): 2147483648 11010048 16781312        0 16785408        0        0
      mm_stat (jobs10): 2147483648 11010048 16781312        0 16781312        0        0
      
      ==================================================================================
      
      Official benchmarks [1]:
      
      Compressor name         Ratio   Compression     Decompress.
      zstd 1.1.3 -1           2.877   430 MB/s        1110 MB/s
      zlib 1.2.8 -1           2.743   110 MB/s        400 MB/s
      brotli 0.5.2 -0         2.708   400 MB/s        430 MB/s
      quicklz 1.5.0 -1        2.238   550 MB/s        710 MB/s
      lzo1x 2.09 -1           2.108   650 MB/s        830 MB/s
      lz4 1.7.5               2.101   720 MB/s        3600 MB/s
      snappy 1.1.3            2.091   500 MB/s        1650 MB/s
      lzf 3.6 -1              2.077   400 MB/s        860 MB/s
      
      Minchan said:
      
      : I tested with my sample data and compared zstd with deflate.  zstd's
      : compression ratio is a little bit lower, but compression is about 3
      : times faster and decompression about 2 times faster.  It varies with the
      : data, but overall zstd is better for speed at the cost of a slightly
      : lower compression ratio (about 5%), so I believe it's worth replacing
      : deflate.
      
      [1] https://github.com/facebook/zstd
      
      Link: http://lkml.kernel.org/r/20170912050005.3247-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ef3a8b1
    • bdi: introduce BDI_CAP_SYNCHRONOUS_IO · 23c47d2a
      Committed by Minchan Kim
      As discussed at
      
        https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com>
      
      someday we will remove rw_page().  If so, we need something to detect
      such super-fast storage on which synchronous IO operations like the
      current rw_page are always a win.
      
      Introduce BDI_CAP_SYNCHRONOUS_IO to indicate such devices.  With it, we
      can use various optimization techniques.
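
      A hedged sketch of how a driver would advertise itself (the field access
      follows the backing_dev_info layout of that era; treat it as an
      approximation rather than the exact patch):

      	/* mark the queue's backing device as synchronous, rw_page-capable storage */
      	zram->disk->queue->backing_dev_info->capabilities |=
      			BDI_CAP_SYNCHRONOUS_IO;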
      
      Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23c47d2a
    • zram: set BDI_CAP_STABLE_WRITES once · e447a015
      Committed by Minchan Kim
      With fast swap storage, the platform wants to use swap more aggressively
      and swap-in is crucial to application latency.
      
      The rw_page() based synchronous devices like zram, pmem and btt are such
      fast storage.  When I profiled swap-in performance with a zram lz4
      decompress test, the software overhead was more than 70%.  Maybe it would
      be even bigger on nvdimm.
      
      This patchset reduces swap-in latency by skipping swapcache if the swap
      device is a synchronous device like a rw_page() based device.
      
      It enhances by 45% my swapin test (5G sequential swapin, no readahead)
      from 2.41sec to 1.64sec.
      
      This patch (of 4):
      
      Commit 19b7ccf8 ("block: get rid of blk_integrity_revalidate()")
      fixed a weird thing (i.e., resetting the BDI_CAP_STABLE_WRITES flag
      unconditionally whenever revalidate_disk is called), so zram no longer
      needs to reset the flag when revalidating the bdev.  Instead, set the
      flag just once when the zram device is created.
      
      It shouldn't change any behavior.
      
      Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e447a015