1. 16 11月, 2020 1 次提交
  2. 19 10月, 2020 1 次提交
  3. 17 10月, 2020 1 次提交
  4. 25 9月, 2020 2 次提交
  5. 24 9月, 2020 1 次提交
  6. 02 9月, 2020 1 次提交
  7. 01 7月, 2020 3 次提交
  8. 27 5月, 2020 1 次提交
  9. 28 3月, 2020 1 次提交
    • C
      block: simplify queue allocation · 3d745ea5
      Christoph Hellwig 提交于
      Current make_request based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3d745ea5
  10. 25 3月, 2020 1 次提交
  11. 01 2月, 2020 2 次提交
  12. 19 10月, 2019 1 次提交
  13. 27 4月, 2019 1 次提交
  14. 30 3月, 2019 1 次提交
    • M
      drivers/block/zram/zram_drv.c: fix idle/writeback string compare · 0bc9f5d1
      Minchan Kim 提交于
      Makoto report a below KASAN error: zram does out-of-bounds read.  Because
      strscpy copies from source up to count bytes unconditionally.  It could
      cause out-of-bounds read on next object in slab.
      
      To prevent it, use strlcpy which checks source's length automatically.
      
         BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
         Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
         ..
         Call trace:
           strscpy+0x68/0x154
           idle_store+0xc4/0x34c
           dev_attr_store+0x50/0x6c
           sysfs_kf_write+0x98/0xb4
           kernfs_fop_write+0x198/0x260
           __vfs_write+0x10c/0x338
           vfs_write+0x114/0x238
           SyS_write+0xc8/0x168
           __sys_trace_return+0x0/0x4
      
         Allocated by task 1314:
          __kmalloc+0x280/0x318
          kernfs_fop_write+0xac/0x260
          __vfs_write+0x10c/0x338
          vfs_write+0x114/0x238
          SyS_write+0xc8/0x168
          __sys_trace_return+0x0/0x4
      
         Freed by task 2855:
          kfree+0x138/0x630
          kernfs_put_open_node+0x10c/0x124
          kernfs_fop_release+0xd8/0x114
          __fput+0x130/0x2a4
          ____fput+0x1c/0x28
          task_work_run+0x16c/0x1c8
          do_notify_resume+0x2bc/0x107c
          work_pending+0x8/0x10
      
         The buggy address belongs to the object at ffffffc0c3495a00
          which belongs to the cache kmalloc-128 of size 128
         The buggy address is located 0 bytes inside of
          128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
         The buggy address belongs to the page:
         page:ffffffbf030d2500 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
         flags: 0x4000000000010200(slab|head)
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                            ^
          ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org
      Cc: <stable@vger.kernel.org>	[5.0]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reported-by: NMakoto Wu <makotowu@google.com>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0bc9f5d1
  15. 15 3月, 2019 1 次提交
  16. 09 1月, 2019 1 次提交
    • M
      zram: idle writeback fixes and cleanup · 1d69a3f8
      Minchan Kim 提交于
      This patch includes some fixes and cleanup for idle-page writeback.
      
      1. writeback_limit interface
      
      Now writeback_limit interface is rather conusing.  For example, once
      writeback limit budget is exausted, admin can see 0 from
      /sys/block/zramX/writeback_limit which is same semantic with disable
      writeback_limit at this moment.  IOW, admin cannot tell that zero came
      from disable writeback limit or exausted writeback limit.
      
      To make the interface clear, let's sepatate enable of writeback limit to
      another knob - /sys/block/zram0/writeback_limit_enable
      
      * before:
        while true :
          # to re-enable writeback limit once previous one is used up
          echo 0 > /sys/block/zram0/writeback_limit
          echo $((200<<20)) > /sys/block/zram0/writeback_limit
          ..
          .. # used up the writeback limit budget
      
      * new
        # To enable writeback limit, from the beginning, admin should
        # enable it.
        echo $((200<<20)) > /sys/block/zram0/writeback_limit
        echo 1 > /sys/block/zram/0/writeback_limit_enable
        while true :
          echo $((200<<20)) > /sys/block/zram0/writeback_limit
          ..
          .. # used up the writeback limit budget
      
      It's much strightforward.
      
      2. fix condition check idle/huge writeback mode check
      
      The mode in writeback_store is not bit opeartion any more so no need to
      use bit operations.  Furthermore, current condition check is broken in
      that it does writeback every pages regardless of huge/idle.
      
      3. clean up idle_store
      
      No need to use goto.
      
      [minchan@kernel.org: missed spin_lock_init]
        Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com
      Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Suggested-by: NJohn Dias <joaodias@google.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Srinivas Paladugu <srnvs@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d69a3f8
  17. 29 12月, 2018 7 次提交
    • M
      zram: writeback throttle · bb416d18
      Minchan Kim 提交于
      If there are lots of write IO with flash device, it could have a
      wearout problem of storage. To overcome the problem, admin needs
      to design write limitation to guarantee flash health
      for entire product life.
      
      This patch creates a new knob "writeback_limit" for zram.
      
      writeback_limit's default value is 0 so that it doesn't limit
      any writeback. If admin want to measure writeback count in a
      certain period, he could know it via /sys/block/zram0/bd_stat's
      3rd column.
      
      If admin want to limit writeback as per-day 400M, he could do it
      like below.
      
      	MB_SHIFT=20
      	4K_SHIFT=12
      	echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
      		/sys/block/zram0/writeback_limit.
      
      If admin want to allow further write again, he could do it like below
      
      	echo 0 > /sys/block/zram0/writeback_limit
      
      If admin want to see remaining writeback budget,
      
      	cat /sys/block/zram0/writeback_limit
      
      The writeback_limit count will reset whenever you reset zram (e.g., system
      reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of writeback
      happened until you reset the zram to allocate extra writeback budget in
      next setting is user's job.
      
      [minchan@kernel.org: v4]
        Link: http://lkml.kernel.org/r/20181203024045.153534-8-minchan@kernel.org
      Link: http://lkml.kernel.org/r/20181127055429.251614-8-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb416d18
    • M
      zram: add bd_stat statistics · 23eddf39
      Minchan Kim 提交于
      bd_stat represents things that happened in the backing device.  Currently
      it supports bd_counts, bd_reads and bd_writes which are helpful to
      understand wearout of flash and memory saving.
      
      [minchan@kernel.org: v4]
        Link: http://lkml.kernel.org/r/20181203024045.153534-7-minchan@kernel.org
      Link: http://lkml.kernel.org/r/20181127055429.251614-7-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23eddf39
    • M
      zram: support idle/huge page writeback · a939888e
      Minchan Kim 提交于
      Add a new feature "zram idle/huge page writeback".  In the zram-swap use
      case, zram usually has many idle/huge swap pages.  It's pointless to keep
      them in memory (ie, zram).
      
      To solve this problem, this feature introduces idle/huge page writeback to
      the backing device so the goal is to save more memory space on embedded
      systems.
      
      Normal sequence to use idle/huge page writeback feature is as follows,
      
      while (1) {
              # mark allocated zram slot to idle
              echo all > /sys/block/zram0/idle
              # leave system working for several hours
              # Unless there is no access for some blocks on zram,
      	# they are still IDLE marked pages.
      
              echo "idle" > /sys/block/zram0/writeback
      	or/and
      	echo "huge" > /sys/block/zram0/writeback
              # write the IDLE or/and huge marked slot into backing device
      	# and free the memory.
      }
      
      Per the discussion at
      https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
      
      This patch removes direct incommpressibe page writeback feature
      (d2afd25114f4 ("zram: write incompressible pages to backing device")).
      
      Below concerns from Sergey:
      == &< ==
      
      "IDLE writeback" is superior to "incompressible writeback".
      
      "incompressible writeback" is completely unpredictable and uncontrollable;
      it depens on data patterns and compression algorithms.  While "IDLE
      writeback" is predictable.
      
      I even suspect, that, *ideally*, we can remove "incompressible writeback".
      "IDLE pages" is a super set which also includes "incompressible" pages.
      So, technically, we still can do "incompressible writeback" from "IDLE
      writeback" path; but a much more reasonable one, based on a page idling
      period.
      
      I understand that you want to keep "direct incompressible writeback"
      around.  ZRAM is especially popular on devices which do suffer from flash
      wearout, so I can see "incompressible writeback" path becoming a dead
      code, long term.
      
      == &< ==
      
      Below concerns from Minchan:
      == &< ==
      
      My concern is if we enable CONFIG_ZRAM_WRITEBACK in this implementation,
      both hugepage/idlepage writeck will turn on.  However someuser want to
      enable only idlepage writeback so we need to introduce turn on/off knob
      for hugepage or new CONFIG_ZRAM_IDLEPAGE_WRITEBACK for those usecase.  I
      don't want to make it complicated *if possible*.
      
      Long term, I imagine we need to make VM aware of new swap hierarchy a
      little bit different with as-is.  For example, first high priority swap
      can return -EIO or -ENOCOMP, swap try to fallback to next lower priority
      swap device.  With that, hugepage writeback will work tranparently.
      
      So we could regard it as regression because incompressible pages doesn't
      go to backing storage automatically.  Instead, user should do it via "echo
      huge" > /sys/block/zram/writeback" manually.
      
      == &< ==
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a939888e
    • M
      zram: introduce ZRAM_IDLE flag · e82592c4
      Minchan Kim 提交于
      To support idle page writeback with upcoming patches, this patch
      introduces a new ZRAM_IDLE flag.
      
      Userspace can mark zram slots as "idle" via
      	"echo all > /sys/block/zramX/idle"
      which marks every allocated zram slot as ZRAM_IDLE.
      User could see it by /sys/kernel/debug/zram/zram0/block_state.
      
                300    75.033841 ...i
                301    63.806904 s..i
                302    63.806919 ..hi
      
      Once there is IO for the slot, the mark will be disappeared.
      
      	  300    75.033841 ...
                301    63.806904 s..i
                302    63.806919 ..hi
      
      Therefore, 300th block is idle zpage. With this feature,
      user can how many zram has idle pages which are waste of memory.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-5-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e82592c4
    • M
      zram: refactor flags and writeback stuff · 7e529283
      Minchan Kim 提交于
      Rename some variables and restructure some code for better readability in
      writeback and zs_free_page.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-4-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e529283
    • M
      zram: fix double free backing device · 5547932d
      Minchan Kim 提交于
      If blkdev_get fails, we shouldn't do blkdev_put.  Otherwise, kernel emits
      below log.  This patch fixes it.
      
        WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120
        Modules linked in:
        CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        RIP: 0010:blkdev_put+0x105/0x120
        Call Trace:
          __x64_sys_swapoff+0x46d/0x490
          do_syscall_64+0x5a/0x190
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
        irq event stamp: 4466
        hardirqs last  enabled at (4465):  __free_pages_ok+0x1e3/0x490
        hardirqs last disabled at (4466):  trace_hardirqs_off_thunk+0x1a/0x1c
        softirqs last  enabled at (3420):  __do_softirq+0x333/0x446
        softirqs last disabled at (3407):  irq_exit+0xd1/0xe0
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5547932d
    • M
      zram: fix lockdep warning of free block handling · 3c9959e0
      Minchan Kim 提交于
      Patch series "zram idle page writeback", v3.
      
      Inherently, swap device has many idle pages which are rare touched since
      it was allocated.  It is never problem if we use storage device as swap.
      However, it's just waste for zram-swap.
      
      This patchset supports zram idle page writeback feature.
      
      * Admin can define what is idle page "no access since X time ago"
      * Admin can define when zram should writeback them
      * Admin can define when zram should stop writeback to prevent wearout
      
      Details are in each patch's description.
      
      This patch (of 7):
      
        ================================
        WARNING: inconsistent lock state
        4.19.0+ #390 Not tainted
        --------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
        00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
        {SOFTIRQ-ON-W} state was registered at:
          _raw_spin_lock+0x2c/0x40
          zram_make_request+0x755/0xdc9
          generic_make_request+0x373/0x6a0
          submit_bio+0x6c/0x140
          __swap_writepage+0x3a8/0x480
          shrink_page_list+0x1102/0x1a60
          shrink_inactive_list+0x21b/0x3f0
          shrink_node_memcg.constprop.99+0x4f8/0x7e0
          shrink_node+0x7d/0x2f0
          do_try_to_free_pages+0xe0/0x300
          try_to_free_pages+0x116/0x2b0
          __alloc_pages_slowpath+0x3f4/0xf80
          __alloc_pages_nodemask+0x2a2/0x2f0
          __handle_mm_fault+0x42e/0xb50
          handle_mm_fault+0x55/0xb0
          __do_page_fault+0x235/0x4b0
          page_fault+0x1e/0x30
        irq event stamp: 228412
        hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
        hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
        softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
        softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&zram->bitmap_lock)->rlock);
          <Interrupt>
            lock(&(&zram->bitmap_lock)->rlock);
      
         *** DEADLOCK ***
      
        no locks held by zram_verify/2095.
      
        stack backtrace:
        CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack+0x67/0x9b
         print_usage_bug+0x1bd/0x1d3
         mark_lock+0x4aa/0x540
         __lock_acquire+0x51d/0x1300
         lock_acquire+0x90/0x180
         _raw_spin_lock+0x2c/0x40
         put_entry_bdev+0x1e/0x50
         zram_free_page+0xf6/0x110
         zram_slot_free_notify+0x42/0xa0
         end_swap_bio_read+0x5b/0x170
         blk_update_request+0x8f/0x340
         scsi_end_request+0x2c/0x1e0
         scsi_io_completion+0x98/0x650
         blk_done_softirq+0x9e/0xd0
         __do_softirq+0xcc/0x427
         irq_exit+0xd1/0xe0
         do_IRQ+0x93/0x120
         common_interrupt+0xf/0xf
         </IRQ>
      
      With writeback feature, zram_slot_free_notify could be called in softirq
      context by end_swap_bio_read.  However, bitmap_lock is not aware of that
      so lockdep yell out:
      
        get_entry_bdev
        spin_lock(bitmap->lock);
        irq
        softirq
        end_swap_bio_read
        zram_slot_free_notify
        zram_slot_lock <-- deadlock prone
        zram_free_page
        put_entry_bdev
        spin_lock(bitmap->lock); <-- deadlock prone
      
      With akpm's suggestion (i.e.  bitmap operation is already atomic), we
      could remove bitmap lock.  It might fail to find a empty slot if serious
      contention happens.  However, it's not severe problem because huge page
      writeback has already possiblity to fail if there is severe memory
      pressure.  Worst case is just keeping the incompressible in memory, not
      storage.
      
      The other problem is zram_slot_lock in zram_slot_slot_free_notify.  To
      make it safe is this patch introduces zram_slot_trylock where
      zram_slot_free_notify uses it.  Although it's rare to be contented, this
      patch adds new debug stat "miss_free" to keep monitoring how often it
      happens.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c9959e0
  18. 28 9月, 2018 1 次提交
  19. 23 8月, 2018 1 次提交
  20. 11 8月, 2018 1 次提交
  21. 18 7月, 2018 2 次提交
    • M
      block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Michael Callahan 提交于
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should et updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function.  It's now
      indexed by op_is_write().
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: NMichael Callahan <michaelcallahan@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ddcf35d3
    • T
      block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo 提交于
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3f289dcb
  22. 13 6月, 2018 1 次提交
    • K
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook 提交于
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fad953ce
  23. 08 6月, 2018 4 次提交
  24. 06 4月, 2018 1 次提交
    • S
      zram: drop max_zpage_size and use zs_huge_class_size() · 60f5921a
      Sergey Senozhatsky 提交于
      Remove ZRAM's enforced "huge object" value and use zsmalloc huge-class
      watermark instead, which makes more sense.
      
      TEST
      - I used a 1G zram device, LZO compression back-end, original
        data set size was 444MB. Looking at zsmalloc classes stats the
        test ended up to be pretty fair.
      
      BASE ZRAM/ZSMALLOC
      =====================
      zram mm_stat
      
      498978816 191482495 199831552        0 199831552    15634        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0           380        380        304                4        0
         254  4096           0            0         10620      10620      10620                1        0
      
       Total                 7           46        106982     106187      48787                         0
      
      PATCHED ZRAM/ZSMALLOC
      =====================
      
      zram mm_stat
      
      498978816 182579184 194248704        0 194248704    15628        0
      
      zsmalloc classes
      
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0          7180       7180       5744                4        0
         254  4096           0            0          3820       3820       3820                1        0
      
       Total                 8           45        106959     106193      47424                         0
      
      As we can see, we reduced the number of objects stored in class-4096,
      because a huge number of objects which we previously forcibly stored in
      class-4096 now stored in non-huge class-3264.  This results in lower
      memory consumption:
      
      - zsmalloc now uses 47424 physical pages, which is less than 48787 pages
        zsmalloc used before.
      
      - objects that we store in class-3264 share zspages.  That's why overall
        the number of pages that both class-4096 and class-3264 consumed went
        down from 10924 to 9564.
      
      [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
        Link: http://lkml.kernel.org/r/20180314081833.1096-3-sergey.senozhatsky@gmail.com
      Link: http://lkml.kernel.org/r/20180306070639.7389-3-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60f5921a
  25. 09 3月, 2018 1 次提交
  26. 01 3月, 2018 1 次提交