1. 14 12月, 2014 1 次提交
    • J
      zsmalloc: merge size_class to reduce fragmentation · 9eec4cd5
      Joonsoo Kim 提交于
      zsmalloc has many size_classes to reduce fragmentation and they are in 16
      bytes unit, for example, 16, 32, 48, etc., if PAGE_SIZE is 4096.  And,
      zsmalloc has constraint that each zspage has 4 pages at maximum.
      
      In this situation, we can see interesting aspect.  Let's think about
      size_class for 1488, 1472, ..., 1376.  To prevent external fragmentation,
      they uses 4 pages per zspage and so all they can contain 11 objects at
      maximum.
      
      16384 (4096 * 4) = 1488 * 11 + remains
      16384 (4096 * 4) = 1472 * 11 + remains
      16384 (4096 * 4) = ...
      16384 (4096 * 4) = 1376 * 11 + remains
      
      It means that they have same characteristics and classification between
      them isn't needed.  If we use one size_class for them, we can reduce
      fragementation and save some memory since both the 1488 and 1472 sized
      classes can only fit 11 objects into 4 pages, and an object that's 1472
      bytes can fit into an object that's 1488 bytes, merging these classes to
      always use objects that are 1488 bytes will reduce the total number of
      size classes.  And reducing the total number of size classes reduces
      overall fragmentation, because a wider range of compressed pages can fit
      into a single size class, leaving less unused objects in each size class.
      
      For this purpose, this patch implement size_class merging.  If there is
      size_class that have same pages_per_zspage and same number of objects per
      zspage with previous size_class, we don't create new size_class.  Instead,
      we use previous, same characteristic size_class.  With this way, above
      example sizes (1488, 1472, ..., 1376) use just one size_class so we can
      get much more memory utilization.
      
      Below is result of my simple test.
      
      TEST ENV: EXT4 on zram, mount with discard option WORKLOAD: untar kernel
      source code, remove directory in descending order in size.  (drivers arch
      fs sound include net Documentation firmware kernel tools)
      
      Each line represents orig_data_size, compr_data_size, mem_used_total,
      fragmentation overhead (mem_used - compr_data_size) and overhead ratio
      (overhead to compr_data_size), respectively, after untar and remove
      operation is executed.
      
      * untar-nomerge.out
      
      orig_size compr_size used_size overhead overhead_ratio
      525.88MB 199.16MB 210.23MB  11.08MB 5.56%
      288.32MB  97.43MB 105.63MB   8.20MB 8.41%
      177.32MB  61.12MB  69.40MB   8.28MB 13.55%
      146.47MB  47.32MB  56.10MB   8.78MB 18.55%
      124.16MB  38.85MB  48.41MB   9.55MB 24.58%
      103.93MB  31.68MB  40.93MB   9.25MB 29.21%
       84.34MB  22.86MB  32.72MB   9.86MB 43.13%
       66.87MB  14.83MB  23.83MB   9.00MB 60.70%
       60.67MB  11.11MB  18.60MB   7.49MB 67.48%
       55.86MB   8.83MB  16.61MB   7.77MB 88.03%
       53.32MB   8.01MB  15.32MB   7.31MB 91.24%
      
      * untar-merge.out
      
      orig_size compr_size used_size overhead overhead_ratio
      526.23MB 199.18MB 209.81MB  10.64MB 5.34%
      288.68MB  97.45MB 104.08MB   6.63MB 6.80%
      177.68MB  61.14MB  66.93MB   5.79MB 9.47%
      146.83MB  47.34MB  52.79MB   5.45MB 11.51%
      124.52MB  38.87MB  44.30MB   5.43MB 13.96%
      104.29MB  31.70MB  36.83MB   5.13MB 16.19%
       84.70MB  22.88MB  27.92MB   5.04MB 22.04%
       67.11MB  14.83MB  19.26MB   4.43MB 29.86%
       60.82MB  11.10MB  14.90MB   3.79MB 34.17%
       55.90MB   8.82MB  12.61MB   3.79MB 42.97%
       53.32MB   8.01MB  11.73MB   3.73MB 46.53%
      
      As you can see above result, merged one has better utilization (overhead
      ratio, 5th column) and uses less memory (mem_used_total, 3rd column).
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NDan Streetman <ddstreet@ieee.org>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: <juno.choi@lge.com>
      Cc: "seungho1.park" <seungho1.park@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9eec4cd5
  2. 10 10月, 2014 4 次提交
    • D
      zsmalloc: simplify init_zspage free obj linking · 5538c562
      Dan Streetman 提交于
      Change zsmalloc init_zspage() logic to iterate through each object on each
      of its pages, checking the offset to verify the object is on the current
      page before linking it into the zspage.
      
      The current zsmalloc init_zspage free object linking code has logic that
      relies on there only being one page per zspage when PAGE_SIZE is a
      multiple of class->size.  It calculates the number of objects for the
      current page, and iterates through all of them plus one, to account for
      the assumed partial object at the end of the page.  While this currently
      works, the logic can be simplified to just link the object at each
      successive offset until the offset is larger than PAGE_SIZE, which does
      not rely on PAGE_SIZE being a multiple of class->size.
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5538c562
    • W
      mm/zsmalloc.c: correct comment for fullness group computation · 6dd9737e
      Wang Sheng-Hui 提交于
      The letter 'f' in "n <= N/f" stands for fullness_threshold_frac, not
      1/fullness_threshold_frac.
      Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6dd9737e
    • M
      zsmalloc: change return value unit of zs_get_total_size_bytes · 722cdc17
      Minchan Kim 提交于
      zs_get_total_size_bytes returns a amount of memory zsmalloc consumed with
      *byte unit* but zsmalloc operates *page unit* rather than byte unit so
      let's change the API so benefit we could get is that reduce unnecessary
      overhead (ie, change page unit with byte unit) in zsmalloc.
      
      Since return type is pages, "zs_get_total_pages" is better than
      "zs_get_total_size_bytes".
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NDan Streetman <ddstreet@ieee.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: <juno.choi@lge.com>
      Cc: <seungho1.park@lge.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: David Horner <ds2horner@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      722cdc17
    • M
      zsmalloc: move pages_allocated to zs_pool · 13de8933
      Minchan Kim 提交于
      Currently, zram has no feature to limit memory so theoretically zram can
      deplete system memory.  Users have asked for a limit several times as even
      without exhaustion zram makes it hard to control memory usage of the
      platform.  This patchset adds the feature.
      
      Patch 1 makes zs_get_total_size_bytes faster because it would be used
      frequently in later patches for the new feature.
      
      Patch 2 changes zs_get_total_size_bytes's return unit from bytes to page
      so that zsmalloc doesn't need unnecessary operation(ie, << PAGE_SHIFT).
      
      Patch 3 adds new feature.  I added the feature into zram layer, not
      zsmalloc because limiation is zram's requirement, not zsmalloc so any
      other user using zsmalloc(ie, zpool) shouldn't affected by unnecessary
      branch of zsmalloc.  In future, if every users of zsmalloc want the
      feature, then, we could move the feature from client side to zsmalloc
      easily but vice versa would be painful.
      
      Patch 4 adds news facility to report maximum memory usage of zram so that
      this avoids user polling frequently via /sys/block/zram0/ mem_used_total
      and ensures transient max are not missed.
      
      This patch (of 4):
      
      pages_allocated has counted in size_class structure and when user of
      zsmalloc want to see total_size_bytes, it should gather all of count from
      each size_class to report the sum.
      
      It's not bad if user don't see the value often but if user start to see
      the value frequently, it would be not a good deal for performance pov.
      
      This patch moves the count from size_class to zs_pool so it could reduce
      memory footprint (from [255 * 8byte] to [sizeof(atomic_long_t)]).
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NDan Streetman <ddstreet@ieee.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: <juno.choi@lge.com>
      Cc: <seungho1.park@lge.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Reviewed-by: NDavid Horner <ds2horner@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13de8933
  3. 30 8月, 2014 1 次提交
  4. 07 8月, 2014 3 次提交
  5. 05 6月, 2014 2 次提交
  6. 20 3月, 2014 1 次提交
    • S
      zsmalloc: Fix CPU hotplug callback registration · f0e71fcd
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the zsmalloc code by using this latter form of callback registration.
      
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f0e71fcd
  7. 31 1月, 2014 2 次提交
    • M
      zsmalloc: add copyright · 31fc00bb
      Minchan Kim 提交于
      Add my copyright to the zsmalloc source code which I maintain.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31fc00bb
    • M
      zsmalloc: move it under mm · bcf1647d
      Minchan Kim 提交于
      This patch moves zsmalloc under mm directory.
      
      Before that, description will explain why we have needed custom
      allocator.
      
      Zsmalloc is a new slab-based memory allocator for storing compressed
      pages.  It is designed for low fragmentation and high allocation success
      rate on large object, but <= PAGE_SIZE allocations.
      
      zsmalloc differs from the kernel slab allocator in two primary ways to
      achieve these design goals.
      
      zsmalloc never requires high order page allocations to back slabs, or
      "size classes" in zsmalloc terms.  Instead it allows multiple
      single-order pages to be stitched together into a "zspage" which backs
      the slab.  This allows for higher allocation success rate under memory
      pressure.
      
      Also, zsmalloc allows objects to span page boundaries within the zspage.
      This allows for lower fragmentation than could be had with the kernel
      slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
      kernel slab allocator, if a page compresses to 60% of it original size,
      the memory savings gained through compression is lost in fragmentation
      because another object of the same size can't be stored in the leftover
      space.
      
      This ability to span pages results in zsmalloc allocations not being
      directly addressable by the user.  The user is given an
      non-dereferencable handle in response to an allocation request.  That
      handle must be mapped, using zs_map_object(), which returns a pointer to
      the mapped region that can be used.  The mapping is necessary since the
      object data may reside in two different noncontigious pages.
      
      The zsmalloc fulfills the allocation needs for zram perfectly
      
      [sjenning@linux.vnet.ibm.com: borrow Seth's quote]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcf1647d
  8. 18 12月, 2013 1 次提交
  9. 11 12月, 2013 2 次提交
  10. 26 11月, 2013 1 次提交
    • O
      staging: zsmalloc: Ensure handle is never 0 on success · 67296874
      Olav Haugan 提交于
      zsmalloc encodes a handle using the pfn and an object
      index. On hardware platforms with physical memory starting
      at 0x0 the pfn can be 0. This causes the encoded handle to be
      0 and is incorrectly interpreted as an allocation failure.
      
      This issue affects all current and future SoCs with physical
      memory starting at 0x0. All MSM8974 SoCs which includes
      Google Nexus 5 devices are affected.
      
      To prevent this false error we ensure that the encoded handle
      will not be 0 when allocation succeeds.
      Signed-off-by: NOlav Haugan <ohaugan@codeaurora.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67296874
  11. 24 7月, 2013 1 次提交
  12. 22 5月, 2013 1 次提交
  13. 21 5月, 2013 1 次提交
  14. 24 4月, 2013 1 次提交
    • A
      staging/zsmalloc: don't use pgtable-mapping from modules · 796ce5a7
      Arnd Bergmann 提交于
      Building zsmalloc as a module does not work on ARM because it uses
      an interface that is not exported:
      
      ERROR: "flush_tlb_kernel_range" [drivers/staging/zsmalloc/zsmalloc.ko] undefined!
      
      Since this is only used as a performance optimization and only on ARM,
      we can avoid the problem simply by not using that optimization when
      building zsmalloc it is a loadable module.
      
      flush_tlb_kernel_range is often an inline function, but out of the
      architectures that use an extern function, only powerpc exports
      it.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      796ce5a7
  15. 29 3月, 2013 1 次提交
  16. 24 2月, 2013 1 次提交
  17. 31 1月, 2013 1 次提交
  18. 30 1月, 2013 2 次提交
    • M
      staging: zsmalloc: Fix TLB coherency and build problem · 99155188
      Minchan Kim 提交于
      Recently, Matt Sealey reported he fail to build zsmalloc caused by
      using of local_flush_tlb_kernel_range which are architecture dependent
      function so !CONFIG_SMP in ARM couldn't implement it so it ends up
      build error following as.
      
        MODPOST 216 modules
        LZMA    arch/arm/boot/compressed/piggy.lzma
        AS      arch/arm/boot/compressed/lib1funcs.o
      ERROR: "v7wbi_flush_kern_tlb_range"
      [drivers/staging/zsmalloc/zsmalloc.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      make: *** Waiting for unfinished jobs....
      
      The reason we used that function is copy method by [1]
      was really slow in ARM but at that time.
      
      More severe problem is ARM can prefetch speculatively on other CPUs
      so under us, other TLBs can have an entry only if we do flush local
      CPU. Russell King pointed that. Thanks!
      We don't have many choices except using flush_tlb_kernel_range.
      
      My experiment in ARMv7 processor 4 core didn't make any difference with
      zsmapbench[2] between local_flush_tlb_kernel_range and flush_tlb_kernel_range
      but still page-table based is much better than copy-based.
      
      * bigger is better.
      
      1. local_flush_tlb_kernel_range: 3918795 mappings
      2. flush_tlb_kernel_range : 3989538 mappings
      3. copy-based: 635158 mappings
      
      This patch replace local_flush_tlb_kernel_range with
      flush_tlb_kernel_range which are avaialbe in all architectures
      because we already have used it in vmalloc allocator which are
      generic one so build problem should go away and performane loss
      shoud be void.
      
      [1] f553646a, zsmalloc: add page table mapping method
      [2] https://github.com/spartacus06/zsmapbench
      
      Cc: stable@vger.kernel.org
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Reported-by: NMatt Sealey <matt@genesi-usa.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99155188
    • S
      staging: zsmalloc: make CLASS_DELTA relative to PAGE_SIZE · d662b8eb
      Seth Jennings 提交于
      Right now ZS_SIZE_CLASS_DELTA is hardcoded to be 16.  This
      creates 254 classes for systems with 4k pages. However, on
      PPC64 with 64k pages, it creates 4095 classes which is far
      too many.
      
      This patch makes ZS_SIZE_CLASS_DELTA relative to PAGE_SIZE
      so that regardless of the page size, there will be the same
      number of classes.
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Acked-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d662b8eb
  19. 16 1月, 2013 1 次提交
  20. 14 8月, 2012 4 次提交
  21. 10 7月, 2012 4 次提交
  22. 21 6月, 2012 1 次提交
  23. 14 6月, 2012 1 次提交
  24. 12 6月, 2012 1 次提交
  25. 11 6月, 2012 1 次提交