1. 11 Jul, 2017 (1 commit)
  2. 14 Apr, 2017 (1 commit)
  3. 02 Mar, 2017 (1 commit)
  4. 25 Feb, 2017 (2 commits)
  5. 23 Feb, 2017 (1 commit)
  6. 02 Dec, 2016 (1 commit)
  7. 29 Jul, 2016 (8 commits)
  8. 27 Jul, 2016 (11 commits)
  9. 27 May, 2016 (1 commit)
  10. 21 May, 2016 (6 commits)
  11. 10 May, 2016 (1 commit)
    •
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Committed by Sergey Senozhatsky
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can be changed by concurrent ops happening on other CPUs
      2) the shrinker invokes it from preemptible context
      
      Thus, depending on the circumstances, OBJ_ALLOCATED can become less
      than OBJ_USED, which can result in either a very high or a negative
      `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed
      value, the resulting `total_scan' is positive and, what is more
      worrisome, insanely huge. This value is later used in the
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very, very long time, and
      `total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is the shrinker->scan_objects() == SHRINK_STOP test, which
      is a bit too weak to rely on, especially in heavy zsmalloc-usage
      scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
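
      A minimal sketch of the snapshot approach (a reconstruction, not
      the verbatim patch; zs_stat_get() and get_maxobj_per_zspage() as
      used elsewhere in zsmalloc): read each counter once into a local
      and handle the raced, underflowing case explicitly:

        static unsigned long zs_can_compact(struct size_class *class)
        {
                unsigned long obj_wasted;
                /* one read per counter: a consistent-enough snapshot */
                unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
                unsigned long obj_used = zs_stat_get(class, OBJ_USED);

                /* the reads raced: report nothing to compact */
                if (obj_allocated <= obj_used)
                        return 0;

                obj_wasted = obj_allocated - obj_used;
                obj_wasted /= get_maxobj_per_zspage(class->size,
                                class->pages_per_zspage);

                return obj_wasted * class->pages_per_zspage;
        }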
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 18 Mar, 2016 (2 commits)
  13. 21 Jan, 2016 (1 commit)
    •
      zsmalloc: fix migrate_zspage-zs_free race condition · c102f07c
      Committed by Junil Lee
      record_obj() in migrate_zspage() does not preserve the handle's
      HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and
      implicitly (accidentally) un-pins the handle, while migrate_zspage()
      still performs an explicit unpin_tag() on that handle.  This
      additional explicit unpin_tag() introduces a race condition with
      zs_free(), which can pin that handle by this time, so the handle
      becomes un-pinned.
      
      Schematically, it goes like this:
      
        CPU0                                        CPU1
        migrate_zspage
          find_alloced_obj
            trypin_tag
              set HANDLE_PIN_BIT                    zs_free()
                                                      pin_tag()
        obj_malloc() -- new object, no tag
        record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
        unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
      
      The race condition may result in a NULL pointer dereference:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
        CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
        PC is at get_zspage_mapping+0x0/0x24
        LR is at obj_free.isra.22+0x64/0x128
        Call trace:
           get_zspage_mapping+0x0/0x24
           zs_free+0x88/0x114
           zram_free_page+0x64/0xcc
           zram_slot_free_notify+0x90/0x108
           swap_entry_free+0x278/0x294
           free_swap_and_cache+0x38/0x11c
           unmap_single_vma+0x480/0x5c8
           unmap_vmas+0x44/0x60
           exit_mmap+0x50/0x110
           mmput+0x58/0xe0
           do_exit+0x320/0x8dc
           do_group_exit+0x44/0xa8
           get_signal+0x538/0x580
           do_signal+0x98/0x4b8
           do_notify_resume+0x14/0x5c
      
      This patch keeps the lock bit set across the migration path and
      updates the handle's value atomically.
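
      A minimal sketch of the idea (a reconstruction, not the verbatim
      patch): OR the pin bit back into the new object value before
      publishing it, and make the store itself tear-free so zs_free()
      never observes a transiently un-pinned handle:

        /* lsb of @obj is the handle lock; the store must not tear */
        static void record_obj(unsigned long handle, unsigned long obj)
        {
                WRITE_ONCE(*(unsigned long *)handle, obj);
        }

        /* in migrate_zspage(), after copying the object: */
        free_obj |= BIT(HANDLE_PIN_BIT); /* keep the handle pinned */
        record_obj(handle, free_obj);
        unpin_tag(handle);               /* the only remaining un-pin */
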
      Signed-off-by: Junil Lee <junil0814.lee@lge.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org> [4.1+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 16 Jan, 2016 (1 commit)
  15. 07 Nov, 2015 (2 commits)
    •
      zsmalloc: use page->private instead of page->first_page · 32e7ba1e
      Committed by Kirill A. Shutemov
      We are going to rework how compound_head() works.  It will not use
      page->first_page as it does now.
      
      The only other user of page->first_page beyond compound pages is
      zsmalloc.
      
      Let's use page->private instead of page->first_page here. It occupies
      the same storage space.
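
      As a sketch, the swap is mechanical (get_first_page() is zsmalloc's
      own helper; set_page_private()/page_private() are the stock
      page->private accessors):

        static struct page *get_first_page(struct page *page)
        {
                if (is_first_page(page))
                        return page;
                /* was: return page->first_page; */
                return (struct page *)page_private(page);
        }

        /* and when a sub-page is linked into a zspage: */
        set_page_private(page, (unsigned long)first_page);
        SetPagePrivate(page);
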
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      zsmalloc: reduce size_class memory usage · 6fe5186f
      Committed by Sergey Senozhatsky
      Each `struct size_class' contains a `struct zs_size_stat': an array
      of NR_ZS_STAT_TYPE `unsigned long's.  For zsmalloc built without
      CONFIG_ZSMALLOC_STAT this wastes `2 * sizeof(unsigned long)' per
      class.
      
      The patch removes unneeded `struct zs_size_stat' members by redefining
      NR_ZS_STAT_TYPE (max stat idx in array).
      
      Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
      GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
      larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
      the moment.
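
      A sketch of the trick (a reconstruction from the description, not
      the verbatim patch): put the stats that only exist under
      CONFIG_ZSMALLOC_STAT at the end of the enum and size the array with
      a conditional NR_ZS_STAT_TYPE; the bounds check below is then a
      compile-time constant, so the dead calls are eliminated:

        enum zs_stat_type {
                OBJ_ALLOCATED,
                OBJ_USED,
                CLASS_ALMOST_FULL,
                CLASS_ALMOST_EMPTY,
        };

        #ifdef CONFIG_ZSMALLOC_STAT
        #define NR_ZS_STAT_TYPE (CLASS_ALMOST_EMPTY + 1)
        #else
        #define NR_ZS_STAT_TYPE (OBJ_USED + 1)
        #endif

        static inline void zs_stat_inc(struct size_class *class,
                                enum zs_stat_type type, unsigned long cnt)
        {
                /* constant-folded: vanishes for stats beyond the array */
                if (type < NR_ZS_STAT_TYPE)
                        class->stats.objs[type] += cnt;
        }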
      
      ./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
      add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
      function                                     old     new   delta
      fix_fullness_group                            97      94      -3
      insert_zspage                                100      86     -14
      remove_zspage                                141     119     -22
      
      To summarize:
      a) each class now uses less memory
      b) we avoid a number of dec/inc stats (a minor optimization,
         but still).
      
      The gain will increase once we introduce additional stats.
      
      A simple IO test:
      
      iozone -t 4 -R -r 32K -s 60M -I +Z
                              patched                 base
      "  Initial write "       4145599.06              4127509.75
      "        Rewrite "       4146225.94              4223618.50
      "           Read "      17157606.00             17211329.50
      "        Re-read "      17380428.00             17267650.50
      "   Reverse Read "      16742768.00             16162732.75
      "    Stride read "      16586245.75             16073934.25
      "    Random read "      16349587.50             15799401.75
      " Mixed workload "      10344230.62              9775551.50
      "   Random write "       4277700.62              4260019.69
      "         Pwrite "       4302049.12              4313703.88
      "          Pread "       6164463.16              6126536.72
      "         Fwrite "       7131195.00              6952586.00
      "          Fread "      12682602.25             12619207.50
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>