1. 25 2月, 2017 2 次提交
  2. 23 2月, 2017 1 次提交
  3. 02 12月, 2016 1 次提交
  4. 29 7月, 2016 8 次提交
  5. 27 7月, 2016 11 次提交
  6. 27 5月, 2016 1 次提交
  7. 21 5月, 2016 6 次提交
  8. 10 5月, 2016 1 次提交
    • S
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky 提交于
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very-very long time and
      'total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44f43e99
  9. 18 3月, 2016 2 次提交
  10. 21 1月, 2016 1 次提交
    • J
      zsmalloc: fix migrate_zspage-zs_free race condition · c102f07c
      Junil Lee 提交于
      record_obj() in migrate_zspage() does not preserve handle's
      HANDLE_PIN_BIT, set by find_aloced_obj()->trypin_tag(), and implicitly
      (accidentally) un-pins the handle, while migrate_zspage() still performs
      an explicit unpin_tag() on the that handle.  This additional explicit
      unpin_tag() introduces a race condition with zs_free(), which can pin
      that handle by this time, so the handle becomes un-pinned.
      
      Schematically, it goes like this:
      
        CPU0                                        CPU1
        migrate_zspage
          find_alloced_obj
            trypin_tag
              set HANDLE_PIN_BIT                    zs_free()
                                                      pin_tag()
        obj_malloc() -- new object, no tag
        record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
        unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
      
      The race condition may result in a NULL pointer dereference:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
        CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
        PC is at get_zspage_mapping+0x0/0x24
        LR is at obj_free.isra.22+0x64/0x128
        Call trace:
           get_zspage_mapping+0x0/0x24
           zs_free+0x88/0x114
           zram_free_page+0x64/0xcc
           zram_slot_free_notify+0x90/0x108
           swap_entry_free+0x278/0x294
           free_swap_and_cache+0x38/0x11c
           unmap_single_vma+0x480/0x5c8
           unmap_vmas+0x44/0x60
           exit_mmap+0x50/0x110
           mmput+0x58/0xe0
           do_exit+0x320/0x8dc
           do_group_exit+0x44/0xa8
           get_signal+0x538/0x580
           do_signal+0x98/0x4b8
           do_notify_resume+0x14/0x5c
      
      This patch keeps the lock bit in migration path and update value
      atomically.
      Signed-off-by: NJunil Lee <junil0814.lee@lge.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org> [4.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c102f07c
  11. 16 1月, 2016 1 次提交
  12. 07 11月, 2015 5 次提交