1. 03 8月, 2017 1 次提交
    • M
      zram: do not free pool->size_class · 3189c820
      Minchan Kim 提交于
      Mike reported kernel goes oops with ltp:zram03 testcase.
      
        zram: Added device: zram0
        zram0: detected capacity change from 0 to 107374182400
        BUG: unable to handle kernel paging request at 0000306d61727a77
        IP: zs_map_object+0xb9/0x260
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
         nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
        CPU: 6 PID: 12356 Comm: swapon Tainted: G            E   4.13.0.g87b2c3fc-default #194
        Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698     , BIOS -[D6E150AUS-1.10]- 12/15/2010
        task: ffff880158d2c4c0 task.stack: ffffc90001680000
        RIP: 0010:zs_map_object+0xb9/0x260
        Call Trace:
         zram_bvec_rw.isra.26+0xe8/0x780 [zram]
         zram_rw_page+0x6e/0xa0 [zram]
         bdev_read_page+0x81/0xb0
         do_mpage_readpage+0x51a/0x710
         mpage_readpages+0x122/0x1a0
         blkdev_readpages+0x1d/0x20
         __do_page_cache_readahead+0x1b2/0x270
         ondemand_readahead+0x180/0x2c0
         page_cache_sync_readahead+0x31/0x50
         generic_file_read_iter+0x7e7/0xaf0
         blkdev_read_iter+0x37/0x40
         __vfs_read+0xce/0x140
         vfs_read+0x9e/0x150
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1a/0xa5
        Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 <8b> 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
        RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
        CR2: 0000306d61727a77
      
      He bisected the problem is [1].
      
      After commit cf8e0fed ("mm/zsmalloc: simplify zs_max_alloc_size
      handling"), zram doesn't use double pointer for pool->size_class any
      more in zs_create_pool so counter function zs_destroy_pool don't need to
      free it, either.
      
      Otherwise, it does kfree wrong address and then, kernel goes Oops.
      
      Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
      Fixes: cf8e0fed ("mm/zsmalloc: simplify zs_max_alloc_size handling")
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reported-by: NMike Galbraith <efault@gmx.de>
      Tested-by: NMike Galbraith <efault@gmx.de>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3189c820
  2. 11 7月, 2017 2 次提交
  3. 14 4月, 2017 1 次提交
  4. 02 3月, 2017 1 次提交
  5. 25 2月, 2017 2 次提交
  6. 23 2月, 2017 1 次提交
  7. 02 12月, 2016 1 次提交
  8. 29 7月, 2016 8 次提交
  9. 27 7月, 2016 11 次提交
  10. 27 5月, 2016 1 次提交
  11. 21 5月, 2016 6 次提交
  12. 10 5月, 2016 1 次提交
    • S
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky 提交于
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very-very long time and
      'total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44f43e99
  13. 18 3月, 2016 2 次提交
  14. 21 1月, 2016 1 次提交
    • J
      zsmalloc: fix migrate_zspage-zs_free race condition · c102f07c
      Junil Lee 提交于
      record_obj() in migrate_zspage() does not preserve handle's
      HANDLE_PIN_BIT, set by find_aloced_obj()->trypin_tag(), and implicitly
      (accidentally) un-pins the handle, while migrate_zspage() still performs
      an explicit unpin_tag() on the that handle.  This additional explicit
      unpin_tag() introduces a race condition with zs_free(), which can pin
      that handle by this time, so the handle becomes un-pinned.
      
      Schematically, it goes like this:
      
        CPU0                                        CPU1
        migrate_zspage
          find_alloced_obj
            trypin_tag
              set HANDLE_PIN_BIT                    zs_free()
                                                      pin_tag()
        obj_malloc() -- new object, no tag
        record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
        unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
      
      The race condition may result in a NULL pointer dereference:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
        CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
        PC is at get_zspage_mapping+0x0/0x24
        LR is at obj_free.isra.22+0x64/0x128
        Call trace:
           get_zspage_mapping+0x0/0x24
           zs_free+0x88/0x114
           zram_free_page+0x64/0xcc
           zram_slot_free_notify+0x90/0x108
           swap_entry_free+0x278/0x294
           free_swap_and_cache+0x38/0x11c
           unmap_single_vma+0x480/0x5c8
           unmap_vmas+0x44/0x60
           exit_mmap+0x50/0x110
           mmput+0x58/0xe0
           do_exit+0x320/0x8dc
           do_group_exit+0x44/0xa8
           get_signal+0x538/0x580
           do_signal+0x98/0x4b8
           do_notify_resume+0x14/0x5c
      
      This patch keeps the lock bit in migration path and update value
      atomically.
      Signed-off-by: NJunil Lee <junil0814.lee@lge.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org> [4.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c102f07c
  15. 16 1月, 2016 1 次提交