1. 27 Jan 2021, 2 commits
  2. 06 Jan 2021, 1 commit
  3. 08 Dec 2020, 3 commits
  4. 02 Dec 2020, 1 commit
  5. 18 Nov 2020, 2 commits
  6. 04 Sep 2020, 1 commit
  7. 30 Jun 2020, 1 commit
  8. 13 May 2020, 1 commit
  9. 23 Dec 2019, 1 commit
    • iommu/iova: Silence warnings under memory pressure · 944c9175
      Authored by Qian Cai
      When running heavy memory pressure workloads, this 5+ year old system is
      throwing the endless warnings below because disk IO is too slow to
      recover from swapping. Since the volume from alloc_iova_fast() could be
      large, once it calls printk(), it will trigger disk IO (writing to the
      log files) and pending softirqs, which could cause an infinite loop and
      let the ongoing memory reclaim make no progress for days. This is the
      Intel counterpart of the already-merged AMD change; see commit
      3d708895 ("iommu/amd: Silence warnings under memory pressure").
      Since the allocation failure will be reported in intel_alloc_iova(),
      just call dev_err_once() there, because even "ratelimited" is too much,
      and silence the one in alloc_iova_mem() to avoid the expensive
      warn_alloc().
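
      A minimal sketch of the approach described above (helper names follow
      drivers/iommu/iova.c and the Intel DMA path; this is not the exact diff):

          /* iova allocator: suppress the expensive warn_alloc() splat. */
          static struct iova *alloc_iova_mem(void)
          {
                  return kmem_cache_zalloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
          }

          /* intel_alloc_iova(): report the failure once per device instead of
           * printing on every failed allocation attempt. */
          if (unlikely(!iova_pfn)) {
                  dev_err_once(dev, "Allocating %ld-page iova failed", nrpages);
                  return 0;
          }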
      
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       slab_out_of_memory: 66 callbacks suppressed
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
         cache: iommu_iova, object size: 40, buffer size: 448, default order:
      0, min order: 0
         node 0: slabs: 1822, objs: 16398, free: 0
         node 1: slabs: 2051, objs: 18459, free: 31
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
         cache: iommu_iova, object size: 40, buffer size: 448, default order:
      0, min order: 0
         node 0: slabs: 1822, objs: 16398, free: 0
         node 1: slabs: 2051, objs: 18459, free: 31
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
         cache: iommu_iova, object size: 40, buffer size: 448, default order:
      0, min order: 0
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         node 0: slabs: 697, objs: 4182, free: 0
         node 0: slabs: 697, objs: 4182, free: 0
         node 0: slabs: 697, objs: 4182, free: 0
         node 0: slabs: 697, objs: 4182, free: 0
         node 1: slabs: 381, objs: 2286, free: 27
         node 1: slabs: 381, objs: 2286, free: 27
         node 1: slabs: 381, objs: 2286, free: 27
         node 1: slabs: 381, objs: 2286, free: 27
         node 0: slabs: 1822, objs: 16398, free: 0
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         node 1: slabs: 2051, objs: 18459, free: 31
         node 0: slabs: 697, objs: 4182, free: 0
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
         node 1: slabs: 381, objs: 2286, free: 27
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         node 0: slabs: 697, objs: 4182, free: 0
         node 1: slabs: 381, objs: 2286, free: 27
       hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
       warn_alloc: 96 callbacks suppressed
       kworker/11:1H: page allocation failure: order:0,
      mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
       CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G    B
       Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
      12/27/2015
       Workqueue: kblockd blk_mq_run_work_fn
       Call Trace:
        dump_stack+0xa0/0xea
        warn_alloc.cold.94+0x8a/0x12d
        __alloc_pages_slowpath+0x1750/0x1870
        __alloc_pages_nodemask+0x58a/0x710
        alloc_pages_current+0x9c/0x110
        alloc_slab_page+0xc9/0x760
        allocate_slab+0x48f/0x5d0
        new_slab+0x46/0x70
        ___slab_alloc+0x4ab/0x7b0
        __slab_alloc+0x43/0x70
        kmem_cache_alloc+0x2dd/0x450
       SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
        alloc_iova+0x33/0x210
         cache: skbuff_head_cache, object size: 208, buffer size: 640, default
      order: 0, min order: 0
         node 0: slabs: 697, objs: 4182, free: 0
        alloc_iova_fast+0x62/0x3d1
         node 1: slabs: 381, objs: 2286, free: 27
        intel_alloc_iova+0xce/0xe0
        intel_map_sg+0xed/0x410
        scsi_dma_map+0xd7/0x160
        scsi_queue_rq+0xbf7/0x1310
        blk_mq_dispatch_rq_list+0x4d9/0xbc0
        blk_mq_sched_dispatch_requests+0x24a/0x300
        __blk_mq_run_hw_queue+0x156/0x230
        blk_mq_run_work_fn+0x3b/0x40
        process_one_work+0x579/0xb90
        worker_thread+0x63/0x5b0
        kthread+0x1e6/0x210
        ret_from_fork+0x3a/0x50
       Mem-Info:
       active_anon:2422723 inactive_anon:361971 isolated_anon:34403
        active_file:2285 inactive_file:1838 isolated_file:0
        unevictable:0 dirty:1 writeback:5 unstable:0
        slab_reclaimable:13972 slab_unreclaimable:453879
        mapped:2380 shmem:154 pagetables:6948 bounce:0
        free:19133 free_pcp:7363 free_cma:0
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  10. 17 Dec 2019, 1 commit
    • iommu/iova: Init the struct iova to fix the possible memleak · 472d26df
      Authored by Xiaotao Yin
      During an ethernet (Marvell octeontx2) ring buffer resize test:
      ethtool -G eth1 rx <rx ring size> tx <tx ring size>
      the following kmemleak report sometimes appears:
      
      unreferenced object 0xffff000b85421340 (size 64):
        comm "ethtool", pid 867, jiffies 4295323539 (age 550.500s)
        hex dump (first 64 bytes):
          80 13 42 85 0b 00 ff ff ff ff ff ff ff ff ff ff  ..B.............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000001b204ddf>] kmem_cache_alloc+0x1b0/0x350
          [<00000000d9ef2e50>] alloc_iova+0x3c/0x168
          [<00000000ea30f99d>] alloc_iova_fast+0x7c/0x2d8
          [<00000000b8bb2f1f>] iommu_dma_alloc_iova.isra.0+0x12c/0x138
          [<000000002f1a43b5>] __iommu_dma_map+0x8c/0xf8
          [<00000000ecde7899>] iommu_dma_map_page+0x98/0xf8
          [<0000000082004e59>] otx2_alloc_rbuf+0xf4/0x158
          [<000000002b107f6b>] otx2_rq_aura_pool_init+0x110/0x270
          [<00000000c3d563c7>] otx2_open+0x15c/0x734
          [<00000000a2f5f3a8>] otx2_dev_open+0x3c/0x68
          [<00000000456a98b5>] otx2_set_ringparam+0x1ac/0x1d4
          [<00000000f2fbb819>] dev_ethtool+0xb84/0x2028
          [<0000000069b67c5a>] dev_ioctl+0x248/0x3a0
          [<00000000af38663a>] sock_ioctl+0x280/0x638
          [<000000002582384c>] do_vfs_ioctl+0x8b0/0xa80
          [<000000004e1a2c02>] ksys_ioctl+0x84/0xb8
      
      The reason:
      When alloc_iova_mem() does not zero-initialize the struct iova, pfn_lo
      can by chance equal IOVA_ANCHOR, so when __alloc_and_insert_iova_range()
      returns -ENOMEM (iova32_full), the new_iova is not freed in
      free_iova_mem().
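
      A minimal sketch of the leak and the fix (assuming the upstream helper
      names; free_iova_mem() skips the rbtree anchor node by checking pfn_lo):

          static void free_iova_mem(struct iova *iova)
          {
                  /* Uninitialized garbage in pfn_lo can match IOVA_ANCHOR and
                   * make this allocation look like the anchor node - leaking it. */
                  if (iova->pfn_lo != IOVA_ANCHOR)
                          kmem_cache_free(iova_cache, iova);
          }

          static struct iova *alloc_iova_mem(void)
          {
                  /* Zero-initialize so pfn_lo can never accidentally equal
                   * IOVA_ANCHOR. */
                  return kmem_cache_zalloc(iova_cache, GFP_ATOMIC);
          }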
      
      Fixes: bb68b2fb ("iommu/iova: Add rbtree anchor node")
      Signed-off-by: Xiaotao Yin <xiaotao.yin@windriver.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  11. 30 Aug 2019, 1 commit
    • iommu/iova: Avoid false sharing on fq_timer_on · 0d87308c
      Authored by Eric Dumazet
      In commit 14bd9a60 ("iommu/iova: Separate atomic variables
      to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
      in queue_iova() was causing a performance loss and moved critical fields
      so that the false sharing would not impact them.
      
      However, avoiding the false sharing in the first place seems easy.
      We should attempt the atomic_cmpxchg() no more than 100 times
      per second. Adding an atomic_read() will keep the cache
      line mostly shared.
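
      A sketch of the resulting pattern in queue_iova() (field names as in the
      existing flush-queue code):

          /* Only touch the shared cache line with a store (the cmpxchg) when a
           * plain read says the timer is not already pending. */
          if (!atomic_read(&iovad->fq_timer_on) &&
              !atomic_cmpxchg(&iovad->fq_timer_on, 0, 1))
                  mod_timer(&iovad->fq_timer,
                            jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));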
      
      This false sharing came with commit 9a005a80
      ("iommu/iova: Add flush timer").
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 9a005a80 ("iommu/iova: Add flush timer")
      Cc: Jinyu Qi <jinyuqi@huawei.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  12. 22 Jul 2019, 2 commits
    • iommu/iova: Remove stale cached32_node · 9eed17d3
      Authored by Chris Wilson
      Since the cached32_node is allowed to be advanced above dma_32bit_pfn
      (to provide a shortcut into the limited range), we need to be careful to
      update the cached32_node when the node about to be freed is the
      cached32_node itself.
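
      A simplified sketch of the idea in __cached_rbnode_delete_update(); only
      the case relevant to this bug is shown:

          static void
          __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
          {
                  struct iova *cached = rb_entry(iovad->cached32_node,
                                                 struct iova, node);

                  /* If the node about to be freed is the cached32_node itself,
                   * advance the cache before the rb_node disappears. */
                  if (free == cached)
                          iovad->cached32_node = rb_next(&free->node);
          }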
      
      [   48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
      [   48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
      [   48.477843]
      [   48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G     U            5.2.0+ #735
      [   48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
      [   48.478047] Workqueue: i915 __i915_gem_free_work [i915]
      [   48.478075] Call Trace:
      [   48.478111]  dump_stack+0x5b/0x90
      [   48.478137]  print_address_description+0x67/0x237
      [   48.478178]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478212]  __kasan_report.cold.3+0x1c/0x38
      [   48.478240]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478280]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478308]  __cached_rbnode_delete_update+0x68/0x110
      [   48.478344]  private_free_iova+0x2b/0x60
      [   48.478378]  iova_magazine_free_pfns+0x46/0xa0
      [   48.478403]  free_iova_fast+0x277/0x340
      [   48.478443]  fq_ring_free+0x15a/0x1a0
      [   48.478473]  queue_iova+0x19c/0x1f0
      [   48.478597]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
      [   48.478712]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
      [   48.478826]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
      [   48.478940]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
      [   48.479053]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
      [   48.479081]  ? __sg_free_table+0x9e/0xf0
      [   48.479116]  ? kfree+0x7f/0x150
      [   48.479234]  i915_vma_unbind+0x1e2/0x240 [i915]
      [   48.479352]  i915_vma_destroy+0x3a/0x280 [i915]
      [   48.479465]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
      [   48.479579]  __i915_gem_free_work+0x41/0xa0 [i915]
      [   48.479607]  process_one_work+0x495/0x710
      [   48.479642]  worker_thread+0x4c7/0x6f0
      [   48.479687]  ? process_one_work+0x710/0x710
      [   48.479724]  kthread+0x1b2/0x1d0
      [   48.479774]  ? kthread_create_worker_on_cpu+0xa0/0xa0
      [   48.479820]  ret_from_fork+0x1f/0x30
      [   48.479864]
      [   48.479907] Allocated by task 631:
      [   48.479944]  save_stack+0x19/0x80
      [   48.479994]  __kasan_kmalloc.constprop.6+0xc1/0xd0
      [   48.480038]  kmem_cache_alloc+0x91/0xf0
      [   48.480082]  alloc_iova+0x2b/0x1e0
      [   48.480125]  alloc_iova_fast+0x58/0x376
      [   48.480166]  intel_alloc_iova+0x90/0xc0
      [   48.480214]  intel_map_sg+0xde/0x1f0
      [   48.480343]  i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
      [   48.480465]  huge_get_pages+0x232/0x2b0 [i915]
      [   48.480590]  ____i915_gem_object_get_pages+0x40/0xb0 [i915]
      [   48.480712]  __i915_gem_object_get_pages+0x90/0xa0 [i915]
      [   48.480834]  i915_gem_object_prepare_write+0x2d6/0x330 [i915]
      [   48.480955]  create_test_object.isra.54+0x1a9/0x3e0 [i915]
      [   48.481075]  igt_shared_ctx_exec+0x365/0x3c0 [i915]
      [   48.481210]  __i915_subtests.cold.4+0x30/0x92 [i915]
      [   48.481341]  __run_selftests.cold.3+0xa9/0x119 [i915]
      [   48.481466]  i915_live_selftests+0x3c/0x70 [i915]
      [   48.481583]  i915_pci_probe+0xe7/0x220 [i915]
      [   48.481620]  pci_device_probe+0xe0/0x180
      [   48.481665]  really_probe+0x163/0x4e0
      [   48.481710]  device_driver_attach+0x85/0x90
      [   48.481750]  __driver_attach+0xa5/0x180
      [   48.481796]  bus_for_each_dev+0xda/0x130
      [   48.481831]  bus_add_driver+0x205/0x2e0
      [   48.481882]  driver_register+0xca/0x140
      [   48.481927]  do_one_initcall+0x6c/0x1af
      [   48.481970]  do_init_module+0x106/0x350
      [   48.482010]  load_module+0x3d2c/0x3ea0
      [   48.482058]  __do_sys_finit_module+0x110/0x180
      [   48.482102]  do_syscall_64+0x62/0x1f0
      [   48.482147]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   48.482190]
      [   48.482224] Freed by task 37:
      [   48.482273]  save_stack+0x19/0x80
      [   48.482318]  __kasan_slab_free+0x12e/0x180
      [   48.482363]  kmem_cache_free+0x70/0x140
      [   48.482406]  __free_iova+0x1d/0x30
      [   48.482445]  fq_ring_free+0x15a/0x1a0
      [   48.482490]  queue_iova+0x19c/0x1f0
      [   48.482624]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
      [   48.482749]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
      [   48.482873]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
      [   48.482999]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
      [   48.483123]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
      [   48.483250]  i915_vma_unbind+0x1e2/0x240 [i915]
      [   48.483378]  i915_vma_destroy+0x3a/0x280 [i915]
      [   48.483500]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
      [   48.483622]  __i915_gem_free_work+0x41/0xa0 [i915]
      [   48.483659]  process_one_work+0x495/0x710
      [   48.483704]  worker_thread+0x4c7/0x6f0
      [   48.483748]  kthread+0x1b2/0x1d0
      [   48.483787]  ret_from_fork+0x1f/0x30
      [   48.483831]
      [   48.483868] The buggy address belongs to the object at ffff88870fc19000
      [   48.483868]  which belongs to the cache iommu_iova of size 40
      [   48.483920] The buggy address is located 32 bytes inside of
      [   48.483920]  40-byte region [ffff88870fc19000, ffff88870fc19028)
      [   48.483964] The buggy address belongs to the page:
      [   48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
      [   48.484045] flags: 0x8000000000010200(slab|head)
      [   48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
      [   48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
      [   48.484188] page dumped because: kasan: bad access detected
      [   48.484230]
      [   48.484265] Memory state around the buggy address:
      [   48.484314]  ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484361]  ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      [   48.484451]                                ^
      [   48.484494]  ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484530]  ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
      Fixes: e60aa7b5 ("iommu/iova: Extend rbtree node caching")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: <stable@vger.kernel.org> # v4.15+
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Don't queue_iova() if there is no flush queue · effa4678
      Authored by Dmitry Safonov
      The Intel VT-d driver was reworked to use the common deferred flushing
      implementation. Previously there was one global per-cpu flush queue;
      afterwards, one per domain.
      
      Before deferring a flush, the queue should be allocated and initialized.
      
      Currently only domains of type IOMMU_DOMAIN_DMA initialize their flush
      queue. It's probably worth initializing it for static or unmanaged
      domains too, but that may be arguable - I'm leaving it to the iommu folks.
      
      Prevent queuing an iova flush if the domain doesn't have a queue.
      The defensive check seems worth keeping even if the queue were
      initialized for all kinds of domains, and it is easy to backport.
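
      A sketch of the defensive check; has_iova_flush_queue() is named as in
      the patch, while intel_unmap_sync() is a hypothetical stand-in for the
      synchronous flush path inside intel_unmap():

          bool has_iova_flush_queue(struct iova_domain *iovad)
          {
                  return !!iovad->fq;
          }

          /* Caller side, roughly: only defer when a queue actually exists,
           * otherwise flush the IOTLB and free the IOVA immediately. */
          if (!has_iova_flush_queue(&domain->iovad))
                  intel_unmap_sync(domain, iova_pfn, nrpages);  /* hypothetical */
          else
                  queue_iova(&domain->iovad, iova_pfn, nrpages, 0);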
      
      On the 4.19.43 stable kernel this has a user-visible effect: previously,
      for devices in the si domain there were crashes on sata devices:
      
       BUG: spinlock bad magic on CPU#6, swapper/0/1
        lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x45/0x115
        intel_unmap+0x107/0x113
        intel_unmap_sg+0x6b/0x76
        __ata_qc_complete+0x7f/0x103
        ata_qc_complete+0x9b/0x26a
        ata_qc_complete_multiple+0xd0/0xe3
        ahci_handle_port_interrupt+0x3ee/0x48a
        ahci_handle_port_intr+0x73/0xa9
        ahci_single_level_irq_intr+0x40/0x60
        __handle_irq_event_percpu+0x7f/0x19a
        handle_irq_event_percpu+0x32/0x72
        handle_irq_event+0x38/0x56
        handle_edge_irq+0x102/0x121
        handle_irq+0x147/0x15c
        do_IRQ+0x66/0xf2
        common_interrupt+0xf/0xf
       RIP: 0010:__do_softirq+0x8c/0x2df
      
      The same for usb devices that use ehci-pci:
       BUG: spinlock bad magic on CPU#0, swapper/0/1
        lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x77/0x145
        intel_unmap+0x107/0x113
        intel_unmap_page+0xe/0x10
        usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
        usb_hcd_unmap_urb_for_dma+0x17/0x100
        unmap_urb_for_dma+0x22/0x24
        __usb_hcd_giveback_urb+0x51/0xc3
        usb_giveback_urb_bh+0x97/0xde
        tasklet_action_common.isra.4+0x5f/0xa1
        tasklet_action+0x2d/0x30
        __do_softirq+0x138/0x2df
        irq_exit+0x7d/0x8b
        smp_apic_timer_interrupt+0x10f/0x151
        apic_timer_interrupt+0xf/0x20
        </IRQ>
       RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: <stable@vger.kernel.org> # 4.14+
      Fixes: 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  13. 05 Jun 2019, 1 commit
  14. 22 Mar 2019, 1 commit
  15. 25 Sep 2018, 1 commit
  16. 22 Nov 2017, 1 commit
    • treewide: setup_timer() -> timer_setup() · e99e88a9
      Authored by Kees Cook
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: Kees Cook <keescook@chromium.org>
  17. 07 Nov 2017, 1 commit
  18. 12 Oct 2017, 1 commit
    • iommu/iova: Make rcache flush optional on IOVA allocation failure · 538d5b33
      Authored by Tomasz Nowicki
      Since IOVA allocation failure is not an unusual case, we need to flush
      the CPUs' rcache in the hope that we will succeed in the next round.
      
      However, it is useful to be able to decide whether the rcache flush step
      is needed, for two reasons:
      - Scalability. On a large system with ~100 CPUs, iterating over and
        flushing the rcache for each CPU becomes a serious bottleneck, so we
        may want to defer it.
      - free_cpu_cached_iovas() does not care about the max PFN we are
        interested in. Thus we may flush our rcaches and still get no new
        IOVA, as in the commonly used scenario:
      
          if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
              iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
      
          if (!iova)
              iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
      
         1. The first alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to
            get PCI devices a SAC address
         2. alloc_iova() fails due to the full 32-bit space
         3. The rcaches contain PFNs outside the 32-bit space, so
            free_cpu_cached_iovas() throws entries away for nothing and
            alloc_iova() fails again
         4. The next alloc_iova_fast() call cannot take advantage of the rcache
            since we have just defeated the caches. In this case we pick the
            slowest option to proceed.
      
      This patch reworks the flushed_rcache local flag into an additional
      function argument that controls the rcache flush step. It also updates
      all users to do the flush only as a last resort.
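
      A sketch of the reworked interface and the "flush only as a last resort"
      caller pattern described above:

          unsigned long alloc_iova_fast(struct iova_domain *iovad,
                                        unsigned long size,
                                        unsigned long limit_pfn,
                                        bool flush_rcache);

          /* Caller: try the 32-bit space without flushing, and allow the
           * rcache flush only on the final, full-mask attempt. */
          if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
                  iova = alloc_iova_fast(iovad, iova_len,
                                         DMA_BIT_MASK(32) >> shift, false);

          if (!iova)
                  iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);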
      Signed-off-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Nate Watterson <nwatters@codeaurora.org>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  19. 02 Oct 2017, 1 commit
  20. 28 Sep 2017, 3 commits
    • iommu/iova: Try harder to allocate from rcache magazine · e8b19840
      Authored by Robin Murphy
      When devices with different DMA masks are using the same domain, or for
      PCI devices where we usually try a speculative 32-bit allocation first,
      there is a fair possibility that the top PFN of the rcache stack at any
      given time may be unsuitable for the lower limit, prompting a fallback
      to allocating anew from the rbtree. Consequently, we may end up
      artificially increasing pressure on the 32-bit IOVA space as unused IOVAs
      accumulate lower down in the rcache stacks, while callers with 32-bit
      masks also impose unnecessary rbtree overhead.
      
      In such cases, let's try a bit harder to satisfy the allocation locally
      first - scanning the whole stack should still be relatively inexpensive.
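
      A sketch of the "try harder" pop: scan down the magazine for any suitable
      PFN instead of only checking the top entry (field names follow the
      existing rcache code; the magazine is assumed non-empty):

          static unsigned long iova_magazine_pop(struct iova_magazine *mag,
                                                 unsigned long limit_pfn)
          {
                  unsigned long pfn;
                  int i;

                  /* Scan the whole stack; fall back to the rbtree only if
                   * nothing in the magazine fits below limit_pfn. */
                  for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
                          if (i == 0)
                                  return 0;

                  /* Swap-remove the entry we found to keep the stack dense. */
                  pfn = mag->pfns[i];
                  mag->pfns[i] = mag->pfns[--mag->size];

                  return pfn;
          }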
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Make rcache limit_pfn handling more robust · b826ee9a
      Authored by Robin Murphy
      When popping a pfn from an rcache, we are currently checking it directly
      against limit_pfn for viability. Since this represents iova->pfn_lo, it
      is technically possible for the corresponding iova->pfn_hi to be greater
      than limit_pfn. Although we generally get away with it in practice since
      limit_pfn is typically a power-of-two boundary and the IOVAs are
      size-aligned, it's pretty trivial to make the iova_rcache_get() path
      take the allocation size into account for complete safety.
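
      A sketch of the idea: compare against limit_pfn minus the allocation
      size, so the resulting pfn_hi can never exceed the limit (assuming the
      size-bucketed rcaches of the existing code):

          static unsigned long iova_rcache_get(struct iova_domain *iovad,
                                               unsigned long size,
                                               unsigned long limit_pfn)
          {
                  unsigned int log_size = order_base_2(size);

                  if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
                          return 0;

                  /* The popped pfn_lo must leave room for the whole allocation. */
                  return __iova_rcache_get(&iovad->rcaches[log_size],
                                           limit_pfn - size);
          }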
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Simplify domain destruction · 7595dc58
      Authored by Robin Murphy
      All put_iova_domain() should have to worry about is freeing memory - by
      that point the domain must no longer be live, so the act of cleaning up
      doesn't need to be concurrency-safe or maintain the rbtree in a
      self-consistent state. There's no need to waste time with locking or
      emptying the rcache magazines, and we can just use the postorder
      traversal helper to clear out the remaining rbtree entries in-place.
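
      A sketch of the simplified teardown using the postorder helper (no
      locking or live rcache state is needed once the domain is dead):

          void put_iova_domain(struct iova_domain *iovad)
          {
                  struct iova *iova, *tmp;

                  free_iova_flush_queue(iovad);
                  free_iova_rcaches(iovad);

                  /* Free every remaining rbtree node in place. */
                  rbtree_postorder_for_each_entry_safe(iova, tmp,
                                                       &iovad->rbroot, node)
                          free_iova_mem(iova);
          }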
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  21. 27 Sep 2017, 6 commits
  22. 16 Aug 2017, 5 commits
    • iommu/iova: Add flush timer · 9a005a80
      Authored by Joerg Roedel
      Add a timer to flush entries from the Flush-Queues every
      10ms. This makes sure that no stale TLB entries remain for
      too long after an IOVA has been unmapped.
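
      A sketch of the timer side, assuming the flush-queue fields introduced
      earlier in this series (and the pre-timer_setup() callback signature of
      the time):

          #define IOVA_FQ_TIMEOUT 10      /* milliseconds */

          static void fq_flush_timeout(unsigned long data)
          {
                  struct iova_domain *iovad = (struct iova_domain *)data;
                  int cpu;

                  atomic_set(&iovad->fq_timer_on, 0);
                  iova_domain_flush(iovad);       /* one IOMMU TLB flush for all CPUs */

                  for_each_possible_cpu(cpu) {
                          unsigned long flags;
                          struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);

                          spin_lock_irqsave(&fq->lock, flags);
                          fq_ring_free(iovad, fq);
                          spin_unlock_irqrestore(&fq->lock, flags);
                  }
          }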
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Add locking to Flush-Queues · 8109c2a2
      Authored by Joerg Roedel
      The lock is taken from the same CPU most of the time, but having it
      allows the queue to be flushed from another CPU as well, if necessary.

      This will be used by a timer to regularly flush any pending IOVAs from
      the Flush-Queues.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Add flush counters to Flush-Queue implementation · fb418dab
      Authored by Joerg Roedel
      There are two counters:
      
      	* fq_flush_start_cnt  - Increased when a TLB flush
      	                        is started.
      
      	* fq_flush_finish_cnt - Increased when a TLB flush
      				is finished.
      
      The fq_flush_start_cnt is assigned to every Flush-Queue
      entry on its creation. When freeing entries from the
      Flush-Queue, the value in the entry is compared to the
      fq_flush_finish_cnt. The entry can only be freed when its
      value is less than the value of fq_flush_finish_cnt.
      
      The reason for these counters is to take advantage of IOMMU
      TLB flushes that happened on other CPUs. These already
      flushed the TLB for Flush-Queue entries on other CPUs so
      that they can already be freed without flushing the TLB
      again.
      
      This makes it less likely that the Flush-Queue is full and
      saves IOMMU TLB flushes.
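
      A sketch of how the counters gate the freeing, as described above (entry
      and field names follow the flush-queue structures):

          static void fq_ring_free(struct iova_domain *iovad, struct iova_fq *fq)
          {
                  u64 counter = atomic64_read(&iovad->fq_flush_finish_cnt);
                  unsigned int idx;

                  fq_ring_for_each(idx, fq) {
                          /* Queued after the last finished flush: the TLB may
                           * still hold this IOVA, so it cannot be freed yet. */
                          if (fq->entries[idx].counter >= counter)
                                  break;

                          free_iova_fast(iovad, fq->entries[idx].iova_pfn,
                                         fq->entries[idx].pages);
                          fq->head = (fq->head + 1) % IOVA_FQ_SIZE;
                  }
          }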
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Implement Flush-Queue ring buffer · 19282101
      Authored by Joerg Roedel
      Add a function to add entries to the Flush-Queue ring
      buffer. If the buffer is full, call the flush-callback and
      free the entries.
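
      A sketch of that function; the locking added by a later patch in this
      series (8109c2a2) is omitted here:

          void queue_iova(struct iova_domain *iovad,
                          unsigned long pfn, unsigned long pages,
                          unsigned long data)
          {
                  struct iova_fq *fq = get_cpu_ptr(iovad->fq);
                  unsigned int idx;

                  /* Ring full: flush the IOMMU TLB and release every entry. */
                  if (fq_full(fq)) {
                          iovad->flush_cb(iovad);
                          fq_ring_free(iovad, fq);
                  }

                  idx = fq_ring_add(fq);
                  fq->entries[idx].iova_pfn = pfn;
                  fq->entries[idx].pages    = pages;
                  fq->entries[idx].data     = data;

                  put_cpu_ptr(iovad->fq);
          }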
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Add flush-queue data structures · 42f87e71
      Authored by Joerg Roedel
      This patch adds the basic data-structures to implement
      flush-queues in the generic IOVA code. It also adds the
      initialization and destroy routines for these data
      structures.
      
      The initialization routine is designed so that the use of
      this feature is optional for the users of IOVA code.
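
      A sketch of the basic shape of these structures (sizes and field names
      are illustrative):

          #define IOVA_FQ_SIZE 256        /* entries per CPU queue */

          struct iova_fq_entry {
                  unsigned long iova_pfn;
                  unsigned long pages;
                  unsigned long data;     /* opaque cookie for the flush callback */
          };

          struct iova_fq {
                  struct iova_fq_entry entries[IOVA_FQ_SIZE];
                  unsigned int head, tail;
          };

          /* struct iova_domain grows a per-CPU "struct iova_fq __percpu *fq"
           * plus the driver-supplied flush callback; the feature stays optional
           * because only users that call init_iova_flush_queue() get a queue. */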
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  23. 28 Jun 2017, 1 commit
    • iommu/iova: Don't disable preempt around this_cpu_ptr() · aaffaa8a
      Authored by Sebastian Andrzej Siewior
      Commit 583248e6 ("iommu/iova: Disable preemption around use of
      this_cpu_ptr()") disables preemption while accessing a per-CPU variable.
      This does keep lockdep quiet. However, I don't see why it would be bad
      if we get migrated to another CPU after the access.
      __iova_rcache_insert() and __iova_rcache_get() immediately lock the
      variable after obtaining it, before accessing its members.
      _If_ we get migrated away after retrieving the address of cpu_rcache
      but before taking the lock, then the *other* task on the same CPU will
      retrieve the same address of cpu_rcache and will spin on the lock.
      
      alloc_iova_fast() disables preemption while invoking
      free_cpu_cached_iovas() on each CPU. The function itself uses
      per_cpu_ptr() which does not trigger a warning (like this_cpu_ptr()
      does). It _could_ make sense to use get_online_cpus() instead, but we
      have a hotplug notifier for CPU down (and none for up), so we are good.
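
      A sketch of the resulting access pattern: a plain raw_cpu_ptr() is
      enough because the per-CPU structure is locked before any member is
      touched (in __iova_rcache_insert() / __iova_rcache_get(), roughly):

          struct iova_cpu_rcache *cpu_rcache;
          unsigned long flags;

          /* No get_cpu_ptr()/preempt_disable(): being migrated between here
           * and the lock only means contending on another CPU's rcache lock,
           * which is harmless. */
          cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
          spin_lock_irqsave(&cpu_rcache->lock, flags);
          /* ... operate on the loaded/previous magazines ... */
          spin_unlock_irqrestore(&cpu_rcache->lock, flags);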
      
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: iommu@lists.linux-foundation.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  24. 17 May 2017, 1 commit
    • iommu/iova: Sort out rbtree limit_pfn handling · 757c370f
      Authored by Robin Murphy
      When walking the rbtree, the fact that iovad->start_pfn and limit_pfn
      are both inclusive limits creates an ambiguity once limit_pfn reaches
      the bottom of the address space and they overlap. Commit 5016bdb7
      ("iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range") fixed
      the worst side-effect of this, that of underflow wraparound leading to
      bogus allocations, but the remaining fallout is that any attempt to
      allocate start_pfn itself erroneously fails.
      
      The cleanest way to resolve the ambiguity is to simply make limit_pfn an
      exclusive limit when inside the guts of the rbtree. Since we're working
      with PFNs, representing one past the top of the address space is always
      possible without fear of overflow, and elsewhere it just makes life a
      little more straightforward.
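
      A tiny illustration of the inclusive-vs-exclusive distinction (purely
      illustrative, not the patch itself):

          /* With a half-open [start_pfn, limit_pfn) window, "does
           * [pfn, pfn + size) fit below the limit?" is a simple bounds check
           * that cannot underflow and does not exclude start_pfn itself. */
          static inline bool iova_fits(unsigned long pfn, unsigned long size,
                                       unsigned long limit_pfn /* exclusive */)
          {
                  return size <= limit_pfn && pfn <= limit_pfn - size;
          }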
      Reported-by: Aaron Sierra <asierra@xes-inc.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>