1. 19 7月, 2020 1 次提交
  2. 09 7月, 2020 2 次提交
    • L
      RDMA/mlx5: Set PD pointers for the error flow unwind · 0a037150
      Leon Romanovsky 提交于
      ib_pd is accessed internally during destroy of the TIR/TIS, but PD
      can be not set yet. This leading to the following kernel panic.
      
        BUG: kernel NULL pointer dereference, address: 0000000000000074
        PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0 Oops: 0000 [#1] SMP PTI
        CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
        RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
        RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
        RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
        Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
        RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
        RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
        RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
        RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
        R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
        R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
        FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014
         _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline]
         create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443
         ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564
         ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664
         __vfs_write+0x3f/0x90 fs/read_write.c:495
         vfs_write+0xc7/0x1f0 fs/read_write.c:559
         ksys_write+0x5e/0x110 fs/read_write.c:612
         do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x466479
        Code: Bad RIP value.
        RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
        RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
        RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
        R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0
      
      Fixes: 6c41965d ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path")
      Link: https://lore.kernel.org/r/20200707110612.882962-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      0a037150
    • A
      IB/mlx5: Fix 50G per lane indication · 530c8632
      Aya Levin 提交于
      Some released FW versions mistakenly don't set the capability that 50G per
      lane link-modes are supported for VFs (ptys_extended_ethernet capability
      bit).
      
      Use PTYS.ext_eth_proto_capability instead, as this indication is always
      accurate. If PTYS.ext_eth_proto_capability is valid
      (has a non-zero value) conclude that the HCA supports 50G per lane.
      
      Otherwise, conclude that the HCA doesn't support 50G per lane.
      
      Fixes: 08e8676f ("IB/mlx5: Add support for 50Gbps per lane link modes")
      Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.orgSigned-off-by: NAya Levin <ayal@mellanox.com>
      Reviewed-by: NEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      530c8632
  3. 08 7月, 2020 1 次提交
  4. 03 7月, 2020 3 次提交
    • D
      IB/sa: Resolv use-after-free in ib_nl_make_request() · f427f4d6
      Divya Indi 提交于
      There is a race condition where ib_nl_make_request() inserts the request
      data into the linked list but the timer in ib_nl_request_timeout() can see
      it and destroy it before ib_nl_send_msg() is done touching it. This could
      happen, for instance, if there is a long delay allocating memory during
      nlmsg_new()
      
      This causes a use-after-free in the send_mad() thread:
      
        [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
        [ <ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
        [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
        [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
        [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
        [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
        [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
        [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
        [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
        [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
        [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
        [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
        [<ffffffff810a68fb>] kthread+0xcb/0xf0
        [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
        [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
        [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
        [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
      
      The ownership rule is once the request is on the list, ownership transfers
      to the list and the local thread can't touch it any more, just like for
      the normal MAD case in send_mad().
      
      Thus, instead of adding before send and then trying to delete after on
      errors, move the entire thing under the spinlock so that the send and
      update of the lists are atomic to the conurrent threads. Lightly reoganize
      things so spinlock safe memory allocations are done in the final NL send
      path and the rest of the setup work is done before and outside the lock.
      
      Fixes: 3ebd2fd0 ("IB/sa: Put netlink request into the request list before sending")
      Link: https://lore.kernel.org/r/1592964789-14533-1-git-send-email-divya.indi@oracle.comSigned-off-by: NDivya Indi <divya.indi@oracle.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      f427f4d6
    • K
      IB/hfi1: Do not destroy link_wq when the device is shut down · 2315ec12
      Kaike Wan 提交于
      The workqueue link_wq should only be destroyed when the hfi1 driver is
      unloaded, not when the device is shut down.
      
      Fixes: 71d47008 ("IB/hfi1: Create workqueue for link events")
      Link: https://lore.kernel.org/r/20200623204053.107638.70315.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      2315ec12
    • K
      IB/hfi1: Do not destroy hfi1_wq when the device is shut down · 28b70cd9
      Kaike Wan 提交于
      The workqueue hfi1_wq is destroyed in function shutdown_device(), which is
      called by either shutdown_one() or remove_one(). The function
      shutdown_one() is called when the kernel is rebooted while remove_one() is
      called when the hfi1 driver is unloaded. When the kernel is rebooted,
      hfi1_wq is destroyed while all qps are still active, leading to a kernel
      crash:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
        IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
        PGD 0
        Oops: 0000 [#1] SMP
        Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE)
        crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
        CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
        Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018
        task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000
        RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
        RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046
        RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000
        RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b
        RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900
        R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8
        R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000
        FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0
        Call Trace:
        [<ffffffff94cb8105>] queue_work_on+0x45/0x50
        [<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
        [<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1]
        [<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt]
        [<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280
        [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
        [<ffffffff94ca7f58>] call_timer_fn+0x38/0x110
        [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
        [<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300
        [<ffffffff94ca0f05>] __do_softirq+0xf5/0x280
        [<ffffffff9537832c>] call_softirq+0x1c/0x30
        [<ffffffff94c2e675>] do_softirq+0x65/0xa0
        [<ffffffff94ca1285>] irq_exit+0x105/0x110
        [<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60
        [<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170
        <EOI>
        [<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0
        [<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230
        [<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0
        [<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0
        [<ffffffff94c57db7>] start_secondary+0x1f7/0x270
        [<ffffffff94c000d5>] start_cpu+0x5/0x14
      
      The solution is to destroy the workqueue only when the hfi1 driver is
      unloaded, not when the device is shut down. In addition, when the device
      is shut down, no more work should be scheduled on the workqueues and the
      workqueues are flushed.
      
      Fixes: 8d3e7113 ("IB/{hfi1, qib}: Add handling of kernel restart")
      Link: https://lore.kernel.org/r/20200623204047.107638.77646.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      28b70cd9
  5. 02 7月, 2020 2 次提交
  6. 25 6月, 2020 4 次提交
    • M
      IB/hfi1: Add atomic triggered sleep/wakeup · 38fd98af
      Mike Marciniszyn 提交于
      When running iperf in a two host configuration the following trace can
      occur:
      
      [  319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out
      
      The issue happens because the current implementation relies on the netif
      txq being stopped to control the flushing of the tx list.
      
      There are two resources that the transmit logic can wait on and stop the
      txq:
      - SDMA descriptors
      - Ring space to hold completions
      
      The ring space is tested on the sending side and relieved when the ring is
      consumed in the napi tx reaping.
      
      Unfortunately, that reaping can run conncurrently with the workqueue
      flushing of the txlist.  If the txq is started just before the workitem
      executes, the txlist will never be flushed, leading to the txq being
      stuck.
      
      Fix by:
      - Adding sleep/wakeup wrappers
        * Use an atomic to control the call to the netif routines inside the
          wrappers
      
      - Use another atomic to record ring space exhaustion
        * Only wakeup when the a ring space exhaustion has happened and it
          relieved
      
      Add additional wrappers to clarify the ring space resource handling.
      
      Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
      Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.comReviewed-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      38fd98af
    • M
      IB/hfi1: Correct -EBUSY handling in tx code · 82172b76
      Mike Marciniszyn 提交于
      The current code mishandles -EBUSY in two ways:
      - The flow change doesn't test the return from the flush and runs on to
        process the current packet racing with the wakeup processing
      - The -EBUSY handling for a single packet inserts the tx into the txlist
        after the submit call, racing with the same wakeup processing
      
      Fix the first by dropping the skb and returning NETDEV_TX_OK.
      
      Fix the second by insuring the the list entry within the txreq is inited
      when allocated.  This enables the sleep routine to detect that the txreq
      has used the non-list api and queue the packet to the txlist.
      
      Both flaws can lead to having the flushing thread executing in causing two
      threads to manipulate the txlist.
      
      Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
      Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.comReviewed-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      82172b76
    • D
      IB/hfi1: Fix module use count flaw due to leftover module put calls · 822fbd37
      Dennis Dalessandro 提交于
      When the try_module_get calls were removed from opening and closing of the
      i2c debugfs file, the corresponding module_put calls were missed.  This
      results in an inaccurate module use count that requires a power cycle to
      fix.
      
      Fixes: 09fbca8e ("IB/hfi1: No need to use try_module_get for debugfs")
      Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NKaike Wan <kaike.wan@intel.com>
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      822fbd37
    • D
      IB/hfi1: Restore kfree in dummy_netdev cleanup · b46925a2
      Dennis Dalessandro 提交于
      We need to do some rework on the dummy netdev. Calling the free_netdev()
      would normally make sense, and that will be addressed in an upcoming
      patch. For now just revert the behavior to what it was before keeping the
      unused variable removal part of the patch.
      
      The dd->dumm_netdev is mainly used for packet receiving through
      alloc_netdev_mqs() for typical net devices. A a result, it should be freed
      with kfree instead of free_netdev() that leads to a crash when unloading
      the hfi1 module:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0
        Oops: 0000 [#1] SMP PTI
        CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1
        Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
        RIP: 0010:__hw_addr_flush+0x12/0x80
        Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84
        RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
        RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298
        RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549
        R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298
        R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000
        FS:  00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         dev_addr_flush+0x15/0x30
         free_netdev+0x7e/0x130
         hfi1_netdev_free+0x59/0x70 [hfi1]
         remove_one+0x65/0x110 [hfi1]
         pci_device_remove+0x3b/0xc0
         device_release_driver_internal+0xec/0x1b0
         driver_detach+0x46/0x90
         bus_remove_driver+0x58/0xd0
         pci_unregister_driver+0x26/0xa0
         hfi1_mod_cleanup+0xc/0xd54 [hfi1]
         __x64_sys_delete_module+0x16c/0x260
         ? exit_to_usermode_loop+0xa4/0xc0
         do_syscall_64+0x5b/0x200
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 193ba031 ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()")
      Link: https://lore.kernel.org/r/20200623203224.106975.16926.stgit@awfm-01.aw.intel.comReviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      b46925a2
  7. 23 6月, 2020 3 次提交
    • S
      IB/mad: Fix use after free when destroying MAD agent · 116a1b9f
      Shay Drory 提交于
      Currently, when RMPP MADs are processed while the MAD agent is destroyed,
      it could result in use after free of rmpp_recv, as decribed below:
      
      	cpu-0						cpu-1
      	-----						-----
      ib_mad_recv_done()
       ib_mad_complete_recv()
        ib_process_rmpp_recv_wc()
      						unregister_mad_agent()
      						 ib_cancel_rmpp_recvs()
      						  cancel_delayed_work()
         process_rmpp_data()
          start_rmpp()
           queue_delayed_work(rmpp_recv->cleanup_work)
      						  destroy_rmpp_recv()
      						   free_rmpp_recv()
           cleanup_work()[1]
            spin_lock_irqsave(&rmpp_recv->agent->lock) <-- use after free
      
      [1] cleanup_work() == recv_cleanup_handler
      
      Fix it by waiting for the MAD agent reference count becoming zero before
      calling to ib_cancel_rmpp_recvs().
      
      Fixes: 9a41e38a ("IB/mad: Use IDR for agent IDs")
      Link: https://lore.kernel.org/r/20200621104738.54850-2-leon@kernel.orgSigned-off-by: NShay Drory <shayd@mellanox.com>
      Reviewed-by: NMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      116a1b9f
    • L
      RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata · 6eefa839
      Leon Romanovsky 提交于
      Don't deref udata if it is NULL
      
        BUG: kernel NULL pointer dereference, address: 0000000000000030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000   SMP PTI
        CPU: 2 PID: 1592 Comm: python3 Not tainted 5.7.0-rc6+ #1
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
        RIP: 0010:create_qp+0x39e/0xae0 [mlx5_ib]
        Code: c0 0d 00 00 bf 10 01 00 00 e8 be a9 e4 e0 48 85 c0 49 89 c2 0f 84 0c 07 00 00 41 8b 85 74 63 01 00 0f c8 a9 00 00 00 10 74 0a <41> 8b 46 30 0f c8 41 89 42 14 41 8b 52 18 41 0f b6 4a 1c 0f ca 89
        RSP: 0018:ffffc9000067f8b0 EFLAGS: 00010206
        RAX: 0000000010170000 RBX: ffff888441313000 RCX: 0000000000000000
        RDX: 0000000000000200 RSI: 0000000000000000 RDI: ffff88845b1d4400
        RBP: ffffc9000067fa60 R08: 0000000000000200 R09: ffff88845b1d4200
        R10: ffff88845b1d4200 R11: ffff888441313000 R12: ffffc9000067f950
        R13: ffff88846ac00140 R14: 0000000000000000 R15: ffff88846c2bc000
        FS:  00007faa1a3c0540(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000030 CR3: 0000000446dca003 CR4: 0000000000760ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         mlx5_ib_create_qp+0x897/0xfa0 [mlx5_ib]
         ib_create_qp+0x9e/0x300 [ib_core]
         create_qp+0x92d/0xb20 [ib_uverbs]
         ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
         ? release_resource+0x30/0x30
         ib_uverbs_create_qp+0xc4/0xe0 [ib_uverbs]
         ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc8/0xf0 [ib_uverbs]
         ib_uverbs_run_method+0x223/0x770 [ib_uverbs]
         ? track_pfn_remap+0xa7/0x100
         ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
         ? remap_pfn_range+0x358/0x490
         ib_uverbs_cmd_verbs.isra.6+0x19b/0x370 [ib_uverbs]
         ? rdma_umap_priv_init+0x82/0xe0 [ib_core]
         ? vm_mmap_pgoff+0xec/0x120
         ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
         ksys_ioctl+0x92/0xb0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x48/0x130
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: e383085c ("RDMA/mlx5: Set ECE options during QP create")
      Link: https://lore.kernel.org/r/20200621115959.60126-1-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      6eefa839
    • M
      RDMA/counter: Query a counter before release · c1d869d6
      Mark Zhang 提交于
      Query a dynamically-allocated counter before release it, to update it's
      hwcounters and log all of them into history data. Otherwise all values of
      these hwcounters will be lost.
      
      Fixes: f34a55e4 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
      Link: https://lore.kernel.org/r/20200621110000.56059-1-leon@kernel.orgSigned-off-by: NMark Zhang <markz@mellanox.com>
      Reviewed-by: NMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      c1d869d6
  8. 19 6月, 2020 5 次提交
  9. 18 6月, 2020 9 次提交
    • L
      RDMA/core: Check that type_attrs is not NULL prior access · 4121fb0d
      Leon Romanovsky 提交于
      In disassociate flow, the type_attrs is set to be NULL, which is in an
      implicit way is checked in alloc_uobj() by "if (!attrs->context)".
      
      Change the logic to rely on that check, to be consistent with other
      alloc_uobj() places that will fix the following kernel splat.
      
       BUG: kernel NULL pointer dereference, address: 0000000000000018
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP PTI
       CPU: 3 PID: 2743 Comm: python3 Not tainted 5.7.0-rc6-for-upstream-perf-2020-05-23_19-04-38-5 #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       RIP: 0010:alloc_begin_fd_uobject+0x18/0xf0 [ib_uverbs]
       Code: 89 43 48 eb 97 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec 08 48 8b 1f <48> 8b 43 18 48 8b 80 80 00 00 00 48 3d 20 10 33 a0 74 1c 48 3d 30
       RSP: 0018:ffffc90001127b70 EFLAGS: 00010282
       RAX: ffffffffa0339fe0 RBX: 0000000000000000 RCX: 8000000000000007
       RDX: fffffffffffffffb RSI: ffffc90001127d28 RDI: ffff88843fe1f600
       RBP: ffff88843fe1f600 R08: ffff888461eb06d8 R09: ffff888461eb06f8
       R10: ffff888461eb0700 R11: 0000000000000000 R12: ffff88846a5f6450
       R13: ffffc90001127d28 R14: ffff88845d7d6ea0 R15: ffffc90001127cb8
       FS: 00007f469bff1540(0000) GS:ffff88846f980000(0000) knlGS:0000000000000000
       CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000018 CR3: 0000000450018003 CR4: 0000000000760ee0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
       ? xa_store+0x28/0x40
       rdma_alloc_begin_uobject+0x4f/0x90 [ib_uverbs]
       ib_uverbs_create_comp_channel+0x87/0xf0 [ib_uverbs]
       ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
       ib_uverbs_cmd_verbs.isra.8+0x96d/0xae0 [ib_uverbs]
       ? get_page_from_freelist+0x3bb/0xf70
       ? _copy_to_user+0x22/0x30
       ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
       ? __wake_up_common_lock+0x87/0xc0
       ib_uverbs_ioctl+0xbc/0x130 [ib_uverbs]
       ksys_ioctl+0x83/0xc0
       ? ksys_write+0x55/0xd0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x48/0x130
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f469ac43267
      
      Fixes: 849e1490 ("RDMA/core: Do not allow alloc_commit to fail")
      Link: https://lore.kernel.org/r/20200617061826.2625359-1-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      4121fb0d
    • Y
      RDMA/hns: Fix an cmd queue issue when resetting · 3ec5f54f
      Yangyang Li 提交于
      If a IMP reset caused by some hardware errors and hns RoCE driver reset
      occurred at the same time, there is a possiblity that the IMP will stop
      dealing with command and users can't use the hardware. The logs are as
      follows:
      
       hns3 0000:fd:00.1: cleaned 0, need to clean 1
       hns3 0000:fd:00.1: firmware version query failed -11
       hns3 0000:fd:00.1: Cmd queue init failed
       hns3 0000:fd:00.1: Upgrade reset level
       hns3 0000:fd:00.1: global reset interrupt
      
      The hns NIC driver divides the reset process into 3 status:
      initialization, hardware resetting and softwaring restting. RoCE driver
      gets reset status by interfaces provided by NIC driver and commands will
      not be sent to the IMP if the driver is in any above status. The main
      reason for this issue is that there is a time gap between status 1 and 2,
      if the RoCE driver sends commands to the IMP during this gap, the IMP will
      stop working because it is not ready.
      
      To eliminate the time gap, the hns NIC driver has added a new interface in
      commit a4de0228 ("net: hns3: provide .get_cmdq_stat interface for the
      client"), so RoCE driver can ensure that no commands will be sent during
      resetting.
      
      Link: https://lore.kernel.org/r/1592314778-52822-1-git-send-email-liweihang@huawei.comSigned-off-by: NYangyang Li <liyangyang20@huawei.com>
      Signed-off-by: NWeihang Li <liweihang@huawei.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      3ec5f54f
    • Y
      RDMA/hns: Fix a calltrace when registering MR from userspace · 98a61519
      Yangyang Li 提交于
      ibmr.device is assigned after MR is successfully registered, but both
      write_mtpt() and frmr_write_mtpt() accesses it during the mr registration
      process, which may cause the following error when trying to register MR in
      userspace and pbl_hop_num is set to 0.
      
        pc : hns_roce_mtr_find+0xa0/0x200 [hns_roce]
        lr : set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2]
        sp : ffff00023e73ba20
        x29: ffff00023e73ba20 x28: ffff00023e73bad8
        x27: 0000000000000000 x26: 0000000000000000
        x25: 0000000000000002 x24: 0000000000000000
        x23: ffff00023e73bad0 x22: 0000000000000000
        x21: ffff0000094d9000 x20: 0000000000000000
        x19: ffff8020a6bdb2c0 x18: 0000000000000000
        x17: 0000000000000000 x16: 0000000000000000
        x15: 0000000000000000 x14: 0000000000000000
        x13: 0140000000000000 x12: 0040000000000041
        x11: ffff000240000000 x10: 0000000000001000
        x9 : 0000000000000000 x8 : ffff802fb7558480
        x7 : ffff802fb7558480 x6 : 000000000003483d
        x5 : ffff00023e73bad0 x4 : 0000000000000002
        x3 : ffff00023e73bad8 x2 : 0000000000000000
        x1 : 0000000000000000 x0 : ffff0000094d9708
        Call trace:
         hns_roce_mtr_find+0xa0/0x200 [hns_roce]
         set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2]
         hns_roce_v2_write_mtpt+0x14c/0x168 [hns_roce_hw_v2]
         hns_roce_mr_enable+0x6c/0x148 [hns_roce]
         hns_roce_reg_user_mr+0xd8/0x130 [hns_roce]
         ib_uverbs_reg_mr+0x14c/0x2e0 [ib_uverbs]
         ib_uverbs_write+0x27c/0x3e8 [ib_uverbs]
         __vfs_write+0x60/0x190
         vfs_write+0xac/0x1c0
         ksys_write+0x6c/0xd8
         __arm64_sys_write+0x24/0x30
         el0_svc_common+0x78/0x130
         el0_svc_handler+0x38/0x78
         el0_svc+0x8/0xc
      
      Solve above issue by adding a pointer of structure hns_roce_dev as a
      parameter of write_mtpt() and frmr_write_mtpt(), so that both of these
      functions can access it before finishing MR's registration.
      
      Fixes: 9b2cf76c ("RDMA/hns: Optimize PBL buffer allocation process")
      Link: https://lore.kernel.org/r/1592314629-51715-1-git-send-email-liweihang@huawei.comSigned-off-by: NYangyang Li <liyangyang20@huawei.com>
      Signed-off-by: NWeihang Li <liweihang@huawei.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      98a61519
    • L
      RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake · ab183d46
      Leon Romanovsky 提交于
      Missed steps during ECE handshake left userspace application with less
      options for the ECE handshake. Pass ECE options in the additional
      transitions.
      
      Fixes: 50aec2c3 ("RDMA/mlx5: Return ECE data after modify QP")
      Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      ab183d46
    • M
      RDMA/cma: Protect bind_list and listen_list while finding matching cm id · 730c8912
      Mark Zhang 提交于
      The bind_list and listen_list must be accessed under a lock, add the
      missing locking around the access in cm_ib_id_from_event()
      
      In addition add lockdep asserts to make it clearer what the locking
      semantic is here.
      
        general protection fault: 0000 [#1] SMP NOPTI
        CPU: 226 PID: 126135 Comm: kworker/226:1 Tainted: G OE 4.12.14-150.47-default #1 SLE15
        Hardware name: Cray Inc. Windom/Windom, BIOS 0.8.7 01-10-2020
        Workqueue: ib_cm cm_work_handler [ib_cm]
        task: ffff9c5a60a1d2c0 task.stack: ffffc1d91f554000
        RIP: 0010:cma_ib_req_handler+0x3f1/0x11b0 [rdma_cm]
        RSP: 0018:ffffc1d91f557b40 EFLAGS: 00010286
        RAX: deacffffffffff30 RBX: 0000000000000001 RCX: ffff9c2af5bb6000
        RDX: 00000000000000a9 RSI: ffff9c5aa4ed2f10 RDI: ffffc1d91f557b08
        RBP: ffffc1d91f557d90 R08: ffff9c340cc80000 R09: ffff9c2c0f901900
        R10: 0000000000000000 R11: 0000000000000001 R12: deacffffffffff30
        R13: ffff9c5a48aeec00 R14: ffffc1d91f557c30 R15: ffff9c5c2eea3688
        FS: 0000000000000000(0000) GS:ffff9c5c2fa80000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00002b5cc03fa320 CR3: 0000003f8500a000 CR4: 00000000003406e0
        Call Trace:
        ? rdma_addr_cancel+0xa0/0xa0 [ib_core]
        ? cm_process_work+0x28/0x140 [ib_cm]
        cm_process_work+0x28/0x140 [ib_cm]
        ? cm_get_bth_pkey.isra.44+0x34/0xa0 [ib_cm]
        cm_work_handler+0xa06/0x1a6f [ib_cm]
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x40/0x70
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x40/0x70
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x40/0x70
        ? __switch_to+0x7c/0x4b0
        ? __switch_to_asm+0x40/0x70
        ? __switch_to_asm+0x34/0x70
        process_one_work+0x1da/0x400
        worker_thread+0x2b/0x3f0
        ? process_one_work+0x400/0x400
        kthread+0x118/0x140
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x22/0x40
        Code: 00 66 83 f8 02 0f 84 ca 05 00 00 49 8b 84 24 d0 01 00 00 48 85 c0 0f 84 68 07 00 00 48 2d d0 01
        00 00 49 89 c4 0f 84 59 07 00 00 <41> 0f b7 44 24 20 49 8b 77 50 66 83 f8 0a 75 9e 49 8b 7c 24 28
      
      Fixes: 4c21b5bc ("IB/cma: Add net_dev and private data checks to RDMA CM")
      Link: https://lore.kernel.org/r/20200616104304.2426081-1-leon@kernel.orgSigned-off-by: NMark Zhang <markz@mellanox.com>
      Reviewed-by: NMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      730c8912
    • M
      RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532 · 0dfbd5ec
      Michal Kalderon 提交于
      Private data passed to iwarp_cm_handler is copied for connection request /
      response, but ignored otherwise.  If junk is passed, it is stored in the
      event and used later in the event processing.
      
      The driver passes an old junk pointer during connection close which leads
      to a use-after-free on event processing.  Set private data to NULL for
      events that don 't have private data.
      
        BUG: KASAN: use-after-free in ucma_event_handler+0x532/0x560 [rdma_ucm]
        kernel: Read of size 4 at addr ffff8886caa71200 by task kworker/u128:1/5250
        kernel:
        kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm]
        kernel: Call Trace:
        kernel: dump_stack+0x8c/0xc0
        kernel: print_address_description.constprop.0+0x1b/0x210
        kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
        kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
        kernel: __kasan_report.cold+0x1a/0x33
        kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
        kernel: kasan_report+0xe/0x20
        kernel: check_memory_region+0x130/0x1a0
        kernel: memcpy+0x20/0x50
        kernel: ucma_event_handler+0x532/0x560 [rdma_ucm]
        kernel: ? __rpc_execute+0x608/0x620 [sunrpc]
        kernel: cma_iw_handler+0x212/0x330 [rdma_cm]
        kernel: ? iw_conn_req_handler+0x6e0/0x6e0 [rdma_cm]
        kernel: ? enqueue_timer+0x86/0x140
        kernel: ? _raw_write_lock_irq+0xd0/0xd0
        kernel: cm_work_handler+0xd3d/0x1070 [iw_cm]
      
      Fixes: e411e058 ("RDMA/qedr: Add iWARP connection management functions")
      Link: https://lore.kernel.org/r/20200616093408.17827-1-michal.kalderon@marvell.comSigned-off-by: NAriel Elior <ariel.elior@marvell.com>
      Signed-off-by: NMichal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0dfbd5ec
    • G
      RDMA/efa: Set maximum pkeys device attribute · 0133654d
      Gal Pressman 提交于
      The max_pkeys device attribute was not set in query device verb, set it to
      one in order to account for the default pkey (0xffff). This information is
      exposed to userspace and can cause malfunction
      
      Fixes: 40909f66 ("RDMA/efa: Add EFA verbs implementation")
      Link: https://lore.kernel.org/r/20200614103534.88060-1-galpress@amazon.comReviewed-by: NFiras JahJah <firasj@amazon.com>
      Reviewed-by: NYossi Leybovich <sleybo@amazon.com>
      Signed-off-by: NGal Pressman <galpress@amazon.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0133654d
    • A
      RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq · 90a239ee
      Aditya Pakki 提交于
      In case of failure of alloc_ud_wq_attr(), the memory allocated by
      rvt_alloc_rq() is not freed. Fix it by calling rvt_free_rq() using the
      existing clean-up code.
      
      Fixes: d310c4bf ("IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs")
      Link: https://lore.kernel.org/r/20200614041148.131983-1-pakki001@umn.eduSigned-off-by: NAditya Pakki <pakki001@umn.edu>
      Acked-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      90a239ee
    • L
      RDMA/core: Annotate CMA unlock helper routine · 1ea7c546
      Leon Romanovsky 提交于
      Fix the following sparse error by adding annotation to
      cm_queue_work_unlock() that it releases cm_id_priv->lock lock.
      
       drivers/infiniband/core/cm.c:936:24: warning: context imbalance in
       'cm_queue_work_unlock' - unexpected unlock
      
      Fixes: e83f195a ("RDMA/cm: Pull duplicated code into cm_queue_work_unlock()")
      Link: https://lore.kernel.org/r/20200611130045.1994026-1-leon@kernel.orgReported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      1ea7c546
  10. 16 6月, 2020 3 次提交
  11. 15 6月, 2020 1 次提交
  12. 14 6月, 2020 1 次提交
    • M
      treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Masahiro Yamada 提交于
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      There are a variety of indentation styles found.
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following commend:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      a7f7f624
  13. 10 6月, 2020 5 次提交
    • M
      mmap locking API: convert mmap_sem comments · c1e8d7c6
      Michel Lespinasse 提交于
      Convert comments that reference mmap_sem to reference mmap_lock instead.
      
      [akpm@linux-foundation.org: fix up linux-next leftovers]
      [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
      [akpm@linux-foundation.org: more linux-next fixups, per Michel]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Liam Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Han <yinghan@google.com>
      Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c1e8d7c6
    • M
      mmap locking API: use coccinelle to convert mmap_sem rwsem call sites · d8ed45c5
      Michel Lespinasse 提交于
      This change converts the existing mmap_sem rwsem calls to use the new mmap
      locking API instead.
      
      The change is generated using coccinelle with the following rule:
      
      // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
      
      @@
      expression mm;
      @@
      (
      -init_rwsem
      +mmap_init_lock
      |
      -down_write
      +mmap_write_lock
      |
      -down_write_killable
      +mmap_write_lock_killable
      |
      -down_write_trylock
      +mmap_write_trylock
      |
      -up_write
      +mmap_write_unlock
      |
      -downgrade_write
      +mmap_write_downgrade
      |
      -down_read
      +mmap_read_lock
      |
      -down_read_killable
      +mmap_read_lock_killable
      |
      -down_read_trylock
      +mmap_read_trylock
      |
      -up_read
      +mmap_read_unlock
      )
      -(&mm->mmap_sem)
      +(mm)
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Liam Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Han <yinghan@google.com>
      Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8ed45c5
    • M
      mm: reorder includes after introduction of linux/pgtable.h · 65fddcfc
      Mike Rapoport 提交于
      The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
      of the latter in the middle of asm includes.  Fix this up with the aid of
      the below script and manual adjustments here and there.
      
      	import sys
      	import re
      
      	if len(sys.argv) is not 3:
      	    print "USAGE: %s <file> <header>" % (sys.argv[0])
      	    sys.exit(1)
      
      	hdr_to_move="#include <linux/%s>" % sys.argv[2]
      	moved = False
      	in_hdrs = False
      
      	with open(sys.argv[1], "r") as f:
      	    lines = f.readlines()
      	    for _line in lines:
      		line = _line.rstrip('
      ')
      		if line == hdr_to_move:
      		    continue
      		if line.startswith("#include <linux/"):
      		    in_hdrs = True
      		elif not moved and in_hdrs:
      		    moved = True
      		    print hdr_to_move
      		print line
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65fddcfc
    • M
      mm: introduce include/linux/pgtable.h · ca5999fd
      Mike Rapoport 提交于
      The include/linux/pgtable.h is going to be the home of generic page table
      manipulation functions.
      
      Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
      make the latter include asm/pgtable.h.
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca5999fd
    • M
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport 提交于
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definition of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are remove with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e31cf2f4