- 21 9月, 2018 6 次提交
-
-
由 Jason Gunthorpe 提交于
This is the first step to make ODP use the owning_mm that is now part of struct ib_umem. Each ODP umem is linked to a single per_mm structure, which in turn, is linked to a single mm, via the embedded mmu_notifier. This first patch introduces the structure and reworks eveything to use it. This also needs to introduce tgid into the ib_ucontext_per_mm, as get_user_pages_remote() requires the originating task for statistics tracking. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
This no longer has any use, we can use container_of to get to the umem_odp, and a simple flag to indicate if this is an odp MR. Remove the few remaining references to it. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
These two structures are linked together, use the container_of pattern instead of a double allocation to make the code simpler and easier to follow. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
This is just wrong, the process that calls into the reg_mr is the process associated with the umem, and that does not have to be the same process that created the context. When this code was first written mmgrab() didn't exist, however these days we can just directly hold the mm_struct pointer in the umem and have no ambiguity when it comes to releasing the umem as to which mm it was associated with. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.comSigned-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 18 9月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
This enum has become part of the uABI, as both RXE and the ib_uverbs_post_send() command expect userspace to supply values from this enum. So it should be properly placed in include/uapi/rdma. In userspace this enum is called 'enum ibv_wr_opcode' as part of libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV, IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it turns out) into libiberbs in 2015. The kernel has changed its mind on the numbering for several of the IB_WC values over the years, but has remained stable on IB_WR_LOCAL_INV and below. Based on this we can conclude that there is no real user space user of the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via rdma-core. This is confirmed by inspection, only rxe uses the kernel enum and implements the latter operations. rxe has clearly never worked with these attributes from userspace. Other drivers that support these opcodes implement the functionality without calling out to the kernel. To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we choose to renumber the IB_WR enum in the kernel to match the uABI that userspace has bee using since before Soft RoCE was merged. This is an overall simpler configuration for the whole software stack, and obviously can't break anything existing. Reported-by: NSeth Howell <seth.howell@intel.com> Tested-by: NSeth Howell <seth.howell@intel.com> Fixes: 8700e3e7 ("Soft RoCE driver") Cc: <stable@vger.kernel.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 14 9月, 2018 1 次提交
-
-
由 YueHaibing 提交于
Remove duplicated include. Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 13 9月, 2018 2 次提交
-
-
由 Parav Pandit 提交于
When resolving destination address or route, when net namespace is unavailable, refer to the net namespace of the netdevice of the SGID attribute. This is typically the case for requests arriving from the network for RoCE ports. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Now that rdma_copy_addr() only copies the source addresses and all callers are interested in copying only source addresses, simplify it to drop the destination address argument. Given that it only copies source layer2 addresses, rename it to rdma_copy_src_l2_addr for better code readability. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 11 9月, 2018 7 次提交
-
-
由 Michael J. Ruhl 提交于
The post_send() path determines if it should post directly or, schedule the post for later. The current logic is: if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold post the send else schedule This can allow large requests to call the send engine directly. Large requests can potentially produce a large number of packets prior to returning to the caller, blocking the caller from posting more requests, and allowing better parallel processing. Allow the driver(s) more say in this logic (pass call_send to the driver, rather than examining a return value). Update hfi1/qib logic to schedule the send engine if an RC or UC message is larger than the QP MTU size. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
Currently a matcher can only be created and attached to a NIC RX flow table. Extend it to allow it on NIC TX flow tables as well. In order to achieve that, we: 1) Expose a new attribute: MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS. enum ib_flow_flags is used as valid flags. Only IB_FLOW_ATTR_FLAGS_EGRESS is supported. 2) Remove the requirement to have a DEVX or QP destination when creating a flow. A flow added to NIC TX flow table will forward the packet outside of the vport (Wire or E-Switch in the SR-iOV case). Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
Support attaching flow actions to a flow rule via raw create flow. For now only NIC RX path is supported. This change requires to export flow resources management functions so we can maintain proper bookkeeping of flow actions. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
Use ib_set_flow() when initializing flow related resources. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Guy Levi 提交于
Methods sometimes need to get a flexible set of IDRs and not a strict set as can be achieved today by the conventional IDR attribute. Add a new IDRS_ARRAY attribute to the generic uverbs ioctl layer. IDRS_ARRAY points to array of idrs of the same object type and same access rights, only write and read are supported. Signed-off-by: NGuy Levi <guyle@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>`` Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Chuck Lever 提交于
Add helpful warning for RDMA consumer implementers. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Chuck Lever 提交于
Code audit suggests that the RDMA CM event handler callback function is _always_ invoked in a context that is safe to block. That's important for consumer implementers to know, so document that in the comment before rdma_create_id (where the handler function is set up by the consumer). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 07 9月, 2018 1 次提交
-
-
由 Parav Pandit 提交于
Even though device registration/unregistration and client registration/unregistration is not a performance path, define the client_data_lock as rwlock for code clarity. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 9月, 2018 8 次提交
-
-
由 Parav Pandit 提交于
Instead of adding/removing device attribute files, depend on device_add() which considers adding these device files based on NULL terminated attributes group array. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Leon Romanovsky 提交于
The "closing" variable is used as boolean and set to "true" in one place, update the declaration of that variable and their other assignment to proper type. Fixes: e951747a ("IB/uverbs: Rework the locking for cleaning up the ucontext") Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jack Morgenstein 提交于
The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ. Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down". To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ. Fixes: b7363e67 ("IB/device: Convert ib-comp-wq to be CPU-bound") Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.m> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
We expose new actions: L2_TO_L2_TUNNEL - A generic encap from L2 to L2, the data passed should be the encapsulating headers. L3_TUNNEL_TO_L2 - Will do decap where the inner packet starts from L3, the data should be mac or mac + vlan (14 or 18 bytes). L2_TO_L3_TUNNEL - Will do encap where is L2 of the original packet will not be included, the data should be the encapsulating header. Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
For now, only add L2_TUNNEL_TO_L2 option. This will allow to perform generic decap operation if the encapsulating protocol is L2 based, and the inner packet is also L2 based. For example this can be used to decap VXLAN packets. Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
Refactor the initialization of a flow action object to a common function. Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
Expose the ability to create a flow action which changes packet headers. The data passed from userspace should be modify header actions as defined by HW specification. Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mark Bloch 提交于
This makes it clear and safe to access constants passed in from user space. We define a consistent ABI of u64 for all constants, and verify that the data passed in can be represented by the type the user supplies. The expectation is this will always be used with an enum declaring the constant values, and the user will use the enum type as input to the accessor. To retrieve the attribute value we introduce two helper calls - one standard which may fail if attribute is not valid and one where caller can provide a default value which will be used in case the attribute is not valid (useful when attribute is optional). Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NAriel Levkovich <lariel@mellanox.com> Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 05 9月, 2018 7 次提交
-
-
由 Mark Bloch 提交于
This will allow for the RDMA side to allocate packet reformat context. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Expose new abilities when creating a packet reformat context. The new types which can be created are: MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic encap operation to be done by the HW. MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic decap operation where the inner packet doesn't contain L2. MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic encap operation to be done by the HW. The L2 of the original packet is dropped. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Renames all encap mlx5_{core,ib} code to use the new naming of packet reformat. This change doesn't introduce any function change and is needed to properly reflect the operation being done by this action. For example not only can we encapsulate a packet, but also decapsulate it. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Those bits are hardware specification and should be defined in the IFC header file. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Today we are able to attach encap and decap actions only to the FDB. In preparation to enable those actions on the NIC flow tables, break the single flag into two. Those flags control whatever a decap or encap operations can be attached to the flow table created. For FDB, if encapsulation is required, we set both of them. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Those functions will be used by the RDMA side to create modify header actions to be attached to flow steering rules via verbs. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Mark Bloch 提交于
Extend the ability to add steering rules to NIC TX flow tables. For now, we are only adding TX bypass (egress) which is used by the RDMA side. This will allow to shape outgoing traffic and tweak it if needed, for example performing encapsulation or rewriting headers. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 04 9月, 2018 1 次提交
-
-
由 Moni Shoua 提交于
The field atomic_mode is 4 bits wide and therefore can hold values from 0x0 to 0xf. Remove the unnecessary 20 bit shift that made the values be incorrect. While that, remove unused enum values. Fixes: 57cda166 ("net/mlx5: Add DCT command interface") Signed-off-by: NMoni Shoua <monis@mellanox.com> Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 31 8月, 2018 3 次提交
-
-
由 Rob Herring 提交于
In preparation to remove direct access to device_node.type, add of_node_is_type() and of_node_get_device_type() helpers to check and retrieve the device type. Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: NRob Herring <robh@kernel.org>
-
由 Wolfram Sang 提交于
a) rename to 'put' instead of 'release' to match 'get' when obtaining the buffer b) change the argument order to have the buffer as first argument c) add a new argument telling the function if the message was transferred. This allows the function to be used also in cases where setting up DMA failed, so the buffer needs to be freed without syncing to the message buffer. Also convert the only user. Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
-
由 Rob Herring 提交于
In preparation to remove device_node.name pointer, add helper functions for node name comparisons which are a common pattern throughout the kernel. Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: NRob Herring <robh@kernel.org>
-
- 30 8月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
If someone has the silly idea to write something along those lines: extern u64 foo(void); void bar(struct arm_smccc_res *res) { arm_smccc_1_1_smc(0xbad, foo(), res); } they are in for a surprise, as this gets compiled as: 0000000000000588 <bar>: 588: a9be7bfd stp x29, x30, [sp, #-32]! 58c: 910003fd mov x29, sp 590: f9000bf3 str x19, [sp, #16] 594: aa0003f3 mov x19, x0 598: aa1e03e0 mov x0, x30 59c: 94000000 bl 0 <_mcount> 5a0: 94000000 bl 0 <foo> 5a4: aa0003e1 mov x1, x0 5a8: d4000003 smc #0x0 5ac: b4000073 cbz x19, 5b8 <bar+0x30> 5b0: a9000660 stp x0, x1, [x19] 5b4: a9010e62 stp x2, x3, [x19, #16] 5b8: f9400bf3 ldr x19, [sp, #16] 5bc: a8c27bfd ldp x29, x30, [sp], #32 5c0: d65f03c0 ret 5c4: d503201f nop The call to foo "overwrites" the x0 register for the return value, and we end up calling the wrong secure service. A solution is to evaluate all the parameters before assigning anything to specific registers, leading to the expected result: 0000000000000588 <bar>: 588: a9be7bfd stp x29, x30, [sp, #-32]! 58c: 910003fd mov x29, sp 590: f9000bf3 str x19, [sp, #16] 594: aa0003f3 mov x19, x0 598: aa1e03e0 mov x0, x30 59c: 94000000 bl 0 <_mcount> 5a0: 94000000 bl 0 <foo> 5a4: aa0003e1 mov x1, x0 5a8: d28175a0 mov x0, #0xbad 5ac: d4000003 smc #0x0 5b0: b4000073 cbz x19, 5bc <bar+0x34> 5b4: a9000660 stp x0, x1, [x19] 5b8: a9010e62 stp x2, x3, [x19, #16] 5bc: f9400bf3 ldr x19, [sp, #16] 5c0: a8c27bfd ldp x29, x30, [sp], #32 5c4: d65f03c0 ret Reported-by: NJulien Grall <julien.grall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 29 8月, 2018 2 次提交
-
-
由 Johan Hovold 提交于
Add of_get_compatible_child() helper that can be used to lookup compatible child nodes. Several drivers currently use of_find_compatible_node() to lookup child nodes while failing to notice that the of_find_ functions search the entire tree depth-first (from a given start node) and therefore can match unrelated nodes. The fact that these functions also drop a reference to the node they start searching from (e.g. the parent node) is typically also overlooked, something which can lead to use-after-free bugs. Signed-off-by: NJohan Hovold <johan@kernel.org> Signed-off-by: NRob Herring <robh@kernel.org>
-
由 Marc Zyngier 提交于
An unfortunate consequence of having a strong typing for the input values to the SMC call is that it also affects the type of the return values, limiting r0 to 32 bits and r{1,2,3} to whatever was passed as an input. Let's turn everything into "unsigned long", which satisfies the requirements of both architectures, and allows for the full range of return values. Reported-by: NJulien Grall <julien.grall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-