- 19 2月, 2017 5 次提交
-
-
由 Brian Welty 提交于
Convert copy_sge and related SGE state functions to use boolean. For determining if QP is in user mode, add helper function in rdmavt_qp.h. This is used to determine if QP needs the last byte ordering. While here, change rvt_pd.user to a boolean. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: NDean Luick <dean.luick@intel.com> Signed-off-by: NBrian Welty <brian.welty@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
To move common code across target to rdmavt for code reuse. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: NBrian Welty <brian.welty@intel.com> Signed-off-by: NVenkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Brian Welty 提交于
Add rvt_compute_aeth() and rvt_get_credit() as shared functions in rdmavt, moved from hfi1/qib logic. Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NBrian Welty <brian.welty@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Brian Welty 提交于
Add rvt_rc_error() and rvt_comm_est() as shared functions in rdmavt, moved from hfi1/qib logic. Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NBrian Welty <brian.welty@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sebastian Sanchez 提交于
Having per-CPU reference count for each MR prevents cache-line bouncing across the system. Thus, it prevents bottlenecks. Use per-CPU reference counts per MR. The per-CPU reference count for FMRs is used in atomic mode to allow accurate testing of the busy state. Other MR types run in per-CPU mode MR until they're freed. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 14 2月, 2017 1 次提交
-
-
由 Selvin Xavier 提交于
This patch introduces the RoCE driver for the Broadcom NetXtreme-E 10/25/40/50G RoCE HCAs. The RoCE driver is a two part driver that relies on the parent bnxt_en NIC driver to operate. The changes needed in the bnxt_en driver have already been incorporated via Dave Miller's net tree into the mainline kernel. The vendor official git repository for this driver is available on github as: https://github.com/Broadcom/linux-rdma-nxt/Signed-off-by: NEddie Wai <eddie.wai@broadcom.com> Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 09 2月, 2017 1 次提交
-
-
由 Leon Romanovsky 提交于
Remove references to private kernel header and defines from exported ib_user_verb.h file. The code snippet below is used to reproduce the issue: #include <stdio.h> #include <rdma/ib_user_verb.h> int main(void) { printf("IB_USER_VERBS_ABI_VERSION = %d\n", IB_USER_VERBS_ABI_VERSION); return 0; } It fails during compilation phase with an error: ➜ /tmp gcc main.c main.c:2:31: fatal error: rdma/ib_user_verb.h: No such file or directory #include <rdma/ib_user_verb.h> ^ compilation terminated. Fixes: 189aba99 ("IB/uverbs: Extend modify_qp and support packet pacing") CC: Bodong Wang <bodong@mellanox.com> CC: Matan Barak <matanb@mellanox.com> CC: Christoph Hellwig <hch@infradead.org> Tested-by: NSlava Shwartsman <slavash@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 28 1月, 2017 2 次提交
-
-
由 Yuval Shaia 提交于
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Port and ACL information must be configured before an initiator logs in. Make it possible to configure this information before a subnet prefix has been assigned to a port by not only accepting GIDs as target port and initiator port names but by also accepting port GUIDs. Add a 'priv' member to struct se_wwn to allow target drivers to associate their own data with struct se_wwn. Reported-by: NDoug Ledford <dledford@redhat.com> References: http://www.spinics.net/lists/linux-rdma/msg39505.htmlSigned-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 25 1月, 2017 4 次提交
-
-
由 Jack Wang 提交于
As Jason suggested, we have 4 elements for per port arrays, it's better to have a separate structure to represent them. It simplifies code a bit, ~ 30 lines of code less :) Signed-off-by: NJack Wang <jinpu.wang@profitbricks.com> Reviewed-by: NMichael Wang <yun.wang@profitbricks.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Amrani, Ram 提交于
Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Amrani, Ram 提交于
As the functionality to convert the MTU from a number to enum_ib_mtu is ubiquitous, define a dedicated function and remove the duplicated code. Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Nicolas Iooss 提交于
Use CXGB3_... instead of CXBG3_... Fixes: a85fb338 ("IB/cxgb3: Move user vendor structures") Cc: stable@vger.kernel.org # 4.9 Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Acked-by: NSteve Wise <swise@chelsio.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 13 1月, 2017 3 次提交
-
-
由 Jack Wang 提交于
Export function for rdma_cm, patch for rdma_cm to follow. Signed-off-by: NJack Wang <jinpu.wang@profitbricks.com> Reviewed-by: NMichael Wang <yun.wang@profitbricks.com> Acked-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jack Wang 提交于
We need a port state cache in ib_core, later we will use in rdma_cm. Signed-off-by: NJack Wang <jinpu.wang@profitbricks.com> Reviewed-by: NMichael Wang <yun.wang@profitbricks.com> Acked-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The RDMA core uses ib_pack() to convert from unpacked CPU structs to on-the-wire bitpacked structs. This process requires that 1 bit fields are declared as u8 in the unpacked struct, otherwise the packing process does not read the value properly and the packed result is wired to 0. Several places wrongly used int. Crucially this means the kernel has never, set reversible correctly in the path record request. It has always asked for irreversible paths even if the ULP requests otherwise. When the kernel is used with a SM that supports this feature, it completely breaks communication management if reversible paths are not properly requested. The only reason this ever worked is because opensm ignores the reversible bit. Cc: stable@vger.kernel.org Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 11 1月, 2017 7 次提交
-
-
由 Selvin Xavier 提交于
Update the if_ether.h with the ethertype for Infiniband over Ethernet packets. Also, removing the occurances of 0x8915 from infiniband vendor drivers. Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
MAD and HFI1 have different naming convention, this patch simplifies and unifies their defines and names. As part of cleanup, the HFI1 _NUM() macro and command indexes were removed (controversial). This will cause intentional (and arguably unnecessary) breakage to the PSM user space library. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
Rename RDMA magic number to better describe IOCTLs. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
Move HFI1 IOCTL declarations to rdma_user_ioctl.h file. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
Move hfi1 ioctl definitions to a new header which can be included by both the hfi1 and qib drivers to avoid a duplicate enum definition as shown in this build error for qib: CC [M] drivers/infiniband/hw/qib/qib_sysfs.o In file included from ./include/uapi/rdma/rdma_user_ioctl.h:39:0, from include/uapi/rdma/ib_user_mad.h:38, from include/rdma/ib_mad.h:43, from include/rdma/ib_pma.h:38, from drivers/infiniband/hw/qib/qib_mad.h:37, from drivers/infiniband/hw/qib/qib_init.c:49: ./include/uapi/rdma/hfi/hfi1_user.h:370:2: error: redeclaration of enumerator ‘ur_rcvhdrtail’ ur_rcvhdrtail = 0, Move hfi1 structures to separate file to avoid this failure. The actual move of the ioctl definitions comes in a follow on patch. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
Move legacy MAD IOCTL declarations to rdma_user_ioctl.h file. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
This patch provides one common file (rdma_user_ioctl.h) for all RDMA UAPI IOCTLs. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 08 1月, 2017 1 次提交
-
-
由 Johannes Weiner 提交于
Several people report seeing warnings about inconsistent radix tree nodes followed by crashes in the workingset code, which all looked like use-after-free access from the shadow node shrinker. Dave Jones managed to reproduce the issue with a debug patch applied, which confirmed that the radix tree shrinking indeed frees shadow nodes while they are still linked to the shadow LRU: WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200 CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3 Call Trace: delete_node+0x1e4/0x200 __radix_tree_delete_node+0xd/0x10 shadow_lru_isolate+0xe6/0x220 __list_lru_walk_one.isra.4+0x9b/0x190 list_lru_walk_one+0x23/0x30 scan_shadow_nodes+0x2e/0x40 shrink_slab.part.44+0x23d/0x5d0 shrink_node+0x22c/0x330 kswapd+0x392/0x8f0 This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the inlined radix_tree_shrink(). The problem is with 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking"), which passes an update callback into the radix tree to link and unlink shadow leaf nodes when tree entries change, but forgot to pass the callback when reclaiming a shadow node. While the reclaimed shadow node itself is unlinked by the shrinker, its deletion from the tree can cause the left-most leaf node in the tree to be shrunk. If that happens to be a shadow node as well, we don't unlink it from the LRU as we should. Consider this tree, where the s are shadow entries: root->rnode | [0 n] | | [s ] [sssss] Now the shadow node shrinker reclaims the rightmost leaf node through the shadow node LRU: root->rnode | [0 ] | [s ] Because the parent of the deleted node is the first level below the root and has only one child in the left-most slot, the intermediate level is shrunk and the node containing the single shadow is put in its place: root->rnode | [s ] The shrinker again sees a single left-most slot in a first level node and thus decides to store the shadow in root->rnode directly and free the node - which is a leaf node on the shadow node LRU. root->rnode | s Without the update callback, the freed node remains on the shadow LRU, where it causes later shrinker runs to crash. Pass the node updater callback into __radix_tree_delete_node() in case the deletion causes the left-most branch in the tree to collapse too. Also add warnings when linked nodes are freed right away, rather than wait for the use-after-free when the list is scanned much later. Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking") Reported-by: NDave Chinner <david@fromorbit.com> Reported-by: NHugh Dickins <hughd@google.com> Reported-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-and-tested-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Leech <cleech@redhat.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 1月, 2017 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
So they can figure out what is the optimal number of pages that can be contingously stitched together without fear of bounce buffer. We also expose an mechanism for sub-users of SWIOTLB API, such as Xen-SWIOTLB to set the max segment value. And lastly if swiotlb=force is set (which mandates we bounce buffer everything) we set max_segment so at least we can bounce buffer one 4K page instead of a giant 512KB one for which we may not have space. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: NJuergen Gross <jgross@suse.com>
-
- 05 1月, 2017 2 次提交
-
-
由 Michal Marek 提交于
The asm-prototypes.h file is used to provide dummy function declarations for genksyms, when processing asm files with EXPORT_SYMBOL. Make sure that any architecture defines get out of our way. x86 currently has an issue with memcpy on 64bit with CONFIG_KMEMCHECK=y and with memset/__memset on 32bit: $ cat init/test.c #include <asm/asm-prototypes.h> $ make -s init/test.o In file included from ./arch/x86/include/asm/string.h:4:0, from ./include/linux/string.h:18, from ./include/linux/bitmap.h:8, from ./include/linux/cpumask.h:11, from ./arch/x86/include/asm/cpumask.h:4, from ./arch/x86/include/asm/msr.h:10, from ./arch/x86/include/asm/processor.h:20, from ./arch/x86/include/asm/cpufeature.h:4, from ./arch/x86/include/asm/thread_info.h:52, from ./include/linux/thread_info.h:25, from ./arch/x86/include/asm/preempt.h:6, from ./include/linux/preempt.h:59, from ./include/linux/spinlock.h:50, from ./include/linux/seqlock.h:35, from ./include/linux/time.h:5, from ./include/uapi/linux/timex.h:56, from ./include/linux/timex.h:56, from ./include/linux/sched.h:19, from ./include/linux/uaccess.h:4, from ./arch/x86/include/asm/asm-prototypes.h:2, from init/test.c:1: ./arch/x86/include/asm/string_64.h:52:47: error: expected declaration specifiers or ‘...’ before ‘(’ token #define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len)) ./include/asm-generic/asm-prototypes.h:6:14: note: in expansion of macro ‘memcpy’ extern void *memcpy(void *, const void *, __kernel_size_t); ^ ... During real build, this manifests itself by genksyms segfaulting. Fixes: 334bb773 ("x86/kbuild: enable modversions for symbols exported from asm") Reported-and-tested-by: NBorislav Petkov <bp@alien8.de> Cc: Adam Borowski <kilobyte@angband.pl> Signed-off-by: NMichal Marek <mmarek@suse.com>
-
由 Paul Gortmaker 提交于
What appears to be a copy and paste error from the line above gets the ioctl a ssize_t return value instead of the traditional "int". The associated sample code used "long" which meant it would compile for x86-64 but not i386, with the latter failing as follows: CC [M] samples/vfio-mdev/mtty.o samples/vfio-mdev/mtty.c:1418:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types] .ioctl = mtty_ioctl, ^ samples/vfio-mdev/mtty.c:1418:20: note: (near initialization for ‘mdev_fops.ioctl’) cc1: some warnings being treated as errors Since in this case, vfio is working with struct file_operations; as such: long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); ...and so here we just standardize on long vs. the normal int that user space typically sees and documents as per "man ioctl" and similar. Fixes: 9d1a546c ("docs: Sample driver to demonstrate how to use Mediated device framework.") Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Neo Jia <cjia@nvidia.com> Cc: kvm@vger.kernel.org Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 02 1月, 2017 1 次提交
-
-
由 Vincent Pelletier 提交于
When FUNCTIONFS_EVENTFD flag is set, __ffs_data_got_descs reads a 32bits, little-endian value right after the fixed structure header, and passes it to eventfd_ctx_fdget. Document this. Also, rephrase a comment to be affirmative about the role of string descriptor at index 0. Ref: USB 2.0 spec paragraph "9.6.7 String", and also checked to still be current in USB 3.0 spec paragraph "9.6.9 String". Signed-off-by: NVincent Pelletier <plr.vincent@gmail.com> Signed-off-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
-
- 31 12月, 2016 1 次提交
-
-
由 Linus Walleij 提交于
The LIS3LV02 has a special bit that need to be set to get the read values left aligned. Before this patch we get gibberish like this: iio_generic_buffer -a -c10 -n lis3lv02dl_accel (...) 0.000000 -0.010042 -0.642688 19155832931907 0.000000 -0.010042 -0.642688 19155858751073 Which is because we read a raw value for 1g as 64 which is the nominal 1024 for 1g shifted 4 bits to the left by being right-aligned rather than left aligned. Since all other sensors are left aligned, add some code to set the special DAS (data alignment setting) bit to 1 so that the right value is now read like this: iio_generic_buffer -a -c10 -n lis3lv02dl_accel (...) 0.000000 -0.147095 -10.120135 24761614364956 -0.029419 -0.176514 -10.120135 24761631624540 The scaling was weird as well: we have a gain of 1000 for 1g and 3000 for 6g. I don't even remember how I came up with the old values but they are wrong. Fixes: 3acddf74 ("iio: st-sensors: add support for lis3lv02d accelerometer") Cc: Lorenzo Bianconi <lorenzo.bianconi@st.com> Cc: Giuseppe Barba <giuseppe.barba@st.com> Cc: Denis Ciocca <denis.ciocca@st.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NJonathan Cameron <jic23@kernel.org>
-
- 30 12月, 2016 5 次提交
-
-
由 Alex Williamson 提交于
Abstract access to mdev_device so that we can define which interfaces are public rather than relying on comments in the structure. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed-by: NJike Song <jike.song@intel.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
-
由 Alex Williamson 提交于
Rather than hoping for good behavior by marking some elements internal, enforce it by making the entire structure private and creating an accessor function for the one useful external field. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Jike Song <jike.song@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
-
由 Alex Williamson 提交于
Add an mdev_ prefix so we're not poluting the namespace so much. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Jike Song <jike.song@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
-
由 Jack Morgenstein 提交于
Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: 48564135 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
In commit 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: NNick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 12月, 2016 1 次提交
-
-
由 Gal Pressman 提交于
This reverts commit 7f503169. Fixes: 7f503169 ("net/mlx5: Add MPCNT register infrastructure") Signed-off-by: NGal Pressman <galp@mellanox.com> Reported-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 12月, 2016 3 次提交
-
-
由 Milo Kim 提交于
Interrupt numbers are from the datasheet, so no need to keep them in the ABI. Use the number in the DT file. Signed-off-by: NMilo Kim <woogyom.kim@gmail.com> Acked-by: NRob Herring <robh@kernel.org> Signed-off-by: NTony Lindgren <tony@atomide.com>
-
由 Jason Wang 提交于
After commit 73b62bd0 ("virtio-net: remove the warning before XDP linearizing"), there's no users for bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for commit f23bc46c. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haishuang Yan 提交于
Different namespaces might have different requirements to reuse TIME-WAIT sockets for new connections. This might be required in cases where different namespace applications are in place which require TIME_WAIT socket connections to be reduced independently of the host. Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 12月, 2016 1 次提交
-
-
由 Jan Kara 提交于
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 26 12月, 2016 1 次提交
-
-
由 Nicholas Piggin 提交于
Add a new page flag, PageWaiters, to indicate the page waitqueue has tasks waiting. This can be tested rather than testing waitqueue_active which requires another cacheline load. This bit is always set when the page has tasks on page_waitqueue(page), and is set and cleared under the waitqueue lock. It may be set when there are no tasks on the waitqueue, which will cause a harmless extra wakeup check that will clears the bit. The generic bit-waitqueue infrastructure is no longer used for pages. Instead, waitqueues are used directly with a custom key type. The generic code was not flexible enough to have PageWaiters manipulation under the waitqueue lock (which simplifies concurrency). This improves the performance of page lock intensive microbenchmarks by 2-3%. Putting two bits in the same word opens the opportunity to remove the memory barrier between clearing the lock bit and testing the waiters bit, after some work on the arch primitives (e.g., ensuring memory operand widths match and cover both bits). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-