Commits · 0128fceaf934dbfca4537d4eb8c3a5f7e84562c8 · openeuler / Kernel

19 Feb, 2017 5 commits

IB/hfi1, rdmavt: Update copy_sge to use boolean arguments · 0128fcea

Authored by Brian Welty on Feb 08, 2017

Convert copy_sge and related SGE state functions to use boolean.
For determining if QP is in user mode, add helper function in rdmavt_qp.h.
This is used to determine if QP needs the last byte ordering.
While here, change rvt_pd.user to a boolean.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rdmavt: Adding timer logic to rdmavt · 11a10d4b

Authored by Venkata Sandeep Dhanalakota on Feb 08, 2017

Move code common across targets into rdmavt for code reuse.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavt · 696513e8

Authored by Brian Welty on Feb 08, 2017

Add rvt_compute_aeth() and rvt_get_credit() as shared functions in
rdmavt, moved from hfi1/qib logic.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavt · beb5a042

Authored by Brian Welty on Feb 08, 2017

Add rvt_rc_error() and rvt_comm_est() as shared functions in
rdmavt, moved from hfi1/qib logic.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rdmavt: Use per-CPU reference count for MRs · 338adfdd

Authored by Sebastian Sanchez on Feb 08, 2017

Having per-CPU reference count for each MR prevents
cache-line bouncing across the system. Thus, it
prevents bottlenecks. Use per-CPU reference counts
per MR.

The per-CPU reference count for FMRs is used in
atomic mode to allow accurate testing of the busy
state. Other MR types run in per-CPU mode until
they're freed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

14 Feb, 2017 1 commit

RDMA/bnxt_re: Add bnxt_re RoCE driver · 1ac5a404

Authored by Selvin Xavier on Feb 10, 2017

This patch introduces the RoCE driver for the Broadcom
NetXtreme-E 10/25/40/50G RoCE HCAs.

The RoCE driver is a two part driver that relies on the parent
bnxt_en NIC driver to operate.  The changes needed in the bnxt_en
driver have already been incorporated via Dave Miller's net tree
into the mainline kernel.

The vendor official git repository for this driver is available
on github as:
https://github.com/Broadcom/linux-rdma-nxt/

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

09 Feb, 2017 1 commit

RDMA: Don't reference kernel private header from UAPI header · 646ebd41

Authored by Leon Romanovsky on Feb 08, 2017

Remove references to private kernel header and defines from exported
ib_user_verb.h file.

The code snippet below is used to reproduce the issue:

 #include <stdio.h>
 #include <rdma/ib_user_verb.h>

 int main(void)
 {
	printf("IB_USER_VERBS_ABI_VERSION = %d\n", IB_USER_VERBS_ABI_VERSION);
	return 0;
 }

It fails during compilation phase with an error:
➜  /tmp gcc main.c
main.c:2:31: fatal error: rdma/ib_user_verb.h: No such file or directory
 #include <rdma/ib_user_verb.h>
                               ^
compilation terminated.

Fixes: 189aba99 ("IB/uverbs: Extend modify_qp and support packet pacing")
CC: Bodong Wang <bodong@mellanox.com>
CC: Matan Barak <matanb@mellanox.com>
CC: Christoph Hellwig <hch@infradead.org>
Tested-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

28 Jan, 2017 2 commits

IB/core: Add inline function to validate port · 24dc831b

Authored by Yuval Shaia on Jan 25, 2017

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/srpt: Accept GUIDs as port names · 2bce1a6d

Authored by Bart Van Assche on Dec 09, 2016

Port and ACL information must be configured before an initiator
logs in.  Make it possible to configure this information before
a subnet prefix has been assigned to a port by not only accepting
GIDs as target port and initiator port names but by also accepting
port GUIDs.

Add a 'priv' member to struct se_wwn to allow target drivers to
associate their own data with struct se_wwn.
Reported-by: Doug Ledford <dledford@redhat.com>
References: http://www.spinics.net/lists/linux-rdma/msg39505.html
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

25 Jan, 2017 4 commits

RDMA/core: create struct ib_port_cache · 21d6454a

Authored by Jack Wang on Jan 17, 2017

As Jason suggested, we have 4 elements for per port arrays,
it's better to have a separate structure to represent them.

It simplifies code a bit, ~ 30 lines of code less :)
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/qedr: Add uapi header qedr-abi.h · 20f5e10e

Authored by Amrani, Ram on Jan 24, 2017

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Add the function ib_mtu_int_to_enum · d3f4aadd

Authored by Amrani, Ram on Dec 26, 2016

As the functionality to convert the MTU from a number to enum_ib_mtu
is ubiquitous, define a dedicated function and remove the duplicated
code.
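
A minimal sketch of what such a conversion helper might look like. The enum below is an illustrative stand-in, not the kernel's definition, and the round-down policy (pick the largest supported MTU that does not exceed the given number) is an assumption rather than something stated in this commit message.

```c
/* Illustrative stand-in for the kernel's MTU enum; names and values are assumptions. */
enum sketch_ib_mtu {
	SKETCH_MTU_256  = 1,
	SKETCH_MTU_512  = 2,
	SKETCH_MTU_1024 = 3,
	SKETCH_MTU_2048 = 4,
	SKETCH_MTU_4096 = 5,
};

/*
 * Hypothetical helper: map an MTU given in bytes to the largest enum
 * value that does not exceed it, falling back to the smallest one.
 */
static enum sketch_ib_mtu sketch_mtu_int_to_enum(int mtu)
{
	if (mtu >= 4096)
		return SKETCH_MTU_4096;
	if (mtu >= 2048)
		return SKETCH_MTU_2048;
	if (mtu >= 1024)
		return SKETCH_MTU_1024;
	if (mtu >= 512)
		return SKETCH_MTU_512;
	return SKETCH_MTU_256;
}
```

Under these assumptions, a 1500-byte Ethernet MTU would map to the 1024-byte entry, which is the kind of duplicated logic a shared function removes from individual drivers.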
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/cxgb3: fix misspelling in header guard · b1a27eac

Authored by Nicolas Iooss on Jan 22, 2017

Use CXGB3_... instead of CXBG3_...

Fixes: a85fb338 ("IB/cxgb3: Move user vendor structures")
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Steve Wise <swise@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

13 Jan, 2017 3 commits

RDMA/core: export ib_get_cached_port_state · 9e2c3f1c

Authored by Jack Wang on Jan 02, 2017

Export function for rdma_cm, patch for rdma_cm to follow.
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: add port state cache · aaaca121

Authored by Jack Wang on Jan 02, 2017

We need a port state cache in ib_core, which rdma_cm will use later.
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Fix incorrect structure packing for booleans · 55efcfcd

Authored by Jason Gunthorpe on Dec 22, 2016

The RDMA core uses ib_pack() to convert from unpacked CPU structs
to on-the-wire bitpacked structs.

This process requires that 1 bit fields are declared as u8 in the
unpacked struct, otherwise the packing process does not read the
value properly and the packed result is wired to 0. Several
places wrongly used int.

Crucially this means the kernel has never set reversible
correctly in the path record request. It has always asked for
irreversible paths even if the ULP requests otherwise.

When the kernel is used with a SM that supports this feature, it
completely breaks communication management if reversible paths are
not properly requested.

The only reason this ever worked is because opensm ignores the
reversible bit.
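
A user-space sketch of how a 1-bit flag declared as int can pack to 0. It assumes the packer reads a multi-byte unpacked field as a big-endian value and then keeps only the low bit; `read_be32` and `pack_1bit` are hypothetical helpers written for this illustration, not the kernel's ib_pack() code, and the byte arrays simulate how a little-endian CPU stores the value 1.

```c
#include <stdint.h>

/* Hypothetical helper: interpret 4 bytes as a big-endian 32-bit value. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical 1-bit packer: reads the unpacked field using the width it
 * was declared with, then keeps only the lowest bit for the wire format.
 */
static unsigned int pack_1bit(const uint8_t *field, int size_bytes)
{
	uint32_t val = (size_bytes == 1) ? field[0] : read_be32(field);

	return val & 1u;
}

/* "true" stored in a u8, and in a 4-byte int on a little-endian CPU. */
static const uint8_t flag_as_u8[1] = { 0x01 };
static const uint8_t flag_as_le_int[4] = { 0x01, 0x00, 0x00, 0x00 };
```

In this model the u8 flag packs to 1, while the int flag is read as 0x01000000 and its low bit is 0, which matches the symptom described above of the packed result always being 0.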

Cc: stable@vger.kernel.org
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

11 Jan, 2017 7 commits

RDMA: Adding ethertype ETH_P_IBOE · 69ae5439

Authored by Selvin Xavier on Dec 19, 2016

Update if_ether.h with the ethertype for InfiniBand over
Ethernet packets. Also, remove the occurrences of 0x8915
from InfiniBand vendor drivers.
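
As an illustration of why a named constant is preferable to a bare 0x8915, here is a small sketch that classifies an untagged Ethernet frame by its ethertype. `SKETCH_ETH_P_IBOE` and `sketch_is_iboe_frame` are hypothetical names for this example; only the value 0x8915 comes from the commit message, and VLAN-tagged frames are deliberately not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_IBOE 0x8915 /* value quoted in the commit message */
#define SKETCH_ETH_HLEN   14     /* dst MAC (6) + src MAC (6) + ethertype (2) */

/* True if an untagged Ethernet frame carries the IBoE ethertype. */
static bool sketch_is_iboe_frame(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	if (len < SKETCH_ETH_HLEN)
		return false;

	/* The ethertype is stored big-endian at byte offsets 12 and 13. */
	ethertype = (uint16_t)(((uint16_t)frame[12] << 8) | frame[13]);
	return ethertype == SKETCH_ETH_P_IBOE;
}
```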
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Unify style of IOCTL commands · fa83b793

Authored by Leon Romanovsky on Sep 04, 2016

MAD and HFI1 have different naming conventions; this patch
simplifies and unifies their defines and names.

As part of cleanup, the HFI1 _NUM() macro and command indexes
were removed (controversial). This will cause intentional (and
arguably unnecessary) breakage to the PSM user space library.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Rename RDMA magic number · 10b31e79

Authored by Leon Romanovsky on Sep 04, 2016

Rename RDMA magic number to better describe IOCTLs.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Move HFI1 IOCTL declarations to common file · 8edec0b5

Authored by Leon Romanovsky on Sep 04, 2016

Move HFI1 IOCTL declarations to rdma_user_ioctl.h file.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hfi1: Avoid redeclaration error · 38e8b671

Authored by Leon Romanovsky on Sep 04, 2016

Move hfi1 ioctl definitions to a new header which can be included by
both the hfi1 and qib drivers to avoid a duplicate enum definition
as shown in this build error for qib:

  CC [M] drivers/infiniband/hw/qib/qib_sysfs.o
In file included from ./include/uapi/rdma/rdma_user_ioctl.h:39:0,
		 from include/uapi/rdma/ib_user_mad.h:38,
		 from include/rdma/ib_mad.h:43,
		 from include/rdma/ib_pma.h:38,
		 from drivers/infiniband/hw/qib/qib_mad.h:37,
		 from drivers/infiniband/hw/qib/qib_init.c:49:
./include/uapi/rdma/hfi/hfi1_user.h:370:2: error: redeclaration of
enumerator ‘ur_rcvhdrtail’
  ur_rcvhdrtail = 0,

Move hfi1 structures to separate file to avoid this failure.

The actual move of the ioctl definitions comes in a follow on patch.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Move legacy MAD IOCTL declarations to common file · 06393bc3

Authored by Leon Romanovsky on Sep 04, 2016

Move legacy MAD IOCTL declarations to rdma_user_ioctl.h file.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/core: Commonize RDMA IOCTL declarations location · 843debb8

Authored by Leon Romanovsky on Sep 04, 2016

This patch provides one common file (rdma_user_ioctl.h)
for all RDMA UAPI IOCTLs.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

08 Jan, 2017 1 commit

mm: workingset: fix use-after-free in shadow node shrinker · ea07b862

Authored by Johannes Weiner on Jan 06, 2017

Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
     delete_node+0x1e4/0x200
     __radix_tree_delete_node+0xd/0x10
     shadow_lru_isolate+0xe6/0x220
     __list_lru_walk_one.isra.4+0x9b/0x190
     list_lru_walk_one+0x23/0x30
     scan_shadow_nodes+0x2e/0x40
     shrink_slab.part.44+0x23d/0x5d0
     shrink_node+0x22c/0x330
     kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b46879 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk.  If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

       root->rnode
            |
       [0       n]
        |       |
     [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

       root->rnode
            |
       [0        ]
        |
    [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

       root->rnode
            |
       [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

  root->rnode
       |
       s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Jan, 2017 1 commit

swiotlb: Export swiotlb_max_segment to users · 7453c549

Authored by Konrad Rzeszutek Wilk on Dec 20, 2016

So they can figure out what is the optimal number of pages
that can be contiguously stitched together without fear of
bounce buffer.

We also expose a mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>

05 Jan, 2017 2 commits

asm-prototypes: Clear any CPP defines before declaring the functions · c7858bf1

Authored by Michal Marek on Jan 03, 2017

The asm-prototypes.h file is used to provide dummy function declarations
for genksyms, when processing asm files with EXPORT_SYMBOL. Make sure
that any architecture defines get out of our way. x86 currently has an
issue with memcpy on 64bit with CONFIG_KMEMCHECK=y and with
memset/__memset on 32bit:

	$ cat init/test.c
	#include <asm/asm-prototypes.h>
	$ make -s init/test.o
	In file included from ./arch/x86/include/asm/string.h:4:0,
			 from ./include/linux/string.h:18,
			 from ./include/linux/bitmap.h:8,
			 from ./include/linux/cpumask.h:11,
			 from ./arch/x86/include/asm/cpumask.h:4,
			 from ./arch/x86/include/asm/msr.h:10,
			 from ./arch/x86/include/asm/processor.h:20,
			 from ./arch/x86/include/asm/cpufeature.h:4,
			 from ./arch/x86/include/asm/thread_info.h:52,
			 from ./include/linux/thread_info.h:25,
			 from ./arch/x86/include/asm/preempt.h:6,
			 from ./include/linux/preempt.h:59,
			 from ./include/linux/spinlock.h:50,
			 from ./include/linux/seqlock.h:35,
			 from ./include/linux/time.h:5,
			 from ./include/uapi/linux/timex.h:56,
			 from ./include/linux/timex.h:56,
			 from ./include/linux/sched.h:19,
			 from ./include/linux/uaccess.h:4,
			 from ./arch/x86/include/asm/asm-prototypes.h:2,
			 from init/test.c:1:
	./arch/x86/include/asm/string_64.h:52:47: error: expected declaration specifiers or ‘...’ before ‘(’ token
	 #define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))
	 ./include/asm-generic/asm-prototypes.h:6:14: note: in expansion of macro ‘memcpy’
	  extern void *memcpy(void *, const void *, __kernel_size_t);

						       ^
	...

During real build, this manifests itself by genksyms segfaulting.
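
The failure and the fix named in the subject line can be sketched in a few lines of plain C. This is an illustration under assumptions, not the kernel header: `sketch_inline_memcpy` is a hypothetical name standing in for an architecture's inline implementation, and `size_t` stands in for `__kernel_size_t`.

```c
#include <stddef.h>

/* Stand-in for an architecture header that turns memcpy into a macro. */
#define memcpy(dst, src, len) sketch_inline_memcpy((dst), (src), (len))

/*
 * With the macro still defined, the prototype below would be expanded by
 * the preprocessor and no longer parse as a declaration. Clearing the
 * define first keeps it a plain function declaration.
 */
#undef memcpy
extern void *memcpy(void *, const void *, size_t);
```

Removing the `#undef` line from this sketch should reproduce a parse error of the same kind as the one quoted above.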

Fixes: 334bb773 ("x86/kbuild: enable modversions for symbols exported from asm")
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michal Marek <mmarek@suse.com>

vfio-mdev: fix non-standard ioctl return val causing i386 build fail · c6ef7fd4

Authored by Paul Gortmaker on Jan 04, 2017

What appears to be a copy and paste error from the line above gets
the ioctl a ssize_t return value instead of the traditional "int".

The associated sample code used "long" which meant it would compile
for x86-64 but not i386, with the latter failing as follows:

  CC [M]  samples/vfio-mdev/mtty.o
samples/vfio-mdev/mtty.c:1418:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .ioctl          = mtty_ioctl,
                    ^
samples/vfio-mdev/mtty.c:1418:20: note: (near initialization for ‘mdev_fops.ioctl’)
cc1: some warnings being treated as errors

In this case, vfio is working with struct file_operations, which declares:

    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);

...and so here we just standardize on long vs. the normal int that user
space typically sees and documents as per "man ioctl" and similar.

Fixes: 9d1a546c ("docs: Sample driver to demonstrate how to use Mediated device framework.")
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

02 Jan, 2017 1 commit

usb: gadget: f_fs: Document eventfd effect on descriptor format. · 96a420d2

Authored by Vincent Pelletier on Dec 15, 2016

When the FUNCTIONFS_EVENTFD flag is set, __ffs_data_got_descs reads a 32-bit
little-endian value right after the fixed structure header and passes it
to eventfd_ctx_fdget. Document this.

Also, rephrase a comment to be affirmative about the role of string
descriptor at index 0. Ref: USB 2.0 spec paragraph "9.6.7 String", and
also checked to still be current in USB 3.0 spec paragraph "9.6.9 String".
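
A sketch of the read described in the first paragraph: decoding a 32-bit little-endian value that sits directly after a fixed-size header. The helper names are hypothetical, and the header length is passed in as a parameter because its exact size is not stated in this commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value regardless of host byte order. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the value stored right after a fixed
 * header of header_len bytes, or -1 if the buffer is too short.
 */
static int64_t sketch_eventfd_after_header(const uint8_t *buf, size_t len,
					   size_t header_len)
{
	if (len < header_len || len - header_len < 4)
		return -1;
	return (int64_t)sketch_get_le32(buf + header_len);
}
```

The practical consequence for user space is that, when the flag is set, the remaining descriptor data starts four bytes later than it otherwise would.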
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

31 Dec, 2016 1 commit

iio: accel: st_accel: fix LIS3LV02 reading and scaling · 65e4345c

Authored by Linus Walleij on Dec 30, 2016

The LIS3LV02 has a special bit that needs to be set to get the
read values left aligned. Before this patch we get gibberish
like this:

iio_generic_buffer -a -c10 -n lis3lv02dl_accel
(...)
0.000000 -0.010042 -0.642688 19155832931907
0.000000 -0.010042 -0.642688 19155858751073

Which is because we read a raw value for 1g as 64 which is
the nominal 1024 for 1g shifted 4 bits to the left by being
right-aligned rather than left aligned.
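
The arithmetic can be checked with a short sketch. It assumes a 12-bit sample carried in a 16-bit register, so a decoder expecting left-aligned data discards the low 4 bits; `sketch_decode_left_aligned` is a hypothetical helper, not driver code.

```c
#include <stdint.h>

#define SKETCH_ALIGN_SHIFT 4 /* assumed: 12-bit sample in a 16-bit register */

/*
 * Hypothetical decoder that expects left-aligned data and therefore
 * drops the low 4 bits. Division is used so negative samples are
 * handled without relying on signed right-shift behaviour.
 */
static int sketch_decode_left_aligned(int16_t raw)
{
	return raw / (1 << SKETCH_ALIGN_SHIFT);
}
```

Feeding this decoder a left-aligned 1g reading (1024 << 4) gives 1024, while a right-aligned reading of 1024 gives 64, the value reported above.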

Since all other sensors are left aligned, add some code to
set the special DAS (data alignment setting) bit to 1 so that
the right value is now read like this:

iio_generic_buffer -a -c10 -n lis3lv02dl_accel
(...)
0.000000 -0.147095 -10.120135 24761614364956
-0.029419 -0.176514 -10.120135 24761631624540

The scaling was weird as well: we have a gain of 1000 for 1g
and 3000 for 6g. I don't even remember how I came up with the
old values but they are wrong.

Fixes: 3acddf74 ("iio: st-sensors: add support for lis3lv02d accelerometer")
Cc: Lorenzo Bianconi <lorenzo.bianconi@st.com>
Cc: Giuseppe Barba <giuseppe.barba@st.com>
Cc: Denis Ciocca <denis.ciocca@st.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

30 Dec, 2016 5 commits

vfio-mdev: Make mdev_device private and abstract interfaces · 99e3123e

Authored by Alex Williamson on Dec 30, 2016

Abstract access to mdev_device so that we can define which interfaces
are public rather than relying on comments in the structure.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jike Song <jike.song@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>

vfio-mdev: Make mdev_parent private · 9372e6fe

Authored by Alex Williamson on Dec 30, 2016

Rather than hoping for good behavior by marking some elements
internal, enforce it by making the entire structure private and
creating an accessor function for the one useful external field.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jike Song <jike.song@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>

vfio-mdev: de-polute the namespace, rename parent_device & parent_ops · 42930553

Authored by Alex Williamson on Dec 30, 2016

Add an mdev_ prefix so we're not polluting the namespace so much.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jike Song <jike.song@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>

net/mlx4_core: Fix raw qp flow steering rules under SRIOV · 10b1c04e

Authored by Jack Morgenstein on Dec 29, 2016

Demoting simple flow steering rule priority (for DPDK) was achieved by
wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF
as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper
function for the PF and all VFs.

In function mlx4_ib_create_flow(), this change caused the main rule
creation for the PF to be wrapped, while it left the associated
tunnel steering rule creation unwrapped for the PF.

This mismatch caused rule deletion failures in mlx4_ib_destroy_flow()
for the PF when the detach wrapper function did not find the associated
tunnel-steering rule (since creation of that rule for the PF did not
go through the wrapper function).

Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native"
(so that the PF invocation does not go through the wrapper), and perform
the required priority demotion for the PF in the mlx4_ib_create_flow()
code path.

Fixes: 48564135 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mm: optimize PageWaiters bit use for unlock_page() · b91e1302

Authored by Linus Torvalds on Dec 27, 2016

In commit 62906027 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd.  The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result.  However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too.  And on architectures
with LL/SC (which are all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86.  We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better.  According to Nick, Power has the same hiccup x86 has, for
example, but some other architectures may not even care.
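
A user-space analogue of the primitive's semantics, written with C11 atomics. Note the substitution: instead of the generic "clear_bit()"+"test_bit(7)" fallback mentioned above, this sketch uses a single atomic fetch-and and inspects the value it returns, which is closer in spirit to the optimized version. The function name and the use of `atomic_ulong` are assumptions made for illustration; this is not kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_WAITERS_BIT 7 /* the "sign bit" of the low byte */

/*
 * Clear bit nr with release ordering and report whether bit 7 was set,
 * using one atomic read-modify-write rather than a clear followed by a
 * separate load of the same word. Intended for nr other than bit 7.
 */
static bool sketch_clear_bit_unlock_is_negative_byte(unsigned int nr,
						      atomic_ulong *word)
{
	unsigned long old = atomic_fetch_and_explicit(word, ~(1UL << nr),
						      memory_order_release);

	return (old & (1UL << SKETCH_WAITERS_BIT)) != 0;
}
```

A caller in the style of unlock_page() would clear the lock bit with this helper and only go looking for waiters when it returns true.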

All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test".  After this, it's down to
0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Dec, 2016 1 commit

Revert "net/mlx5: Add MPCNT register infrastructure" · 1efbd205

Authored by Gal Pressman on Dec 28, 2016

This reverts commit 7f503169.

Fixes: 7f503169 ("net/mlx5: Add MPCNT register infrastructure")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Dec, 2016 3 commits

dt-bindings: mfd: Remove TPS65217 interrupts · be53e38f

Authored by Milo Kim on Dec 09, 2016

Interrupt numbers are from the datasheet, so no need to keep them in
the ABI. Use the number in the DT file.
Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

net: xdp: remove unused bfp_warn_invalid_xdp_buffer() · be267277

Authored by Jason Wang on Dec 27, 2016

After commit 73b62bd0 ("virtio-net:
remove the warning before XDP linearizing"), there are no users of
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert of
commit f23bc46c.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_tw_reuse knob · 56ab6b93

Authored by Haishuang Yan on Dec 25, 2016

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Dec, 2016 1 commit

mm: Invalidate DAX radix tree entries only if appropriate · c6dcf52c

Authored by Jan Kara on Aug 10, 2016

Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
just delete all exceptional radix tree entries they find. For DAX this
is not desirable as we track cache dirtiness in these entries and when
they are evicted, we may not flush caches although it is necessary. This
can for example manifest when we write to the same block both via mmap
and via write(2) (to different offsets) and fsync(2) then does not
properly flush CPU caches when modification via write(2) was the last
one.

Create appropriate DAX functions to handle invalidation of DAX entries
for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
wire them up into the corresponding mm functions.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

26 Dec, 2016 1 commit

mm: add PageWaiters indicating tasks are waiting for a page bit · 62906027

Authored by Nicholas Piggin on Dec 25, 2016

Add a new page flag, PageWaiters, to indicate the page waitqueue has
tasks waiting. This can be tested rather than testing waitqueue_active
which requires another cacheline load.

This bit is always set when the page has tasks on page_waitqueue(page),
and is set and cleared under the waitqueue lock. It may be set when
there are no tasks on the waitqueue, which will cause a harmless extra
wakeup check that clears the bit.

The generic bit-waitqueue infrastructure is no longer used for pages.
Instead, waitqueues are used directly with a custom key type. The
generic code was not flexible enough to have PageWaiters manipulation
under the waitqueue lock (which simplifies concurrency).

This improves the performance of page lock intensive microbenchmarks by
2-3%.

Putting two bits in the same word opens the opportunity to remove the
memory barrier between clearing the lock bit and testing the waiters
bit, after some work on the arch primitives (e.g., ensuring memory
operand widths match and cover both bits).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

openeuler / Kernel · synced successfully almost 2 years ago