Commits · 2f62747c77e2e5a8acb720aaec9ee4860d55118f · openeuler / Kernel

11 Dec, 2018 · 6 commits

net/mlx5: Remove the get protocol device interface entry · 6c22a119

Authored by Or Gerlitz on Dec 10, 2018

This isn't used anywhere across the mlx5 driver stack,
remove it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Support extended destination format in flow steering command · a2c6162b

Authored by Eli Britstein on Dec 10, 2018

Update the flow steering command formatting according to the extended
destination API.
Note that the FW dictates that multi destination FTEs that involve at
least one encap must use the extended destination format, while single
destination ones must use the legacy format.
Using extended destination format requires FW support. Check for its
capabilities and return error if not supported.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Change vhca id valid bool field to bit flag · aa39c2c0

Authored by Eli Britstein on Dec 10, 2018

Change the driver flow destination struct to use bit flags with the vhca
id valid being the 1st one. The flags field is more extendable and will
be used in a downstream patch.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
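As an illustration of the bool-to-flags change described in this commit, here is a minimal sketch. All names (`demo_flow_dest`, `DEMO_DEST_VHCA_ID_VALID`, `DEMO_DEST_ENCAP_VALID`) are hypothetical and are not the mlx5 definitions; the second flag is only an assumed example of a later extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the actual mlx5 definitions. */
enum demo_dest_flags {
	DEMO_DEST_VHCA_ID_VALID = 1u << 0, /* first flag: vhca id is valid */
	DEMO_DEST_ENCAP_VALID   = 1u << 1, /* assumed example of a later flag */
};

struct demo_flow_dest {
	uint16_t vhca_id;
	uint32_t flags; /* replaces a dedicated bool per property */
};

/* Test a single property without disturbing the other flag bits. */
static inline bool demo_dest_has_vhca_id(const struct demo_flow_dest *d)
{
	return (d->flags & DEMO_DEST_VHCA_ID_VALID) != 0;
}
```

A flags word lets new properties be added without growing the struct or touching existing users, which is presumably what "more extendable" refers to.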

net/mlx5: Introduce extended destination fields · 1b115498

Authored by Eli Britstein on Dec 10, 2018

Extended destinations provide the ability to configure different
encapsulation properties per destination on a single FTE. This is
needed for use-cases such as remote mirroring over tunneled networks.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Revise gre and nvgre key formats · 5886a96a

Authored by Oz Shlomo on Dec 10, 2018

GRE RFC defines a 32 bit key field. NVGRE RFC splits the 32 bit
key field to 24 bit VSID (gre_key_h) and 8 bit flow entropy (gre_key_l).

Define the two key parsing alternatives in a union, thus enabling both
access methods.
Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
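The two views of the key can be sketched as a plain C union. This is not the mlx5_ifc layout; the type and function names are hypothetical, and the sketch assumes the key bytes are stored in network (big-endian) order.

```c
#include <stdint.h>

/* Hypothetical layout; key bytes are assumed to be in network byte order. */
union demo_gre_key {
	uint8_t key[4];          /* GRE view: one 32-bit key */
	struct {
		uint8_t vsid[3]; /* NVGRE view: 24-bit VSID (gre_key_h) */
		uint8_t flow_id; /* NVGRE view: 8-bit flow entropy (gre_key_l) */
	} nvgre;
};

/* Read the whole field as a 32-bit GRE key. */
static inline uint32_t demo_gre_key32(const union demo_gre_key *k)
{
	return ((uint32_t)k->key[0] << 24) | ((uint32_t)k->key[1] << 16) |
	       ((uint32_t)k->key[2] << 8) | (uint32_t)k->key[3];
}

/* Read only the upper 24 bits as the NVGRE VSID. */
static inline uint32_t demo_nvgre_vsid(const union demo_gre_key *k)
{
	return ((uint32_t)k->nvgre.vsid[0] << 16) |
	       ((uint32_t)k->nvgre.vsid[1] << 8) | (uint32_t)k->nvgre.vsid[2];
}
```

Both views alias the same four bytes, so a key written through one view can be read through the other.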

net/mlx5: Add monitor commands layout and event data · fd4572b3

Authored by Eyal Davidovich on Dec 10, 2018

Will be used in a downstream patch to monitor counter changes
by the HCA and report them to the driver via an event.
The driver will update its cached counter data accordingly.
Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

10 Dec, 2018 · 1 commit

net/mlx5: Use helper to get CQE opcode · 6254adeb

Authored by Tariq Toukan on Dec 04, 2018

Introduce and use a helper that extracts the opcode
from a CQE (completion queue entry) structure.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
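A helper of this kind might look like the sketch below. The struct is a trimmed-down stand-in, and the assumption that the opcode sits in the high nibble of a combined opcode/ownership byte is made here for illustration; the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down CQE. Assumption: the opcode occupies the high
 * nibble of a combined opcode/ownership byte. */
struct demo_cqe {
	uint8_t op_own;
};

/* One place that knows the bit layout, instead of open-coded shifts. */
static inline uint8_t demo_get_cqe_opcode(const struct demo_cqe *cqe)
{
	return (uint8_t)(cqe->op_own >> 4);
}
```

Centralising the shift in one helper means callers do not repeat the bit layout.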

09 Dec, 2018 · 3 commits

net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data · dc9d38ce

Authored by Andrew Lunn on Dec 08, 2018

The Marvell 6390 Ethernet switch family does not perform MDIO
turnaround correctly. Many hardware MDIO bus masters don't care about
this, but the bit-banging implementation in Linux does by default. Add
phy_ignore_ta_mask to the platform data so that the bit-banging code
can be told which devices are known to get TA wrong.

v2
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Add platform_data support for phy_mask · 04fa26ba

Authored by Andrew Lunn on Dec 08, 2018

It is sometimes necessary to instantiate a bit-banging MDIO bus as a
platform device, without the aid of device tree.

When device tree is being used, the bus is not scanned for devices,
only those devices which are in device tree are probed. Without device
tree, by default, all addresses on the bus are scanned. This may then
find a device which is not a PHY, e.g. a switch. And the switch may
have registers containing values which look like a PHY. So during the
scan, a PHY device is wrongly created.

After the bus has been registered, a search is made for
mdio_board_info structures which indicate devices on the bus, and the
driver which should be used for them. This is typically used to
instantiate Ethernet switches from platform drivers. However, if the
scanning of the bus has created a PHY device at the same location as
indicated in the board info for a switch, the switch device is not
created, since the address is already busy.

This can be avoided by setting the phy_mask of the mdio bus. This mask
prevents addresses on the bus being scanned.

v2
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
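The effect of phy_mask on the bus scan can be sketched as follows. This is a simplified stand-in for the scan loop, not the kernel's code; `demo_scan_bus` and `DEMO_MAX_ADDR` are hypothetical names, and the sketch only records which addresses would be probed.

```c
#include <stdint.h>

#define DEMO_MAX_ADDR 32 /* MDIO uses a 5-bit address: 32 addresses per bus */

/* Record the addresses that would be probed and return how many there are.
 * A set bit in phy_mask means "do not scan this address". */
static int demo_scan_bus(uint32_t phy_mask, int probed[DEMO_MAX_ADDR])
{
	int n = 0;

	for (int addr = 0; addr < DEMO_MAX_ADDR; addr++) {
		if (phy_mask & (UINT32_C(1) << addr))
			continue; /* masked: never scanned, so no PHY is created */
		probed[n++] = addr;
	}
	return n;
}
```

For example, platform data that sets bit 4 of the mask keeps address 4 free for a switch described in the board info.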

Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" · 356ff8a9

Authored by David Rientjes on Dec 07, 2018

This reverts commit 89c83fb5.

This should have been done as part of 2f0799a0 ("mm, thp: restore
node-local hugepage allocations").  The movement of the thp allocation
policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
intended to only set __GFP_THISNODE for mempolicies that are not
MPOL_BIND whereas the revert could set this regardless of mempolicy.

While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
and alloc_pages_vma() was racy, that has since been removed since the
revert.  What is left is the possibility to use __GFP_THISNODE in
policy_node() when it is unexpected because the special handling for
hugepages in alloc_pages_vma()  was removed as part of the consolidation.

Secondly, prior to 89c83fb5, alloc_pages_vma() implemented a somewhat
different policy for hugepage allocations, which were allocated through
alloc_hugepage_vma().  For hugepage allocations, if the allocating
process's node is in the set of allowed nodes, allocate with
__GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
__GFP_THISNODE instead).  This was changed for shmem_alloc_hugepage() to
allow fallback to other nodes in 89c83fb5 as it did for new_page() in
mm/mempolicy.c which is functionally different behavior and removes the
requirement to only allocate hugepages locally.

So this commit does a full revert of 89c83fb5 instead of the partial
revert that was done in 2f0799a0.  The result is the same thp
allocation policy for 4.20 that was in 4.19.

Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Fixes: 2f0799a0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Dec, 2018 · 1 commit

bridge: Add br_fdb_clear_offload() · 43920edf

Authored by Petr Machata on Dec 07, 2018

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that unsets the
offload flag on FDB entries on a given bridge port and VLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Dec, 2018 · 4 commits

net/mlx5: Expose packet based credit mode · 3fd3c80a

Authored by Danit Goldberg on Nov 30, 2018

Packet based credit mode bit determines whether the credit mode
is done per message or packet. Expose the QP creation flag and
the HCA capability.
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net: core: dev: Add extack argument to __dev_change_flags() · 6d040321

Authored by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. The last missing API is __dev_change_flags().

Therefore extend __dev_change_flags() with an extra extack argument and
update the two existing users.

Since the function declaration line is changed anyway, name the struct
net_device argument to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dev: Add extack argument to dev_change_flags() · 567c5e13

Authored by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().

Therefore extend dev_change_flags() with an extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dev: Add extack argument to dev_open() · 00f54e68

Authored by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().

Therefore extend dev_open() with an extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Dec, 2018 · 6 commits

net: mii: mii_lpa_mod_linkmode_lpa_t: Make use of linkmode_mod_bit helper · 6dbd0090

Authored by Andrew Lunn on Dec 05, 2018

Replace the if else code structure with a call to the helper
linkmode_mod_bit.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
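The pattern behind such a helper is a single call that sets or clears one bit depending on a condition, replacing an if/else pair at each call site. The sketch below is a user-space stand-in with hypothetical names (`demo_mod_bit`, `DEMO_BITS_PER_LONG`), not the kernel's linkmode implementation.

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr in the bitmap if 'set' is non-zero, otherwise clear it.
 * All other bits in the bitmap are left unchanged. */
static inline void demo_mod_bit(unsigned int nr, unsigned long *bitmap, int set)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long *word = &bitmap[nr / DEMO_BITS_PER_LONG];

	if (set)
		*word |= mask;
	else
		*word &= ~mask;
}
```

A call site then reads as one line, for example `demo_mod_bit(bit, lp_advertising, lpa & flag)`, where `bit`, `lp_advertising`, `lpa` and `flag` are placeholders for the caller's own values.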

net: mii: Add mii_lpa_mod_linkmode_lpa_t · d3351931

Authored by Andrew Lunn on Dec 05, 2018

Add a _mod_ variant of mii_lpa_to_linkmode_lpa_t. Use this to fix the
genphy_read_status() where the 1G link partner features are getting
lost.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mii: Rename mii_stat1000_to_linkmode_lpa_t · 78a24df3

Authored by Andrew Lunn on Dec 05, 2018

Rename mii_stat1000_to_linkmode_lpa_t to
mii_stat1000_mod_linkmode_lpa_t to indicate it modifies the passed
linkmode bitmap, without clearing any other bits.

Add a helper to set/clear bits in a linkmode.

Use this helper to ensure that bits which the stat1000 indicates
should not be set are cleared.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mii: Fix autoneg in mii_lpa_to_linkmode_lpa_t() · 5f15eed2

Authored by Andrew Lunn on Dec 05, 2018

mii_adv_to_linkmode_adv_t() clears all bits before setting the bits it
needs to set. This means the freshly set Autoneg gets cleared.

Change the order, and add comments about it clearing the old content
of the bitmap.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

mm, thp: restore node-local hugepage allocations · 2f0799a0

Authored by David Rientjes on Dec 05, 2018

This is a full revert of ac5b2c18 ("mm: thp: relax __GFP_THISNODE for
MADV_HUGEPAGE mappings") and a partial revert of 89c83fb5 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").

By not setting __GFP_THISNODE, applications can allocate remote hugepages
when the local node is fragmented or low on memory when either the thp
defrag setting is "always" or the vma has been madvised with
MADV_HUGEPAGE.

Remote access to hugepages often has much higher latency than local pages
of the native page size.  On Haswell, ac5b2c18 was shown to have a
13.9% access regression after this commit for binaries that remap their
text segment to be backed by transparent hugepages.

The intent of ac5b2c18 is to address an issue where a local node is
low on memory or fragmented such that a hugepage cannot be allocated.  In
every scenario where this was described as a fix, there is abundant and
unfragmented remote memory available to allocate from, even with a greater
access latency.

If remote memory is also low or fragmented, not setting __GFP_THISNODE was
also measured on Haswell to have a 40% regression in allocation latency.

Restore __GFP_THISNODE for thp allocations.

Fixes: ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

USB: check usb_get_extra_descriptor for proper size · 704620af

Authored by Mathias Payer on Dec 05, 2018

When reading an extra descriptor, we need to properly check the minimum
and maximum size allowed, to guard against invalid data being sent by a
device.
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
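The kind of size checking described here can be sketched as a walk over a buffer of descriptors. This is a user-space illustration with a hypothetical function name and signature, not the kernel's helper; it relies only on the USB convention that every descriptor starts with a length byte followed by a type byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the first descriptor of the given type that is at least minsize
 * bytes long. Returns NULL if none is found or the buffer is malformed. */
static const uint8_t *demo_find_descriptor(const uint8_t *buf, size_t size,
					   uint8_t type, size_t minsize)
{
	while (size >= 2) {
		uint8_t len = buf[0];

		/* A length smaller than the 2-byte header, or larger than
		 * what is left in the buffer, means invalid data: stop. */
		if (len < 2 || len > size)
			return NULL;
		/* Only accept a match that is long enough for the caller. */
		if (buf[1] == type && len >= minsize)
			return buf;
		buf += len;
		size -= len;
	}
	return NULL;
}
```

The lower bound keeps the loop from stalling on a zero length, the upper bound keeps it inside the buffer, and `minsize` keeps the caller from reading fields that a short descriptor does not contain.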

05 Dec, 2018 · 3 commits

USB: serial: console: fix reported terminal settings · f51ccf46

Authored by Johan Hovold on Dec 04, 2018

The USB-serial console implementation has never reported the actual
terminal settings used. Despite storing the corresponding cflags in its
struct console, these were never honoured on later tty open() where the
tty termios would be left initialised to the driver defaults.

Unlike the serial console implementation, the USB-serial code calls
subdriver open() already at console setup. While calling set_termios()
and write() before open() looks like it could work for some USB-serial
drivers, others definitely do not expect this, so modelling this after
serial core is going to be intrusive, if at all possible.

Instead, use a (renamed) tty helper to save the termios data used at
console setup so that the tty termios reflects the actual terminal
settings after a subsequent tty open().

Note that the calls to tty_init_termios() (tty_driver_install()) and
tty_save_termios() are serialised using the disconnect mutex.

This specifically fixes a regression that was triggered by a recent
change adding software flow control to the pl2303 driver: a getty trying
to disable flow control while leaving the baud rate unchanged would now
also set the baud rate to the driver default (prior to the flow-control
change this had been a noop).

Fixes: 7041d9c3 ("USB: serial: pl2303: add support for tx xon/xoff flow control")
Cc: stable <stable@vger.kernel.org> # 4.18
Cc: Florian Zumbiehl <florz@florz.de>
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>

dax: Fix unlock mismatch with updated API · 27359fd6

Authored by Matthew Wilcox on Nov 30, 2018

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry it is possible that the pfn
performed to do the lookup is misaligned to the value retrieved in the
Xarray.

Change the api to return the unlock cookie from dax_lock_page() and pass
it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry with
signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' · 875e8939

Authored by Ido Schimmel on Dec 04, 2018

Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added
the 'offload_mr_fwd_mark' field to indicate that a packet has already
undergone L3 multicast routing by a capable device. The field is used to
prevent the kernel from forwarding a packet through a netdev through
which the device has already forwarded the packet.

Currently, no unicast packet is routed by both the device and the
kernel, but this is about to change by subsequent patches and we need to
be able to mark such packets, so that they will not be forwarded twice.

Instead of adding yet another field to 'struct sk_buff', we can just
rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet
either has a multicast or a unicast destination IP.

While at it, add a comment about both 'offload_fwd_mark' and
'offload_l3_fwd_mark'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Dec, 2018 · 10 commits

net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits · 9d43faac

Authored by Yishai Hadas on Nov 26, 2018

Expose device capabilities for the DEVX user context. These include
which caps the device supports and a matching bit to set as part of
user context creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

RDMA/mlx5: Initialize SRQ tables on mlx5_ib · f3da6577

Authored by Leon Romanovsky on Nov 28, 2018

Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net/mlx5: Move SRQ functions to RDMA part · f02d0d6e

Authored by Leon Romanovsky on Nov 28, 2018

There is no need to keep SRQ, which is an RDMA object, in mlx5_core.
In this patch, we partially move the execution code, while the next
patches will move the table initialization/release logic too.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net/mlx5: Remove dead transobj code · 5b5f0f16

Authored by Leon Romanovsky on Nov 28, 2018

Delete functions which are not called and not needed.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

net/mlx5: Align SRQ licenses and copyright information · 6cd0014a

Authored by Leon Romanovsky on Nov 28, 2018

Ensure that both the RDMA and netdev parts of the SRQ implementation
have the same copyright and license information, annotated by SPDX
tags.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

udp: elide zerocopy operation in hot path · 52900d22

Authored by Willem de Bruijn on Nov 30, 2018

With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.

The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.

The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.

Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.

This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.

Changes
  v3 -> v4
    - Move skb_zcopy_set below the only kfree_skb that might cause
      a premature uarg destroy before skb_zerocopy_put_abort
      - Move the entire skb_shinfo assignment block, to keep that
        cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: msg_zerocopy · b5947e5d

Authored by Willem de Bruijn on Nov 30, 2018

Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
interpret flag MSG_ZEROCOPY.

This patch was previously part of the zerocopy RFC patchsets. Zerocopy
is not effective at small MTU. With segmentation offload building
larger datagrams, the benefit of page flipping outweighs the cost of
generating a completion notification.

tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
test patch and making skb_orphan_frags_rx same as skb_orphan_frags:

    ipv4 udp -t 1
    tx=191312 (11938 MB) txc=0 zc=n
    rx=191312 (11938 MB)
    ipv4 udp -z -t 1
    tx=304507 (19002 MB) txc=304507 zc=y
    rx=304507 (19002 MB)
    ok
    ipv6 udp -t 1
    tx=174485 (10888 MB) txc=0 zc=n
    rx=174485 (10888 MB)
    ipv6 udp -z -t 1
    tx=294801 (18396 MB) txc=294801 zc=y
    rx=294801 (18396 MB)
    ok

Changes
  v1 -> v2
    - Fixup reverse christmas tree violation
  v2 -> v3
    - Split refcount avoidance optimization into separate patch
      - Fix refcount leak on error in fragmented case
        (thanks to Paolo Abeni for pointing this one out!)
      - Fix refcount inc on zero
      - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
        This is needed since commit 5cf4a853 ("tcp: really ignore
	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of: net: kill of_get_nvmem_mac_address() · afa64a72

Authored by Bartosz Golaszewski on Nov 30, 2018

We've switched all users to nvmem_get_mac_address(). Remove the now
dead code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: provide nvmem_get_mac_address() · 0e839df9

Authored by Bartosz Golaszewski on Nov 30, 2018

We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: detect when object movement between tables might have invalidated a lookup · 82208d0d

Authored by NeilBrown on Nov 30, 2018

Some users of rhashtables might need to move an object from one table
to another -  this appears to be the reason for the incomplete usage
of NULLS markers.

To support these, we store a unique NULLS_MARKER at the end of
each chain, and when a search fails to find a match, we check
if the NULLS marker found was the expected one.  If not, the search
may not have examined all objects in the target bucket, so it is
repeated.

The unique NULLS_MARKER is derived from the address of the
head of the chain.  As this cannot be derived at load-time the
static rhnull in rht_bucket_nested() needs to be initialised
at run time.

Any caller of a lookup function must still be prepared for the
possibility that the object returned is in a different table - it
might have been there for some time.

Note that this does NOT provide support for other uses of
NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
the key of an object and re-inserting it in the same table.
These could only be done safely if new objects were inserted
at the *start* of a hash chain, and that is not currently the case.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
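The marker check described in this commit can be sketched with a toy chain. This is not the rhashtable code: the names are hypothetical, there is no RCU or locking, and the marker encoding (bucket head address with the low bit set) is an assumption chosen so that a marker can never be mistaken for an aligned node pointer. Instead of repeating the search itself, the sketch reports through `*retry` that a repeat is needed.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_node {
	int key;
	uintptr_t next; /* next node, or a NULLS marker (low bit set) */
};

/* Marker unique to one bucket, derived from the address of its head. */
static inline uintptr_t demo_nulls_marker(const uintptr_t *bucket_head)
{
	return (uintptr_t)bucket_head | (uintptr_t)1;
}

static inline int demo_is_nulls(uintptr_t p)
{
	return (int)(p & 1u);
}

/* Walk one chain. On a miss, *retry is set when the walk ended on another
 * bucket's marker, i.e. an object moved and the walk may be incomplete. */
static struct demo_node *demo_lookup(const uintptr_t *bucket_head, int key,
				     int *retry)
{
	uintptr_t p = *bucket_head;

	*retry = 0;
	while (!demo_is_nulls(p)) {
		struct demo_node *n = (struct demo_node *)p;

		if (n->key == key)
			return n;
		p = n->next;
	}
	if (p != demo_nulls_marker(bucket_head))
		*retry = 1;
	return NULL;
}
```

A caller would loop while `*retry` is set; a miss with `*retry` clear means the whole target bucket was examined.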

03 Dec, 2018 · 1 commit

Drivers: hv: vmbus: Offload the handling of channels to two workqueues · 37c2578c

Authored by Dexuan Cui on Dec 03, 2018

vmbus_process_offer() mustn't call channel->sc_creation_callback()
directly for sub-channels, because sc_creation_callback() ->
vmbus_open() may never get the host's response to the
OPEN_CHANNEL message (the host may rescind a channel at any time,
e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
may not wake up the vmbus_open() as it's blocked due to a non-zero
vmbus_connection.offer_in_progress, and finally we have a deadlock.

The above is also true for primary channels, if the related device
drivers use sync probing mode by default.

And, usually the handling of primary channels and sub-channels can
depend on each other, so we should offload them to different
workqueues to avoid possible deadlock, e.g. in sync-probing mode,
NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
and waits for all the sub-channels to appear, but the latter
can't get the rtnl_lock and this blocks the handling of sub-channels.

The patch can fix the multiple-NIC deadlock described above for
v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
but don't enable async-probing for Hyper-V drivers (yet).

The patch can also fix the hang issue in sub-channel's handling described
above for all versions of kernels, including v4.19 and v4.20-rc4.

So actually the patch should be applied to all the existing kernels,
not only the kernels that have 8195b139.

Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

02 Dec, 2018 · 1 commit

SUNRPC: Fix a memory leak in call_encode() · 71700bb9

Authored by Trond Myklebust on Nov 30, 2018

If we retransmit an RPC request, we currently end up clobbering the
value of req->rq_rcv_buf.bvec that was allocated by the initial call to
xprt_request_prepare(req).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

01 Dec, 2018 · 4 commits

bpf: fix pointer offsets in context for 32 bit · b7df9ada

Authored by Daniel Borkmann on Dec 01, 2018

Currently, pointer offsets in three BPF context structures are
broken in two scenarios: i) 32 bit compiled applications running
on 64 bit kernels, and ii) LLVM compiled BPF programs running
on 32 bit kernels. The latter is due to BPF target machine being
strictly 64 bit. So in each of the cases the offsets will mismatch
in verifier when checking / rewriting context access. Fix this by
providing a helper macro __bpf_md_ptr() that will enforce padding
up to 64 bit and proper alignment, and for context access a macro
bpf_ctx_range_ptr() which will cover full 64 bit member range on
32 bit archs. For flow_keys, we additionally need to force the
size check to sizeof(__u64) as with other pointer types.

Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
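The padding idea behind a macro like __bpf_md_ptr() can be sketched as below. `DEMO_MD_PTR` and `struct demo_ctx` are hypothetical names, and the sketch assumes GCC or Clang (it uses `__attribute__((aligned(8)))`, an anonymous union member, and a 64-bit unnamed bit-field). The point is that each pointer member occupies exactly 64 bits regardless of the native pointer size.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrap a pointer member in a union with a 64-bit unnamed bit-field so that
 * it takes 8 bytes and is 8-byte aligned on 32-bit and 64-bit targets. */
#define DEMO_MD_PTR(type, name)	\
union {				\
	type name;		\
	uint64_t :64;		\
} __attribute__((aligned(8)))

/* Hypothetical context struct: member offsets no longer depend on the
 * width of a native pointer. */
struct demo_ctx {
	uint32_t len;
	DEMO_MD_PTR(void *, data);
	DEMO_MD_PTR(void *, data_end);
};
```

With this layout a 32-bit application, a 64-bit kernel and a 64-bit BPF target would all agree that `data` sits at offset 8 and `data_end` at offset 16.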

psi: make disabling/enabling easier for vendor kernels · e0c27447

Authored by Johannes Weiner on Nov 30, 2018

Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.

With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge.  Do the following things to
make it easier:

1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
   to enable CONFIG_PSI in their kernel but leave the feature disabled
   unless a user requests it at boot-time.

   To avoid double negatives, rename psi_disabled= to psi=.

2. Make psi_disabled a static branch to eliminate any branch costs
   when the feature is disabled.

In terms of numbers before and after this patch, Mel says:

: The following is a comparison using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
:                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
: Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
: Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
: Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
: Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
: Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
: Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
: Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
: Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
: Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;

Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

qed: Expose the doorbell overflow recovery mechanism to the protocol drivers · 0e1f1044

Authored by Ariel Elior on Nov 28, 2018

Most of the doorbelling entities are outside of the core module.
L2 queues, Roce queues, iscsi and fcoe all need to register.
Make the APIs available for these drivers.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add doorbell overflow recovery mechanism · 36907cd5

Authored by Ariel Elior on Nov 28, 2018

Add the database used to register doorbelling entities, and APIs for adding
and deleting entries, and logic for traversing the database and doorbelling
once on behalf of all entities.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel · synced successfully nearly 2 years ago
