- 11 December 2018, 6 commits
-
By Or Gerlitz

This isn't used anywhere across the mlx5 driver stack; remove it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
By Eli Britstein

Update the flow steering command formatting according to the extended destination API. Note that the FW dictates that multi-destination FTEs that involve at least one encap must use the extended destination format, while single-destination ones must use the legacy format. Using the extended destination format requires FW support: check the capability and return an error if it is not supported.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
By Eli Britstein

Change the driver flow destination struct to use bit flags, with the VHCA ID valid bit as the first one. A flags field is more extensible and will be used in a downstream patch.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
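A minimal sketch of the change with illustrative names (the exact flag and field names in the mlx5 driver are assumptions here):

enum {
    MLX5_FLOW_DEST_VHCA_ID_VALID = BIT(0), /* first flag, replaces the old bool */
    /* room for more destination flags in downstream patches */
};

struct mlx5_flow_destination {
    u32 type;
    u32 vport_num;
    u16 vhca_id;
    u8 flags; /* bit flags instead of a standalone vhca_id_valid field */
};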
-
By Eli Britstein

Extended destinations provide the ability to configure different encapsulation properties per destination on a single FTE. This is needed for use cases such as remote mirroring over tunneled networks.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
By Oz Shlomo

The GRE RFC defines a 32-bit key field. The NVGRE RFC splits the 32-bit key field into a 24-bit VSID (gre_key_h) and an 8-bit flow entropy (gre_key_l). Define the two key parsing alternatives in a union, thus enabling both access methods.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
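A minimal C sketch of the union described above (names are illustrative; the mlx5 fields are gre_key_h/gre_key_l). In network byte order the NVGRE VSID occupies the upper 24 bits and the flow entropy the lower 8:

union gre_key {
    __be32 key; /* plain GRE: full 32-bit key */
    struct {
        u8 vsid[3];      /* NVGRE: 24-bit VSID (gre_key_h) */
        u8 flow_entropy; /* NVGRE: 8-bit flow entropy (gre_key_l) */
    } nvgre;
};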
-
By Eyal Davidovich

Will be used in a downstream patch to let the HCA monitor counter changes and report them to the driver via an event. The driver will update its cached counter data accordingly.

Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
- 10 December 2018, 1 commit
-
By Tariq Toukan

Introduce and use a helper that extracts the opcode from a CQE (completion queue entry) structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
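The opcode lives in the high nibble of the CQE op_own byte, so the helper presumably amounts to this sketch:

static inline u8 get_cqe_opcode(const struct mlx5_cqe64 *cqe)
{
    return cqe->op_own >> 4; /* low nibble holds format/owner bits */
}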
-
- 09 December 2018, 3 commits
-
By Andrew Lunn

The Marvell 6390 Ethernet switch family does not perform MDIO turnaround correctly. Many hardware MDIO bus masters don't care about this, but the bit-banging implementation in Linux does by default. Add phy_ignore_ta_mask to the platform data so that the bit-banging code can be told which devices are known to get TA wrong.

v2

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Andrew Lunn

It is sometimes necessary to instantiate a bit-banging MDIO bus as a platform device, without the aid of device tree. When device tree is being used, the bus is not scanned for devices; only those devices which are in device tree are probed. Without device tree, by default, all addresses on the bus are scanned. This may then find a device which is not a PHY, e.g. a switch, and the switch may have registers containing values which look like a PHY. So during the scan, a PHY device is wrongly created.

After the bus has been registered, a search is made for mdio_board_info structures which indicate devices on the bus and the driver which should be used for them. This is typically used to instantiate Ethernet switches from platform drivers. However, if the scanning of the bus has created a PHY device at the same location as indicated in the board info for a switch, the switch device is not created, since the address is already busy.

This can be avoided by setting the phy_mask of the mdio bus. This mask prevents addresses on the bus being scanned.

v2

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
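A sketch of the platform data a board file might register, covering both this patch and the phy_ignore_ta_mask change above (struct and field names are assumptions based on the descriptions):

struct mdio_gpio_platform_data {
    u32 phy_mask;           /* addresses to skip when scanning the bus */
    u32 phy_ignore_ta_mask; /* devices known to get turnaround wrong */
};

static struct mdio_gpio_platform_data mdio_pdata = {
    .phy_mask = ~0,               /* scan nothing: the switch is instantiated
                                   * via mdio_board_info instead */
    .phy_ignore_ta_mask = BIT(4), /* e.g. a 6390 switch at address 4 */
};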
-
By David Rientjes

This reverts commit 89c83fb5. This should have been done as part of 2f0799a0 ("mm, thp: restore node-local hugepage allocations"). The movement of the thp allocation policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was intended to only set __GFP_THISNODE for mempolicies that are not MPOL_BIND, whereas the revert could set this regardless of mempolicy. While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() was racy, it has since been removed since the revert. What is left is the possibility to use __GFP_THISNODE in policy_node() when it is unexpected, because the special handling for hugepages in alloc_pages_vma() was removed as part of the consolidation.

Secondly, prior to 89c83fb5, alloc_pages_vma() implemented a somewhat different policy for hugepage allocations, which were allocated through alloc_hugepage_vma(): for hugepage allocations, if the allocating process's node is in the set of allowed nodes, allocate with __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to allow fallback to other nodes in 89c83fb5, as it did for new_page() in mm/mempolicy.c, which is functionally different behavior and removes the requirement to only allocate hugepages locally.

So this commit does a full revert of 89c83fb5 instead of the partial revert that was done in 2f0799a0. The result is the same thp allocation policy for 4.20 that was in 4.19.

Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Fixes: 2f0799a0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 December 2018, 5 commits
-
By Stefano Brivio

While skb_push() makes the kernel panic if the skb headroom is less than the unaligned hardware header size, it will proceed normally in case we copy more than that because of alignment, and we'll silently corrupt adjacent slabs. In the case fixed by the previous patch, "ipv6: Check available headroom in ip6_xmit() even without options", we end up in neigh_hh_output() with 14 bytes of headroom and a 14-byte hardware header, and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check that we're not writing before skb->head and, if the headroom is not enough, warn and drop the packet.

v2:
- instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet (Eric Dumazet)
- if we avoid the panic, though, we need to explicitly check the headroom before the memcpy(), otherwise we'll have corrupted slabs on a running kernel after we warn
- use __skb_push() instead of skb_push(), as the headroom check is already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
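The fix presumably looks like the following sketch in neigh_hh_output(), where hh_len is the unaligned hardware header length and hh_alen its aligned copy length:

unsigned int hh_alen = HH_DATA_ALIGN(hh_len);

if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {
    kfree_skb(skb); /* warn once and drop instead of corrupting slabs */
    return NET_XMIT_DROP;
}
memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
__skb_push(skb, hh_len); /* headroom already verified above */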
-
By David Ahern

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries, as those entries are managed by userspace, yet the existing algorithm walks the entire hash table, which means it always considers PERMANENT entries when looking for entries to evict. In some use cases (e.g., EVPN) there can be tens of thousands of PERMANENT entries, leading to wasted CPU cycles when gc kicks in. As an example, with 32k permanent entries, neigh_alloc has been observed taking more than 4 msec per invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and the last flush for the table was more than 5 seconds ago, gc kicks in and walks the entire hash table, evicting *all* entries not in PERMANENT or REACHABLE state and not marked as externally learned. There is no discriminator on when the neigh entry was created or if it just moved from REACHABLE to another NUD_VALID state (e.g., NUD_STALE). It is possible for entries to be created, or for established neighbor entries to be moved to STALE (e.g., an external node sends an ARP request), right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

If that happens, those entries are evicted during gc, causing unnecessary thrashing on neighbor entries and userspace caches trying to track them. Further, this contradicts the description of gc_thresh2, which says "Entries older than 5 seconds will be cleared". One workaround is to make gc_thresh2 == gc_thresh3, but that negates the whole point of having separate thresholds.

3. Clearing *all* non-PERMANENT/REACHABLE/externally-learned neigh entries when gc_thresh2 is exceeded is overkill and contributes to thrashing, especially during startup.

This patch addresses these problems as follows:

1. Use a separate list_head to track entries that can be garbage collected, along with a separate counter. PERMANENT entries are not added to this list. The gc_thresh parameters are only compared to the new counter, not the total entries in the table. The forced_gc function is updated to only walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the front.

3. Entries are only evicted if they were last updated more than 5 seconds ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped when allocating a new neighbor for a PERMANENT entry. By extension this means there are no explicit limits on the number of PERMANENT entries that can be created, but this is no different from FIB entries or FDB entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
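A sketch of the resulting forced-gc walk (names approximate the patch rather than quote it):

static int neigh_forced_gc(struct neigh_table *tbl)
{
    int max_clean = atomic_read(&tbl->gc_entries) - tbl->gc_thresh2;
    unsigned long tref = jiffies - 5 * HZ;
    struct neighbour *n, *tmp;
    int shrunk = 0;

    write_lock_bh(&tbl->lock);
    list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
        if (shrunk >= max_clean)
            break; /* back below gc_thresh2, stop evicting */
        /* only evict entries last updated more than 5 seconds ago */
        if (time_after(tref, n->updated) && neigh_remove_one(n, tbl))
            shrunk++;
    }
    write_unlock_bh(&tbl->lock);
    return shrunk;
}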
-
By Petr Machata

When a driver unoffloads all FDB entries en bloc, it's inefficient to send the switchdev notifications one by one. Add a helper that unsets the offload flag on FDB entries on a given bridge port and VLAN.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Petr Machata

When a driver unoffloads all FDB entries en bloc, it's inefficient to send the switchdev notifications one by one. Add a helper that walks the FDB table, unsetting the offload flag on RDSTs with a given VNI.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Petr Machata

When a VXLAN device becomes relevant to a driver (such as when it is attached to an offloaded bridge), the driver will generally need to walk the existing FDB entries and offload them. Add a function, vxlan_fdb_replay(), to call a given notifier block for each FDB entry with a given VNI.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 December 2018, 5 commits
-
By Danit Goldberg

The packet-based credit mode bit determines whether the credit mode is done per message or per packet. Expose the QP creation flag and the HCA capability.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Petr Machata

In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. The last missing API is __dev_change_flags(). Therefore extend __dev_change_flags() with an extra extack argument and update the two existing users. Since the function declaration line is changed anyway, name the struct net_device argument to placate checkpatch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Petr Machata

In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_change_flags(). Therefore extend dev_change_flags() with an extra extack argument and update all users. Most of the calls end up just passing NULL, but several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available. Since the function declaration line is changed anyway, name the other function arguments to placate checkpatch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Petr Machata

In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_open(). Therefore extend dev_open() with an extra extack argument and update all users. Most of the calls end up just passing NULL, but the bond and team drivers have the extack readily available.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
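The resulting signatures across these three patches read roughly as follows (sketch; callers without an extack simply pass NULL):

int dev_open(struct net_device *dev, struct netlink_ext_ack *extack);
int dev_change_flags(struct net_device *dev, unsigned int flags,
                     struct netlink_ext_ack *extack);
int __dev_change_flags(struct net_device *dev, unsigned int flags,
                       struct netlink_ext_ack *extack);

/* typical updated call site with no extack to hand */
err = dev_open(dev, NULL);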
-
By Andrew Lunn

Each DSA tag protocol needs to add additional headers to the Ethernet frame in order to direct it towards a specific switch egress port. It must also remove the header from a frame received from a switch. Indicate the maximum size of these headers in the tag protocol ops structure, so the core can take these overheads into account.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
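A sketch of the ops structure with the new field (surrounding members abbreviated; the field name is an assumption based on the description):

struct dsa_device_ops {
    struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
    struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
                           struct packet_type *pt);
    unsigned int overhead; /* max bytes this tag protocol adds to a frame */
};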
-
- 06 December 2018, 8 commits
-
By Guo Ren

The broken macros cause a glibc compile error. If there is no __NR3264_fstat*, we should also remove the related definitions.

Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Fixes: bf4b6a7d ("y2038: Remove stat64 family from default syscall set")
[arnd: Both Marcin and Guo provided this patch to fix up my clearly broken commit, I applied the version with the better changelog.]
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
By Jakub Audykowicz

If for some reason an association's fragmentation point is zero, sctp_datamsg_from_user will endlessly try to divide a message into zero-sized chunks. This eventually causes a kernel panic due to running out of memory. Although this situation is quite unlikely, it has occurred before, as reported. I propose adding this simple last-ditch sanity check given the severity of the potential consequences.

Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
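A minimal sketch of the sanity check in sctp_datamsg_from_user (the fallback value and message wording are assumptions, not the exact patch):

max_data = asoc->frag_point;
if (unlikely(!max_data)) {
    /* never chunk with a zero frag point; fall back to a sane minimum */
    max_data = SCTP_DEFAULT_MINSEGMENT;
    pr_warn_ratelimited("%s: asoc:%p frag_point is zero, forcing max_data to a minimum\n",
                        __func__, asoc);
}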
-
By Andrew Lunn

Replace the if/else code structure with a call to the helper linkmode_mod_bit().

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Andrew Lunn

Add a _mod_ variant of mii_lpa_to_linkmode_lpa_t(). Use it to fix genphy_read_status(), where the 1G link partner features were getting lost.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Andrew Lunn

Rename mii_stat1000_to_linkmode_lpa_t() to mii_stat1000_mod_linkmode_lpa_t() to indicate it modifies the passed linkmode bitmap without clearing any other bits. Add a helper to set/clear bits in a linkmode. Use this helper to ensure that bits which the stat1000 register indicates should not be set are cleared.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
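The helper reads as one would expect (sketch mirroring the description above), with a usage fragment assuming `advertising` and `lpa` from the surrounding conversion code:

static inline void linkmode_mod_bit(int nr, volatile unsigned long *addr,
                                    int set)
{
    if (set)
        linkmode_set_bit(nr, addr);
    else
        linkmode_clear_bit(nr, addr);
}

/* e.g. inside mii_stat1000_mod_linkmode_lpa_t(): set or clear 1000HALF
 * according to the link partner status, leaving other bits alone */
linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, advertising,
                 lpa & LPA_1000HALF);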
-
By Andrew Lunn

mii_adv_to_linkmode_adv_t() clears all bits before setting the ones it needs to set. This means the freshly set Autoneg bit gets cleared. Change the order, and add comments about the function clearing the old contents of the bitmap.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By David Rientjes

This is a full revert of ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") and a partial revert of 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").

By not setting __GFP_THISNODE, applications can allocate remote hugepages when the local node is fragmented or low on memory, when either the thp defrag setting is "always" or the vma has been madvised with MADV_HUGEPAGE. Remote access to hugepages often has much higher latency than local pages of the native page size. On Haswell, ac5b2c18 was shown to have a 13.9% access regression after this commit for binaries that remap their text segment to be backed by transparent hugepages.

The intent of ac5b2c18 is to address an issue where a local node is low on memory or fragmented such that a hugepage cannot be allocated. In every scenario where this was described as a fix, there is abundant and unfragmented remote memory available to allocate from, even with a greater access latency. If remote memory is also low or fragmented, not setting __GFP_THISNODE was also measured on Haswell to have a 40% regression in allocation latency.

Restore __GFP_THISNODE for thp allocations.

Fixes: ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Mathias Payer

When reading an extra descriptor, we need to properly check the minimum and maximum size allowed, to prevent invalid data being sent by a device.

Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 December 2018, 4 commits
-
By Johan Hovold

The USB-serial console implementation has never reported the actual terminal settings used. Despite storing the corresponding cflags in its struct console, these were never honoured on later tty open(), where the tty termios would be left initialised to the driver defaults.

Unlike the serial console implementation, the USB-serial code calls subdriver open() already at console setup. While calling set_termios() and write() before open() looks like it could work for some USB-serial drivers, others definitely do not expect this, so modelling this after serial core is going to be intrusive, if at all possible.

Instead, use a (renamed) tty helper to save the termios data used at console setup so that the tty termios reflects the actual terminal settings after a subsequent tty open(). Note that the calls to tty_init_termios() (tty_driver_install()) and tty_save_termios() are serialised using the disconnect mutex.

This specifically fixes a regression that was triggered by a recent change adding software flow control to the pl2303 driver: a getty trying to disable flow control while leaving the baud rate unchanged would now also set the baud rate to the driver default (prior to the flow-control change this had been a noop).

Fixes: 7041d9c3 ("USB: serial: pl2303: add support for tx xon/xoff flow control")
Cc: stable <stable@vger.kernel.org> # 4.18
Cc: Florian Zumbiehl <florz@florz.de>
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
-
By Matthew Wilcox

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to store a replacement entry in the Xarray at the given xas-index with the DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry() case the implementation needs to recall the original entry. In the case where the original entry is a 'pmd' entry, it is possible that the pfn used to do the lookup is misaligned relative to the value retrieved in the Xarray.

Change the API to return the unlock cookie from dax_lock_page() and pass it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was assuming that the page was PMD-aligned if the entry was a PMD entry, with signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
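The changed API presumably reads like this sketch: dax_lock_page() returns an opaque cookie capturing the entry it locked, and dax_unlock_page() takes that cookie back instead of re-deriving the entry from the page:

dax_entry_t cookie;

cookie = dax_lock_page(page);
if (!cookie)
    return -EBUSY; /* page not associated with a DAX mapping */
/* ... operate on the page while the entry is held locked ... */
dax_unlock_page(page, cookie);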
-
By Eric Dumazet

The TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12 as a step towards enabling bigger tcp sndbuf limits. It works reasonably well, but the following happens: once the limit is reached, the TCP stack generates an [E]POLLOUT event for every incoming ACK packet. This causes a high number of context switches.

This patch implements the strategy David Miller added in sock_def_write_space(): if a TCP socket has a notsent_lowat constraint of X bytes, allow sendmsg() to fill up to X bytes, but send [E]POLLOUT only if the number of notsent bytes is below X/2. This considerably reduces TCP_NOTSENT_LOWAT overhead, while allowing to keep the pipe full.

Tested: 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096 262144 64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &
A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904  # about 54 MB of memory per flow :/
A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd      free   buff  cache   si   so   bi   bo     in    cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0   10    0     28    14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0  512    0 715901  5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0  735   13 771161  5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0  512   23 719650  6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0  642    4 775650  6009  0 12 88  0  0
A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat
A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411  # 3.5 MB per flow
A:/# vmstat 5 5  # check that context switches have not inflated too much
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd      free   buff  cache   si   so   bi   bo     in    cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0   10    0     17    14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0  512   13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0  512   25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0  512    7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0  614   34 772288 14317  0 10 90  0  0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
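For reference, the userspace side of the test above caps unsent data per socket with the long-standing socket option (runnable sketch):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int set_notsent_lowat(int fd)
{
    int lowat = 2097152; /* 2 MB, as echoed into the sysctl above */

    /* with this patch, [E]POLLOUT fires once unsent data drops below lowat/2 */
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat));
}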
-
By Ido Schimmel

Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added the 'offload_mr_fwd_mark' field to indicate that a packet has already undergone L3 multicast routing by a capable device. The field is used to prevent the kernel from forwarding a packet through a netdev through which the device has already forwarded the packet.

Currently, no unicast packet is routed by both the device and the kernel, but this is about to change in subsequent patches and we need to be able to mark such packets, so that they will not be forwarded twice.

Instead of adding yet another field to 'struct sk_buff', we can just rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet either has a multicast or a unicast destination IP. While at it, add a comment about both 'offload_fwd_mark' and 'offload_l3_fwd_mark'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
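After the rename, the sk_buff presumably carries both marks side by side (sketch of the relevant bitfield):

struct sk_buff {
    /* ... */
    __u8 offload_fwd_mark:1;    /* L2: already forwarded by the device */
    __u8 offload_l3_fwd_mark:1; /* L3: already routed by the device,
                                 * unicast or multicast */
    /* ... */
};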
-
- 04 December 2018, 8 commits
-
By Yishai Hadas

Expose device capabilities for the DEVX user context, including which capabilities the device supports and a matching bit to set as part of user context creation.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Leon Romanovsky

Transfer initialization and cleanup from the mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes the removal of SRQ from mlx5_core.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Leon Romanovsky

There is no need to keep SRQ, which is an RDMA object, in mlx5_core. In this patch, we partially move the execution code, while the next patches will move the table initialization/release logic too.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Leon Romanovsky

Delete functions which are not called and not needed.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Leon Romanovsky

Ensure that both the RDMA and netdev parts of the SRQ implementation have the same copyright and license information, annotated by SPDX tags.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
By Willem de Bruijn

With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info. Release of its last reference triggers a completion notification.

The TCP stack in tcp_sendmsg_locked holds an extra ref independent of the skbs, because it can build, send and free skbs within its loop, possibly reaching refcount zero and freeing the ubuf_info too soon. The UDP stack currently also takes this extra ref, but does not need it, as all skbs are sent after return from __ip(6)_append_data.

Avoid the extra refcount_inc and refcount_dec_and_test, and generally the sock_zerocopy_put in the common path, by passing the initial reference to the first skb. This approach is taken instead of initializing the refcount to 0, as that would generate the error "refcount_t: increment on 0" on the next skb_zcopy_set.

Changes v3 -> v4:
- Move skb_zcopy_set below the only kfree_skb that might cause a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that cacheline access in one place

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Willem de Bruijn

Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and interpret flag MSG_ZEROCOPY.

This patch was previously part of the zerocopy RFC patchsets. Zerocopy is not effective at small MTU. With segmentation offload building larger datagrams, the benefit of page flipping outweighs the cost of generating a completion notification.

tools/testing/selftests/net/msg_zerocopy.sh after applying the follow-on test patch and making skb_orphan_frags_rx the same as skb_orphan_frags:

ipv4 udp -t 1
tx=191312 (11938 MB) txc=0 zc=n
rx=191312 (11938 MB)
ipv4 udp -z -t 1
tx=304507 (19002 MB) txc=304507 zc=y
rx=304507 (19002 MB)
ok
ipv6 udp -t 1
tx=174485 (10888 MB) txc=0 zc=n
rx=174485 (10888 MB)
ipv6 udp -z -t 1
tx=294801 (18396 MB) txc=294801 zc=y
rx=294801 (18396 MB)
ok

Changes v1 -> v2:
- Fixup reverse christmas tree violation

Changes v2 -> v3:
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case (thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
- Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data. This is needed since commit 5cf4a853 ("tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
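Userspace usage as enabled by this patch (runnable sketch, error handling trimmed; completion notifications are read from the socket error queue with recvmsg(fd, ..., MSG_ERRQUEUE), as with TCP zerocopy):

#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

ssize_t send_zc(int fd, const void *buf, size_t len)
{
    int one = 1;

    /* opt in on the socket first; MSG_ZEROCOPY is ignored without it */
    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
        return -1;
    /* pages stay pinned until the completion notification is read */
    return send(fd, buf, len, MSG_ZEROCOPY);
}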
-
By Xin Long

In sctp_hash_transport/sctp_epaddr_lookup_transport, a transport's asoc is dereferenced under rcu_read_lock while the asoc is freed without waiting for a grace period, which leads to a use-after-free panic. This patch fixes it by calling kfree_rcu to make the asoc be freed after a grace period.

Note that only the asoc's memory is delayed to free in the patch; it won't cause sk to linger longer. Thanks to Neil and Marcelo for making this clear.

Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
Fixes: cd2b7087 ("sctp: check duplicate node before inserting a new transport")
Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com
Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
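The essence of the fix as described (sketch): give the association an rcu_head and free it after a grace period instead of immediately:

struct sctp_association {
    /* ... */
    struct rcu_head rcu;
};

/* in the association destroy path, replacing the direct kfree(asoc): */
kfree_rcu(asoc, rcu);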
-