提交 · 67e194007be08d071294456274dd53e0a04fdf90 · openanolis / cloud-kernel

14 3月, 2017 1 次提交

ipv6: make ECMP route replacement less greedy · 67e19400

由 Sabrina Dubroca 提交于 3月 13, 2017

Commit 27596472 ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.

Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.

  Reproducer:
  ===========
    #!/bin/sh

    ip netns add ns1
    ip netns add ns2
    ip -net ns1 link set lo up

    for x in 0 1 2 ; do
        ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
        ip -net ns1 link set eth$x up
        ip -net ns2 link set veth$x up
    done

    ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
            nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
    ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
    ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048

    echo "before replace, 3 routes"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
    echo

    ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
            nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2

    echo "after replace, only 2 routes, metric 2048 is gone"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'

Fixes: 27596472 ("ipv6: fix ECMP route replacement")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67e19400

13 3月, 2017 9 次提交

bpf: improve read-only handling · 65869a47

由 Daniel Borkmann 提交于 3月 11, 2017

Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
one-time warning in case of an error when the image couldn't
be set read-only, and also mark struct bpf_prog as locked when
bpf_prog_lock_ro() was called.

Reason for the latter is that bpf_prog_unlock_ro() is called from
various places including error paths, and we shouldn't mess with
page attributes when really not needed.

For bpf_jit_binary_unlock_ro() this is not needed as jited flag
implicitly indicates this, thus for archs with ARCH_HAS_SET_MEMORY
we're guaranteed to have a previously locked image. Overall, this
should also help us to identify any further potential issues with
set_memory_*() helpers.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65869a47

selftests/bpf: fix broken build · 1da8ac7c

由 Alexei Starovoitov 提交于 3月 10, 2017

Recent merge of 'linux-kselftest-4.11-rc1' tree broke bpf test build.
None of the tests were building and test_verifier.c had tons of compiler errors.
Fix it and add #ifdef CAP_IS_SUPPORTED to support old versions of libcap.
Tested on centos 6.8 and 7
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Tested-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1da8ac7c

mpls: Do not decrement alive counter for unregister events · 79099aab

由 David Ahern 提交于 3月 10, 2017

Multipath routes can be rendered usesless when a device in one of the
paths is deleted. For example:

$ ip -f mpls ro ls
100
	nexthop as to 200 via inet 172.16.2.2  dev virt12
	nexthop as to 300 via inet 172.16.3.2  dev br0
101
	nexthop as to 201 via inet6 2000:2::2  dev virt12
	nexthop as to 301 via inet6 2000:3::2  dev br0

$ ip li del br0

When br0 is deleted the other hop is not considered in
mpls_select_multipath because of the alive check -- rt_nhn_alive
is 0.

rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
a 2 hop route, deleting one device drops the alive count to 0. Since
devices are taken down before unregistering, the decrement on
NETDEV_UNREGISTER is redundant.

Fixes: c89359a4 ("mpls: support for dead routes")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79099aab

xen-netback: fix race condition on XenBus disconnect · b17075d5

由 Igor Druzhinin 提交于 3月 10, 2017

In some cases during XenBus disconnect event handling and subsequent
queue resource release there may be some TX handlers active on
other processors. Use RCU in order to synchronize with them.
Signed-off-by: NIgor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b17075d5

mpls: Send route delete notifications when router module is unloaded · e37791ec

由 David Ahern 提交于 3月 10, 2017

When the mpls_router module is unloaded, mpls routes are deleted but
notifications are not sent to userspace leaving userspace caches
out of sync. Add the call to mpls_notify_route in mpls_net_exit as
routes are freed.

Fixes: 0189197f ("mpls: Basic routing support")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e37791ec

act_connmark: avoid crashing on malformed nlattrs with null parms · 52491c76

由 Etienne Noss 提交于 3月 10, 2017

tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
is set, resulting in a null pointer dereference when trying to access it.

[501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
...
[501099.044334] Call Trace:
[501099.044345]  [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
[501099.044363]  [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
[501099.044380]  [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
[501099.044398]  [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417]  [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436]  [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454]  [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
[501099.044471]  [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490]  [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
[501099.044507]  [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524]  [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
[501099.044541]  [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
[501099.044558]  [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576]  [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
[501099.044592]  [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
[501099.044608]  [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
[501099.044626]  [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b

Fixes: 22a5dc0e ("net: sched: Introduce connmark action")
Signed-off-by: NÉtienne Noss <etienne.noss@wifirst.fr>
Signed-off-by: NVictorien Molle <victorien.molle@wifirst.fr>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52491c76

Make IP 'forwarding' doc more precise · 88a7cddc

由 Neil Jerram 提交于 3月 10, 2017

It wasn't clear if the 'forwarding' setting needs to be enabled on the
interface that packets are received from, or on the interface that
packets are forwarded to, or both.

In fact (according to my code reading) the setting is relevant on the
interface that packets are received from, so this change updates the doc
to say that.
Signed-off-by: NNeil Jerram <neil@tigera.io>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88a7cddc

netvsc: handle select_queue when device is being removed · 7ce10124

由 stephen hemminger 提交于 3月 09, 2017

Move the send indirection table from the inner device (netvsc)
to the network device context.

It is possible that netvsc_device is not present (remove in progress).
This solves potential use after free issues when packet is being
created during MTU change, shutdown, or queue count changes.

Fixes: d8e18ee0 ("netvsc: enhance transmit select_queue")
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ce10124

net: ethernet: aquantia: call set_irq_affinity_hint before free_irq · ecd05225

由 David Arcari 提交于 3月 09, 2017

When a network interface controlled by the aquantia ethernet driver is brought
down a warning is output in dmesg (see below).

The problem is that aq_pci_func_free_irqs() is calling free_irq() before it is
calling irq_set_affinity_hint().

WARNING: CPU: 4 PID: 10068 at kernel/irq/manage.c:1503 __free_irq+0x24d/0x2b0
<snip>
Call Trace:
 dump_stack+0x63/0x87
 __warn+0xd1/0xf0
 warn_slowpath_null+0x1d/0x20
 __free_irq+0x24d/0x2b0
 free_irq+0x39/0x90
 aq_pci_func_free_irqs+0x52/0xa0 [atlantic]
 aq_nic_stop+0xca/0xd0 [atlantic]
 aq_ndev_close+0x1d/0x40 [atlantic]
 __dev_close_many+0x99/0x100
 __dev_close+0x67/0xb0
<snip>

Fixes: 36a4a50f ("net: ethernet: aquantia: switch to pci_alloc_irq_vectors")

Cc: Christoph Hellwig <hch@lst.de>
Cc: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: NDavid Arcari <darcari@redhat.com>
Tested-by: NPavel Belous <pavel.belous@aquantia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecd05225

11 3月, 2017 7 次提交

Merge branch 'mlx5-fixes' · 609a807a

由 David S. Miller 提交于 3月 10, 2017

Saeed Mahameed says:

====================
Mellanox mlx5 fixes 2017-03-09

This series contains some mlx5 core and ethernet driver fixes.

For -stable:
net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode (for kernel >= 4.10)
net/mlx5e: Avoid wrong identification of rules on deletion (for kernel >= 4.9)
net/mlx5: Don't save PCI state when PCI error is detected (for kernel >= 4.9)
net/mlx5: Fix create autogroup prev initializer (for kernel >=4.9)
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

609a807a

net/mlx5e: Fix loopback selftest · ea29bd30

由 Eugenia Emantayev 提交于 3月 10, 2017

Change packet type handler to ETH_P_IP instead of ETH_P_ALL
since we are already expecting an IP packet.

Also, using ETH_P_ALL will cause the loopback test packet type handler
to be called on all outgoing packets, especially our own self loopback
test SKB, which will be validated on xmit as well, and we don't want that.

Tested with:
ethtool -t ethX
validated that the loopback test passes.

Fixes: 0952da79 ('net/mlx5e: Add support for loopback selftest')
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea29bd30

net/mlx5e: Avoid wrong identification of rules on deletion · 65ba8fb7

由 Or Gerlitz 提交于 3月 10, 2017

When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get us wrong w.r.t to rules set on the
PF. Since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it on
creation time and use that over the code where currently needed.

Fixes: 8b32580d ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: NRoi Dayan <roid@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65ba8fb7

net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode · 33e21c59

由 Huy Nguyen 提交于 3月 10, 2017

Currently, the function setdcbx fails if the request dcbx mode
is either IEEE or CEE. We remove the IEEE/CEE mode check because
we support both IEEE and CEE interfaces.

Fixes: 3a6a931d ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33e21c59

net/mlx5: Don't save PCI state when PCI error is detected · 5d47f6c8

由 Daniel Jurgens 提交于 3月 10, 2017

When a PCI error is detected the PCI state could be corrupt, don't save
it in that flow. Save the state after initialization. After restoring the
PCI state during slot reset save it again, restoring the state destroys
the previously saved state info.

Fixes: 05ac2c0b ('net/mlx5: Fix race between PCI error handlers and
health work')
Signed-off-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d47f6c8

net/mlx5: Fix create autogroup prev initializer · af363705

由 Paul Blakey 提交于 3月 10, 2017

The autogroups list is a list of non overlapping group boundaries
sorted by their start index. If the autogroups list wasn't empty
and an empty group slot was found at the start of the list,
the new group was added to the end of the list instead of the
beginning, as the prev initializer was incorrect.
When this was repeated, it caused multiple groups to have
overlapping boundaries.

Fixed that by correctly initializing the prev pointer to the
start of the list.

Fixes: eccec8da ('net/mlx5: Keep autogroups list ordered')
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af363705

rxrpc: Wake up the transmitter if Rx window size increases on the peer · 702f2ac8

由 David Howells 提交于 3月 10, 2017

The RxRPC ACK packet may contain an extension that includes the peer's
current Rx window size for this call.  We adjust the local Tx window size
to match.  However, the transmitter can stall if the receive window is
reduced to 0 by the peer and then reopened.

This is because the normal way that the transmitter is re-energised is by
dropping something out of our Tx queue and thus making space.  When a
single gap is made, the transmitter is woken up.  However, because there's
nothing in the Tx queue at this point, this doesn't happen.

To fix this, perform a wake_up() any time we see the peer's Rx window size
increasing.

The observable symptom is that calls start failing on ETIMEDOUT and the
following:

	kAFS: SERVER DEAD state=-62

appears in dmesg.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

702f2ac8

10 3月, 2017 23 次提交

net: phy: marvell: Fix double free of hwmon device · 29673983

由 Andrew Lunn 提交于 3月 09, 2017

The hwmon temperature sensor devices is registered using a devm_hwmon
API call.  The marvell_release() would then manually free the device,
not using a devm_hmon API, resulting in the device being removed
twice, leading to a crash in kernfs_find_ns() during the second
removal.

Remove the manual removal, which makes marvell_release() empty, so
remove it as well.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Fixes: 0b04680f ("phy: marvell: Add support for temperature sensor")
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29673983

Merge branch 'bcmgenet-minor-bug-fixes' · b3b8812e

由 David S. Miller 提交于 3月 09, 2017

Doug Berger says:

====================
net: bcmgenet: minor bug fixes

v2: Accidentally sent the wrong set after rebasing.

This collection contains a number of fixes for minor issues with the
bcmgenet driver most of which were present in the initial submission
of the driver.

Some bugs were uncovered by inspection prior to the upcoming update for
GENETv5 support:
  net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
  net: bcmgenet: correct MIB access of UniMAC RUNT counters
  net: bcmgenet: reserved phy revisions must be checked first
  net: bcmgenet: synchronize irq0 status between the isr and task

Others bugs were found in power management testing:
  net: bcmgenet: power down internal phy if open or resume fails
  net: bcmgenet: Power up the internal PHY before probing the MII
  net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim
  net: bcmgenet: add begin/complete ethtool ops

Doug Berger (7):
  net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
  net: bcmgenet: correct MIB access of UniMAC RUNT counters
  net: bcmgenet: reserved phy revisions must be checked first
  net: bcmgenet: power down internal phy if open or resume fails
  net: bcmgenet: synchronize irq0 status between the isr and task
  net: bcmgenet: Power up the internal PHY before probing the MII
  net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim

Edwin Chan (1):
  net: bcmgenet: add begin/complete ethtool ops
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3b8812e

net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim · 6d22fe14

由 Doug Berger 提交于 3月 09, 2017

The bcmgenet_tx_reclaim() function is used to reclaim transmit
resources in different places within the driver.  Most of them
should not affect the state of the transmit flow control.

This commit relocates the logic for waking tx queues based on
freed resources to the napi polling function where it is more
appropriate.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d22fe14

net: bcmgenet: add begin/complete ethtool ops · 89316fa3

由 Edwin Chan 提交于 3月 09, 2017

Make sure clock is enabled for ethtool ops.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NEdwin Chan <edwin.chan@broadcom.com>
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89316fa3

net: bcmgenet: Power up the internal PHY before probing the MII · 6be371b0

由 Doug Berger 提交于 3月 09, 2017

When using the internal PHY it must be powered up when the MII is probed
or the PHY will not be detected.  Since the PHY is powered up at reset
this has not been a problem.  However, when the kernel is restarted with
kexec the PHY will likely be powered down when the kernel starts so it
will not be detected and the Ethernet link will not be established.

This commit explicitly powers up the internal PHY when the GENET driver
is probed to correct this behavior.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6be371b0

net: bcmgenet: synchronize irq0 status between the isr and task · 07c52d6a

由 Doug Berger 提交于 3月 09, 2017

Add a spinlock to ensure that irq0_stat is not unintentionally altered
as the result of preemption.  Also removed unserviced irq0 interrupts
and removed irq1_stat since there is no bottom half service for those
interrupts.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07c52d6a

net: bcmgenet: power down internal phy if open or resume fails · 7627409c

由 Doug Berger 提交于 3月 09, 2017

Since the internal PHY is powered up during the open and resume
functions it should be powered back down if the functions fail.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7627409c

net: bcmgenet: reserved phy revisions must be checked first · eca4bad7

由 Doug Berger 提交于 3月 09, 2017

The reserved gphy_rev value of 0x01ff must be tested before the old
or new scheme for GPHY major versioning are tested, otherwise it will
be treated as 0xff00 according to the old scheme.

Fixes: b04a2f5b ("net: bcmgenet: add support for new GENET PHY revision scheme")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eca4bad7

net: bcmgenet: correct MIB access of UniMAC RUNT counters · 1ad3d225

由 Doug Berger 提交于 3月 09, 2017

The gap between the Tx status counters and the Rx RUNT counters is now
being added to allow correct reporting of the registers.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ad3d225

net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values · ffff7132

由 Doug Berger 提交于 3月 09, 2017

The location of the RBUF overflow and error counters has moved between
different version of the GENET MAC.  This commit corrects the driver to
read from the correct locations depending on the version of the GENET
MAC.

Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: NDoug Berger <opendmb@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ffff7132

amd-xgbe: Enable IRQs only if napi_complete_done() is true · d7aba644

由 Lendacky, Thomas 提交于 3月 09, 2017

Depending on the hardware, the amd-xgbe driver may use disable_irq_nosync()
and enable_irq() when an interrupt is received to process Rx packets. If
the napi_complete_done() return value isn't checked an unbalanced enable
for the IRQ could result, generating a warning stack trace.

Update the driver to only enable interrupts if napi_complete_done() returns
true.
Reported-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7aba644

rxrpc: rxrpc_kernel_send_data() needs to handle failed call better · 6fc166d6

由 David Howells 提交于 3月 09, 2017

If rxrpc_kernel_send_data() is asked to send data through a call that has
already failed (due to a remote abort, received protocol error or network
error), then return the associated error code saved in the call rather than
ESHUTDOWN.

This allows the caller to work out whether to ask for the abort code or not
based on this.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fc166d6

udp: avoid ufo handling on IP payload compression packets · 4b3b45ed

由 Alexey Kodanev 提交于 3月 09, 2017

commit c146066a ("ipv4: Don't use ufo handling on later transformed
packets") and commit f89c56ce ("ipv6: Don't use ufo handling on
later transformed packets") added a check that 'rt->dst.header_len' isn't
zero in order to skip UFO, but it doesn't include IPcomp in transport mode
where it equals zero.

Packets, after payload compression, may not require further fragmentation,
and if original length exceeds MTU, later compressed packets will be
transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:

* IPv4 case, offset is wrong + unnecessary fragmentation
    udp_ipsec.sh -p comp -m transport -s 2000 &
    tcpdump -ni ltp_ns_veth2
    ...
    IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
      proto Compressed IP (108), length 49)
      10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
    IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
      proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17

* IPv6 case, sending small fragments
    udp_ipsec.sh -6 -p comp -m transport -s 2000 &
    tcpdump -ni ltp_ns_veth2
    ...
    IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
      payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
    IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
      payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)

Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
if xfrm is set. So the new check will include both cases: IPcomp and IPsec.

Fixes: c146066a ("ipv4: Don't use ufo handling on later transformed packets")
Fixes: f89c56ce ("ipv6: Don't use ufo handling on later transformed packets")
Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b3b45ed

net: Work around lockdep limitation in sockets that use sockets · cdfbabfb

由 David Howells 提交于 3月 09, 2017

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdfbabfb

Merge branch 'bnxt_en-misc-small-fixes' · 81dca07b

由 David S. Miller 提交于 3月 09, 2017

Michael Chan says:

====================
bnxt_en: Misc. small fixes.

Fixes include moving the initial function reset, notifying the RDMA driver
during tx timeout, setting dcbx_cap properly depending on whether the
firmware agent is running or not, and an autoneg related improvement.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81dca07b

bnxt_en: Ignore 0 value in autoneg supported speed from firmware. · 520ad89a

由 Michael Chan 提交于 3月 08, 2017

In some situations, the firmware will return 0 for autoneg supported
speed. This may happen if the firmware detects no SFP module, for
example. The driver should ignore this so that we don't end up with
an invalid autoneg setting with nothing advertised. When SFP module
is inserted, we'll get the updated settings from firmware at that time.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

520ad89a

bnxt_en: Check if firmware LLDP agent is running. · bc39f885

由 Michael Chan 提交于 3月 08, 2017

Set DCB_CAP_DCBX_HOST capability flag only if the firmware LLDP agent
is not running.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc39f885

bnxt_en: Call bnxt_ulp_stop() during tx timeout. · b386cd36

由 Michael Chan 提交于 3月 08, 2017

If we call bnxt_reset_task() due to tx timeout, we should call
bnxt_ulp_stop() to inform the RDMA driver about the error and the
impending reset.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b386cd36

bnxt_en: Perform function reset earlier during probe. · 3c2217a6

由 Michael Chan 提交于 3月 08, 2017

The firmware call to do function reset is done too late.  It is causing
the rings that have been reserved to be freed.  In NPAR mode, this bug
is causing us to run out of rings.

Fixes: 391be5c2 ("bnxt_en: Implement new scheme to reserve tx rings.")
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c2217a6

tun: remove copyright printing · 6cbac982

由 LABBE Corentin 提交于 3月 08, 2017

Printing copyright does not give any useful information on the boot
process.
Furthermore, the email address printed is obsolete since
commit ba57b6f2 ("MAINTAINERS: fix bouncing tun/tap entries")
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6cbac982

net: initialize msg.msg_flags in recvfrom · 9f138fa6

由 Alexander Potapenko 提交于 3月 08, 2017

KMSAN reports a use of uninitialized memory in put_cmsg() because
msg.msg_flags in recvfrom haven't been initialized properly.
The flag values don't affect the result on this path, but it's still a
good idea to initialize them explicitly.
Signed-off-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f138fa6

Merge branch 'bpf-htab-fixes' · 8ddbb312

由 David S. Miller 提交于 3月 09, 2017

Alexei Starovoitov says:

====================
bpf: htab fixes

Two bpf hashtable fixes. See individual patches for details.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ddbb312

bpf: convert htab map to hlist_nulls · 4fe84359

由 Alexei Starovoitov 提交于 3月 07, 2017

when all map elements are pre-allocated one cpu can delete and reuse htab_elem
while another cpu is still walking the hlist. In such case the lookup may
miss the element. Convert hlist to hlist_nulls to avoid such scenario.
When bucket lock is taken there is no need to take such precautions,
so only convert map_lookup and map_get_next to nulls.
The race window is extremely small and only reproducible with explicit
udelay() inside lookup_nulls_elem_raw()

Similar to hlist add hlist_nulls_for_each_entry_safe() and
hlist_nulls_entry_safe() helpers.

Fixes: 6c905981 ("bpf: pre-allocate hash map elements")
Reported-by: NJonathan Perry <jonperry@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fe84359

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功