Commits · 9b6c14d51bd2304b92f842e96172a9cc822fc77c · _Walt / cloud-kernel

10 Nov, 2016 16 commits

net: tcp response should set oif only if it is L3 master · 9b6c14d5

Committed by David Ahern on Nov 09, 2016

Lorenzo noted an Android unit test failed due to e0d56fdd:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."

Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.
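
The rule being restored can be sketched in a few lines of user-space C. This is a simplified model, not the kernel code: `reply_oif()` and its boolean parameter are hypothetical names standing in for the kernel's check of whether the incoming interface is an L3 master device.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the oif choice for a TCP reply such as a RST.
 * skb_iif:          index of the interface the packet arrived on.
 * iif_is_l3_master: whether that interface is an L3 master device.
 * Returns the oif to put in the flow; 0 means "follow the routing table".
 */
static int reply_oif(int skb_iif, bool iif_is_l3_master)
{
    return iif_is_l3_master ? skb_iif : 0;
}
```

With this rule a RST for a SYN that arrived on an ordinary interface is routed with oif=0, which is what the Android test expects.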

Fixes: e0d56fdd ("net: l3mdev: remove redundant calls")
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Net Driver: Add Cypress GX3 VID=04b4 PID=3610. · 8da3cf2a

Committed by Allan Chou on Nov 08, 2016

Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).

Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.

A similar patch was signed-off and tested-by Allan Chou
<allan@asix.com.tw> on 2015-12-01.

Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.
Tested-by: Allan Chou <allan@asix.com.tw>
Tested-by: Chris Roth <chris.roth@usask.ca>
Tested-by: Artjom Simon <artjom.simon@gmail.com>
Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Chris Roth <chris.roth@usask.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 9fa684ec

Committed by David S. Miller on Nov 09, 2016

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically avoid that the
   INVITE in the allow header create bogus expectations by performing
   stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation instead of connlabels, allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
    via CT target, otherwise we cannot use the h.245 helper, from
    Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


rtnl: reset calcit fptr in rtnl_unregister() · f567e950

Committed by Mathias Krause on Nov 07, 2016

To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.
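
The idea can be illustrated with a small user-space handler table. This is only a sketch under assumed names: `msg_handlers`, `handlers_register()`, `handlers_unregister()` and `example_hook()` are hypothetical and do not mirror the rtnetlink data structures.

```c
#include <stddef.h>

#define MSG_TYPES 4   /* hypothetical table size for this sketch */

typedef int (*handler_fn)(void);

struct msg_handlers {
    handler_fn doit;
    handler_fn dumpit;
    handler_fn calcit;
};

static struct msg_handlers handlers[MSG_TYPES];

/* Example hook used only to have something to register. */
static int example_hook(void)
{
    return 0;
}

/* Register all three hooks for one message type. */
static int handlers_register(int type, handler_fn doit,
                             handler_fn dumpit, handler_fn calcit)
{
    if (type < 0 || type >= MSG_TYPES)
        return -1;
    handlers[type] = (struct msg_handlers){ doit, dumpit, calcit };
    return 0;
}

/* Unregister: clear every hook, calcit included, so no pointer dangles. */
static int handlers_unregister(int type)
{
    if (type < 0 || type >= MSG_TYPES)
        return -1;
    handlers[type].doit = NULL;
    handlers[type].dumpit = NULL;
    handlers[type].calcit = NULL;
    return 0;
}
```

If `handlers_unregister()` cleared only `doit` and `dumpit`, a later lookup could still call through a stale `calcit` pointer after the code that provided it has gone away.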

Fixes: c7ac8679 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


vxlan: hide unused local variable · 4053ab1b

Committed by Arnd Bergmann on Nov 07, 2016

A bugfix introduced a harmless warning in v4.9-rc4:

drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]

This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization is removed
at the same time, it was accidentally introduced in the
same commit.

Fixes: c6fcc4fc ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ibmvnic: Start completion queue negotiation at server-provided optimum values · 6dbcd8fb

Committed by John Allen on Nov 07, 2016

Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: icmp_route_lookup should use rt dev to determine L3 domain · 9d1a6c4e

Committed by David Ahern on Nov 07, 2016

icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.

Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'qcom-emac-pause' · fd6f24d7

Committed by David S. Miller on Nov 09, 2016

Timur Tabi says:

====================
net: qcom/emac: ensure that pause frames are enabled

The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it.  Therefore, we
need to enable flow control and force the phy to pass pause frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qcom/emac: enable flow control if requested · df63022e

Committed by Timur Tabi on Nov 07, 2016

If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qcom/emac: configure the external phy to allow pause frames · 3e884493

Committed by Timur Tabi on Nov 07, 2016

Pause frames are used to enable flow control.  A MAC can send and
receive pause frames in order to throttle traffic.  However, the PHY
must be configured to allow those frames to pass through.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: bgmac: fix reversed checks for clock control flag · cdb26d33

Committed by Rafał Miłecki on Nov 07, 2016

This fixes regression introduced by patch adding feature flags. It was
already reported and patch followed (it got accepted) but it appears it
was incorrect. Instead of fixing reversed condition it broke a good one.

This patch was verified to actually fix SoC hangs caused by bgmac on
BCM47186B0.

Fixes: db791eb2 ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


bna: Add synchronization for tx ring. · d667f785

Committed by Benjamin Poirier on Nov 07, 2016

We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.

bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.

Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().
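
The pairing requirement can be illustrated with a user-space analogy. The sketch below uses C11 acquire/release atomics between two CPU threads; it is not the driver code and it does not use the kernel's mandatory barriers, which are what the patch needs because the other party is a device. The names `tx_ring`, `ring_produce()` and `ring_consume()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical ring size */

struct tx_ring {
    uint32_t desc[RING_SIZE];
    _Atomic uint32_t producer_index;
    _Atomic uint32_t consumer_index;
};

/* Producer: write the descriptor first, then publish the new index. */
static int ring_produce(struct tx_ring *r, uint32_t value)
{
    uint32_t prod = atomic_load_explicit(&r->producer_index, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer_index, memory_order_acquire);

    if (prod - cons == RING_SIZE)
        return -1;                      /* ring full */
    r->desc[prod % RING_SIZE] = value;
    /* Release: the descriptor write becomes visible before the index moves. */
    atomic_store_explicit(&r->producer_index, prod + 1, memory_order_release);
    return 0;
}

/* Consumer: read the index first, then the descriptor it covers. */
static int ring_consume(struct tx_ring *r, uint32_t *out)
{
    uint32_t cons = atomic_load_explicit(&r->consumer_index, memory_order_relaxed);
    /* Acquire: pairs with the release store in ring_produce(). */
    uint32_t prod = atomic_load_explicit(&r->producer_index, memory_order_acquire);

    if (cons == prod)
        return -1;                      /* ring empty */
    *out = r->desc[cons % RING_SIZE];
    atomic_store_explicit(&r->consumer_index, cons + 1, memory_order_release);
    return 0;
}
```

A barrier on the producer side alone orders nothing for the consumer. The ordering only holds when the consumer's read of the index is itself ordered before its read of the data.
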
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Revert "net/mlx4_en: Fix panic during reboot" · f91d7181

Committed by Tariq Toukan on Nov 06, 2016

This reverts commit 9d2afba0.

The original issue would possibly exist if an external module
tried calling our "ethtool_ops" without checking if it still
exists.

The right way of solving it is by simply doing the check in
the caller side.
Currently, no action is required as there's no such use case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net-ipv6: on device mtu change do not add mtu to mtu-less routes · fb56be83

Committed by Maciej Żenczykowski on Nov 04, 2016

Routes can specify an mtu explicitly or inherit the mtu from
the underlying device - this inheritance is implemented in
dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().

Currently changing the mtu of a device adds mtu explicitly
to routes using that device.

ie.
  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1

After this patch the route entry no longer changes unless it already has an mtu.
There is no need: this inheritance is already done in ip6_mtu()

  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route add local 2000::2 dev lo mtu 2000
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 1501
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1
  # ip -6 route del local 2000::2

This is desirable because changing device mtu and then resetting it
to the previous value shouldn't change the user visible routing table.
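
The post-patch behaviour shown in the transcripts can be modelled in user-space C. This is a simplified sketch: the struct, the function names and the exact update condition are assumptions inferred from the command output above, not taken from the kernel source.

```c
#include <stdbool.h>

struct route6 {
    bool has_mtu;        /* true if the route carries an explicit mtu */
    unsigned int mtu;    /* meaningful only when has_mtu is true */
};

/* Effective MTU: the explicit value if present, else inherited from the device. */
static unsigned int route_mtu(const struct route6 *rt, unsigned int dev_mtu)
{
    return rt->has_mtu ? rt->mtu : dev_mtu;
}

/* Called when the device MTU changes from old_mtu to new_mtu. */
static void route_on_dev_mtu_change(struct route6 *rt,
                                    unsigned int old_mtu, unsigned int new_mtu)
{
    if (!rt->has_mtu)
        return;   /* mtu-less route: leave it alone, it keeps inheriting */

    /*
     * Assumed condition: clamp a route mtu that no longer fits, and let a
     * route mtu that was equal to the old device mtu follow the device.
     */
    if (rt->mtu >= new_mtu || rt->mtu == old_mtu)
        rt->mtu = new_mtu;
}
```

Replaying the second transcript against this model, the route with `mtu 2000` stays at 2000 for 65536 to 65535, drops to 1501 when the device does, and then follows the device back to 65536. The mtu-less route is never modified.
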
Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


sock: fix sendmmsg for partial sendmsg · 3023898b

Committed by Soheil Hassas Yeganeh on Nov 04, 2016

Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.
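
The fixed loop can be sketched with a toy stream in user-space C. Everything here is hypothetical scaffolding (`stream`, `toy_sendmsg()`, `toy_sendmmsg()`, and the `first_call_limit` knob that forces a partial first send); it models the control flow described above, not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

struct stream {
    char buf[64];
    size_t len;
    size_t first_call_limit;   /* hypothetical: max bytes the first send accepts */
    int calls;
};

/* Toy sendmsg: may accept fewer bytes than requested; returns bytes accepted. */
static long toy_sendmsg(struct stream *s, const char *msg, size_t n)
{
    size_t take = n;

    if (s->calls == 0 && take > s->first_call_limit)
        take = s->first_call_limit;
    s->calls++;
    if (take > sizeof(s->buf) - s->len)
        take = sizeof(s->buf) - s->len;
    memcpy(s->buf + s->len, msg, take);
    s->len += take;
    return (long)take;
}

/* Toy sendmmsg with the fix applied; returns the number of messages handled. */
static int toy_sendmmsg(struct stream *s, const char *const msgs[], int count)
{
    int sent = 0;

    for (int i = 0; i < count; i++) {
        size_t want = strlen(msgs[i]);
        long got = toy_sendmsg(s, msgs[i], want);

        sent++;
        if ((size_t)got < want)
            break;   /* partial send: do not start the next message */
    }
    return sent;
}
```

With `first_call_limit = 1` and the messages "abcd" and "efgh", the stream ends up holding only "a" and the caller learns that one message was handled, instead of the stream receiving "aefgh".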

Fixes: 228e548e ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed. · aa5fd0fb

Committed by Gao Feng on Nov 04, 2016

When there is no existing macvlan port in lowdev, a new macvlan port
is created. But it is not destroyed when something fails later,
which causes a memory leak.

Now add one flag to indicate if new macvlan port is created.
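
A minimal user-space sketch of the "created here" flag pattern follows. The names `lowerdev` and `newlink()` and the `later_step_fails` switch are hypothetical; the real function has more steps and different types.

```c
#include <stdbool.h>
#include <stdlib.h>

struct lowerdev {
    int *port;   /* NULL until a port has been created on this device */
};

static int newlink(struct lowerdev *dev, bool later_step_fails)
{
    bool created = false;

    if (!dev->port) {
        dev->port = malloc(sizeof(*dev->port));
        if (!dev->port)
            return -1;
        *dev->port = 0;
        created = true;   /* remember that this call created the port */
    }

    if (later_step_fails) {
        /* Tear down only a port this call created; an older one is shared. */
        if (created) {
            free(dev->port);
            dev->port = NULL;
        }
        return -1;
    }
    return 0;
}
```

The flag matters because an unconditional teardown on failure would also destroy a port that earlier, successful links still depend on.
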
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


09 Nov, 2016 5 commits

netfilter: nf_tables: fix oops when inserting an element into a verdict map · 58c78e10

Committed by Liping Zhang on Nov 06, 2016

Dalegaard says:
 The following ruleset, when loaded with 'nft -f bad.txt'
 ----snip----
 flush ruleset
 table ip inlinenat {
   map sourcemap {
     type ipv4_addr : verdict;
   }

   chain postrouting {
     ip saddr vmap @sourcemap accept
   }
 }
 add chain inlinenat test
 add element inlinenat sourcemap { 100.123.10.2 : jump test }
 ----snip----

 results in a kernel oops:
 BUG: unable to handle kernel paging request at 0000000000001344
 IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
 [...]
 Call Trace:
  [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
  [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
  [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
  [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
  [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
  [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
  [<ffffffff8132c400>] ? nla_validate+0x60/0x80
  [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

This happens because we forget to fill in the net pointer in bind_ctx, so
dereferencing it may crash the kernel.
Reported-by: Dalegaard <dalegaard@gmail.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: conntrack: refine gc worker heuristics · e0df8cae

Committed by Florian Westphal on Nov 04, 2016

Nicolas Dichtel says:
  After commit b87a2f91 ("netfilter: conntrack: add gc worker to
  remove timed-out entries"), netlink conntrack deletion events may be
  sent with a huge delay.

Nicolas further points at this line:

  goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

and indeed, this isn't optimal at all.  Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge.  But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.

We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).

So, after some discussion with Nicolas:

1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)

2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries

3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead

4. Want reasonable short time where we detect timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.

The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.

Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').
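
The adaptive-delay idea in points 2 and 4 can be sketched as follows. The bounds and the halve/double steps are assumptions made for illustration; the commit message only states the direction of the adjustment, and the step sizes used by the kernel may differ.

```c
/* Hypothetical bounds for the delay between scans, in jiffies. */
#define GC_DELAY_MIN 1u
#define GC_DELAY_MAX 1000u

/*
 * Pick the delay before the next scan from the current delay and the
 * number of entries the last scan evicted.
 */
static unsigned int gc_next_delay(unsigned int cur, unsigned int evicted)
{
    unsigned int next;

    if (evicted > 0)
        next = cur / 2;   /* stale entries seen: scan again sooner */
    else
        next = cur * 2;   /* nothing evicted: back off */

    if (next < GC_DELAY_MIN)
        next = GC_DELAY_MIN;
    if (next > GC_DELAY_MAX)
        next = GC_DELAY_MAX;
    return next;
}
```

The lower bound corresponds to "scans can now happen every jiffy", and the upper bound plays the role of the old gc interval turned into a maximum.
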
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: conntrack: fix CT target for UNSPEC helpers · 6114cc51

Committed by Florian Westphal on Nov 03, 2016

Thomas reports it's not possible to attach the H.245 helper:

iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"

This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

We should treat UNSPEC as wildcard and ignore the l3num instead.
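
The wildcard rule can be sketched as a lookup over a small table. The table contents, the `L3_*` constants and `helper_find()` are hypothetical; in particular the "ftp" row registered for IPv4 only is just an example entry.

```c
#include <stddef.h>
#include <string.h>

enum { L3_UNSPEC = 0, L3_IPV4 = 2, L3_IPV6 = 10 };

struct helper {
    const char *name;
    int l3num;   /* L3 protocol the helper registered for */
};

/* Hypothetical registrations for this sketch. */
static const struct helper helpers[] = {
    { "H.245", L3_UNSPEC },
    { "ftp",   L3_IPV4   },
};

/* Find a helper by name; a helper registered as UNSPEC matches any l3num. */
static const struct helper *helper_find(const char *name, int l3num)
{
    for (size_t i = 0; i < sizeof(helpers) / sizeof(helpers[0]); i++) {
        const struct helper *h = &helpers[i];

        if (strcmp(h->name, name) != 0)
            continue;
        if (h->l3num == L3_UNSPEC || h->l3num == l3num)
            return h;
    }
    return NULL;
}
```

Without the `L3_UNSPEC` branch, a lookup for "H.245" with `L3_IPV4` or `L3_IPV6` would fail, which is the symptom in the report above.
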
Reported-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: connmark: ignore skbs with magic untracked conntrack objects · fb9c9649

Committed by Florian Westphal on Oct 29, 2016

The (percpu) untracked conntrack entries can end up with nonzero connmarks.

The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).

Untracked conntrack objects are implementation detail, we might as well use
invalid magic address instead to tell INVALID and UNTRACKED apart.

Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr · 8fbfef7f

Committed by WANG Cong on Nov 03, 2016

family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().
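
The distinction between the two sizes can be shown with a stand-alone sketch. All `demo_*` and `DEMO_*` names are hypothetical and only imitate the shape of a generic netlink family definition.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical attribute numbering; the last enumerator only counts entries. */
enum demo_attr {
    DEMO_ATTR_UNSPEC,
    DEMO_ATTR_SERVICE,
    DEMO_ATTR_DEST,
    DEMO_ATTR_COUNT,
};
#define DEMO_ATTR_MAX (DEMO_ATTR_COUNT - 1)

struct demo_policy { int type; };
struct demo_op { int cmd; };

/* Indexed by attribute number, so it needs DEMO_ATTR_MAX + 1 slots. */
static const struct demo_policy demo_policy_tbl[DEMO_ATTR_MAX + 1] = {
    [DEMO_ATTR_SERVICE] = { 1 },
    [DEMO_ATTR_DEST]    = { 1 },
};

/* The number of operations is independent of the number of attributes. */
static const struct demo_op demo_ops_tbl[] = {
    { 10 }, { 11 }, { 12 }, { 13 }, { 14 },
};

struct demo_family {
    unsigned int maxattr;
    const struct demo_policy *policy;
    const struct demo_op *ops;
    unsigned int n_ops;
};

static const struct demo_family demo_fam = {
    .maxattr = DEMO_ATTR_MAX,              /* highest valid policy index */
    .policy  = demo_policy_tbl,
    .ops     = demo_ops_tbl,
    .n_ops   = ARRAY_SIZE(demo_ops_tbl),   /* derived from ops, not attributes */
};
```

Setting `maxattr` from a command-count constant instead would let a parser index `demo_policy_tbl` past its end whenever there are more commands than attributes.
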
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


08 Nov, 2016 12 commits

fib_trie: Correct /proc/net/route off by one error · fd0285a3

Committed by Alexander Duyck on Nov 04, 2016

The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key.  In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid.  With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.

Fixes: 25b97c01 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft <apw@canonical.com>
Reported-by: Jason Baron <jbaron@akamai.com>
Tested-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Documentation: networking: dsa: Update tagging protocols · 8e0140a2

Committed by Fabian Mewes on Nov 04, 2016

Add Qualcomm QCA tagging introduced in cafdc45c to the
list of supported protocols.
Signed-off-by: Fabian Mewes <architekt@coding4coffee.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


virtio-net: drop legacy features in virtio 1 mode · f3358507

Committed by Michael S. Tsirkin on Nov 04, 2016

Virtio 1.0 spec says VIRTIO_F_ANY_LAYOUT and VIRTIO_NET_F_GSO are
legacy-only feature bits. Do not negotiate them in virtio 1 mode.  Note
this is a spec violation so we need to backport it to stable/downstream
kernels.
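
The rule can be sketched as a mask applied after feature intersection. The bit numbers are the ones assigned by the virtio specification; `negotiate_features()` and the helper macros are hypothetical and do not reflect how the driver structures its feature tables.

```c
#include <stdint.h>

#define FEATURE_BIT(n) (UINT64_C(1) << (n))

/* Feature bit numbers as assigned by the virtio specification. */
#define VIRTIO_NET_F_GSO     6
#define VIRTIO_F_ANY_LAYOUT  27
#define VIRTIO_F_VERSION_1   32

#define LEGACY_ONLY_FEATURES \
    (FEATURE_BIT(VIRTIO_NET_F_GSO) | FEATURE_BIT(VIRTIO_F_ANY_LAYOUT))

/* Intersect both feature sets; in virtio 1 mode drop the legacy-only bits. */
static uint64_t negotiate_features(uint64_t device_features,
                                   uint64_t driver_features)
{
    uint64_t common = device_features & driver_features;

    if (common & FEATURE_BIT(VIRTIO_F_VERSION_1))
        common &= ~LEGACY_ONLY_FEATURES;
    return common;
}
```

A legacy device that does not offer `VIRTIO_F_VERSION_1` still gets both bits, so only the virtio 1 path changes.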

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: icmp6_send should use dst dev to determine L3 domain · 5d41ce29

Committed by David Ahern on Nov 03, 2016

icmp6_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have a dst set.
Update icmp6_send to use the dst on the skb to determine L3 domain.

Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


bpf: fix map not being uncharged during map creation failure · 20b2b24f

Committed by Daniel Borkmann on Nov 04, 2016

In map_create(), we first find and create the map, then once that
succeeded, we charge it to the user's RLIMIT_MEMLOCK, and then fetch
a new anon fd through anon_inode_getfd(). The problem is, once the
latter fails f.e. due to RLIMIT_NOFILE limit, then we only destruct
the map via map->ops->map_free(), but without uncharging the previously
locked memory first. That means that the user_struct allocation is
leaked as well as the accounted RLIMIT_MEMLOCK memory not released.
Make the label names in the fix consistent with bpf_prog_load().
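
The corrected unwind order can be sketched in user-space C with a counter standing in for the memlock accounting. All names, the page limit and the label names are choices made for this sketch; `fd_alloc_fails` simulates the failing file descriptor allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_LIMIT 16          /* hypothetical stand-in for RLIMIT_MEMLOCK */
static long locked_pages;      /* hypothetical stand-in for per-user accounting */

struct toy_map { long pages; };

static int toy_charge(const struct toy_map *m)
{
    if (locked_pages + m->pages > PAGE_LIMIT)
        return -1;
    locked_pages += m->pages;
    return 0;
}

static void toy_uncharge(const struct toy_map *m)
{
    locked_pages -= m->pages;
}

/* Normal teardown of a successfully created map. */
static void toy_map_destroy(struct toy_map *m)
{
    toy_uncharge(m);
    free(m);
}

static struct toy_map *toy_map_create(long pages, bool fd_alloc_fails)
{
    struct toy_map *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->pages = pages;

    if (toy_charge(m))
        goto free_map_nouncharge;   /* nothing was charged yet */
    if (fd_alloc_fails)
        goto free_map;              /* charged: uncharge before freeing */
    return m;

free_map:
    toy_uncharge(m);
free_map_nouncharge:
    free(m);
    return NULL;
}
```

The bug corresponds to jumping straight to the second label after a successful charge, which frees the map but leaves `locked_pages` permanently inflated.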

Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


bpf: fix htab map destruction when extra reserve is in use · 483bed2b

Committed by Daniel Borkmann on Nov 04, 2016

Commit a6ed3ea6 ("bpf: restore behavior of bpf_map_update_elem")
added an extra per-cpu reserve to the hash table map to restore old
behaviour from pre prealloc times. When non-prealloc is in use for a
map, then problem is that once a hash table extra element has been
linked into the hash-table, and the hash table is destroyed due to
refcount dropping to zero, then htab_map_free() -> delete_all_elements()
will walk the whole hash table and drop all elements via htab_elem_free().
The problem is that the element from the extra reserve is first fed
to the wrong backend allocator and eventually freed twice.

Fixes: a6ed3ea6 ("bpf: restore behavior of bpf_map_update_elem")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


sctp: assign assoc_id earlier in __sctp_connect · 7233bc84

Committed by Marcelo Ricardo Leitner on Nov 03, 2016

sctp_wait_for_connect() currently already holds the asoc to keep it
alive during the sleep, in case another thread releases it. But Andrey
Konovalov and Dmitry Vyukov reported a use-after-free in such a
situation.

Problem is that __sctp_connect() doesn't get a ref on the asoc and will
do a read on the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it and the _put on sctp_wait_for_connect
will actually release it, causing the use-after-free.

Fix is, instead of doing the read after waiting for the connect, do it
before so, and avoid this issue as the socket is still locked by then.
There should be no issue on returning the asoc id in case of failure as
the application shouldn't trust on that number in such situations
anyway.

This issue doesn't exist in sctp_sendmsg() path.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'phy-ref-leaks' · ee0475a5

Committed by David S. Miller on Nov 07, 2016

Johan Hovold says:

====================
net: fix device reference leaks

This series fixes a number of device reference leaks (and one of_node
leak) due to failure to drop the references taken by bus_find_device()
and friends.

Note that the final two patches have been compile tested only.

v2
 - hold reference to cpsw-phy-sel device while accessing private data as
   requested by David. Also update the commit message. (patch 1/4)
 - add linux-omap on CC where appropriate
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net: hns: fix device reference leaks · 2271150b

Committed by Johan Hovold on Nov 03, 2016

Make sure to drop the reference taken by class_find_device() in
hnae_get_handle() on errors and when later releasing the handle.

Fixes: 6fe6611f ("net: add Hisilicon Network Subsystem...")
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ethernet: ti: davinci_emac: fix device reference leak · 6bed0118

Committed by Johan Hovold on Nov 03, 2016

Make sure to drop the references taken by bus_find_device() before
returning from emac_dev_open().

Note that phy_connect still takes a reference to the phy device.

Fixes: 5d69e007 ("net: davinci_emac: switch to new mdio")
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ethernet: ti: cpsw: fix device and of_node leaks · c7262aaa

Committed by Johan Hovold on Nov 03, 2016

Make sure to drop the references taken by of_get_child_by_name() and
bus_find_device() before returning from cpsw_phy_sel().

Note that holding a reference to the cpsw-phy-sel device does not
prevent the devres-managed private data from going away.

Fixes: 5892cd13 ("drivers: net: cpsw-phy-sel: Add new driver...")
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


phy: fix device reference leaks · 17ae1c65

Committed by Johan Hovold on Nov 03, 2016

Make sure to drop the reference taken by bus_find_device_by_name()
before returning from phy_connect() and phy_attach().

Note that both functions still take a reference to the phy device
through phy_attach_direct().

Fixes: e1393456 ("[PATCH] PHY Layer fixup")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


05 Nov, 2016 7 commits

Merge branch 'mlx5-fixes' · 6a0c9f68

Committed by David S. Miller on Nov 04, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-11-04

This series contains six hot fixes of the mlx5 core and mlx5e driver.

Huy fixed an invalid pointer dereference on initialization flow for when
the selected mlx5 load profile is out of range.

Or provided three eswitch offloads related fixes
 - Prevent changing NS of a VF representor.
 - Handle matching on vlan priority for offloaded TC rules
 - Set the actions for offloaded rules properly

On my part I here addressed the error flow related issues in
mlx5e_open_channel reported by Jesper just this week.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5: Fix invalid pointer reference when prof_sel parameter is invalid · 0e97a340

Committed by Huy Nguyen on Nov 04, 2016

When prof_sel is invalid, mlx5_core_warn is called but the
mlx5_core_dev is not initialized yet. Solution is moving the prof_sel code
after dev->pdev assignment

Fixes: 2974ab6e ('net/mlx5: Improve driver log messages')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5: E-Switch, Set the actions for offloaded rules properly · ee39fbc4

Committed by Or Gerlitz on Nov 04, 2016

As for the current generation of the mlx5 HW (CX4/CX4-Lx) per flow vlan
push/pop actions are emulated, we must not program them to the firmware.

Fixes: f5f82476 ('net/mlx5: E-Switch, Support VLAN actions in the offloads mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5e: Handle matching on vlan priority for offloaded TC rules · 358d79a4

Committed by Or Gerlitz on Nov 04, 2016

We ignored the vlan priority in offloaded TC rules matching part,
fix that.

Fixes: 095b6cfd ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5e: Disallow changing name-space for VF representors · abd32772

Committed by Or Gerlitz on Nov 04, 2016

VF reps should be altogether on the same NS as they were created.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5e: Re-arrange XDP SQ/CQ creation · d7a0ecab

Committed by Saeed Mahameed on Nov 04, 2016

In mlx5e_open_channel CQs must be created before napi is enabled.
Here we move the XDP CQ creation to satisfy that fact.

mlx5e_close_channel is already working according to the right order.

Fixes: b5503b99 ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5e: Fix XDP error path of mlx5e_open_channel() · 87dc0255

Committed by Saeed Mahameed on Nov 04, 2016

In case of mlx5e_open_rq fails the error handling will jump to
label err_close_xdp_sq and will try to close the xdp_sq unconditionally.
xdp_sq is valid only in case of XDP use cases, i.e priv->xdp_prog is
not null.

To fix this in this patch we test xdp_sq validity prior to closing it.

In addition we now close the xdp_sq.cq as well.

Fixes: b5503b99 ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

