提交 · c42a6e8b2907d68844bb750f839982ba2556c970 · openeuler / Kernel

25 7月, 2015 2 次提交

ipv4: consider TOS in fib_select_default · 2392debc

由 Julian Anastasov 提交于 7月 22, 2015

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2392debc

ipv4: fib_select_default should match the prefix · 18a912e9

由 Julian Anastasov 提交于 7月 22, 2015

fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.

Fixes: 79e5ad2c ("fib_trie: Remove leaf_info")
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18a912e9

22 7月, 2015 4 次提交

openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes · bac541e4

由 Chris J Arges 提交于 7月 21, 2015

Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2Signed-off-by: NChris J Arges <chris.j.arges@canonical.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bac541e4

netlink: don't hold mutex in rcu callback when releasing mmapd ring · 0470eb99

由 Florian Westphal 提交于 7月 21, 2015

Kirill A. Shutemov says:

This simple test-case trigers few locking asserts in kernel:

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
 #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
 #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
 <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
 [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
 [<ffffffff81085bed>] __might_sleep+0x4d/0x90
 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
 [<ffffffff817e484d>] __sk_free+0x1d/0x160
 [<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
Reported-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: NCong Wang <cwang@twopensource.com>
Suggested-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0470eb99

tcp: suppress a division by zero warning · 89e478a2

由 Eric Dumazet 提交于 7月 22, 2015

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89e478a2

inet: frags: fix defragmented packet's IP header for af_packet · 0848f642

由 Edward Hyunkoo Jee 提交于 7月 21, 2015

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.

Also, IPv4 checksum is not corrected after reassembly.

Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: NEdward Hyunkoo Jee <edjee@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0848f642

21 7月, 2015 6 次提交

sched: cls_flow: fix panic on filter replace · 32b2f4b1

由 Daniel Borkmann 提交于 7月 17, 2015

The following test case causes a NULL pointer dereference in cls_flow:

tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
flow hash keys mark action drop

To be more precise, actually two different panics are fixed, the first
occurs because tcf_exts_init() is not called on the newly allocated
filter when we do a replace. And the second panic uncovered after that
happens since the arguments of list_replace_rcu() are swapped, the old
element needs to be the first argument and the new element the second.

Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32b2f4b1

sched: cls_flower: fix panic on filter replace · ff3532f2

由 Daniel Borkmann 提交于 7月 17, 2015

The following test case causes a NULL pointer dereference in cls_flower:

  tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1
  tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
            flower eth_type ipv6 action ok flowid 1:1

The problem is that commit 77b9900e ("tc: introduce Flower classifier")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.

Fixes: 77b9900e ("tc: introduce Flower classifier")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff3532f2

sched: cls_bpf: fix panic on filter replace · f6bfc46d

由 Daniel Borkmann 提交于 7月 17, 2015

The following test case causes a NULL pointer dereference in cls_bpf:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
            bpf bytecode "$FOO" flowid 1:1 action drop

The problem is that commit 1f947bf1 ("net: sched: rcu'ify cls_bpf")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.

Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6bfc46d

net: ratelimit warnings about dst entry refcount underflow or overflow · 8bf4ada2

由 Konstantin Khlebnikov 提交于 7月 17, 2015

Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bf4ada2

caif: fix leaks and race in caif_queue_rcv_skb() · b8a23e8d

由 Eric Dumazet 提交于 7月 17, 2015

1) If sk_filter() is applied, skb was leaked (not freed)
2) Testing SOCK_DEAD twice is racy :
   packet could be freed while already queued.
3) Remove obsolete comment about caching skb->len
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8a23e8d

Revert "sit: Add gro callbacks to sit_offload" · fdbf5b09

由 Herbert Xu 提交于 7月 20, 2015

This patch reverts 19424e05 ("sit:
Add gro callbacks to sit_offload") because it generates packets
that cannot be handled even by our own GSO.
Reported-by: NWolfgang Walter <linux@stwm.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fdbf5b09

17 7月, 2015 8 次提交

cfg80211: use RTNL locked reg_can_beacon for IR-relaxation · 923b352f

由 Arik Nemtsov 提交于 7月 08, 2015

The RTNL is required to check for IR-relaxation conditions that allow
more channels to beacon. Export an RTNL locked version of reg_can_beacon
and use it where possible in AP/STA interface type flows, where
IR-relaxation may be applicable.

Fixes: 06f207fc ("cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA")
Signed-off-by: NArik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

923b352f

mac80211: add missing length check for confirm frames · b3e7de87

由 Bob Copeland 提交于 7月 14, 2015

Although mesh_rx_plink_frame() already checks that frames have enough
bytes for the action code plus another two bytes for capability/reason
code, it doesn't take into account that confirm frames also have an
additional two-byte aid.  As a result, a corrupt frame could cause a
subsequent subtraction to wrap around to ill effect.  Add another
check for this case.
Signed-off-by: NBob Copeland <me@bobcopeland.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

b3e7de87

mac80211: correct aid location in peering frames · 2ea752cd

由 Bob Copeland 提交于 7月 14, 2015

According to 802.11-2012 8.5.16.3.2 AID comes directly after the
capability bytes in mesh peering confirm frames.  The existing
code, however, was adding a 2 byte offset to this location,
resulting in garbage data going out over the air.  Remove the
offset to fix it.
Signed-off-by: NBob Copeland <me@bobcopeland.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2ea752cd

wireless: regulatory: reduce log level of CRDA related messages · 042ab5fc

由 Thomas Petazzoni 提交于 7月 09, 2015

With a basic Linux userspace, the messages "Calling CRDA to update
world regulatory domain" appears 10 times after boot every second or
so, followed by a final "Exceeded CRDA call max attempts. Not calling
CRDA". For those of us not having the corresponding userspace parts,
having those messages repeatedly displayed at boot time is a bit
annoying, so this commit reduces their log level to pr_debug().
Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

042ab5fc

mac80211: shut down interfaces before destroying interface list · d8d9008c

由 Johannes Berg 提交于 7月 08, 2015

If the hardware is unregistered while interfaces are up, mac80211 will
unregister all interfaces, which in turns causes mac80211 to be called
again to remove them all from the driver and eventually shut down the
hardware.

During this shutdown, however, it's currently already unsafe to iterate
the list of interfaces atomically, as the list is manipulated in an
unsafe manner. This puts an undue burden on the driver - it must stop
all its activities before calling ieee80211_unregister_hw(), while in
the normal stop path it can do all cleanup in the stop method. If, for
example, it's using the iteration during RX for some reason, it would
have to stop RX before unregistering to avoid crashes.

Fix this problem by closing all interfaces before unregistering them.
This will cause the driver stop to have completed before we manipulate
the interface list, and after the driver is stopped *and* has called
ieee80211_unregister_hw() it really musn't be iterating any more as
the memory will be freed as well.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

d8d9008c

mac80211: wowlan: enable powersave if suspend while ps-polling · 541b6ed7

由 Chaitanya T K 提交于 6月 10, 2015

If for any reason we're in the middle of PS-polling or awake after
TX due to dynamic powersave while going to suspend, go back to save
power. This might cause a response frame to get lost, but since we
can't really wait for it while going to suspend that's still better
than not enabling powersave which would cause higher power usage
during (and possibly even after) suspend.

Note that this really only affects the very few drivers that use
the powersave implementation in mac80211.
Signed-off-by: NChaitanya T K <chaitanya.mgit@gmail.com>
[rewrite misleading commit log]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

541b6ed7

mac80211: don't clear all tx flags when requeing · e9de0190

由 Michal Kazior 提交于 7月 02, 2015

When acting as AP and a PS-Poll frame is received
associated station is marked as one in a Service
Period. This state is kept until Tx status for
released frame is reported. While a station is in
Service Period PS-Poll frames are ignored.

However if PS-Poll was received during A-MPDU
teardown it was possible to have the to-be
released frame re-queued back to pending queue.
In such case the frame was stripped of 2 important
flags:

 (a) IEEE80211_TX_CTL_NO_PS_BUFFER
 (b) IEEE80211_TX_STATUS_EOSP

Stripping of (a) led to the frame that was to be
released to be queued back to ps_tx_buf queue. If
station remained to use only PS-Poll frames the
re-queued frame (and new ones) was never actually
transmitted because mac80211 would ignore
subsequent PS-Poll frames due to station being in
Service Period. There was nothing left to clear
the Service Period bit (no xmit -> no tx status ->
no SP end), i.e. the AP would have the station
stuck in Service Period. Beacon TIM would
repeatedly prompt station to poll for frames but
it would get none.

Once (a) is not stripped (b) becomes important
because it's the main condition to clear the
Service Period bit of the station when Tx status
for the released frame is reported back.

This problem was observed with ath9k acting as P2P
GO in some testing scenarios but isn't limited to
it. AP operation with mac80211 based Tx A-MPDU
control combined with clients using PS-Poll frames
is subject to this race.
Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

e9de0190

mac80211: clear subdir_stations when removing debugfs · 4479004e

由 Tom Hughes 提交于 6月 29, 2015

If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:

BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]

Cc: stable@kernel.org
Signed-off-by: NTom Hughes <tom@compton.nu>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

4479004e

16 7月, 2015 12 次提交

tc: act_bpf: fix memory leak · ddf06c1e

由 Alexei Starovoitov 提交于 7月 14, 2015

prog->bpf_ops is populated when act_bpf is used with classic BPF and
prog->bpf_name is optionally used with extended BPF.
Fix memory leak when act_bpf is released.

Fixes: d23b8ad8 ("tc: add BPF based action")
Fixes: a8cb5f55 ("act_bpf: add initial eBPF support for actions")
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddf06c1e

fq_codel: fix return value of fq_codel_drop() · c0afd9ce

由 WANG Cong 提交于 7月 14, 2015

The ->drop() is supposed to return the number of bytes it dropped,
however fq_codel_drop() returns the index of the flow where it drops
a packet from.

Fix this by introducing a helper to wrap fq_codel_drop().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0afd9ce

net_sched: fix a use-after-free in sfq · e8d092aa

由 WANG Cong 提交于 7月 14, 2015

Fixes: 25331d6c ("net: sched: implement qstat helper routines")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8d092aa

ipv6: lock socket in ip6_datagram_connect() · 03645a11

由 Eric Dumazet 提交于 7月 14, 2015

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03645a11

fq_codel: fix a use-after-free · 052cbda4

由 WANG Cong 提交于 7月 13, 2015

Fixes: 25331d6c ("net: sched: implement qstat helper routines")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

052cbda4

tcp: don't use F-RTO on non-recurring timeouts · f82b681a

由 Yuchung Cheng 提交于 7月 13, 2015

Currently F-RTO may repeatedly send new data packets on non-recurring
timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682)
should only be used on either new recovery or recurring timeouts.

This exacerbates the recovery progress during frequent timeout &
repair, because we prioritize sending new data packets instead of
repairing the holes when the bandwidth is already scarce.

Fix it by correcting the test of a new recovery episode.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f82b681a

bridge: mdb: fix double add notification · 5ebc7846

由 Nikolay Aleksandrov 提交于 7月 13, 2015

Since the mdb add/del code was introduced there have been 2 br_mdb_notify
calls when doing br_mdb_add() resulting in 2 notifications on each add.

Example:
 Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
 Before patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent

 After patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: cfd56754 ("bridge: add support of adding and deleting mdb entries")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ebc7846

bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave · bc8c20ac

由 Satish Ashok 提交于 7月 13, 2015

A report with INCLUDE/Change_to_include and empty source list should be
treated as a leave, specified by RFC 3376, section 3.1:
"If the requested filter mode is INCLUDE *and* the requested source
 list is empty, then the entry corresponding to the requested
 interface and multicast address is deleted if present.  If no such
 entry is present, the request is ignored."
Signed-off-by: NSatish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc8c20ac

net: Fix skb csum races when peeking · 89c22d8c

由 Herbert Xu 提交于 7月 13, 2015

When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89c22d8c

NET: AX.25: Stop heartbeat timer on disconnect. · da278622

由 Richard Stearn 提交于 7月 13, 2015

This may result in a kernel panic.  The bug has always existed but
somehow we've run out of luck now and it bites.
Signed-off-by: NRichard Stearn <richard@rns-stearn.demon.co.uk>
Cc: stable@vger.kernel.org	# all branches
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da278622

net: Clone skb before setting peeked flag · 738ac1eb

由 Herbert Xu 提交于 7月 13, 2015

Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.

The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first.  This causes funky races which leads
to double-free.

This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.

Fixes: a59322be ("[UDP]: Only increment counter on first peek/recv")
Reported-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

738ac1eb

rtnetlink: reject non-IFLA_VF_PORT attributes inside IFLA_VF_PORTS · 035d210f

由 Daniel Borkmann 提交于 7月 13, 2015

Similarly as in commit 4f7d2cdf ("rtnetlink: verify IFLA_VF_INFO
attributes before passing them to driver"), we have a double nesting
of netlink attributes, i.e. IFLA_VF_PORTS only contains IFLA_VF_PORT
that is nested itself. While IFLA_VF_PORTS is a verified attribute
from ifla_policy[], we only check if the IFLA_VF_PORTS container has
IFLA_VF_PORT attributes and then pass the attribute's content itself
via nla_parse_nested(). It would be more correct to reject inner types
other than IFLA_VF_PORT instead of continuing parsing and also similarly
as in commit 4f7d2cdf, to check for a minimum of NLA_HDRLEN.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

035d210f

15 7月, 2015 1 次提交

rds: rds_ib_device.refcount overflow · 4fabb594

由 Wengang Wang 提交于 7月 06, 2015

Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fabb594

13 7月, 2015 1 次提交

can: replace timestamp as unique skb attribute · d3b58c47

由 Oliver Hartkopp 提交于 6月 26, 2015

Commit 514ac99c "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.

Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245 "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.

This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.

This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.

Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

d3b58c47

12 7月, 2015 3 次提交

net: dsa: Fix off-by-one in switch address parsing · c8cf89f7

由 Florian Fainelli 提交于 7月 11, 2015

cd->sw_addr is used as a MDIO bus address, which cannot exceed
PHY_MAX_ADDR (32), our check was off-by-one.

Fixes: 5e95329b ("dsa: add device tree bindings to register DSA switches")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8cf89f7

net: dsa: Test array index before use · 8f5063e9

由 Florian Fainelli 提交于 7月 11, 2015

port_index is used an index into an array, and this information comes
from Device Tree, make sure that port_index is not equal to the array
size before using it. Move the check against port_index earlier in the
loop.

Fixes: 5e95329b: ("dsa: add device tree bindings to register DSA switches")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f5063e9

net: switchdev: don't abort unsupported operations · 2ee94014

由 Vivien Didelot 提交于 7月 10, 2015

There is no need to abort attribute setting or object addition, if the
prepare phase returned operation not supported.

Thus, abort these two transactions only if the error is not -EOPNOTSUPP.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ee94014

11 7月, 2015 3 次提交

net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets · 8220ea23

由 Phil Sutter 提交于 7月 10, 2015

Reconsidering my commit 20462155 "net: inet_diag: export IPV6_V6ONLY
sockopt", I am not happy with the limitations it causes for socket
analysing code in userspace. Exporting the value only if it is set makes
it hard for userspace to decide whether the option is not set or the
kernel does not support exporting the option at all.

>From an auditor's perspective, the interesting question for listening
AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is
the unexpected case. This patch allows to answer this question reliably.
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8220ea23

bridge: mdb: allow the user to delete mdb entry if there's a querier · 51ed7f3e

由 Satish Ashok 提交于 7月 09, 2015

Until now when a querier was present static entries couldn't be deleted.
Fix this and allow the user to manipulate the mdb with or without a
querier.
Signed-off-by: NSatish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51ed7f3e

net: call rcu_read_lock early in process_backlog · 2c17d27c

由 Julian Anastasov 提交于 7月 09, 2015

Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:

CPU 1                  CPU 2
                       skb->dev: no reference
                       process_backlog:__skb_dequeue
                       process_backlog:local_irq_enable

on_each_cpu for
flush_backlog =>       IPI(hardirq): flush_backlog
                       - packet not found in backlog

                       CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections

netdev_run_todo,
rcu_barrier: no
ongoing callbacks
                       __netif_receive_skb_core:rcu_read_lock
                       - too late
free dev
                       process packet for freed dev

Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c17d27c

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功