提交 · 71f2c3470fca51be27f6fc5975675f7e1c3b91e5 · openeuler / raspberrypi-kernel

05 8月, 2016 2 次提交

mac80211: End the MPSP even if EOSP frame was not acked · 71f2c347

由 Masashi Honma 提交于 8月 02, 2016

If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.
Signed-off-by: NMasashi Honma <masashi.honma@gmail.com>
Acked-by: NBob Copeland <me@bobcopeland.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

71f2c347

mac80211: fix purging multicast PS buffer queue · 6b07d9ca

由 Felix Fietkau 提交于 8月 02, 2016

The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Cc: stable@vger.kernel.org
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6b07d9ca

03 8月, 2016 1 次提交

mac80211: mesh: flush stations before beacons are stopped · c37a54ac

由 Maital Hahn 提交于 7月 13, 2016

Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.

Tested also on Ralink RT3572 chipset.
Signed-off-by: NMaital Hahn <maitalm@ti.com>
Signed-off-by: NYaniv Machani <yanivma@ti.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

c37a54ac

02 8月, 2016 2 次提交

mac80211: fix check for buffered powersave frames with txq · 4e3f21bc

由 Felix Fietkau 提交于 7月 11, 2016

The logic was inverted here, set the bit if frames are pending.

Fixes: ba8c3d6f ("mac80211: add an intermediate software queue implementation")
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

4e3f21bc

cfg80211: fix missing break in NL8211_CHAN_WIDTH_80P80 case · 680682d4

由 Colin Ian King 提交于 7月 17, 2016

The switch on chandef->width is missing a break on the
NL8211_CHAN_WIDTH_80P80 case; currently we get a WARN_ON when
center_freq2 is non-zero because of the missing break.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

680682d4

31 7月, 2016 6 次提交

sctp: allow receiving msg when TCP-style sk is in CLOSED state · e0878694

由 Xin Long 提交于 7月 30, 2016

Commit 141ddefc ("sctp: change sk state to CLOSED instead of
CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the
assoc is closed when sctp_accept clones a new sk.

If there is still data in sk receive queue, users will not be able
to read it any more, as sctp_recvmsg returns directly if sk state
is CLOSED.

This patch is to add CLOSED state check in sctp_recvmsg to allow
reading data from TCP-style sk with CLOSED state as what TCP does.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0878694

sctp: allow delivering notifications after receiving SHUTDOWN · a0fc6843

由 Xin Long 提交于 7月 30, 2016

Prior to this patch, once sctp received SHUTDOWN or shutdown with RD,
sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would
be dropped in sctp_ulpq_tail_event(). It would cause:

1. some notifications couldn't be received by users. like
   SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

2. sctp would also never trigger sk_data_ready when the association
   was closed, making it harder to identify the end of the association
   by calling recvmsg() and getting an EOF. It was not convenient for
   kernel users.

The check here should be stopping delivering DATA chunks after receiving
SHUTDOWN, and stopping delivering ANY chunks after sctp_close().

So this patch is to allow notifications to enqueue into receive queue
even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event,
but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all
events.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0fc6843

sctp: fix the issue sctp requeue auth chunk incorrectly · 1aa25ec2

由 Xin Long 提交于 7月 30, 2016

sctp needs to queue auth chunk back when we know that we are going
to generate another segment. But commit f1533cce ("sctp: fix
panic when sending auth chunks") requeues the last chunk processed
which is probably not the auth chunk.

It causes panic when calculating the MAC in sctp_auth_calculate_hmac(),
as the incorrect offset of the auth chunk in skb->data.

This fix is to requeue it by using packet->auth.

Fixes: f1533cce ("sctp: fix panic when sending auth chunks")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1aa25ec2

tcp: consider recv buf for the initial window scale · f626300a

由 Soheil Hassas Yeganeh 提交于 7月 29, 2016

tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().

To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.

Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Suggested-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f626300a

net: ipv6: use list_move instead of list_del/list_add · c882219a

由 Wei Yongjun 提交于 7月 28, 2016

Using list_move() instead of list_del() + list_add().
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c882219a

tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor() · 6b65bc29

由 Wei Yongjun 提交于 7月 28, 2016

In the error handling case of nla_nest_start() failed read_unlock_bh()
is called  to unlock a lock that had not been taken yet. sparse warns
about the context imbalance as the following:

net/tipc/monitor.c:799:23: warning:
 context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block

Fixes: cf6f7e1d ('tipc: dump monitor attributes')
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b65bc29

27 7月, 2016 9 次提交

af_unix: charge buffers to kmemcg · 3aa9799e

由 Vladimir Davydov 提交于 7月 26, 2016

Unix sockets can consume a significant amount of system memory, hence
they should be accounted to kmemcg.

Since unix socket buffers are always allocated from process context, all
we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
sock->sk_allocation mask.

Eric asked:

> 1) What happens when a buffer, allocated from socket <A> lands in a
> different socket <B>, maybe owned by another user/process.
>
> Who owns it now, in term of kmemcg accounting ?

We never move memcg charges.  E.g.  if two processes from different
cgroups are sharing a memory region, each page will be charged to the
process which touched it first.  Or if two processes are working with
the same directory tree, inodes and dentries will be charged to the
first user.  The same is fair for unix socket buffers - they will be
charged to the sender.

> 2) Has performance impact been evaluated ?

I ran netperf STREAM_STREAM with default options in a kmemcg on a 4 core
x2 HT box.  The results are below:

 # clients            bandwidth (10^6bits/sec)
                    base              patched
         1      67643 +-  725      64874 +-  353    - 4.0 %
         4     193585 +- 2516     186715 +- 1460    - 3.5 %
         8     194820 +-  377     187443 +- 1229    - 3.7 %

So the accounting doesn't come for free - it takes ~4% of performance.
I believe we could optimize it by using per cpu batching not only on
charge, but also on uncharge in memcg core, but that's beyond the scope
of this patch set - I'll take a look at this later.

Anyway, if performance impact is found to be unacceptable, it is always
possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
or not use memory cgroups at runtime at all (thanks to jump labels
there'll be no overhead even if they are compiled in).

Link: http://lkml.kernel.org/r/fcfe6cae27a59fbc5e40145664b3cf085a560c68.1464079538.git.vdavydov@virtuozzo.comSigned-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3aa9799e

l2tp: Correctly return -EBADF from pppol2tp_getname. · 4ac36a4a

由 phil.turnbull@oracle.com 提交于 7月 26, 2016

If 'tunnel' is NULL we should return -EBADF but the 'end_put_sess' path
unconditionally sets 'error' back to zero. Rework the error path so it
more closely matches pppol2tp_sendmsg.

Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ac36a4a

net: ipmr/ip6mr: update lastuse on entry change · 90b5ca17

由 Nikolay Aleksandrov 提交于 7月 26, 2016

Currently lastuse is updated on entry creation and cache hit, but it should
also be updated on entry change. Since both on add and update the ttl array
is updated we can simply update the lastuse in ipmr_update_thresholds.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90b5ca17

tipc: dump monitor attributes · cf6f7e1d

由 Parthasarathy Bhuvaragan 提交于 7月 26, 2016

In this commit, we dump the monitor attributes when queried.
The link monitor attributes are separated into two kinds:
1. general attributes per bearer
2. specific attributes per node/peer
This style resembles the socket attributes and the nametable
publications per socket.
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf6f7e1d

tipc: add a function to get the bearer name · ff0d3e78

由 Parthasarathy Bhuvaragan 提交于 7月 26, 2016

Introduce a new function to get the bearer name from
its id. This is used in subsequent commit.
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff0d3e78

tipc: get monitor threshold for the cluster · bf1035b2

由 Parthasarathy Bhuvaragan 提交于 7月 26, 2016

In this commit, we add support to fetch the configured
cluster monitoring threshold.
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf1035b2

tipc: make cluster size threshold for monitoring configurable · 7b3f5229

由 Parthasarathy Bhuvaragan 提交于 7月 26, 2016

In this commit, we introduce support to configure the minimum
threshold to activate the new link monitoring algorithm.
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b3f5229

tipc: introduce constants for tipc address validation · 9ff26e9f

由 Parthasarathy Bhuvaragan 提交于 7月 26, 2016

In this commit, we introduce defines for tipc address size,
offset and mask specification for Zone.Cluster.Node.
There is no functional change in this commit.
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ff26e9f

net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() · d1c2b501

由 He Chunhui 提交于 7月 26, 2016

NUD_STALE is used when the caller(e.g. arp_process()) can't guarantee
neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
the entry state should not be changed.

Currently the code puts an extra "NUD_CONNECTED" condition. So if old state
was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the
state can be changed to NUD_STALE.

This may cause problem. Because NUD_STALE lladdr doesn't guarantee
reachability, when we send traffic, the state will be changed to
NUD_DELAY. In normal case, if we get no confirmation (by dst_confirm()),
we will change the state to NUD_PROBE and send probe traffic. But now the
state may be reset to NUD_STALE again(e.g. by broadcast ARP packets),
so the probe traffic will not be sent. This situation may happen again and
again, and packets will be sent to an non-reachable lladdr forever.

The fix is to remove the "NUD_CONNECTED" condition. After that the
"NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
be redundant, so remove it.

This change may increase probe traffic, but it's essential since NUD_STALE
lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr,
when we can't get confirmation, even while remote packets try to set
NUD_STALE state.
Signed-off-by: NChunhui He <hchunhui@mail.ustc.edu.cn>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1c2b501

26 7月, 2016 16 次提交

net_sched: get rid of struct tcf_common · ec0595cc

由 WANG Cong 提交于 7月 25, 2016

After the previous patch, struct tc_action should be enough
to represent the generic tc action, tcf_common is not necessary
any more. This patch gets rid of it to make tc action code
more readable.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec0595cc

net_sched: move tc_action into tcf_common · a85a970a

由 WANG Cong 提交于 7月 25, 2016

struct tc_action is confusing, currently we use it for two purposes:
1) Pass in arguments and carry out results from helper functions
2) A generic representation for tc actions

The first one is error-prone, since we need to make sure we don't
miss anything. This patch aims to get rid of this use, by moving
tc_action into tcf_common, so that they are allocated together
in hashtable and can be cast'ed easily.

And together with the following patch, we could really make
tc_action a generic representation for all tc actions and each
type of action can inherit from it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a85a970a

udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skb · ba66bbe5

由 Daniel Borkmann 提交于 7月 25, 2016

After a6127697 ("udp: prevent bugcheck if filter truncates packet
too much"), there followed various other fixes for similar cases such
as f4979fce ("rose: limit sk_filter trim to payload").

Latter introduced a new helper sk_filter_trim_cap(), where we can pass
the trim limit directly to the socket filter handling. Make use of it
here as well with sizeof(struct udphdr) as lower cap limit and drop the
extra skb->len test in UDP's input path.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba66bbe5

net/sctp: terminate rhashtable walk correctly · 5fc382d8

由 Vegard Nossum 提交于 7月 23, 2016

I was seeing a lot of these:

    BUG: sleeping function called from invalid context at mm/slab.h:388
    in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
    Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150

     [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
     [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
     [<ffffffff811aac87>] console_unlock+0x2f7/0x930
     [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
     [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
     [<ffffffff812c171a>] printk+0x94/0xb0
     [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
     [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
     [<ffffffff81158490>] __might_sleep+0x90/0x1a0
     [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
     [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
     [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
     [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
     [<ffffffff8143a82b>] seq_read+0x27b/0x1180
     [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
     [<ffffffff813d471b>] __vfs_read+0xdb/0x610
     [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
     [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
     [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
     [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
     [<ffffffffffffffff>] 0xffffffffffffffff

Apparently we always need to call rhashtable_walk_stop(), even when
rhashtable_walk_start() fails:

 * rhashtable_walk_start - Start a hash table walk
 * @iter:       Hash table iterator
 *
 * Start a hash table walk.  Note that we take the RCU lock in all
 * cases including when we return an error.  So you must always call
 * rhashtable_walk_stop to clean up.

otherwise we never call rcu_read_unlock() and we get the splat above.

Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5fc382d8

net/irda: fix NULL pointer dereference on memory allocation failure · d3e6952c

由 Vegard Nossum 提交于 7月 23, 2016

I ran into this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
    RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
    RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
    RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
    RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
    R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
    FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
    Stack:
     0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
     ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
     ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
    Call Trace:
     [<ffffffff82bca542>] irda_connect+0x562/0x1190
     [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
     [<ffffffff825b4489>] SyS_connect+0x9/0x10
     [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
     [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
    RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
     RSP <ffff880111747bb8>
    ---[ end trace 4cda2588bc055b30 ]---

The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.

Cc: stable@vger.kernel.org
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3e6952c

sctp: also point GSO head_skb to the sk when it's available · 52253db9

由 Marcelo Ricardo Leitner 提交于 7月 23, 2016

The head skb for GSO packets won't travel through the inner depths of
SCTP stack as it doesn't contain any chunks on it. That means skb->sk
doesn't get set and then when sctp_recvmsg() calls
sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs
to check flags at the socket (sp->v4mapped).

The fix is to initialize skb->sk for th head skb once we are able to do
it. That is, when the first chunk is processed.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52253db9

sctp: fix BH handling on socket backlog · eefc1b1d

由 Marcelo Ricardo Leitner 提交于 7月 23, 2016

Now that the backlog processing is called with BH enabled, we have to
disable BH before taking the socket lock via bh_lock_sock() otherwise
it may dead lock:

sctp_backlog_rcv()
                bh_lock_sock(sk);

                if (sock_owned_by_user(sk)) {
                        if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                sctp_chunk_free(chunk);
                        else
                                backloged = 1;
                } else
                        sctp_inq_push(inqueue, chunk);

                bh_unlock_sock(sk);

while sctp_inq_push() was disabling/enabling BH, but enabling BH
triggers pending softirq, which then may try to re-lock the socket in
sctp_rcv().

[  219.187215]  <IRQ>
[  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
[  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
[  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
[  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
[  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
[  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
[  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
[  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
[  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
[  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
[  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
[  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
[  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
[  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
[  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
[  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
[  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
[  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
[  219.187247]  <EOI>
[  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
[  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
[  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
[  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
[  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
[  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
[  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
[  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
[  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
[  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
[  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
[  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
[  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
[  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20

Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eefc1b1d

kcm: remove redundant -ve error check and return path · 0a58f474

由 Colin Ian King 提交于 7月 22, 2016

The check for a -ve error is redundant, remove it and just
immediately return the return value from the call to
seq_open_net.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a58f474

net: ipv6: Always leave anycast and multicast groups on link down · ea06f717

由 Mike Manning 提交于 7月 22, 2016

Default kernel behavior is to delete IPv6 addresses on link
down, which entails deletion of the multicast and the
subnet-router anycast addresses. These deletions do not
happen with sysctl setting to keep global IPv6 addresses on
link down, so every link down/up causes an increment of the
anycast and multicast refcounts. These bogus refcounts may
stop these addrs from being removed on subsequent calls to
delete them. The solution is to leave the groups for the
multicast and subnet anycast on link down for the callflow
when global IPv6 addresses are kept.

Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: NMike Manning <mmanning@brocade.com>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea06f717

sctp: use inet_recvmsg to support sctp RFS well · fd2d180a

由 Xin Long 提交于 7月 22, 2016

Commit 486bdee0 ("sctp: add support for RPS and RFS")
saves skb->hash into sk->sk_rxhash so that the inet_* can
record it to flow table.

But sctp uses sock_common_recvmsg as .recvmsg instead
of inet_recvmsg, sock_common_recvmsg doesn't invoke
sock_rps_record_flow to record the flow. It may cause
that the receiver has no chances to record the flow if
it doesn't send msg or poll the socket.

So this patch fixes it by using inet_recvmsg as .recvmsg
in sctp.

Fixes: 486bdee0 ("sctp: add support for RPS and RFS")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd2d180a

bridge: Fix incorrect re-injection of LLDP packets · baedbe55

由 Ido Schimmel 提交于 7月 22, 2016

Commit 8626c56c ("bridge: fix potential use-after-free when hook
returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
bridge port to be re-injected to the Rx path with skb->dev set to the
bridge device, but this breaks the lldpad daemon.

The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
for any valid device on the system, which doesn't not include soft
devices such as bridge and VLAN.

Since packet sockets (ptype_base) are processed in the Rx path after the
Rx handler, LLDP packets with skb->dev set to the bridge device never
reach the lldpad daemon.

Fix this by making the bridge's Rx handler re-inject LLDP packets with
RX_HANDLER_PASS, which effectively restores the behaviour prior to the
mentioned commit.

This means netfilter will never receive LLDP packets coming through a
bridge port, as I don't see a way in which we can have okfn() consume
the packet without breaking existing behaviour. I've already carried out
a similar fix for STP packets in commit 56fae404 ("bridge: Fix
incorrect re-injection of STP packets").

Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baedbe55

sctp: support ipv6 nonlocal bind · 9b974202

由 Xin Long 提交于 7月 22, 2016

This patch makes sctp support ipv6 nonlocal bind by adding
sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
check in sctp_v6_available as what sctp did to support
ipv4 nonlocal bind (commit cdac4e07).
Reported-by: NShijoe George <spanjikk@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b974202

bpf, events: fix offset in skb copy handler · aa7145c1

由 Daniel Borkmann 提交于 7月 22, 2016

This patch fixes the __output_custom() routine we currently use with
bpf_skb_copy(). I missed that when len is larger than the size of the
current handle, we can issue multiple invocations of copy_func, and
__output_custom() advances destination but also source buffer by the
written amount of bytes. When we have __output_custom(), this is actually
wrong since in that case the source buffer points to a non-linear object,
in our case an skb, which the copy_func helper is supposed to walk.
Therefore, since this is non-linear we thus need to pass the offset into
the helper, so that copy_func can use it for extracting the data from
the source object.

Therefore, adjust the callback signatures properly and pass offset
into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
__DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
i) to pass in whether we should advance source buffer or not; this is
a compile-time constant condition, ii) to pass in the offset for
__output_custom(), which we do with help of __VA_ARGS__, so everything
can stay inlined as is currently. Both changes allow for adapting the
__output_* fast-path helpers w/o extra overhead.

Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa7145c1

net/ncsi: avoid maybe-uninitialized warning · a1b43edd

由 Arnd Bergmann 提交于 7月 21, 2016

gcc-4.9 and higher warn about the newly added NSCI code:

net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The warning is a false positive and therefore harmless, but it would be good to
avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave()
is what confuses gcc to the point that it cannot track whether the variable
was unused or not.

This rearranges the code in a way that makes it obvious to gcc that old_state
is always initialized at the time of use, functionally this should not
change anything.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1b43edd

net: bridge: br_set_ageing_time takes a clock_t · 9e0b27fe

由 Vivien Didelot 提交于 7月 21, 2016

Change the ageing_time type in br_set_ageing_time() from u32 to what it
is expected to be, i.e. a clock_t.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e0b27fe

net: bridge: fix br_stp_enable_bridge comment · dba479f3

由 Vivien Didelot 提交于 7月 21, 2016

br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly
pasted comment and use the same as br_stp_disable_bridge().
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dba479f3

25 7月, 2016 2 次提交

net/sched: Add match-all classifier hw offloading. · b87f7936

由 Yotam Gigi 提交于 7月 21, 2016

Following the work that have been done on offloading classifiers like u32
and flower, now the match-all classifier hw offloading is possible. if
the interface supports tc offloading.

To control the offloading, two tc flags have been introduced: skip_sw and
skip_hw. Typical usage:

tc filter add dev eth25 parent ffff: 	\
	matchall skip_sw		\
	action mirred egress mirror	\
	dev eth27
Signed-off-by: NYotam Gigi <yotamg@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b87f7936

net/sched: introduce Match-all classifier · bf3994d2

由 Jiri Pirko 提交于 7月 21, 2016

The matchall classifier matches every packet and allows the user to apply
actions on it. This filter is very useful in usecases where every packet
should be matched, for example, packet mirroring (SPAN) can be setup very
easily using that filter.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NYotam Gigi <yotamg@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf3994d2

23 7月, 2016 2 次提交

netfilter: nft_compat: fix crash when related match/target module is removed · 4b512e1c

由 Liping Zhang 提交于 7月 23, 2016

We "cache" the loaded match/target modules and reuse them, but when the
modules are removed, we still point to them. Then we may end up with
invalid memory references when using iptables-compat to add rules later.

Input the following commands will reproduce the kernel crash:
  # iptables-compat -A INPUT -j LOG
  # iptables-compat -D INPUT -j LOG
  # rmmod xt_LOG
  # iptables-compat -A INPUT -j LOG
  BUG: unable to handle kernel paging request at ffffffffa05a9010
  IP: [<ffffffff813f783e>] strcmp+0xe/0x30
  Call Trace:
  [<ffffffffa05acc43>] nft_target_select_ops+0x83/0x1f0 [nft_compat]
  [<ffffffffa058a177>] nf_tables_expr_parse+0x147/0x1f0 [nf_tables]
  [<ffffffffa058e541>] nf_tables_newrule+0x301/0x810 [nf_tables]
  [<ffffffff8141ca00>] ? nla_parse+0x20/0x100
  [<ffffffffa057fa8f>] nfnetlink_rcv+0x33f/0x53d [nfnetlink]
  [<ffffffffa057f94b>] ? nfnetlink_rcv+0x1fb/0x53d [nfnetlink]
  [<ffffffff817116b8>] netlink_unicast+0x178/0x220
  [<ffffffff81711a5b>] netlink_sendmsg+0x2fb/0x3a0
  [<ffffffff816b7fc8>] sock_sendmsg+0x38/0x50
  [<ffffffff816b8a7e>] ___sys_sendmsg+0x28e/0x2a0
  [<ffffffff816bcb7e>] ? release_sock+0x1e/0xb0
  [<ffffffff81804ac5>] ? _raw_spin_unlock_bh+0x35/0x40
  [<ffffffff816bcbe2>] ? release_sock+0x82/0xb0
  [<ffffffff816b93d4>] __sys_sendmsg+0x54/0x90
  [<ffffffff816b9422>] SyS_sendmsg+0x12/0x20
  [<ffffffff81805172>] entry_SYSCALL_64_fastpath+0x1a/0xa9

So when nobody use the related match/target module, there's no need to
"cache" it. And nft_[match|target]_release are useless anymore, remove
them.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4b512e1c

netfilter: nft_compat: put back match/target module if init fail · 2bf4fade

由 Liping Zhang 提交于 7月 23, 2016

If the user specify the invalid NFTA_MATCH_INFO/NFTA_TARGET_INFO attr
or memory alloc fail, we should call module_put to the related match
or target. Otherwise, we cannot remove the module even nobody use it.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2bf4fade