提交 · d1ef551ac9e0d954c0ed7eea7d953ee1a45d813d · openeuler / Kernel

19 8月, 2017 14 次提交

由 stephen hemminger 提交于 8月 18, 2017

Make code closer to current style. Mostly whitespace changes.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6648c65e

net: mark receive queue attributes ro_after_init · 667e427b

由 stephen hemminger 提交于 8月 18, 2017

Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

667e427b

net: make queue attributes ro_after_init · 2b9c7581

由 stephen hemminger 提交于 8月 18, 2017

The XPS queue attributes can be ro_after_init.
Also use __ATTR_RX macros to simplify initialization.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b9c7581

net: make BQL sysfs attributes ro_after_init · 170c658a

由 stephen hemminger 提交于 8月 18, 2017

Also fix macro to not have ; at end.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

170c658a

net: drop unused attribute argument from sysfs queue funcs · 718ad681

由 stephen hemminger 提交于 8月 18, 2017

The show and store functions don't need/use the attribute.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

718ad681

net: make net sysfs attributes ro_after_init · ec6cc599

由 stephen hemminger 提交于 8月 18, 2017

The attributes of net devices are immutable.

Ideally, attribute groups would contain const attributes
but there are too many places that do modifications of list
during startup (in other code) to allow that.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec6cc599

net: constify net_ns_type_operations · 737aec57

由 stephen hemminger 提交于 8月 18, 2017

This can be const.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

737aec57

net: make net_class ro_after_init · e6d473e6

由 stephen hemminger 提交于 8月 18, 2017

The net_class in sysfs is only modified on init.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6d473e6

net: constify netdev_class_file · b793dc5c

由 stephen hemminger 提交于 8月 18, 2017

These functions are wrapper arount class_create_file which can take a
const attribute.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b793dc5c

net: don't decrement kobj reference count on init failure · d0d66837

由 stephen hemminger 提交于 8月 18, 2017

If kobject_init_and_add failed, then the failure path would
decrement the reference count of the queue kobject whose reference
count was already zero.

Fixes: 114cf580 ("bql: Byte queue limits")
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0d66837

xdp: adjust xdp redirect tracepoint to include return error code · 4c03bdd7

由 Jesper Dangaard Brouer 提交于 8月 17, 2017

The return error code need to be included in the tracepoint
xdp:xdp_redirect, else its not possible to distinguish successful or
failed XDP_REDIRECT transmits.

XDP have no queuing mechanism. Thus, it is fairly easily to overrun a
NIC transmit queue.  The eBPF program invoking helpers (bpf_redirect
or bpf_redirect_map) to redirect a packet doesn't get any feedback
whether the packet was actually transmitted.

Info on failed transmits in the tracepoint xdp:xdp_redirect, is
interesting as this opens for providing a feedback-loop to the
receiving XDP program.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c03bdd7

net: inet: diag: expose sockets cgroup classid · 0888e372

由 Levin, Alexander (Sasha Levin) 提交于 8月 17, 2017

This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.
Signed-off-by: NSasha Levin <alexander.levin@verizon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0888e372

ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t · 9620fef2

由 Eric Dumazet 提交于 8月 18, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9620fef2

ipv6: fix false-postive maybe-uninitialized warning · 401481e0

由 Arnd Bergmann 提交于 8月 18, 2017

Adding a lock around one of the assignments prevents gcc from
tracking the state of the local 'fibmatch' variable, so it can no
longer prove that 'dst' is always initialized, leading to a bogus
warning:

net/ipv6/route.c: In function 'inet6_rtm_getroute':
net/ipv6/route.c:3659:2: error: 'dst' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This moves the other assignment into the same lock to shut up the
warning.

Fixes: 121622db ("ipv6: route: make rtm_getroute not assume rtnl is locked")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

401481e0

18 8月, 2017 2 次提交

bpf: reuse tc bpf prologue for sk skb progs · 047b0ecd

由 Daniel Borkmann 提交于 8月 17, 2017

Given both program types are effecitvely doing the same in the
prologue, just reuse the one that we had for tc and only adapt
to the corresponding drop verdict value. That way, we don't need
to have the duplicate from 8a31db56 ("bpf: add access to sock
fields and pkt data from sk_skb programs") to maintain.
Reported-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

047b0ecd

bpf: no need to nullify ri->map in xdp_do_redirect · 4d6a75b6

由 Daniel Borkmann 提交于 8月 17, 2017

We are guaranteed to have a NULL ri->map in this branch since
we test for it earlier, so we don't need to reset it here.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d6a75b6

17 8月, 2017 11 次提交

D
tcp: Export tcp_{sendpage,sendmsg}_locked() for ipv6. · 774c4673
由 David S. Miller 提交于 8月 16, 2017
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
774c4673

net: sched: cls_flower: fix ndo_setup_tc type for stats call · d978db8d

由 Jiri Pirko 提交于 8月 16, 2017

I made a stupid mistake using TC_CLSFLOWER_STATS instead of
TC_SETUP_CLSFLOWER. Funny thing is that both are defined as "2" so it
actually did not cause any harm. Anyway, fixing it now.

Fixes: 2572ac53 ("net: sched: make type an argument for ndo_setup_tc")
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d978db8d

qdisc: add tracepoint qdisc:qdisc_dequeue for dequeued SKBs · e543002f

由 Jesper Dangaard Brouer 提交于 8月 15, 2017

The main purpose of this tracepoint is to monitor bulk dequeue
in the network qdisc layer, as it cannot be deducted from the
existing qdisc stats.

The txq_state can be used for determining the reason for zero packet
dequeues, see enum netdev_queue_state_t.

Notice all packets doesn't necessary activate this tracepoint. As
qdiscs with flag TCQ_F_CAN_BYPASS, can directly invoke
sch_direct_xmit() when qdisc_qlen is zero.

Remember that perf record supports filters like:

 perf record -e qdisc:qdisc_dequeue \
  --filter 'ifindex == 4 && (packets > 1 || txq_state > 0)'
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e543002f

J
bpf: add access to sock fields and pkt data from sk_skb programs · 8a31db56
由 John Fastabend 提交于 8月 15, 2017
```
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
8a31db56

bpf: sockmap with sk redirect support · 174a79ff

由 John Fastabend 提交于 8月 15, 2017

Recently we added a new map type called dev map used to forward XDP
packets between ports (6093ec2d). This patches introduces a
similar notion for sockets.

A sockmap allows users to add participating sockets to a map. When
sockets are added to the map enough context is stored with the
map entry to use the entry with a new helper

  bpf_sk_redirect_map(map, key, flags)

This helper (analogous to bpf_redirect_map in XDP) is given the map
and an entry in the map. When called from a sockmap program, discussed
below, the skb will be sent on the socket using skb_send_sock().

With the above we need a bpf program to call the helper from that will
then implement the send logic. The initial site implemented in this
series is the recv_sock hook. For this to work we implemented a map
attach command to add attributes to a map. In sockmap we add two
programs a parse program and a verdict program. The parse program
uses strparser to build messages and pass them to the verdict program.
The parse programs use the normal strparser semantics. The verdict
program is of type SK_SKB.

The verdict program returns a verdict SK_DROP, or  SK_REDIRECT for
now. Additional actions may be added later. When SK_REDIRECT is
returned, expected when bpf program uses bpf_sk_redirect_map(), the
sockmap logic will consult per cpu variables set by the helper routine
and pull the sock entry out of the sock map. This pattern follows the
existing redirect logic in cls and xdp programs.

This gives the flow,

 recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock
                                                     \
                                                      -> kfree_skb

As an example use case a message based load balancer may use specific
logic in the verdict program to select the sock to send on.

Sample programs are provided in future patches that hopefully illustrate
the user interfaces. Also selftests are in follow-on patches.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

174a79ff

bpf: introduce new program type for skbs on sockets · b005fd18

由 John Fastabend 提交于 8月 15, 2017

A class of programs, run from strparser and soon from a new map type
called sock map, are used with skb as the context but on established
sockets. By creating a specific program type for these we can use
bpf helpers that expect full sockets and get the verifier to ensure
these helpers are not used out of context.

The new type is BPF_PROG_TYPE_SK_SKB. This patch introduces the
infrastructure and type.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b005fd18

net: fixes for skb_send_sock · db5980d8

由 John Fastabend 提交于 8月 15, 2017

A couple fixes to new skb_send_sock infrastructure. However, no users
currently exist for this code (adding user in next handful of patches)
so it should not be possible to trigger a panic with existing in-kernel
code.

Fixes: 306b13eb ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db5980d8

net: add sendmsg_locked and sendpage_locked to af_inet6 · 45f91bdc

由 John Fastabend 提交于 8月 15, 2017

To complete the sendmsg_locked and sendpage_locked implementation add
the hooks for af_inet6 as well.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45f91bdc

net: early init support for strparser · f26de110

由 John Fastabend 提交于 8月 15, 2017

It is useful to allow strparser to init sockets before the read_sock
callback has been established.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f26de110

net_sched/hfsc: opencode trivial set_active() and set_passive() · 6b0355f4

由 Konstantin Khlebnikov 提交于 8月 15, 2017

Any move comment abount update_vf() into right place.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b0355f4

net_sched: call qlen_notify only if child qdisc is empty · 95946658

由 Konstantin Khlebnikov 提交于 8月 15, 2017

This callback is used for deactivating class in parent qdisc.
This is cheaper to test queue length right here.

Also this allows to catch draining screwed backlog and prevent
second deactivation of already inactive parent class which will
crash kernel for sure. Kernel with print warning at destruction
of child qdisc where no packets but backlog is not zero.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95946658

16 8月, 2017 10 次提交

F
ipv4: route: set ipv4 RTM_GETROUTE to not use rtnl · 394f51ab
由 Florian Westphal 提交于 8月 15, 2017
```
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
394f51ab
F
ipv6: route: set ipv6 RTM_GETROUTE to not use rtnl · e3a22b7f
由 Florian Westphal 提交于 8月 15, 2017
```
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
e3a22b7f

ipv6: route: make rtm_getroute not assume rtnl is locked · 121622db

由 Florian Westphal 提交于 8月 15, 2017

__dev_get_by_index assumes RTNL is held, use _rcu version instead.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

121622db

dsa: fix flow disector null pointer · 7324157b

由 Craig Gallek 提交于 8月 15, 2017

A recent change to fix up DSA device behavior made the assumption that
all skbs passing through the flow disector will be associated with a
device. This does not appear to be a safe assumption.  Syzkaller found
the crash below by attaching a BPF socket filter that tries to find the
payload offset of a packet passing between two unix sockets.

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 2940 Comm: syzkaller872007 Not tainted 4.13.0-rc4-next-20170811 #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d1b425c0 task.stack: ffff8801d0bc0000
RIP: 0010:__skb_flow_dissect+0xdcd/0x3ae0 net/core/flow_dissector.c:445
RSP: 0018:ffff8801d0bc7340 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000060 RSI: ffffffff856dc080 RDI: 0000000000000300
RBP: ffff8801d0bc7870 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: ffffed003a178f1e R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff856dc080 R15: ffff8801ce223140
FS:  00000000016ed880(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020008000 CR3: 00000001ce22d000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1176 [inline]
 skb_get_poff+0x9a/0x1a0 net/core/flow_dissector.c:1079
 ______skb_get_pay_offset net/core/filter.c:114 [inline]
 __skb_get_pay_offset+0x15/0x20 net/core/filter.c:112
Code: 80 3c 02 00 44 89 6d 10 0f 85 44 2b 00 00 4d 8b 67 20 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 00 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 13 2b 00 00 4d 8b a4 24 00 03 00 00 4d 85 e4
RIP: __skb_flow_dissect+0xdcd/0x3ae0 net/core/flow_dissector.c:445 RSP: ffff8801d0bc7340

Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NCraig Gallek <kraig@google.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7324157b

net_sched: remove warning from qdisc_hash_add · c90e9514

由 Konstantin Khlebnikov 提交于 8月 15, 2017

It was added in commit e57a784d ("pkt_sched: set root qdisc
before change() in attach_default_qdiscs()") to hide duplicates
from "tc qdisc show" for incative deivices.

After 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
it triggered when classful qdisc is added to inactive device because
default qdiscs are added before switching root qdisc.

Anyway after commit ea327469 ("net: sched: avoid duplicates in
qdisc dump") duplicates are filtered right in dumper.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c90e9514

net_sched/sfq: update hierarchical backlog when drop packet · 325d5dc3

由 Konstantin Khlebnikov 提交于 8月 15, 2017

When sfq_enqueue() drops head packet or packet from another queue it
have to update backlog at upper qdiscs too.

Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

325d5dc3

net_sched: reset pointers to tcf blocks in classful qdiscs' destructors · 89890422

由 Konstantin Khlebnikov 提交于 8月 15, 2017

Traffic filters could keep direct pointers to classes in classful qdisc,
thus qdisc destruction first removes all filters before freeing classes.
Class destruction methods also tries to free attached filters but now
this isn't safe because tcf_block_put() unlike to tcf_destroy_chain()
cannot be called second time.

This patch set class->block to NULL after first tcf_block_put() and
turn second call into no-op.

Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89890422

ipv4: fix NULL dereference in free_fib_info_rcu() · 187e5b3a

由 Eric Dumazet 提交于 8月 15, 2017

If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :

    m = fi->fib_metrics;
    if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
            kfree(m);

Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.

Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.

This was reported by syzkaller team.

Fixes: 3fb07daf ("ipv4: add reference counting to metrics")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

187e5b3a

ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80

由 Eric Dumazet 提交于 8月 15, 2017

Based on a syzkaller report [1], I found that a per cpu allocation
failure in snmp6_alloc_dev() would then lead to NULL dereference in
ip6_route_dev_notify().

It seems this is a very old bug, thus no Fixes tag in this submission.

Let's add in6_dev_put_clear() helper, as we will probably use
it elsewhere (once available/present in net-next)

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff88019f456680 task.stack: ffff8801c6e58000
RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
 in6_dev_put include/net/addrconf.h:335 [inline]
 ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
 call_netdevice_notifiers net/core/dev.c:1694 [inline]
 rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
 register_netdev+0x1a/0x30 net/core/dev.c:7669
 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
 ops_init+0x10a/0x570 net/core/net_namespace.c:118
 setup_net+0x313/0x710 net/core/net_namespace.c:294
 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
 SYSC_unshare kernel/fork.c:2347 [inline]
 SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512c9
RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
ffff8801c6e5f1b0
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
ffff8801c6e5f1b0
RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
ffff8801c6e5f1b0
---[ end trace e441d046c6410d31 ]---
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12d94a80

ipv6: fib: Provide offload indication using nexthop flags · fe400799

由 Ido Schimmel 提交于 8月 15, 2017

IPv6 routes currently lack nexthop flags as in IPv4. This has several
implications.

In the forwarding path, it requires us to check the carrier state of the
nexthop device and potentially ignore a linkdown route, instead of
checking for RTNH_F_LINKDOWN.

It also requires capable drivers to use the user facing IPv6-specific
route flags to provide offload indication, instead of using the nexthop
flags as in IPv4.

Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
provide offload indication instead of the RTF_OFFLOAD flag, which is
removed while it's still not part of any official kernel release.

In the near future we would like to use the field for the
RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
not be ready in time for the current cycle.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe400799

15 8月, 2017 3 次提交

tcp: fix possible deadlock in TCP stack vs BPF filter · d624d276

由 Eric Dumazet 提交于 8月 14, 2017

Filtering the ACK packet was not put at the right place.

At this place, we already allocated a child and put it
into accept queue.

We absolutely need to call tcp_child_process() to release
its spinlock, or we will deadlock at accept() or close() time.

Found by syzkaller team (Thanks a lot !)

Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d624d276

dccp: purge write queue in dccp_destroy_sock() · 7749d4ff

由 Eric Dumazet 提交于 8月 14, 2017

syzkaller reported that DCCP could have a non empty
write queue at dismantle time.

WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x417 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
 inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
 dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 sock_release+0x8d/0x1e0 net/socket.c:597
 sock_close+0x16/0x20 net/socket.c:1126
 __fput+0x327/0x7e0 fs/file_table.c:210
 ____fput+0x15/0x20 fs/file_table.c:246
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 exit_task_work include/linux/task_work.h:21 [inline]
 do_exit+0xa32/0x1b10 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:969
 get_signal+0x7e8/0x17e0 kernel/signal.c:2330
 do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
 exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
 syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7749d4ff

ipv6: release rt6->rt6i_idev properly during ifdown · e5645f51

由 Wei Wang 提交于 8月 14, 2017

When a dst is created by addrconf_dst_alloc() for a host route or an
anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
points to a real device.
When the real device goes down, the current cleanup code only checks for
dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
refcount leak on the real device in the above situation.
This patch makes sure to always release the refcount taken on
rt6->rt6i_idev during dst_dev_put().

Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of
dst_free()")
Reported-by: NJohn Stultz <john.stultz@linaro.org>
Tested-by: NJohn Stultz <john.stultz@linaro.org>
Tested-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5645f51

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功