提交 · 95110ac88d5139c73eef6ede37eff23c4089f4f2 · openanolis / cloud-kernel

03 11月, 2017 5 次提交

xen/pvcalls: fix unsigned less than zero error check · 95110ac8

由 Colin Ian King 提交于 11月 03, 2017

The check on bedata->ref is never true because ref is an unsigned
integer. Fix this by assigning signed int ret to the return of the
call to gnttab_claim_grant_reference so the -ve return can be checked.

Detected by CoverityScan, CID#1460358 ("Unsigned compared against 0")

Fixes: 21968190 ("xen/pvcalls: connect to the backend")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

95110ac8

xen/time: Return -ENODEV from xen_get_wallclock() · b5494ad8

由 Boris Ostrovsky 提交于 11月 02, 2017

For any other error sync_cmos_clock() will reschedule itself
every second or so, for no good reason.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

b5494ad8

xen/pvcalls-front: mark expected switch fall-through · 3d8765d4

由 Gustavo A. R. Silva 提交于 11月 02, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

3d8765d4

xen: xenbus_probe_frontend: mark expected switch fall-throughs · 5fa916f7

由 Gustavo A. R. Silva 提交于 11月 02, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 146562
Addresses-Coverity-ID: 146563
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

5fa916f7

xen/time: do not decrease steal time after live migration on xen · 5e25f5db

由 Dongli Zhang 提交于 11月 01, 2017

After guest live migration on xen, steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease because steal returned by
xen_steal_lock() might be less than this_rq()->prev_steal_time which is
derived from previous return value of xen_steal_clock().

For instance, steal time of each vcpu is 335 before live migration.

cpu  198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0

After live migration, steal time is reduced to 312.

cpu  200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0

Since runstate times are cumulative and cleared during xen live migration
by xen hypervisor, the idea of this patch is to accumulate runstate times
to global percpu variables before live migration suspend. Once guest VM is
resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new
runstate times and previously accumulated times stored in global percpu
variables.

Comment above HYPERVISOR_suspend() has been removed as it is inaccurate:
the call can return an error code (e.g., possibly -EPERM in the future).

Similar and more severe issue would impact prior linux 4.8-4.10 as
discussed by Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
which would overflow steal time and lead to 100% st usage in top command
for linux 4.8-4.10. A backport of this patch would fix that issue.

[boris: added linux/slab.h to driver/xen/time.c, slightly reformatted
        commit message]

References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guestSigned-off-by: NDongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

5e25f5db

31 10月, 2017 14 次提交

xen: support 52 bit physical addresses in pv guests · 6f0e8bf1

由 Juergen Gross 提交于 10月 27, 2017

Physical addresses on processors supporting 5 level paging can be up to
52 bits wide. For a Xen pv guest running on such a machine those
physical addresses have to be supported in order to be able to use any
memory on the machine even if the guest itself does not support 5 level
paging.

So when reading/writing a MFN from/to a pte don't use the kernel's
PTE_PFN_MASK but a new XEN_PTE_MFN_MASK allowing full 40 bit wide MFNs.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

6f0e8bf1

xen: introduce a Kconfig option to enable the pvcalls frontend · 5eee149a

由 Stefano Stabellini 提交于 10月 30, 2017

Also add pvcalls-front to the Makefile.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

5eee149a

xen/pvcalls: implement release command · 235a71c5

由 Stefano Stabellini 提交于 10月 30, 2017

Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
in_mutex and out_mutex to avoid concurrent accesses. Then, free the
socket.

For passive sockets, check whether we have already pre-allocated an
active socket for the purpose of being accepted. If so, free that as
well.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

235a71c5

xen/pvcalls: implement poll command · 5842c835

由 Stefano Stabellini 提交于 10月 30, 2017

For active sockets, check the indexes and use the inflight_conn_req
waitqueue to wait.

For passive sockets if an accept is outstanding
(PVCALLS_FLAG_ACCEPT_INFLIGHT), check if it has been answered by looking
at bedata->rsp[req_id]. If so, return POLLIN.  Otherwise use the
inflight_accept_req waitqueue.

If no accepts are inflight, send PVCALLS_POLL to the backend. If we have
outstanding POLL requests awaiting for a response use the inflight_req
waitqueue: inflight_req is awaken when a new response is received; on
wakeup we check whether the POLL response is arrived by looking at the
PVCALLS_FLAG_POLL_RET flag. We set the flag from
pvcalls_front_event_handler, if the response was for a POLL command.

In pvcalls_front_event_handler, get the struct sock_mapping from the
poll id (we previously converted struct sock_mapping* to uintptr_t and
used it as id).
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

5842c835

xen/pvcalls: implement recvmsg · ae0d0405

由 Stefano Stabellini 提交于 10月 30, 2017

Implement recvmsg by copying data from the "in" ring. If not enough data
is available and the recvmsg call is blocking, then wait on the
inflight_conn_req waitqueue. Take the active socket in_mutex so that
only one function can access the ring at any given time.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

ae0d0405

xen/pvcalls: implement sendmsg · 45ddce21

由 Stefano Stabellini 提交于 10月 30, 2017

Send data to an active socket by copying data to the "out" ring. Take
the active socket out_mutex so that only one function can access the
ring at any given time.

If not enough room is available on the ring, rather than returning
immediately or sleep-waiting, spin for up to 5000 cycles. This small
optimization turns out to improve performance significantly.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

45ddce21

xen/pvcalls: implement accept command · 9774c6cc

由 Stefano Stabellini 提交于 10月 30, 2017

Introduce a waitqueue to allow only one outstanding accept command at
any given time and to implement polling on the passive socket. Introduce
a flags field to keep track of in-flight accept and poll commands.

Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
sure that only one accept command is executed at any given time by
setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
inflight_accept_req waitqueue.

Convert the new struct sock_mapping pointer into an uintptr_t and use it
as id for the new socket to pass to the backend.

Check if the accept call is non-blocking: in that case after sending the
ACCEPT command to the backend store the sock_mapping pointer of the new
struct and the inflight req_id then return -EAGAIN (which will respond
only when there is something to accept). Next time accept is called,
we'll check if the ACCEPT command has been answered, if so we'll pick up
where we left off, otherwise we return -EAGAIN again.

Note that, differently from the other commands, we can use
wait_event_interruptible (instead of wait_event) in the case of accept
as we are able to track the req_id of the ACCEPT response that we are
waiting.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

9774c6cc

xen/pvcalls: implement listen command · 1853f11d

由 Stefano Stabellini 提交于 10月 30, 2017

Send PVCALLS_LISTEN to the backend.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

1853f11d

xen/pvcalls: implement bind command · 67ea9893

由 Stefano Stabellini 提交于 10月 30, 2017

Send PVCALLS_BIND to the backend. Introduce a new structure, part of
struct sock_mapping, to store information specific to passive sockets.

Introduce a status field to keep track of the status of the passive
socket.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

67ea9893

xen/pvcalls: implement connect command · cb1c7d9b

由 Stefano Stabellini 提交于 10月 30, 2017

Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for
the active socket.

Introduce fields in struct sock_mapping to keep track of active sockets.
Introduce a waitqueue to allow the frontend to wait on data coming from
the backend on the active socket (recvmsg command).

Two mutexes (one of reads and one for writes) will be used to protect
the active socket in and out rings from concurrent accesses.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

cb1c7d9b

xen/pvcalls: implement socket command and handle events · 2195046b

由 Stefano Stabellini 提交于 10月 30, 2017

Send a PVCALLS_SOCKET command to the backend, use the masked
req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
ready for the response, and there cannot be two outstanding responses
with the same req_id.

Wait for the response by waiting on the inflight_req waitqueue and
check for the req_id field in rsp[req_id]. Use atomic accesses and
barriers to read the field. Note that the barriers are simple smp
barriers (as opposed to virt barriers) because they are for internal
frontend synchronization, not frontend<->backend communication.

Once a response is received, clear the corresponding rsp slot by setting
req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid
only from the frontend point of view. It is not part of the PVCalls
protocol.

pvcalls_front_event_handler is in charge of copying responses from the
ring to the appropriate rsp slot. It is done by copying the body of the
response first, then by copying req_id atomically. After the copies,
wake up anybody waiting on waitqueue.

socket_lock protects accesses to the ring.

Convert the pointer to sock_mapping into an uintptr_t and use it as
id for the new socket to pass to the backend. The struct will be fully
initialized later on connect or bind.

sock->sk->sk_send_head is not used for ip sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.
This way, we can easily get the struct sock_mapping from the struct
socket.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

2195046b

xen/pvcalls: connect to the backend · 21968190

由 Stefano Stabellini 提交于 10月 30, 2017

Implement the probe function for the pvcalls frontend. Read the
supported versions, max-page-order and function-calls nodes from
xenstore.

Only one frontend<->backend connection is supported at any given time
for a guest. Store the active frontend device to a static pointer.

Introduce a stub functions for the event handler.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

21968190

xen/pvcalls: implement frontend disconnect · aa7ba376

由 Stefano Stabellini 提交于 10月 30, 2017

Introduce a data structure named pvcalls_bedata. It contains pointers to
the command ring, the event channel, a list of active sockets and a list
of passive sockets. Lists accesses are protected by a spin_lock.

Introduce a waitqueue to allow waiting for a response on commands sent
to the backend.

Introduce an array of struct xen_pvcalls_response to store commands
responses.

Introduce a new struct sock_mapping to keep track of sockets.  In this
patch the struct sock_mapping is minimal, the fields will be added by
the next patches.

pvcalls_refcount is used to keep count of the outstanding pvcalls users.
Only remove connections once the refcount is zero.

Implement pvcalls frontend removal function. Go through the list of
active and passive sockets and free them all, one at a time.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

aa7ba376

xen/pvcalls: introduce the pvcalls xenbus frontend · 416efba0

由 Stefano Stabellini 提交于 10月 30, 2017

Introduce a xenbus frontend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs, the code will be added by the following
patches.
Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

416efba0

30 10月, 2017 1 次提交
- L
  
  Linux 4.14-rc7 · 0b07194b
  由 Linus Torvalds 提交于 10月 29, 2017
  
  0b07194b
29 10月, 2017 20 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 19e12196

由 Linus Torvalds 提交于 10月 29, 2017

Pull networking fixes from David Miller:

 1) Fix route leak in xfrm_bundle_create().

 2) In mac80211, validate user rate mask before configuring it. From
    Johannes Berg.

 3) Properly enforce memory limits in fair queueing code, from Toke
    Hoiland-Jorgensen.

 4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.

 5) Fix TSO header allocation and management in mvpp2 driver, from Yan
    Markman.

 6) Don't take socket lock in BH handler in strparser code, from Tom
    Herbert.

 7) Don't show sockets from other namespaces in AF_UNIX code, from
    Andrei Vagin.

 8) Fix double free in error path of tap_open(), from Girish Moodalbail.

 9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
    and Alexander Duyck.

10) Fix DCB mode programming in stmmac driver, from Jose Abreu.

11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
    Long.

12) Properly align SKB head before building SKB in tuntap, from Jason
    Wang.

13) Avoid matching qdiscs with a zero handle during lookups, from Cong
    Wang.

14) Fix various endianness bugs in sctp, from Xin Long.

15) Fix tc filter callback races and add selftests which trigger the
    problem, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  selftests: Introduce a new test case to tc testsuite
  selftests: Introduce a new script to generate tc batch file
  net_sched: fix call_rcu() race on act_sample module removal
  net_sched: add rtnl assertion to tcf_exts_destroy()
  net_sched: use tcf_queue_work() in tcindex filter
  net_sched: use tcf_queue_work() in rsvp filter
  net_sched: use tcf_queue_work() in route filter
  net_sched: use tcf_queue_work() in u32 filter
  net_sched: use tcf_queue_work() in matchall filter
  net_sched: use tcf_queue_work() in fw filter
  net_sched: use tcf_queue_work() in flower filter
  net_sched: use tcf_queue_work() in flow filter
  net_sched: use tcf_queue_work() in cgroup filter
  net_sched: use tcf_queue_work() in bpf filter
  net_sched: use tcf_queue_work() in basic filter
  net_sched: introduce a workqueue for RCU callbacks of tc filter
  sctp: fix some type cast warnings introduced since very beginning
  sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
  sctp: fix some type cast warnings introduced by transport rhashtable
  sctp: fix some type cast warnings introduced by stream reconf
  ...

19e12196

Merge branch 'net_sched-fix-races-with-RCU-callbacks' · 6c325f4e

由 David S. Miller 提交于 10月 29, 2017

Cong Wang says:

====================
net_sched: fix races with RCU callbacks

Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:

1. A simple one fixed by Daniel:

commit c78e1746
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

2. A very nasty one fixed by me:

commit 1697c4bb
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/

Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.

Alternatively, we could also:

a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).

b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.

As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.

Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c325f4e

selftests: Introduce a new test case to tc testsuite · 31c2611b

由 Chris Mi 提交于 10月 26, 2017

In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
  # sudo ./tdc.py -d enp4s0f0

This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.

In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NLucas Bates <lucasb@mojatatu.com>
Signed-off-by: NChris Mi <chrism@mellanox.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31c2611b

selftests: Introduce a new script to generate tc batch file · 7f071998

由 Chris Mi 提交于 10月 26, 2017

  # ./tdc_batch.py -h
  usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

  TC batch file generator

  positional arguments:
    device                device name
    file                  batch file name

  optional arguments:
    -h, --help            show this help message and exit
    -n NUMBER, --number NUMBER
                          how many lines in batch file
    -o, --skip_sw         skip_sw (offload), by default skip_hw
    -s, --share_action    all filters share the same action
    -p, --prio            all filters have different prio
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NLucas Bates <lucasb@mojatatu.com>
Signed-off-by: NChris Mi <chrism@mellanox.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f071998

net_sched: fix call_rcu() race on act_sample module removal · 46e235c1

由 Cong Wang 提交于 10月 26, 2017

Similar to commit c78e1746
("net: sched: fix call_rcu() race on classifier module unloads"),
we need to wait for flying RCU callback tcf_sample_cleanup_rcu().

Cc: Yotam Gigi <yotamg@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46e235c1

net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba

由 Cong Wang 提交于 10月 26, 2017

After previous patches, it is now safe to claim that
tcf_exts_destroy() is always called with RTNL lock.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d132eba

net_sched: use tcf_queue_work() in tcindex filter · 27ce4f05

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27ce4f05

net_sched: use tcf_queue_work() in rsvp filter · d4f84a41

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4f84a41

net_sched: use tcf_queue_work() in route filter · c2f3f31d

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2f3f31d

net_sched: use tcf_queue_work() in u32 filter · c0d378ef

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0d378ef

net_sched: use tcf_queue_work() in matchall filter · df2735ee

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df2735ee

net_sched: use tcf_queue_work() in fw filter · e071dff2

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e071dff2

net_sched: use tcf_queue_work() in flower filter · 0552c8af

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0552c8af

net_sched: use tcf_queue_work() in flow filter · 94cdb475

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94cdb475

net_sched: use tcf_queue_work() in cgroup filter · b1b5b04f

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1b5b04f

net_sched: use tcf_queue_work() in bpf filter · e910af67

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e910af67

net_sched: use tcf_queue_work() in basic filter · c96a4838

由 Cong Wang 提交于 10月 26, 2017

Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c96a4838

net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d

由 Cong Wang 提交于 10月 26, 2017

This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.

Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.

On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.

Please also see the code comments below.
Reported-by: NChris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7aa0045d

Merge branch 'sctp-endianness-fixes' · 8c83c885

由 David S. Miller 提交于 10月 29, 2017

Xin Long says:

====================
sctp: a bunch of fixes for some sparse warnings

As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
warnings or errors checked by sparse appear. They are all problems
about Endian and type cast.

Most of them are just warnings by which no issues could be caused
while some might be bugs.

This patchset fixes them with four patches basically according to
how they are introduced.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c83c885

sctp: fix some type cast warnings introduced since very beginning · 978aa047

由 Xin Long 提交于 10月 28, 2017

These warnings were found by running 'make C=2 M=net/sctp/'.
They are there since very beginning.

Note after this patch, there still one warning left in
sctp_outq_flush():
  sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)

Since it has been moved to sctp_stream_outq_migrate on net-next,
to avoid the extra job when merging net-next to net, I will post
the fix for it after the merging is done.
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

978aa047

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功