提交 · 8c98d571bb0e9717fd7be7242945e8e0abebbaa3 · openanolis / cloud-kernel

05 8月, 2017 16 次提交

net: sched: cls_route: no need to call tcf_exts_change for newly allocated struct · 8c98d571

由 Jiri Pirko 提交于 8月 04, 2017

As the f struct was allocated right before route4_set_parms call,
no need to use tcf_exts_change to do atomic change, and we can just
fill-up the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c98d571

net: sched: cls_flow: no need to call tcf_exts_change for newly allocated struct · c09fc2e1

由 Jiri Pirko 提交于 8月 04, 2017

As the fnew struct just was allocated, so no need to use tcf_exts_change
to do atomic change, and we can just fill-up the unused exts struct
directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c09fc2e1

net: sched: cls_cgroup: no need to call tcf_exts_change for newly allocated struct · 8cc62513

由 Jiri Pirko 提交于 8月 04, 2017

As the new struct just was allocated, so no need to use tcf_exts_change
to do atomic change, and we can just fill-up the unused exts struct
directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cc62513

net: sched: cls_bpf: no need to call tcf_exts_change for newly allocated struct · 6839da32

由 Jiri Pirko 提交于 8月 04, 2017

As the prog struct was allocated right before cls_bpf_set_parms call,
no need to use tcf_exts_change to do atomic change, and we can just
fill-up the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6839da32

net: sched: cls_basic: no need to call tcf_exts_change for newly allocated struct · ff1f8ca0

由 Jiri Pirko 提交于 8月 04, 2017

As the f struct was allocated right before basic_set_parms call, no need
to use tcf_exts_change to do atomic change, and we can just fill-up
the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff1f8ca0

net: sched: cls_matchall: no need to call tcf_exts_change for newly allocated struct · a74cb369

由 Jiri Pirko 提交于 8月 04, 2017

As the head struct was allocated right before mall_set_parms call,
no need to use tcf_exts_change to do atomic change, and we can just
fill-up the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a74cb369

net: sched: cls_fw: no need to call tcf_exts_change for newly allocated struct · 94611bff

由 Jiri Pirko 提交于 8月 04, 2017

As the f struct was allocated right before fw_set_parms call, no need
to use tcf_exts_change to do atomic change, and we can just fill-up
the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94611bff

net: sched: cls_flower: no need to call tcf_exts_change for newly allocated struct · 45507529

由 Jiri Pirko 提交于 8月 04, 2017

As the f struct was allocated right before fl_set_parms call, no need
to use tcf_exts_change to do atomic change, and we can just fill-up
the unused exts struct directly by tcf_exts_validate.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45507529

net: sched: cls_fw: rename fw_change_attrs function · 1e5003af

由 Jiri Pirko 提交于 8月 04, 2017

Since the function name is misleading since it is not changing
anything, name it similarly to other cls.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e5003af

net: sched: cls_bpf: rename cls_bpf_modify_existing function · 6a725c48

由 Jiri Pirko 提交于 8月 04, 2017

The name cls_bpf_modify_existing is highly misleading, as it indeed does
not modify anything existing. It does not modify at all.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a725c48

net: sched: use tcf_exts_has_actions instead of exts->nr_actions · 978dfd8d

由 Jiri Pirko 提交于 8月 04, 2017

For check in tcf_exts_dump use tcf_exts_has_actions helper instead
of exts->nr_actions for checking if there are any actions present.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

978dfd8d

net: sched: remove check for number of actions in tcf_exts_exec · ec1a9cca

由 Jiri Pirko 提交于 8月 04, 2017

Leave it to tcf_action_exec to return TC_ACT_OK in case there is no
action present.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec1a9cca

net: sched: remove redundant helpers tcf_exts_is_predicative and tcf_exts_is_available · 6fc6d06e

由 Jiri Pirko 提交于 8月 04, 2017

These two helpers are doing the same as tcf_exts_has_actions, so remove
them and use tcf_exts_has_actions instead.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fc6d06e

net: sched: change names of action number helpers to be aligned with the rest · 3bcc0cec

由 Jiri Pirko 提交于 8月 04, 2017

The rest of the helpers are named tcf_exts_*, so change the name of
the action number helpers to be aligned. While at it, change to inline
functions.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bcc0cec

net: sched: remove unneeded tcf_em_tree_change · 4ebc1e3c

由 Jiri Pirko 提交于 8月 04, 2017

Since tcf_em_tree_validate could be always called on a newly created
filter, there is no need for this change function.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ebc1e3c

net: sched: sch_atm: use Qdisc_class_common structure · f7ebdff7

由 Jiri Pirko 提交于 8月 04, 2017

Even if it is only for classid now, use this common struct a be aligned
with the rest of the classful qdiscs.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7ebdff7

04 8月, 2017 24 次提交

tcp: enable MSG_ZEROCOPY · f214f915

由 Willem de Bruijn 提交于 8月 03, 2017

Enable support for MSG_ZEROCOPY to the TCP stack. TSO and GSO are
both supported. Only data sent to remote destinations is sent without
copying. Packets looped onto a local destination have their payload
copied to avoid unbounded latency.

Tested:
  A 10x TCP_STREAM between two hosts showed a reduction in netserver
  process cycles by up to 70%, depending on packet size. Systemwide,
  savings are of course much less pronounced, at up to 20% best case.

  msg_zerocopy.sh 4 tcp:

  without zerocopy
    tx=121792 (7600 MB) txc=0 zc=n
    rx=60458 (7600 MB)

  with zerocopy
    tx=286257 (17863 MB) txc=286257 zc=y
    rx=140022 (17863 MB)

  This test opens a pair of sockets over veth, one one calls send with
  64KB and optionally MSG_ZEROCOPY and on the other reads the initial
  bytes. The receiver truncates, so this is strictly an upper bound on
  what is achievable. It is more representative of sending data out of
  a physical NIC (when payload is not touched, either).
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f214f915

sock: ulimit on MSG_ZEROCOPY pages · a91dbff5

由 Willem de Bruijn 提交于 8月 03, 2017

Bound the number of pages that a user may pin.

Follow the lead of perf tools to maintain a per-user bound on memory
locked pages commit 789f90fc ("perf_counter: per user mlock gift")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a91dbff5

sock: MSG_ZEROCOPY notification coalescing · 4ab6c99d

由 Willem de Bruijn 提交于 8月 03, 2017

In the simple case, each sendmsg() call generates data and eventually
a zerocopy ready notification N, where N indicates the Nth successful
invocation of sendmsg() with the MSG_ZEROCOPY flag on this socket.

TCP and corked sockets can cause send() calls to append new data to an
existing sk_buff and, thus, ubuf_info. In that case the notification
must hold a range. odify ubuf_info to store a inclusive range [N..N+m]
and add skb_zerocopy_realloc() to optionally extend an existing range.

Also coalesce notifications in this common case: if a notification
[1, 1] is about to be queued while [0, 0] is the queue tail, just modify
the head of the queue to read [0, 1].

Coalescing is limited to a few TSO frames worth of data to bound
notification latency.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ab6c99d

sock: enable MSG_ZEROCOPY · 1f8b977a

由 Willem de Bruijn 提交于 8月 03, 2017

Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with
skb_zerocopy_clone() wherever needed due to skb split, merge, resize
or clone.

Split skb_orphan_frags into two variants. The split, merge, .. paths
support reference counted zerocopy buffers, so do not do a deep copy.
Add skb_orphan_frags_rx for paths that may loop packets to receive
sockets. That is not allowed, as it may cause unbounded latency.
Deep copy all zerocopy copy buffers, ref-counted or not, in this path.

The exact locations to modify were chosen by exhaustively searching
through all code that might modify skb_frag references and/or the
the SKBTX_DEV_ZEROCOPY tx_flags bit.

The changes err on the safe side, in two ways.

(1) legacy ubuf_info paths virtio and tap are not modified. They keep
    a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags
    still call skb_copy_ubufs and thus copy frags in this case.

(2) not all copies deep in the stack are addressed yet. skb_shift,
    skb_split and skb_try_coalesce can be refined to avoid copying.
    These are not in the hot path and this patch is hairy enough as
    is, so that is left for future refinement.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f8b977a

sock: add SOCK_ZEROCOPY sockopt · 76851d12

由 Willem de Bruijn 提交于 8月 03, 2017

The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.

Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76851d12

sock: add MSG_ZEROCOPY · 52267790

由 Willem de Bruijn 提交于 8月 03, 2017

The kernel supports zerocopy sendmsg in virtio and tap. Expand the
infrastructure to support other socket types. Introduce a completion
notification channel over the socket error queue. Notifications are
returned with ee_origin SO_EE_ORIGIN_ZEROCOPY. ee_errno is 0 to avoid
blocking the send/recv path on receiving notifications.

Add reference counting, to support the skb split, merge, resize and
clone operations possible with SOCK_STREAM and other socket types.

The patch does not yet modify any datapaths.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52267790

sock: skb_copy_ubufs support for compound pages · 3ece7826

由 Willem de Bruijn 提交于 8月 03, 2017

Refine skb_copy_ubufs to support compound pages. With upcoming TCP
zerocopy sendmsg, such fragments may appear.

The existing code replaces each page one for one. Splitting each
compound page into an independent number of regular pages can result
in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.

Instead, fill all destination pages but the last to PAGE_SIZE.
Split the existing alloc + copy loop into separate stages:
1. compute bytelength and minimum number of pages to store this.
2. allocate
3. copy, filling each page except the last to PAGE_SIZE bytes
4. update skb frag array
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ece7826

sock: allocate skbs from optmem · 98ba0bd5

由 Willem de Bruijn 提交于 8月 03, 2017

Add sock_omalloc and sock_ofree to be able to allocate control skbs,
for instance for looping errors onto sk_error_queue.

The transmit budget (sk_wmem_alloc) is involved in transmit skb
shaping, most notably in TCP Small Queues. Using this budget for
control packets would impact transmission.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98ba0bd5

ipv6: fib: Add helpers to hold / drop a reference on rt6_info · a460aa83