- 26 Sep 2015, 13 commits
-
-
Submitted by Eric Dumazet
This is done to make sure we do not change the listener socket while sending SYNACK packets when the socket lock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
This documents the fact that the listener lock might not be held at the time SYNACKs are sent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
This is to document that the socket lock might not be held at this point. skb_set_owner_w() and ipv6_local_error() use proper atomic ops or spinlocks, so we promote the socket to non-const when calling them. Netfilter hooks should never assume the socket lock is held; we promote the socket to non-const there as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
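
A minimal sketch of the call-site pattern described above, with the const cast made explicit; the surrounding context is illustrative, not the literal diff:

    /* sk is const here because the listener may be unlocked; the cast is
     * acceptable because skb_set_owner_w() only touches sk via atomic ops.
     */
    skb_set_owner_w(skb, (struct sock *)sk);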
-
Submitted by Eric Dumazet
The listener socket is not locked when tcp_make_synack() is called, so we had better make sure no field is written. There is one exception: since SYNACK packets are attached to the listener at this moment (or to the SYN_RECV child in case of Fast Open), sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using atomic operations, so it is safe. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
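
For illustration, the one remaining write is an atomic charge of the skb against the (possibly unlocked) socket, roughly as follows (a sketch of the idea, not the exact upstream code):

    /* inside the sock_wmalloc()/skb_set_owner_w() path: safe without the
     * socket lock, because sk_wmem_alloc is only ever updated atomically.
     */
    atomic_add(skb->truesize, &sk->sk_wmem_alloc);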
-
Submitted by Eric Dumazet
This function is used to build and send SYNACK packets, possibly on behalf of an unlocked listener socket. Make sure we did not miss a write by making this socket const. We can no longer use ip_select_ident() and have to either set iph->id to 0 or call __ip_select_ident() directly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
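
A hedged sketch of the resulting pattern; ip_select_ident() may write inet_sk(sk)->inet_id, which a const socket forbids. Variable names (iph, rt, net) follow the usual IPv4 output path and are illustrative:

    if (ip_dont_fragment(sk, &rt->dst)) {
            iph->frag_off = htons(IP_DF);
            iph->id = 0;                      /* DF set: id may stay zero */
    } else {
            iph->frag_off = 0;
            __ip_select_ident(net, iph, 1);   /* net = sock_net(sk); no write to sk */
    }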
-
Submitted by Eric Dumazet
When the new TCP listener handling is done, these functions will be called without the socket lock being held. Make sure they don't change anything. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
ip_dont_fragment() can accept a const socket and dst. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
The socket is not modified; make it const so that callers can do the same if they need to. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch the socket, so let's add a const qualifier. This will permit the same change in inet6_csk_route_req(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
This is used by the TCP listener core, and the listener socket shall not be modified by inet_csk_route_req(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
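
The assumed shape of the prototype after this change; only the const qualifiers are the point here:

    struct dst_entry *inet_csk_route_req(const struct sock *sk,
                                         struct flowi4 *fl4,
                                         const struct request_sock *req);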
-
Submitted by Eric Dumazet
Very soon, the TCP stack might call inet_csk_route_req(), which in turn calls ip_route_output_flow(), with an unlocked listener socket, so we need to make sure ip_route_output_flow() does not try to change any field of its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
Soon, the listener socket won't be locked when tcp_openreq_init_rwin() is called. We need to read socket fields once, as their values could change under us. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
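
A hedged illustration of the "read once" idiom the changelog refers to; the field chosen is an example, not necessarily the one touched by the patch:

    /* Snapshot the value so a concurrent writer cannot change it between
     * uses while the listener is not locked.
     */
    u32 window_clamp = READ_ONCE(tp->window_clamp);  /* tp = tcp_sk(sk_listener) */
    /* ... use window_clamp below instead of re-reading tp->window_clamp ... */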
-
Submitted by Eric Dumazet
Soon, the listener socket spinlock will no longer be held; add const arguments to tcp_v[46]_init_req() to make clear these functions cannot mess with socket fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 Sep 2015, 9 commits
-
-
Submitted by Jiri Pirko
Now that we have only two values for the transaction phase, just use a bool. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko
No longer used by drivers, as the transaction queue with item destructors takes care of the abort phase internally in the switchdev code. So kill it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko
Shouldn't have been there in the first place. Now it is unused; kill it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko
Add helpers which should be used in attr_set/obj_add switchdev ops to check the phase of the transaction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
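
A hypothetical driver sketch using such phase helpers; the callback shape and the helpers foo_check_resources()/foo_program_hw() are assumptions for illustration only:

    static int foo_port_attr_set(struct net_device *dev,
                                 struct switchdev_attr *attr,
                                 struct switchdev_trans *trans)
    {
            if (switchdev_trans_ph_prepare(trans))
                    /* prepare phase: validate and preallocate, may fail */
                    return foo_check_resources(dev, attr);

            /* commit phase: apply to hardware, must not fail */
            return foo_program_hw(dev, attr);
    }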
-
Submitted by Jiri Pirko
Before it disappears completely, move the transaction phase enum under the transaction structure and make the attr/obj structures a bit cleaner. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko
Currently, the memory allocation across the prepare/commit phases is done separately in each driver (rocker). Introduce a similar mechanism in the generic switchdev code, in the form of a queue. It can be used not only for memory allocations but also for other items. Item destruction on abort is handled as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko
This is temporary; the name "trans" will be used for something else, and "trans_ph" will eventually disappear. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Benc
Since commit 12fd84f4 ("ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na and ndisc_send_ns is unused. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Benc
The genl_notify function has too many arguments for no real reason - all callers use genl_info to get them anyway. Just pass the genl_info down to genl_notify. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
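
The assumed shape of the simplified interface; the exact argument list is hedged and the family/skb names in the example call are illustrative:

    void genl_notify(struct genl_family *family, struct sk_buff *skb,
                     struct genl_info *info, u32 group, gfp_t flags);

    /* a caller that already has the genl_info from its doit() handler: */
    genl_notify(&foo_genl_family, reply_skb, info, 0, GFP_KERNEL);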
-
- 24 Sep 2015, 1 commit
-
-
Submitted by Max Filippov
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Sep 2015, 2 commits
-
-
Submitted by Yuchung Cheng
Currently SYN/ACK RTT is measured in jiffies. On a LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which affects RTT estimation and the min RTT sampling used by some congestion controls. This patch improves SYN/ACK RTT to usec resolution if the platform supports it.

While the timestamping of the SYN/ACK is done in the request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For a regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c). For Fast Open the child socket is already created when the SYN/ACK was sent; the RTT is sampled in tcp_rcv_state_process() after processing the final ACK and right before the request socket is released.

If the SYN/ACK was retransmitted or a SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store the TS echo value in the request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt().

One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT.

Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
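
A hedged sketch of a usec-resolution sample; it assumes the request sock carries a struct skb_mstamp taken when the SYNACK was transmitted (the field name is illustrative):

    struct skb_mstamp now;
    u32 rtt_us;

    skb_mstamp_get(&now);
    rtt_us = skb_mstamp_us_delta(&now, &tcp_rsk(req)->snt_synack);
    /* rtt_us then seeds the passive side's RTT estimator before congestion
     * control of the child socket is initialized, as described above. */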
-
Submitted by Ursula Braun
The iucv code uses arrays as arguments. Even though this does not really cause a problem, it can be misleading, since the compiler turns an array argument into just a pointer argument. To be more precise, this patch changes the array arguments into pointers. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
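
The underlying C rule, shown with a hypothetical handler (names are illustrative, not the iucv API):

    /* A parameter declared as an array is adjusted to a pointer, so these
     * two declarations are identical; sizeof(userdata) inside the function
     * yields the size of a pointer, not 16.
     */
    void iucv_handler(u8 userdata[16]);
    void iucv_handler(u8 *userdata);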
-
- 19 Sep 2015, 8 commits
-
-
Submitted by Eric W. Biederman
Instead of calling dev_net() on a likely-looking network device, pass state->net into nf_xfrm_me_harder(). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Eric W. Biederman
Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested in now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Eric W. Biederman
As gre does not have the srckey in the packet, gre_pkt_to_tuple needs to perform a lookup in its per-network-namespace tables. Pass the proper network namespace in to all pkt_to_tuple implementations to ensure gre (and any similar protocols) can get this right. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
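
The assumed shape of the per-protocol callback after this change (argument order hedged):

    bool (*pkt_to_tuple)(const struct sk_buff *skb, unsigned int dataoff,
                         struct net *net, struct nf_conntrack_tuple *tuple);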
-
Submitted by Eric W. Biederman
Stop guessing the struct net and instead remember it. Guessing is just silly and will be problematic in the future when I implement routes between network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Eric W. Biederman
This allows them to stop guessing the network namespace with pick_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Eric W. Biederman
nft_pktinfo is passed on the stack, so this does not bloat any in-core data structures. Computing this information centrally makes the code simpler to maintain and easier to understand. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Eric W. Biederman
As xt_action_param lives on the stack, this does not bloat any persistent data structures. This is a first step toward simplifying netfilter code that needs to know which network namespace it is executing in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
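
A hedged sketch of what a match can now do; foo_mt() and do_lookup() are hypothetical names:

    static bool foo_mt(const struct sk_buff *skb, struct xt_action_param *par)
    {
            /* taken straight from the parameter block, no more guessing
             * the namespace from the in/out devices */
            struct net *net = par->net;

            return do_lookup(net, skb);
    }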
-
Submitted by Eric W. Biederman
- Add nft_pktinfo.pf to replace ops->pf
- Add nft_pktinfo.hook to replace ops->hooknum

This simplifies the code, makes it more readable, and likely reduces cache line misses. Maintainability is enhanced, as the details of nft_hook_ops are of no concern to the recipients of nft_pktinfo.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
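
The assumed effect on expression code, using the field names from the changelog above (surrounding logic illustrative):

    /* was: ops->pf / ops->hooknum */
    if (pkt->pf == NFPROTO_IPV4 && pkt->hook == NF_INET_LOCAL_OUT)
            handle_local_out(pkt);          /* hypothetical helper */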
-
- 18 Sep 2015, 7 commits
-
-
Submitted by Szymon Janc
This patch adds a ratelimited version of the BT_ERR macro. Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
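
A minimal sketch of such a macro, assuming it mirrors BT_ERR but goes through printk_ratelimited(); not necessarily the exact upstream definition:

    #define BT_ERR_RATELIMITED(fmt, ...) \
            printk_ratelimited(KERN_ERR pr_fmt(fmt) "\n", ##__VA_ARGS__)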
-
Submitted by Alexei Starovoitov
The existing bpf_clone_redirect() helper clones the skb before redirecting it to RX or TX of the destination netdev. Introduce a bpf_redirect() helper that does the same without cloning.

Benchmarked with two hosts using 10G ixgbe NICs. One host is doing line-rate pktgen. The other host is configured as:

  $ tc qdisc add dev $dev ingress
  $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop

so it receives the packet on $dev and immediately xmits it on $dev + 1. The section 'clone_redirect_xmit' in the tcbpf1_kern.o file has the program that does bpf_clone_redirect(); performance is 2.0 Mpps.

  $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section redirect_xmit drop

which is using bpf_redirect() - 2.4 Mpps. Using cls_bpf with integrated actions as:

  $ tc filter add dev $dev root pref 10 \
    bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1

performance is 2.5 Mpps.

To summarize:
  u32+act_bpf using clone_redirect - 2.0 Mpps
  u32+act_bpf using redirect       - 2.4 Mpps
  cls_bpf using redirect           - 2.5 Mpps

For comparison, the Linux bridge in this setup does 2.1 Mpps, and ixgbe rx + drop in ip_rcv does 7.8 Mpps.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
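
A hedged sketch of a tc/BPF program using the new helper; the section name and the "+1 ifindex" convention just mirror the benchmark description above, and the program is illustrative rather than the benchmarked one:

    SEC("redirect_xmit")
    int bpf_prog_redirect(struct __sk_buff *skb)
    {
            /* redirect without cloning: egress of the "next" device */
            return bpf_redirect(skb->ifindex + 1, 0);
    }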
-
Submitted by Daniel Borkmann
Often the cls_bpf classifier is used with a single drop action attached. Optimize this use case and let cls_bpf return both classid and action. For backwards-compatibility reasons, enable this feature under the TCA_BPF_FLAG_ACT_DIRECT flag. Then more interesting programs like the following are easier to write:

  int cls_bpf_prog(struct __sk_buff *skb)
  {
          /* classify arp, ip, ipv6 into different traffic classes
           * and drop all other packets
           */
          switch (skb->protocol) {
          case htons(ETH_P_ARP):
                  skb->tc_classid = 1;
                  break;
          case htons(ETH_P_IP):
                  skb->tc_classid = 2;
                  break;
          case htons(ETH_P_IPV6):
                  skb->tc_classid = 3;
                  break;
          default:
                  return TC_ACT_SHOT;
          }

          return TC_ACT_OK;
  }

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
In commit b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit"), Tom provided an l4 hash to most outgoing TCP packets. We'd like to provide one as well for SYNACK packets, so that all packets of a given flow share the same txhash, to later enable the bonding driver to also use skb->hash to perform slave selection. Note that a SYNACK retransmit shuffles the tx hash, as Tom did in commit 265f94ff ("net: Recompute sk_txhash on negative routing advice") for established sockets. This has the nice effect of making TCP flows resilient to some kinds of black holes, even at the connection-establishment phase. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric W. Biederman
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns, the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done its job, passing in the desired network namespace is in many cases a code simplification. To support this change, the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
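
The wrapper described in the last sentence, roughly as one would expect it to look; a sketch consistent with the description, not a quote of the diff:

    static inline int dst_output_okfn(struct net *net, struct sock *sk,
                                      struct sk_buff *skb)
    {
            /* net is accepted only to match the okfn signature for now */
            return dst_output(sk, skb);
    }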
-
Submitted by Eric W. Biederman
Add a sock parameter to dst_output, making dst_output_sk superfluous. Pass skb->sk to dst_output in all of its callers, and have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-