提交 · 633547973ffc32fd2c815639d4675e1531f0896f · openeuler / raspberrypi-kernel

01 7月, 2017 2 次提交

net: convert sk_buff.users from atomic_t to refcount_t · 63354797

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63354797

net: convert nf_bridge_info.use from atomic_t to refcount_t · 53869ceb

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53869ceb

21 6月, 2017 1 次提交

net: introduce __skb_put_[zero, data, u8] · de77b966

由 yuan linyu 提交于 6月 18, 2017

follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);
Signed-off-by: Nyuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de77b966

16 6月, 2017 6 次提交

networking: add and use skb_put_u8() · 634fef61

由 Johannes Berg 提交于 6月 16, 2017

Joe and Bjørn suggested that it'd be nicer to not have the
cast in the fairly common case of doing
	*(u8 *)skb_put(skb, 1) = c;

Add skb_put_u8() for this case, and use it across the code,
using the following spatch:

    @@
    expression SKB, C, S;
    typedef u8;
    identifier fn = {skb_put};
    fresh identifier fn2 = fn ## "_u8";
    @@
    - *(u8 *)fn(SKB, S) = C;
    + fn2(SKB, C);

Note that due to the "S", the spatch isn't perfect, it should
have checked that S is 1, but there's also places that use a
sizeof expression like sizeof(var) or sizeof(u8) etc. Turns
out that nobody ever did something like
	*(u8 *)skb_put(skb, 2) = c;

which would be wrong anyway since the second byte wouldn't be
initialized.
Suggested-by: NJoe Perches <joe@perches.com>
Suggested-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

634fef61

networking: make skb_push & __skb_push return void pointers · d58ff351

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d58ff351

networking: make skb_pull & friends return void pointers · af72868b

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af72868b

networking: make skb_put & friends return void pointers · 4df864c1

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4df864c1

networking: introduce and use skb_put_data() · 59ae1d12

由 Johannes Berg 提交于 6月 16, 2017

A common pattern with skb_put() is to just want to memcpy()
some data into the new space, introduce skb_put_data() for
this.

An spatch similar to the one for skb_put_zero() converts many
of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)
Reviewed-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59ae1d12

skbuff: make skb_put_zero() return void · 83ad357d

由 Johannes Berg 提交于 6月 14, 2017

It's nicer to return void, since then there's no need to
cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83ad357d

12 6月, 2017 2 次提交

udp: avoid a cache miss on dequeue · 0a463c78

由 Paolo Abeni 提交于 6月 12, 2017

Since UDP no more uses sk->destructor, we can clear completely
the skb head state before enqueuing. Amend and use
skb_release_head_state() for that.

All head states share a single cacheline, which is not
normally used/accesses on dequeue. We can avoid entirely accessing
such cacheline implementing and using in the UDP code a specialized
skb free helper which ignores the skb head state.

This saves a cacheline miss at skb deallocation time.

v1 -> v2:
  replaced secpath_reset() with skb_release_head_state()
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a463c78

net: factor out a helper to decrement the skb refcount · 3889a803

由 Paolo Abeni 提交于 6月 12, 2017

The same code is replicated in 3 different places; move it to a
common helper.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3889a803

05 6月, 2017 1 次提交

skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow · 48a1df65

由 Jason A. Donenfeld 提交于 6月 04, 2017

This is a defense-in-depth measure in response to bugs like
4d6fa57b ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.

While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48a1df65

30 5月, 2017 1 次提交

skbuff/mac80211: introduce and use skb_put_zero() · e45a79da

由 Johannes Berg 提交于 5月 24, 2017

This pattern was introduced a number of times in mac80211 just now,
and since it's present in a number of other places it makes sense
to add a little helper for it.

This just adds the helper and transforms the mac80211 code, a later
patch will transform other places.
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

e45a79da

22 5月, 2017 2 次提交

net: allow simultaneous SW and HW transmit timestamping · b50a5c70

由 Miroslav Lichvar 提交于 5月 19, 2017

Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.

Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.

While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b50a5c70

net: add function to retrieve original skb device using NAPI ID · 90b602f8

由 Miroslav Lichvar 提交于 5月 19, 2017

Since commit b6858177 ("net: Make skb->skb_iif always track
skb->dev") skbs don't have the original index of the interface which
received the packet. This information is now needed for a new control
message related to hardware timestamping.

Instead of adding a new field to skb, we can find the device by the NAPI
ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the
driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to
hide the CONFIG_NET_RX_BUSY_POLL ifdef.

CC: Richard Cochran <richardcochran@gmail.com>
Suggested-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90b602f8

20 5月, 2017 6 次提交

sk_buff.h: improve description of CHECKSUM_{COMPLETE, UNNECESSARY} · b4759dcd

由 Davide Caratti 提交于 5月 18, 2017

Add FCoE to the list of protocols that can set CHECKSUM_UNNECESSARY; add a
note to CHECKSUM_COMPLETE section to specify that it does not apply to SCTP
and FCoE protocols.
Suggested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Acked-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4759dcd

net: more accurate checksumming in validate_xmit_skb() · 43c26a1a

由 Davide Caratti 提交于 5月 18, 2017

skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to
determine if skb needs software computation of Internet Checksum or crc32c
(or nothing, if this computation can be done by the hardware). Use it in
place of skb_checksum_help() in validate_xmit_skb() to avoid corruption
of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL.

While at it, remove references to skb_csum_off_chk* functions, since they
are not present anymore in Linux  _ see commit cf53b1da ("Revert
 "net: Add driver helper functions to determine checksum offloadability"").
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43c26a1a

net: use skb->csum_not_inet to identify packets needing crc32c · dba00306

由 Davide Caratti 提交于 5月 18, 2017

skb->csum_not_inet carries the indication on which algorithm is needed to
compute checksum on skb in the transmit path, when skb->ip_summed is equal
to CHECKSUM_PARTIAL. If skb carries a SCTP packet and crc32c hasn't been
yet written in L4 header, skb->csum_not_inet is assigned to 1; otherwise,
assume Internet Checksum is needed and thus set skb->csum_not_inet to 0.
Suggested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Acked-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dba00306

sk_buff: remove support for csum_bad in sk_buff · 219f1d79

由 Davide Caratti 提交于 5月 18, 2017

This bit was introduced with commit 5a212329 ("net: Support for
csum_bad in skbuff") to reduce the stack workload when processing RX
packets carrying a wrong Internet Checksum. Up to now, only one driver and
GRO core are setting it.
Suggested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

219f1d79

net: introduce skb_crc32c_csum_help · b72b5bf6

由 Davide Caratti 提交于 5月 18, 2017

skb_crc32c_csum_help is like skb_checksum_help, but it is designed for
checksumming SCTP packets using crc32c (see RFC3309), provided that
libcrc32c.ko has been loaded before. In case libcrc32c is not loaded,
invoking skb_crc32c_csum_help on a skb results in one the following
printouts:

warn_crc32c_csum_update: attempt to compute crc32c without libcrc32c.ko
warn_crc32c_csum_combine: attempt to compute crc32c without libcrc32c.ko
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b72b5bf6

skbuff: add stub to help computing crc32c on SCTP packets · 9617813d

由 Davide Caratti 提交于 5月 18, 2017

sctp_compute_checksum requires crc32c symbol (provided by libcrc32c), so
it can't be used in net core. Like it has been done previously with other
symbols (e.g. ipv6_dst_lookup), introduce a stub struct skb_checksum_ops
to allow computation of crc32c checksum in net core after sctp.ko (and thus
libcrc32c) has been loaded.
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9617813d

18 5月, 2017 1 次提交

tcp: switch TCP TS option (RFC 7323) to 1ms clock · 9a568de4

由 Eric Dumazet 提交于 5月 16, 2017

TCP Timestamps option is defined in RFC 7323

Traditionally on linux, it has been tied to the internal
'jiffies' variable, because it had been a cheap and good enough
generator.

For TCP flows on the Internet, 1 ms resolution would be much better
than 4ms or 10ms (HZ=250 or HZ=100 respectively)

For TCP flows in the DC, Google has used usec resolution for more
than two years with great success [1]

Receive size autotuning (DRS) is indeed more precise and converges
faster to optimal window size.

This patch converts tp->tcp_mstamp to a plain u64 value storing
a 1 usec TCP clock.

This choice will allow us to upstream the 1 usec TS option as
discussed in IETF 97.

[1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdfSigned-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a568de4

17 5月, 2017 1 次提交

net/sock: factor out dequeue/peek with offset code · 65101aec

由 Paolo Abeni 提交于 5月 16, 2017

And update __sk_queue_drop_skb() to work on the specified queue.
This will help the udp protocol to use an additional private
rx queue in a later patch.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65101aec

22 4月, 2017 1 次提交
- A
  switch memcpy_from_msg() to copy_from_iter_full() · 3073f070
  由 Al Viro 提交于 2月 17, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  3073f070
14 4月, 2017 1 次提交

net: Add ESP offload features · c7ef8f0c

由 Steffen Klassert 提交于 4月 14, 2017

This patch adds netdev features to configure IPsec offloads.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

c7ef8f0c

09 4月, 2017 1 次提交

skbuff: Extend gso_type to unsigned int. · 7f564528

由 Steffen Klassert 提交于 4月 08, 2017

All available gso_type flags are currently in use, so
extend gso_type from 'unsigned short' to 'unsigned int'
to be able to add further flags.

We reorder the struct skb_shared_info to use
two bytes of the four byte hole before dataref.
All fields before dataref are cleared, i.e.
four bytes more than before the change.

The remaining two byte hole is moved to the
beginning of the structure, this protects us
from immediate overwites on out of bound writes
to the sk_buff head.

Structure layout on x86-64 before the change:

struct skb_shared_info {
	unsigned char              nr_frags;             /*     0     1 */
	__u8                       tx_flags;             /*     1     1 */
	short unsigned int         gso_size;             /*     2     2 */
	short unsigned int         gso_segs;             /*     4     2 */
	short unsigned int         gso_type;             /*     6     2 */
	struct sk_buff *           frag_list;            /*     8     8 */
	struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
	u32                        tskey;                /*    24     4 */
	__be32                     ip6_frag_id;          /*    28     4 */
	atomic_t                   dataref;              /*    32     4 */

	/* XXX 4 bytes hole, try to pack */

	void *                     destructor_arg;       /*    40     8 */
	skb_frag_t                 frags[17];            /*    48   272 */
	/* --- cacheline 5 boundary (320 bytes) --- */

	/* size: 320, cachelines: 5, members: 12 */
	/* sum members: 316, holes: 1, sum holes: 4 */
};

Structure layout on x86-64 after the change:

struct skb_shared_info {
	short unsigned int         _unused;              /*     0     2 */
	unsigned char              nr_frags;             /*     2     1 */
	__u8                       tx_flags;             /*     3     1 */
	short unsigned int         gso_size;             /*     4     2 */
	short unsigned int         gso_segs;             /*     6     2 */
	struct sk_buff *           frag_list;            /*     8     8 */
	struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
	unsigned int               gso_type;             /*    24     4 */
	u32                        tskey;                /*    28     4 */
	__be32                     ip6_frag_id;          /*    32     4 */
	atomic_t                   dataref;              /*    36     4 */
	void *                     destructor_arg;       /*    40     8 */
	skb_frag_t                 frags[17];            /*    48   272 */
	/* --- cacheline 5 boundary (320 bytes) --- */

	/* size: 320, cachelines: 5, members: 13 */
};
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f564528

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571

由 Ingo Molnar 提交于 2月 01, 2017

We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e6017571

11 2月, 2017 1 次提交

net/skbuff: Introduce skb_mac_offset() · ea6da4fd

由 Amir Vadai 提交于 2月 07, 2017

Introduce skb_mac_offset() that could be used to get mac header offset.
Signed-off-by: NAmir Vadai <amir@vadai.me>
Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea6da4fd

08 2月, 2017 1 次提交

net: add dst_pending_confirm flag to skbuff · 4ff06203

由 Julian Anastasov 提交于 2月 06, 2017

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ff06203

02 2月, 2017 2 次提交

netfilter: merge ctinfo into nfct pointer storage area · a9e419dc

由 Florian Westphal 提交于 1月 23, 2017

After this change conntrack operations (lookup, creation, matching from
ruleset) only access one instead of two sk_buff cache lines.

This works for normal conntracks because those are allocated from a slab
that guarantees hw cacheline or 8byte alignment (whatever is larger)
so the 3 bits needed for ctinfo won't overlap with nf_conn addresses.

Template allocation now does manual address alignment (see previous change)
on arches that don't have sufficent kmalloc min alignment.

Some spots intentionally use skb->_nfct instead of skb_nfct() helpers,
this is to avoid undoing the skb_nfct() use when we remove untracked
conntrack object in the future.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a9e419dc

skbuff: add and use skb_nfct helper · cb9c6836

由 Florian Westphal 提交于 1月 23, 2017

Followup patch renames skb->nfct and changes its type so add a helper to
avoid intrusive rename change later.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

cb9c6836

11 1月, 2017 1 次提交

mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free · 8c2dd3e4

由 Alexander Duyck 提交于 1月 10, 2017

Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent. First
we move the page_frag_ portion of the name to the front of the functions
names. Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this. I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs. Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix. This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c2dd3e4

09 1月, 2017 4 次提交

net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905

由 Willem de Bruijn 提交于 1月 07, 2017

The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb->tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb->tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted to reach ifb_xmit. These
must be dropped. Set tc_redirected on redirection and drop all packets
that do not have this bit set.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc31c905

net-tc: convert tc_at to tc_at_ingress · 8dc07fdb

由 Willem de Bruijn 提交于 1月 07, 2017

Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dc07fdb

net-tc: convert tc_verd to integer bitfields · a5135bcf

由 Willem de Bruijn 提交于 1月 07, 2017

Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5135bcf

net-tc: extract skip classify bit from tc_verd · e7246e12

由 Willem de Bruijn 提交于 1月 07, 2017

Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7246e12

26 12月, 2016 1 次提交

ktime: Cleanup ktime_set() usage · 8b0e1953

由 Thomas Gleixner 提交于 12月 25, 2016

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

8b0e1953

10 12月, 2016 1 次提交

udp: copy skb->truesize in the first cache line · c84d9490

由 Eric Dumazet 提交于 12月 08, 2016

In UDP RX handler, we currently clear skb->dev before skb
is added to receive queue, because device pointer is no longer
available once we exit from RCU section.

Since this first cache line is always hot, lets reuse this space
to store skb->truesize and thus avoid a cache line miss at
udp_recvmsg()/udp_skb_destructor time while receive queue
spinlock is held.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c84d9490

09 12月, 2016 1 次提交

udp: under rx pressure, try to condense skbs · c8c8b127

由 Eric Dumazet 提交于 12月 07, 2016

Under UDP flood, many softirq producers try to add packets to
UDP receive queue, and one user thread is burning one cpu trying
to dequeue packets as fast as possible.

Two parts of the per packet cost are :
- copying payload from kernel space to user space,
- freeing memory pieces associated with skb.

If socket is under pressure, softirq handler(s) can try to pull in
skb->head the payload of the packet if it fits.

Meaning the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.

Additional gains :
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF
- We avoid cache line misses at copyout() time and consume_skb() time,
and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to available tail room, which
is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
than necessary)

This patch gave me about 5 % increase in throughput in my tests.

skb_condense() helper could probably used in other contexts.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8c8b127

06 12月, 2016 1 次提交
- A
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success · 15e6cb46
  由 Al Viro 提交于 11月 01, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  15e6cb46