提交 · de77b966ce8adcb4c58d50e2f087320d5479812a · openeuler / raspberrypi-kernel

21 6月, 2017 4 次提交

net: introduce __skb_put_[zero, data, u8] · de77b966

由 yuan linyu 提交于 6月 18, 2017

follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);
Signed-off-by: Nyuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de77b966

qed*: Set rdma generic functions prefix · bbfcd1e8

由 Michal Kalderon 提交于 6月 20, 2017

Rename the functions common to both iWARP and RoCE to have a prefix of
_rdma_ instead of _roce_.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bbfcd1e8

qed*: qede_roce.[ch] -> qede_rdma.[ch] · b262a06e

由 Michal Kalderon 提交于 6月 20, 2017

Once we have iWARP support, the qede portion of the qedr<->qede would
serve all the RDMA protocols - so rename the file to be appropriate
to its function.

While we're at it, we're also moving a couple of inclusions to it into
.h files and adding includes to make sure it contains all type
definitions it requires.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b262a06e

qed: Chain support for external PBL · 1a4a6975

由 Mintz, Yuval 提交于 6月 20, 2017

iWARP would require the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a4a6975

16 6月, 2017 12 次提交

net: Add IFLA_XDP_PROG_ID · 58038695

由 Martin KaFai Lau 提交于 6月 15, 2017

Expose prog_id through IFLA_XDP_PROG_ID.  This patch
makes modification to generic_xdp.  The later patches will
modify other xdp-supported drivers.

prog_id is added to struct net_dev_xdp.

iproute2 patch will be followed. Here is how the 'ip link'
will look like:
> ip link show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58038695

networking: add and use skb_put_u8() · 634fef61

由 Johannes Berg 提交于 6月 16, 2017

Joe and Bjørn suggested that it'd be nicer to not have the
cast in the fairly common case of doing
	*(u8 *)skb_put(skb, 1) = c;

Add skb_put_u8() for this case, and use it across the code,
using the following spatch:

    @@
    expression SKB, C, S;
    typedef u8;
    identifier fn = {skb_put};
    fresh identifier fn2 = fn ## "_u8";
    @@
    - *(u8 *)fn(SKB, S) = C;
    + fn2(SKB, C);

Note that due to the "S", the spatch isn't perfect, it should
have checked that S is 1, but there's also places that use a
sizeof expression like sizeof(var) or sizeof(u8) etc. Turns
out that nobody ever did something like
	*(u8 *)skb_put(skb, 2) = c;

which would be wrong anyway since the second byte wouldn't be
initialized.
Suggested-by: NJoe Perches <joe@perches.com>
Suggested-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

634fef61

networking: make skb_push & __skb_push return void pointers · d58ff351

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d58ff351

networking: make skb_pull & friends return void pointers · af72868b

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af72868b

networking: make skb_put & friends return void pointers · 4df864c1

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4df864c1

networking: introduce and use skb_put_data() · 59ae1d12

由 Johannes Berg 提交于 6月 16, 2017

A common pattern with skb_put() is to just want to memcpy()
some data into the new space, introduce skb_put_data() for
this.

An spatch similar to the one for skb_put_zero() converts many
of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)
Reviewed-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59ae1d12

net/mlx5: Add fast unload support in shutdown flow · 8812c24d

由 Majd Dibbiny 提交于 2月 09, 2017

Adding a support to flush all HW resources with one FW command and
skip all the heavy unload flows of the driver on kernel shutdown.
There's no need to free all the SW context since a new fresh kernel
will be loaded afterwards.

Regarding the FW resources, they should be closed, otherwise we will
have leakage in the FW. To accelerate this flow, we execute one command
in the beginning that tells the FW that the driver isn't going to close
any of the FW resources and asks the FW to clean up everything.
Once the commands complete, it's safe to close the PCI resources and
finish the routine.
Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

8812c24d

net/mlx5: Expose command polling interface · 4525abea

由 Majd Dibbiny 提交于 2月 09, 2017

Add a new interface for commands execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).

This is useful if we want to execute a command in a polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.
Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

4525abea

net/mlx5e: Move and optimize query out of buffer function · 432609a4

由 Gal Pressman 提交于 6月 14, 2017

Move "query queue counter out of buffer" helper function out of
qp.c to en_main.c, since mlx5e netdev driver is the only one to use it.

Also allocate the output buffer on the stack instead of the heap, to reduce
number of heap allocs on update_stats work.
Signed-off-by: NGal Pressman <galp@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com

432609a4

net/mlx5: Fix some spelling mistakes · bd10838a

由 Or Gerlitz 提交于 5月 28, 2017

Fixed few places where endianness was misspelled and
one spot whwere output was:

CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

bd10838a

skbuff: make skb_put_zero() return void · 83ad357d

由 Johannes Berg 提交于 6月 14, 2017

It's nicer to return void, since then there's no need to
cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83ad357d

tls: kernel TLS support · 3c4d7559

由 Dave Watson 提交于 6月 14, 2017

Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NIlya Lesokhin <ilyal@mellanox.com>
Signed-off-by: NAviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c4d7559

15 6月, 2017 2 次提交

net: update undefined ->ndo_change_mtu() comment · db46a0e1

由 Magnus Damm 提交于 6月 14, 2017

Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.
Signed-off-by: NMagnus Damm <damm+renesas@opensource.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db46a0e1

bpf: permits narrower load from bpf program context fields · 31fd8581

由 Yonghong Song 提交于 6月 13, 2017

Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
        __u8 h = __sk_buff->hash, or
        __u16 p = __sk_buff->protocol
        __u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.

This patch solves the issue by:
  . Introduce a new parameter ctx_field_size to carry the
    field size of narrower load from prog type
    specific *__is_valid_access validator back to verifier.
  . The non-zero ctx_field_size for a memory access indicates
    (1). underlying prog type specific convert_ctx_accesses
         supporting non-whole-field access
    (2). the current insn is a narrower or whole field access.
  . In verifier, for such loads where load memory size is
    less than ctx_field_size, verifier transforms it
    to a full field load followed by proper masking.
  . Currently, __sk_buff and bpf_perf_event_data->sample_period
    are supporting narrowing loads.
  . Narrower stores are still not allowed as typical ctx stores
    are just normal stores.

Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31fd8581

14 6月, 2017 3 次提交

of_mdio: move of_mdio_parse_addr to header file · 4798a714

由 Jon Mason 提交于 6月 13, 2017

The of_mdio_parse_addr() helper function is useful to other code, but
the module dependency chain causes issues.  To work around this, we can
move of_mdio_parse_addr() to be an inline function in the header file.
This gets rid of the dependencies and still allows for the reuse of
code.
Reported-by: NLiviu Dudau <liviu@dudau.co.uk>
Signed-off-by: NJon Mason <jon.mason@broadcom.com>
Fixes: 342fa196 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4798a714

net: phy: Make phy_ethtool_ksettings_get return void · 5514174f

由 yuval.shaia@oracle.com 提交于 6月 13, 2017

Make return value void since function never return meaningfull value
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5514174f

mdio_bus: handle only single PHY reset GPIO · d396e84c

由 Sergei Shtylyov 提交于 6月 12, 2017

Commit 4c5e7a2c ("dt-bindings: mdio: Clarify binding document")
declared that a MDIO reset GPIO property should have only a single GPIO
reference/specifier, however the supporting code was left intact, still
burdening the kernel with now apparently useless loops -- get rid of them.
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d396e84c

13 6月, 2017 2 次提交

cfg80211: support 4-way handshake offloading for 802.1X · 3a00df57

由 Avraham Stern 提交于 6月 09, 2017

Add API for setting the PMK to the driver. For FT support, allow
setting also the PMK-R0 Name.

This can be used by drivers that support 4-Way handshake offload
while IEEE802.1X authentication is managed by upper layers.
Signed-off-by: NAvraham Stern <avraham.stern@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
[arend.vanspriel@broadcom.com: add WANT_1X_4WAY_HS attribute]
Signed-off-by: NArend van Spriel <arend.vanspriel@broadcom.com>
[reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_1X docs a bit to
say that the device may require it]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

3a00df57

cfg80211: support 4-way handshake offloading for WPA/WPA2-PSK · 91b5ab62

由 Eliad Peller 提交于 6月 09, 2017

Let drivers advertise support for station-mode 4-way handshake
offloading with a new NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK flag.

Extend use of NL80211_ATTR_PMK attribute indicating it might be passed
as part of NL80211_CMD_CONNECT command, and contain the PSK (which is
the PMK, hence the name.)

The driver/device is assumed to handle the 4-way handshake by
itself in this case (including key derivations, etc.), instead
of relying on the supplicant.

This patch is somewhat based on this one (by Vladimir Kondratiev):
https://patchwork.kernel.org/patch/1309561/.
Signed-off-by: NVladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: NEliad Peller <eliadx.peller@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
[arend.vanspriel@broadcom.com rebase dealing with existing ATTR_PMK]
Signed-off-by: NArend van Spriel <arend.vanspriel@broadcom.com>
[reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK docs to indicate
that this offload might be required]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

91b5ab62

12 6月, 2017 3 次提交

udp: avoid a cache miss on dequeue · 0a463c78

由 Paolo Abeni 提交于 6月 12, 2017

Since UDP no more uses sk->destructor, we can clear completely
the skb head state before enqueuing. Amend and use
skb_release_head_state() for that.

All head states share a single cacheline, which is not
normally used/accesses on dequeue. We can avoid entirely accessing
such cacheline implementing and using in the UDP code a specialized
skb free helper which ignores the skb head state.

This saves a cacheline miss at skb deallocation time.

v1 -> v2:
  replaced secpath_reset() with skb_release_head_state()
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a463c78

net: factor out a helper to decrement the skb refcount · 3889a803

由 Paolo Abeni 提交于 6月 12, 2017

The same code is replicated in 3 different places; move it to a
common helper.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3889a803

compiler, clang: properly override 'inline' for clang · 6d53cefb

由 Linus Torvalds 提交于 6月 11, 2017

Commit abb2ea7d ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.

So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.

Maybe this makes clang happier.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d53cefb

10 6月, 2017 5 次提交

qed*: LL2 callback operations · 0518c12f

由 Michal Kalderon 提交于 6月 09, 2017

LL2 today is interrupt driven - when tx/rx completion arrives [or any
other indication], qed needs to operate on the connection and pass
the information to the protocol-driver [or internal qed consumer].
Since we have several flavors of ll2 employeed by the driver,
each handler needs to do an if-else to determine the right functionality
to use based on the connection type.

In order to make things more scalable [given that we're going to add
additional types of ll2 flavors] move the infrastrucutre into using
a callback-based approach - the callbacks would be provided as part
of the connection's initialization parameters.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0518c12f

qed: Cleaner seperation of LL2 inputs · 13c54771

由 Mintz, Yuval 提交于 6月 09, 2017

A LL2 connection [qed_ll2_info] has a sub-structure of type qed_ll2_conn
that contain various inputs for ll2 acquisition, but the connection also
utilizes a couple of other inputs.

Restructure the input structure to include all the inputs and refactor
the code necessary to populate those.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13c54771

qed: Revise ll2 Rx completion · 68be910c

由 Mintz, Yuval 提交于 6月 09, 2017

This introduces qed_ll2_comp_rx_data as a public struct
and moves handling of Rx packets in LL2 into using it.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68be910c

qed: LL2 to use packed information for tx · 7c7973b2

由 Mintz, Yuval 提交于 6月 09, 2017

First step in revising the LL2 interface, this declares
qed_ll2_tx_pkt_info as part of the ll2 interface, and uses it for
transmission instead of receiving lots of parameters.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c7973b2

Ipvlan should return an error when an address is already in use. · 3ad7d246

由 Krister Johansen 提交于 6月 08, 2017

The ipvlan code already knows how to detect when a duplicate address is
about to be assigned to an ipvlan device. However, that failure is not
propogated outward and leads to a silent failure.

Introduce a validation step at ip address creation time and allow device
drivers to register to validate the incoming ip addresses. The ipvlan
code is the first consumer. If it detects an address in use, we can
return an error to the user before beginning to commit the new ifa in
the networking code.

This can be especially useful if it is necessary to provision many
ipvlans in containers. The provisioning software (or operator) can use
this to detect situations where an ip address is unexpectedly in use.
Signed-off-by: NKrister Johansen <kjlx@templeofstupid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ad7d246

09 6月, 2017 1 次提交

KEYS: sanitize key structs before freeing · 0620fddb

由 Eric Biggers 提交于 6月 08, 2017

While a 'struct key' itself normally does not contain sensitive
information, Documentation/security/keys.txt actually encourages this:

"Having a payload is not required; and the payload can, in fact,
just be a value stored in the struct key itself."

In case someone has taken this advice, or will take this advice in the
future, zero the key structure before freeing it. We might as well, and
as a bonus this could make it a bit more difficult for an adversary to
determine which keys have recently been in use.

This is safe because the key_jar cache does not use a constructor.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

0620fddb

08 6月, 2017 8 次提交

srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604

由 Paolo Bonzini 提交于 5月 31, 2017

Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Cc: stable@vger.kernel.org
Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: NLinu Cherian <linuc.decode@gmail.com>
Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

1123a604

net: ipv6: Release route when device is unregistering · 8397ed36

由 David Ahern 提交于 6月 07, 2017

Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:

$ ifdown bond2    # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2

Steps to reproduce:
    echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
    ip link add dev bond12 type bond
    ip link add dev bond13 type bond
    ip addr add 2001:db8:2::0/64 dev bond12
    ip addr add 2001:db8:3::0/64 dev bond13
    ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
    ip link del dev bond12
    ip link del dev bond13

The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.

Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8397ed36

net: propagate tc filter chain index down the ndo_setup_tc call · a5fcf8a6

由 Jiri Pirko 提交于 6月 06, 2017

We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5fcf8a6

net/mlx5e: Add support for reading connector type from PTYS · 5b4793f8

由 Eran Ben Elisha 提交于 2月 13, 2017

Read port connector type from the firmware instead of caching it in the
driver metadata.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

5b4793f8

net/mlx5: Update flow table commands layout · 0c90e9c6

由 Maor Gottlieb 提交于 3月 12, 2017

Update struct mlx5_ifc_create(modify)_flow_table_bits according to
the last device specification.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

0c90e9c6

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

由 David S. Miller 提交于 5月 08, 2017

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf124db5

rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6

由 David Howells 提交于 6月 07, 2017

Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

e754eba6

D
rxrpc: Provide a getsockopt call to query what cmsgs types are supported · 515559ca
由 David Howells 提交于 6月 07, 2017
```
Provide a getsockopt() call that can query what cmsg types are supported by
AF_RXRPC.
```
515559ca