1. 27 Feb 2019 (2 commits)
  2. 25 Feb 2019 (5 commits)
  3. 22 Feb 2019 (2 commits)
  4. 18 Feb 2019 (5 commits)
  5. 17 Feb 2019 (1 commit)
    • sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values · 4057765f
      Authored by Guillaume Nault
      SO_SNDBUF and SO_RCVBUF (and their *BUFFORCE versions) may overflow or
      underflow their input value. This patch provides explicit handling of
      these extreme cases, to get a clear behaviour even with values bigger
      than INT_MAX / 2 or lower than INT_MIN / 2.
      
      For simplicity, only SO_SNDBUF and SO_SNDBUFFORCE are described here,
      but the same explanation and fix apply to SO_RCVBUF and SO_RCVBUFFORCE
      (with 'SNDBUF' replaced by 'RCVBUF' and 'wmem_max' by 'rmem_max').
      
      Overflow of positive values
      ===========================
      
      When handling SO_SNDBUF or SO_SNDBUFFORCE, if 'val' exceeds
      INT_MAX / 2, the buffer size is set to its minimum value because
      'val * 2' overflows, and max_t() considers that it's smaller than
      SOCK_MIN_SNDBUF. For SO_SNDBUF, this can only happen with
      net.core.wmem_max > INT_MAX / 2.
      
      SO_SNDBUF and SO_SNDBUFFORCE are actually designed to let users probe
      for the maximum buffer size by setting an arbitrarily large number that
      gets capped to the maximum allowed/possible size. Having the upper
      half of the positive integer space instead shrink the buffer size to
      its minimum value defeats this purpose.
      
      This patch caps the base value to INT_MAX / 2, so that bigger values
      no longer overflow and instead keep setting the buffer size to its
      maximum.
      
      Underflow of negative values
      ============================
      
      For negative numbers, SO_SNDBUF always considers them bigger than
      net.core.wmem_max, which is bounded by [SOCK_MIN_SNDBUF, INT_MAX].
      Therefore such values are set to net.core.wmem_max and we're back to
      the behaviour of positive integers described above (return maximum
      buffer size if wmem_max <= INT_MAX / 2, return SOCK_MIN_SNDBUF
      otherwise).
      
      However, SO_SNDBUFFORCE behaves differently. The user value is
      directly multiplied by two and compared with SOCK_MIN_SNDBUF. If
      'val * 2' doesn't underflow, or if it underflows to a value smaller
      than SOCK_MIN_SNDBUF, then the buffer size is set to its minimum
      value. Otherwise the buffer size is set to the underflowed value.
      
      This patch treats negative values passed to SO_SNDBUFFORCE as zero, to
      prevent underflows. Therefore negative values now always set the
      buffer size to its minimum value.
      
      Even though SO_SNDBUF behaves inconsistently by setting buffer size to
      the maximum value when passed a negative number, no attempt is made to
      modify this behaviour. There may exist some programs that rely on using
      negative numbers to set the maximum buffer size. Avoiding overflows
      because of extreme net.core.wmem_max values is the most we can do here.
      
      Summary of altered behaviours
      =============================
      
      val      : user-space value passed to setsockopt()
      val_uf   : the underflowed value resulting from doubling val when
                 val < INT_MIN / 2
      wmem_max : short for net.core.wmem_max
      val_cap  : min(val, wmem_max)
      min_len  : minimal buffer length (that is, SOCK_MIN_SNDBUF)
      max_len  : maximal possible buffer length, regardless of wmem_max (that
                 is, INT_MAX - 1)
      ^^^^     : altered behaviour
      
      SO_SNDBUF:
      +-------------------------+-------------+------------+----------------+
      |       CONDITION         | OLD RESULT  | NEW RESULT |    COMMENT     |
      +-------------------------+-------------+------------+----------------+
      | val < 0 &&              |             |            | No overflow,   |
      | wmem_max <= INT_MAX/2   | wmem_max*2  | wmem_max*2 | keep original  |
      |                         |             |            | behaviour      |
      +-------------------------+-------------+------------+----------------+
      | val < 0 &&              |             |            | Cap wmem_max   |
      | INT_MAX/2 < wmem_max    | min_len     | max_len    | to prevent     |
      |                         |             | ^^^^^^^    | overflow       |
      +-------------------------+-------------+------------+----------------+
      | 0 <= val <= min_len/2   | min_len     | min_len    | Ordinary case  |
      +-------------------------+-------------+------------+----------------+
      | min_len/2 < val &&      | val_cap*2   | val_cap*2  | Ordinary case  |
      | val_cap <= INT_MAX/2    |             |            |                |
      +-------------------------+-------------+------------+----------------+
      | min_len/2 < val &&      |             |            | Cap val_cap    |
      | INT_MAX/2 < val_cap     | min_len     | max_len    | again to       |
      | (implies that           |             | ^^^^^^^    | prevent        |
      | INT_MAX/2 < wmem_max)   |             |            | overflow       |
      +-------------------------+-------------+------------+----------------+
      
      SO_SNDBUFFORCE:
      +------------------------------+---------+---------+------------------+
      |          CONDITION           | BEFORE  | AFTER   |     COMMENT      |
      |                              | PATCH   | PATCH   |                  |
      +------------------------------+---------+---------+------------------+
      | val < INT_MIN/2 &&           | min_len | min_len | Underflow with   |
      | val_uf <= min_len            |         |         | no consequence   |
      +------------------------------+---------+---------+------------------+
      | val < INT_MIN/2 &&           | val_uf  | min_len | Set val to 0 to  |
      | val_uf > min_len             |         | ^^^^^^^ | avoid underflow  |
      +------------------------------+---------+---------+------------------+
      | INT_MIN/2 <= val < 0         | min_len | min_len | No underflow     |
      +------------------------------+---------+---------+------------------+
      | 0 <= val <= min_len/2        | min_len | min_len | Ordinary case    |
      +------------------------------+---------+---------+------------------+
      | min_len/2 < val <= INT_MAX/2 | val*2   | val*2   | Ordinary case    |
      +------------------------------+---------+---------+------------------+
      | INT_MAX/2 < val              | min_len | max_len | Cap val to       |
      |                              |         | ^^^^^^^ | prevent overflow |
      +------------------------------+---------+---------+------------------+
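      
      A minimal C sketch of the resulting setsockopt() logic (hedged: the
      label name and surrounding flow are illustrative, not the exact diff):
      
      	case SO_SNDBUF:
      		/* Negative values wrap to huge u32s, so they too end up
      		 * capped to wmem_max here (unchanged behaviour).
      		 */
      		val = min_t(u32, val, sysctl_wmem_max);
      set_sndbuf:
      		/* Cap to INT_MAX / 2 so that 'val * 2' below cannot
      		 * overflow and fall through to SOCK_MIN_SNDBUF.
      		 */
      		val = min_t(int, val, INT_MAX / 2);
      		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
      		sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
      		break;
      
      	case SO_SNDBUFFORCE:
      		/* Treat negative values as 0 to prevent 'val * 2' underflow. */
      		if (val < 0)
      			val = 0;
      		goto set_sndbuf;
      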
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 Feb 2019 (1 commit)
    • net: Fix for_each_netdev_feature on Big endian · 3b89ea9c
      Authored by Hauke Mehrtens
      The features attribute is of type u64 and stored in the native
      endianness of the system. The for_each_set_bit() macro takes a pointer
      to a 32-bit array and iterates over the bits in that area. On
      little-endian systems this also works for a u64, as the most
      significant bits sit at the highest address, but on big-endian systems
      the two 32-bit words are swapped: when we expect bit 15 here, we get
      bit 47 (15 + 32).
      
      This patch replaces it with, more or less, its own for_each_set_bit()
      implementation that works directly on 64-bit integers. That keeps
      everything in host endianness and works as expected.
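      
      A minimal user-space sketch of the idea, iterating the set bits of a
      u64 directly so that word order never matters (handle_feature() is a
      hypothetical stand-in for the loop body):
      
      	#include <stdint.h>
      
      	static void for_each_feature_bit(uint64_t features,
      					 void (*handle_feature)(int bit))
      	{
      		while (features) {
      			/* Lowest set bit; endian-independent on a 64-bit value. */
      			int bit = __builtin_ctzll(features);
      
      			handle_feature(bit);
      			features &= features - 1;	/* clear that bit */
      		}
      	}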
      
      Fixes: fd867d51 ("net/core: generic support for disabling netdev features down stack")
      Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 15 Feb 2019 (3 commits)
  8. 14 Feb 2019 (7 commits)
    • page_pool: use DMA_ATTR_SKIP_CPU_SYNC for DMA mappings · 13f16d9d
      Authored by Jesper Dangaard Brouer
      As pointed out by Alexander Duyck, the DMA mapping done in page_pool
      needs to use the DMA attribute DMA_ATTR_SKIP_CPU_SYNC, because the
      principle behind page_pool keeping the pages mapped is that the
      driver takes over the DMA-sync steps.
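      
      A hedged sketch of such a mapping call (the pool parameter names are
      illustrative, mirroring how page_pool is described here):
      
      	dma_addr_t dma;
      
      	/* Map without an implicit CPU sync; the driver syncs explicitly
      	 * around each use of the page instead.
      	 */
      	dma = dma_map_page_attrs(pool->p.dev, page, 0,
      				 PAGE_SIZE << pool->p.order,
      				 pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
      	if (dma_mapping_error(pool->p.dev, dma))
      		return NULL;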
      Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: page_pool: don't use page->private to store dma_addr_t · 1567b85e
      Authored by Ilias Apalodimas
      As pointed out by David Miller, the current page_pool implementation
      stores a dma_addr_t in page->private. This won't work on 32-bit
      platforms with 64-bit DMA addresses, since page->private is an
      unsigned long while dma_addr_t is a u64.
      
      A previous patch added a dma_addr_t field to struct page to
      accommodate this. This patch adapts the page_pool related functions
      to use the newly added field for storing and retrieving DMA addresses
      from network drivers.
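      
      A hedged sketch of the resulting accessors (helper names are
      illustrative; the patch may simply access page->dma_addr directly):
      
      	static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
      	{
      		return page->dma_addr;	/* new field, wide enough for u64 DMA */
      	}
      
      	static inline void page_pool_set_dma_addr(struct page *page,
      						  dma_addr_t addr)
      	{
      		page->dma_addr = addr;
      	}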
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix possible overflow in __sk_mem_raise_allocated() · 5bf325a5
      Authored by Eric Dumazet
      With many active TCP sockets, fat TCP sockets could fool
      __sk_mem_raise_allocated() thanks to an overflow.
      
      They would increase their share of the memory, instead
      of decreasing it.
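      
      A hedged sketch of the overflow pattern being fixed (names follow the
      surrounding kernel code; the exact diff may differ): widening 'alloc'
      to 64 bits keeps the product below from wrapping around:
      
      	u64 alloc = sk_sockets_allocated_read_positive(sk);
      
      	/* With a 32-bit 'alloc', this product could overflow and wrongly
      	 * pass the admission check for a socket with huge queues.
      	 */
      	if (sk_prot_mem_limits(sk, 2) > alloc * sk_mem_pages(size))
      		return 1;	/* still within the per-protocol limit */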
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c · 3bd0b152
      Authored by Peter Oskolkov
      This patch builds on top of the previous patch in the patchset,
      which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the
      encapping can result in the skb needing to go via a different
      interface/route/dst, bpf programs can indicate this by returning
      BPF_LWT_REROUTE, which triggers a new route lookup for the skb.
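      
      A hedged sketch of a program using the new return code (the section
      name and header construction are illustrative):
      
      	SEC("lwt_in")
      	int do_encap_and_reroute(struct __sk_buff *skb)
      	{
      		struct iphdr hdr = {};	/* outer header, filled by the program */
      
      		if (bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr)))
      			return BPF_DROP;
      
      		/* The outer header may change the egress path; ask the
      		 * stack to do a fresh route lookup for this skb.
      		 */
      		return BPF_LWT_REROUTE;
      	}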
      
      v8 changes: fix kbuild errors when LWTUNNEL_BPF is builtin, but
         IPV6 is a module: as LWTUNNEL_BPF can only be either Y or N,
         call IPV6 routing functions only if they are built-in.
      
      v9 changes:
         - fixed a kbuild test robot compiler warning;
         - call IPV6 routing functions via ipv6_stub.
      
      v10 changes: removed unnecessary IS_ENABLED and pr_warn_once.
      
      v11 changes: fixed a potential dst leak.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: handle GSO in bpf_lwt_push_encap · ca78801a
      Authored by Peter Oskolkov
      This patch adds handling of GSO packets in bpf_lwt_push_ip_encap()
      (called from bpf_lwt_push_encap):
      
      * IPIP, GRE, and UDP encapsulation types are deduced by looking
        into iphdr->protocol or ipv6hdr->nexthdr;
      * SCTP GSO packets are not supported (as bpf_skb_proto_4_to_6
        and similar do);
      * UDP_L4 GSO packets are also not supported (although they are
        not blocked in bpf_skb_proto_4_to_6 and similar), as
        skb_decrease_gso_size() would break them;
      * the SKB_GSO_DODGY bit is set.
      
      Note: it may be possible to support SCTP and UDP_L4 GSO packets,
            but as these cases do not seem to be well handled by other
            tunneling/encapping code paths either, the solution should
            be generic enough to apply to all tunneling/encapping code.
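      
      A hedged fragment of what the TCP-only whitelist could look like (not
      the exact patch):
      
      	if (skb_is_gso(skb)) {
      		/* Whitelist TCP GSO; SCTP, UDP_L4 and anything newer is
      		 * rejected rather than silently mis-segmented.
      		 */
      		if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
      			return -ENOTSUPP;
      
      		/* Have the GSO parameters revalidated on the way out. */
      		skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
      	}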
      
      v8 changes:
         - make sure that if GRE or UDP encap is detected, there is
           enough of pushed bytes to cover both IP[v6] + GRE|UDP headers;
         - do not reject double-encapped packets;
         - whitelist TCP GSO packets rather than block SCTP GSO and
           UDP GSO.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap · 52f27877
      Authored by Peter Oskolkov
      Implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper.
      It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and
      BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
      to packets (e.g. IP/GRE, GUE, IPIP).
      
      This is useful when thousands of different short-lived flows should
      be encapped, each with a different, dynamically determined
      destination. Although lwtunnels can be used in some of these
      scenarios, the ability to dynamically generate encap headers adds
      more flexibility, e.g. when routing depends on the state of the host
      (reflected in global bpf maps).
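      
      A hedged usage sketch building an IPIP outer header (addresses are
      placeholders, bpf_htonl comes from bpf_endian.h; depending on the
      implementation the program may also need to fill tot_len and the
      checksum):
      
      	struct iphdr hdr = {
      		.version  = 4,
      		.ihl      = 5,			/* 20-byte header, no options */
      		.ttl      = 64,
      		.protocol = IPPROTO_IPIP,	/* inner packet is IPv4 */
      		.saddr    = bpf_htonl(0x0a000001),	/* 10.0.0.1, placeholder */
      		.daddr    = bpf_htonl(0x0a000002),	/* 10.0.0.2, placeholder */
      	};
      
      	if (bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr)))
      		return BPF_DROP;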
      
      v7 changes:
       - added a call skb_clear_hash();
       - removed calls to skb_set_transport_header();
       - refuse to encap GSO-enabled packets.
      
      v8 changes:
       - fix build errors when LWT is not enabled.
      
      Note: the next patch in the patchset will deal with GSO-enabled
      packets, which are currently rejected when encap is attempted.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap · 3e0bd37c
      Authored by Peter Oskolkov
      This patch adds all the needed plumbing in preparation for allowing
      bpf programs to do IP encapsulation via bpf_lwt_push_encap. The
      actual implementation is added in the next patch in the patchset.
      
      Of note:
      - bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
        prog types in addition to BPF_PROG_TYPE_LWT_IN;
      - if the skb being encapped has GSO set, encapsulation is limited
        to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6);
      - as route lookups are different for ingress vs egress, the single
        external bpf_lwt_push_encap BPF helper is routed internally to
        either the bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap
        BPF_CALL, depending on prog type (see the sketch below).
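      
      A hedged sketch of that dispatch (simplified: the real BPF_CALLs also
      handle the existing seg6 encap modes, and the ingress flag on the
      internal bpf_lwt_push_ip_encap() is an assumed convention):
      
      	BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type,
      		   void *, hdr, u32, len)
      	{
      		if (type == BPF_LWT_ENCAP_IP)
      			return bpf_lwt_push_ip_encap(skb, hdr, len, true /* ingress */);
      		return -EINVAL;
      	}
      
      	BPF_CALL_4(bpf_lwt_xmit_push_encap, struct sk_buff *, skb, u32, type,
      		   void *, hdr, u32, len)
      	{
      		if (type == BPF_LWT_ENCAP_IP)
      			return bpf_lwt_push_ip_encap(skb, hdr, len, false /* egress */);
      		return -EINVAL;
      	}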
      
      v8 changes: fixed a typo.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  9. 13 Feb 2019 (2 commits)
  10. 12 Feb 2019 (3 commits)
  11. 11 Feb 2019 (5 commits)
    • bpf: only adjust gso_size on bytestream protocols · b90efd22
      Authored by Willem de Bruijn
      bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
      For GSO packets they adjust gso_size to maintain the same MTU.
      
      The gso size can only be safely adjusted on bytestream protocols.
      Commit d02f51cb ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
      to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
      
      Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
      gso_size unit per datagram. Also exclude these.
      
      Move from a blacklist to a whitelist check to future-proof against
      additional such new GSO types, e.g., for fraglist based GRO.
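      
      A hedged sketch of such a whitelist helper (name and placement are
      illustrative):
      
      	static inline bool skb_is_gso_tcp(const struct sk_buff *skb)
      	{
      		/* Only bytestream GSO types may have gso_size adjusted;
      		 * datagram types (SCTP, UDP_L4, ...) would be corrupted.
      		 */
      		return skb_is_gso(skb) &&
      		       (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6));
      	}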
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock · 655a51e5
      Authored by Martin KaFai Lau
      This patch adds a helper function, BPF_FUNC_tcp_sock, which is
      currently available to cg_skb and sched_(cls|act) programs:
      
      struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_tcp_sock *tp;
      	struct bpf_sock *sk;
      	__u32 snd_cwnd;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	tp = bpf_tcp_sock(sk);
      	if (!tp)
      		return 1;
      
      	snd_cwnd = tp->snd_cwnd;
      	/* ... */
      
      	return 1;
      }
      
      A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
      read-only access.  bpf_tcp_sock has all the existing tcp_sock fields
      that have already been exposed by bpf_sock_ops,
      i.e. no new tcp_sock fields are exposed in bpf.h.
      
      This helper returns a pointer to the tcp_sock.  If it is not a tcp_sock
      or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
      returns NULL.  Hence, the caller needs to check for NULL before
      accessing it.
      
      The current use case is to expose members from tcp_sock
      to allow a cg_skb_bpf_prog to provide per cgroup traffic
      policing/shaping.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Refactor sock_ops_convert_ctx_access · 9b1f3d6e
      Authored by Martin KaFai Lau
      The next patch will introduce a new "struct bpf_tcp_sock" which
      exposes the same tcp_sock's fields already exposed in
      "struct bpf_sock_ops".
      
      This patch refactors the existing convert_ctx_access() code for
      "struct bpf_sock_ops" to get it ready to be reused for
      "struct bpf_tcp_sock".  The "rtt_min" field is not refactored
      in this patch because its handling is different from the other
      fields.
      
      The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
      changes are code move only.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock · aa65d696
      Authored by Martin KaFai Lau
      This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
      bpf_sock.  The userspace has already been using "state",
      e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).
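      
      A hedged fragment showing the new fields from a program that already
      holds a valid "struct bpf_sock *sk" (bpf_htonl comes from
      bpf_endian.h; the address is a placeholder):
      
      	/* Only look at established connections. */
      	if (sk->state != BPF_TCP_ESTABLISHED)
      		return 1;
      
      	/* dst_ip4 is in network byte order, like the other IP fields. */
      	if (sk->dst_ip4 == bpf_htonl(0x0a000001))	/* 10.0.0.1 */
      		return 1;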
      
      This patch also allows narrow loads on the following existing fields:
      "family", "type", "protocol" and "src_port".  Unlike the IP address
      fields, the load offset is restricted to the first byte for them, but
      it can be relaxed later if there is a use case.
      
      This patch also folds __sock_filter_check_size() into
      bpf_sock_is_valid_access(), since it is not called
      anywhere else.  All bpf_sock checking is now in
      one place.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper · 46f8bc92
      Authored by Martin KaFai Lau
      In the kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
      before accessing the fields in the sock.  For example, in __netdev_pick_tx:
      
      static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
      			    struct net_device *sb_dev)
      {
      	/* ... */
      
      	struct sock *sk = skb->sk;
      
      		if (queue_index != new_index && sk &&
      		    sk_fullsock(sk) &&
      		    rcu_access_pointer(sk->sk_dst_cache))
      			sk_tx_queue_set(sk, new_index);
      
      	/* ... */
      
      	return queue_index;
      }
      
      This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
      where a few of the convert_ctx_access() in filter.c has already been
      accessing the skb->sk sock_common's fields,
      e.g. sock_ops_convert_ctx_access().
      
      "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
      Some of the fileds in "bpf_sock" will not be directly
      accessible through the "__sk_buff->sk" pointer.  It is limited
      by the new "bpf_sock_common_is_valid_access()".
      e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
           are not allowed.
      
      The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
      can be used to get a sk with all accessible fields in "bpf_sock".
      This helper is added to both cg_skb and sched_(cls|act).
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_sock *sk;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	sk = bpf_sk_fullsock(sk);
      	if (!sk)
      		return 1;
      
      	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
      		return 1;
      
      	/* some_traffic_shaping(); */
      
      	return 1;
      }
      
      (1) The sk is read-only.
      
      (2) There is no new "struct bpf_sock_common" introduced.
      
      (3) Future kernel sock's members could be added to bpf_sock only
          instead of repeatedly adding at multiple places like currently
          in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
      
      (4) After "sk = skb->sk", the reg holding sk is in type
          PTR_TO_SOCK_COMMON_OR_NULL.
      
      (5) After bpf_sk_fullsock(), the return type will be in type
          PTR_TO_SOCKET_OR_NULL which is the same as the return type of
          bpf_sk_lookup_xxx().
      
          However, bpf_sk_fullsock() does not take a refcnt.
          acquire_reference_state() currently depends only on the return
          type; to avoid taking a reference here, a new is_acquire_function()
          check is done before calling acquire_reference_state().
      
      (6) The WARN_ON in "release_reference_state()" is no longer an
          internal verifier bug.
      
          When reg->id is not found in state->refs[], it means the
          bpf_prog does something wrong like
          "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where a reference has
          never been acquired by calling "bpf_sk_fullsock(skb->sk)".
      
          A -EINVAL and a verbose message are returned instead of the
          WARN_ON.  A test is added to test_verifier in a later patch.
      
          Since the WARN_ON in "release_reference_state()" is no longer
          needed, "__release_reference_state()" is folded into
          "release_reference_state()" also.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  12. 09 Feb 2019 (2 commits)
  13. 08 Feb 2019 (2 commits)