- 07 July 2018, 1 commit
-
-
Submitted by Jaganath Kanakkassery

This implements the LE Set Extended Scan Parameters and LE Set Extended Scan Enable commands and uses them to start LE scanning when the controller supports them. The new capabilities these commands add are selection of the PHY to scan on and a scan duration. Both are left disabled for now, meaning only the 1M PHY is set and the scan duration is set to 0, which means scanning runs until scan disable is called.

< HCI Command: LE Set Extended Scan Parameters (0x08|0x0041) plen 8
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)
        PHYs: 0x01
        Entry 0: LE 1M
          Type: Active (0x01)
          Interval: 11.250 msec (0x0012)
          Window: 11.250 msec (0x0012)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Extended Scan Enable (0x08|0x0042) plen 6
        Extended scan: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
        Duration: 0 msec (0x0000)
        Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Enable (0x08|0x0042) ncmd 2
        Status: Success (0x00)

Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
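As a rough illustration of the plen 8 parameter block in the trace above, the sketch below lays out the fields of LE Set Extended Scan Parameters for a single 1M PHY entry. The struct and field names are illustrative only (not the kernel's actual hci_cp_* definitions); the values mirror the trace.

  #include <stdint.h>

  /* Hypothetical layout of the 8-byte command payload shown above;
   * multi-byte fields are little-endian on the wire. */
  struct le_ext_scan_params_sketch {
          uint8_t  own_addr_type;   /* 0x01 = Random */
          uint8_t  filter_policy;   /* 0x00 = Accept all advertisement */
          uint8_t  scanning_phys;   /* bit 0 = LE 1M */
          /* one per-PHY entry per bit set in scanning_phys: */
          uint8_t  scan_type;       /* 0x01 = Active */
          uint16_t interval;        /* 0x0012 = 11.25 ms (0.625 ms units) */
          uint16_t window;          /* 0x0012 = 11.25 ms (0.625 ms units) */
  } __attribute__((packed));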
-
- 06 July 2018, 4 commits
-
-
Submitted by Corentin Labbe

Nothing in the tree uses this header; it appears to be a leftover from a staging driver. This patch removes it.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Ankit Navik

Check whether the resolving list is supported by the controller: consult the supported-commands bitmap before issuing HCI_OP_LE_CLEAR_RESOLV_LIST.

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0   #55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                 #56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0     #57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                 #58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0   #59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                 #60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0   #61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12                #62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0   #55 [hci0] 28.919131
> HCI Event: Command Complete (0x0e) plen 5                 #56 [hci0] 28.920016
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0     #57 [hci0] 28.920164
> HCI Event: Command Complete (0x0e) plen 4                 #58 [hci0] 28.920873
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0   #59 [hci0] 28.921109
> HCI Event: Command Complete (0x0e) plen 5                 #60 [hci0] 28.922016
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear Resolving... (0x08|0x0029) plen 0   #61 [hci0] 28.922166
> HCI Event: Command Complete (0x0e) plen 4                 #62 [hci0] 28.922872
      LE Clear Resolving List (0x08|0x0029) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0   #63 [hci0] 28.923117
> HCI Event: Command Complete (0x0e) plen 12                #64 [hci0] 28.924030
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
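A minimal sketch of the kind of check described above is shown below. The exact octet/bit used for "LE Clear Resolving List" in the HCI supported-commands bitmap is an assumption here; treat the numbers as illustrative.

  /* In the LE init request: only clear the resolving list if the
   * controller advertises support for the command (octet 34, bit 5 of
   * the supported-commands bitmap is assumed to be
   * "LE Clear Resolving List"). */
  if (hdev->commands[34] & 0x20)
          hci_req_add(req, HCI_OP_LE_CLEAR_RESOLV_LIST, 0, NULL);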
-
Submitted by Ankit Navik

When the controller supports the LE Read Resolving List Size command, the maximum list size is read and is now stored.

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0   #55 [hci0] 17.979791
> HCI Event: Command Complete (0x0e) plen 5                 #56 [hci0] 17.980629
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0     #57 [hci0] 17.980786
> HCI Event: Command Complete (0x0e) plen 4                 #58 [hci0] 17.981627
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0   #59 [hci0] 17.981786
> HCI Event: Command Complete (0x0e) plen 12                #60 [hci0] 17.982636
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0   #55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                 #56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0     #57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                 #58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0   #59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                 #60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0   #61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12                #62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Edward Cree

Essentially the same as the IPv4 equivalents.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 July 2018, 8 commits
-
-
Submitted by Vasundhara Volam

enable_sriov - Enables the Single-Root Input/Output Virtualization (SR-IOV) characteristic of the device.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Moshe Shemesh

Add the first two generic parameters to the devlink configuration parameters set:

internal_err_reset - when set, enables device reset on internal errors.
max_macs - maximum number of MACs per Ethernet port.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Moshe Shemesh

Add a devlink_param_notify() function to support devlink param notifications, and call it from the devlink param set, register and unregister functions. Also add a devlink_param_value_changed() function so the driver can notify devlink of a value change; the driver should use it after a value was changed in any configuration mode apart from driverinit.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
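A minimal sketch of how a driver might use the notification helper after applying a new value in a runtime/permanent configuration mode. The param ID and the hardware-write helper below are hypothetical.

  /* After applying a new value outside driverinit mode, tell devlink so
   * it can emit a param notification to userspace.
   * MY_DRIVER_PARAM_ID_FOO and foo_hw_set_value() are hypothetical. */
  static void foo_apply_and_notify(struct devlink *devlink, u32 new_val)
  {
          foo_hw_set_value(new_val);
          devlink_param_value_changed(devlink, MY_DRIVER_PARAM_ID_FOO);
  }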
-
Submitted by Moshe Shemesh

The "driverinit" configuration mode value is held by devlink so that the driver can query the value after a reload. Two additional functions are added to help the driver get/set the value from/to devlink: devlink_param_driverinit_value_get() and devlink_param_driverinit_value_set().

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Moshe Shemesh

Add a param set command that sets the value of a parameter. The value can be set for any of the supported configuration modes.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Moshe Shemesh

Add a param get command which returns the data for a given parameter, with an option to dump the parameter data for the whole device.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Moshe Shemesh

Define the configuration parameter data structure. Add functions to register and unregister the table of configuration parameters supported by the driver. For each registered parameter the driver should fill in all of the parameter's fields. If the only supported configuration mode is "driverinit", the parameter's get()/set() callbacks are not required and should be set to NULL; for any other configuration mode, these callbacks are required and must be provided by the driver.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
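A rough sketch of what registering such a table might look like from a driver. The exact field names, the registration prototype and the generic param ID are assumptions based on the description above, not text from the patch.

  /* Sketch only: a single driverinit-only generic parameter, so the
   * get/set callbacks may stay NULL as described above. */
  static const struct devlink_param my_devlink_params[] = {
          {
                  .id = DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
                  .name = "max_macs",
                  .generic = true,
                  .type = DEVLINK_PARAM_TYPE_U32,
                  .supported_cmodes = BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
          },
  };

  static int my_register_params(struct devlink *devlink)
  {
          return devlink_params_register(devlink, my_devlink_params,
                                         ARRAY_SIZE(my_devlink_params));
  }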
-
Submitted by Li RongQing

After commit 07d78363 ("net: Convert NAPI gro list into a small hash table.") there are 8 hash buckets, which allows more flows to be held for merging, but MAX_GRO_SKBS, the total number of skbs held for merging, is still 8, which limits the benefit of the hash table. Keep MAX_GRO_SKBS at 8 skbs, but make it a per-hash-list limit rather than a limit on the total.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 July 2018, 14 commits
-
-
Submitted by Jesus Sanchez-Palencia

Use the socket error queue for reporting dropped packets if the socket has enabled that feature through the SO_TXTIME API. Packets are dropped either on enqueue() if they aren't accepted by the qdisc or on dequeue() if the system misses their deadline. Those are reported as different errors so applications can react accordingly.

Userspace can retrieve the errors through the socket error queue and the corresponding cmsg interfaces. A struct sock_extended_err* is used for returning the error data, and the packet's timestamp can be retrieved by combining the ee_data and ee_info fields, e.g.:

  ((__u64) serr->ee_data << 32) + serr->ee_info

This feature is disabled by default and must be explicitly enabled by applications. Enabling it can bring some overhead for the Tx cycles of the application.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
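A hedged userspace sketch of draining one such error from the error queue, following the ee_data/ee_info formula above. Filtering cmsgs on IP_RECVERR (or the family's equivalent) before interpreting the payload is assumed to be the right check; adapt as needed.

  #include <stdio.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <linux/errqueue.h>

  /* Read one message from the error queue and print the txtime of the
   * dropped packet (sketch). */
  static void drain_txtime_error(int fd)
  {
          char ctrl[CMSG_SPACE(sizeof(struct sock_extended_err))];
          struct msghdr msg = { .msg_control = ctrl,
                                .msg_controllen = sizeof(ctrl) };
          struct cmsghdr *cmsg;

          if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
                  return;

          for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                  struct sock_extended_err *serr;

                  if (cmsg->cmsg_level != SOL_IP || cmsg->cmsg_type != IP_RECVERR)
                          continue;
                  serr = (struct sock_extended_err *)CMSG_DATA(cmsg);
                  printf("txtime %llu dropped, ee_code %u\n",
                         ((unsigned long long)serr->ee_data << 32) + serr->ee_info,
                         serr->ee_code);
          }
  }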
-
Submitted by Jesus Sanchez-Palencia

Add infrastructure so the etf qdisc supports HW offload of time-based transmission. For HW offload, the time-sorted list is still used, so packets are always dequeued in order of txtime.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \
      clockid CLOCK_REALTIME

In this example, the Qdisc will use HW offload for the control of the transmission time through the network adapter. The hrtimer used for packet scheduling inside the qdisc will use the clockid CLOCK_REALTIME as reference, and packets leave the Qdisc "delta" (100000) nanoseconds before their transmission time.

Because this will be using HW offload and since dynamic clocks are not supported by the hrtimer, the system clock and the PHC clock must be synchronized for this mode to behave as expected.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vinicius Costa Gomes

The ETF (Earliest TxTime First) qdisc uses the information added earlier in this series (the socket option SO_TXTIME and the new role of sk_buff->tstamp) to schedule packet transmission based on absolute time. For some workloads, bandwidth enforcement alone is not enough, and precise control of the transmission of packets is necessary.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
      clockid CLOCK_TAI

In this example, the Qdisc will provide SW best-effort for the control of the transmission time to the network adapter, the time stamp in the socket will be in reference to the clockid CLOCK_TAI, and packets will leave the qdisc "delta" (100000) nanoseconds before their transmission time.

The ETF qdisc will buffer packets sorted by their txtime. It will drop packets on enqueue() if their skbuff clockid does not match the clock reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped if it expires while being enqueued.

The qdisc also supports the SO_TXTIME deadline mode. For this mode, it will dequeue a packet as soon as possible and change the skb timestamp to 'now' during etf_dequeue().

Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid to be configured, but it's been decided that usage of CLOCK_TAI should be enforced until we decide to allow for other clockids to be used. The rationale here is that PTP times are usually in the TAI scale, thus no other clocks should be necessary. For now, the qdisc will return EINVAL if any clocks other than CLOCK_TAI are used.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vinicius Costa Gomes

This adds qdisc_watchdog_init_clockid(), which allows a clockid to be passed so that other time references can be used when scheduling the Qdisc to run.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
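A minimal sketch of how a qdisc's init path might use the new helper to run its watchdog against an explicit clock; the private data struct and its watchdog field name are hypothetical.

  /* During qdisc init, arm the watchdog against CLOCK_TAI instead of the
   * default clock (sketch). */
  static int my_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
                           struct netlink_ext_ack *extack)
  {
          struct my_qdisc_data *q = qdisc_priv(sch);   /* hypothetical priv struct */

          qdisc_watchdog_init_clockid(&q->watchdog, sch, CLOCK_TAI);
          return 0;
  }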
-
Submitted by Jesus Sanchez-Palencia

Add a transmit_time field to struct inet_cork, then copy the timestamp from the CMSG cookie at ip_setup_cork() so we can safely copy it into the skb later during __ip_make_skb(). For the raw fast path, just perform the copy at raw_send_hdrinc().

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Richard Cochran

This patch introduces SO_TXTIME. User space enables this option in order to pass a desired future transmit time in a CMSG when calling sendmsg(2). The argument to this socket option is an 8-byte struct provided by the uapi header net_tstamp.h, defined as:

  struct sock_txtime {
          clockid_t clockid;
          u32       flags;
  };

Note that the new fields were added to struct sock by filling a 2-byte hole found in the struct. For that reason, neither the struct size nor the number of cachelines was altered.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
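A hedged userspace sketch of enabling the option and attaching an absolute transmit time to one packet. SCM_TXTIME as the cmsg type is an assumption about the uapi introduced by this series, and the headers must be new enough to provide SO_TXTIME and struct sock_txtime.

  #include <string.h>
  #include <time.h>
  #include <sys/socket.h>
  #include <linux/net_tstamp.h>

  /* Enable SO_TXTIME on a socket and send one packet with an absolute
   * CLOCK_TAI transmit time in nanoseconds (sketch). */
  static ssize_t send_at(int fd, const void *buf, size_t len,
                         unsigned long long txtime)
  {
          struct sock_txtime so_txtime = { .clockid = CLOCK_TAI, .flags = 0 };
          char ctrl[CMSG_SPACE(sizeof(txtime))] = {};
          struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
          struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
          struct cmsghdr *cmsg;

          if (setsockopt(fd, SOL_SOCKET, SO_TXTIME, &so_txtime, sizeof(so_txtime)))
                  return -1;

          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_SOCKET;
          cmsg->cmsg_type = SCM_TXTIME;
          cmsg->cmsg_len = CMSG_LEN(sizeof(txtime));
          memcpy(CMSG_DATA(cmsg), &txtime, sizeof(txtime));

          return sendmsg(fd, &msg, 0);
  }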
-
Submitted by Qiaobin Fu

The new action inheritdsfield copies the DS field of IPv4 and IPv6 packets into skb->priority. This enables later classification of packets based on the DS field.

v5:
 * Update the drop counter for TC_ACT_SHOT

v4:
 * Do not allow setting flags other than the expected ones.
 * Allow dumping the pure flags.

v3:
 * Use optional flags, so that it won't break old versions of tc.
 * Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.

v2:
 * Fix the style issue
 * Move the code from skbmod to skbedit

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Edward Cree

Also involved adding a way to run a netfilter hook over a list of packets. Rather than attempting to make netfilter know about lists (which would be a major project in itself) we just let it call the regular okfn (in this case ip_rcv_finish()) for any packets it steals, and have it give us back a list of packets it's synchronously accepted (which normally NF_HOOK would automatically call okfn() on, but we want to be able to potentially pass the list to a listified version of okfn()).

The netfilter hooks themselves are indirect calls that still happen per-packet (see nf_hook_entry_hookfn()), but again, changing that can be left for future work.

There is potential for out-of-order receives if the netfilter hook ends up synchronously stealing packets, as they will be processed before any accepts earlier in the list. However, it was already possible for an asynchronous accept to cause out-of-order receives, so presumably this is considered OK.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Edward Cree

First example of a layer splitting the list (rather than merely taking individual packets off it). This involves a new list.h function, list_cut_before(), which is like list_cut_position() but cuts on the other side of the given entry.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
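A small sketch of the difference between the two helpers, assuming the usual list_cut_position() semantics; the element names are just for illustration.

  /* Given head = [A, B, C, D] and entry pointing at C:
   *
   *   list_cut_position(&sub, &head, entry);  // sub = [A, B, C], head = [D]
   *   list_cut_before(&sub, &head, entry);    // sub = [A, B],    head = [C, D]
   *
   * i.e. list_cut_before() cuts just before @entry, leaving @entry as the
   * first element still on @head. */
  static void split_before(struct list_head *sub, struct list_head *head,
                           struct list_head *entry)
  {
          list_cut_before(sub, head, entry);
  }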
-
Submitted by Edward Cree

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Edward Cree

Just calls netif_receive_skb() in a loop.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
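Conceptually the new entry point is just the loop below; this is a sketch of the behaviour described, not necessarily the exact function body from the patch.

  /* Hand each skb on the list to the regular receive path, one at a
   * time, unlinking it from the list first (sketch). */
  void netif_receive_skb_list_sketch(struct list_head *head)
  {
          struct sk_buff *skb, *next;

          list_for_each_entry_safe(skb, next, head, list) {
                  list_del_init(&skb->list);
                  netif_receive_skb(skb);
          }
  }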
-
Submitted by Xin Long

spp_ipv6_flowlabel and spp_dscp are added to sctp_paddrparams in this patch so that users can set the dscp and flowlabel of an sctp_sock/asoc/transport with the spp_flags SPP_IPV6_FLOWLABEL or SPP_DSCP via SCTP_PEER_ADDR_PARAMS, as described in section 8.1.12 of RFC 6458.

As described in the previous patch, it uses '| 0x100000' or '| 0x1' to mark that flowlabel or dscp has been set, so that their values can also be set to 0.

Note that, to guarantee that an old application built against old kernel headers keeps working on a newer kernel, the parameter check in sctp_g/setsockopt_peer_addr_params() is also improved, following the way sctp_g/setsockopt_delayed_ack() and other sockopts that accept two sizes of parameter structures handle it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
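A hedged userspace sketch of setting a per-peer DSCP through the extended sctp_paddrparams. The spp_dscp field and the SPP_DSCP flag follow the description above, but they require headers that already export this uapi.

  #include <string.h>
  #include <netinet/in.h>
  #include <netinet/sctp.h>

  /* Request DSCP 0x2e (EF) for one peer address of an SCTP socket
   * (sketch; assoc_id 0 targets the socket/association default). */
  static int set_peer_dscp(int fd, const struct sockaddr_in6 *peer)
  {
          struct sctp_paddrparams p;

          memset(&p, 0, sizeof(p));
          memcpy(&p.spp_address, peer, sizeof(*peer));
          p.spp_dscp = 0x2e;
          p.spp_flags = SPP_DSCP;

          return setsockopt(fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS,
                            &p, sizeof(p));
  }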
-
Submitted by Xin Long

Like some other per-transport params, flowlabel and dscp are added to the transport, asoc and sctp_sock. By default, a transport takes its value from the asoc, and the asoc takes it from the sctp_sock. flowlabel only works for IPv6 transports.

Besides being passed down in sctp_xmit, flow4/6 also needs them set before looking up the route in get_dst.

Note that it uses '& 0x100000' to check whether flowlabel is set and '& 0x1' (the first bit of tos is unused) to check whether dscp is set by the user, so that both can be set to 0 by the sockopt in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Xin Long

This patch introduces __ip_queue_xmit(), through which callers can pass a tos parameter without having to set inet->tos. For IPv6, ip6_xmit() already allows passing a tclass parameter. This is needed when a transport protocol doesn't use inet->tos, such as SCTP's per-transport dscp, which will be added in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
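A sketch of the shape this refactor implies: the existing ip_queue_xmit() becomes a thin wrapper that forwards inet->tos, while callers needing a different ToS call the new function directly. The exact prototypes are assumptions based on the description.

  /* New variant taking an explicit tos (sketch of the prototype)... */
  int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
                      __u8 tos);

  /* ...and the old entry point keeps its behaviour by passing inet->tos. */
  static inline int ip_queue_xmit(struct sock *sk, struct sk_buff *skb,
                                  struct flowi *fl)
  {
          return __ip_queue_xmit(sk, skb, fl, inet_sk(sk)->tos);
  }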
-
- 02 July 2018, 6 commits
-
-
Submitted by Yafang Shao

Currently trace_sock_exceed_buf_limit() only shows rmem info, but the wmem limit may also be hit, so expose wmem info in this tracepoint as well. Regarding memcg, I think it is better to introduce a new tracepoint (if one is needed), i.e. trace_memcg_limit_hit, rather than showing memcg info in trace_sock_exceed_buf_limit.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Sabrina Dubroca

Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower-layer GRO handler cannot touch the skb anymore.

Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE.

Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Amritha Nambiar

This patch adds support for picking the Tx queue based on the Rx queue(s) map configuration set by the admin through the sysfs attribute for each Tx queue. If the user configuration for the receive queue(s) map does not apply, Tx queue selection falls back to CPU(s)-map-based selection and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Amritha Nambiar

This patch adds a new field to sock_common, 'skc_rx_queue_mapping', which holds the receive queue number for the connection. The Rx queue is marked in tcp_finish_connect() to allow a client app to do SO_INCOMING_NAPI_ID after a connect() call and get the right queue association for the socket. The Rx queue is also marked in tcp_conn_request() so that the syn-ack goes out on the tx queue associated with the queue on which the syn was received.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Amritha Nambiar

Change the 'skc_tx_queue_mapping' field in the sock_common structure from 'int' to 'unsigned short', with ~0 indicating "unset" and other values being valid queue numbers. This accommodates adding a new 'unsigned short' field for rx_queue_mapping to sock_common in the next patch.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
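A hedged sketch of the accessor semantics this implies: with a 16-bit field, "unset" can no longer be -1, so a USHRT_MAX sentinel is used and translated back for callers that expect an int. The sentinel macro and helper names below are hypothetical.

  #define NO_QUEUE_MAPPING_SKETCH  USHRT_MAX   /* hypothetical name for the ~0 sentinel */

  static inline void tx_queue_clear_sketch(unsigned short *mapping)
  {
          *mapping = NO_QUEUE_MAPPING_SKETCH;
  }

  static inline int tx_queue_get_sketch(unsigned short mapping)
  {
          return mapping == NO_QUEUE_MAPPING_SKETCH ? -1 : mapping;
  }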
-
Submitted by Amritha Nambiar

Refactor the XPS code to support Tx queue selection based on a CPU(s) map or an Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 June 2018, 7 commits
-
-
Submitted by GhantaKrishnamurthy MohanKrishna

This commit extends the existing TIPC socket diagnostics framework with information related to TIPC group communication.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Hans Wippel

This patch adds diag support for SMC-D.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Hans Wippel

SMC-D relies on PNETIDs to find usable SMC-D/ISM devices for an SMC connection. This patch adds SMC-D/ISM support to the current PNETID implementation.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Hans Wippel

SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R uses RDMA devices, while SMC-D uses so-called Internal Shared Memory (ISM) devices. An ISM device only allows shared memory communication between SMC instances on the same machine. For example, this allows virtual machines on the same host to communicate via SMC without RDMA devices.

This patch adds the base infrastructure for SMC-D and ISM devices to the existing SMC code. It contains the following:

* ISM driver interface:
  This interface allows an ISM driver to register ISM devices in SMC. In the process, the driver provides a set of device ops for each device. SMC uses these ops to execute SMC specific operations on the device or to transfer data over it.

* Core SMC-D link group, connection, and buffer support:
  Link groups, SMC connections and SMC buffers (in smc_core) are extended to support SMC-D.

* SMC type checks:
  Some type checks are added to prevent using SMC-R specific code for SMC-D and vice versa.

To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are required. These are added in follow-up patches.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ursula Braun

s390 hardware supports the definition of a so-called Physical NETwork IDentifier (short: PNETID) per network device port. These PNETIDs can be used to identify network devices that are attached to the same physical network (broadcast domain).

On s390, try to use the PNETID of the Ethernet device port used for the initial connection to derive the IB device port to be used for SMC RDMA traffic. On platforms without PNETID support, fall back to the existing solution of a configured pnet table.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Yafang Shao

When sk_rmem_alloc is larger than the receive buffer and we can't schedule more memory for it, the skb will be dropped. In that situation, if this skb is put into the ofo queue, LINUX_MIB_TCPOFODROP is incremented to track it; but if this skb is put into the receive queue, there's no record. So a new SNMP counter is introduced to track this behavior:

LINUX_MIB_TCPRCVQDROP: Number of packets meant to be queued in the rcv queue but dropped because the socket rcvbuf limit was hit.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Daniel Borkmann

Partially undo commit 9facc336 ("bpf: reject any prog that failed read-only lock") since it caused a regression: syzkaller managed to cause a panic via fault injection deep in the set_memory_ro() path by letting an allocation fail. In x86's __change_page_attr_set_clr() it was able to change the attributes of the primary mapping but not of the alias mapping via cpa_process_alias(), so the second, inner call to __change_page_attr() via __change_page_attr_set_clr() had to split a larger page and failed in alloc_pages() with the artificially triggered allocation error, which is then propagated down to the call site.

Thus, for set_memory_ro() this means that it returned with an error, but from debugging, a probe_kernel_write() revealed EFAULT on that memory since the primary mapping had successfully been changed. Therefore the subsequent hdr->locked = 0 reset triggered the panic, as it was performed on read-only memory. The call-site assumption that the operation would either succeed or not succeed at all was in fact wrong, since there is no rollback in set_memory_*() calls after a partial change of mappings; in other words, we're left in a state that is "half done". A later undo via set_memory_rw() does succeed, though, due to matching permissions on that part (thanks to try_preserve_large_page() succeeding). While reproducing this locally with an explicitly triggered error, the initial splitting only happens on rare occasions, and in the real world it would additionally need OOM conditions; but that said, it could partially fail.

Therefore, it is definitely wrong to bail out on a set_memory_ro() error and reject the program with the set_memory_*() semantics we have today. We shouldn't have gone the extra mile, since no other user in the tree today in fact checks for set_memory_*() errors, e.g. neither module_enable_ro() / module_disable_ro() for module RO/NX handling (which is mostly default these days) nor the kprobes core with alloc_insn_page() / free_insn_page(), as examples that could be invoked long after bootup; and the original 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") didn't check either when this got first introduced to BPF, so "improving" it by bailing out was clearly not right when set_memory_*() cannot handle it today.

Kees suggested that if set_memory_*() can fail, it should be annotated with __must_check, and all callers need to deal with it gracefully, given those set_memory_*() markings aren't "advisory" but are expected to actually do what they say. This might be an option worth moving forward with in the future, but it would at the same time require that set_memory_*() calls from supporting archs are guaranteed to be "atomic" in the sense that they provide rollback if part of the range fails. Once that happens, the RW -> RO transition could be made more robust that way, while the subsequent RO -> RW transition must continue to be guaranteed to always succeed in the undo part.

Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
Fixes: 9facc336 ("bpf: reject any prog that failed read-only lock")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-