Commits · 3d391f6518fddcd44367d463aa20a50145f3ea3f · openeuler / Kernel

07 Mar 2022 · 1 commit

ptp: Add generic PTP is_sync() function · f72de02e

Committed by Kurt Kanzenbach on Mar 05, 2022

PHY drivers such as micrel or dp83640 need to analyze whether a given
skb is a PTP sync message for one-step functionality.

In order to avoid code duplication, introduce a generic function and
move it to ptp classify.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
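
The check that the generic helper centralizes comes down to reading the PTP messageType field. The sketch below is a userspace model, not the kernel function: it assumes a raw IEEE 1588v2 header (34-byte common header, messageType in the low nibble of byte 0, Sync = 0x0), and ptp_hdr_is_sync() is a hypothetical name. The real helper works on an skb and first has to locate the PTP header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model, not the kernel helper. It classifies a
 * raw PTPv2 (IEEE 1588) header; locating that header inside an skb is
 * out of scope here. */
#define PTP_HDR_LEN      34u  /* common PTPv2 header size in bytes */
#define PTP_MSGTYPE_SYNC 0x0u /* messageType value of a Sync message */

bool ptp_hdr_is_sync(const uint8_t *hdr, size_t len)
{
	/* Too short to be a complete PTP header: not a Sync message. */
	if (hdr == NULL || len < PTP_HDR_LEN)
		return false;

	/* messageType is the low nibble of the first header byte. */
	return (hdr[0] & 0x0fu) == PTP_MSGTYPE_SYNC;
}
```

Keeping the length check inside the shared helper means every caller gets the same bounds handling, which is the point of removing the duplicated copies.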

06 Mar 2022 · 3 commits

wireless: Use netif_rx(). · f9834dbd

Committed by Sebastian Andrzej Siewior on Mar 05, 2022

Since commit
   baebdf48 ("net: dev: Makes sure netif_rx() can be invoked in any context.")

the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.

Use netif_rx().

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: Use netif_rx(). · 00f4a0af

Committed by Sebastian Andrzej Siewior on Mar 05, 2022

Since commit
   baebdf48 ("net: dev: Makes sure netif_rx() can be invoked in any context.")

the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.

Use netif_rx().

Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-can@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net/smc: don't req_notify until all CQEs drained" · 925a2421

Committed by Dust Li on Mar 04, 2022

This reverts commit a505cce6.

Leon says:
  We already discussed that. SMC should be changed to use
  RDMA CQ pool API
  drivers/infiniband/core/cq.c.
  ib_poll_handler() has much better implementation (tracing,
  IRQ rescheduling, proper error handling) than this SMC variant.

Since we will switch to ib_poll_handler() in the future,
revert this patch.

Link: https://lore.kernel.org/netdev/20220301105332.GA9417@linux.alibaba.com/
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Mar 2022 · 3 commits

net: dsa: tag_rtl8_4: add rtl8_4t trailing variant · cd87fecd

Committed by Luiz Angelo Daros de Luca on Mar 02, 2022

Realtek switches support the same tag either before the ethertype or
between the payload and the CRC.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: add the mibs for MP_RST · e40dd439

Committed by Geliang Tang on Mar 04, 2022

This patch adds two more MIBs for MP_RST: MPTCP_MIB_MPRSTTX for
sending MP_RST and MPTCP_MIB_MPRSTRX for receiving it.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add the mibs for MP_FASTCLOSE · 1e75629c

Committed by Geliang Tang on Mar 04, 2022

This patch adds two more MIBs for MP_FASTCLOSE: MPTCP_MIB_MPFASTCLOSETX
for sending MP_FASTCLOSE and MPTCP_MIB_MPFASTCLOSERX for receiving it.

It also adds a debug log for received MP_FASTCLOSE, printing the recv_key
of MP_FASTCLOSE in mptcp_parse_option() to show that MP_FASTCLOSE is received.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
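
A MIB here is simply a per-event counter. As a minimal sketch, assuming a flat counter array instead of the kernel's per-CPU SNMP statistics, the four counters named in these two commits could be modelled as below; the enum names mirror the commit messages, while mptcp_mibs, mptcp_mib_inc() and mptcp_mib_get() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical userspace model of the new MIB counters. The enum names
 * mirror the commit messages; storage and helpers are illustrative. */
enum mptcp_mib_field {
	MPTCP_MIB_MPRSTTX,        /* MP_RST sent */
	MPTCP_MIB_MPRSTRX,        /* MP_RST received */
	MPTCP_MIB_MPFASTCLOSETX,  /* MP_FASTCLOSE sent */
	MPTCP_MIB_MPFASTCLOSERX,  /* MP_FASTCLOSE received */
	MPTCP_MIB_MAX
};

static uint64_t mptcp_mibs[MPTCP_MIB_MAX];

/* Count one occurrence of the given event. */
void mptcp_mib_inc(enum mptcp_mib_field field)
{
	if (field < MPTCP_MIB_MAX)
		mptcp_mibs[field]++;
}

/* Read the current value of a counter (0 for an unknown field). */
uint64_t mptcp_mib_get(enum mptcp_mib_field field)
{
	return field < MPTCP_MIB_MAX ? mptcp_mibs[field] : 0;
}
```

Separate TX and RX counters let an operator tell whether resets or fast closes are initiated locally or by the peer.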

04 Mar 2022 · 22 commits

Bluetooth: use memset avoid memory leaks · d3715b23

Committed by Minghao Chi (CGEL ZTE) on Feb 25, 2022

Use memset() to initialize structs to prevent memory leaks
in l2cap_ecred_connect().

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: move adv_instance_cnt read within the device lock · 4bd80d7a

Committed by Niels Dossche on Feb 13, 2022

The field adv_instance_cnt is always accessed within a device lock,
except in the function add_advertising. A concurrent removal of an
advertisement while another one is being added could cause the check
"if a new instance was actually added" not to trigger, so the
"advertising added event" would not be triggered either.

Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: hci_event: Add missing locking on hdev in hci_le_ext_adv_term_evt · 728abc01

Committed by Niels Dossche on Feb 09, 2022

Both hci_find_adv_instance and hci_remove_adv_instance have a comment
above their function definition saying that these two functions require
the caller to hold the hdev->lock lock. However, hci_le_ext_adv_term_evt
does not acquire that lock and neither does its caller hci_le_meta_evt
(hci_le_meta_evt calls hci_le_ext_adv_term_evt via an indirect function
call because of the lookup in hci_le_ev_table).

The other event handlers all acquire and release the hdev->lock and they
follow the rule that hci_find_adv_instance and hci_remove_adv_instance
must be called while holding the hdev->lock lock.

The solution is to make sure hci_le_ext_adv_term_evt also acquires and
releases the hdev->lock lock. The check on ev->status which logs a
warning and does an early return is not covered by the lock because
other functions also access ev->status without holding the lock.

Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
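
The locking rule described above can be modelled in userspace with a pthread mutex standing in for hdev->lock. This is a hypothetical sketch: struct model_hdev, model_adv_term_evt() and the fixed-size instance table are illustrative, not the Bluetooth stack's data structures. As in the fix, the status check returns early before the lock is taken, while lookup and removal happen inside one critical section.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's struct hci_dev. */
#define MAX_ADV 8

struct model_hdev {
	pthread_mutex_t lock;      /* stands in for hdev->lock */
	bool adv_used[MAX_ADV];    /* which advertising instances exist */
	int adv_instance_cnt;
};

/* Find + remove; the caller must hold dev->lock, mirroring the comment
 * on hci_find_adv_instance()/hci_remove_adv_instance(). */
static bool model_remove_adv_locked(struct model_hdev *dev, int instance)
{
	if (instance < 0 || instance >= MAX_ADV || !dev->adv_used[instance])
		return false;
	dev->adv_used[instance] = false;
	dev->adv_instance_cnt--;
	return true;
}

/* Event handler: the status check stays outside the lock, the
 * lookup and removal run inside it. */
bool model_adv_term_evt(struct model_hdev *dev, int status, int instance)
{
	bool removed;

	if (status)
		return false;

	pthread_mutex_lock(&dev->lock);
	removed = model_remove_adv_locked(dev, instance);
	pthread_mutex_unlock(&dev->lock);
	return removed;
}
```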

Bluetooth: make array bt_uuid_any static const · e616fec6

Committed by Colin Ian King on Feb 14, 2022

Don't populate the read-only array bt_uuid_any on the stack; instead,
make it static const. This also makes the object code a little
smaller.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: 6lowpan: No need to clear memory twice · f1b8eea0

Committed by Christophe JAILLET on Feb 13, 2022

'peer_addr' is a structure embedded in 'struct lowpan_peer'. So there is no
need to explicitly call memset(0) on it. It is already zeroed by kzalloc()
when 'peer' is allocated.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Improve skb handling in mgmt_device_connected() · c2b2a1a7

Committed by Radoslaw Biernacki on Feb 01, 2022

This patch introduces eir_skb_put_data(), which can be used to simplify
operations on EIR data with the goal of eliminating intermediary
buffers.
eir_skb_put_data() is the counterpart of what eir_append_data() does with
the help of eir_len, but without the awkwardness of passing its return
value to skb_put() (eir_append_data() returns the updated offset, not a
size).

Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: Fix skb allocation in mgmt_remote_name() & mgmt_device_connected() · ba17bb62

Committed by Radoslaw Biernacki on Feb 01, 2022

This patch fixes the skb allocation, as the lack of space for ev might
push the skb tail beyond its end.
It also introduces eir_precalc_len(), which can be used instead of magic
numbers for similar EIR operations on the skb.

Fixes: cf1bce1d ("Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_FOUND")
Fixes: e9674143 ("Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED")
Signed-off-by: Angela Czubak <acz@semihalf.com>
Signed-off-by: Marek Maslanka <mm@semihalf.com>
Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
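
Each EIR/advertising-data field is laid out as a length byte (covering the type byte plus payload), a type byte and the payload, so a field with n payload bytes occupies n + 2 bytes; that is the kind of value a helper like eir_precalc_len() can compute instead of a magic number. The sketch below is a userspace model under that assumption: struct model_buf stands in for an skb, and both helpers are hypothetical. It returns the number of bytes written (a size, not an updated offset) and refuses a write that would run past the end of the buffer, the failure mode the allocation fix addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb with a fixed amount of tailroom. */
struct model_buf {
	uint8_t data[64];
	size_t len;            /* bytes used so far, like skb->len */
};

/* Space one field needs: 1 length byte + 1 type byte + payload. */
static size_t model_eir_precalc_len(uint8_t data_len)
{
	return 2u + data_len;
}

/* Append one [len][type][payload] field; return bytes written, or 0
 * if the field would not fit. */
size_t model_eir_put_data(struct model_buf *b, uint8_t type,
			  const uint8_t *payload, uint8_t data_len)
{
	size_t need = model_eir_precalc_len(data_len);

	if (need > sizeof(b->data) - b->len)
		return 0;                                  /* no room: refuse */
	b->data[b->len] = (uint8_t)(data_len + 1);     /* type + payload */
	b->data[b->len + 1] = type;
	memcpy(&b->data[b->len + 2], payload, data_len);
	b->len += need;
	return need;
}
```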

Bluetooth: mgmt: Remove unneeded variable · a6fbb2bf

Committed by Minghao Chi on Jan 18, 2022

Return value from mgmt_cmd_complete() directly instead
of taking this in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_sync: fix undefined return of hci_disconnect_all_sync() · 8cd3c55c

Committed by Tom Rix on Feb 01, 2022

clang static analysis reports this problem
hci_sync.c:4428:2: warning: Undefined or garbage value
  returned to caller
        return err;
        ^~~~~~~~~~

If there are no connections, this function is a no-op, but
err is never set and a false error could be reported.
Return 0 as other hci_* functions do.

Fixes: 182ee45d ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
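
The clang warning comes from a common C pattern: a status variable that is assigned only inside a loop and then returned after it. The model below is hypothetical (model_disconnect_all() and the example callbacks are illustrative, not the hci_sync code) and mirrors the described fix of returning 0 when there is nothing to disconnect.

```c
#include <stddef.h>

/* Hypothetical callback: disconnect one connection, 0 on success. */
typedef int (*model_disconnect_fn)(int conn_id);

/* Stop at the first error; otherwise report success. */
int model_disconnect_all(const int *conns, size_t n,
			 model_disconnect_fn disconnect)
{
	int err;  /* only assigned inside the loop */

	for (size_t i = 0; i < n; i++) {
		err = disconnect(conns[i]);
		if (err)
			return err;
	}

	/* "return err;" here would read an uninitialized value when
	 * n == 0, which is what the static analyzer flagged. */
	return 0;
}

/* Example callbacks for trying the model out (illustrative only). */
int model_disconnect_ok(int conn_id)
{
	(void)conn_id;
	return 0;
}

int model_disconnect_fails_on_2(int conn_id)
{
	return conn_id == 2 ? -5 : 0;  /* -5: arbitrary error code */
}
```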

net: dev: use kfree_skb_reason() for __netif_receive_skb_core() · 6c2728b7

Committed by Menglong Dong on Mar 04, 2022

Add a reason for skb drops to __netif_receive_skb_core() when no
packet_type is found to handle the skb. For this purpose, the drop reason
SKB_DROP_REASON_PTYPE_ABSENT is introduced. Taking Ethernet packets as an
example, this case mainly happens when the L3 protocol is not supported.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: use kfree_skb_reason() for sch_handle_ingress() · a568aff2

Committed by Menglong Dong on Mar 04, 2022

Replace kfree_skb() used in sch_handle_ingress() with
kfree_skb_reason(). The following drop reason is introduced:

SKB_DROP_REASON_TC_INGRESS

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: use kfree_skb_reason() for do_xdp_generic() · 7e726ed8

Committed by Menglong Dong on Mar 04, 2022

Replace kfree_skb() used in do_xdp_generic() with kfree_skb_reason().
The drop reason SKB_DROP_REASON_XDP is introduced for this case.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: use kfree_skb_reason() for enqueue_to_backlog() · 44f0bd40

Committed by Menglong Dong on Mar 04, 2022

Replace kfree_skb() used in enqueue_to_backlog() with
kfree_skb_reason(). The skb drop reason SKB_DROP_REASON_CPU_BACKLOG is
introduced for the case of failing to enqueue the skb to the per-CPU
backlog queue. The underlying cause can be a full backlog queue or the
RPS flow limit, and I think we don't need to distinguish further.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: add skb drop reasons to __dev_xmit_skb() · 7faef054

Committed by Menglong Dong on Mar 04, 2022

Add reasons for skb drops to __dev_xmit_skb() by replacing
kfree_skb_list() with kfree_skb_list_reason(). The drop reason
SKB_DROP_REASON_QDISC_DROP is introduced for qdisc enqueue failures.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb: introduce the function kfree_skb_list_reason() · 215b0f19

Committed by Menglong Dong on Mar 04, 2022

To report reasons of skb drops, introduce the function
kfree_skb_list_reason() and make kfree_skb_list() an inline call to
it. This function will be used in the next commit in
__dev_xmit_skb().

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
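
The pattern in this commit, where the reason-aware function becomes the real implementation and the old name becomes a thin inline wrapper, can be sketched in userspace. Everything below is hypothetical: the model types, the two reasons, and the counter array, which stands in for the point where the kernel would report the drop reason.

```c
#include <stddef.h>

/* Hypothetical reasons; the kernel's enum skb_drop_reason is larger. */
enum model_drop_reason {
	MODEL_DROP_NOT_SPECIFIED,
	MODEL_DROP_QDISC_DROP,
	MODEL_DROP_MAX
};

struct model_skb {
	struct model_skb *next;   /* singly linked list of segments */
};

static unsigned long model_drops[MODEL_DROP_MAX];

/* Drop (here: only count) every skb on the list under one reason. */
void model_kfree_skb_list_reason(struct model_skb *segs,
				 enum model_drop_reason reason)
{
	while (segs) {
		struct model_skb *next = segs->next;

		model_drops[reason]++;   /* where the reason is reported */
		segs = next;
	}
}

/* Old entry point kept as a wrapper: no reason means "not specified". */
static inline void model_kfree_skb_list(struct model_skb *segs)
{
	model_kfree_skb_list_reason(segs, MODEL_DROP_NOT_SPECIFIED);
}

unsigned long model_drop_count(enum model_drop_reason reason)
{
	return model_drops[reason];
}
```

Existing callers of the old name keep compiling unchanged; only call sites that know why a list is dropped, such as a failed qdisc enqueue, need to switch to the reason-taking variant.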

net: dev: use kfree_skb_reason() for sch_handle_egress() · 98b4d7a4

Committed by Menglong Dong on Mar 04, 2022

Replace kfree_skb() used in sch_handle_egress() with kfree_skb_reason().
The drop reason SKB_DROP_REASON_TC_EGRESS is introduced. Considering
the code path of tc egress, we make it distinct from the drop reason
SKB_DROP_REASON_QDISC_DROP introduced in the next commit.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: Use netif_rx(). · ad0a043f

Committed by Sebastian Andrzej Siewior on Mar 03, 2022

Since commit
   baebdf48 ("net: dev: Makes sure netif_rx() can be invoked in any context.")

the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.

Use netif_rx().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Use netif_rx(). · 2e83bdd5

Committed by Sebastian Andrzej Siewior on Mar 03, 2022

Since commit
   baebdf48 ("net: dev: Makes sure netif_rx() can be invoked in any context.")

the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.

Use netif_rx().

Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: caif: Use netif_rx(). · 3fb4430e

Committed by Sebastian Andrzej Siewior on Mar 03, 2022

Since commit
   baebdf48 ("net: dev: Makes sure netif_rx() can be invoked in any context.")

the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.

Use netif_rx().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() · 2d3916f3

Committed by Eric Dumazet on Mar 03, 2022

While investigating why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.

Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.

Fixes: f185de28 ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change · e1bec7fa

Committed by Vladimir Oltean on Mar 03, 2022

The blamed commit said one thing but did another. It explains that we
should restore the "return err" to the original "goto out_unwind_tagger",
but instead it replaced it with "goto out_unlock".

When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
multi-switch tree, the switches would end up not using the same tagging
protocol.

Fixes: 0b0e2ff1 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
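
The difference between unwinding and merely unlocking is easiest to see with a loop over the switches of a tree. In this hypothetical model (struct model_switch and model_tree_change_proto() are illustrative, fail_change is a test hook, and rollback is a plain assignment rather than a second notifier call), a failure on switch i restores switches 0..i-1 to the old protocol, so the tree never ends up with mixed tagging protocols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one switch in a multi-switch tree. */
struct model_switch {
	int proto;          /* current tagging protocol */
	bool fail_change;   /* test hook: refuse the next change */
};

/* All-or-nothing protocol change across n switches. */
int model_tree_change_proto(struct model_switch *sw, size_t n,
			    int old_proto, int new_proto)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (sw[i].fail_change)
			goto out_unwind;   /* not just "unlock and return" */
		sw[i].proto = new_proto;
	}
	return 0;

out_unwind:
	/* Restore every switch that was already changed. */
	while (i-- > 0)
		sw[i].proto = old_proto;
	return -1;
}
```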

net: dcb: disable softirqs in dcbnl_flush_dev() · 10b6bb62

Committed by Vladimir Oltean on Mar 02, 2022

Ido Schimmel points out that since commit 52cff74e ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.

One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
   -> dcb_ieee_setapp

If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().

Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.

Fixes: 91b0383f ("net: dcb: flush lingering app table entries for unregistered devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

03 Mar 2022 · 11 commits

bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time() · 8d21ec0e

Committed by Martin KaFai Lau on Mar 02, 2022

* __sk_buff->delivery_time_type:
This patch adds __sk_buff->delivery_time_type.  It tells if the
delivery_time is stored in __sk_buff->tstamp or not.

It will be most useful for ingress to tell if the __sk_buff->tstamp
has the (rcv) timestamp or delivery_time.  If delivery_time_type
is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp.

Two non-zero types are defined for the delivery_time_type,
BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC.  UNSPEC
can only happen at egress, because only a mono delivery_time can be
forwarded to ingress now.  The clock of an UNSPEC delivery_time
can be deduced from skb->sk->sk_clockid, which is also how
sch_etf does it.

* Provide forwarded delivery_time to tc-bpf@ingress:
With the help of the new delivery_time_type, the tc-bpf has a way
to tell if the __sk_buff->tstamp has the (rcv) timestamp or
the delivery_time.  During bpf load time, the verifier will learn if
the bpf prog has accessed the new __sk_buff->delivery_time_type.
If it does, the tc-bpf@ingress expects that skb->tstamp may
hold the delivery_time.  The kernel will then read skb->tstamp
as-is during the bpf insn rewrite without checking
skb->mono_delivery_time.  This is done by adding a new
prog->delivery_time_access bit.  The same goes for writing
skb->tstamp.

* bpf_skb_set_delivery_time():
The bpf_skb_set_delivery_time() helper is added to allow setting both
delivery_time and the delivery_time_type at the same time.  If the
tc-bpf does not need to change the delivery_time_type, it can directly
write to the __sk_buff->tstamp as the existing tc-bpf has already been
doing.  It will be most useful at ingress to change the
__sk_buff->tstamp from the (rcv) timestamp to
a mono delivery_time and then bpf_redirect_*().

bpf only has a mono clock helper (bpf_ktime_get_ns), the
currently known use case is the mono EDT for fq, and
only a mono delivery time can be kept during forwarding now,
so bpf_skb_set_delivery_time() only supports setting
BPF_SKB_DELIVERY_TIME_MONO.  It can be extended later when use cases
come up and the forwarding path also supports other clock bases.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress · 7449197d

Committed by Martin KaFai Lau on Mar 02, 2022

The current tc-bpf@ingress reads and writes the __sk_buff->tstamp
as a (rcv) timestamp which currently could either be 0 (not available)
or ktime_get_real().  This patch keeps backward compatibility with the
(rcv) timestamp expectation at ingress.  If the skb->tstamp has
the delivery_time, the bpf insn rewrite will read 0 for tc-bpf
running at ingress as it is not available.  When writing at ingress,
it will also clear the skb->mono_delivery_time bit.

/* BPF_READ: a = __sk_buff->tstamp */
if (!skb->tc_at_ingress || !skb->mono_delivery_time)
	a = skb->tstamp;
else
	a = 0;

/* BPF_WRITE: __sk_buff->tstamp = a */
if (skb->tc_at_ingress)
	skb->mono_delivery_time = 0;
skb->tstamp = a;

[ A note on the BPF_CGROUP_INET_INGRESS which can also access
  skb->tstamp.  At that point, the skb is delivered locally
  and skb_clear_delivery_time() has already been done,
  so the skb->tstamp will only have the (rcv) timestamp. ]

If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time
has to be cleared also.  It could be done together during
convert_ctx_access().  However, a later patch will also expose
the skb->mono_delivery_time bit as __sk_buff->delivery_time_type.
Changing the delivery_time_type in the background may surprise
the user, e.g. the 2nd read on __sk_buff->delivery_time_type
may need a READ_ONCE() to avoid compiler optimization.  Thus,
anticipating the needs of that later patch, this patch checks
!skb->tstamp after running the tc-bpf and clears the
skb->mono_delivery_time bit if needed.  See the earlier discussion
on v4 [0].

The bpf insn rewrite requires the skb's mono_delivery_time bit and
tc_at_ingress bit.  They are moved up in sk_buff so that bpf rewrite
can be done at a fixed offset.  tc_skip_classify is moved together with
tc_at_ingress.  To get one bit for mono_delivery_time, csum_not_inet is
moved down and this bit is currently used by sctp.

[0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
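
The BPF_READ/BPF_WRITE pseudocode above, plus the post-run check on a zeroed tstamp, can be turned into a small runnable C model. The struct only stands in for the three sk_buff bits involved and all function names are hypothetical; in the kernel this logic is emitted as rewritten BPF instructions rather than called as C functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields used by the rewrite. */
struct model_skb {
	int64_t tstamp;
	bool mono_delivery_time;
	bool tc_at_ingress;
};

/* a = __sk_buff->tstamp: hide a delivery_time from tc-bpf at ingress. */
int64_t model_bpf_read_tstamp(const struct model_skb *skb)
{
	if (!skb->tc_at_ingress || !skb->mono_delivery_time)
		return skb->tstamp;
	return 0;
}

/* __sk_buff->tstamp = a: a write at ingress is a (rcv) timestamp,
 * so the mono_delivery_time bit is cleared. */
void model_bpf_write_tstamp(struct model_skb *skb, int64_t a)
{
	if (skb->tc_at_ingress)
		skb->mono_delivery_time = false;
	skb->tstamp = a;
}

/* After tc-bpf runs: a zeroed tstamp cannot carry a delivery_time. */
void model_after_tc_bpf(struct model_skb *skb)
{
	if (!skb->tstamp)
		skb->mono_delivery_time = false;
}
```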

net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally · cd14e9b7

Committed by Martin KaFai Lau on Mar 02, 2022

The previous patches handled the delivery_time in the ingress path
before the routing decision is made.  This patch can postpone clearing
delivery_time in a skb until knowing it is delivered locally and also
set the (rcv) timestamp if needed.  This patch moves the
skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
and ip6_input_finish().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Get rcv tstamp if needed in nfnetlink_{log, queue}.c · 80fcec67

Committed by Martin KaFai Lau on Mar 02, 2022

If skb has the (rcv) timestamp available, nfnetlink_{log, queue}.c
logs/outputs it to the userspace.  When the locally generated skb is
looping from egress to ingress over a virtual interface (e.g. veth,
loopback...), skb->tstamp may hold the delivery time before it is
known that it will be delivered locally and received by another sk.  Like
handling the delivery time in network tapping,  use ktime_get_real() to
get the (rcv) timestamp.  The earlier added helper skb_tstamp_cond() is
used to do this.  false is passed to the second 'cond' arg such
that doing ktime_get_real() or not only depends on the
netstamp_needed_key static key.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option · b6561f84

Committed by Martin KaFai Lau on Mar 02, 2022

IOAM is a hop-by-hop option with a temporary IANA allocation (49).
Since it is hop-by-hop, it is processed before the input routing decision.
One of the traced data fields is the (rcv) timestamp.

When the locally generated skb is looping from egress to ingress over
a virtual interface (e.g. veth, loopback...), skb->tstamp may have the
delivery time before it is known that it will be delivered locally
and received by another sk.

Like handling the network tapping (tcpdump) in the earlier patch,
this patch gets the timestamp if needed without over-writing the
delivery_time in the skb->tstamp.  skb_tstamp_cond() is added to do the
ktime_get_real() with an extra cond arg to check on top of the
netstamp_needed_key static key.  skb_tstamp_cond() will also be used in
a later patch, which needs the netstamp_needed_key check.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
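
The behaviour described for skb_tstamp_cond() can be modelled as follows. This is a hypothetical userspace sketch: model_netstamp_needed stands in for the netstamp_needed_key static key, model_now() for ktime_get_real(), and the early return for an skb that already carries a non-zero (rcv) timestamp is an assumption beyond what the message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant sk_buff fields. */
struct model_skb {
	int64_t tstamp;
	bool mono_delivery_time;
};

bool model_netstamp_needed;   /* stands in for netstamp_needed_key */
int64_t model_clock = 42;     /* fake wall-clock reading */

static int64_t model_now(void) { return model_clock; }

/* Return an (rcv) timestamp without overwriting skb->tstamp. */
int64_t model_skb_tstamp_cond(const struct model_skb *skb, bool cond)
{
	/* Assumed: a non-zero tstamp that is not a delivery_time is
	 * already an (rcv) timestamp and can be reused. */
	if (!skb->mono_delivery_time && skb->tstamp)
		return skb->tstamp;
	/* Otherwise read the clock only if someone wants timestamps. */
	if (model_netstamp_needed || cond)
		return model_now();
	return 0;
}
```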

net: ipv6: Handle delivery_time in ipv6 defrag · 335c8cf3

Committed by Martin KaFai Lau on Mar 02, 2022

A later patch will postpone the delivery_time clearing until the stack
knows the skb is being delivered locally (i.e. calling
skb_clear_delivery_time() at ip_local_deliver_finish() for IPv4
and at ip6_input_finish() for IPv6).  That will allow other kernel
forwarding paths (e.g. ip[6]_forward) to keep the delivery_time as well.

Very similar IPv6 defrag code has been duplicated in
multiple places: regular IPv6, nf_conntrack, and 6lowpan.

Unlike the IPv4 defrag which is done before ip_local_deliver_finish(),
the regular IPv6 defrag is done after ip6_input_finish().
Thus, no change should be needed in the regular IPv6 defrag
logic because skb_clear_delivery_time() should have been called.

6lowpan also does not need special handling on delivery_time
because it is a non-inet packet_type.

However, nf_conntrack has a case in NF_INET_PRE_ROUTING that needs
to do the IPv6 defrag earlier.  Thus, it needs to save the
mono_delivery_time bit in the inet_frag_queue, similar
to how it is handled in the previous patch for the IPv4 defrag.

This patch chooses to do it consistently and stores the mono_delivery_time
in the inet_frag_queue for all cases such that it will be easier
for the future refactoring effort on the IPv6 reasm code.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ip: Handle delivery_time in ip defrag · 8672406e

Committed by Martin KaFai Lau on Mar 02, 2022

A later patch will postpone the delivery_time clearing until the stack
knows the skb is being delivered locally.  That will allow other kernel
forwarding paths (e.g. ip[6]_forward) to keep the delivery_time as well.

An earlier attempt was to do skb_clear_delivery_time() in
ip_local_deliver() and ip6_input().  The discussion [0] requested
to move it one step later into ip_local_deliver_finish()
and ip6_input_finish() so that the delivery_time can be kept
for the ip_vs forwarding path also.

To do that, this patch also needs to take care of the (rcv) timestamp
use case in ip_is_fragment().  It needs to expect a delivery_time in
the skb->tstamp, so it needs to save the mono_delivery_time bit in
inet_frag_queue such that the delivery_time (if any) can be restored
in the final defragmented skb.

[Note that it will only happen when the locally generated skb is looping
 from egress to ingress over a virtual interface (e.g. veth, loopback...),
 skb->tstamp may have the delivery time before it is known that it will
 be delivered locally and received by another sk.]

[0]: https://lore.kernel.org/netdev/ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() · d98d58a0

Committed by Martin KaFai Lau on Mar 02, 2022

The previous patches handled the delivery_time before sch_handle_ingress().

This patch can now set skb->mono_delivery_time to flag that skb->tstamp
is used as the mono delivery_time (EDT) instead of the (rcv) timestamp,
and clear it with skb_clear_delivery_time() after
sch_handle_ingress().  This allows bpf_redirect_*() to keep the mono
delivery_time so that it can be used by a qdisc (fq) on the egress
interface.

A later patch will postpone skb_clear_delivery_time() until the
stack learns that the skb is being delivered locally, which will
allow other kernel forwarding paths (ip[6]_forward) to keep
the delivery_time as well.  Thus, like the previous patches that use
the skb->mono_delivery_time bit, calling skb_clear_delivery_time()
is not limited to CONFIG_NET_INGRESS, to avoid too much code
churn within this set.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Clear mono_delivery_time bit in __skb_tstamp_tx() · d93376f5

Committed by Martin KaFai Lau on Mar 02, 2022

In __skb_tstamp_tx(), the egress skb may be cloned and the clone queued
to the sk_error_queue.  The outgoing skb may have the mono delivery_time
while the (rcv) timestamp is expected for the clone, so the
skb->mono_delivery_time bit needs to be cleared from the clone.

This patch adds the skb->mono_delivery_time clearing to the existing
__net_timestamp() and uses it in __skb_tstamp_tx().
The __net_timestamp() fast path usage in dev.c is changed to directly
call ktime_get_real() since the mono_delivery_time bit is not set at
that point.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Handle delivery_time in skb->tstamp during network tapping with af_packet · 27942a15

Committed by Martin KaFai Lau on Mar 02, 2022

A later patch will set skb->mono_delivery_time to flag that skb->tstamp
is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
skb_clear_tstamp() will then keep this delivery_time during forwarding.

This patch makes network tapping (with af_packet) handle
the delivery_time stored in skb->tstamp.

Regardless of tapping at the ingress or egress,  the tapped skb is
received by the af_packet socket, so it is ingress to the af_packet
socket and it expects the (rcv) timestamp.

When tapping at egress, dev_queue_xmit_nit() is used.  It already
expects that skb->tstamp may hold a delivery_time, so it does
skb_clone()+net_timestamp_set() to ensure the cloned skb has
the (rcv) timestamp before passing it to the af_packet sk.
This patch only adds clearing of the skb->mono_delivery_time
bit to net_timestamp_set().

When tapping at ingress, it currently expects skb->tstamp to be either 0
or the (rcv) timestamp.  That is, the tapping at ingress path
already expects that skb->tstamp could be 0 and will get
the (rcv) timestamp by ktime_get_real() when needed.

There are two cases for tapping at ingress:

In one case, af_packet queues the skb to its sk_receive_queue.
The skb is either not shared or a new clone is created.  The newly
added skb_clear_delivery_time() is called to clear the
delivery_time (if any) and set the (rcv) timestamp if
needed before the skb is queued to the sk_receive_queue.

In the other case, the ingress skb is directly copied to the rx_ring
and tpacket_get_timestamp() is used to get the (rcv) timestamp.
The newly added skb_tstamp() is used in tpacket_get_timestamp()
to check the skb->mono_delivery_time bit before returning skb->tstamp.
As mentioned earlier, the tapping@ingress has already expected
the skb may not have the (rcv) timestamp (because no sk has asked
for it) and has handled this case by directly calling ktime_get_real().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
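
The two helpers used on the af_packet paths can be sketched in userspace. This is a hypothetical model, not the kernel code: model_skb_tstamp() reports 0 instead of a delivery_time, as described for skb_tstamp(), and model_skb_clear_delivery_time() drops the delivery_time and stamps a fresh (rcv) time only when timestamps are wanted. model_netstamp_needed and model_clock stand in for the netstamp_needed_key static key and ktime_get_real().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant sk_buff fields. */
struct model_skb {
	int64_t tstamp;
	bool mono_delivery_time;
};

bool model_netstamp_needed;   /* stands in for netstamp_needed_key */
int64_t model_clock = 42;     /* fake ktime_get_real() reading */

/* Never report a delivery_time as an (rcv) timestamp; callers that
 * see 0 fetch the time themselves. */
int64_t model_skb_tstamp(const struct model_skb *skb)
{
	if (skb->mono_delivery_time)
		return 0;
	return skb->tstamp;
}

/* Drop the delivery_time (if any) before local delivery and replace
 * it with an (rcv) timestamp only if timestamps are wanted. */
void model_skb_clear_delivery_time(struct model_skb *skb)
{
	if (skb->mono_delivery_time) {
		skb->mono_delivery_time = false;
		skb->tstamp = model_netstamp_needed ? model_clock : 0;
	}
}
```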

net: Add skb_clear_tstamp() to keep the mono delivery_time · de799101

Committed by Martin KaFai Lau on Mar 02, 2022

Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.

If skb->tstamp has the mono delivery_time, clearing it can hurt
the performance when it finally transmits out to fq@phy-dev.

The earlier patch added an skb->mono_delivery_time bit to
flag that skb->tstamp carries the mono delivery_time.

This patch adds the skb_clear_tstamp() helper, which keeps
the mono delivery_time and clears everything else.

The delivery_time clearing will be postponed until the stack knows the
skb will be delivered locally.  This will be done in a later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
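
The skb_clear_tstamp() behaviour described above fits in a few lines. In this hypothetical userspace model (the struct and function names are illustrative), a mono delivery_time survives the forwarding-path reset, so fq on the egress device can still use it, while any other timestamp is cleared as before.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant sk_buff fields. */
struct model_skb {
	int64_t tstamp;
	bool mono_delivery_time;
};

/* Forwarding-path reset: keep a mono delivery_time (EDT), clear
 * everything else. */
void model_skb_clear_tstamp(struct model_skb *skb)
{
	if (skb->mono_delivery_time)
		return;
	skb->tstamp = 0;
}
```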

openeuler / Kernel: synced successfully nearly 2 years ago