提交 · 932c1a8fc9e817de24faf7a1de919ff509ef1d17 · openeuler / Kernel

18 7月, 2023 1 次提交

net: core: Add a GID field to struct sock. · 932c1a8f

由 JofDiamonds 提交于 7月 17, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7LTRR
CVE: NA

Reference: https://gitee.com/openeuler/kernel/commit/f6740a11189620e5fd5ec0642c41b00f71b01689

--------------------------------

UID and GID are requested as filters for socketmap, but we can only get
UID from sock structure. This patch adds GID field to struct sock as UID.
Signed-off-by: NLu Wei <luwei32@huawei.com>
Signed-off-by: NLiu Jian <liujian56@huawei.com>
Conflicts:
	include/net/sock.h
	net/core/sock.c
Signed-off-by: NJofDiamonds <kwb0523@163.com>
Reviewed-by: Nwuchangye <wuchangye@huawei.com>

932c1a8f

10 5月, 2023 1 次提交

net: annotate sk->sk_err write from do_recvmmsg() · e05a5f51

由 Eric Dumazet 提交于 5月 09, 2023

do_recvmmsg() can write to sk->sk_err from multiple threads.

As said before, many other points reading or writing sk_err
need annotations.

Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Reviewed-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e05a5f51

19 4月, 2023 1 次提交

net: skbuff: hide wifi_acked when CONFIG_WIRELESS not set · eb6fba75

由 Jakub Kicinski 提交于 4月 17, 2023

Datacenter kernel builds will very likely not include WIRELESS,
so let them shave 2 bits off the skb by hiding the wifi fields.
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb6fba75

14 3月, 2023 1 次提交

net: socket: suppress unused warning · ad4bf5f2

由 Vincenzo Palazzo 提交于 3月 10, 2023

suppress unused warnings and fix the error that there is
with the W=1 enabled.

Warning generated

net/socket.c: In function ‘__sys_getsockopt’:
net/socket.c:2300:13: error: variable ‘max_optlen’ set but not used [-Werror=unused-but-set-variable]
 2300 |         int max_optlen;
Signed-off-by: NVincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230310221851.304657-1-vincenzopalazzodev@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

ad4bf5f2

09 3月, 2023 1 次提交

net: avoid double iput when sock_alloc_file fails · 649c15c7

由 Thadeu Lima de Souza Cascardo 提交于 3月 07, 2023

When sock_alloc_file fails to allocate a file, it will call sock_release.
__sys_socket_file should then not call sock_release again, otherwise there
will be a double free.

[   89.319884] ------------[ cut here ]------------
[   89.320286] kernel BUG at fs/inode.c:1764!
[   89.320656] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   89.321051] CPU: 7 PID: 125 Comm: iou-sqp-124 Not tainted 6.2.0+ #361
[   89.321535] RIP: 0010:iput+0x1ff/0x240
[   89.321808] Code: d1 83 e1 03 48 83 f9 02 75 09 48 81 fa 00 10 00 00 77 05 83 e2 01 75 1f 4c 89 ef e8 fb d2 ba 00 e9 80 fe ff ff c3 cc cc cc cc <0f> 0b 0f 0b e9 d0 fe ff ff 0f 0b eb 8d 49 8d b4 24 08 01 00 00 48
[   89.322760] RSP: 0018:ffffbdd60068bd50 EFLAGS: 00010202
[   89.323036] RAX: 0000000000000000 RBX: ffff9d7ad3cacac0 RCX: 0000000000001107
[   89.323412] RDX: 000000000003af00 RSI: 0000000000000000 RDI: ffff9d7ad3cacb40
[   89.323785] RBP: ffffbdd60068bd68 R08: ffffffffffffffff R09: ffffffffab606438
[   89.324157] R10: ffffffffacb3dfa0 R11: 6465686361657256 R12: ffff9d7ad3cacb40
[   89.324529] R13: 0000000080000001 R14: 0000000080000001 R15: 0000000000000002
[   89.324904] FS:  00007f7b28516740(0000) GS:ffff9d7aeb1c0000(0000) knlGS:0000000000000000
[   89.325328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   89.325629] CR2: 00007f0af52e96c0 CR3: 0000000002a02006 CR4: 0000000000770ee0
[   89.326004] PKRU: 55555554
[   89.326161] Call Trace:
[   89.326298]  <TASK>
[   89.326419]  __sock_release+0xb5/0xc0
[   89.326632]  __sys_socket_file+0xb2/0xd0
[   89.326844]  io_socket+0x88/0x100
[   89.327039]  ? io_issue_sqe+0x6a/0x430
[   89.327258]  io_issue_sqe+0x67/0x430
[   89.327450]  io_submit_sqes+0x1fe/0x670
[   89.327661]  io_sq_thread+0x2e6/0x530
[   89.327859]  ? __pfx_autoremove_wake_function+0x10/0x10
[   89.328145]  ? __pfx_io_sq_thread+0x10/0x10
[   89.328367]  ret_from_fork+0x29/0x50
[   89.328576] RIP: 0033:0x0
[   89.328732] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[   89.329073] RSP: 002b:0000000000000000 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
[   89.329477] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b28637a3d
[   89.329845] RDX: 00007fff4e4318a8 RSI: 00007fff4e4318b0 RDI: 0000000000000400
[   89.330216] RBP: 00007fff4e431830 R08: 00007fff4e431711 R09: 00007fff4e4318b0
[   89.330584] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff4e441b38
[   89.330950] R13: 0000563835e3e725 R14: 0000563835e40d10 R15: 00007f7b28784040
[   89.331318]  </TASK>
[   89.331441] Modules linked in:
[   89.331617] ---[ end trace 0000000000000000 ]---

Fixes: da214a47 ("net: add __sys_socket_file()")
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230307173707.468744-1-cascardo@canonical.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

649c15c7

15 2月, 2023 1 次提交

net: use a bounce buffer for copying skb->mark · 2558b803

由 Eric Dumazet 提交于 2月 13, 2023

syzbot found arm64 builds would crash in sock_recv_mark()
when CONFIG_HARDENED_USERCOPY=y

x86 and powerpc are not detecting the issue because
they define user_access_begin.
This will be handled in a different patch,
because a check_object_size() is missing.

Only data from skb->cb[] can be copied directly to/from user space,
as explained in commit 79a8a642 ("net: Whitelist
the skbuff_head_cache "cb" field")

syzbot report was:
usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 168, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:102 !
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 4410 Comm: syz-executor533 Not tainted 6.2.0-rc7-syzkaller-17907-g2d3827b3f393 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:90
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:90
sp : ffff80000fb9b9a0
x29: ffff80000fb9b9b0 x28: ffff0000c6073400 x27: 0000000020001a00
x26: 0000000000000014 x25: ffff80000cf52000 x24: fffffc0000000000
x23: 05ffc00000000200 x22: fffffc000324bf80 x21: ffff0000c92fe1a8
x20: 0000000000000001 x19: 0000000000000004 x18: 0000000000000000
x17: 656a626f2042554c x16: ffff0000c6073dd0 x15: ffff80000dbd2118
x14: ffff0000c6073400 x13: 00000000ffffffff x12: ffff0000c6073400
x11: ff808000081bbb4c x10: 0000000000000000 x9 : 7b0572d7cc0ccf00
x8 : 7b0572d7cc0ccf00 x7 : ffff80000bf650d4 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000
x2 : ffff0001fefbff08 x1 : 0000000100000000 x0 : 000000000000006c
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:90
__check_heap_object+0xa8/0x100 mm/slub.c:4761
check_heap_object mm/usercopy.c:196 [inline]
__check_object_size+0x208/0x6b8 mm/usercopy.c:251
check_object_size include/linux/thread_info.h:199 [inline]
__copy_to_user include/linux/uaccess.h:115 [inline]
put_cmsg+0x408/0x464 net/core/scm.c:238
sock_recv_mark net/socket.c:975 [inline]
__sock_recv_cmsgs+0x1fc/0x248 net/socket.c:984
sock_recv_cmsgs include/net/sock.h:2728 [inline]
packet_recvmsg+0x2d8/0x678 net/packet/af_packet.c:3482
____sys_recvmsg+0x110/0x3a0
___sys_recvmsg net/socket.c:2737 [inline]
__sys_recvmsg+0x194/0x210 net/socket.c:2767
__do_sys_recvmsg net/socket.c:2777 [inline]
__se_sys_recvmsg net/socket.c:2774 [inline]
__arm64_sys_recvmsg+0x2c/0x3c net/socket.c:2774
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x64/0x178 arch/arm64/kernel/syscall.c:52
el0_svc_common+0xbc/0x180 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x110 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x14c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: 91388800 aa0903e1 f90003e8 94e6d752 (d4210000)

Fixes: 6fd1d51c ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()")
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Erin MacNeil <lnx.erin@gmail.com>
Reviewed-by: NAlexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230213160059.3829741-1-edumazet@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

2558b803

19 1月, 2023 2 次提交

fs: port xattr to mnt_idmap · 39f60c1c

由 Christian Brauner 提交于 1月 13, 2023

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner (Microsoft) <brauner@kernel.org>

39f60c1c

fs: port ->setattr() to pass mnt_idmap · c1632a0f

由 Christian Brauner 提交于 1月 13, 2023

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner (Microsoft) <brauner@kernel.org>

c1632a0f

13 1月, 2023 1 次提交

sock: add tracepoint for send recv length · 6e6eda44

由 Yunhui Cui 提交于 1月 11, 2023

Add 2 tracepoints to monitor the tcp/udp traffic
of per process and per cgroup.

Regarding monitoring the tcp/udp traffic of each process, there are two
existing solutions, the first one is https://www.atoptool.nl/netatop.php.
The second is via kprobe/kretprobe.

Netatop solution is implemented by registering the hook function at the
hook point provided by the netfilter framework.

These hook functions may be in the soft interrupt context and cannot
directly obtain the pid. Some data structures are added to bind packets
and processes. For example, struct taskinfobucket, struct taskinfo ...

Every time the process sends and receives packets it needs multiple
hashmaps,resulting in low performance and it has the problem fo inaccurate
tcp/udp traffic statistics(for example: multiple threads share sockets).

We can obtain the information with kretprobe, but as we know, kprobe gets
the result by trappig in an exception, which loses performance compared
to tracepoint.

We compared the performance of tracepoints with the above two methods, and
the results are as follows:

ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)

netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)

kr:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)

tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests

It can be seen that tracepoint has better performance.
Signed-off-by: NYunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: NXiongchun Duan <duanxiongchun@bytedance.com>
Reviewed-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e6eda44

26 11月, 2022 1 次提交

use less confusing names for iov_iter direction initializers · de4eda9d

由 Al Viro 提交于 9月 15, 2022

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

de4eda9d

24 10月, 2022 1 次提交

net: introduce and use custom sockopt socket flag · a5ef058d

由 Paolo Abeni 提交于 10月 20, 2022

We will soon introduce custom setsockopt for UDP sockets, too.
Instead of doing even more complex arbitrary checks inside
sock_use_custom_sol_socket(), add a new socket flag and set it
for the relevant socket types (currently only MPTCP).
Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Acked-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5ef058d

24 8月, 2022 1 次提交

net: Fix a data-race around sysctl_somaxconn. · 3c9ba81d

由 Kuniyuki Iwashima 提交于 8月 23, 2022

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c9ba81d

20 8月, 2022 1 次提交
- A
  dynamic_dname(): drop unused dentry argument · 0f60d288
  由 Al Viro 提交于 1月 30, 2022
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  0f60d288
25 7月, 2022 1 次提交

net: copy from user before calling __copy_msghdr · 7fa875b8

由 Dylan Yudaken 提交于 7月 14, 2022

this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7fa875b8

20 7月, 2022 1 次提交

skbuff: carry external ubuf_info in msghdr · 7c701d92

由 Pavel Begunkov 提交于 7月 12, 2022

Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

7c701d92

24 6月, 2022 1 次提交

net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user() · 1228b34c

由 Eric Dumazet 提交于 6月 22, 2022

syzbot reported uninit-value in tcp_recvmsg() [1]

Issue here is that msg->msg_get_inq should have been cleared,
otherwise tcp_recvmsg() might read garbage and perform
more work than needed, or have undefined behavior.

Given CONFIG_INIT_STACK_ALL_ZERO=y is probably going to be
the default soon, I chose to change __sys_recvfrom() to clear
all fields but msghdr.addr which might be not NULL.

For __copy_msghdr_from_user(), I added an explicit clear
of kmsg->msg_get_inq.

[1]
BUG: KMSAN: uninit-value in tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
inet_recvmsg+0x13a/0x5a0 net/ipv4/af_inet.c:850
sock_recvmsg_nosec net/socket.c:995 [inline]
sock_recvmsg net/socket.c:1013 [inline]
__sys_recvfrom+0x696/0x900 net/socket.c:2176
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Local variable msg created at:
__sys_recvfrom+0x81/0x900 net/socket.c:2154
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190

CPU: 0 PID: 3493 Comm: syz-executor170 Not tainted 5.19.0-rc3-syzkaller-30868-g4b28366af7d9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: f94fd25c ("tcp: pass back data left in socket after receive")
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Tested-by: Alexander Potapenko<glider@google.com>
Link: https://lore.kernel.org/r/20220622150220.1091182-1-edumazet@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

1228b34c

13 6月, 2022 1 次提交

net: make __sys_accept4_file() static · c0424532

由 Yajun Deng 提交于 6月 10, 2022

__sys_accept4_file() isn't used outside of the file, make it static.

As the same time, move file_flags and nofile parameters into
__sys_accept4_file().
Signed-off-by: NYajun Deng <yajun.deng@linux.dev>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0424532

10 5月, 2022 3 次提交

ptp: Support late timestamp determination · 97dc7cd9

由 Gerhard Engleder 提交于 5月 06, 2022

If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.

Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.

This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: NGerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>

97dc7cd9

ptp: Pass hwtstamp to ptp_convert_timestamp() · d58809d8

由 Gerhard Engleder 提交于 5月 06, 2022

ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.

Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: NGerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>

d58809d8

ptp: Request cycles for TX timestamp · 51eb7492

由 Gerhard Engleder 提交于 5月 06, 2022

The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.

Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: NGerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>

51eb7492

29 4月, 2022 1 次提交

net: SO_RCVMARK socket option for SO_MARK with recvmsg() · 6fd1d51c

由 Erin MacNeil 提交于 4月 27, 2022

Adding a new socket option, SO_RCVMARK, to indicate that SO_MARK
should be included in the ancillary data returned by recvmsg().

Renamed the sock_recv_ts_and_drops() function to sock_recv_cmsgs().
Signed-off-by: NErin MacNeil <lnx.erin@gmail.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Acked-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20220427200259.2564-1-lnx.erin@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

6fd1d51c

25 4月, 2022 1 次提交

net: add __sys_socket_file() · da214a47

由 Jens Axboe 提交于 4月 12, 2022

This works like __sys_socket(), except instead of allocating and
returning a socket fd, it just returns the file associated with the
socket. No fd is installed into the process file table.

This is similar to do_accept(), and allows io_uring to use this without
instantiating a file descriptor in the process file table.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/20220412202240.234207-2-axboe@kernel.dk

da214a47

23 3月, 2022 1 次提交

fs: allocate inode by using alloc_inode_sb() · fd60b288

由 Muchun Song 提交于 3月 22, 2022

The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>		[ext4]
Acked-by: NRoman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd60b288

14 2月, 2022 1 次提交

net: fix documentation for kernel_getsockname · 0fc95dec

由 Alex Maydanik 提交于 2月 12, 2022

Fixes return value documentation of kernel_getsockname()
and kernel_getpeername() functions.

The previous documentation wrongly specified that the return
value is 0 in case of success, however sock->ops->getname returns
the length of the address in bytes in case of success.
Signed-off-by: NAlex Maydanik <alexander.maydanik@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fc95dec

06 1月, 2022 1 次提交

net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets · 007747a9

由 Miroslav Lichvar 提交于 1月 05, 2022

When multiple sockets using the SOF_TIMESTAMPING_BIND_PHC flag received
a packet with a hardware timestamp (e.g. multiple PTP instances in
different PTP domains using the UDPv4/v6 multicast or L2 transport),
the timestamps received on some sockets were corrupted due to repeated
conversion of the same timestamp (by the same or different vclocks).

Fix ptp_convert_timestamp() to not modify the shared skb timestamp
and return the converted timestamp as a ktime_t instead. If the
conversion fails, return 0 to not confuse the application with
timestamps corresponding to an unexpected PHC.

Fixes: d7c08826 ("net: socket: support hardware timestamp conversion to PHC bound")
Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007747a9

02 1月, 2022 1 次提交

net: socket.c: style fix · e44ef1d4

由 Hamish MacDonald 提交于 1月 01, 2022

Removed spaces and added a tab that was causing an error on checkpatch
Signed-off-by: NHamish MacDonald <elusivenode@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e44ef1d4

27 12月, 2021 1 次提交

net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode · fd3a4590

由 Remi Pommarel 提交于 12月 24, 2021

In compat mode SIOC{G,S}IFBR ioctls were only supporting
BRCTL_GET_VERSION returning an artificially version to spur userland
tool to use SIOCDEVPRIVATE instead. But some userland tools ignore that
and use SIOC{G,S}IFBR unconditionally as seen with busybox's brctl.

Example of non working 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
brctl: SIOCGIFBR: Invalid argument

Example of fixed 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
bridge name     bridge id               STP enabled     interfaces
br0
Signed-off-by: NRemi Pommarel <repk@triplefau.lt>
Co-developed-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd3a4590

17 12月, 2021 1 次提交

add missing bpf-cgroup.h includes · aef2feda

由 Jakub Kicinski 提交于 12月 15, 2021

We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency,
make sure those who actually need more than the definition of
struct cgroup_bpf include bpf-cgroup.h explicitly.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org

aef2feda

27 8月, 2021 1 次提交

net: don't unconditionally copy_from_user a struct ifreq for socket ioctls · d0efb162

由 Peter Collingbourne 提交于 8月 26, 2021

A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).

Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.

Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.

Fixes: 44c02a2c ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4dSigned-off-by: NPeter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0efb162

25 8月, 2021 1 次提交

net: add accept helper not installing fd · d32f89da

由 Pavel Begunkov 提交于 8月 25, 2021

Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d32f89da

29 7月, 2021 1 次提交

mctp: Add MCTP base · bc49d816

由 Jeremy Kerr 提交于 7月 29, 2021

Add basic Kconfig, an initial (empty) af_mctp source object, and
{AF,PF}_MCTP definitions, and the required definitions for a new
protocol type.
Signed-off-by: NJeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc49d816

28 7月, 2021 3 次提交

net: bridge: move bridge ioctls out of .ndo_do_ioctl · ad2f99ae

由 Arnd Bergmann 提交于 7月 27, 2021

Working towards obsoleting the .ndo_do_ioctl operation entirely,
stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands
into this callback.

My first attempt was to add another ndo_siocbr() callback, but
as there is only a single driver that takes these commands and
there is already a hook mechanism to call directly into this
driver, extend this hook instead, and use it for both the
deviceless and the device specific ioctl commands.

Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad2f99ae

net: socket: return changed ifreq from SIOCDEVPRIVATE · 88fc023f

由 Arnd Bergmann 提交于 7月 27, 2021

Some drivers that use SIOCDEVPRIVATE ioctl commands modify
the ifreq structure and expect it to be passed back to user
space, which has never really happened for compat mode
because the calling these drivers through ndo_do_ioctl
requires overwriting the ifr_data pointer.

Now that all drivers are converted to ndo_siocdevprivate,
change it to handle this correctly in both compat and
native mode.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88fc023f

dev_ioctl: pass SIOCDEVPRIVATE data separately · a554bf96

由 Arnd Bergmann 提交于 7月 27, 2021

The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that
passes data as part of struct ifreq rather than as an ifr_data pointer, or
that passes data back this way, since the compat_ifr_data_ioctl() helper
overwrites the ifr_data pointer and does not copy anything back out.

Since all drivers using devprivate commands are now converted to the
new .ndo_siocdevprivate callback, fix this by adding the missing piece
and passing the pointer separately the whole way.

This further unifies the native and compat logic for socket ioctls,
as the new code now passes the correct pointer as well as the correct
data for both native and compat ioctls.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a554bf96

23 7月, 2021 4 次提交

net: socket: rework compat_ifreq_ioctl() · 29c49648

由 Arnd Bergmann 提交于 7月 22, 2021

compat_ifreq_ioctl() is one of the last users of copy_in_user() and
compat_alloc_user_space(), as it attempts to convert the 'struct ifreq'
arguments from 32-bit to 64-bit format as used by dev_ioctl() and a
couple of socket family specific interpretations.

The current implementation works correctly when calling dev_ioctl(),
inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl()
and packet_ioctl(). The ioctl handlers for x25, netrom, rose and x25 do
not interpret the arguments and only block the corresponding commands,
so they do not care.

For af_inet6 and af_decnet however, the compat conversion is slightly
incorrect, as it will copy more data than the native handler accesses,
both of them use a structure that is shorter than ifreq.

Replace the copy_in_user() conversion with a pair of accessor functions
to read and write the ifreq data in place with the correct length where
needed, while leaving the other ones to copy the (already compatible)
structures directly.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29c49648

net: socket: simplify dev_ifconf handling · 876f0bf9

由 Arnd Bergmann 提交于 7月 22, 2021

The dev_ifconf() calling conventions make compat handling
more complicated than necessary, simplify this by moving
the in_compat_syscall() check into the function.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

876f0bf9

net: socket: rework SIOC?IFMAP ioctls · 709566d7

由 Arnd Bergmann 提交于 7月 22, 2021

SIOCGIFMAP and SIOCSIFMAP currently require compat_alloc_user_space()
and copy_in_user() for compat mode.

Move the compat handling into the location where the structures are
actually used, to avoid using those interfaces and get a clearer
implementation.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

709566d7

ethtool: improve compat ioctl handling · dd98d289

由 Arnd Bergmann 提交于 7月 22, 2021

The ethtool compat ioctl handling is hidden away in net/socket.c,
which introduces a couple of minor oddities:

- The implementation may end up diverging, as seen in the RXNFC
  extension in commit 84a1d9c4 ("net: ethtool: extend RXNFC
  API to support RSS spreading of filter matches") that does not work
  in compat mode.

- Most architectures do not need the compat handling at all
  because u64 and compat_u64 have the same alignment.

- On x86, the conversion is done for both x32 and i386 user space,
  but it's actually wrong to do it for x32 and cannot work there.

- On 32-bit Arm, it never worked for compat oabi user space, since
  that needs to do the same conversion but does not.

- It would be nice to get rid of both compat_alloc_user_space()
  and copy_in_user() throughout the kernel.

None of these actually seems to be a serious problem that real
users are likely to encounter, but fixing all of them actually
leads to code that is both shorter and more readable.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd98d289

02 7月, 2021 1 次提交

net: socket: support hardware timestamp conversion to PHC bound · d7c08826

由 Yangbo Lu 提交于 6月 30, 2021

This patch is to support hardware timestamp conversion to
PHC bound. This applies to both RX and TX since their skb
handling (for TX, it's skb clone in error queue) all goes
through __sock_recv_timestamp.
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7c08826

22 6月, 2021 1 次提交

net: add pf_family_names[] for protocol family · fe0bdbde

由 Yejune Deng 提交于 6月 21, 2021

Modify the pr_info content from int to char * in sock_register() and
sock_unregister(), this looks more readable.

Fixed build error in ARCH=sparc64.
Signed-off-by: NYejune Deng <yejune.deng@gmail.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe0bdbde

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功