提交 · d8e464ecc17b4444e9a3e148a9748c4828c6328c · openeuler / Kernel

26 11月, 2019 1 次提交

vfs: mark pipes and sockets as stream-like file descriptors · d8e464ec

由 Linus Torvalds 提交于 11月 17, 2019

In commit 3975b097 ("convert stream-like files -> stream_open, even
if they use noop_llseek") Kirill used a coccinelle script to change
"nonseekable_open()" to "stream_open()", which changed the trivial cases
of stream-like file descriptors to the new model with FMODE_STREAM.

However, the two big cases - sockets and pipes - don't actually have
that trivial pattern at all, and were thus never converted to
FMODE_STREAM even though it makes lots of sense to do so.

That's particularly true when looking forward to the next change:
getting rid of FMODE_ATOMIC_POS entirely, and just using FMODE_STREAM to
decide whether f_pos updates are needed or not.  And if they are, we'll
always do them atomically.

This came up because KCSAN (correctly) noted that the non-locked f_pos
updates are data races: they are clearly benign for the case where we
don't care, but it would be good to just not have that issue exist at
all.

Note that the reason we used FMODE_ATOMIC_POS originally is that only
doing it for the minimal required case is "safer" in that it's possible
that the f_pos locking can cause unnecessary serialization across the
whole write() call.  And in the worst case, that kind of serialization
can cause deadlock issues: think writers that need readers to empty the
state using the same file descriptor.

[ Note that the locking is per-file descriptor - because it protects
  "f_pos", which is obviously per-file descriptor - so it only affects
  cases where you literally use the same file descriptor to both read
  and write.

  So a regular pipe that has separate reading and writing file
  descriptors doesn't really have this situation even though it's the
  obvious case of "reader empties what a bit writer concurrently fills"

  But we want to make pipes as being stream-line anyway, because we
  don't want the unnecessary overhead of locking, and because a named
  pipe can be (ab-)used by reading and writing to the same file
  descriptor. ]

There are likely a lot of other cases that might want FMODE_STREAM, and
looking for ".llseek = no_llseek" users and other cases that don't have
an lseek file operation at all and making them use "stream_open()" might
be a good idea.  But pipes and sockets are likely to be the two main
cases.

Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Eic Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Marco Elver <elver@google.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Paul McKenney <paulmck@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8e464ec

10 7月, 2019 2 次提交

io_uring: add support for recvmsg() · aa1fa28f

由 Jens Axboe 提交于 4月 19, 2019

This is done through IORING_OP_RECVMSG. This opcode uses the same
sqe->msg_flags that IORING_OP_SENDMSG added, and we pass in the
msghdr struct in the sqe->addr field as well.

We use MSG_DONTWAIT to force an inline fast path if recvmsg() doesn't
block, and punt to async execution if it would have.
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aa1fa28f

io_uring: add support for sendmsg() · 0fa03c62

由 Jens Axboe 提交于 4月 19, 2019

This is done through IORING_OP_SENDMSG. There's a new sqe->msg_flags
for the flags argument, and the msghdr struct is passed in the
sqe->addr field.

We use MSG_DONTWAIT to force an inline fast path if sendmsg() doesn't
block, and punt to async execution if it would have.
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0fa03c62

09 7月, 2019 2 次提交

coallocate socket_wq with socket itself · 333f7909

由 Al Viro 提交于 7月 05, 2019

socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to.  As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq.  RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).

Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

333f7909

sockfs: switch to ->free_inode() · 6d7855c5

由 Al Viro 提交于 7月 05, 2019

we do have an RCU-delayed part there already (freeing the wq),
so it's not like the pipe situation; moreover, it might be
worth considering coallocating wq with the rest of struct sock_alloc.
->sk_wq in struct sock would remain a pointer as it is, but
the object it normally points to would be coallocated with
struct socket...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d7855c5

04 7月, 2019 1 次提交

net: adjust socket level ICW to cope with ipv6 variant of {recv, send}msg · a648a592

由 Paolo Abeni 提交于 7月 03, 2019

After the previous patch we have ipv{6,4} variants for {recv,send}msg,
we should use the generic _INET ICW variant to call into the proper
build-in.

This also allows dropping the now unused and rather ugly _INET4 ICW macro

v1 -> v2:
 - use ICW macro to declare inet6_{recv,send}msg
 - fix a couple of checkpatch offender in the code context
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a648a592

28 6月, 2019 1 次提交

bpf: implement getsockopt and setsockopt hooks · 0d01da6a

由 Stanislav Fomichev 提交于 6月 27, 2019

Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
passing them down to the kernel or bypass kernel completely.
BPF_CGROUP_GETSOCKOPT can can inspect/modify getsockopt arguments that
kernel returns.
Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

The buffer memory is pre-allocated (because I don't think there is
a precedent for working with __user memory from bpf). This might be
slow to do for each {s,g}etsockopt call, that's why I've added
__cgroup_bpf_prog_array_is_empty that exits early if there is nothing
attached to a cgroup. Note, however, that there is a race between
__cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
program layout might have changed; this should not be a problem
because in general there is a race between multiple calls to
{s,g}etsocktop and user adding/removing bpf progs from a cgroup.

The return code of the BPF program is handled as follows:
* 0: EPERM
* 1: success, continue with next BPF program in the cgroup chain

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov):
  * use set_fs (same as kernel_setsockopt)
  * buffer is always kzalloc'd (no small on-stack buffer)

v8:
* use s32 for optlen (Andrii Nakryiko)

v7:
* return only 0 or 1 (Alexei Starovoitov)
* always run all progs (Alexei Starovoitov)
* use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
  (decided to use optval=-1 instead, optval=0 might be a valid input)
* call getsockopt hook after kernel handlers (Alexei Starovoitov)

v6:
* rework cgroup chaining; stop as soon as bpf program returns
  0 or 2; see patch with the documentation for the details
* drop Andrii's and Martin's Acked-by (not sure they are comfortable
  with the new state of things)

v5:
* skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

v4:
* don't export bpf_sk_fullsock helper (Martin Lau)
* size != sizeof(__u64) for uapi pointers (Martin Lau)
* offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

v3:
* typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
* reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
  Nakryiko)
* use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
* use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
* new CG_SOCKOPT_ACCESS macro to wrap repeated parts

v2:
* moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
* aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
* bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
* added [0,2] return code check to verifier (Martin Lau)
* dropped unused buf[64] from the stack (Martin Lau)
* use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
* dropped bpf_target_off from ctx rewrites (Martin Lau)
* use return code for kernel bypass (Martin Lau & Andrii Nakryiko)

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

0d01da6a

06 6月, 2019 1 次提交

net: socket: drop unneeded likely() call around IS_ERR() · 4546e44c

由 Enrico Weigelt 提交于 6月 05, 2019

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.
Signed-off-by: NEnrico Weigelt <info@metux.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4546e44c

01 6月, 2019 1 次提交

uio: make import_iovec()/compat_import_iovec() return bytes on success · 87e5e6da

由 Jens Axboe 提交于 5月 14, 2019

Currently these functions return < 0 on error, and 0 for success.
Change that so that we return < 0 on error, but number of bytes
for success.

Some callers already treat the return value that way, others need a
slight tweak.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

87e5e6da

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd

由 Thomas Gleixner 提交于 5月 27, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2874c5fd

26 5月, 2019 2 次提交

vfs: Convert sockfs to use the new mount API · fba9be49

由 David Howells 提交于 3月 25, 2019

Convert the sockfs filesystem to the new internal mount API as the old
one will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fba9be49

mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18

由 Al Viro 提交于 5月 20, 2019

Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work.  These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root.  However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it.  Including those that
didn't *have* any non-root dentries to start with...

All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1f58bb18

20 5月, 2019 1 次提交

net: fix kernel-doc warnings for socket.c · 85806af0

由 Randy Dunlap 提交于 5月 18, 2019

Fix kernel-doc warnings by moving the kernel-doc notation to be
immediately above the functions that it describes.

Fixes these warnings for sock_sendmsg() and sock_recvmsg():

../net/socket.c:658: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE'
../net/socket.c:658: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE'
../net/socket.c:889: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE'
../net/socket.c:889: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE'
../net/socket.c:889: warning: Excess function parameter 'flags' description in 'INDIRECT_CALLABLE_DECLARE'
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

85806af0

06 5月, 2019 1 次提交

net: use indirect calls helpers at the socket layer · 8c3c447b

由 Paolo Abeni 提交于 5月 03, 2019

This avoids an indirect call per {send,recv}msg syscall in
the common (IPv6 or IPv4 socket) case.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c3c447b

26 4月, 2019 1 次提交

net: socket: Fix missing break in switch statement · 60747828

由 Gustavo A. R. Silva 提交于 4月 24, 2019

Add missing break statement in order to prevent the code from falling
through to cases SIOCGSTAMP_NEW and SIOCGSTAMPNS_NEW.

This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: 0768e170 ("net: socket: implement 64-bit timestamps")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60747828

20 4月, 2019 2 次提交

net: socket: implement 64-bit timestamps · 0768e170

由 Arnd Bergmann 提交于 4月 17, 2019

The 'timeval' and 'timespec' data structures used for socket timestamps
are going to be redefined in user space based on 64-bit time_t in future
versions of the C library to deal with the y2038 overflow problem,
which breaks the ABI definition.

Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
use the _IOR() macro to encode the size of the transferred data, so it
remains ambiguous whether the application uses the old or new layout.

The best workaround I could find is rather ugly: we redefine the command
code based on the size of the respective data structure with a ternary
operator. This lets it get evaluated as late as possible, hopefully after
that structure is visible to the caller. We cannot use an #ifdef here,
because inux/sockios.h might have been included before any libc header
that could determine the size of time_t.

The ioctl implementation now interprets the new command codes as always
referring to the 64-bit structure on all architectures, while the old
architecture specific command code still refers to the old architecture
specific layout. The new command number is only used when they are
actually different.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0768e170

net: rework SIOCGSTAMP ioctl handling · c7cbdbf2

由 Arnd Bergmann 提交于 4月 17, 2019

The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
socket protocol handlers, and all of those end up calling the same
sock_get_timestamp()/sock_get_timestampns() helper functions, which
results in a lot of duplicate code.

With the introduction of 64-bit time_t on 32-bit architectures, this
gets worse, as we then need four different ioctl commands in each
socket protocol implementation.

To simplify that, let's add a new .gettstamp() operation in
struct proto_ops, and move ioctl implementation into the common
sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
through.

We can reuse the sock_get_timestamp() implementation, but generalize
it so it can deal with both native and compat mode, as well as
timeval and timespec structures.
Acked-by: NStefan Schmidt <stefan@datenfreihafen.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7cbdbf2

16 3月, 2019 1 次提交

net: add documentation to socket.c · 8a3c245c

由 Pedro Tammela 提交于 3月 14, 2019

Adds missing sphinx documentation to the
socket.c's functions. Also fixes some whitespaces.

I also changed the style of older documentation as an
effort to have an uniform documentation style.
Signed-off-by: NPedro Tammela <pctammela@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a3c245c

26 2月, 2019 1 次提交

net: socket: set sock->sk to NULL after calling proto_ops::release() · ff7b11aa

由 Eric Biggers 提交于 2月 21, 2019

Commit 9060cb71 ("net: crypto set sk to NULL when af_alg_release.")
fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is
closed concurrently with fchownat().  However, it ignored that many
other proto_ops::release() methods don't set sock->sk to NULL and
therefore allow the same use-after-free:

    - base_sock_release
    - bnep_sock_release
    - cmtp_sock_release
    - data_sock_release
    - dn_release
    - hci_sock_release
    - hidp_sock_release
    - iucv_sock_release
    - l2cap_sock_release
    - llcp_sock_release
    - llc_ui_release
    - rawsock_release
    - rfcomm_sock_release
    - sco_sock_release
    - svc_release
    - vcc_release
    - x25_release

Rather than fixing all these and relying on every socket type to get
this right forever, just make __sock_release() set sock->sk to NULL
itself after calling proto_ops::release().

Reproducer that produces the KASAN splat when any of these socket types
are configured into the kernel:

    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>

    pthread_t t;
    volatile int fd;

    void *close_thread(void *arg)
    {
        for (;;) {
            usleep(rand() % 100);
            close(fd);
        }
    }

    int main()
    {
        pthread_create(&t, NULL, close_thread, NULL);
        for (;;) {
            fd = socket(rand() % 50, rand() % 11, 0);
            fchownat(fd, "", 1000, 1000, 0x1000);
            close(fd);
        }
    }

Fixes: 86741ec2 ("net: core: Add a UID field to struct sock.")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff7b11aa

04 2月, 2019 4 次提交

socket: Add SO_TIMESTAMPING_NEW · 9718475e

由 Deepa Dinamani 提交于 2月 02, 2019

Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
for all architectures.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9718475e

socket: Add SO_TIMESTAMP[NS]_NEW · 887feae3

由 Deepa Dinamani 提交于 2月 02, 2019

Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
socket timestamp options.
These are the y2038 safe versions of the SO_TIMESTAMP_OLD
and SO_TIMESTAMPNS_OLD for all architectures.

Note that the format of scm_timestamping.ts[0] is not changed
in this patch.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

887feae3

socket: Use old_timeval types for socket timestamps · 13c6ee2a

由 Deepa Dinamani 提交于 2月 02, 2019

As part of y2038 solution, all internal uses of
struct timeval are replaced by struct __kernel_old_timeval
and struct compat_timeval by struct old_timeval32.
Make socket timestamps use these new types.

This is mainly to be able to verify that the kernel build
is y2038 safe when such non y2038 safe types are not
supported anymore.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: isdn@linux-pingi.de
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13c6ee2a

sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD · 7f1bc6e9

由 Deepa Dinamani 提交于 2月 02, 2019

SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
way they are currently defined, are not y2038 safe.
Subsequent patches in the series add new y2038 safe versions
of these options which provide 64 bit timestamps on all
architectures uniformly.
Hence, rename existing options with OLD tag suffixes.

Also note that kernel will not use the untagged SO_TIMESTAMP*
and SCM_TIMESTAMP* options internally anymore.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: deller@gmx.de
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-afs@lists.infradead.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f1bc6e9

31 1月, 2019 4 次提交

net: socket: make bond ioctls go through compat_ifreq_ioctl() · 98406133

由 Johannes Berg 提交于 1月 25, 2019

Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.

Cc: stable@vger.kernel.org
Fixes: f92d4fc9 ("kill bond_ioctl()")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98406133

net: socket: fix SIOCGIFNAME in compat · c6c9fee3

由 Johannes Berg 提交于 1月 25, 2019

As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).

This is obviously because I fixed SIOCGIFNAME more or
less by accident.

Fix it explicitly now by making it pass through the
restored compat translation code.

Cc: stable@vger.kernel.org
Fixes: 4cf808e7 ("kill dev_ifname32()")
Reported-by: NRobert O'Callahan <robert@ocallahan.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6c9fee3

Revert "kill dev_ifsioc()" · 37ac39bd

由 Johannes Berg 提交于 1月 25, 2019

This reverts commit bf440573 ("kill dev_ifsioc()").

This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
  https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
  https://bugzilla.kernel.org/show_bug.cgi?id=202273

However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.

Cc: stable@vger.kernel.org
Fixes: bf440573 ("kill dev_ifsioc()")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37ac39bd

Revert "socket: fix struct ifreq size in compat ioctl" · 63ff03ab

由 Johannes Berg 提交于 1月 25, 2019

This reverts commit 1cebf8f1 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.

This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.

Cc: stable@vger.kernel.org
Fixes: 1cebf8f1 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63ff03ab

18 12月, 2018 1 次提交

y2038: socket: Add compat_sys_recvmmsg_time64 · e11d4284

由 Arnd Bergmann 提交于 4月 18, 2018

recvmmsg() takes two arguments to pointers of structures that differ
between 32-bit and 64-bit architectures: mmsghdr and timespec.

For y2038 compatbility, we are changing the native system call from
timespec to __kernel_timespec with a 64-bit time_t (in another patch),
and use the existing compat system call on both 32-bit and 64-bit
architectures for compatibility with traditional 32-bit user space.

As we now have two variants of recvmmsg() for 32-bit tasks that are both
different from the variant that we use on 64-bit tasks, this means we
also require two compat system calls!

The solution I picked is to flip things around: The existing
compat_sys_recvmmsg() call gets moved from net/compat.c into net/socket.c
and now handles the case for old user space on all architectures that
have set CONFIG_COMPAT_32BIT_TIME. A new compat_sys_recvmmsg_time64()
call gets added in the old place for 64-bit architectures only, this
one handles the case of a compat mmsghdr structure combined with
__kernel_timespec.

In the indirect sys_socketcall(), we now need to call either
do_sys_recvmmsg() or __compat_sys_recvmmsg(), depending on what kind of
architecture we are on. For compat_sys_socketcall(), no such change is
needed, we always call __compat_sys_recvmmsg().

I decided to not add a new SYS_RECVMMSG_TIME64 socketcall: Any libc
implementation for 64-bit time_t will need significant changes including
an updated asm/unistd.h, and it seems better to consistently use the
separate syscalls that configuration, leaving the socketcall only for
backward compatibility with 32-bit time_t based libc.

The naming is asymmetric for the moment, so both existing syscalls
entry points keep their names, while the new ones are recvmmsg_time32
and compat_recvmmsg_time64 respectively. I expect that we will rename
the compat syscalls later as we start using generated syscall tables
everywhere and add these entry points.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

e11d4284

18 11月, 2018 1 次提交

socket: do a generic_file_splice_read when proto_ops has no splice_read · 95506588

由 Slavomir Kaslev 提交于 11月 16, 2018

splice(2) fails with -EINVAL when called reading on a socket with no splice_read
set in its proto_ops (such as vsock sockets). Switch this to fallbacks to a
generic_file_splice_read instead.
Signed-off-by: NSlavomir Kaslev <kaslevs@vmware.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95506588

24 10月, 2018 1 次提交

iov_iter: Separate type from direction and use accessor functions · aa563d7b

由 David Howells 提交于 10月 20, 2018

In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

aa563d7b

19 10月, 2018 1 次提交

net: socket: fix a missing-check bug · b6168562

由 Wenwen Wang 提交于 10月 18, 2018

In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.

This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: NWenwen Wang <wang6495@umn.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6168562

06 10月, 2018 1 次提交

socket: Tighten no-error check in bind() · 068b88cc

由 Jakub Sitnicki 提交于 10月 04, 2018

move_addr_to_kernel() returns only negative values on error, or zero on
success. Rewrite the error check to an idiomatic form to avoid confusing
the reader.
Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

068b88cc

14 9月, 2018 1 次提交

socket: fix struct ifreq size in compat ioctl · 1cebf8f1

由 Johannes Berg 提交于 9月 13, 2018

As reported by Reobert O'Callahan, since Viro's commit to kill
dev_ifsioc() we attempt to copy too much data in compat mode,
which may lead to EFAULT when the 32-bit version of struct ifreq
sits at/near the end of a page boundary, and the next page isn't
mapped.

Fix this by passing the approprate compat/non-compat size to copy
and using that, as before the dev_ifsioc() removal. This works
because only the embedded "struct ifmap" has different size, and
this is only used in SIOCGIFMAP/SIOCSIFMAP which has a different
handler. All other parts of the union are naturally compatible.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199469.

Fixes: bf440573 ("kill dev_ifsioc()")
Reported-by: NRobert O'Callahan <robert@ocallahan.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cebf8f1

29 8月, 2018 1 次提交

y2038: socket: Change recvmmsg to use __kernel_timespec · c2e6c856

由 Arnd Bergmann 提交于 4月 18, 2018

This converts the recvmmsg() system call in all its variations to use
'timespec64' internally for its timeout, and have a __kernel_timespec64
argument in the native entry point. This lets us change the type to use
64-bit time_t at a later point while using the 32-bit compat system call
emulation for existing user space.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

c2e6c856

15 8月, 2018 1 次提交

net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() · 66b51b0a

由 Jeremy Cline 提交于 8月 13, 2018

req->sdiag_family is a user-controlled value that's used as an array
index. Sanitize it after the bounds check to avoid speculative
out-of-bounds array access.

This also protects the sock_is_registered() call, so this removes the
sanitize call there.

Fixes: e978de7a ("net: socket: Fix potential spectre v1 gadget in sock_is_registered")
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: konrad.wilk@oracle.com
Cc: jamie.iles@oracle.com
Cc: liran.alon@oracle.com
Cc: stable@vger.kernel.org
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66b51b0a

01 8月, 2018 1 次提交

net: remove bogus RCU annotations on socket.wq · e6476c21

由 Christoph Hellwig 提交于 7月 30, 2018

We never use RCU protection for it, just a lot of cargo-cult
rcu_deference_protects calls.

Note that we do keep the kfree_rcu call for it, as the references through
struct sock are RCU protected and thus might require a grace period before
freeing.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6476c21

31 7月, 2018 2 次提交

net: remove sock_poll_busy_flag · a331de3b

由 Christoph Hellwig 提交于 7月 30, 2018

Fold it into the only caller to make the code simpler and easier to read.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a331de3b

net: remove sock_poll_busy_loop · f641f13b

由 Christoph Hellwig 提交于 7月 30, 2018

There is no point in hiding this logic in a helper.  Also remove the
useless events != 0 check and only busy loop once we know we actually
have a poll method.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f641f13b

29 7月, 2018 2 次提交

net: socket: Fix potential spectre v1 gadget in sock_is_registered · e978de7a

由 Jeremy Cline 提交于 7月 27, 2018

'family' can be a user-controlled value, so sanitize it after the bounds
check to avoid speculative out-of-bounds access.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e978de7a

net: socket: fix potential spectre v1 gadget in socketcall · c8e8cd57

由 Jeremy Cline 提交于 7月 27, 2018

'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.

Found with the help of Smatch:

net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8e8cd57

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功