- 21 December 2021, 1 commit

Committed by Matt Johnston:
Userspace can receive notifications of MCTP address changes via the RTNLGRP_MCTP_IFADDR rtnetlink multicast group.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20211220023104.1965509-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
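
A minimal userspace sketch of subscribing to this group over rtnetlink; it assumes RTNLGRP_MCTP_IFADDR is present in the installed linux/rtnetlink.h, and message parsing is left out for brevity:

  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>

  #ifndef SOL_NETLINK
  #define SOL_NETLINK 270   /* older libcs may not define it */
  #endif

  int main(void)
  {
          /* Open an rtnetlink socket and join the MCTP address group. */
          int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
          unsigned int grp = RTNLGRP_MCTP_IFADDR;

          if (fd < 0)
                  return 1;
          if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                         &grp, sizeof(grp)) < 0)
                  return 1;

          for (;;) {
                  char buf[4096];
                  ssize_t len = recv(fd, buf, sizeof(buf), 0);

                  if (len <= 0)
                          break;
                  /* Address add/delete notifications arrive here. */
                  printf("MCTP address notification, %zd bytes\n", len);
          }
          return 0;
  }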

- 19 December 2021, 1 commit

Committed by Baowen Zheng:
We add skip_hw and skip_sw so the user can control whether the action is offloaded to hardware. We also add in_hw_count to indicate to the user whether the action is offloaded to any hardware.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 15 December 2021, 1 commit

Committed by Matthieu Baerts:
'loc_id' and 'rem_id' are set in all events linked to subflows, but they were missing from the event descriptions in the comments.

Fixes: b911c97c ("mptcp: add netlink event support")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

- 14 December 2021, 1 commit

Committed by Hangbin Liu:
Since commit 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device"), the user can get the bond active interface's PHC index directly. But on a failover the bond active interface changes, and with it the PHC index. This may break the user's program if it does not update the PHC in time.

This patch adds a new hwtstamp_config flag, HWTSTAMP_FLAG_BONDED_PHC_INDEX. When users want to get the bond active interface's PHC, they need to set this flag and be aware that the PHC index may change. With the new flag, all flag checks in current drivers are removed; only the check in net_hwtstamp_validate() is kept.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
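
A hedged userspace sketch of opting in via SIOCSHWTSTAMP; it assumes HWTSTAMP_FLAG_BONDED_PHC_INDEX is available in the installed linux/net_tstamp.h, and the bond device name is illustrative:

  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/socket.h>
  #include <net/if.h>
  #include <linux/sockios.h>
  #include <linux/net_tstamp.h>

  int main(void)
  {
          struct hwtstamp_config cfg = {
                  /* Acknowledge that the bond's PHC index may change on failover. */
                  .flags = HWTSTAMP_FLAG_BONDED_PHC_INDEX,
                  .tx_type = HWTSTAMP_TX_ON,
                  .rx_filter = HWTSTAMP_FILTER_ALL,
          };
          struct ifreq ifr;
          int fd = socket(AF_INET, SOCK_DGRAM, 0);

          memset(&ifr, 0, sizeof(ifr));
          strncpy(ifr.ifr_name, "bond0", IFNAMSIZ - 1);
          ifr.ifr_data = (void *)&cfg;

          if (fd < 0 || ioctl(fd, SIOCSHWTSTAMP, &ifr) < 0)
                  perror("SIOCSHWTSTAMP");
          return 0;
  }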

- 11 December 2021, 1 commit

Committed by Drew DeVault:
This limit has not been updated since 2008, when it was increased to 64 KiB at the request of GnuPG. Until recently, the main use-cases for this feature were (1) preventing sensitive memory from being swapped, as in GnuPG's use-case, and (2) real-time use-cases. In the first case, little memory is called for; in the second case, the user is generally in a position to increase it if they need more.

The introduction of IOURING_REGISTER_BUFFERS adds a third use-case: preparing fixed buffers for high-performance I/O. This use-case will take as much of this memory as it can get, but is still limited to 64 KiB by default, which is very little. This increases the limit to 8 MB, which was chosen fairly arbitrarily as a more generous, but still conservative, default value.

It is also possible to raise this limit in userspace. This is easily done, for example, in the use-case of a network daemon: systemd, for instance, provides for this via LimitMEMLOCK in the service file; OpenRC via the rc_ulimit variables. However, there is no established userspace facility for configuring this outside of daemons: end-user applications do not presently have access to a convenient means of raising their limits.

The buck, as it were, stops with the kernel. It's much easier to address it here than it is to bring it to hundreds of distributions, and it can only realistically be relied upon to be high enough by end-user software if it is more-or-less ubiquitous. Most distros don't change this particular rlimit from the kernel-supplied default value, so a change here will easily provide that ubiquity.

Link: https://lkml.kernel.org/r/20211028080813.15966-1-sir@cmpwn.com
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Dona-Couch <andrew@donacou.ch>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
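
For applications that want to check or raise their own limit anyway, a minimal sketch using the standard rlimit API (an unprivileged process may only raise the soft limit up to the hard limit):

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
          struct rlimit rl;

          /* Inspect the current RLIMIT_MEMLOCK values. */
          if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0)
                  printf("memlock soft=%llu hard=%llu\n",
                         (unsigned long long)rl.rlim_cur,
                         (unsigned long long)rl.rlim_max);

          /* Raise the soft limit to the hard limit before registering buffers. */
          rl.rlim_cur = rl.rlim_max;
          if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
                  perror("setrlimit");
          return 0;
  }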

- 10 December 2021, 1 commit

Committed by Eric Biggers:
signalfd_poll() and binder_poll() are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, by sending a POLLFREE notification to all waiters.

Unfortunately, only eventpoll handles POLLFREE. A second type of non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't handle POLLFREE. This allows a use-after-free to occur if a signalfd or binder fd is polled with aio poll, and the waitqueue gets freed.

Fix this by making aio poll handle POLLFREE.

A patch by Ramji Jiyani <ramjiyani@google.com> (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com) tried to do this by making aio_poll_wake() always complete the request inline if POLLFREE is seen. However, that solution had two bugs. First, it introduced a deadlock, as it unconditionally locked the aio context while holding the waitqueue lock, which inverts the normal locking order. Second, it didn't consider that POLLFREE notifications are missed while the request has been temporarily de-queued.

The second problem was solved by my previous patch. This patch then properly fixes the use-after-free by handling POLLFREE in a deadlock-free way. It does this by taking advantage of the fact that freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.

Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>

- 03 December 2021, 2 commits

Committed by Alexei Starovoitov:
struct bpf_core_relo is generated by llvm and processed by libbpf. It's a de-facto uapi. With CO-RE in the kernel, struct bpf_core_relo becomes uapi de-jure. Add an ability to pass a set of 'struct bpf_core_relo' to the prog_load command and let the kernel perform CO-RE relocations.

Note that struct bpf_line_info and struct bpf_func_info have the same layout when passed from LLVM to libbpf and from libbpf to the kernel, except that the "insn_off" field means "byte offset" when LLVM generates it; libbpf then converts it to "insn index" to pass to the kernel. The struct bpf_core_relo's "insn_off" field is always a "byte offset".

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-6-alexei.starovoitov@gmail.com
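
For orientation, a sketch of the relocation record as it appears in the uapi header around this series; treat the exact layout as an excerpt to verify against the merged linux/bpf.h (enum bpf_core_relo_kind is defined alongside it):

  #include <linux/types.h>

  /* One CO-RE relocation, an array of which can now be supplied to the
   * BPF_PROG_LOAD command so the kernel performs the relocations itself.
   */
  struct bpf_core_relo {
          __u32 insn_off;        /* byte offset of the instruction to patch */
          __u32 type_id;         /* BTF type ID the access is rooted at */
          __u32 access_str_off;  /* offset of the access string in .BTF strings */
          enum bpf_core_relo_kind kind; /* field offset, type size, enum value, ... */
  };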

Committed by Alexei Starovoitov:
enum bpf_core_relo_kind is generated by llvm and processed by libbpf. It's a de-facto uapi. With CO-RE in the kernel, the bpf_core_relo_kind values become uapi de-jure. Also rename them with a BPF_CORE_ prefix to distinguish them from conflicting names in bpf_core_read.h. The enums bpf_field_info_kind, bpf_type_id_kind, bpf_type_info_kind and bpf_enum_value_kind pass different values from the bpf program into llvm.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-5-alexei.starovoitov@gmail.com

- 02 December 2021, 2 commits

Committed by Xiayu Zhang:
The description of ETH_P_802_3_MIN is misleading: the EtherType value in an Ethernet II frame is 0x0600 or greater, while the Length value in an 802.3 frame is less than 0x0600.

Signed-off-by: Xiayu Zhang <Xiayu.Zhang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
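
The distinction reduces to one comparison; a small sketch using the standard definitions from linux/if_ether.h:

  #include <stdbool.h>
  #include <arpa/inet.h>
  #include <linux/if_ether.h>

  /* True if the 16-bit field after the MAC addresses is an EtherType
   * (Ethernet II framing) rather than an 802.3 length.
   */
  static bool is_ethernet_ii(const struct ethhdr *eth)
  {
          return ntohs(eth->h_proto) >= ETH_P_802_3_MIN; /* 0x0600 */
  }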

Committed by Eric Dumazet:
This reverts commit aeeecb88. The new SNMP variable (TCPSmallQueueFailure) can be incremented for good reasons, even on a 100Gbit single TCP_STREAM flow. If we really wanted to ease driver debugging [1], this would require something more sophisticated.

[1] Usually, if a driver is delaying TX completions too much, this can lead to stalls in TCP output. Various workarounds have been used in the past, like skb_orphan() in ndo_start_xmit().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Link: https://lore.kernel.org/r/20211201033246.2826224-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

- 01 December 2021, 1 commit

Committed by Joanne Koong:
This patch adds the kernel-side and API changes for a new helper function, bpf_loop:

  long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags);

where

  long (*callback_fn)(u32 index, void *ctx);

bpf_loop invokes the "callback_fn" **nr_loops** times or until the callback_fn returns 1. The callback_fn can only return 0 or 1, and this is enforced by the verifier. The callback_fn index is zero-indexed.

A few things to please note:
~ The "u64 flags" parameter is currently unused but is included in case a future use case for it arises.
~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c), bpf_callback_t is used as the callback function cast.
~ A program can have nested bpf_loop calls, but the program must still adhere to the verifier constraint on its stack depth (the stack depth cannot exceed MAX_BPF_STACK).
~ Recursive callback_fns do not pass the verifier, due to the call stack for these being too deep.
~ The next patch will include the tests and benchmark.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
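
A minimal BPF-side usage sketch, assuming a libbpf toolchain recent enough to declare bpf_loop in its helper definitions (section and names are illustrative):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct loop_ctx {
          int sum;
  };

  /* Called once per iteration; returning 1 would stop the loop early. */
  static long accumulate(__u32 index, void *data)
  {
          struct loop_ctx *lc = data;

          lc->sum += index;
          return 0;
  }

  SEC("tracepoint/syscalls/sys_enter_getpid")
  int sum_indices(void *ctx)
  {
          struct loop_ctx lc = { .sum = 0 };

          /* Run the callback 10 times; flags must currently be 0. */
          bpf_loop(10, accumulate, &lc, 0);
          bpf_printk("sum = %d", lc.sum);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";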

- 30 November 2021, 1 commit

Committed by Hangbin Liu:
Currently, we use a hard-coded number to verify if we are in the arp_interval timeslice. But some users may want to reduce/extend the verify timeslice. With a 'missed_max' option similar to the team driver's, users can change that number based on their own environment.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 29 November 2021, 2 commits

Committed by Menglong Dong:
Once the TCP small queue check fails in tcp_small_queue_check(), TCP throughput is limited, and it's hard to distinguish whether the limit comes from TCP congestion control. Add the LINUX_MIB_TCPSMALLQUEUEFAILURE statistic for this scenario.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Gurchetan Singh:
The current virtgpu implementation of poll(..) drops events when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise it behaves like a normal DRM driver). This is because paravirtualized userspace receives responses in a buffer of type BLOB_MEM_GUEST, not by read(..).

To be in line with other DRM drivers and avoid specialized behavior, it is possible to define a dummy event for virtgpu. Paravirtualized userspace will now have to call read(..) on the DRM fd to receive the dummy event.

Fixes: b1079043 ("drm/virtgpu api: create context init feature")
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20211122232210.602-2-gurchetansingh@google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

- 23 November 2021, 1 commit

Committed by Jeremy Kerr:
This change adds an MCTP Serial transport binding, as defined by DMTF specification DSP0253 - "MCTP Serial Transport Binding". This is implemented as a new serial line discipline, and can be attached to arbitrary tty devices.

From the Kconfig description:

  This driver provides an MCTP-over-serial interface, through a serial line-discipline, as defined by DMTF specification "DSP0253 - MCTP Serial Transport Binding". By attaching the ldisc to a serial device, we get a new net device to transport MCTP packets. This allows communication with external MCTP endpoints which use serial as their transport. It can also be used as an easy way to provide MCTP connectivity between virtual machines, by forwarding data between simple virtual serial devices.

  Say y here if you need to connect to MCTP endpoints over serial. To compile as a module, use m; the module will be called mctp-serial.

Once the N_MCTP line discipline is set [using ioctl(TIOCSETD)], we get a new netdev suitable for MCTP communication. The 'mctp' utility [1] provides a simple wrapper for this ioctl, using 'link serial <device>':

  # mctp link serial /dev/ttyS0 &
  # mctp link
  dev lo index 1 address 0x00:00:00:00:00:00 net 1 mtu 65536 up
  dev mctpserial0 index 5 address 0x(no-addr) net 1 mtu 68 down

[1]: https://github.com/CodeConstruct/mctp

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 22 November 2021, 2 commits

Committed by Hao Chen:
Add support for setting the rx buf len via the ethtool -G parameter and getting it via the ethtool -g parameter.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Hao Chen:
Add support for ethtool to set/get the tx copybreak buf size.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 16 November 2021, 1 commit

Committed by Tiezhu Yang:
In the current code, the actual max tail call count is 33, which is greater than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent with the meaning of MAX_TAIL_CALL_CNT and is thus confusing at first glance. We can see the historical evolution from commit 04fd61ab ("bpf: allow bpf programs to tail-call other bpf programs") and commit f9dabe01 ("bpf: Undo off-by-one in interpreter tail call count limit"). In order to avoid changing existing behavior, the actual limit is 33 now; this is reasonable.

After commit 874be05f ("bpf, tests: Add tail call test suite"), we can see there exists a failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:

  # echo 0 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:

  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

Although the above failed testcase has been fixed in commit 18935a72 ("bpf/tests: Fix error in tail call limit tests"), it would still be good to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code more readable.

The 32-bit x86 JIT was using a limit of 32; just fix the wrong comments and limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT is updated. For the mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh. For the riscv JIT, use RV_REG_TCC directly to save one register move as suggested by Björn Töpel. For the other implementations, there are no functional changes; this does not change the current limit of 33. The new value of MAX_TAIL_CALL_CNT can reflect the actual max tail call count, and the related tail call testcases in the test_bpf module and selftests work well for the interpreter and the JIT.

Here are the test results on x86_64:

  # uname -m
  x86_64
  # echo 0 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf test_suite=test_tail_calls
  # dmesg | tail -1
  test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
  # rmmod test_bpf
  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf test_suite=test_tail_calls
  # dmesg | tail -1
  test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
  # rmmod test_bpf
  # ./test_progs -t tailcalls
  #142 tailcalls:OK
  Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn

- 15 November 2021, 3 commits

Committed by Jakub Kicinski:
This reverts commit 71812af7, reversing changes made to cc0be1ad.

Wolfram Sang says:

  Please revert. Besides the driver in net, it modifies the I2C core code. This has not been acked by the I2C maintainer (in this case me). So, please don't pull this in via the net tree. The question raised here (extending SMBus calls to 255 byte) is complicated because we need ABI backwards compatibility.

Link: https://lore.kernel.org/all/YZJ9H4eM%2FM7OXVN0@shikoro/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Committed by Matt Johnston:
I2C_SMBUS is limited to 32 bytes due to compatibility with the 32-byte i2c_smbus_data.block. I2C_RDWR allows larger transfers if sufficiently sized buffers are passed.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
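
A hedged userspace sketch of a larger-than-32-byte transfer via the I2C_RDWR ioctl; the bus path and slave address are illustrative, and whether a full 255-byte read works depends on the adapter:

  #include <stdint.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/i2c.h>
  #include <linux/i2c-dev.h>

  int main(void)
  {
          uint8_t reg = 0x00, buf[255];
          struct i2c_msg msgs[2] = {
                  /* Write the register address, then read 255 bytes back. */
                  { .addr = 0x50, .flags = 0,        .len = 1,           .buf = &reg },
                  { .addr = 0x50, .flags = I2C_M_RD, .len = sizeof(buf), .buf = buf  },
          };
          struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };
          int fd = open("/dev/i2c-1", O_RDWR);

          if (fd < 0 || ioctl(fd, I2C_RDWR, &xfer) < 0)
                  return 1;
          return 0;
  }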

Committed by Matt Johnston:
SMBus 3.0 increased the maximum block transfer size from 32 bytes to 255 bytes. We increase the size of struct i2c_smbus_data's block[] member.

i2c_smbus_xfer() and i2c_smbus_xfer_emulated() now support 255-byte block operations; other block functions remain limited to 32 bytes for compatibility with existing callers. We allow adapters to indicate support for the larger size with I2C_FUNC_SMBUS_V3_BLOCK. Most emulated drivers should be able to use 255-byte blocks by replacing I2C_SMBUS_BLOCK_MAX with I2C_SMBUS_V3_BLOCK_MAX, though some will have hardware limitations that need testing.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 12 November 2021, 1 commit

Committed by Yonghong Song:
LLVM patches ([1] for clang, [2] and [3] for the BPF backend) added support for btf_type_tag attributes. This patch adds support to the kernel.

The main motivation for btf_type_tag is to bring kernel annotations like __user and __rcu to BTF. With such information available in BTF, the BPF verifier can detect mis-usages and reject the program. For example, for a __user-tagged pointer, developers can then use a proper helper like bpf_probe_read_user() etc. to read the data. BTF_KIND_TYPE_TAG may also be useful for other tracing facilities: instead of requiring the user to specify kernel/user address type, the kernel can detect it by itself with BTF.

[1] https://reviews.llvm.org/D111199
[2] https://reviews.llvm.org/D113222
[3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012609.1505032-1-yhs@fb.com
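
At the source level, such a tag is expressed with a clang attribute; a small sketch mirroring the kernel's annotation style (the macro definition is illustrative and requires a clang carrying the patches above):

  /* With btf_type_tag support, this attribute is recorded in BTF as a
   * BTF_KIND_TYPE_TAG entry instead of being discarded at compile time.
   */
  #define __user __attribute__((btf_type_tag("user")))

  /* The verifier can now see that buf points at user memory and reject
   * direct dereferences in favor of bpf_probe_read_user().
   */
  long copy_len(const char __user *buf, long len);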

- 11 November 2021, 2 commits

Committed by Peter Gonda:
For SEV to work with intra-host migration, the contents of the SEV info struct, such as the ASID (used to index the encryption key in the AMD SP) and the list of memory regions, need to be transferred to the target VM. This change adds a command for a target VMM to get a source SEV VM's SEV info.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Committed by Mark Pashmfouroush:
It may be helpful to have access to the ifindex during BPF socket lookup. An example may be to scope certain socket lookup logic to specific interfaces, i.e. an interface may be made exempt from custom lookup code. Add the ifindex of the arriving connection to the bpf_sk_lookup API.

Signed-off-by: Mark Pashmfouroush <markpash@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211110111016.5670-2-markpash@cloudflare.com
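
A hedged sk_lookup sketch using the new field, assuming it is exposed as ingress_ifindex in struct bpf_sk_lookup per this series (the ifindex value and policy are illustrative):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("sk_lookup")
  int exempt_by_ifindex(struct bpf_sk_lookup *ctx)
  {
          /* Leave connections arriving on ifindex 1 (typically loopback)
           * to the regular kernel lookup path.
           */
          if (ctx->ingress_ifindex == 1)
                  return SK_PASS;

          /* Custom socket-steering logic for other interfaces goes here. */
          return SK_PASS;
  }

  char LICENSE[] SEC("license") = "GPL";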

- 10 November 2021, 1 commit

Committed by David Hildenbrand:
The initial virtio-mem spec states that while unplugged memory should not be read, the device still has to allow for reading unplugged memory inside the usable region. The primary motivation for this default handling was to simplify bringup of virtio-mem, because there were corner cases where Linux might have accidentally read unplugged memory inside added Linux memory blocks.

In the meantime, we:
1. Removed /dev/kmem in commit bbcd53c9 ("drivers/char: remove /dev/kmem for good")
2. Disallowed access to virtio-mem device memory via /dev/mem in commit 2128f4e2 ("virtio-mem: disallow mapping virtio-mem memory via /dev/mem")
3. Sanitized access to virtio-mem device memory via /proc/kcore in commit 0daa322b ("fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages")
4. Sanitized access to virtio-mem device memory via /proc/vmcore in commit ce281462 ("virtio-mem: kdump mode to sanitize /proc/vmcore access")

"Accidental" access to unplugged memory is no longer possible; we can support the new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature that will be required by some hypervisors implementing virtio-mem in the near future.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

- 08 November 2021, 2 commits

Committed by Peter Collingbourne:
This constant was previously an unsigned long, but was changed into an int in commit 433c38f4 ("arm64: mte: change ASYNC and SYNC TCF settings into bitfields"). This ended up causing spurious unsigned-signed comparison warnings in expressions such as:

  (x & PR_MTE_TCF_MASK) != PR_MTE_TCF_NONE

Therefore, change it back into an unsigned long to silence these warnings.

Link: https://linux-review.googlesource.com/id/I07a72310db30227a5b7d789d0b817d78b657c639
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://lore.kernel.org/r/20211105230829.2254790-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>

Committed by Song Liu:
In some profiler use cases, it is necessary to map an address to the backing file, e.g., a shared library. The bpf_find_vma helper provides a flexible way to achieve this. bpf_find_vma maps an address of a task to the vma (vm_area_struct) for this address, and feeds the vma to a callback BPF function. The callback function is necessary here, as we need to ensure mmap_sem is unlocked.

It is necessary to lock mmap_sem for find_vma. To lock and unlock mmap_sem safely when irqs are disabled, we use the same mechanism as stackmap with build_id: when irqs are disabled, the unlock is postponed to an irq_work. Refactor stackmap.c so that the irq_work is shared between the bpf_find_vma and stackmap helpers.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211105232330.1936330-2-songliubraving@fb.com
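
A hedged BPF-side sketch; the callback signature follows the helper as described here (task, vma, then the caller's context), and all names are illustrative:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* Address to resolve; a loader would set this before attaching. */
  const volatile __u64 target_addr;

  /* Invoked with the mmap lock held; just record the vma flags. */
  static long vma_cb(struct task_struct *task, struct vm_area_struct *vma,
                     void *data)
  {
          __u64 *flags = data;

          *flags = vma->vm_flags;
          return 0;
  }

  SEC("perf_event")
  int handle_pe(void *ctx)
  {
          __u64 flags = 0;
          struct task_struct *task = bpf_get_current_task_btf();

          bpf_find_vma(task, target_addr, vma_cb, &flags, 0);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";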

- 04 November 2021, 3 commits

Committed by Michael S. Tsirkin:
Declaring the struct packed here is mostly harmless, but gives a bad example for people to copy. As the struct is packed and aligned manually, let's just drop the attribute.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>

Committed by Viresh Kumar:
This patch adds IRQ support for the virtio GPIO driver. Note that this uses the irq_bus_lock/unlock() callbacks, since those operations over virtio may sleep.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>

Committed by Eric W. Biederman:
As Andy pointed out, there are races between force_sig_info_to_task and sigaction [1]. As Kees discovered [2], ptrace is also able to change these signals. In the case of seccomp killing a process with a signal, it is a security violation to allow the signal to be caught or manipulated.

Solve this problem by introducing a new flag SA_IMMUTABLE that prevents sigaction and ptrace from modifying these forced signals. This flag is carefully made kernel-internal so that no new ABI is introduced.

Longer term, I think this can be solved by guaranteeing short-circuit delivery of signals in this case. Unfortunately, reliable and guaranteed short-circuit delivery of these signals is still a ways off from being implemented, tested, and merged. So I have implemented a much simpler alternative for now.

[1] https://lkml.kernel.org/r/b5d52d25-7bde-4030-a7b1-7c6f8ab90660@www.fastmail.com
[2] https://lkml.kernel.org/r/202110281136.5CE65399A7@keescook

Cc: stable@vger.kernel.org
Fixes: 307d522f ("signal/seccomp: Refactor seccomp signal and coredump generation")
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

- 03 November 2021, 1 commit

Committed by Jakub Kicinski:
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id, so we need to subtract the non-stats and add one to get a count (IOW -2+1 == -1). Otherwise we'll see:

  ethnl cmd 21: calculated reply length 40, but consumed 52

Fixes: 9a27a330 ("ethtool: add standard pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 02 November 2021, 3 commits

Committed by James Prestwood:
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event, which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact, since the underlying network has not changed.

This patch introduces a sysctl option ndisc_evict_nocarrier, which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately, without having to wait for neighbor discovery. A user reported roughly a 1-second delay after a roam before packets could be sent out (note: on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6, no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Committed by James Prestwood:
This change introduces a new sysctl parameter, arp_evict_nocarrier. When set (the default), the ARP cache will be cleared on a NOCARRIER event. This new option has been defaulted to '1', which maintains existing behavior.

Clearing the ARP cache on NOCARRIER is relatively new, introduced by:

  commit 859bd2ef
  Author: David Ahern <dsahern@gmail.com>
  Date: Thu Oct 11 20:33:49 2018 -0700

      net: Evict neighbor entries on carrier down

The reason for this change is to prevent the ARP cache from being cleared when a wireless device roams. Specifically, for wireless roams the ARP cache should not be cleared because the underlying network has not changed. Clearing the ARP cache in this case can introduce significant delays sending out packets after a roam. A user reported such a situation here:

https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/

After some investigation it was found that the kernel was holding onto packets until ARP finished, which resulted in this 1-second delay. It was also found that the first ARP who-has was never responded to, which is actually what causes the delay. This change is more or less working around this behavior, but again, there is no reason to clear the cache on a roam anyway. As for the unanswered who-has, we know the packet made it OTA since it was seen while monitoring. Why it never received a response is unknown. In any case, since this is a problem on the AP side of things, all that can be done is to work around it until it is solved.

Some background on testing/reproducing the packet delay:

Hardware:
- 2 access points configured for Fast BSS Transition (though I don't see why regular reassociation wouldn't have the same behavior)
- Wireless station running IWD as supplicant
- A device on the network able to respond to pings (I used one of the APs)

Procedure:
- Connect to the first AP
- Ping once to establish an ARP entry
- Start a tcpdump
- Roam to the second AP
- Wait for the operstate UP event, and note the timestamp
- Start pinging

Results: Below is the tcpdump after UP. It was recorded that the interface went UP at 10:42:01.432875.

  10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
  10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
  10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
  10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
  10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
  10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
  10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
  10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
  10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
  10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
  10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
  10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
  10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
  10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
  10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64

You can see the first ARP who-has went out very quickly after UP, but was never responded to. Nearly a second later the kernel retries and gets a response. Only then do the ping packets go out. If an ARP entry is manually added prior to UP (after the cache is cleared), it is seen that the first ping is never responded to, so it's not only an issue with ARP but with data packets in general. As mentioned prior, the wireless interface was also monitored to verify the ping/ARP packet made it OTA, which was observed to be true.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
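
A hedged sketch of how a supplicant might clear the knob before a roam, assuming the per-device sysctl is exposed as /proc/sys/net/ipv4/conf/<dev>/arp_evict_nocarrier (the interface name is illustrative):

  #include <stdio.h>

  /* Write "0" or "1" to the per-device knob; returns 0 on success. */
  static int set_arp_evict_nocarrier(const char *dev, int enable)
  {
          char path[128];
          FILE *f;

          snprintf(path, sizeof(path),
                   "/proc/sys/net/ipv4/conf/%s/arp_evict_nocarrier", dev);
          f = fopen(path, "w");
          if (!f)
                  return -1;
          fprintf(f, "%d", enable);
          fclose(f);
          return 0;
  }

  int main(void)
  {
          /* Keep the ARP cache across the upcoming roam on wlan0. */
          return set_arp_evict_nocarrier("wlan0", 0);
  }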

Committed by Joanne Koong:
This patch makes 2 changes regarding alignment padding for the "map_extra" field.

1) In the kernel header, "map_extra" and "btf_value_type_id" are rearranged to consolidate the hole.

Before:

  struct bpf_map {
      ...
      u32 max_entries;                /*  36   4 */
      u32 map_flags;                  /*  40   4 */
      /* XXX 4 bytes hole, try to pack */
      u64 map_extra;                  /*  48   8 */
      int spin_lock_off;              /*  56   4 */
      int timer_off;                  /*  60   4 */
      /* --- cacheline 1 boundary (64 bytes) --- */
      u32 id;                         /*  64   4 */
      int numa_node;                  /*  68   4 */
      ...
      bool frozen;                    /* 117   1 */
      /* XXX 10 bytes hole, try to pack */
      /* --- cacheline 2 boundary (128 bytes) --- */
      ...
      struct work_struct work;        /* 144  72 */
      /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
      struct mutex freeze_mutex;      /* 216 144 */
      /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
      u64 writecnt;                   /* 360   8 */

      /* size: 384, cachelines: 6, members: 26 */
      /* sum members: 354, holes: 2, sum holes: 14 */
      /* padding: 16 */
      /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */
  } __attribute__((__aligned__(64)));

After:

  struct bpf_map {
      ...
      u32 max_entries;                /*  36   4 */
      u64 map_extra;                  /*  40   8 */
      u32 map_flags;                  /*  48   4 */
      int spin_lock_off;              /*  52   4 */
      int timer_off;                  /*  56   4 */
      u32 id;                         /*  60   4 */
      /* --- cacheline 1 boundary (64 bytes) --- */
      int numa_node;                  /*  64   4 */
      ...
      bool frozen;                    /* 113   1 */
      /* XXX 14 bytes hole, try to pack */
      /* --- cacheline 2 boundary (128 bytes) --- */
      ...
      struct work_struct work;        /* 144  72 */
      /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
      struct mutex freeze_mutex;      /* 216 144 */
      /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
      u64 writecnt;                   /* 360   8 */

      /* size: 384, cachelines: 6, members: 26 */
      /* sum members: 354, holes: 1, sum holes: 14 */
      /* padding: 16 */
      /* forced alignments: 2, forced holes: 1, sum forced holes: 14 */
  } __attribute__((__aligned__(64)));

2) Add alignment padding to the bpf_map_info struct. More details can be found in commit 36f9814a ("bpf: fix uapi hole for 32 bit compat applications").

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211029224909.1721024-3-joannekoong@fb.com

- 01 November 2021, 6 commits

Committed by Taehee Yoo:
This adds definitions and control plane code for AMT; it is very similar to udp tunneling interfaces such as gtp and vxlan. The data plane code will be added in the next patch.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Parav Pandit:
Introduce a command to query a device config layout. An example query of a network vdpa device:

  $ vdpa dev add name bar mgmtdev vdpasim_net
  $ vdpa dev config show
  bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500

  $ vdpa dev config show -jp
  {
      "config": {
          "bar": {
              "mac": "00:35:09:19:48:05",
              "link ": "up",
              "link_announce ": false,
              "mtu": 1500,
          }
      }
  }

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211026175519.87795-3-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Committed by Viresh Kumar:
The virtio specification received a new mandatory feature (VIRTIO_I2C_F_ZERO_LENGTH_REQUEST) for zero-length requests. Fail if the feature isn't offered by the device.

For each read request, set the VIRTIO_I2C_FLAGS_M_RD flag, as required by the VIRTIO_I2C_F_ZERO_LENGTH_REQUEST feature. This allows us to support zero-length requests, like SMBus Quick, where the buffer need not be sent anymore.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/7c58868cd26d2fc4bd82d0d8b0dfb55636380110.1634808714.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jie Deng <jie.deng@intel.com> # once the spec is merged

Committed by Pablo Neira Ayuso:
Allow matching and mangling of inner headers / payload data after the transport header. There is a new field in the pktinfo structure that stores the inner header offset, which is calculated only when requested. Only TCP and UDP are supported at this stage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Committed by Wu Zongyong:
This attribute advertises the min value of the virtqueue size. The value is 1 by default.

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/2bbc417355c4d22298050b1ba887cecfbde3e85d.1635493219.git.wuzongyong@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Committed by Pablo Neira Ayuso:
Generalize NFT_META_IIFTYPE to NFT_META_IFTYPE, which allows matching on the interface type of the skb->dev field. This field is used by the netdev family to add an implicit dependency to skip non-ethernet packets when matching on layer 3 and 4 TCP/IP header fields. For backward compatibility, add the NFT_META_IIFTYPE alias to NFT_META_IFTYPE. Add __NFT_META_IIFTYPE, to be used by userspace in the future to match specifically on the iiftype.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>