- 12 3月, 2021 1 次提交
-
-
由 Ido Schimmel 提交于
- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*. - RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will currently serve only for dumping of individual buckets of resilient next hop groups. For nexthop group buckets, these messages will carry a nested attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*. There are several reasons why a new suite of messages is created for nexthop buckets instead of overloading the information on the existing RTM_{NEW,DEL,GET}NEXTHOP messages. First, a nexthop group can contain a large number of nexthop buckets (4k is not unheard of). This imposes limits on the amount of information that can be encoded for each nexthop bucket given a netlink message is limited to 64k bytes. Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at this point, in the future it can be extended to provide user space with control over nexthop buckets configuration. - The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is adjusted to bounce groups with that type for now. Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NPetr Machata <petrm@nvidia.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Signed-off-by: NPetr Machata <petrm@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2021 1 次提交
-
-
由 Maciej W. Rozycki 提交于
Following the recent update to MAINTAINERS update my e-mail address. Signed-off-by: NMaciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 3月, 2021 12 次提交
-
-
由 Xuesen Huang 提交于
bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets encapsulation. But that is not appropriate when pushing Ethernet header. Add an option to further specify encap L2 type and set the inner_protocol as ETH_P_TEB. Suggested-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NXuesen Huang <huangxuesen@kuaishou.com> Signed-off-by: NZhiyong Cheng <chengzhiyong@kuaishou.com> Signed-off-by: NLi Wang <wangli09@kuaishou.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NWillem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/20210304064046.6232-1-hxseverything@gmail.com
-
由 Lorenz Bauer 提交于
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space provides the full bpf_sk_lookup struct as context. Since the context includes a socket pointer that can't be exposed to user space we define that PROG_TEST_RUN returns the cookie of the selected socket or zero in place of the socket pointer. We don't support testing programs that select a reuseport socket, since this would mean running another (unrelated) BPF program from the sk_lookup test handler. Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
-
由 Joe Stringer 提交于
Abstract out the target parameter so that upcoming commits, more than just the existing "helpers" target can be called to generate specific portions of docs from the eBPF UAPI headers. Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io
-
由 Joe Stringer 提交于
Based roughly on the following commits: * Commit cb4d03ab ("bpf: Add generic support for lookup batch op") * Commit 05799638 ("bpf: Add batch ops to all htab bpf map") * Commit aa2e93b8 ("bpf: Add generic support for update and delete batch ops") Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NBrian Vazquez <brianvv@google.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-9-joe@cilium.io
-
由 Joe Stringer 提交于
Commit 468e2f64 ("bpf: introduce BPF_PROG_QUERY command") originally introduced this, but there have been several additions since then. Unlike BPF_PROG_ATTACH, it appears that the sockmap progs are not able to be queried so far. Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-8-joe@cilium.io
-
由 Joe Stringer 提交于
Based on a brief read of the corresponding source code. Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-7-joe@cilium.io
-
由 Joe Stringer 提交于
Document the prog attach command in more detail, based on git commits: * commit f4324551 ("bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands") * commit 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") * commit f4364dcf ("media: rc: introduce BPF_PROG_LIRC_MODE2") * commit d58e468b ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-6-joe@cilium.io
-
由 Joe Stringer 提交于
Commit b2197755 ("bpf: add support for persistent maps/progs") contains the original implementation and git logs, used as reference for this documentation. Also pull in the filename restriction as documented in commit 6d8cb045 ("bpf: comment why dots in filenames under BPF virtual FS are not allowed") Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-5-joe@cilium.io
-
由 Joe Stringer 提交于
Document the meaning of the BPF_F_LOCK flag for the map lookup/update descriptions. Based on commit 96049f3a ("bpf: introduce BPF_F_LOCK flag"). Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-4-joe@cilium.io
-
由 Joe Stringer 提交于
Introduce high-level descriptions of the intent and return codes of the bpf() syscall commands. Subsequent patches may further flesh out the content to provide a more useful programming reference. Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-3-joe@cilium.io
-
由 Joe Stringer 提交于
These descriptions are present in the man-pages project from the original submissions around 2015-2016. Import them so that they can be kept up to date as developers extend the bpf syscall commands. These descriptions follow the pattern used by scripts/bpf_helpers_doc.py so that we can take advantage of the parser to generate more up-to-date man page writing based upon these headers. Some minor wording adjustments were made to make the descriptions more consistent for the description / return format. Signed-off-by: NJoe Stringer <joe@cilium.io> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NQuentin Monnet <quentin@isovalent.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210302171947.2268128-2-joe@cilium.ioCo-authored-by: NAlexei Starovoitov <ast@kernel.org> Co-authored-by: NMichael Kerrisk <mtk.manpages@gmail.com>
-
由 Ilya Leoshkevich 提交于
Add a new kind value and expand the kind bitfield. Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210226202256.116518-2-iii@linux.ibm.com
-
- 04 3月, 2021 1 次提交
-
-
由 Matthias Schiffer 提交于
Commit 5ee759cd ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose. In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP). Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets. With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited(). [1] https://github.com/wlanslovenija/tunneldigger/issues/160 Fixes: 5ee759cd ("l2tp: use standard API for warning log messages") Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 3月, 2021 1 次提交
-
-
由 David Woodhouse 提交于
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 2月, 2021 1 次提交
-
-
由 Dmitry V. Levin 提交于
Apparently, <linux/netfilter/nfnetlink_cthelper.h> and <linux/netfilter/nfnetlink_acct.h> could not be included into the same compilation unit because of a cut-and-paste typo in the former header. Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure") Cc: <stable@vger.kernel.org> # v3.6 Signed-off-by: NDmitry V. Levin <ldv@altlinux.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 27 2月, 2021 3 次提交
-
-
由 Yonghong Song 提交于
The bpf_for_each_map_elem() helper is introduced which iterates all map elements with a callback function. The helper signature looks like long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags) and for each map element, the callback_fn will be called. For example, like hashmap, the callback signature may look like long callback_fn(map, key, val, callback_ctx) There are two known use cases for this. One is from upstream ([1]) where a for_each_map_elem helper may help implement a timeout mechanism in a more generic way. Another is from our internal discussion for a firewall use case where a map contains all the rules. The packet data can be compared to all these rules to decide allow or deny the packet. For array maps, users can already use a bounded loop to traverse elements. Using this helper can avoid using bounded loop. For other type of maps (e.g., hash maps) where bounded loop is hard or impossible to use, this helper provides a convenient way to operate on all elements. For callback_fn, besides map and map element, a callback_ctx, allocated on caller stack, is also passed to the callback function. This callback_ctx argument can provide additional input and allow to write to caller stack for output. If the callback_fn returns 0, the helper will iterate through next element if available. If the callback_fn returns 1, the helper will stop iterating and returns to the bpf program. Other return values are not used for now. Currently, this helper is only available with jit. It is possible to make it work with interpreter with so effort but I leave it as the future work. [1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
-
由 Hangbin Liu 提交于
Commit 34b2021c ("bpf: Add BPF-helper for MTU checking") added an extra blank line in bpf helper description. This will make bpf_helpers_doc.py stop building bpf_helper_defs.h immediately after bpf_check_mtu(), which will affect future added functions. Fixes: 34b2021c ("bpf: Add BPF-helper for MTU checking") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Randy Dunlap 提交于
Drop the doubled word "for" in a comment. {firewire-cdev.h} Drop the doubled word "in" in a comment. {input.h} Drop the doubled word "a" in a comment. {mdev.h} Drop the doubled word "the" in a comment. {ptrace.h} Link: https://lkml.kernel.org/r/20210126232444.22861-1-rdunlap@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2021 2 次提交
-
-
由 Huang Ying 提交于
Now, NUMA balancing can only optimize the page placement among the NUMA nodes if the default memory policy is used. Because the memory policy specified explicitly should take precedence. But this seems too strict in some situations. For example, on a system with 4 NUMA nodes, if the memory of an application is bound to the node 0 and 1, NUMA balancing can potentially migrate the pages between the node 0 and 1 to reduce cross-node accessing without breaking the explicit memory binding policy. So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to set_mempolicy() when mode is MPOL_BIND. With the flag specified, NUMA balancing will be enabled within the thread to optimize the page placement within the constrains of the specified memory binding policy. With the newly added flag, the NUMA balancing control mechanism becomes, - sysctl knob numa_balancing can enable/disable the NUMA balancing globally. - even if sysctl numa_balancing is enabled, the NUMA balancing will be disabled for the memory areas or applications with the explicit memory policy by default. - MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for the applications when specifying the explicit memory policy (MPOL_BIND). Various page placement optimization based on the NUMA balancing can be done with these flags. As the first step, in this patch, if the memory of the application is bound to multiple nodes (MPOL_BIND), and in the hint page fault handler the accessing node are in the policy nodemask, the page will be tried to be migrated to the accessing node to reduce the cross-node accessing. If the newly added MPOL_F_NUMA_BALANCING flag is specified by an application on an old kernel version without its support, set_mempolicy() will return -1 and errno will be set to EINVAL. The application can use this behavior to run on both old and new kernel versions. And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL as before. Because we don't support optimization based on the NUMA balancing for these modes. In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good API/ABI for the purpose of the patch. And because it's not clear whether it's necessary to enable NUMA balancing for a specific memory area inside an application, so we only add the flag at the thread level (set_mempolicy()) instead of the memory area level (mbind()). We can do that when it become necessary. To test the patch, we run a test case as follows on a 4-node machine with 192 GB memory (48 GB per node). 1. Change pmbench memory accessing benchmark to call set_mempolicy() to bind its memory to node 1 and 3 and enable NUMA balancing. Some related code snippets are as follows, #include <numaif.h> #include <numa.h> struct bitmask *bmp; int ret; bmp = numa_parse_nodestring("1,3"); ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING, bmp->maskp, bmp->size + 1); /* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */ if (ret < 0 && errno == EINVAL) ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1); if (ret < 0) { perror("Failed to call set_mempolicy"); exit(-1); } 2. Run a memory eater on node 3 to use 40 GB memory before running pmbench. 3. Run pmbench with 64 processes, the working-set size of each process is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB. The CPU and the memory (as in step 1.) of all pmbench processes is bound to node 1 and 3. So, after CPU usage is balanced, some pmbench processes run on the CPUs of the node 3 will access the memory of the node 1. 4. After the pmbench processes run for 100 seconds, kill the memory eater. Now it's possible for some pmbench processes to migrate their pages from node 1 to node 3 to reduce cross-node accessing. Test results show that, with the patch, the pages can be migrated from node 1 to node 3 after killing the memory eater, and the pmbench score can increase about 17.5%. Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hangbin Liu 提交于
Commit 34b2021c ("bpf: Add BPF-helper for MTU checking") added an extra blank line in bpf helper description. This will make bpf_helpers_doc.py stop building bpf_helper_defs.h immediately after bpf_check_mtu(), which will affect future added functions. Fixes: 34b2021c ("bpf: Add BPF-helper for MTU checking") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
-
- 24 2月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 2月, 2021 3 次提交
-
-
由 Parav Pandit 提交于
Enable user to query vdpa device information. $ vdpa dev add mgmtdev vdpasim_net name foo2 Show the newly created vdpa device by its name: $ vdpa dev show foo2 foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256 $ vdpa dev show foo2 -jp { "dev": { "foo2": { "type": "network", "mgmtdev": "vdpasim_net", "vendor_id": 0, "max_vqs": 2, "max_vq_size": 256 } } } Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NEli Cohen <elic@nvidia.com> Reviewed-by: NJason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-6-parav@nvidia.com Including a memory leak fix: Link: https://lore.kernel.org/r/20210217060614.59561-1-parav@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Parav Pandit 提交于
Add the ability to add and delete a vdpa device. Examples: Create a vdpa device of type network named "foo2" from the management device vdpasim: $ vdpa dev add mgmtdev vdpasim_net name foo2 Delete the vdpa device after its use: $ vdpa dev del foo2 Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NEli Cohen <elic@nvidia.com> Reviewed-by: NJason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-5-parav@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Parav Pandit 提交于
To add one or more VDPA devices, define a management device which allows adding or removing vdpa device. A management device defines set of callbacks to manage vdpa devices. To begin with, it defines add and remove callbacks through which a user defined vdpa device can be added or removed. A unique management device is identified by its unique handle identified by management device name and optionally the bus name. Hence, introduce routine through which driver can register a management device and its callback operations for adding and remove a vdpa device. Introduce vdpa netlink socket family so that user can query management device and its attributes. Example of show vdpa management device which allows creating vdpa device of networking class (device id = 0x1) of virtio specification 1.1 section 5.1.1. $ vdpa mgmtdev show vdpasim_net: supported_classes: net Example of showing vdpa management device in JSON format. $ vdpa mgmtdev show -jp { "show": { "vdpasim_net": { "supported_classes": [ "net" ] } } } Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NEli Cohen <elic@nvidia.com> Reviewed-by: NJason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210105103203.82508-4-parav@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com> Including a bugfix: vpda: correctly size vdpa_nl_policy We need to ensure last entry of vdpa_nl_policy[] is zero, otherwise out-of-bounds access is hurting us. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210210134911.4119555-1-eric.dumazet@gmail.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 17 2月, 2021 4 次提交
-
-
由 Ben Widawsky 提交于
Add initial set of formal commands beyond basic identify and command enumeration. Signed-off-by: NBen Widawsky <ben.widawsky@intel.com> Reviewed-by: NDan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> (v2) Link: https://lore.kernel.org/r/20210217040958.1354670-8-ben.widawsky@intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Ben Widawsky 提交于
CXL devices identified by the memory-device class code must implement the Device Command Interface (described in 8.2.9 of the CXL 2.0 spec). While the driver already maintains a list of commands it supports, there is still a need to be able to distinguish between commands that the driver knows about from commands that are optionally supported by the hardware. The Command Effects Log (CEL) is specified in the CXL 2.0 specification. The CEL is one of two types of logs, the other being vendor specific. They are distinguished in hardware/spec via UUID. The CEL is useful for 2 things: 1. Determine which optional commands are supported by the CXL device. 2. Enumerate any vendor specific commands The CEL is used by the driver to determine which commands are available in the hardware and therefore which commands userspace is allowed to execute. The set of enabled commands might be a subset of commands which are advertised in UAPI via CXL_MEM_SEND_COMMAND IOCTL. With the CEL enabling comes a internal flag to indicate a base set of commands that are enabled regardless of CEL. Such commands are required for basic interaction with the hardware and thus can be useful in debug cases, for example if the CEL is corrupted. The implementation leaves the statically defined table of commands and supplements it with a bitmap to determine commands that are enabled. This organization was chosen for the following reasons: - Smaller memory footprint. Doesn't need a table per device. - Reduce memory allocation complexity. - Fixed command IDs to opcode mapping for all devices makes development and debugging easier. - Certain helpers are easily achievable, like cxl_for_each_cmd(). Signed-off-by: NBen Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> (v3) Link: https://lore.kernel.org/r/20210217040958.1354670-7-ben.widawsky@intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Ben Widawsky 提交于
The CXL memory device send interface will have a number of supported commands. The raw command is not such a command. Raw commands allow userspace to send a specified opcode to the underlying hardware and bypass all driver checks on the command. The primary use for this command is to [begrudgingly] allow undocumented vendor specific hardware commands. While not the main motivation, it also allows prototyping new hardware commands without a driver patch and rebuild. While this all sounds very powerful it comes with a couple of caveats: 1. Bug reports using raw commands will not get the same level of attention as bug reports using supported commands (via taint). 2. Supported commands will be rejected by the RAW command. With this comes new debugfs knob to allow full access to your toes with your weapon of choice. Signed-off-by: NBen Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Ariel Sibley <Ariel.Sibley@microchip.com> Link: https://lore.kernel.org/r/20210217040958.1354670-6-ben.widawsky@intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Ben Widawsky 提交于
Add a straightforward IOCTL that provides a mechanism for userspace to query the supported memory device commands. CXL commands as they appear to userspace are described as part of the UAPI kerneldoc. The command list returned via this IOCTL will contain the full set of commands that the driver supports, however, some of those commands may not be available for use by userspace. Memory device commands first appear in the CXL 2.0 specification. They are submitted through a mailbox mechanism specified in the CXL 2.0 specification. The send command allows userspace to issue mailbox commands directly to the hardware. The list of available commands to send are the output of the query command. The driver verifies basic properties of the command and possibly inspect the input (or output) payload to determine whether or not the command is allowed (or might taint the kernel). Reported-by: kernel test robot <lkp@intel.com> # bug in earlier revision Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBen Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20210217040958.1354670-5-ben.widawsky@intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 2月, 2021 3 次提交
-
-
由 Geliang Tang 提交于
Add mptcpi_local_addr_used and mptcpi_local_addr_max in struct mptcp_info. Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Laurent Vivier 提交于
It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: NLaurent Vivier <laurent@vivier.eu> Reviewed-by: NYunQiang Su <ysu@wavecomp.com> URL: http://bugs.debian.org/970460Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Pablo Neira Ayuso 提交于
A userspace daemon like firewalld might need to monitor for netlink updates to detect its ruleset removal by the (global) flush ruleset command to ensure ruleset persistency. This adds extra complexity from userspace and, for some little time, the firewall policy is not in place. This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace program to own the table that creates in exclusivity. Tables that are owned... - can only be updated and removed by the owner, non-owners hit EPERM if they try to update it or remove it. - are destroyed when the owner closes the netlink socket or the process is gone (implicit netlink socket closure). - are skipped by the global flush ruleset command. - are listed in the global ruleset. The userspace process that sets on the NFT_TABLE_F_OWNER flag need to leave open the netlink socket. A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID to identify the owner from userspace. This patch also updates error reporting when an unknown table flag is specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is usually reserved to report for malformed netlink messages to userspace. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 15 2月, 2021 2 次提交
-
-
由 Bartosz Golaszewski 提交于
GPL-2.0 license identifier is deprecated. User-space projects that want to include the kernel header with their source-code will be unable to become fully REUSE compliant due to the reuse tool complaining about deprecated licenses. Change the SPDX identifier to GPL-2.0-only. Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
-
由 Kent Gibson 提交于
The description of the flags field of the struct gpio_v2_line_info mentions "the GPIO lines" while the info only applies to an individual GPIO line. This was accidentally changed from "the GPIO line" during formatting improvements. Reword to "this GPIO line" to clarify and to be consistent with other struct gpio_v2_line_info fields. Fixes: 2cc522d3 ("gpio: uapi: kernel-doc formatting improvements") Signed-off-by: NKent Gibson <warthog618@gmail.com> Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
-
- 13 2月, 2021 3 次提交
-
-
由 Florian Westphal 提交于
Allow userspace (mptcpd) to subscribe to mptcp genl multicast events. This implementation reuses the same event API as the mptcp kernel fork to ease integration of existing tools, e.g. mptcpd. Supported events include: 1. start and close of an mptcp connection 2. start and close of subflows (joins) 3. announce and withdrawals of addresses 4. subflow priority (backup/non-backup) change. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs. The SKB object is complex and the skb->len value (accessible from BPF-prog) also include the length of any extra GRO/GSO segments, but without taking into account that these GRO/GSO segments get added transport (L4) and network (L3) headers before being transmitted. Thus, this BPF-helper is created such that the BPF-programmer don't need to handle these details in the BPF-prog. The API is designed to help the BPF-programmer, that want to do packet context size changes, which involves other helpers. These other helpers usually does a delta size adjustment. This helper also support a delta size (len_diff), which allow BPF-programmer to reuse arguments needed by these other helpers, and perform the MTU check prior to doing any actual size adjustment of the packet context. It is on purpose, that we allow the len adjustment to become a negative result, that will pass the MTU check. This might seem weird, but it's not this helpers responsibility to "catch" wrong len_diff adjustments. Other helpers will take care of these checks, if BPF-programmer chooses to do actual size adjustment. V14: - Improve man-page desc of len_diff. V13: - Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff. V12: - Simplify segment check that calls skb_gso_validate_network_len. - Helpers should return long V9: - Use dev->hard_header_len (instead of ETH_HLEN) - Annotate with unlikely req from Daniel - Fix logic error using skb_gso_validate_network_len from Daniel V6: - Took John's advice and dropped BPF_MTU_CHK_RELAX - Returned MTU is kept at L3-level (like fib_lookup) V4: Lot of changes - ifindex 0 now use current netdev for MTU lookup - rename helper from bpf_mtu_check to bpf_check_mtu - fix bug for GSO pkt length (as skb->len is total len) - remove __bpf_len_adj_positive, simply allow negative len adj Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
-
由 Jesper Dangaard Brouer 提交于
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup) can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog don't know the MTU value that caused this rejection. If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it need to know this MTU value for the ICMP packet. Patch change lookup and result struct bpf_fib_lookup, to contain this MTU value as output via a union with 'tot_len' as this is the value used for the MTU lookup. V5: - Fixed uninit value spotted by Dan Carpenter. - Name struct output member mtu_result Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
-
- 12 2月, 2021 2 次提交
-
-
由 Johannes Berg 提交于
These were missed earlier, add the necessary documentation and, while at it, clarify it. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210212105023.895c3389f063.I46dea3bfc64385bc6f600c50d294007510994f8f@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Ben Greear 提交于
Allow user to disable HE mode, similar to how VHT and HT can be disabled. Useful for testing. Signed-off-by: NBen Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20210204144610.25971-1-greearb@candelatech.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-