- 28 12月, 2017 3 次提交
-
-
由 Sudip Mukherjee 提交于
The build of mips bcm47xx_defconfig is failing with the error: net/sched/sch_fq_codel.c: In function 'fq_codel_init': net/sched/sch_fq_codel.c:487:8: error: too many arguments to function 'tcf_block_get' While adding the extack support, the commit missed adding it in the headers when CONFIG_NET_CLS is not defined. Fixes: 8d1a77f9 ("net: sch: api: add extack support in tcf_block_get") Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lorenzo Bianconi 提交于
Introduce peer_offset parameter in order to add the capability to specify two different values for payload offset on tx/rx side. If just offset is provided by userspace use it for rx side as well in order to maintain compatibility with older l2tp versions Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Leitner 提交于
As suggested by Rob Herring [1] rename the previously introduced reset-{,post-}delay-us bindings to the clearer reset-{,de}assert-us [1] https://patchwork.kernel.org/patch/10104905/Signed-off-by: NRichard Leitner <richard.leitner@skidata.com> Reviewed-by: NRob Herring <robh@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 12月, 2017 1 次提交
-
-
由 Leon Romanovsky 提交于
ASSERT_RTNL() macro is actual open-coded variant of WARN_ONCE() with two exceptions. First, it prints stack for multiple hits and not only once as WARN_ONCE() does. Second, the user can disable prints of WARN_ONCE by setting CONFIG_BUG to N. The multiple prints of dump stack are actually not needed, because calls without rtnl lock are programming errors and user can't do anything about them except to complain to the mailing list after first occurrence of such failure. The user who disabled BUG/WARN prints did it explicitly because by default in upstream kernel and distributions this option is enabled. It means that user doesn't want to see prints about missing locks too. This patch replaces open-coded variant in favor of already existing macro and change error prints to be once only. Reviewed-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 12月, 2017 11 次提交
-
-
由 Sven Eckelmann 提交于
The header file is used by different userspace programs to inject packets or to decode sniffed packets. It should therefore be available to them as userspace header. Also other components in the kernel (like the flow dissector) require access to the packet definitions to be able to decode ETH_P_BATMAN ethernet packets. Signed-off-by: NSven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shaohua Li 提交于
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for the function qdisc_create_dflt which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_create_dflt failed. The function qdisc_create_dflt will also call an init callback which can fail by any per-qdisc specific handling. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for the function qdisc_alloc which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_alloc failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for the function tcf_block_get which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why tcf_block_get failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for the function qdisc_get_rtab which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_get_rtab failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for graft callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for block callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for class change callback api. This prepares to handle extack support inside each specific class implementation. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for change callback for qdisc ops structtur to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch adds extack support for init callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2017 9 次提交
-
-
由 Shannon Nelson 提交于
There's no reason to define netdev->xfrmdev_ops if the offload facility is not CONFIG'd in. Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Shannon Nelson 提交于
The current XFRM code assumes that we've implemented the xdo_dev_state_free() callback, even if it is meaningless to the driver. This patch adds a check for it before calling, as done in other APIs, to prevent a NULL function pointer kernel crash. Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Alexei Starovoitov 提交于
There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jens Axboe 提交于
A previous change blindly added massive alignment to the call_single_data structure in struct request. This ballooned it in size from 296 to 320 bytes on my setup, for no valid reason at all. Use the unaligned struct __call_single_data variant instead. Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data") Cc: stable@vger.kernel.org # v4.14 Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yafang Shao 提交于
sk_state_load is only used by AF_INET/AF_INET6, so rename it to inet_sk_state_load and move it into inet_sock.h. sk_state_store is removed as it is not used any more. Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yafang Shao 提交于
As sk_state is a common field for struct sock, so the state transition tracepoint should not be a TCP specific feature. Currently it traces all AF_INET state transition, so I rename this tracepoint to inet_sock_set_state tracepoint with some minor changes and move it into trace/events/sock.h. We dont need to create a file named trace/events/inet_sock.h for this one single tracepoint. Two helpers are introduced to trace sk_state transition - void inet_sk_state_store(struct sock *sk, int newstate); - void inet_sk_set_state(struct sock *sk, int state); As trace header should not be included in other header files, so they are defined in sock.c. The protocol such as SCTP maybe compiled as a ko, hence export inet_sk_set_state(). Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steven Rostedt (VMware) 提交于
The TCP trace events (specifically tcp_set_state), maps emums to symbol names via __print_symbolic(). But this only works for reading trace events from the tracefs trace files. If perf or trace-cmd were to record these events, the event format file does not convert the enum names into numbers, and you get something like: __print_symbolic(REC->oldstate, { TCP_ESTABLISHED, "TCP_ESTABLISHED" }, { TCP_SYN_SENT, "TCP_SYN_SENT" }, { TCP_SYN_RECV, "TCP_SYN_RECV" }, { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" }, { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" }, { TCP_TIME_WAIT, "TCP_TIME_WAIT" }, { TCP_CLOSE, "TCP_CLOSE" }, { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" }, { TCP_LAST_ACK, "TCP_LAST_ACK" }, { TCP_LISTEN, "TCP_LISTEN" }, { TCP_CLOSING, "TCP_CLOSING" }, { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" }) Where trace-cmd and perf do not know the values of those enums. Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert the enum strings into their values at system boot. This will allow perf and trace-cmd to see actual numbers and not enums: __print_symbolic(REC->oldstate, { 1, "TCP_ESTABLISHED" }, { 2, "TCP_SYN_SENT" }, { 3, "TCP_SYN_RECV" }, { 4, "TCP_FIN_WAIT1" }, { 5, "TCP_FIN_WAIT2" }, { 6, "TCP_TIME_WAIT" }, { 7, "TCP_CLOSE" }, { 8, "TCP_CLOSE_WAIT" }, { 9, "TCP_LAST_ACK" }, { 10, "TCP_LISTEN" }, { 11, "TCP_CLOSING" }, { 12, "TCP_NEW_SYN_RECV" }) Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shaohua Li 提交于
If a bio is throttled and split after throttling, the bio could be resubmited and enters the throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common after arbitrary bio size change. To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to new bio too to avoid a double charge. However, cloned bio could be directed to a new disk, keeping the flag be a problem. The observation is we always set new disk for the bio in this case, so we can clear the flag in bio_set_dev(). This issue exists for a long time, arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2. V1-> V2: Not add extra field in bio based on discussion with Tejun Cc: Vivek Goyal <vgoyal@redhat.com> Cc: stable@vger.kernel.org Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jakub Kicinski 提交于
cls_bpf used to take care of tracking what offload state a filter is in, i.e. it would track if offload request succeeded or not. This information would then be used to issue correct requests to the driver, e.g. requests for statistics only on offloaded filters, removing only filters which were offloaded, using add instead of replace if previous filter was not added etc. This tracking of offload state no longer functions with the new callback infrastructure. There could be multiple entities trying to offload the same filter. Throw out all the tracking and corresponding commands and simply pass to the drivers both old and new bpf program. Drivers will have to deal with offload state tracking by themselves. Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 12月, 2017 6 次提交
-
-
由 Steffen Klassert 提交于
With support of async crypto operations in the GSO codepath we have everything in place to allow GSO for local sockets. This patch enables the GSO codepath. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
This patch implements asynchronous crypto callbacks and a backlog handler that can be used when IPsec is done at layer 2 in the TX path. It also extends the skb validate functions so that we can update the driver transmit return codes based on async crypto operation or to indicate that we queued the packet in a backlog queue. Joint work with: Aviv Heller <avivh@mellanox.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
We change the ESP GSO handlers to only segment the packets. The ESP handling and encryption is defered to validate_xmit_xfrm() where this is done for non GRO packets too. This makes the code more robust and prepares for asynchronous crypto handling. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Moshe Shemesh 提交于
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error. In such failure flow the function will return without releasing all EQs irqs and then pci_free_irq_vectors will fail. Fix by only warn on destroy EQ failure and continue to release other EQs and their irqs. It fixes the following kernel trace: kernel: kernel BUG at drivers/pci/msi.c:352! ... ... kernel: Call Trace: kernel: pci_disable_msix+0xd3/0x100 kernel: pci_free_irq_vectors+0xe/0x20 kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core] Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
由 Eran Ben Elisha 提交于
In mlx5_ifc, struct size was not complete, and thus driver was sending garbage after the last defined field. Fixed it by adding reserved field to complete the struct size. In addition, rename all set_rate_limit to set_pp_rate_limit to be compliant with the Firmware <-> Driver definition. Fixes: 7486216b ("{net,IB}/mlx5: mlx5_ifc updates") Fixes: 1466cc5b ("net/mlx5: Rate limit tables support") Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
由 Saeed Mahameed 提交于
Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo <new affinity> > /proc/irq/<x>/smp_affinity fails with -EIO This reverts commit a435393a. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. Fixes: a435393a ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jes Sorensen <jsorensen@fb.com> Reported-by: NJes Sorensen <jsorensen@fb.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
- 19 12月, 2017 7 次提交
-
-
由 Michael Chan 提交于
Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware GRO. With this flag, we can now independently turn on or off hardware GRO when GRO is on. Previously, drivers were using NETIF_F_GRO to control hardware GRO and so it cannot be independently turned on or off without affecting GRO. Hardware GRO (just like GRO) guarantees that packets can be re-segmented by TSO/GSO to reconstruct the original packet stream. Logically, GRO_HW should depend on GRO since it a subset, but we will let individual drivers enforce this dependency as they see fit. Since NETIF_F_GRO is not propagated between upper and lower devices, NETIF_F_GRO_HW should follow suit since it is a subset of GRO. In other words, a lower device can independent have GRO/GRO_HW enabled or disabled and no feature propagation is required. This will preserve the current GRO behavior. This can be changed later if we decide to propagate GRO/ GRO_HW/RXCSUM from upper to lower devices. Cc: Ariel Elior <Ariel.Elior@cavium.com> Cc: everest-linux-l2@cavium.com Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tonghao Zhang 提交于
When CONFIG_PROC_FS is disabled, we will not use the prot_inuse counter. This adds an #ifdef to hide the variable definition in that case. This is not a bugfix. But we can save bytes when there are many network namespace. Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: NMartin Zhang <zhangjunweimartin@didichuxing.com> Signed-off-by: NTonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tonghao Zhang 提交于
In some case, we want to know how many sockets are in use in different _net_ namespaces. It's a key resource metric. This patch add a member in struct netns_core. This is a counter for socket-inuse in the _net_ namespace. The patch will add/sub counter in the sk_alloc, sk_clone_lock and __sk_free. This patch will not counter the socket created in kernel. It's not very useful for userspace to know how many kernel sockets we created. The main reasons for doing this are that: 1. When linux calls the 'do_exit' for process to exit, the functions 'exit_task_namespaces' and 'exit_task_work' will be called sequentially. 'exit_task_namespaces' may have destroyed the _net_ namespace, but 'sock_release' called in 'exit_task_work' may use the _net_ namespace if we counter the socket-inuse in sock_release. 2. socket and sock are in pair. More important, sock holds the _net_ namespace. We counter the socket-inuse in sock, for avoiding holding _net_ namespace again in socket. It's a easy way to maintain the code. Signed-off-by: NMartin Zhang <zhangjunweimartin@didichuxing.com> Signed-off-by: NTonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tonghao Zhang 提交于
Change the member name will make the code more readable. This patch will be used in next patch. Signed-off-by: NMartin Zhang <zhangjunweimartin@didichuxing.com> Signed-off-by: NTonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jonathan Corbet 提交于
Commit ca986ad9 (nl80211: allow multiple active scheduled scan requests) removed WIPHY_FLAG_SUPPORTS_SCHED_SCAN but left the kerneldoc description in place, leading to this docs-build warning: ./include/net/cfg80211.h:3278: warning: Excess enum value 'WIPHY_FLAG_SUPPORTS_SCHED_SCAN' description in 'wiphy_flags' Remove the line and gain a bit of peace. Signed-off-by: NJonathan Corbet <corbet@lwn.net> Acked-by: NArend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Jens Axboe 提交于
Commit caa4b024(blk-map: call blk_queue_bounce from blk_rq_append_bio) moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider the fact that the bounced bio becomes invisible to caller since the parameter type is 'struct bio *'. Make it a pointer to a pointer to a bio, so the caller sees the right bio also after a bounce. Fixes: caa4b024 ("blk-map: call blk_queue_bounce from blk_rq_append_bio") Cc: Christoph Hellwig <hch@lst.de> Reported-by: NMichele Ballabio <barra_cuda@katamail.com> (handling failure of blk_rq_append_bio(), only call bio_get() after blk_rq_append_bio() returns OK) Tested-by: NMichele Ballabio <barra_cuda@katamail.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Commit a8821f3f("block: Improvements to bounce-buffer handling") tries to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES, but ignores that passthrough I/O can use blk_queue_bounce() too. Especially, passthrough IO may not be sector-aligned, and the check of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may become true even though the max bvec number doesn't exceed BIO_MAX_PAGES, then cause the bio splitted, and the original passthrough bio is submited to generic_make_request(). This patch fixes this issue by checking if the bio is passthrough IO, and use bio_kmalloc() to allocate the cloned passthrough bio. Cc: NeilBrown <neilb@suse.com> Fixes: a8821f3f("block: Improvements to bounce-buffer handling") Tested-by: NMichele Ballabio <barra_cuda@katamail.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 12月, 2017 3 次提交
-
-
由 Wanpeng Li 提交于
Reported by syzkaller: BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm] Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298 CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: dump_stack+0xab/0xe1 print_address_description+0x6b/0x290 kasan_report+0x28a/0x370 write_mmio+0x11e/0x270 [kvm] emulator_read_write_onepage+0x311/0x600 [kvm] emulator_read_write+0xef/0x240 [kvm] emulator_fix_hypercall+0x105/0x150 [kvm] em_hypercall+0x2b/0x80 [kvm] x86_emulate_insn+0x2b1/0x1640 [kvm] x86_emulate_instruction+0x39a/0xb90 [kvm] handle_exception+0x1b4/0x4d0 [kvm_intel] vcpu_enter_guest+0x15a0/0x2640 [kvm] kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm] kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall) to the guest memory, however, write_mmio tracepoint always prints 8 bytes through *(u64 *)val since kvm splits the mmio access into 8 bytes. This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes it by just accessing the bytes which we operate on. Before patch: syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f After patch: syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Tested-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Marc Zyngier 提交于
If we don't have a usable GIC, do not try to set the vcpu affinity as this is guaranteed to fail. Reported-by: NAndre Przywara <andre.przywara@arm.com> Reviewed-by: NAndre Przywara <andre.przywara@arm.com> Tested-by: NAndre Przywara <andre.przywara@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Alexei Starovoitov 提交于
Typical JIT does several passes over bpf instructions to compute total size and relative offsets of jumps and calls. With multitple bpf functions calling each other all relative calls will have invalid offsets intially therefore we need to additional last pass over the program to emit calls with correct offsets. For example in case of three bpf functions: main: call foo call bpf_map_lookup exit foo: call bar exit bar: exit We will call bpf_int_jit_compile() indepedently for main(), foo() and bar() x64 JIT typically does 4-5 passes to converge. After these initial passes the image for these 3 functions will be good except call targets, since start addresses of foo() and bar() are unknown when we were JITing main() (note that call bpf_map_lookup will be resolved properly during initial passes). Once start addresses of 3 functions are known we patch call_insn->imm to point to right functions and call bpf_int_jit_compile() again which needs only one pass. Additional safety checks are done to make sure this last pass doesn't produce image that is larger or smaller than previous pass. When constant blinding is on it's applied to all functions at the first pass, since doing it once again at the last pass can change size of the JITed code. Tested on x64 and arm64 hw with JIT on/off, blinding on/off. x64 jits bpf-to-bpf calls correctly while arm64 falls back to interpreter. All other JITs that support normal BPF_CALL will behave the same way since bpf-to-bpf call is equivalent to bpf-to-kernel call from JITs point of view. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-