- 21 5月, 2016 8 次提交
-
-
由 Tom Herbert 提交于
This patch adds receive path support for IPv6 with fou. - Add address family to fou structure for open sockets. This supports AF_INET and AF_INET6. Lookups for fou ports are performed on both the port number and family. - In fou and gue receive adjust tot_len in IPv4 header or payload_len based on address family. - Allow AF_INET6 in FOU_ATTR_AF netlink attribute. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Create __fou_build_header and __gue_build_header. These implement the protocol generic parts of building the fou and gue header. fou_build_header and gue_build_header implement the IPv4 specific functions and call the __*_build_header functions. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Use helper function to set up UDP tunnel related information for a fou socket. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Consolidate all the ip_tunnel_encap definitions in one spot in the header file. Also, move ip_encap_hlen and ip_tunnel_encap from ip_tunnel.c to ip_tunnels.h so they call be called without a dependency on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
When performing foo-over-UDP, UDP packets are processed by the encapsulation handler which returns another protocol to process. This may result in processing two (or more) protocols in the loop that are marked as INET6_PROTO_FINAL. The actions taken for hitting a final protocol, in particular the skb_postpull_rcsum can only be performed once. This patch set adds a check of a final protocol has been seen. The rules are: - If the final protocol has not been seen any protocol is processed (final and non-final). In the case of a final protocol, the final actions are taken (like the skb_postpull_rcsum) - If a final protocol has been seen (e.g. an encapsulating UDP header) then no further non-final protocols are allowed (e.g. extension headers). For more final protocols the final actions are not taken (e.g. skb_postpull_rcsum). Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
In ip6_input_finish the nexthdr protocol is retrieved from the next header offset that is returned in the cb of the skb. This method does not work for UDP encapsulation that may not even have a concept of a nexthdr field (e.g. FOU). This patch checks for a final protocol (INET6_PROTO_FINAL) when a protocol handler returns > 0. If the protocol is not final then resubmission is performed on nhoff value. If the protocol is final then the nexthdr is taken to be the return value. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and NETIF_F_GSO_IPXIP6. These are used to described IP in IP tunnel and what the outer protocol is. The inner protocol can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT are removed (these are both instances of SKB_GSO_IPXIP4). SKB_GSO_IPXIP6 will be used when support for GSO with IP encapsulation over IPv6 is added. Signed-off-by: NTom Herbert <tom@herbertland.com> Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
In several gso_segment functions there are checks of gso_type against a seemingly arbitrary list of SKB_GSO_* flags. This seems like an attempt to identify unsupported GSO types, but since the stack is the one that set these GSO types in the first place this seems unnecessary to do. If a combination isn't valid in the first place that stack should not allow setting it. This is a code simplication especially for add new GSO types. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
TCP stack can now run from process context. Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous assumption. Fixes: 5413d1ba ("net: do not block BH while processing socket backlog") Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
TCP stack can now run from process context. Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous assumption. Fixes: 5413d1ba ("net: do not block BH while processing socket backlog") Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
skb_splice_bits() returns int, kcm_splice_read() returns ssize_t, both are signed. We may need another patch to make them all ssize_t, but that deserves a separated patch. Fixes: 91687355 ("kcm: Splice support") Reported-by: NDavid Binderman <linuxdev.baldrick@gmail.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2016 11 次提交
-
-
由 Marek Lindner 提交于
This fix prevents nodes to wrongly create a 00:00:00:00:00:00 originator which can potentially interfere with the rest of the neighbor statistics. Fixes: d6f94d91 ("batman-adv: ELP - adding basic infrastructure") Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Linus Lüssing 提交于
Two parallel calls to batadv_neigh_node_new() might race for creating and adding the same neig_node. Fix this by including the check for any already existing, identical neigh_node within the spin-lock. This fixes splats like the following: [ 739.535069] ------------[ cut here ]------------ [ 739.535079] WARNING: CPU: 0 PID: 0 at /usr/src/batman-adv/git/batman-adv/net/batman-adv/bat_iv_ogm.c:1004 batadv_iv_ogm_process_per_outif+0xe3f/0xe60 [batman_adv]() [ 739.535092] too many matching neigh_nodes [ 739.535094] Modules linked in: dm_mod tun ip6table_filter ip6table_mangle ip6table_nat nf_nat_ipv6 ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TCPMSS xt_mark iptable_mangle xt_tcpudp xt_conntrack iptable_filter ip_tables x_tables ip_gre ip_tunnel gre bridge stp llc thermal_sys kvm_intel kvm crct10dif_pclmul crc32_pclmul sha256_ssse3 sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev pcspkr ip6_gre ip6_tunnel tunnel6 batman_adv(O) libcrc32c nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 ext4 crc16 mbcache jbd2 xen_netfront xen_blkfront crc32c_intel [ 739.535177] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 4.2.0-0.bpo.1-amd64 #1 Debian 4.2.6-3~bpo8+2 [ 739.535186] 0000000000000000 ffffffffa013b050 ffffffff81554521 ffff88007d003c18 [ 739.535201] ffffffff8106fa01 0000000000000000 ffff8800047a087a ffff880079c3a000 [ 739.735602] ffff88007b82bf40 ffff88007bc2d1c0 ffffffff8106fa7a ffffffffa013aa8e [ 739.735624] Call Trace: [ 739.735639] <IRQ> [<ffffffff81554521>] ? dump_stack+0x40/0x50 [ 739.735677] [<ffffffff8106fa01>] ? warn_slowpath_common+0x81/0xb0 [ 739.735692] [<ffffffff8106fa7a>] ? warn_slowpath_fmt+0x4a/0x50 [ 739.735715] [<ffffffffa012448f>] ? batadv_iv_ogm_process_per_outif+0xe3f/0xe60 [batman_adv] [ 739.735740] [<ffffffffa0124813>] ? batadv_iv_ogm_receive+0x363/0x380 [batman_adv] [ 739.735762] [<ffffffffa0124813>] ? batadv_iv_ogm_receive+0x363/0x380 [batman_adv] [ 739.735783] [<ffffffff810b0841>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [ 739.735804] [<ffffffffa012cb39>] ? batadv_batman_skb_recv+0xc9/0x110 [batman_adv] [ 739.735825] [<ffffffff81464891>] ? __netif_receive_skb_core+0x841/0x9a0 [ 739.735838] [<ffffffff810b0841>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [ 739.735853] [<ffffffff81465681>] ? process_backlog+0xa1/0x140 [ 739.735864] [<ffffffff81464f1a>] ? net_rx_action+0x20a/0x320 [ 739.735878] [<ffffffff81073aa7>] ? __do_softirq+0x107/0x270 [ 739.735891] [<ffffffff81073d82>] ? irq_exit+0x92/0xa0 [ 739.735905] [<ffffffff8137e0d1>] ? xen_evtchn_do_upcall+0x31/0x40 [ 739.735924] [<ffffffff8155b8fe>] ? xen_do_hypervisor_callback+0x1e/0x40 [ 739.735939] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 [ 739.735965] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 [ 739.735979] [<ffffffff8100a39c>] ? xen_safe_halt+0xc/0x20 [ 739.735991] [<ffffffff8101da6c>] ? default_idle+0x1c/0xa0 [ 739.736004] [<ffffffff810abf6b>] ? cpu_startup_entry+0x2eb/0x350 [ 739.736019] [<ffffffff81b2af5e>] ? start_kernel+0x480/0x48b [ 739.736032] [<ffffffff81b2d116>] ? xen_start_kernel+0x507/0x511 [ 739.736048] ---[ end trace c106bb901244bc8c ]--- Fixes: f987ed6e ("batman-adv: protect neighbor list with rcu locks") Reported-by: NMartin Weinelt <martin@darmstadt.freifunk.net> Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Sven Eckelmann 提交于
The undefined behavior sanatizer detected an signed integer overflow in a setup with near perfect link quality UBSAN: Undefined behaviour in net/batman-adv/bat_iv_ogm.c:1246:25 signed integer overflow: 8713350 * 255 cannot be represented in type 'int' The problems happens because the calculation of mixed unsigned and signed integers resulted in an integer multiplication. batadv_ogm_packet::tq (u8 255) * tq_own (u8 255) * tq_asym_penalty (int 134; max 255) * tq_iface_penalty (int 255; max 255) The tq_iface_penalty, tq_asym_penalty and inv_asym_penalty can just be changed to unsigned int because they are not expected to become negative. Fixes: c0398768 ("batman-adv: add WiFi penalty") Signed-off-by: NSven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Antonio Quartulli 提交于
When the MAC address of the primary interface is changed, update the originator address in the ELP and OGM skb buffers as well in order to reflect the change. Fixes: d6f94d91 ("batman-adv: ELP - adding basic infrastructure") Reported-by: NMarek Lindner <marek@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Sven Eckelmann 提交于
The function batadv_iv_ogm_orig_add_if allocates new buffers for bcast_own and bcast_own_sum. It is expected that these buffers are unchanged in case either bcast_own or bcast_own_sum couldn't be resized. But the error handling of this function frees the already resized buffer for bcast_own when the allocation of the new bcast_own_sum buffer failed. This will lead to an invalid memory access when some code will try to access bcast_own. Instead the resized new bcast_own buffer has to be kept. This will not lead to problems because the size of the buffer was only increased and therefore no user of the buffer will try to access bytes outside of the new buffer. Fixes: d0015fdd ("batman-adv: provide orig_node routing API") Signed-off-by: NSven Eckelmann <sven@narfation.org> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Sven Eckelmann 提交于
The functions batadv_neigh_ifinfo_get increase the reference counter of the batadv_neigh_ifinfo. These have to be reduced again when the reference is not used anymore to correctly free the objects. Fixes: 97869060 ("batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls") Signed-off-by: NSven Eckelmann <sven@narfation.org> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Sven Eckelmann 提交于
batadv_neigh_ifinfo_get can return NULL when it cannot find (even when only temporarily) anymore the neigh_ifinfo in the list neigh->ifinfo_list. This has to be checked to avoid kernel Oopses when the ifinfo is dereferenced. This a situation which isn't expected but is already handled by functions like batadv_v_neigh_cmp. The same kind of warning is therefore used before the function returns without dereferencing the pointers. Fixes: 97869060 ("batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls") Signed-off-by: NSven Eckelmann <sven@narfation.org> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Florian Westphal 提交于
batadv_send_skb_to_orig() calls dev_queue_xmit() so we can't use skb->len. Fixes: 95332477 ("batman-adv: network coding - buffer unicast packets before forward") Signed-off-by: NFlorian Westphal <fw@strlen.de> Reviewed-by: NSven Eckelmann <sven@narfation.org> Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch> Signed-off-by: NAntonio Quartulli <a@unstable.cc>
-
由 Jiri Pirko 提交于
The problem is that fib_info->nh is [0] so the struct fib_info allocation size depends on number of nexthops. If we just copy fib_info, we do not copy the nexthops info and driver accesses memory which is not ours. Given the fact that fib4 does not defer operations and therefore it does not need copy, just pass the pointer down to drivers as it was done before. Fixes: 850d0cbc ("switchdev: remove pointers from switchdev objects") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
We saw the following extra refcount release on veth device: kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1 Since we heavily use mirred action to redirect packets to veth, I think this is caused by the following race condition: CPU0: tcf_mirred_release(): (in RCU callback) struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1); CPU1: mirred_device_event(): spin_lock_bh(&mirred_list_lock); list_for_each_entry(m, &mirred_list, tcfm_list) { if (rcu_access_pointer(m->tcfm_dev) == dev) { dev_put(dev); /* Note : no rcu grace period necessary, as * net_device are already rcu protected. */ RCU_INIT_POINTER(m->tcfm_dev, NULL); } } spin_unlock_bh(&mirred_list_lock); CPU0: tcf_mirred_release(): spin_lock_bh(&mirred_list_lock); list_del(&m->tcfm_list); spin_unlock_bh(&mirred_list_lock); if (dev) // <======== Stil refers to the old m->tcfm_dev dev_put(dev); // <======== dev_put() is called on it again The action init code path is good because it is impossible to modify an action that is being removed. So, fix this by moving everything under the spinlock. Fixes: 2ee22a90 ("net_sched: act_mirred: remove spinlock in fast path") Fixes: 6bd00b85 ("act_mirred: fix a race condition on mirred_list") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
The publication field of the old netlink API should contain the publication key and not the publication reference. Fixes: 44a8ae94 (tipc: convert legacy nl name table dump to nl compat) Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2016 15 次提交
-
-
由 Herbert Xu 提交于
When we free cb->skb after a dump, we do it after releasing the lock. This means that a new dump could have started in the time being and we'll end up freeing their skb instead of ours. This patch saves the skb and module before we unlock so we free the right memory. Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: NBaozeng Ding <sploving1@gmail.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
Make sure the socket for which the user is listing publication exists before parsing the socket netlink attributes. Prior to this patch a call without any socket caused a NULL pointer dereference in tipc_nl_publ_dump(). Tested-and-reported-by: NBaozeng Ding <sploving1@gmail.com> Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Acked-by: NJon Maloy <jon.maloy@ericsson.cm> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
memory_usage must be decreased in dequeue_func(), not in fq_codel_dequeue(), otherwise packets dropped by Codel algo are missing this decrease. Also we need to clear memory_usage in fq_codel_reset() Fixes: 95b58430 ("fq_codel: add memory limitation per queue") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Follow-up for 8a3a4c6e ("net: make sch_handle_ingress() drop monitor ready") to also make the egress side drop monitor ready. Also here only TC_ACT_SHOT is a clear indication that something went wrong. Hence don't provide false positives to drop monitors such as 'perf record -e skb:kfree_skb ...'. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Muhammad Falak R Wani 提交于
The function setup_timer combines the initialization of a timer with the initialization of the timer's function and data fields. The mulitiline code for timer initialization is now replaced with function setup_timer. Also, quoting the mod_timer() function comment: -> mod_timer() is a more efficient way to update the expire field of an active timer (if the timer is inactive it will be activated). Use setup_timer() and mod_timer() to setup and arm a timer, making the code compact and aid readablity. Signed-off-by: NMuhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
This work adds a generic facility for use from eBPF JIT compilers that allows for further hardening of JIT generated images through blinding constants. In response to the original work on BPF JIT spraying published by Keegan McAllister [1], most BPF JITs were changed to make images read-only and start at a randomized offset in the page, where the rest was filled with trap instructions. We have this nowadays in x86, arm, arm64 and s390 JIT compilers. Additionally, later work also made eBPF interpreter images read only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86, arm, arm64 and s390 archs as well currently. This is done by default for mentioned JITs when JITing is enabled. Furthermore, we had a generic and configurable constant blinding facility on our todo for quite some time now to further make spraying harder, and first implementation since around netconf 2016. We found that for systems where untrusted users can load cBPF/eBPF code where JIT is enabled, start offset randomization helps a bit to make jumps into crafted payload harder, but in case where larger programs that cross page boundary are injected, we again have some part of the program opcodes at a page start offset. With improved guessing and more reliable payload injection, chances can increase to jump into such payload. Elena Reshetova recently wrote a test case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which can leave some more room for payloads. Note that for all this, additional bugs in the kernel are still required to make the jump (and of course to guess right, to not jump into a trap) and naturally the JIT must be enabled, which is disabled by default. For helping mitigation, the general idea is to provide an option bpf_jit_harden that admins can tweak along with bpf_jit_enable, so that for cases where JIT should be enabled for performance reasons, the generated image can be further hardened with blinding constants for unpriviledged users (bpf_jit_harden == 1), with trading off performance for these, but not for privileged ones. We also added the option of blinding for all users (bpf_jit_harden == 2), which is quite helpful for testing f.e. with test_bpf.ko. There are no further e.g. hardening levels of bpf_jit_harden switch intended, rationale is to have it dead simple to use as on/off. Since this functionality would need to be duplicated over and over for JIT compilers to use, which are already complex enough, we provide a generic eBPF byte-code level based blinding implementation, which is then just transparently JITed. JIT compilers need to make only a few changes to integrate this facility and can be migrated one by one. This option is for eBPF JITs and will be used in x86, arm64, s390 without too much effort, and soon ppc64 JITs, thus that native eBPF can be blinded as well as cBPF to eBPF migrations, so that both can be covered with a single implementation. The rule for JITs is that bpf_jit_blind_constants() must be called from bpf_int_jit_compile(), and in case blinding is disabled, we follow normally with JITing the passed program. In case blinding is enabled and we fail during the process of blinding itself, we must return with the interpreter. Similarly, in case the JITing process after the blinding failed, we return normally to the interpreter with the non-blinded code. Meaning, interpreter doesn't change in any way and operates on eBPF code as usual. For doing this pre-JIT blinding step, we need to make use of a helper/auxiliary register, here BPF_REG_AX. This is strictly internal to the JIT and not in any way part of the eBPF architecture. Just like in the same way as JITs internally make use of some helper registers when emitting code, only that here the helper register is one abstraction level higher in eBPF bytecode, but nevertheless in JIT phase. That helper register is needed since f.e. manually written program can issue loads to all registers of eBPF architecture. The core concept with the additional register is: blind out all 32 and 64 bit constants by converting BPF_K based instructions into a small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND, and REG <OP> BPF_REG_AX, so actual operation on the target register is translated from BPF_K into BPF_X one that is operating on BPF_REG_AX's content. During rewriting phase when blinding, RND is newly generated via prandom_u32() for each processed instruction. 64 bit loads are split into two 32 bit loads to make translation and patching not too complex. Only basic thing required by JITs is to call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other() pair, and to map BPF_REG_AX into an unused register. Small bpf_jit_disasm extract from [2] when applied to x86 JIT: echo 0 > /proc/sys/net/core/bpf_jit_harden ffffffffa034f5e9 + <x>: [...] 39: mov $0xa8909090,%eax 3e: mov $0xa8909090,%eax 43: mov $0xa8ff3148,%eax 48: mov $0xa89081b4,%eax 4d: mov $0xa8900bb0,%eax 52: mov $0xa810e0c1,%eax 57: mov $0xa8908eb4,%eax 5c: mov $0xa89020b0,%eax [...] echo 1 > /proc/sys/net/core/bpf_jit_harden ffffffffa034f1e5 + <x>: [...] 39: mov $0xe1192563,%r10d 3f: xor $0x4989b5f3,%r10d 46: mov %r10d,%eax 49: mov $0xb8296d93,%r10d 4f: xor $0x10b9fd03,%r10d 56: mov %r10d,%eax 59: mov $0x8c381146,%r10d 5f: xor $0x24c7200e,%r10d 66: mov %r10d,%eax 69: mov $0xeb2a830e,%r10d 6f: xor $0x43ba02ba,%r10d 76: mov %r10d,%eax 79: mov $0xd9730af,%r10d 7f: xor $0xa5073b1f,%r10d 86: mov %r10d,%eax 89: mov $0x9a45662b,%r10d 8f: xor $0x325586ea,%r10d 96: mov %r10d,%eax [...] As can be seen, original constants that carry payload are hidden when enabled, actual operations are transformed from constant-based to register-based ones, making jumps into constants ineffective. Above extract/example uses single BPF load instruction over and over, but of course all instructions with constants are blinded. Performance wise, JIT with blinding performs a bit slower than just JIT and faster than interpreter case. This is expected, since we still get all the performance benefits from JITing and in normal use-cases not every single instruction needs to be blinded. Summing up all 296 test cases averaged over multiple runs from test_bpf.ko suite, interpreter was 55% slower than JIT only and JIT with blinding was 8% slower than JIT only. Since there are also some extremes in the test suite, I expect for ordinary workloads that the performance for the JIT with blinding case is even closer to JIT only case, f.e. nmap test case from suite has averaged timings in ns 29 (JIT), 35 (+ blinding), and 151 (interpreter). BPF test suite, seccomp test suite, eBPF sample code and various bigger networking eBPF programs have been tested with this and were running fine. For testing purposes, I also adapted interpreter and redirected blinded eBPF image to interpreter and also here all tests pass. [1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html [2] https://github.com/01org/jit-spray-poc-for-ksp/ [3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NElena Reshetova <elena.reshetova@intel.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Since the blinding is strictly only called from inside eBPF JITs, we need to change signatures for bpf_int_jit_compile() and bpf_prog_select_runtime() first in order to prepare that the eBPF program we're dealing with can change underneath. Hence, for call sites, we need to return the latest prog. No functional change in this patch. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs. Current cBPF ones: # git grep -n HAVE_CBPF_JIT arch/ arch/arm/Kconfig:44: select HAVE_CBPF_JIT arch/mips/Kconfig:18: select HAVE_CBPF_JIT if !CPU_MICROMIPS arch/powerpc/Kconfig:129: select HAVE_CBPF_JIT arch/sparc/Kconfig:35: select HAVE_CBPF_JIT Current eBPF ones: # git grep -n HAVE_EBPF_JIT arch/ arch/arm64/Kconfig:61: select HAVE_EBPF_JIT arch/s390/Kconfig:126: select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES arch/x86/Kconfig:94: select HAVE_EBPF_JIT if X86_64 Later code also needs this facility to check for eBPF JITs. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Besides others, remove redundant comments where the code is self documenting enough, and properly indent various bpf_verifier_ops and bpf_prog_type_list declarations. Moreover, remove two exports that actually have no module user. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_hdr() is slightly more expensive than using skb->data in contexts where we know they point to the same byte. In receive path, tcp_v4_rcv() and tcp_v6_rcv() are in this situation, as tcp header has not been pulled yet. In output path, the same can be said when we just pushed the tcp header in the skb, in tcp_transmit_skb() and tcp_make_synack() Also factorize the two checks for tcb->tcp_flags & TCPHDR_SYN in tcp_transmit_skb() and pass tcp header pointer to tcp_ecn_send(), so that compiler can further optimize and avoid a reload. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
__sock_cmsg_send() might return different error codes, not only -EINVAL. Fixes: 24025c46 ("ipv4: process socket-level control messages in IPv4") Fixes: ad1e46a8 ("ipv6: process socket-level control messages in IPv6") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
Having multiple loadable modules with the same name cannot work with modprobe, and having both net/qrtr/smd.ko and drivers/soc/qcom/smd.ko results in a (somewhat cryptic) build error: ERROR: "qcom_smd_driver_unregister" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_driver_register" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_set_drvdata" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_send" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_get_drvdata" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_driver_unregister" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_driver_register" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_set_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_send" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_get_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! Also, the qrtr driver uses the SMD interface and has a Kconfig dependency, but also allows for compile-testing when SMD is disabled. However, if with QCOM_SMD=m and COMPILE_TEST=y we can end up with QRTR_SMD=y and that fails with a related link error. The changes the dependency so we can still compile-test the driver but not have it built-in if SMD is a module, to avoid running in the broken configuration, and changes the Makefile to provide the driver under a different module name. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Fixes: bdabad3e ("net: Add Qualcomm IPC router") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Introduce a new command in ndo_setup_tc() for hardware offloaded filters, to call the NIC driver, and make it update the statistics. This will be done before dumping the filter and its statistics. Signed-off-by: NAmir Vadai <amirva@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Implement the stats_update callback that will be called by NIC drivers for hardware offloaded filters. Signed-off-by: NAmir Vadai <amirva@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Samudrala, Sridhar 提交于
On devices that support TC U32 offloads, this flag enables a filter to be added only to HW. skip-sw and skip-hw are mutually exclusive flags. By default without any flags, the filter is added to both HW and SW, but no error checks are done in case of failure to add to HW. With skip-sw, failure to add to HW is treated as an error. Here is a sample script that adds 2 filters, one with skip-sw and the other with skip-hw flag. # add ingress qdisc tc qdisc add dev p4p1 ingress # enable hw tc offload. ethtool -K p4p1 hw-tc-offload on # add u32 filter with skip-sw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:1 u32 ht 800: flowid 800:1 \ skip-sw \ match ip src 192.168.1.0/24 \ action drop # add u32 filter with skip-hw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:2 u32 ht 800: flowid 800:2 \ skip-hw \ match ip src 192.168.2.0/24 \ action drop Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 5月, 2016 3 次提交
-
-
由 Florian Fainelli 提交于
Switchdev has been around for quite a while now, putting "EXPERIMENTAL" in the description is no longer accurate, drop it. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently, when creating or updating a route, no check is performed in both ipv4 and ipv6 code to the hoplimit value. The caller can i.e. set hoplimit to 256, and when such route will be used, packets will be sent with hoplimit/ttl equal to 0. This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4 ipv6 route code, substituting any value greater than 255 with 255. This is consistent with what is currently done for ADVMSS and MTU in the ipv4 code. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-