- 01 9月, 2015 11 次提交
-
-
由 Nikolay Aleksandrov 提交于
Fix a memory leak in the mpls netns init function in case of failure. If register_net_sysctl fails then we need to free the ctl_table. Fixes: 7720c01f ("mpls: Add a sysctl to control the size of the mpls label table") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
TOS is another key aspect of the lookup passed to fib_validate_source. Add it to the tracepoint. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
To fix build errors: kernel/built-in.o: In function `bpf_trace_printk': bpf_trace.c:(.text+0x11a254): undefined reference to `strncpy_from_unsafe' kernel/built-in.o: In function `fetch_memory_string': trace_kprobe.c:(.text+0x11acf8): undefined reference to `strncpy_from_unsafe' move strncpy_from_unsafe() next to probe_kernel_read/write() which use the same memory access style. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Reported-by: NGuenter Roeck <linux@roeck-us.net> Fixes: 1a6877b9 ("lib: introduce strncpy_from_unsafe()") Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Daniel Borkmann says: ==================== tcp: receive-side per route dctcp handling Original cover letter: Currently, the following case doesn't use DCTCP, even if it should: - responder has f.e. cubic as system wide default - 'ip route congctl dctcp $src' was set Then, DCTCP is NOT used if a DCTCP sender attempts to connect from a host in the $src range: ECT(0) is set, but listen_sk is not dctcp, so we fail the INET_ECN_is_not_ect sanity check. We also have to examine the dst used for the SYN/ACK reply to make this case work. In order to minimize additional cost, store the 'ecn is must have' information is the dst_features field. The set targets -next instead of -net since this doesn't seem to be a serious bug and to give the change more soak time until it hits linus tree. v1 -> v2: - Addressed Dave's feedback, not exposing any bits to user space - Added patch 3 to reject incorrect configurations - Rest as is, rebased and retested ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Reduce the identation a bit, there's no need to artificically have it increased. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
fib_create_info() is already quite large, so before adding more code to the metrics section move that to a helper, similar to ip6_convert_metrics. Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Philip Downey 提交于
Document the addition of a new sysctl variable which controls the generation of IGMP reports for link local multicast groups in the 224.0.0.X range. IGMP reports for local multicast groups can now be optionally inhibited by setting the value to zero e.g.: echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports To retain backwards compatibility the previous behaviour is retained by default on system boot or reverted by setting the value back to non-zero. Signed-off-by: NPhilip Downey <pdowney@brocade.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Acked-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Madalin Bucur 提交于
Address remaining issue after 80ec1927. Signed-off-by: NMadalin Bucur <madalin.bucur@freescale.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2015 15 次提交
-
-
由 David S. Miller 提交于
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64': >> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function) v = *(((u64 *)bhptr) + offt); ^ net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in net/ipv4/af_inet.c: In function 'snmp_fold_field64': >> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function) res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ >> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field' res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ net/ipv4/af_inet.c:1455:5: note: declared here u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt) ^ Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ken-ichirou MATSUZAWA 提交于
Poll() returns immediately after setting the kernel current frame (ring->head) to SKIP from user space even though there is no new frame. And in a case of all frames is VALID, user space program unintensionally sets (only) kernel current frame to UNUSED, then calls poll(), it will not return immediately even though there are VALID frames. To avoid situations like above, I think we need to scan all frames to find VALID frames at poll() like netlink_alloc_skb(), netlink_forward_ring() finding an UNUSED frame at skb allocation. Signed-off-by: NKen-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Aleksey Makarov says: ==================== net: thunderx: New features and fixes v2: - The unused affinity_mask field of the structure cmp_queue has been deleted. (thanks to David Miller) - The unneeded initializers have been dropped. (thanks to Alexey Klimov) - The commit message "net: thunderx: Rework interrupt handling" has been fixed. (thanks to Alexey Klimov) ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
Support for setting VF's corresponding BGX LMAC in internal loopback mode. This mode can be used for verifying basic HW functionality such as packet I/O, RX checksum validation, CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external network connectivity. 'loopback' mode can be enabled or disabled via ethtool. Note: This feature is not supported when no of VFs enabled are morethan no of physical interfaces i.e active BGX LMACs Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
This patch adds support for handling multiple qsets assigned to a single VF. There by increasing no of queues from earlier 8 to max no of CPUs in the system i.e 48 queues on a single node and 96 on dual node system. User doesn't have option to assign which Qsets/VFs to be merged. Upon request from VF, PF assigns next free Qsets as secondary qsets. To maintain current behavior no of queues is kept to 8 by default which can be increased via ethtool. If user wants to unbind NICVF driver from a secondary Qset then it should be done after tearing down primary VF's interface. Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NRobert Richter <rrichter@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
Rework interrupt handler to avoid checking IRQ affinity of CQ interrupts. Now separate handlers are registered for each IRQ including RBDR. Register interrupt handlers for only those which are being used. Add nicvf_dump_intr_status() and use it in irq handlers. Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
This patch configures HW to strip 802.1Q header if found in a receiving packet. The stripped VLAN ID and TCI information is passed on to software via CQE_RX. Also sets netdev's 'vlan_features' so that other HW offload features can be used for tagged packets. This offload feature can be enabled or disabled via ethtool. Network stack normally ignores RPS for 802.1Q packets and hence low throughput. With this offload enabled throughput for tagged packets will be almost same as normal packets. Note: This patch doesn't enable HW VLAN insertion for transmit packets. Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
Adding support for receive hashing HW offload by using RSS_ALG and RSS_TAG fields of CQE_RX descriptor. Also removed dependency on minimum receive queue count to configure RSS so that hash is always generated. This hash is used by RPS logic to distribute flows across multiple CPUs. Offload can be disabled via ethtool. Signed-off-by: NRobert Richter <rrichter@cavium.com> Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
Use the nicvf_send_msg_to_pf() function in the mailbox code. Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NRobert Richter <rrichter@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sunil Goutham 提交于
Added ethtool support to dump receive packet error statistics reported in CQE. Also made some small fixes Signed-off-by: NSunil Goutham <sgoutham@cavium.com> Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Aleksey Makarov 提交于
The liquidio and thunder drivers have different maintainers. Signed-off-by: NAleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Raghavendra K T says: ==================== Optimize the snmp stat aggregation for large cpus While creating 1000 containers, perf is showing lot of time spent in snmp_fold_field on a large cpu system. The current patch tries to improve by reordering the statistics gathering. Please note that similar overhead was also reported while creating veth pairs https://lkml.org/lkml/2013/3/19/556 Changes in V4: - remove 'item' variable and use IPSTATS_MIB_MAX to avoid sparse warning (Eric) also remove 'item' parameter (Joe) - add missing memset of padding. Changes in V3: - use memset to initialize temp buffer in leaf function. (David) - use memcpy to copy the buffer data to stat instead of unalign_pu (Joe) - Move buffer definition to leaf function __snmp6_fill_stats64() (Eric) - Changes in V2: - Allocate the stat calculation buffer in stack. (Eric) Setup: 160 cpu (20 core) baremetal powerpc system with 1TB memory 1000 docker containers was created with command docker run -itd ubuntu:15.04 /bin/bash in loop observation: Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) perf data showed, creating veth interfaces resulting in the below code path was taking more time. rtnl_fill_ifinfo -> inet6_fill_link_af -> inet6_fill_ifla6_attrs -> snmp_fold_field proposed idea: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data to of an item (iteratively for around 36 items). The patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Performance of docker creation improved by around more than 2x after the patch. before the patch: ================ 3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617 real 0m6.836s user 0m0.095s sys 0m0.011s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock 1.37% docker docker [.] backtrace_qsort 1.31% docker docker [.] strings.FieldsFunc cache-misses: 2.7% after the patch: ============= 9178273e9df399c8290b6c196e4aef9273be2876225f63b14a60cf97eacfafb5 real 0m3.249s user 0m0.088s sys 0m0.020s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one 3.96% docker docker [.] runtime_MSpan_Sweep 2.47% docker docker [.] strings.FieldsFunc cache-misses: 1.41 % Please let me know if you have suggestions/comments. Thanks Eric, Joe and David for the comments. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Raghavendra K T 提交于
Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field. reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data of an item (iteratively for around 36 items). idea: This patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Docker creation got faster by more than 2x after the patch. Result: Before After Docker creation time 6.836s 3.25s cache miss 2.7% 1.41% perf before: 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock perf after: 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one changes/ideas suggested: Using buffer in stack (Eric), Usage of memset (David), Using memcpy in place of unaligned_put (Joe). Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Raghavendra K T 提交于
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2015 14 次提交
-
-
由 David S. Miller 提交于
Pravin B Shelar says: ==================== openvswitch: Cleanup post vport conversion. After converting all vport to netdev implmentations there is no need for some of vport functionality. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
This structure is not used anymore. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
Since all vport types are now backed by netdev, we can directly use netdev stats. Following patch removes redundant stat from vport. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
tun info is passed using skb-dst pointer. Now we have converted all vports to netdev based implementation so Now we can remove redundant pointer to tun-info from OVS_CB. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
Remove unused get_name() function pointer from vport ops. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesse Gross 提交于
Geneve can benefit from GRO at the device level in a manner similar to other tunnels, especially as hardware offloads are still emerging. After this patch, aggregated frames are seen on the tunnel interface. Single stream throughput nearly doubles in ideal circumstances (on old hardware). Signed-off-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
When an error occurs skipping IPv6 extension headers retain the already parsed IP protocol and IPv6 addresses in the flow. Also assume that the packet is not a fragment in the absence of information to the contrary; that is always use the frag_off value set by ipv6_skip_exthdr(). This allows matching on the IP protocol and IPv6 addresses of packets with malformed extension headers. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-08-28 One more bunch of Bluetooth patches for 4.3: - Crash fix for hci_bcm driver - Enhancements to hci_intel driver (e.g. baudrate configuration) - Fix for SCO link type after multiple connect attempts - Cleanups & minor fixes in a few other places Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tony Lindgren 提交于
The interrupt handler may not be available when smsc911x probes if the interrupt handler is a GPIO controller for example. Let's fix that by adding handling for -EPROBE_DEFER. Cc: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Jiri Benc says: ==================== tunnels: fix incorrect IPv4/v6 headers interpretation With tunneling, it is currently possible to get an IPv6 header and interpret it as an IPv4 header, or to interpret an IPv6 address as an IPv4 address (and vice versa). This leads to things like sending packets to incorrect address, IPv6 flow label being interpreted as IP packet length, etc. Fix several places where this can happen. Most of this is net-next only. The third patch affects net, too, but it doesn't seem there's anything in user space that sets the attribute at all currently, thus net-next is fine. Changelog: v2: fixed geneve after incorrect rebase on top of Pravin's patches ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
By default (subject to the sysctl settings), IPv6 sockets listen also for IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in packets received through an IPv6 socket. In addition, it's currently not possible to have both IPv4 and IPv6 vxlan tunnel on the same port (unless bindv6only sysctl is enabled), as it's not possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's no way to specify both IPv4 and IPv6 remote/group IP addresses. Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not done globally in udp_tunnel, as l2tp and tipc seems to work okay when receiving IPv4 packets on IPv6 socket and people may rely on this behavior. The other tunnels (geneve and fou) do not support IPv6. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
fou does not really support IPv6 encapsulation. After an UDP socket is created in fou_create, the encap_rcv callback is set either to fou_udp_recv or to gue_udp_recv. Both of those unconditionally assume that the received packet has an IPv4 header and access the data at network_header as it was an IPv4 header. This leads to IPv6 flow label being interpreted as IP packet length, etc. Disallow fou tunnel to be configured as IPv6 until real IPv6 support is added to fou. CC: Tom Herbert <tom@herbertland.com> Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: NJiri Benc <jbenc@redhat.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NThomas Graf <tgraf@suug.ch> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: NJiri Benc <jbenc@redhat.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NThomas Graf <tgraf@suug.ch> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-