- 01 7月, 2013 1 次提交
-
-
由 Florian Westphal 提交于
The common case is that TCP/IP checksums have already been verified, e.g. by hardware (rx checksum offload), or conntrack. Userspace can use this flag to determine when the checksum has not been validated yet. If the flag is set, this doesn't necessarily mean that the packet has an invalid checksum, e.g. if NIC doesn't support rx checksum. Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can infer that IP/TCP checksum has already been validated if either the SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED flag is unset. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 26 6月, 2013 8 次提交
-
-
由 JunweiZhang 提交于
no real problem is fixed, just save a few bytes in net_namespace structure. Signed-off-by: NJunweiZhang <junwei.zhang@6wind.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 JunweiZhang 提交于
ip_vs.h is not necessary for sysctl_binary.c. prepare for the next patch to avoid compile issue. Signed-off-by: NJunweiZhang <junwei.zhang@6wind.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Julian Anastasov 提交于
Add sync_persist_mode flag to reduce sync traffic by syncing only persistent templates. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Tested-by: NAleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Alexander Frolkin 提交于
By default the SH scheduler rejects connections that are hashed onto a realserver of weight 0. This patch adds a flag to make SH choose a different realserver in this case, instead of rejecting the connection. The patch also adds a flag to make SH include the source port (TCP, UDP, SCTP) in the hash as well as the source address. This basically allows for deterministic round-robin load balancing (i.e., where any director in a cluster of directors with identical config will send the same packet the same way). The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options can be set per service. They are set using a new option to ipvsadm. Signed-off-by: NAlexander Frolkin <avf@eldamar.org.uk> Acked-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Julian Anastasov 提交于
Drop SCTP connections under load (dropentry context) depending on the protocol state, just like for TCP: INIT conns are dropped immediately, established are dropped randomly while connections in progress or shutdown are skipped. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Julian Anastasov 提交于
Convert the SCTP state table, so that it is more readable. Change the states to be according to the diagram in RFC 2960 and add more states suitable for middle box. Still, such change in states adds incompatibility if systems in sync setup include this change and others do not include it. With this change we also have proper transitions in INPUT-ONLY mode (DR/TUN) where we see packets only from client. Now we should not switch to 10-second CLOSED state at a time when we should stay in ESTABLISHED state. The short names for states are because we have 16-char space in ipvsadm and 11-char limit for the connection list format. It is a sequence of the TCP implementation where the longest state name is ESTABLISHED. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Alexander Frolkin 提交于
This adds support for sloppy TCP and SCTP modes to IPVS. When enabled (sysctls net.ipv4.vs.sloppy_tcp and net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any packet, not just a TCP SYN (or SCTP INIT). This allows connections to fail over from one IPVS director to another mid-flight. Signed-off-by: NAlexander Frolkin <avf@eldamar.org.uk> Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Julian Anastasov 提交于
Before now the schedulers needed access only to IP addresses and it was easy to get them from skb by using ip_vs_fill_iph_addr_only. New changes for the SH scheduler will need the protocol and ports which is difficult to get from skb for the IPv6 case. As we have all the data in the iph structure, to avoid the same slow lookups provide the iph to schedulers. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Acked-by: NHans Schillstrom <hans@schillstrom.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
- 21 6月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
xt_socket module can be a nice replacement to conntrack module in some cases (SYN filtering for example) But it lacks the ability to match the 3rd packet of TCP handshake (ACK coming from the client). Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism. The wildcard is the legacy socket match behavior, that ignores LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent) iptables -I INPUT -p tcp --syn -j SYN_CHAIN iptables -I INPUT -m socket --nowildcard -j ACCEPT Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 20 6月, 2013 30 次提交
-
-
由 Florian Westphal 提交于
When loose tracking is enabled (default), non-syn packets cause creation of new conntracks in established state with default timeout for established state (5 days). This causes the table to fill up with UNREPLIED when the 'new ack' packet happened to be the last-ack of a previous, already timed-out connection. Consider: A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255 B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123 <61 second pause> C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123 D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255 B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout, C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout. Use UNACK timeout (5 minutes) instead to get rid of these entries sooner when in ESTABLISHED state without having seen traffic in both directions. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Daniel Borkmann 提交于
These are the only calls under net/ that do not check nla_parse_nested() for its error code, but simply continue execution. If parsing of netlink attributes fails, we should return with an error instead of continuing. In nearly all of these calls we have a policy attached, that is being type verified during nla_parse_nested(), which we would miss checking for otherwise. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Joe Perches 提交于
This typedef is unnecessary and should just be removed. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
This typedef is unnecessary and should just be removed. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rami Rosen 提交于
This patch removes an empty ifdef from inet_frag_intern() in net/ipv4/inet_fragment.c. commit b67bfe0d (hlist: drop the node parameter from iterators) removed hlist from net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command, which is now empty. Signed-off-by: NRami Rosen <ramirose@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
htb_sched structures are big, and source of false sharing on SMP. Every time a packet is queued or dequeue, many cache lines must be touched because structures are not lay out properly. By carefully splitting htb_sched in two parts, and define sub structures to increase data locality, we can improve performance dramatically on SMP. New htb_prio structure can also be used in htb_class to increase data locality. I got 26 % performance increase on a 24 threads machine, with 200 concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
In previous discussions, I tried to find some reasonable heuristics for delayed ACK, however this seems not possible, according to Eric: "ACKS might also be delayed because of bidirectional traffic, and is more controlled by the application response time. TCP stack can not easily estimate it." "ACK can be incredibly useful to recover from losses in a short time. The vast majority of TCP sessions are small lived, and we send one ACK per received segment anyway at beginning or retransmits to let the sender smoothly increase its cwnd, so an auto-tuning facility wont help them that much." and according to David: "ACKs are the only information we have to detect loss. And, for the same reasons that TCP VEGAS is fundamentally broken, we cannot measure the pipe or some other receiver-side-visible piece of information to determine when it's "safe" to stretch ACK. And even if it's "safe", we should not do it so that losses are accurately detected and we don't spuriously retransmit. The only way to know when the bandwidth increases is to "test" it, by sending more and more packets until drops happen. That's why all successful congestion control algorithms must operate on explicited tested pieces of information. Similarly, it's not really possible to universally know if it's safe to stretch ACK or not." It still makes sense to enable or disable quick ack mode like what TCP_QUICK_ACK does. Similar to TCP_QUICK_ACK option, but for people who can't modify the source code and still wants to control TCP delayed ACK behavior. As David suggested, this should belong to per-path scope, since different pathes may want different behaviors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yijing Wang 提交于
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init() in init path. So we can use pdev->pm_cap instead of using pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code. Signed-off-by: NYijing Wang <wangyijing@huawei.com> Cc: Michael Chan <mchan@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yijing Wang 提交于
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init() in init path. So we can use pdev->pm_cap instead of using pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code. Signed-off-by: NYijing Wang <wangyijing@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yijing Wang 提交于
Pci_enable_device() will set device power state to D0, so it's no need to do it again in bnx2x_init_dev(). Also remove redundant PM Cap find code, because pci core has been saved the pci device pm cap value. Signed-off-by: NYijing Wang <wangyijing@huawei.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: NYuval Mintz <yuvalmin@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES. I don't think anything should select NETDEVICES, so make it a dependency. It also doesn't need to select or depend on ETHERNET, which has nothing to do with the Ethernet library functions. BPCTL selects MII, which depends on NETDEVICES. But everything in the drivers/staging/silicom directory is related to net devices, so make NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant dependencies on NET. Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
This has no dependency on any of the drivers under NET_CORE. Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
All drivers that select MII also need to select NET_CORE because MII depends on it. This is a bit ridiculous because NET_CORE is just a menu option that doesn't enable any code by itself. There is also no need for it to be a visible option, since its users all select it. Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Weiping Pan 提交于
Signed-off-by: NWeiping Pan <wpan@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Veaceslav Falico 提交于
Also, cleanup bond_alb_handle_active_change() from 2 identical ifs. Signed-off-by: NVeaceslav Falico <vfalico@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sathya Perla 提交于
be_find_vfs() is no longer needed as the common PCI calls provide the same functionality. Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
The use of this attribute has been added in 32b8a8e5 (sit: add IPv4 over IPv4 support). It is optional, by default proto is IPPROTO_IPV6. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
The current situation is that SOCK_MIN_RCVBUF is 2048 + sizeof(struct sk_buff)) while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a TCP_SKB_MIN_TRUESIZE. Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is SKB_TRUESIZE(2048) after commit f07d960d ("tcp: avoid frag allocation for small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF as 2048 can hit a small regression for some applications setting to low SO_SNDBUF / SO_RCVBUF. Note that we define a TCP_SKB_MIN_TRUESIZE, because SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for skb->head. The minor adaption in sk_stream_moderate_sndbuf() is to silence a warning by using a typed max macro, as similarly done in SOCK_MIN_RCVBUF occurences, that would appear otherwise. Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
thresh and interval are global resources, only init net can change them. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/ directory to the un-init_net, but we can still use cmd such as "ip ntable change name arp_cache locktime 129" to change the locktime of default neigh_parms. This patch disallows the un-init_net to find out the neigh_table.parms. So the un-init_net will failed to influence the init_net. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
neigh_table.parms always exist and is initialized,kmemdup can use it to create new neigh_parms, actually lookup_neigh_parms here will return neigh_table.parms too. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
Check next packet availability by validating that HW has finished CQE placement. This saves latency of another dma transaction performed to update SB indexes. Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
Adds ndo_ll_poll method and locking for FPs between LL and the napi. When receiving a packet we use skb_mark_ll to record the napi it came from. Add each napi to the napi_hash right after netif_napi_add(). Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Add basic support for LLS. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Pravin B Shelar says: ==================== Following patch series adds support for gre tunneling. First six patches extend kernel gre and ip_tunnel modules api so that there is more code sharing between gre modules and ovs. Rest of patches adds ovs tunneling infrastructre and gre protocol vport. V2 fixes two patches according to comments from Jesse. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
Add gre vport implementation. Most of gre protocol processing is pushed to gre module. It make use of gre demultiplexer therefore it can co-exist with linux device based gre tunnels. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Acked-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
Following patch adds start offset for sw_flow-key, so that we can skip tunneling information in key for non-tunnel flows. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Acked-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pravin B Shelar 提交于
MAX_ACTIONS_BUFSIZE limits action list size, set tunnel action needs extra space on action list, for now increase max actions list limit. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Acked-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-