- 24 8月, 2014 2 次提交
-
-
由 Jon Paul Maloy 提交于
The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NErik Hugne <erik.hugne@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net由 David S. Miller 提交于
Pulling to get some TIPC fixes that a net-next series depends upon. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 8月, 2014 38 次提交
-
-
由 Yuchung Cheng 提交于
Upon timeout, undo (via both timestamps/Eifel and DSACKs) was disabled if any retransmits were still in flight. The concern was perhaps that spurious retransmission sent in a previous recovery episode may trigger DSACKs to falsely undo the current recovery. However, this inadvertently misses undo opportunities (using either TCP timestamps or DSACKs) when timeout occurs during a loss episode, i.e. recurring timeouts or timeout during fast recovery. In these cases some retransmissions will be in flight but we should allow undo. Furthermore, we should only reset undo_marker and undo_retrans upon timeout if we are starting a new recovery episode. Finally, when we do reset our undo state, we now do so in a manner similar to tcp_enter_recovery(), so that we require a DSACK for each of the outstsanding retransmissions. This will achieve the original goal by requiring that we receive the same number of DSACKs as retransmissions. This patch increases the undo events by 50% on Google servers. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sergei Shtylyov 提交于
The bare register numbers are used despite <uapi/linux/mdio.h> has MDIO_DEVS[12] #define'd for those. Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As a followup to commit 676d2369 ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gerhard Stenzel 提交于
The first initializer in the following union vxlan_addr ipa = { .sin.sin_addr.s_addr = tip, .sa.sa_family = AF_INET, }; is optimised away by the compiler, due to the second initializer, therefore initialising .sin.sin_addr.s_addr always to 0. This results in netlink messages indicating a L3 miss never contain the missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions. The problem affects user space programs relying on an IP address being sent as part of a netlink message indicating a L3 miss. Changing .sa.sa_family = AF_INET, to .sin.sin_family = AF_INET, fixes the problem. Signed-off-by: NGerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://gitorious.org/linux-can/linux-can-next由 David S. Miller 提交于
Marc Kleine-Budde says: ==================== pull-request: can-next 2014-08-20 this is a pull request of 10 patches for net-next/master. There is one patch by Wolfram Sang to clean up the build system. Two patches by Stefan Agner that add vf610 support to the flexcan driver. Dong Aisheng add support for bosch's m_can core, which is found in the new freescale ARM SoCs. Sergei Shtylyov improves the rcar_can driver by supporting all input clocks and adding device tree support. The next patch is a small cleanup for the bit rate calculation function by Lad, Prabhakar. And finally a patch by Himangi Saraogi, which converts the mcp251x driver to use dmam_alloc_coherent. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
Merge tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm fix from Thierry Reding: "Just one bugfix for the PWM lookup table code that would cause a PWM channel to be set to the wrong period and polarity for non-perfect matches" * tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: Fix period and polarity in pwm_get() for non-perfect matches
-
由 Michal Kazior 提交于
The new_ctx pointer is set only for non-chanctx drivers. This yielded a crash for chanctx-based drivers during channel switch finalization: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211] Use an adequate chanctx pointer to fix this. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMichal Kazior <michal.kazior@tieto.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net由 Linus Torvalds 提交于
Pull networking fixes from David Miller: "Here are some bug fixes that have piled up during ksummit/linuxcon. 1) Fix endian problems in ibmveth, from Anton Blanchard. 2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from Benjamin Block. 3) SCTP association fixes from Daniel Borkmann. 4) When multiple VLAN headers are present we have to make sure the second and subsequent ones are pullable in the SKB otherwise we blindly dereference garbage. From Jiri Benc. 5) The argument adjustment of the signature of hlist_add_after*() introduced a regression in the batman-adv code, fix from Sven Eckelmann. 6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai Jain. 7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg. 8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from Chas Williams. 9) Truncate packets larger then the TPACKET_V3 format configured buffers, otherwise we overwrite past the end of said buffers. From Eric Dumazet. 10) Fix endianness bugs in qlcnic firmware handling, from Rajesh Borundia and Shahed Shaikh. 11) CXGB4 sometimes doesn't get all of the TX completion events it should resulting in SKBs getting stuck in the TX queue, from Hariprasad Shenai. 12) When the FEC chip's PTP clock is disabled, you can't access the register. Add necessary checks to avoid the resulting hang, from Fugang Duan" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits) drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef net: sctp: fix suboptimal edge-case on non-active active/retrans path selection net: sctp: spare unnecessary comparison in sctp_trans_elect_best net: ethernet: broadcom: bnx2x: Remove redundant #ifdef ibmveth: Fix endian issues with rx_no_buffer statistic net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings() openvswitch: fix panic with multiple vlan headers net: ipv6: fib: don't sleep inside atomic lock net: fec: ptp: avoid register access when ipg clock is disabled cxgb4: Free completed tx skbs promptly cxgb4: Fix race condition in cleanup sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe bnx2x: Revert UNDI flushing mechanism qlcnic: Fix endianess issue in firmware load from file operation qlcnic: Fix endianess issue in FW dump template header qlcnic: Fix flash access interface to application MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver macvlan: Allow setting multicast filter on all macvlan types packet: handle too big packets for PACKET_V3 MAINTAINERS: add entry for ec_bhf driver ...
-
由 Christian Riesch 提交于
Event timestamp status messages have a variable length, ranging from 1 to 5 words (16 bit words). The current code however requires a minimum message length of sizeof(*phy_txts). In most cases this condition is fulfilled due to padding bytes. However, if several events are signaled in a single message, padding bytes may not be present. For short event timestamp status messages, the length check will fail, and the event timestamp will be dropped. Signed-off-by: NChristian Riesch <christian.riesch@omicron.at> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ley Foon Tan 提交于
This patch adds fix_mac_speed() support for Altera socfpga Ethernet controller. Emac splitter is a soft IP core in FPGA system that converts GMII interface from Synopsys mac to RGMII/SGMII interface. This splitter core is an optional IP if user would like to use RGMII/SGMII interface in their system. Software needs to update a register in splitter core when there is speed change. Signed-off-by: NLey Foon Tan <lftan@altera.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chun-Hao Lin 提交于
RTL8168H is Realtek PCIe Gigabit Ethernet controller. RTL8107E is Realtek PCIe Fast Ethernet controller. This patch add support for these two chips. Signed-off-by: NChun-Hao Lin <hau@realtek.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Userspace needs to be notified if one changes some option. Signed-off-by: NJiri Pirko <jiri@resnulli.us> Acked-by: NVeaceslav Falico <vfalico@gmail.com> Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Yuval Mintz says: ==================== bnx2x: Start utilizing 7.10.51 This series will enable bnx2x to start utlizing its 7.10.51 FW. In addition, it will also add timestamping support, as well as a couple of routine semantic cleanups. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ariel Elior 提交于
This is mostly a semantic change which modifies the code parsing and printing of FW asserts. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
Prevent dereference of pointer in case it's NULL. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
Trying to disable sriov when VFs are assigned may lead to all kinds of problems. This patch unifies the call in the driver to pci_disable_sriov() and prevents them if some of the PF's child VFs are marked as assigned. [Notice this is a bad scenario either way; User should not reach a point where the OS tries to disable SRIOV when a VF is assigned - but currently there's no way of preventing the user from doing so, and the ill-effect for the driver is smaller this way] Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
It's possible there's a bad chip configuration which will result with PCIe IOV capabilities, but with no available interrupts for VFs. In such case, we want to gracefully prevent the PF from initializing its IOV capabilities rather than encounter difficulties further along the way. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
The bnx2x panic dump spills a lot of information from the driver's fastpath, but may be called while some of the fastpath is uninitialized. This patch verifies that pointers are already allocated before dereferencing them to prevent possible kernel panics. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
This patch does several semantic things: - Fixing typos. - Removing unnecessary prints. - Removing unused functions and definitions. - Change 'strange' usage of boolean variables. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kalderon 提交于
This adds a PHC to the bnx2x driver. Driver supports timestamping send/receive PTP packets, as well as adjusting the on-chip clock. The driver has been tested with linuxptp project. Signed-off-by: NMichal Kalderon <Michal.Kalderon@qlogic.com> Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
- (L2) In some multi-function configurations, inter-PF and inter-VF Tx switching is incorrectly enabled. - (L2) Wrong assert code in FLR final cleanup in case it is sent not after FLR. - (L2) Chip may stall in very rare cases under heavy traffic with FW GRO enabled. - (L2) VF malicious notification error fixes. - (L2) Default gre tunnel to IPGRE which allows proper RSS for IPGRE packets, L2GRE traffic will reach single queue. - (FCoE) Fix data being placed in wrong buffer when corrupt FCoE frame is received. - (FCoE) Burst of FIP packets with destination MAC of ALL-FCF_MACs causes FCoE traffic to stop. Signed-off-by: NDmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NAriel Elior <Ariel.Elior@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2,E3; @@ - jiffies - E1 >= (E2*E3) + time_after_eq(jiffies, E1+E2*E3) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - (jiffies - E1) >= E2 + time_after_eq(jiffies, E1+E2) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - jiffies - E1 < E2 + time_before(jiffies, E1+E2) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ ( - (jiffies - E1) < E2 + time_before(jiffies, E1+E2) ) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andreea-Cristina Bernat 提交于
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andreea-Cristina Bernat 提交于
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andreea-Cristina Bernat 提交于
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andreea-Cristina Bernat 提交于
This "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes this replacement. The following Coccinelle semantic patch was used for solving it: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andreea-Cristina Bernat 提交于
The "rcu_dereference()" calls are used directly in conditions. Since their return values are never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacements. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Acked-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sébastien Barré 提交于
Commit 7a9bc9b8 ("ipv4: Elide fib_validate_source() completely when possible.") introduced a short-circuit to avoid calling fib_validate_source when not needed. That change took rp_filter into account, but not accept_local. This resulted in a change of behaviour: with rp_filter and accept_local off, incoming packets with a local address in the source field should be dropped. Here is how to reproduce the change pre/post 7a9bc9b8 commit: -configure the same IPv4 address on hosts A and B. -try to send an ARP request from B to A. -The ARP request will be dropped before that commit, but accepted and answered after that commit. This adds a check for ACCEPT_LOCAL, to maintain full fib validation in case it is 0. We also leave __fib_validate_source() earlier when possible, based on the same check as fib_validate_source(), once the accept_local stuff is verified. Cc: Gregory Detal <gregory.detal@uclouvain.be> Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: NSébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
This patch introduces the use of the function usb_endpoint_num. The Coccinelle semantic patch that makes these changes is as follows: @@ struct usb_endpoint_descriptor *epd; @@ - (epd->bEndpointAddress & \(USB_ENDPOINT_NUMBER_MASK\|0x0f\)) + usb_endpoint_num(epd) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Acked-by: NTilman Schmidt <tilman@imap.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Himangi Saraogi 提交于
This patch introduces the use of the function usb_endpoint_num. The Coccinelle semantic patch that makes these changes is as follows: @@ struct usb_endpoint_descriptor *epd; @@ - (epd->bEndpointAddress & \(USB_ENDPOINT_NUMBER_MASK\|0x0f\)) + usb_endpoint_num(epd) Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Acked-by: NTilman Schmidt <tilman@imap.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haiyang Zhang 提交于
When the buffer is too small for a packet from VMBus, a bigger buffer will be allocated in netvsc_channel_cb() and retry reading the packet from VMBus. Increasing this buffer size will reduce the retry overhead. Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com> Reviewed-by: NDexuan Cui <decui@microsoft.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rasmus Villemoes 提交于
Test for definedness of the macro which is actually defined (the change is hard to see: it is s/SSS/SSA/). Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
In SCTP, selection of active (T.ACT) and retransmission (T.RET) transports is being done whenever transport control operations (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport(). Commits 4c47af4d ("net: sctp: rework multihoming retransmission path selection to rfc4960") and a7288c4d ("net: sctp: improve sctp_select_active_and_retran_path selection") have both improved it towards a more fine-grained and optimal path selection. Currently, the selection algorithm for T.ACT and T.RET is as follows: 1) Elect the two most recently used ACTIVE transports T1, T2 for T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used 2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set T.ACT<-T.PRI and T.RET<-T1 3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1 4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is still slightly suboptimal as it can lead to the following scenario: Setup: <A> <B> T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22) net.sctp.rto_min = 1000 net.sctp.path_max_retrans = 2 net.sctp.pf_retrans = 0 net.sctp.hb_interval = 1000 T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to link flapping). Here, the first time transmission is sent over PF path T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks are sent over the INACTIVE path T1 (T.PRI), which is not good. After the patch, it's choosing better transports in both cases by modifying step 4): 4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.ACT_new This will still select a best possible path in PF if available (which can also include T.PRI/T.RET), and set both T.ACT/T.RET to it. In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just for a very short while before going back ACTIVE, it will guarantee that this path will be reselected for T.ACT/T.RET since T3 (PF) is not available. Previously, this was not possible, as we would only select between T.PRI and T.RET, and a possible T3 would be NULL due to the fact that we have just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE and would select a suboptimal path when T.PRI/T.RET have worse properties. In the case that T.ACT_old permanently went to INACTIVE during this transition and there's no PF path available, plus T.PRI and T.RET are INACTIVE as well, we would now camp on T.ACT_old, but if everything is being INACTIVE there's really not much we can do except hoping for a successful HB to bring one of the transports back up again and, thus cause a new selection through sctp_assoc_control_transport(). Now both tests work fine: Case 1: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(ACTIVE) T.ACT, T.RET T2 S(PF) 3. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 5. T1 S(PF) T.ACT, T.RET T2 S(INACTIVE) [ 5.1 T1 S(INACTIVE) T.ACT, T.RET T2 S(INACTIVE) ] 6. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Case 2: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(PF) T2 S(ACTIVE) T.ACT, T.RET 3. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 5. T1 S(INACTIVE) T2 S(PF) T.ACT, T.RET [ 5.1 T1 S(INACTIVE) T2 S(INACTIVE) T.ACT, T.RET ] 6. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-