1. 23 9月, 2014 1 次提交
  2. 20 9月, 2014 1 次提交
  3. 16 9月, 2014 3 次提交
  4. 06 9月, 2014 2 次提交
  5. 23 8月, 2014 1 次提交
    • Y
      tcp: improve undo on timeout · 989e04c5
      Yuchung Cheng 提交于
      Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
      disabled if any retransmits were still in flight.  The concern was
      perhaps that spurious retransmission sent in a previous recovery
      episode may trigger DSACKs to falsely undo the current recovery.
      
      However, this inadvertently misses undo opportunities (using either
      TCP timestamps or DSACKs) when timeout occurs during a loss episode,
      i.e.  recurring timeouts or timeout during fast recovery. In these
      cases some retransmissions will be in flight but we should allow
      undo. Furthermore, we should only reset undo_marker and undo_retrans
      upon timeout if we are starting a new recovery episode. Finally,
      when we do reset our undo state, we now do so in a manner similar
      to tcp_enter_recovery(), so that we require a DSACK for each of
      the outstsanding retransmissions. This will achieve the original
      goal by requiring that we receive the same number of DSACKs as
      retransmissions.
      
      This patch increases the undo events by 50% on Google servers.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      989e04c5
  6. 15 8月, 2014 2 次提交
    • N
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 0c9ab092
      Neal Cardwell 提交于
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c9ab092
    • H
      tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af
      Hannes Frederic Sowa 提交于
      tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
      ordering of incoming connections and teardowns without the need to
      hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
      the last measured RTO. To do so, we keep the last seen timestamp in a
      per-host indexed data structure and verify if the incoming timestamp
      in a connection request is strictly greater than the saved one during
      last connection teardown. Thus we can verify later on that no old data
      packets will be accepted by the new connection.
      
      During moving a socket to time-wait state we already verify if timestamps
      where seen on a connection. Only if that was the case we let the
      time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
      will be used. But we don't verify this on incoming SYN packets. If a
      connection teardown was less than TCP_PAWS_MSL seconds in the past we
      cannot guarantee to not accept data packets from an old connection if
      no timestamps are present. We should drop this SYN packet. This patch
      closes this loophole.
      
      Please note, this patch does not make tcp_tw_recycle in any way more
      usable but only adds another safety check:
      Sporadic drops of SYN packets because of reordering in the network or
      in the socket backlog queues can happen. Users behing NAT trying to
      connect to a tcp_tw_recycle enabled server can get caught in blackholes
      and their connection requests may regullary get dropped because hosts
      behind an address translator don't have synchronized tcp timestamp clocks.
      tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
      
      In general, use of tcp_tw_recycle is disadvised.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a26552af
  7. 14 8月, 2014 1 次提交
  8. 06 8月, 2014 2 次提交
    • W
      net-timestamp: ACK timestamp for bytestreams · e1c8a607
      Willem de Bruijn 提交于
      Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
      in the send() call is acknowledged. It implements the feature for TCP.
      
      The timestamp is generated when the TCP socket cumulative ACK is moved
      beyond the tracked seqno for the first time. The feature ignores SACK
      and FACK, because those acknowledge the specific byte, but not
      necessarily the entire contents of the buffer up to that byte.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1c8a607
    • N
      tcp: reduce spurious retransmits due to transient SACK reneging · 5ae344c9
      Neal Cardwell 提交于
      This commit reduces spurious retransmits due to apparent SACK reneging
      by only reacting to SACK reneging that persists for a short delay.
      
      When a sequence space hole at snd_una is filled, some TCP receivers
      send a series of ACKs as they apparently scan their out-of-order queue
      and cumulatively ACK all the packets that have now been consecutiveyly
      received. This is essentially misbehavior B in "Misbehaviors in TCP
      SACK generation" ACM SIGCOMM Computer Communication Review, April
      2011, so we suspect that this is from several common OSes (Windows
      2000, Windows Server 2003, Windows XP). However, this issue has also
      been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
      into spurious retransmissions by lack of timestamps?" from March 2014,
      where the receiver was thought to be a BSD box.
      
      Since snd_una would temporarily be adjacent to a previously SACKed
      range in these scenarios, this receiver behavior triggered the Linux
      SACK reneging code path in the sender. This led the sender to clear
      the SACK scoreboard, enter CA_Loss, and spuriously retransmit
      (potentially) every packet from the entire write queue at line rate
      just a few milliseconds before the ACK for each packet arrives at the
      sender.
      
      To avoid such situations, now when a sender sees apparent reneging it
      does not yet retransmit, but rather adjusts the RTO timer to give the
      receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
      that will restore sanity to the SACK scoreboard. If the reneging
      persists until this RTO then, as before, we clear the SACK scoreboard
      and enter CA_Loss.
      
      A 10ms delay tolerates a receiver sending such a stream of ACKs at
      56Kbit/sec. And to allow for receivers with slower or more congested
      paths, we wait for at least RTT/2.
      
      We validated the resulting max(RTT/2, 10ms) delay formula with a mix
      of North American and South American Google web server traffic, and
      found that for ACKs displaying transient reneging:
      
       (1) 90% of inter-ACK delays were less than 10ms
       (2) 99% of inter-ACK delays were less than RTT/2
      
      In tests on Google web servers this commit reduced reneging events by
      75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
      any measurable impact on latency for user HTTP and SPDY requests.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ae344c9
  9. 16 7月, 2014 1 次提交
  10. 08 7月, 2014 1 次提交
    • Y
      tcp: fix false undo corner cases · 6e08d5e3
      Yuchung Cheng 提交于
      The undo code assumes that, upon entering loss recovery, TCP
      1) always retransmit something
      2) the retransmission never fails locally (e.g., qdisc drop)
      
      so undo_marker is set in tcp_enter_recovery() and undo_retrans is
      incremented only when tcp_retransmit_skb() is successful.
      
      When the assumption is broken because TCP's cwnd is too small to
      retransmit or the retransmit fails locally. The next (DUP)ACK
      would incorrectly revert the cwnd and the congestion state in
      tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
      may enter the recovery state. The sender repeatedly enter and
      (incorrectly) exit recovery states if the retransmits continue to
      fail locally while receiving (DUP)ACKs.
      
      The fix is to initialize undo_retrans to -1 and start counting on
      the first retransmission. Always increment undo_retrans even if the
      retransmissions fail locally because they couldn't cause DSACKs to
      undo the cwnd reduction.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e08d5e3
  11. 30 6月, 2014 1 次提交
  12. 28 6月, 2014 1 次提交
  13. 20 6月, 2014 1 次提交
    • N
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 2cd0d743
      Neal Cardwell 提交于
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cd0d743
  14. 11 6月, 2014 1 次提交
  15. 03 6月, 2014 1 次提交
  16. 04 5月, 2014 1 次提交
  17. 21 4月, 2014 1 次提交
  18. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  19. 11 3月, 2014 1 次提交
  20. 04 3月, 2014 2 次提交
  21. 27 2月, 2014 1 次提交
    • E
      tcp: switch rtt estimations to usec resolution · 740b0f18
      Eric Dumazet 提交于
      Upcoming congestion controls for TCP require usec resolution for RTT
      estimations. Millisecond resolution is simply not enough these days.
      
      FQ/pacing in DC environments also require this change for finer control
      and removal of bimodal behavior due to the current hack in
      tcp_update_pacing_rate() for 'small rtt'
      
      TCP_CONG_RTT_STAMP is no longer needed.
      
      As Julian Anastasov pointed out, we need to keep user compatibility :
      tcp_metrics used to export RTT and RTTVAR in msec resolution,
      so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
      to use the new attributes if provided by the kernel.
      
      In this example ss command displays a srtt of 32 usecs (10Gbit link)
      
      lpk51:~# ./ss -i dst lpk52
      Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
      Address:Port
      tcp    ESTAB      0      1         10.246.11.51:42959
      10.246.11.52:64614
               cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
      cwnd:10 send
      3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559
      
      Updated iproute2 ip command displays :
      
      lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
      10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
      10.246.11.51
      
      Old binary displays :
      
      lpk51:~# ip tcp_metrics | grep 10.246.11.52
      10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
      10.246.11.51
      
      With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Larry Brakmo <brakmo@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      740b0f18
  22. 07 2月, 2014 1 次提交
    • E
      tcp: remove 1ms offset in srtt computation · 4a5ab4e2
      Eric Dumazet 提交于
      TCP pacing depends on an accurate srtt estimation.
      
      Current srtt estimation is using jiffie resolution,
      and has an artificial offset of at least 1 ms, which can produce
      slowdowns when FQ/pacing is used, especially in DC world,
      where typical rtt is below 1 ms.
      
      We are planning a switch to usec resolution for linux-3.15,
      but in the meantime, this patch removes the 1 ms offset.
      
      All we need is to have tp->srtt minimal value of 1 to differentiate
      the case of srtt being initialized or not, not 8.
      
      The problematic behavior was observed on a 40Gbit testbed,
      where 32 concurrent netperf were reaching 12Gbps of aggregate
      speed, instead of line speed.
      
      This patch also has the effect of reporting more accurate srtt and send
      rates to iproute2 ss command as in :
      
      $ ss -i dst cca2
      Netid  State      Recv-Q Send-Q          Local Address:Port
      Peer Address:Port
      tcp    ESTAB      0      0                10.244.129.1:56984
      10.244.129.2:12865
      	 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send
      463.4Mbps rcv_rtt:1 rcv_space:29200
      tcp    ESTAB      0      390960           10.244.129.1:60247
      10.244.129.2:50204
      	 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51
      send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200
      Reported-by: NVytautas Valancius <valas@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a5ab4e2
  23. 30 12月, 2013 1 次提交
  24. 27 12月, 2013 1 次提交
  25. 05 11月, 2013 1 次提交
    • Y
      tcp: properly handle stretch acks in slow start · 9f9843a7
      Yuchung Cheng 提交于
      Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
      regardless the number of packets. Consequently slow start performance
      is highly dependent on the degree of the stretch ACKs caused by
      receiver or network ACK compression mechanisms (e.g., delayed-ACK,
      GRO, etc).  But slow start algorithm is to send twice the amount of
      packets of packets left so it should process a stretch ACK of degree
      N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
      follow up patch will use the remainder of the N (if greater than 1)
      to adjust cwnd in the congestion avoidance phase.
      
      In addition this patch retires the experimental limited slow start
      (LSS) feature. LSS has multiple drawbacks but questionable benefit. The
      fractional cwnd increase in LSS requires a loop in slow start even
      though it's rarely used. Configuring such an increase step via a global
      sysctl on different BDPS seems hard. Finally and most importantly the
      slow start overshoot concern is now better covered by the Hybrid slow
      start (hystart) enabled by default.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f9843a7
  26. 28 10月, 2013 3 次提交
  27. 22 10月, 2013 1 次提交
    • N
      tcp: initialize passive-side sk_pacing_rate after 3WHS · 02cf4ebd
      Neal Cardwell 提交于
      For passive TCP connections, upon receiving the ACK that completes the
      3WHS, make sure we set our pacing rate after we get our first RTT
      sample.
      
      On passive TCP connections, when we receive the ACK completing the
      3WHS we do not take an RTT sample in tcp_ack(), but rather in
      tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
      3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.
      
      Originally the initial sk_pacing_rate value was 0, so passive-side
      connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
      made in the first RTT. With a default initial cwnd of 10 packets, this
      happened to be correct for RTTs 5ms or bigger, so it was hard to
      see problems in WAN or emulated WAN testing.
      
      Since 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing"), the
      initial sk_pacing_rate is 0xffffffff. So after that change, passive
      TCP connections were keeping this value (and using large numbers of
      segments per skbuff) until receiving an ACK for data.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02cf4ebd
  28. 18 10月, 2013 1 次提交
  29. 10 10月, 2013 1 次提交
  30. 05 10月, 2013 1 次提交
    • E
      tcp: do not forget FIN in tcp_shifted_skb() · 5e8a402f
      Eric Dumazet 提交于
      Yuchung found following problem :
      
       There are bugs in the SACK processing code, merging part in
       tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
       skbs FIN flag. When a receiver first SACK the FIN sequence, and later
       throw away ofo queue (e.g., sack-reneging), the sender will stop
       retransmitting the FIN flag, and hangs forever.
      
      Following packetdrill test can be used to reproduce the bug.
      
      $ cat sack-merge-bug.pkt
      `sysctl -q net.ipv4.tcp_fack=0`
      
      // Establish a connection and send 10 MSS.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +.000 bind(3, ..., ...) = 0
      +.000 listen(3, 1) = 0
      
      +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.001 < . 1:1(0) ack 1 win 1024
      +.000 accept(3, ..., ...) = 4
      
      +.100 write(4, ..., 12000) = 12000
      +.000 shutdown(4, SHUT_WR) = 0
      +.000 > . 1:10001(10000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257
      +.000 > FP. 10001:12001(2000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
      // SACK reneg
      +.050 < . 1:1(0) ack 12001 win 257
      +0 %{ print "unacked: ",tcpi_unacked }%
      +5 %{ print "" }%
      
      First, a typo inverted left/right of one OR operation, then
      code forgot to advance end_seq if the merged skb carried FIN.
      
      Bug was added in 2.6.29 by commit 832d11c5
      ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e8a402f
  31. 04 10月, 2013 1 次提交
  32. 03 10月, 2013 1 次提交
    • E
      tcp: sndbuf autotuning improvements · 6ae70532
      Eric Dumazet 提交于
      tcp_fixup_sndbuf() is underestimating initial send buffer requirements.
      
      It was not noticed because big GSO packets were escaping the limitation,
      but with smaller TSO packets (or TSO/GSO/SG off), application hits
      sk_sndbuf before having a chance to fill enough packets in socket write
      queue.
      
      - initial cwnd can be bigger than 10 for specific routes
      
      - SKB_TRUESIZE() is a bit under real needs in some cases,
        because of power-of-two rounding in kmalloc()
      
      - Fast Recovery (RFC 5681 3.2) : Cubic needs 70% factor
      
      - Extra cushion (application might react slowly to POLLOUT)
      
      tcp_v4_conn_req_fastopen() needs to call tcp_init_metrics() before
      calling tcp_init_buffer_space()
      
      Then we realize tcp_new_space() should call tcp_fixup_sndbuf()
      instead of duplicating this stuff.
      
      Rename tcp_fixup_sndbuf() to tcp_sndbuf_expand() to be more
      descriptive.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ae70532