1. 03 Feb, 2021: 1 commit
      net: ipv4: Emit notification when fib hardware flags are changed · 680aea08
      Amit Cohen committed
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel,
      but not necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      It is also possible for a route already installed in hardware to change
      its action and therefore its flags. For example, a host route that is
      trapping packets can be "promoted" to perform decapsulation following
      the installation of an IPinIP/VXLAN tunnel.
      
      Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. The aim is to provide an indication to user-space
      (e.g., routing daemons) about the state of the route in hardware.
      
      Introduce a sysctl that controls this behavior.
      
      Keep the default value at 0 (i.e., do not emit notifications) for several
      reasons:
      - Multiple RTM_NEWROUTE notifications per route might confuse existing
        routing daemons.
      - Convergence reasons in routing daemons.
      - The extra notifications will negatively impact the insertion rate.
      - Not all users are interested in these notifications.
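A minimal sketch of the notification decision follows. It assumes the knob is the integer sysctl net.ipv4.fib_notify_on_flag_change, which is its upstream name, and that only the offload/trap bits matter. The constants are local copies of RTM_F_OFFLOAD (0x4000) and RTM_F_TRAP (0x8000) from <linux/rtnetlink.h>. fib_should_notify() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi route flags (see <linux/rtnetlink.h>). */
#define HW_RTM_F_OFFLOAD 0x4000u
#define HW_RTM_F_TRAP    0x8000u
#define HW_FLAGS_MASK    (HW_RTM_F_OFFLOAD | HW_RTM_F_TRAP)

/* Hypothetical helper: decide whether a hardware flag update on an
 * installed route should produce an RTM_NEWROUTE notification.
 * notify_sysctl models the sysctl value (0 = off, the default). */
static bool fib_should_notify(int notify_sysctl, uint32_t old_flags,
                              uint32_t new_flags)
{
    if (notify_sysctl == 0)
        return false;           /* default: no extra notifications */
    /* Notify only when the offload or trap bit actually changed. */
    return ((old_flags ^ new_flags) & HW_FLAGS_MASK) != 0;
}
```

With the default of 0 the helper never fires, which matches the reasons listed above for keeping notifications off.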
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Acked-by: Roopa Prabhu <roopa@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 11 Sep, 2020: 1 commit
      tcp: reflect tos value received in SYN to the socket · ac8f1710
      Wei Wang committed
      This commit adds a new TCP feature to reflect the tos value received in
      SYN, and send it out on the SYN-ACK, and eventually set the tos value of
      the established socket with this reflected tos value. This provides a
      way to set the traffic class/QoS level for all traffic in the same
      connection to be the same as the incoming SYN request. It could be
      useful in data centers to provide equivalent QoS according to the
      incoming request.
      This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
      default turned off.
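The TOS choice described above can be sketched as follows. The helper synack_tos() is hypothetical and stands in for the kernel logic. It copies the whole byte. This commit message does not say whether the ECN bits are masked off, so the sketch leaves that out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: pick the TOS byte for the SYN-ACK (and, later, the
 * established socket). With tcp_reflect_tos enabled, echo the TOS seen in
 * the client's SYN; otherwise keep the listening socket's own TOS. */
static uint8_t synack_tos(bool reflect_tos, uint8_t syn_tos,
                          uint8_t listener_tos)
{
    return reflect_tos ? syn_tos : listener_tos;
}
```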
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 Aug, 2020: 1 commit
      tcp: correct read of TFO keys on big endian systems · f19008e6
      Jason Baron committed
      When TFO keys are read back on big endian systems either via the global
      sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
      don't match what was written.
      
      For example, on s390x:
      
      # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
      # cat /proc/sys/net/ipv4/tcp_fastopen_key
      02000000-01000000-04000000-03000000
      
      Instead of:
      
      # cat /proc/sys/net/ipv4/tcp_fastopen_key
      00000001-00000002-00000003-00000004
      
      Fix this by converting to the correct endianness on read. This was
      reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
      selftest on s390x, which depends on the read value matching what was
      written. I've confirmed that the test now passes on big and little endian
      systems.
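The s390x output above can be reproduced with a host-independent model. The model assumes that two 32-bit words w0, w1 share one 64-bit key half as (w0 << 32) | w1, as they do in memory on a big endian CPU. It also assumes that the write path stored that half little-endian. All helpers are hypothetical and use explicit byte loops rather than the kernel's byte-order helpers.

```c
#include <stdint.h>

/* Store a 64-bit value as little-endian bytes, whatever the host order. */
static void put_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Load little-endian bytes back into a host-order 64-bit value. */
static uint64_t get_le64(const uint8_t in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* What a big endian CPU sees when it loads 4 raw bytes as a u32. */
static uint32_t raw_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Buggy read-back: reinterpret the stored LE bytes as native BE words. */
static void read_words_buggy(const uint8_t key[8], uint32_t w[2])
{
    w[0] = raw_be32(key);
    w[1] = raw_be32(key + 4);
}

/* Fixed read-back: convert to host order first, then split the words. */
static void read_words_fixed(const uint8_t key[8], uint32_t w[2])
{
    uint64_t v = get_le64(key);
    w[0] = (uint32_t)(v >> 32);
    w[1] = (uint32_t)v;
}
```

Writing the words 1 and 2 and reading them back through read_words_buggy() yields 0x02000000 and 0x01000000. These are the first two fields of the bad s390x output. read_words_fixed() returns the original values.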
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Fixes: 438ac880 ("net: fastopen: robustness and endianness fixes for SipHash")
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-and-tested-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 01 May, 2020: 1 commit
  5. 29 Apr, 2020: 1 commit
      net: ipv4: add sysctl for nexthop api compatibility mode · 4f80116d
      Roopa Prabhu committed
      The current route nexthop API maintains user space compatibility
      with the old route API by default. Dumps and netlink notifications
      support both new and old API format. In systems which have
      moved to the new API, this compatibility mode cancels some
      of the performance benefits provided by the new nexthop API.
      
      This patch adds a new sysctl, nexthop_compat_mode, which is on
      by default but provides the ability to turn off compatibility
      mode allowing systems to run entirely with the new routing
      API. Old route API behaviour and support is not modified by this
      sysctl.
      
      Uses a single sysctl to cover both ipv4 and ipv6 following
      other sysctls. Covers dumps and delete notifications as
      suggested by David Ahern.
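The following userspace sketch checks the current mode. It assumes the usual mapping of dotted sysctl names onto /proc/sys paths. sysctl_path() and read_int_sysctl() are hypothetical helpers. A system that has fully moved to the new API would turn the mode off with `sysctl -w net.ipv4.nexthop_compat_mode=0`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a dotted sysctl name onto its procfs path,
 * e.g. "net.ipv4.nexthop_compat_mode" ->
 * "/proc/sys/net/ipv4/nexthop_compat_mode".
 * Returns 0 on success, -1 if buf is too small. */
static int sysctl_path(const char *name, char *buf, size_t len)
{
    const char *prefix = "/proc/sys/";
    int n = snprintf(buf, len, "%s%s", prefix, name);

    if (n < 0 || (size_t)n >= len)
        return -1;
    for (char *p = buf + strlen(prefix); *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Hypothetical helper: read an integer sysctl such as
 * net.ipv4.nexthop_compat_mode (1 = compatibility on, the default).
 * Returns 0 and stores the value in *out, or -1 on error. */
static int read_int_sysctl(const char *name, int *out)
{
    char path[256];
    FILE *f;
    int ret;

    if (sysctl_path(name, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ret = (fscanf(f, "%d", out) == 1) ? 0 : -1;
    fclose(f);
    return ret;
}
```

The simple dot-to-slash mapping breaks for path components that themselves contain dots, such as VLAN interface names under net.ipv4.conf.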
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Apr, 2020: 1 commit
  7. 13 Mar, 2020: 1 commit
      tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted. · 4b01a967
      Kuniyuki Iwashima committed
      Commit aacd9289 ("tcp: bind() use stronger
      condition for bind_conflict") introduced a restriction that forbids binding
      SO_REUSEADDR-enabled sockets to the same (addr, port) tuple in order to
      assign ports dispersedly so that we can connect to the same remote host.
      
      The change results in accelerating port depletion so that we fail to bind
      sockets to the same local port even if we want to connect to the different
      remote hosts.
      
      You can reproduce this issue by following instructions below.
      
        1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
        2. set SO_REUSEADDR to two sockets.
        3. bind two sockets to (localhost, 0) and the latter fails.
      
      Therefore, when ephemeral ports are exhausted, bind(0) should fall back to
      the legacy behaviour to enable the SO_REUSEADDR option and make it possible
      to connect to different remote (addr, port) tuples.
      
      This patch allows us to bind SO_REUSEADDR-enabled sockets to the same
      (addr, port) only when net.ipv4.ip_autobind_reuse is set to 1 and all
      ephemeral ports are exhausted. This also allows connect() and listen() to
      share ports in the following way, which may break some applications, so
      ip_autobind_reuse is 0 by default and the feature is disabled.
      
        1. setsockopt(sk1, SO_REUSEADDR)
        2. setsockopt(sk2, SO_REUSEADDR)
        3. bind(sk1, saddr, 0)
        4. bind(sk2, saddr, 0)
        5. connect(sk1, daddr)
        6. listen(sk2)
      
      If it is set to 1, we can fully utilize the 4-tuples, but we should use
      IP_BIND_ADDRESS_NO_PORT for bind()+connect() whenever possible.
      
      The notable thing is that if all sockets bound to the same port have
      both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
      ephemeral port and also do listen().
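Steps 2 and 3 of the reproduction above can be sketched in C. bind_two_reuseaddr() is a hypothetical helper. With the one-port range from step 1, the second bind() is expected to fail unless net.ipv4.ip_autobind_reuse is 1. With the default range, both binds normally succeed on different ports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create two TCP sockets with SO_REUSEADDR and bind
 * both to 127.0.0.1 with port 0, letting the kernel pick the port.
 * Returns the number of successful bind() calls, or -1 if socket() fails. */
static int bind_two_reuseaddr(void)
{
    int fds[2] = { -1, -1 };
    int ok = 0;
    int one = 1;
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);                 /* bind(0): ephemeral port */
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    for (int i = 0; i < 2; i++) {
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (fds[i] < 0) {
            ok = -1;
            break;
        }
        setsockopt(fds[i], SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        if (bind(fds[i], (struct sockaddr *)&sa, sizeof(sa)) == 0)
            ok++;
    }
    for (int i = 0; i < 2; i++)   /* close both sockets before returning */
        if (fds[i] >= 0)
            close(fds[i]);
    return ok;
}
```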
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 20 Feb, 2020: 1 commit
      net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns · 9cb8e048
      Christian Brauner committed
      It is currently possible to switch the TCP congestion control algorithm
      in non-initial network namespaces:
      
      unshare -U --map-root --net --fork --pid --mount-proc
      echo "reno" > /proc/sys/net/ipv4/tcp_congestion_control
      
      works just fine. But currently non-initial network namespaces have no
      way of knowing which congestion algorithms are available or allowed other
      than through trial and error by writing the names of the algorithms into
      the aforementioned file.
      Since we already allow changing the congestion algorithm in non-initial
      network namespaces by exposing the tcp_congestion_control file there is
      no reason to not also expose the
      tcp_{allowed,available}_congestion_control files to non-initial network
      namespaces. After this change a container with a separate network
      namespace will show:
      
      root@f1:~# ls -al /proc/sys/net/ipv4/tcp_* | grep congestion
      -rw-r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_allowed_congestion_control
      -r--r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_available_congestion_control
      -rw-r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_congestion_control
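With these files visible, a container can check whether an algorithm is listed before writing it to tcp_congestion_control. This replaces trial and error. The sketch below assumes the file's usual single-line, space-separated format and leaves out reading the file itself. cc_list_contains() is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if `name` appears as a whole word in a
 * whitespace-separated list such as the contents of
 * tcp_available_congestion_control ("reno cubic bbr\n"). */
static bool cc_list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p) {
        while (*p == ' ' || *p == '\n' || *p == '\t')
            p++;                            /* skip separators */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\n' && *p != '\t')
            p++;                            /* find end of this token */
        if (n > 0 && (size_t)(p - start) == n && strncmp(start, name, n) == 0)
            return true;
    }
    return false;
}
```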
      
      Link: https://github.com/lxc/lxc/issues/3267
      Reported-by: Haw Loeung <haw.loeung@canonical.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 10 Dec, 2019: 1 commit
      net-tcp: Disable TCP ssthresh metrics cache by default · 65e6d901
      Kevin(Yudong) Yang committed
      This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save"
      that disables TCP ssthresh metrics cache by default. Other parts of TCP
      metrics cache, e.g. rtt, cwnd, remain unchanged.
      
      As modern networks become more and more dynamic, the TCP metrics cache
      today often causes more harm than benefits. For example, the same IP
      address is often shared by different subscribers behind NAT in residential
      networks. Even if the IP address is not shared by different users,
      caching the slow-start threshold of a previous short flow using loss-based
      congestion control (e.g. cubic) often causes the future longer flows of
      the same network path to exit slow-start prematurely with abysmal
      throughput.
      
      Caching ssthresh is very risky and can lead to terrible performance.
      Therefore it makes sense to disable ssthresh caching by default and
      let administrators opt in for specific networks.
      This practice also has worked well for several years of deployment with
      CUBIC congestion control at Google.
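The sketch below models the knob's effect with a hypothetical cache entry. With net.ipv4.tcp_no_ssthresh_metrics_save set to its new default of 1, rtt and cwnd are still cached but ssthresh is not. The real tcp_metrics code tracks more fields and locking state than this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-destination metrics cache entry. */
struct dst_metrics {
    uint32_t rtt_us;
    uint32_t cwnd;
    uint32_t ssthresh;
};

/* Save metrics from a finished connection. When no_ssthresh_save is true
 * (modelling tcp_no_ssthresh_metrics_save=1), the cached ssthresh is left
 * untouched while rtt and cwnd are still updated. */
static void save_metrics(struct dst_metrics *m, uint32_t rtt_us,
                         uint32_t cwnd, uint32_t ssthresh,
                         bool no_ssthresh_save)
{
    m->rtt_us = rtt_us;
    m->cwnd = cwnd;
    if (!no_ssthresh_save)
        m->ssthresh = ssthresh;
}
```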
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 21 Nov, 2019: 1 commit
  11. 19 Nov, 2019: 1 commit
  12. 10 Aug, 2019: 1 commit
      tcp: add new tcp_mtu_probe_floor sysctl · c04b79b6
      Josh Hunt committed
      The current implementation of TCP MTU probing can considerably
      underestimate the MTU on lossy connections allowing the MSS to get down to
      48. We have found that in almost all of these cases on our networks these
      paths can handle much larger MTUs meaning the connections are being
      artificially limited. Even though TCP MTU probing can raise the MSS back up
      we have seen this not to be the case causing connections to be "stuck" with
      an MSS of 48 when heavy loss is present.
      
      Prior to pushing out this change we could not keep TCP MTU probing enabled
      because of the above reasons. Now with a reasonable floor set we've had it
      enabled for the past 6 months.
      
      The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
      administrators the ability to control the floor of MSS probing.
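The following is a simplified sketch of how the floor bounds the search after a probing failure. It assumes the search-low MSS is halved, capped at net.ipv4.tcp_base_mss and then raised to the floor. next_search_low_mss() is hypothetical. It ignores header overhead and the other clamps the kernel applies.

```c
#include <stdint.h>

/* Hypothetical, simplified model: shrink the MSS search floor after a
 * probing failure, but never below probe_floor (tcp_mtu_probe_floor). */
static uint32_t next_search_low_mss(uint32_t cur_mss, uint32_t base_mss,
                                    uint32_t probe_floor)
{
    uint32_t mss = cur_mss / 2;       /* back off aggressively */

    if (mss > base_mss)
        mss = base_mss;               /* cap at the base MSS */
    if (mss < probe_floor)
        mss = probe_floor;            /* the new floor stops the collapse */
    return mss;
}
```

With the default floor of 48, repeated halving can still reach 48. Raising the floor to a few hundred bytes keeps lossy paths out of the "stuck at 48" state described above.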
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 19 Jul, 2019: 1 commit
  14. 23 Jun, 2019: 1 commit
      net: fastopen: robustness and endianness fixes for SipHash · 438ac880
      Ard Biesheuvel committed
      Some changes to the TCP fastopen code to make it more robust
      against future changes in the choice of key/cookie size, etc.
      
      - Instead of keeping the SipHash key in an untyped u8[] buffer
        and casting it to the right type upon use, use the correct
        type directly. This ensures that the key will appear at the
        correct alignment if we ever change the way these data
        structures are allocated. (Currently, they are only allocated
        via kmalloc so they always appear at the correct alignment)
      
      - Use DIV_ROUND_UP when sizing the u64[] array to hold the
        cookie, so it is always of sufficient size, even if
        TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.
      
      - Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
        function, which is no longer used.
      
      - Add endian swabbing when setting the keys and calculating the hash,
        to ensure that cookie values are the same for a given key and
        source/destination address pair regardless of the endianness of
        the server.
      
      Note that none of these are functional changes wrt the current
      state of the code, with the exception of the swabbing, which only
      affects big endian systems.
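The DIV_ROUND_UP sizing point can be checked with a quick sketch. TCP_FASTOPEN_COOKIE_MAX is 16 today, so plain division and rounding up give the same result. The 20-byte cookie in the test is a hypothetical size that shows where they differ. cookie_u64_slots() is a hypothetical helper; the macro matches the kernel's DIV_ROUND_UP() definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding-up integer division, as the kernel's DIV_ROUND_UP() does. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of u64 slots needed for a cookie. */
static size_t cookie_u64_slots(size_t cookie_bytes)
{
    return DIV_ROUND_UP(cookie_bytes, sizeof(uint64_t));
}
```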
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 17 Jun, 2019: 1 commit
  16. 16 Jun, 2019: 1 commit
      tcp: add tcp_min_snd_mss sysctl · 5f3e2bf0
      Eric Dumazet committed
      Some TCP peers announce a very small MSS option in their SYN and/or
      SYN/ACK messages.
      
      This forces the stack to send packets with a very high network/cpu
      overhead.
      
      Linux has enforced a minimal value of 48. Since this value includes
      the size of TCP options, and that the options can consume up to 40
      bytes, this means that each segment can include only 8 bytes of payload.
      
      In some cases, it can be useful to increase the minimal value
      to a saner value.
      
      We still leave the default at 48 (TCP_MIN_SND_MSS), for compatibility
      reasons.
      
      Note that TCP_MAXSEG socket option enforces a minimal value
      of (TCP_MIN_MSS). David Miller increased this minimal value
      in commit c39508d6 ("tcp: Make TCP_MAXSEG minimum more correct.")
      from 64 to 88.
      
      We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
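The code below shows the arithmetic behind the 8-byte figure (48 - 40 = 8). It also sketches how the floor applies to a peer's advertised MSS. Both helpers are hypothetical. The kernel applies the sysctl inside its own MSS computation rather than through functions like these.

```c
#include <stdint.h>

/* Hypothetical: raise a peer-advertised MSS to the configured floor
 * (modelling net.ipv4.tcp_min_snd_mss, default 48). */
static uint32_t clamp_peer_mss(uint32_t peer_mss, uint32_t min_snd_mss)
{
    return peer_mss < min_snd_mss ? min_snd_mss : peer_mss;
}

/* Payload left in a segment once TCP option bytes are subtracted. */
static uint32_t payload_per_segment(uint32_t mss, uint32_t option_bytes)
{
    return mss > option_bytes ? mss - option_bytes : 0;
}
```

With the full 40 bytes of options, the default floor leaves 8 payload bytes per segment. A floor of 536 would leave 496.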
      
      CVE-2019-11479 -- tcp mss hardcoded to 48
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 15 Jun, 2019: 3 commits
      tcp: add tcp_tx_skb_cache sysctl · 0b7d7f6b
      Eric Dumazet committed
      Feng Tang reported a performance regression after introduction
      of per TCP socket tx/rx caches, for TCP over loopback (netperf)
      
      There is a high chance the regression is caused by a change in
      how well the 32 KB per-thread page (current->task_frag) can
      be recycled, and lack of pcp caches for order-3 pages.
      
      I could not reproduce the regression myself: all CPUs were
      spinning on the mm spinlocks for page allocs/freeing, regardless
      of whether the per tcp socket caches were enabled or disabled.
      
      It seems best to disable the feature by default, and let
      admins enable it.
      
      MM layer either needs to provide scalable order-3 pages
      allocations, or could attempt a trylock on zone->lock if
      the caller only attempts to get a high-order page and is
      able to fallback to order-0 ones in case of pressure.
      
      Tests run on a 56-core host (112 hyper threads)
      
      -	35.49%	netperf 		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
         - 35.49% queued_spin_lock_slowpath
      	  - 18.18% get_page_from_freelist
      		 - __alloc_pages_nodemask
      			- 18.18% alloc_pages_current
      				 skb_page_frag_refill
      				 sk_page_frag_refill
      				 tcp_sendmsg_locked
      				 tcp_sendmsg
      				 inet_sendmsg
      				 sock_sendmsg
      				 __sys_sendto
      				 __x64_sys_sendto
      				 do_syscall_64
      				 entry_SYSCALL_64_after_hwframe
      				 __libc_send
      	  + 17.31% __free_pages_ok
      +	31.43%	swapper 		 [kernel.vmlinux]	  [k] intel_idle
      +	 9.12%	netperf 		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
      +	 6.53%	netserver		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
      +	 0.69%	netserver		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
      +	 0.68%	netperf 		 [kernel.vmlinux]	  [k] skb_release_data
      +	 0.52%	netperf 		 [kernel.vmlinux]	  [k] tcp_sendmsg_locked
      	 0.46%	netperf 		 [kernel.vmlinux]	  [k] _raw_spin_lock_irqsave
      
      Fixes: 472c2e07 ("tcp: add one skb cache for tx")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Feng Tang <feng.tang@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      tcp: add tcp_rx_skb_cache sysctl · ede61ca4
      Eric Dumazet committed
      Instead of relying on rps_needed, it is safer to use a separate
      static key, since we do not want to enable TCP rx_skb_cache
      by default. This feature can cause huge increase of memory
      usage on hosts with millions of sockets.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv4: Support multipath hashing on inner IP pkts for GRE tunnel · 363887a2
      Stephen Suryaputra committed
      A multipath hash policy value of 0 doesn't distribute the traffic, since
      the outer IP dest and src don't vary even though the inner ones do. Since
      the flow is defined by the inner addresses in the case of tunneled traffic,
      hashing on them is desired.
      
      This is done mainly for IP over GRE, hence only tested for that. But
      anything else supported by flow dissection should work.
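The sketch below shows which addresses feed the hash. It assumes this behaviour is selected with net.ipv4.fib_multipath_hash_policy=2, which upstream documents as "Layer 3 or inner Layer 3 if present". The struct, mix() and mp_hash() are hypothetical. mix() is a toy function, not the kernel's flow hash. As in v3 above, ports are left out of the keys.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of what flow dissection extracted. */
struct flow_addrs {
    uint32_t outer_src, outer_dst;
    bool has_inner;                  /* e.g. IP over GRE */
    uint32_t inner_src, inner_dst;
};

/* Toy mixing function standing in for the real flow hash. */
static uint32_t mix(uint32_t a, uint32_t b)
{
    uint32_t h = a * 0x9e3779b1u;
    h ^= b + 0x7f4a7c15u + (h << 6) + (h >> 2);
    return h;
}

/* Policy 0: outer L3 only. Policy 2: inner L3 when present, else outer. */
static uint32_t mp_hash(const struct flow_addrs *f, int policy)
{
    if (policy == 2 && f->has_inner)
        return mix(f->inner_src, f->inner_dst);
    return mix(f->outer_src, f->outer_dst);
}
```

Two tunneled flows between the same pair of tunnel endpoints hash identically under policy 0. Under policy 2 they can land on different paths.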
      
      v2: Use skb_flow_dissect_flow_keys() directly so that other tunneling
          can be supported through flow dissection (per Nikolay Aleksandrov).
      v3: Remove accidental inclusion of ports in the hash keys and clarify
          the documentation (Nikolay Alexandrov).
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 31 May, 2019: 2 commits
  19. 18 Apr, 2019: 1 commit
      ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 19fad20d
      ZhangXiaoxu committed
      There is a UBSAN report as below:
      UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
      signed integer overflow:
      2147483647 * 1000 cannot be represented in type 'int'
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e3 #1
      Call Trace:
       <IRQ>
       dump_stack+0x8c/0xba
       ubsan_epilogue+0x11/0x60
       handle_overflow+0x12d/0x170
       ? ttwu_do_wakeup+0x21/0x320
       __ubsan_handle_mul_overflow+0x12/0x20
       tcp_ack_update_rtt+0x76c/0x780
       tcp_clean_rtx_queue+0x499/0x14d0
       tcp_ack+0x69e/0x1240
       ? __wake_up_sync_key+0x2c/0x50
       ? update_group_capacity+0x50/0x680
       tcp_rcv_established+0x4e2/0xe10
       tcp_v4_do_rcv+0x22b/0x420
       tcp_v4_rcv+0xfe8/0x1190
       ip_protocol_deliver_rcu+0x36/0x180
       ip_local_deliver+0x15b/0x1a0
       ip_rcv+0xac/0xd0
       __netif_receive_skb_one_core+0x7f/0xb0
       __netif_receive_skb+0x33/0xc0
       netif_receive_skb_internal+0x84/0x1c0
       napi_gro_receive+0x2a0/0x300
       receive_buf+0x3d4/0x2350
       ? detach_buf_split+0x159/0x390
       virtnet_poll+0x198/0x840
       ? reweight_entity+0x243/0x4b0
       net_rx_action+0x25c/0x770
       __do_softirq+0x19b/0x66d
       irq_exit+0x1eb/0x230
       do_IRQ+0x7a/0x150
       common_interrupt+0xf/0xf
       </IRQ>
      
      It can be reproduced by:
        echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
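The new bounds can be checked against the overflow in the report. The UBSAN output shows the window value being multiplied by 1000 in int arithmetic. One day (86400 s) times 1000 fits in an int, while 2147483647 times 1000 does not. Both helpers are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define ONE_DAY_SECS (24 * 60 * 60)          /* 86400 */

/* Hypothetical range check mirroring the new bounds: 0 .. one day. */
static bool min_rtt_wlen_valid(long secs)
{
    return secs >= 0 && secs <= ONE_DAY_SECS;
}

/* Does secs * 1000 fit in an int without overflowing? */
static bool scaled_fits_int(long secs)
{
    return secs >= 0 && secs <= INT_MAX / 1000;
}
```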
      
      Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
      Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 22 Mar, 2019: 1 commit
      ipv4: Allow amount of dirty memory from fib resizing to be controllable · 9ab948a9
      David Ahern committed
      fib_trie implementation calls synchronize_rcu when a certain amount of
      pages are dirty from freed entries. The number of pages was determined
      experimentally in 2009 (commit c3059477).
      
      At the current setting, synchronize_rcu is called often -- 51 times in a
      second in one test with an average of an 8 msec delay adding a fib entry.
      The net effect is a significant slowdown when modifying the fib. This is seen
      in the output of 'time' - the difference between real time and sys+user.
      For example, using 720,022 single path routes and 'ip -batch'[1]:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m14.214s
          user    0m2.513s
          sys     0m6.783s
      
      So roughly 35% of the actual time to install the routes is from the ip
      command getting scheduled out, most notably due to synchronize_rcu (this
      is observed using 'perf sched timehist').
      
      This patch makes the amount of dirty memory configurable between 64k where
      the synchronize_rcu is called often (small, low end systems that are memory
      sensitive) to 64M where synchronize_rcu is called rarely during a large
      FIB change (for high end systems with lots of memory). The default is 512kB
      which corresponds to the current setting of 128 pages with a 4kB page size.
      
      As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
      in a second blocking for up to 30 msec in a single instance, and a total
      of almost 100 msec across the 4 calls in the second. The trade off is
      allowing FIB entries to consume more memory in a given time window
      but with much better fib insertion rates (~30% increase in prefixes/sec).
      With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
      file runs in:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m9.692s
          user    0m2.491s
          sys     0m6.769s
      
      So the dead time is reduced to about 1/2 second or <5% of the real time.
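The figures in this message can be checked with a little arithmetic. 512 kB at 4 kB pages is 128 pages. (real - user - sys) / real gives roughly 35% before the change and under 5% at 16MB. The helpers are hypothetical, and the sketch assumes the 64k and 64M bounds are inclusive.

```c
#include <stdbool.h>

#define FIB_SYNC_MEM_MIN (64UL * 1024)          /* 64 kB */
#define FIB_SYNC_MEM_MAX (64UL * 1024 * 1024)   /* 64 MB */

/* Hypothetical range check for net.ipv4.fib_sync_mem (bytes). */
static bool fib_sync_mem_valid(unsigned long bytes)
{
    return bytes >= FIB_SYNC_MEM_MIN && bytes <= FIB_SYNC_MEM_MAX;
}

/* Dirty pages allowed before synchronize_rcu() is forced. */
static unsigned long sync_mem_pages(unsigned long bytes,
                                    unsigned long page_size)
{
    return bytes / page_size;
}

/* Share of wall-clock time spent neither in user nor sys. */
static double dead_time_fraction(double real, double user, double sys)
{
    return (real - (user + sys)) / real;
}
```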
      
      [1] 'ip' modified to not request ACK messages which improves route
          insertion times by about 20%
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 08 Nov, 2018: 1 commit
  22. 27 Sep, 2018: 1 commit
  23. 02 Aug, 2018: 2 commits
  24. 06 Jul, 2018: 1 commit
      ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns · 70ba5b6d
      Tyler Hicks committed
      The low and high values of the net.ipv4.ping_group_range sysctl were
      being silently forced to the default disabled state when a write to the
      sysctl contained GIDs that didn't map to the associated user namespace.
      Confusingly, the sysctl's write operation would return success and then
      a subsequent read of the sysctl would indicate that the low and high
      values are the overflowgid.
      
      This patch changes the behavior by clearly returning an error when the
      sysctl write operation receives a GID range that doesn't map to the
      associated user namespace. In such a situation, the previous value of
      the sysctl is preserved and that range will be returned in a subsequent
      read of the sysctl.
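The new write semantics can be sketched as follows. The user namespace mapping is modelled as a single hypothetical range (struct gid_map). ping_group_range_write() is illustrative, not the kernel handler. Either the whole range maps and is stored, or the write fails with -EINVAL and the previous values stay in place.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a user namespace GID mapping: one range. */
struct gid_map {
    unsigned int first, count;
};

static bool gid_maps(const struct gid_map *m, unsigned int gid)
{
    return gid >= m->first && gid - m->first < m->count;
}

/* Reject the whole write if either end fails to map; otherwise store it.
 * On error, range[] keeps its previous contents. */
static int ping_group_range_write(unsigned int range[2],
                                  const struct gid_map *m,
                                  unsigned int low, unsigned int high)
{
    if (!gid_maps(m, low) || !gid_maps(m, high))
        return -EINVAL;
    range[0] = low;
    range[1] = high;
    return 0;
}
```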
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 30 Jun, 2018: 1 commit
  26. 05 Jun, 2018: 1 commit
      net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization · 79e9fed4
      Maciej Żenczykowski committed
      This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
      to an integer.
      
      It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
      while 2 enables timewait socket reuse only for sockets that we can
      prove are loopback connections:
        ie. bound to 'lo' interface or where one of source or destination
        IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
      
      This enables quicker reuse of ephemeral ports for loopback connections
      - where tcp_tw_reuse is 100% safe from a protocol perspective
      (this assumes no artificially induced packet loss on 'lo').
      
      This also makes establishing many loopback connections *much* faster
      (allocating ports out of the first half of the ephemeral port range
      is significantly faster than allocating from the second half).
      
      Without this change in a 32K ephemeral port space my sample program
      (it just establishes and closes [::1]:ephemeral -> [::1]:server_port
      connections in a tight loop) fails after 32765 connections in 24 seconds.
      With it enabled 50000 connections only take 4.7 seconds.
      
      This is particularly problematic for IPv6 where we only have one local
      address and cannot play tricks with varying source IP from 127.0.0.0/8
      pool.
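The address test for tcp_tw_reuse=2 can be sketched for the three ranges listed above. The "bound to lo" case is not modelled. v4_is_loopback() and v6_is_loopback() are hypothetical helpers. The first takes a host-order IPv4 address and the second a raw 16-byte IPv6 address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IPv4 address in host byte order: loopback is 127.0.0.0/8. */
static bool v4_is_loopback(uint32_t addr)
{
    return (addr >> 24) == 127;
}

/* IPv6 address as 16 raw bytes: ::1, or ::ffff:127.0.0.0/104. */
static bool v6_is_loopback(const uint8_t a[16])
{
    static const uint8_t one[16] = { [15] = 1 };
    static const uint8_t mapped_prefix[12] = { [10] = 0xff, [11] = 0xff };

    if (memcmp(a, one, sizeof(one)) == 0)
        return true;
    /* /104 = the 96-bit ::ffff: prefix plus the first IPv4 octet (127). */
    return memcmp(a, mapped_prefix, sizeof(mapped_prefix)) == 0 &&
           a[12] == 127;
}
```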
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 18 May, 2018: 2 commits
  28. 28 Mar, 2018: 1 commit
  29. 17 Mar, 2018: 1 commit
  30. 05 Mar, 2018: 1 commit
  31. 13 Feb, 2018: 1 commit
  32. 15 Nov, 2017: 1 commit
  33. 10 Nov, 2017: 1 commit
  34. 03 Nov, 2017: 1 commit
  35. 02 Nov, 2017: 1 commit
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>