1. 17 11月, 2016 5 次提交
    • X
      sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long 提交于
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      It may cause to keep returning -EBUSY when inserting a new node, as the
      chain is too long and sctp inserts a transport node in a loop, which
      could even lead to system hangs there.
      
      The new rhlist interface works for this case that there are many nodes
      with the same key in one chain. It puts them into a list then makes this
      list be as a node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fda702f
    • E
      netpoll: more efficient locking · 89c4b442
      Eric Dumazet 提交于
      Callers of netpoll_poll_lock() own NAPI_STATE_SCHED
      
      Callers of netpoll_poll_unlock() have BH blocked between
      the NAPI_STATE_SCHED being cleared and poll_lock is released.
      
      We can avoid the spinlock which has no contention, and use cmpxchg()
      on poll_owner which we need to set anyway.
      
      This removes a possible lockdep violation after the cited commit,
      since sk_busy_loop() re-enables BH before calling busy_poll_stop()
      
      Fixes: 217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89c4b442
    • E
      net: busy-poll: return busypolling status to drivers · 364b6055
      Eric Dumazet 提交于
      NAPI drivers use napi_complete_done() or napi_complete() when
      they drained RX ring and right before re-enabling device interrupts.
      
      In busy polling, we can avoid interrupts being delivered since
      we are polling RX ring in a controlled loop.
      
      Drivers can chose to use napi_complete_done() return value
      to reduce interrupts overhead while busy polling is active.
      
      This is optional, legacy drivers should work fine even
      if not updated.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      364b6055
    • E
      net: busy-poll: allow preemption in sk_busy_loop() · 217f6974
      Eric Dumazet 提交于
      After commit 4cd13c21 ("softirq: Let ksoftirqd do its job"),
      sk_busy_loop() needs a bit of care :
      softirqs might be delayed since we do not allow preemption yet.
      
      This patch adds preemptiom points in sk_busy_loop(),
      and makes sure no unnecessary cache line dirtying
      or atomic operations are done while looping.
      
      A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
      
      This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
      so that sk_busy_loop() does not have to grab it again.
      
      Similarly, netpoll_poll_lock() is done one time.
      
      This gives about 10 to 20 % improvement in various busy polling
      tests, especially when many threads are busy polling in
      configurations with large number of NIC queues.
      
      This should allow experimenting with bigger delays without
      hurting overall latencies.
      
      Tested:
       On a 40Gb mlx4 NIC, 32 RX/TX queues.
      
       echo 70 >/proc/sys/net/core/busy_read
       for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
      
          Before:      After:
       1:   90072   92819
       2:  157289  184007
       3:  235772  213504
       4:  344074  357513
       5:  394755  458267
       6:  461151  487819
       7:  549116  625963
       8:  544423  716219
       9:  720460  738446
      10:  794686  837612
      11:  915998  923960
      12:  937507  925107
      13: 1019677  971506
      14: 1046831 1113650
      15: 1114154 1148902
      16: 1105221 1179263
      17: 1266552 1299585
      18: 1258454 1383817
      19: 1341453 1312194
      20: 1363557 1488487
      21: 1387979 1501004
      22: 1417552 1601683
      23: 1550049 1642002
      24: 1568876 1601915
      25: 1560239 1683607
      26: 1640207 1745211
      27: 1706540 1723574
      28: 1638518 1722036
      29: 1734309 1757447
      30: 1782007 1855436
      31: 1724806 1888539
      32: 1717716 1944297
      33: 1778716 1869118
      34: 1805738 1983466
      35: 1815694 2020758
      36: 1893059 2035632
      37: 1843406 2034653
      38: 1888830 2086580
      39: 1972827 2143567
      40: 1877729 2181851
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      217f6974
    • D
      ipv6: sr: add option to control lwtunnel support · 46738b13
      David Lebrun 提交于
      This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
      support of encapsulation with the lightweight tunnels. When this option
      is enabled, CONFIG_LWTUNNEL is automatically selected.
      
      Fix commit 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      
      Without a proper option to control lwtunnel support for SR-IPv6, if
      CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
      of seg6_iptunnel_init() failure with EOPNOTSUPP:
      
      NET: Registered protocol family 10
      IPv6: Attempt to unregister permanent protocol 6
      IPv6: Attempt to unregister permanent protocol 136
      IPv6: Attempt to unregister permanent protocol 17
      NET: Unregistered protocol family 10
      
      Tested (compiling, booting, and loading ipv6 module when relevant)
      with possible combinations of CONFIG_IPV6={y,m,n},
      CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.
      Reported-by: NLorenzo Colitti <lorenzo@google.com>
      Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46738b13
  2. 16 11月, 2016 3 次提交
  3. 15 11月, 2016 2 次提交
  4. 14 11月, 2016 3 次提交
    • J
      netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test · eb1a6bdc
      Julia Lawall 提交于
      Since commit 7926dbfa ("netfilter: don't use
      mutex_lock_interruptible()"), the function xt_find_table_lock can only
      return NULL on an error.  Simplify the call sites and update the
      comment before the function.
      
      The semantic patch that change the code is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression t,e;
      @@
      
      t = \(xt_find_table_lock(...)\|
            try_then_request_module(xt_find_table_lock(...),...)\)
      ... when != t=e
      - ! IS_ERR_OR_NULL(t)
      + t
      
      @@
      expression t,e;
      @@
      
      t = \(xt_find_table_lock(...)\|
            try_then_request_module(xt_find_table_lock(...),...)\)
      ... when != t=e
      - IS_ERR_OR_NULL(t)
      + !t
      
      @@
      expression t,e,e1;
      @@
      
      t = \(xt_find_table_lock(...)\|
            try_then_request_module(xt_find_table_lock(...),...)\)
      ... when != t=e
      ?- t ? PTR_ERR(t) : e1
      + e1
      ... when any
      
      // </smpl>
      Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eb1a6bdc
    • E
      tcp: take care of truncations done by sk_filter() · ac6e7800
      Eric Dumazet 提交于
      With syzkaller help, Marco Grassi found a bug in TCP stack,
      crashing in tcp_collapse()
      
      Root cause is that sk_filter() can truncate the incoming skb,
      but TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of TCP header could be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMarco Grassi <marco.gra@gmail.com>
      Reported-by: NVladis Dronov <vdronov@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac6e7800
    • S
      ipv4: use new_gw for redirect neigh lookup · 969447f2
      Stephen Suryaputra Lin 提交于
      In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
      and then the state of the neigh for the new_gw is checked. If the state
      isn't valid then the redirected route is deleted. This behavior is
      maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
      is assigned to peer->redirect_learned.a4 before calling
      ipv4_neigh_lookup().
      
      After commit 5943634f ("ipv4: Maintain redirect and PMTU info in
      struct rtable again."), ipv4_neigh_lookup() is performed without the
      rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
      isn't zero, the function uses it as the key. The neigh is most likely
      valid since the old_gw is the one that sends the ICMP redirect message.
      Then the new_gw is assigned to fib_nh_exception. The problem is: the
      new_gw ARP may never gets resolved and the traffic is blackholed.
      
      So, use the new_gw for neigh lookup.
      
      Changes from v1:
       - use __ipv4_neigh_lookup instead (per Eric Dumazet).
      
      Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
      Signed-off-by: NStephen Suryaputra Lin <ssurya@ieee.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      969447f2
  5. 13 11月, 2016 10 次提交
  6. 11 11月, 2016 4 次提交
  7. 10 11月, 2016 13 次提交
新手
引导
客服 返回
顶部