1. 01 Apr, 2018 10 commits
    • E
      inet: frags: change inet_frags_init_net() return value · 787bea77
      Committed by Eric Dumazet
      We will soon initialize one rhashtable per struct netns_frags
      in inet_frags_init_net().
      
      This patch changes the return value to eventually propagate an
      error.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      787bea77
    • W
      vlan: vlan_hw_filter_capable() can be static · eeb0a2a5
      Committed by Wei Yongjun
      Fixes the following sparse warning:
      
      net/8021q/vlan_core.c:168:6: warning:
       symbol 'vlan_hw_filter_capable' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eeb0a2a5
    • K
      net: Do not take net_rwsem in __rtnl_link_unregister() · 554873e5
      Committed by Kirill Tkhai
      This function calls call_netdevice_notifier(), which also
      may take net_rwsem. So, we can't use net_rwsem here.
      
      This patch makes callers of this function take pernet_ops_rwsem,
      like register_netdevice_notifier() does. This will protect
      the modifications of net_namespace_list, and allows notifiers
      to take it (they won't have to care about context).
      
      Since __rtnl_link_unregister() is used on module load
      and unload (which are not frequent operations), this looks
      better to me than making every call_netdevice_notifier()
      always execute in "protected net_namespace_list" context.
      
      Also, this fixes the problem we dealt with in 328fbe74
      "Close race between {un, }register_netdevice_notifier and ...",
      and guarantees __rtnl_link_unregister() does not skip
      an exiting net.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      554873e5
    • K
      net: Remove net_rwsem from {, un}register_netdevice_notifier() · fc1dd369
      Committed by Kirill Tkhai
      These functions take net_rwsem, while wireless_nlevent_flush()
      also takes it. But down_read() can't be taken recursively,
      because the rw_semaphore design prevents the semaphore from
      being held by readers alone forever.
      
      Since we take pernet_ops_rwsem in {,un}register_netdevice_notifier(),
      net list can't change, so these down_read()/up_read() can be removed.
      
      Fixes: f0b07bb1 "net: Introduce net_rwsem to protect net_namespace_list"
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc1dd369
    • J
      tipc: avoid possible string overflow · 7494cfa6
      Committed by Jon Maloy
      gcc points out that the combined length of the fixed-length inputs to
      l->name is larger than the destination buffer size:
      
      net/tipc/link.c: In function 'tipc_link_create':
      net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
      into a region of size between 26 and 58 [-Werror=format-overflow=]
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
      (assuming 75) into a destination of size 60
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      A detailed analysis reveals that the theoretical maximum length of
      a link name is:
      max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
      16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
      Since we also need space for a trailing zero we now set MAX_LINK_NAME
      to 68.
      
      Just to be on the safe side we also replace the sprintf() call with
      snprintf().
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7494cfa6
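The length arithmetic and the move to snprintf() described above can be sketched in userspace C. This is only an illustration, not the kernel code: the `sketch_*` names and the enum constants are hypothetical, chosen to mirror the figures quoted in the commit message (16, 15, and a 68-byte buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative limits taken from the arithmetic in the commit message.
 * These names are hypothetical, not the kernel's own macros. */
enum {
    SKETCH_MAX_ADDR_STR  = 16, /* longest self_str / peer_str */
    SKETCH_MAX_IF_NAME   = 15, /* longest if_name */
    SKETCH_MAX_LINK_NAME = 68  /* buffer size chosen by the patch */
};

/* Worst-case length of "<self>:<if>-<peer>:<if>", excluding the NUL. */
static int sketch_worst_case_len(void)
{
    return SKETCH_MAX_ADDR_STR + 1 + SKETCH_MAX_IF_NAME + 1 +
           SKETCH_MAX_ADDR_STR + 1 + SKETCH_MAX_IF_NAME;
}

/* Format a link name with a bounded write.
 * Returns 0 on success, -1 if the output did not fit and was truncated. */
static int sketch_format_link_name(char *dst, size_t size,
                                   const char *self_str,
                                   const char *if_name,
                                   const char *peer_str)
{
    int n;

    assert(dst != NULL && size > 0);
    n = snprintf(dst, size, "%s:%s-%s:unknown", self_str, if_name, peer_str);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Unlike sprintf(), snprintf() never writes past `size` bytes and reports the length it would have needed, so a caller can detect truncation instead of overflowing the buffer.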
    • J
      tipc: permit overlapping service ranges in name table · 37922ea4
      Committed by Jon Maloy
      With the new RB tree structure for service ranges it becomes possible to
      solve an old problem: we can now allow overlapping service ranges in
      the table.
      
      When inserting a new service range to the tree, we use 'lower' as primary
      key, and when necessary 'upper' as secondary key.
      
      Since there may now be multiple service ranges matching an indicated
      'lower' value, we must also add the 'upper' value to the functions
      used for removing publications, so that the correct, corresponding
      range item can be found.
      
      These changes guarantee that a well-formed publication/withdrawal item
      from a peer node never will be rejected, and make it possible to
      eliminate the problematic backlog functionality we currently have for
      handling such cases.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37922ea4
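The two-level key ordering described above ('lower' first, then 'upper' as a tie-breaker) can be illustrated with a small comparison function. This is a userspace sketch under assumed names (`struct sketch_range`, `sketch_range_cmp`); it does not reproduce the actual TIPC data structures or the kernel rbtree API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a service range item. */
struct sketch_range {
    uint32_t lower;
    uint32_t upper;
};

/* Order ranges by 'lower' first and by 'upper' only when 'lower' ties.
 * Returns <0, 0 or >0 like a conventional comparison callback. */
static int sketch_range_cmp(const struct sketch_range *a,
                            const struct sketch_range *b)
{
    assert(a != NULL && b != NULL);
    if (a->lower != b->lower)
        return a->lower < b->lower ? -1 : 1;
    if (a->upper != b->upper)
        return a->upper < b->upper ? -1 : 1;
    return 0;
}
```

With such an ordering, two overlapping ranges that share the same 'lower' value, for example {10, 20} and {10, 30}, still occupy distinct positions in the tree, which is why a removal has to supply 'upper' as well to find the right node.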
    • J
      tipc: refactor name table translate function · f20889f7
      Committed by Jon Maloy
      The tipc_nametbl_translate() function is ugly and hard to
      follow. This can be improved somewhat by introducing a stack variable
      for holding the publication list to be used and re-ordering the
      if-clauses for selection of algorithm.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f20889f7
    • J
      tipc: replace name table service range array with rb tree · 218527fe
      Committed by Jon Maloy
      The current design of the binding table uses an unnecessarily
      memory-consuming and complex data structure. It aggregates the service range
      items into an array, which is expanded by a factor two every time it
      becomes too small to hold a new item. Furthermore, the arrays never
      shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves both readability and
      volume of the code, as well as reducing memory consumption and hopefully
      improving cache hit rate.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      218527fe
    • N
      net: bridge: disable bridge MTU auto tuning if it was set manually · 804b854d
      Committed by Nikolay Aleksandrov
      As Roopa noted today the biggest source of problems when configuring
      bridge and ports is that the bridge MTU keeps changing automatically on
      port events (add/del/changemtu). That leads to inconsistent behaviour
      and network config software needs to chase the MTU and fix it on each
      such event. Let's improve on that situation and allow for the user to
      set any MTU within ETH_MIN/MAX limits, but once manually configured it
      is the user's responsibility to keep it correct afterwards.
      
      In case the MTU isn't manually set - the behaviour reverts to the
      previous and the bridge follows the minimum MTU.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      804b854d
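The policy described above (follow the minimum port MTU until the user sets one, then leave it alone) might be modelled roughly as follows. Everything here is a hypothetical userspace sketch: the struct, field and function names are invented for illustration, and the ETH_MIN/MAX range check mentioned in the commit message is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge's MTU state. */
struct sketch_bridge {
    unsigned int mtu;
    bool mtu_set_by_user; /* true once the MTU was configured manually */
};

/* Manual configuration: record the value and disable auto tuning. */
static void sketch_set_mtu(struct sketch_bridge *br, unsigned int mtu)
{
    assert(br != NULL);
    br->mtu = mtu;
    br->mtu_set_by_user = true;
}

/* Port event (add/del/changemtu): retune to the minimum port MTU,
 * but only while the user has not set the MTU manually. */
static void sketch_port_event(struct sketch_bridge *br,
                              const unsigned int *port_mtus, size_t nports)
{
    size_t i;

    assert(br != NULL);
    if (br->mtu_set_by_user || nports == 0)
        return;
    br->mtu = port_mtus[0];
    for (i = 1; i < nports; i++)
        if (port_mtus[i] < br->mtu)
            br->mtu = port_mtus[i];
}
```

In this model a port event moves the bridge MTU only as long as `mtu_set_by_user` is false; after a manual set, keeping the value correct is left to the user, as the commit message states.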
    • N
      net: bridge: set min MTU on port events and allow user to set max · f40aa233
      Committed by Nikolay Aleksandrov
      Recently the bridge was changed to automatically set maximum MTU on port
      events (add/del/changemtu) when vlan filtering is enabled, but that
      actually changes behaviour in a way which breaks some setups and can lead
      to packet drops. In order to still allow that maximum to be set while being
      compatible, we add the ability for the user to tune the bridge MTU up to
      the maximum when vlan filtering is enabled, but that has to be done
      explicitly and all port events (add/del/changemtu) lead to resetting that
      MTU to the minimum as before.
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f40aa233
  2. 31 Mar, 2018 13 commits
    • D
      rxrpc: Fix leak of rxrpc_peer objects · 17226f12
      Committed by David Howells
      When a new client call is requested, an rxrpc_conn_parameters struct object
      is passed in with a bunch of parameters set, such as the local endpoint to
      use.  A pointer to the target peer record is also placed in there by
      rxrpc_get_client_conn() - and this is removed if and only if a new
      connection object is allocated.  Thus it leaks if a new connection object
      isn't allocated.
      
      Fix this by putting any peer object attached to the rxrpc_conn_parameters
      object in the function that allocated it.
      
      Fixes: 19ffa01c ("rxrpc: Use structs to hold connection params and protocol info")
      Signed-off-by: David Howells <dhowells@redhat.com>
      17226f12
    • D
      rxrpc: Add a tracepoint to track rxrpc_peer refcounting · 1159d4b4
      Committed by David Howells
      Add a tracepoint to track reference counting on the rxrpc_peer struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      1159d4b4
    • D
      rxrpc: Fix apparent leak of rxrpc_local objects · 31f5f9a1
      Committed by David Howells
      rxrpc_local objects cannot be disposed of until all the connections that
      point to them have been RCU'd as a connection object holds refcount on the
      local endpoint it is communicating through.  Currently, this can cause an
      assertion failure to occur when a network namespace is destroyed as there's
      no check that the RCU destructors for the connections have been run before
      we start trying to destroy local endpoints.
      
      The kernel reports:
      
      	rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/local_object.c:439!
      
      Fix this by keeping a count of the live connections and waiting for it to
      go to zero at the end of rxrpc_destroy_all_connections().
      
      Fixes: dee46364 ("rxrpc: Add RCU destruction for connections and calls")
      Signed-off-by: David Howells <dhowells@redhat.com>
      31f5f9a1
    • D
      rxrpc: Add a tracepoint to track rxrpc_local refcounting · 09d2bf59
      Committed by David Howells
      Add a tracepoint to track reference counting on the rxrpc_local struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      09d2bf59
    • D
      rxrpc: Fix potential call vs socket/net destruction race · d3be4d24
      Committed by David Howells
      rxrpc_call structs don't pin sockets or network namespaces, but may attempt
      to access both after their refcount reaches 0 so that they can detach
      themselves from the network namespace.  However, there's no guarantee that
      the socket still exists at this point (so sock_net(&call->socket->sk) may
      be invalid) and the namespace may have gone away if the call isn't pinning
      a peer.
      
      Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
      waiting for all calls to be destroyed when the network namespace goes away.
      
      This was detected by checker:
      
      net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
      net/rxrpc/call_object.c:634:57:    expected struct sock const *sk
      net/rxrpc/call_object.c:634:57:    got struct sock [noderef] <asn:4>*<noident>
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
      d3be4d24
    • D
      rxrpc: Fix checker warnings and errors · 88f2a825
      Committed by David Howells
      Fix various issues detected by checker.
      
      Errors:
      
       (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
           call->socket.
      
      Warnings:
      
       (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
           trace_rxrpc_conn() as the where argument.
      
       (*) rxrpc_disconnect_client_call() should get its net pointer via the
           call->conn rather than call->sock to avoid a warning about accessing
           an RCU pointer without protection.
      
       (*) Proc seq start/stop functions need annotation as they pass locks
           between the functions.
      
      False positives:
      
       (*) Checker doesn't correctly handle seq-retry lock context balance in
           rxrpc_find_service_conn_rcu().
      
       (*) Checker thinks execution may proceed past the BUG() in
           rxrpc_publish_service_conn().
      
       (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
           rxkad.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
      88f2a825
    • S
      rxrpc: remove unused static variables · edb63e2b
      Committed by Sebastian Andrzej Siewior
      The rxrpc_security_methods and rxrpc_security_sem user has been removed
      in 648af7fc ("rxrpc: Absorb the rxkad security module"). This was
      noticed by kbuild test robot for the -RT tree but is also true for !RT.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      edb63e2b
    • M
      rxrpc: Fix resend event time calculation · 59299aa1
      Committed by Marc Dionne
      Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation
      for the next resend event.  For this calculation, "oldest" will be before
      "now", so ktime_sub(oldest, now) will yield a negative value.  When passed
      to nsecs_to_jiffies which expects an unsigned value, the end result will be
      a very large value, and a resend event scheduled far into the future.  This
      could cause calls to stall if some packets were lost.
      
      Fix by ordering the arguments to ktime_sub correctly.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      59299aa1
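The effect of the swapped arguments described above can be reproduced in plain C with fixed-width integers. The helpers below are hypothetical userspace stand-ins (they are not ktime_sub() or nsecs_to_jiffies() themselves); they only show how a negative signed difference turns into a huge value once it is handed to a function taking an unsigned argument.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a signed nanosecond timestamp difference: later - earlier. */
static int64_t sketch_time_sub(int64_t later, int64_t earlier)
{
    return later - earlier;
}

/* Stand-in for a conversion that accepts an unsigned nanosecond count.
 * A negative int64_t passed here is converted modulo 2^64 first. */
static uint64_t sketch_ns_to_ticks(uint64_t ns, uint64_t ns_per_tick)
{
    assert(ns_per_tick > 0);
    return ns / ns_per_tick;
}
```

For instance, with `now` one second after `oldest` and one tick per millisecond, `sketch_ns_to_ticks(sketch_time_sub(now, oldest), 1000000)` gives 1000 ticks, while the swapped call `sketch_time_sub(oldest, now)` yields -1000000000, which converts to a value close to 2^64 and therefore to a delay far in the future.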
    • D
      rxrpc: Don't treat call aborts as conn aborts · 57b0c9d4
      Committed by David Howells
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      57b0c9d4
    • D
      rxrpc: Fix Tx ring annotation after initial Tx failure · 03877bf6
      Committed by David Howells
      rxrpc calls have a ring of packets that are awaiting ACK or retransmission
      and a parallel ring of annotations that tracks the state of those packets.
      If the initial transmission of a packet on the underlying UDP socket fails
      then the packet annotation is marked for resend - but the setting of this
      mark accidentally erases the last-packet mark also stored in the same
      annotation slot.  If this happens, a call won't switch out of the Tx phase
      when all the packets have been transmitted.
      
      Fix this by retaining the last-packet mark and only altering the packet
      state.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      03877bf6
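The bug class described above, where writing a new state into a slot also wipes a flag stored in the same byte, can be shown with a small bit-mask sketch. The constants and the function name are hypothetical and the bit layout is assumed for illustration only; they are not the values used by rxrpc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one annotation byte: low bits hold the packet
 * state, a separate high bit marks the last packet of the Tx phase. */
#define SKETCH_ANNO_STATE_MASK 0x03u
#define SKETCH_ANNO_UNACK      0x00u
#define SKETCH_ANNO_RETRANS    0x02u
#define SKETCH_ANNO_LAST       0x40u

/* Mark a packet for resend while keeping every non-state bit intact.
 * The buggy variant would simply return SKETCH_ANNO_RETRANS and so
 * drop SKETCH_ANNO_LAST. */
static uint8_t sketch_mark_for_resend(uint8_t annotation)
{
    return (uint8_t)((annotation & ~SKETCH_ANNO_STATE_MASK) |
                     SKETCH_ANNO_RETRANS);
}
```

Masking out only the state bits before OR-ing in the new state is what "retaining the last-packet mark and only altering the packet state" amounts to in this model.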
    • D
      rxrpc: Fix a bit of time confusion · f82eb88b
      Committed by David Howells
      The rxrpc_reduce_call_timer() function should be passed the 'current time'
      in jiffies, not the current ktime time.  It's confusing in rxrpc_resend
      because that has to deal with both.  Pass the correct current time in.
      
      Note that this only affects the trace produced and not the functioning of
      the code.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: David Howells <dhowells@redhat.com>
      f82eb88b
    • D
      rxrpc: Fix firewall route keepalive · ace45bec
      Committed by David Howells
      Fix the firewall route keepalive part of AF_RXRPC, which is currently
      functioning incorrectly by replying to VERSION REPLY packets from the server
      with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ace45bec
    • L
      ipv6: do not set routes if disable_ipv6 has been enabled · 428604fb
      Committed by Lorenzo Bianconi
      Do not allow setting ipv6 routes from userspace if disable_ipv6 has been
      enabled. The issue can be triggered using the following reproducer:
      
      - sysctl net.ipv6.conf.all.disable_ipv6=1
      - ip -6 route add a:b:c:d::/64 dev em1
      - ip -6 route show
        a:b:c:d::/64 dev em1 metric 1024 pref medium
      
      Fix it by checking the disable_ipv6 value in the ip6_route_info_create() routine.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      428604fb
  3. 30 Mar, 2018 17 commits