1. 24 3月, 2018 13 次提交
    • J
      tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11
      Jon Maloy 提交于
      The removal of an internal structure of the node address has an unwanted
      side effect.
      - Currently, if a user is sending an anycast message with destination
        domain 0, the tipc_namebl_translate() function will use the 'closest-
        first' algorithm to first look for a node local destination, and only
        when no such is found, will it resort to the cluster global 'round-
        robin' lookup algorithm.
      - Current users can get around this, and enforce unconditional use of
        global round-robin by indicating a destination as Z.0.0 or Z.C.0.
      - This option disappears when we make the node address flat, since the
        lookup algorithm has no way of recognizing this case. So, as long as
        there are node local destinations, the algorithm will always select
        one of those, and there is nothing the sender can do to change this.
      
      We solve this by eliminating the 'closest-first' option, which was never
      a good idea anyway, for non-legacy users, but only for those. To
      distinguish between legacy users and non-legacy users we introduce a new
      flag 'legacy_addr_format' in struct tipc_core, to be set when the user
      configures a legacy-style Z.C.N node address. Hence, when a legacy user
      indicates a zero lookup domain 'closest-first' is selected, and in all
      other cases we use 'round-robin'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b89afb11
    • J
      tipc: remove restrictions on node address values · 20263641
      Jon Maloy 提交于
      Nominally, TIPC organizes network nodes into a three-level network
      hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
      hierarchy is reflected in the node address format, - it is sub-divided
      into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id.
      
      However, the 'zone' and 'cluster' levels have in reality never been
      fully implemented,and never will be. The result of this has been
      that the first 20 bits the node identity structure have been wasted,
      and the usable node identity range within a cluster has been limited
      to 12 bits. This is starting to become a problem.
      
      In the following commits, we will need to be able to connect between
      nodes which are using the whole 32-bit value space of the node address.
      We therefore remove the restrictions on which values can be assigned
      to node identity, -it is from now on only a 32-bit integer with no
      assumed internal structure.
      
      Isolation between clusters is now achieved only by setting different
      values for the 'network id' field used during neighbor discovery, in
      practice leading to the latter becoming the new cluster identity.
      
      The rules for accepting discovery requests/responses from neighboring
      nodes now become:
      
      - If the user is using legacy address format on both peers, reception
        of discovery messages is subject to the legacy lookup domain check
        in addition to the cluster id check.
      
      - Otherwise, the discovery request/response is always accepted, provided
        both peers have the same network id.
      
      This secures backwards compatibility for users who have been using zone
      or cluster identities as cluster separators, instead of the intended
      'network id'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20263641
    • J
      tipc: some cleanups in the file discover.c · b39e465e
      Jon Maloy 提交于
      To facilitate the coming changes in the neighbor discovery functionality
      we make some renaming and refactoring of that code. The functional changes
      in this commit are trivial, e.g., that we move the message sending call in
      tipc_disc_timeout() outside the spinlock protected region.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b39e465e
    • J
      tipc: refactor function tipc_enable_bearer() · cb30a633
      Jon Maloy 提交于
      As a preparation for the next commits we try to reduce the footprint of
      the function tipc_enable_bearer(), while hopefully making is simpler to
      follow.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb30a633
    • K
      net: Convert rxrpc_net_ops · b2864fbd
      Kirill Tkhai 提交于
      These pernet_operations modifies rxrpc_net_id-pointed
      per-net entities. There is external link to AF_RXRPC
      in fs/afs/Kconfig, but it seems there is no other
      pernet_operations interested in that per-net entities.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2864fbd
    • K
      net: Convert udp_sysctl_ops · fc18999e
      Kirill Tkhai 提交于
      These pernet_operations just initialize udp4 defaults.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc18999e
    • N
      net: bridge: fix direct access to bridge vlan_enabled and use helper · 82792a07
      Nikolay Aleksandrov 提交于
      We need to use br_vlan_enabled() helper otherwise we'll break builds
      without bridge vlans:
      net/bridge//br_if.c: In function ‘br_mtu’:
      net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
      member named ‘vlan_enabled’
        if (br->vlan_enabled)
              ^
      net/bridge//br_if.c:462:1: warning: control reaches end of non-void
      function [-Wreturn-type]
       }
       ^
      scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
      failed
      
      Fixes: 419d14af ("bridge: Allow max MTU when multiple VLANs present")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82792a07
    • D
      tls: RX path for ktls · c46234eb
      Dave Watson 提交于
      Add rx path for tls software implementation.
      
      recvmsg, splice_read, and poll implemented.
      
      An additional sockopt TLS_RX is added, with the same interface as
      TLS_TX.  Either TLX_RX or TLX_TX may be provided separately, or
      together (with two different setsockopt calls with appropriate keys).
      
      Control messages are passed via CMSG in a similar way to transmit.
      If no cmsg buffer is passed, then only application data records
      will be passed to userspace, and EIO is returned for other types of
      alerts.
      
      EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
      framing too big, and EBADMSG for framing too small (matching openssl
      semantics). EINVAL is returned for TLS versions that do not match the
      original setsockopt call.  All are unrecoverable.
      
      strparser is used to parse TLS framing.   Decryption is done directly
      in to userspace buffers if they are large enough to support it, otherwise
      sk_cow_data is called (similar to ipsec), and buffers are decrypted in
      place and copied.  splice_read always decrypts in place, since no
      buffers are provided to decrypt in to.
      
      sk_poll is overridden, and only returns POLLIN if a full TLS message is
      received.  Otherwise we wait for strparser to finish reading a full frame.
      Actual decryption is only done during recvmsg or splice_read calls.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c46234eb
    • D
      tls: Refactor variable names · 58371585
      Dave Watson 提交于
      Several config variables are prefixed with tx, drop the prefix
      since these will be used for both tx and rx.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58371585
    • D
      tls: Pass error code explicitly to tls_err_abort · f4a8e43f
      Dave Watson 提交于
      Pass EBADMSG explicitly to tls_err_abort.  Receive path will
      pass additional codes - EMSGSIZE if framing is larger than max
      TLS record size, EINVAL if TLS version mismatch.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4a8e43f
    • D
      tls: Move cipher info to a separate struct · dbe42559
      Dave Watson 提交于
      Separate tx crypto parameters to a separate cipher_context struct.
      The same parameters will be used for rx using the same struct.
      
      tls_advance_record_sn is modified to only take the cipher info.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbe42559
    • D
      tls: Generalize zerocopy_from_iter · 69ca9293
      Dave Watson 提交于
      Refactor zerocopy_from_iter to take arguments for pages and size,
      such that it can be used for both tx and rx. RX will also support
      zerocopy direct to output iter, as long as the full message can
      be copied at once (a large enough userspace buffer was provided).
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69ca9293
    • C
      bridge: Allow max MTU when multiple VLANs present · 419d14af
      Chas Williams 提交于
      If the bridge is allowing multiple VLANs, some VLANs may have
      different MTUs.  Instead of choosing the minimum MTU for the
      bridge interface, choose the maximum MTU of the bridge members.
      With this the user only needs to set a larger MTU on the member
      ports that are participating in the large MTU VLANS.
      Signed-off-by: NChas Williams <3chas3@gmail.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      419d14af
  2. 23 3月, 2018 13 次提交
    • K
      net: Replace ip_ra_lock with per-net mutex · d9ff3049
      Kirill Tkhai 提交于
      Since ra_chain is per-net, we may use per-net mutexes
      to protect them in ip_ra_control(). This improves
      scalability.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9ff3049
    • K
      net: Make ip_ra_chain per struct net · 5796ef75
      Kirill Tkhai 提交于
      This is optimization, which makes ip_call_ra_chain()
      iterate less sockets to find the sockets it's looking for.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5796ef75
    • K
      net: Revert "ipv4: fix a deadlock in ip_ra_control" · 128aaa98
      Kirill Tkhai 提交于
      This reverts commit 1215e51e.
      Since raw_close() is used on every RAW socket destruction,
      the changes made by 1215e51e scale sadly. This clearly
      seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
      kwork spends a lot of time waiting for rtnl_lock() introduced
      by this commit.
      
      Previous patch moved IP_ROUTER_ALERT out of rtnl_lock(),
      so we revert this patch.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      128aaa98
    • K
      net: Move IP_ROUTER_ALERT out of lock_sock(sk) · 0526947f
      Kirill Tkhai 提交于
      ip_ra_control() does not need sk_lock. Who are the another
      users of ip_ra_chain? ip_mroute_setsockopt() doesn't take
      sk_lock, while parallel IP_ROUTER_ALERT syscalls are
      synchronized by ip_ra_lock. So, we may move this command
      out of sk_lock.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0526947f
    • K
      net: Revert "ipv4: get rid of ip_ra_lock" · 76d3e153
      Kirill Tkhai 提交于
      This reverts commit ba3f571d. The commit was made
      after 1215e51e "ipv4: fix a deadlock in ip_ra_control",
      and killed ip_ra_lock, which became useless after rtnl_lock()
      made used to destroy every raw ipv4 socket. This scales
      very bad, and next patch in series reverts 1215e51e.
      ip_ra_lock will be used again.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76d3e153
    • C
      gre: fix TUNNEL_SEQ bit check on sequence numbering · 15746394
      Colin Ian King 提交于
      The current logic of flags | TUNNEL_SEQ is always non-zero and hence
      sequence numbers are always incremented no matter the setting of the
      TUNNEL_SEQ bit.  Fix this by using & instead of |.
      
      Detected by CoverityScan, CID#1466039 ("Operands don't affect result")
      
      Fixes: 77a5196a ("gre: add sequence number for collect md mode.")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15746394
    • G
      tipc: step sk->sk_drops when rcv buffer is full · 872619d8
      GhantaKrishnamurthy MohanKrishna 提交于
      Currently when tipc is unable to queue a received message on a
      socket, the message is rejected back to the sender with error
      TIPC_ERR_OVERLOAD. However, the application on this socket
      has no knowledge about these discards.
      
      In this commit, we try to step the sk_drops counter when tipc
      is unable to queue a received message. Export sk_drops
      using tipc socket diagnostics.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NGhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      872619d8
    • G
      tipc: implement socket diagnostics for AF_TIPC · c30b70de
      GhantaKrishnamurthy MohanKrishna 提交于
      This commit adds socket diagnostics capability for AF_TIPC in netlink
      family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).
      
      The following are key design considerations:
      - config TIPC_DIAG has default y, like INET_DIAG.
      - only requests with flag NLM_F_DUMP is supported (dump all).
      - tipc_sock_diag_req message is introduced to send filter parameters.
      - the response attributes are of TLV, some nested.
      
      To avoid exposing data structures between diag and tipc modules and
      avoid code duplication, the following additions are required:
      - export tipc_nl_sk_walk function to reuse socket iterator.
      - export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
      - create a sock_diag response message in __tipc_add_sock_diag defined
        in diag.c and use the above exported tipc_sk_fill_sock_diag
        to fill response.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NGhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c30b70de
    • G
      tipc: modify socket iterator for sock_diag · dfde331e
      GhantaKrishnamurthy MohanKrishna 提交于
      The current socket iterator function tipc_nl_sk_dump, handles socket
      locks and calls __tipc_nl_add_sk for each socket.
      To reuse this logic in sock_diag implementation, we do minor
      modifications to make these functions generic as described below.
      
      In this commit, we add a two new functions __tipc_nl_sk_walk,
      __tipc_nl_add_sk_info and modify tipc_nl_sk_dump, __tipc_nl_add_sk
      accordingly.
      
      In __tipc_nl_sk_walk we:
      1. acquire and release socket locks
      2. for each socket, execute the specified callback function
      
      In __tipc_nl_add_sk we:
      - Move the netlink attribute insertion to __tipc_nl_add_sk_info.
      
      tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument.
      
      sock_diag will use these generic functions in a later commit.
      
      There is no functional change in this commit.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NGhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfde331e
    • D
      devlink: Remove top_hierarchy arg to devlink_resource_register · 14530746
      David Ahern 提交于
      top_hierarchy arg can be determined by comparing parent_resource_id to
      DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate
      argument.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14530746
    • D
      net/ipv6: Handle onlink flag with multipath routes · 68e2ffde
      David Ahern 提交于
      For multipath routes the ONLINK flag can be specified per nexthop in
      rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
      to consider the ONLINK setting coming from rtnh_flags. Each loop over
      nexthops the config for the sibling route is initialized to the global
      config and then per nexthop settings overlayed. The flag is 'or'ed into
      fib6_config to handle the ONLINK flag coming from either rtm_flags or
      rtnh_flags.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68e2ffde
    • D
      ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76
      David Lebrun 提交于
      When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
      source address of the outer IPv6 header, in case none was specified.
      Using skb->dev can lead to BUG() when it is in an inconsistent state.
      This patch uses the net_device attached to the skb's dst instead.
      
      [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
      [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
      [940807.815725] PGD 0 P4D 0
      [940807.847173] Oops: 0000 [#1] SMP PTI
      [940807.890073] Modules linked in:
      [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
      [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
      [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
      [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
      [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
      [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
      [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
      [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
      [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
      [940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
      [940808.939634] Call Trace:
      [940808.970041]  <IRQ>
      [940808.995250]  ? ip6t_do_table+0x265/0x640
      [940809.043341]  seg6_do_srh_encap+0x28f/0x300
      [940809.093516]  ? seg6_do_srh+0x1a0/0x210
      [940809.139528]  seg6_do_srh+0x1a0/0x210
      [940809.183462]  seg6_output+0x28/0x1e0
      [940809.226358]  lwtunnel_output+0x3f/0x70
      [940809.272370]  ip6_xmit+0x2b8/0x530
      [940809.313185]  ? ac6_proc_exit+0x20/0x20
      [940809.359197]  inet6_csk_xmit+0x7d/0xc0
      [940809.404173]  tcp_transmit_skb+0x548/0x9a0
      [940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
      [940809.506603]  ? ip6_default_advmss+0x40/0x40
      [940809.557824]  ? tcp_current_mss+0x24/0x90
      [940809.605925]  tcp_retransmit_skb+0xd/0x80
      [940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
      [940809.719797]  tcp_ack+0xa47/0x1110
      [940809.760612]  tcp_rcv_established+0x13c/0x570
      [940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
      [940809.858879]  tcp_v6_rcv+0xa5c/0xb10
      [940809.901770]  ? seg6_output+0xdd/0x1e0
      [940809.946745]  ip6_input_finish+0xbb/0x460
      [940809.994837]  ip6_input+0x74/0x80
      [940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
      [940810.081663]  ipv6_rcv+0x31c/0x4c0
      ...
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8936ef76
    • D
      ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca
      David Lebrun 提交于
      The seg6_build_state() function is called with RCU read lock held,
      so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
      
      [   92.770271] =============================
      [   92.770628] WARNING: suspicious RCU usage
      [   92.770921] 4.16.0-rc4+ #12 Not tainted
      [   92.771277] -----------------------------
      [   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [   92.772279]
      [   92.772279] other info that might help us debug this:
      [   92.772279]
      [   92.773067]
      [   92.773067] rcu_scheduler_active = 2, debug_locks = 1
      [   92.773514] 2 locks held by ip/2413:
      [   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
      [   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
      [   92.775065]
      [   92.775065] stack backtrace:
      [   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
      [   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
      [   92.776608] Call Trace:
      [   92.776852]  dump_stack+0x7d/0xbc
      [   92.777130]  __schedule+0x133/0xf00
      [   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
      [   92.777783]  ? __sched_text_start+0x8/0x8
      [   92.778073]  ? rcu_is_watching+0x19/0x30
      [   92.778383]  ? kernel_text_address+0x49/0x60
      [   92.778800]  ? __kernel_text_address+0x9/0x30
      [   92.779241]  ? unwind_get_return_address+0x29/0x40
      [   92.779727]  ? pcpu_alloc+0x102/0x8f0
      [   92.780101]  _cond_resched+0x23/0x50
      [   92.780459]  __mutex_lock+0xbd/0xad0
      [   92.780818]  ? pcpu_alloc+0x102/0x8f0
      [   92.781194]  ? seg6_build_state+0x11d/0x240
      [   92.781611]  ? save_stack+0x9b/0xb0
      [   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
      [   92.782480]  ? seg6_build_state+0x11d/0x240
      [   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
      [   92.783393]  ? ip6_route_info_create+0x687/0x1640
      [   92.783846]  ? ip6_route_add+0x74/0x110
      [   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      191f86ca
  3. 22 3月, 2018 11 次提交
    • S
      rds: tcp: remove register_netdevice_notifier infrastructure. · bdf5bd7f
      Sowmini Varadhan 提交于
      The netns deletion path does not need to wait for all net_devices
      to be unregistered before dismantling rds_tcp state for the netns
      (we are able to dismantle this state on module unload even when
      all net_devices are active so there is no dependency here).
      
      This patch removes code related to netdevice notifiers and
      refactors all the code needed to dismantle rds_tcp state
      into a ->exit callback for the pernet_operations used with
      register_pernet_device().
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdf5bd7f
    • K
      net: Convert nf_ct_net_ops · aa65f636
      Kirill Tkhai 提交于
      These pernet_operations register and unregister sysctl.
      Also, there is inet_frags_exit_net() called in exit method,
      which has to be safe after a5600024 "net: Fix hlist
      corruptions in inet_evict_bucket()".
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa65f636
    • K
      net: Convert lowpan_frags_ops · 08012631
      Kirill Tkhai 提交于
      These pernet_operations register and unregister sysctl.
      Also, there is inet_frags_exit_net() called in exit method,
      which has to be safe after a5600024 "net: Fix hlist
      corruptions in inet_evict_bucket()".
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08012631
    • K
      net: Convert can_pernet_ops · 1ae77627
      Kirill Tkhai 提交于
      These pernet_operations create and destroy /proc entries
      and cancel per-net timer.
      
      Also, there are unneed iterations over empty list of net
      devices, since all net devices must be already moved
      to init_net or unregistered by default_device_ops. This
      already was mentioned here:
      
      https://marc.info/?l=linux-can&m=150169589119335&w=2
      
      So, it looks safe to make them async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ae77627
    • D
      net/sched: fix idr leak in the error path of tcf_skbmod_init() · f29cdfbe
      Davide Caratti 提交于
      tcf_skbmod_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure skbmod rules
      using the same idr value will systematically fail with -ENOSPC, unless
      the first attempt was done using the 'replace' keyword:
      
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
      on the error path when the idr has been reserved, but not yet inserted.
      Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
      implicitly become a 'delete' that leaks refcount in act_skbmod module:
      
       # rmmod act_skbmod; modprobe act_skbmod
       # tc action add action skbmod swap mac index 100
       # tc action add action skbmod swap mac continue index 100
       RTNETLINK answers: File exists
       We have an error talking to the kernel
       # tc action replace action skbmod swap mac continue index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action skbmod
       #
       # rmmod  act_skbmod
       rmmod: ERROR: Module act_skbmod is in use
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f29cdfbe
    • D
      net/sched: fix idr leak in the error path of tcf_vlan_init() · d7f20015
      Davide Caratti 提交于
      tcf_vlan_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure vlan rules using
      the same idr value will systematically fail with -ENOSPC, unless the first
      attempt was done using the 'replace' keyword.
      
       # tc action add action vlan pop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
      the error path when the idr has been reserved, but not yet inserted. Also,
      don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
      become a 'delete' that leaks refcount in act_vlan module:
      
       # rmmod act_vlan; modprobe act_vlan
       # tc action add action vlan push id 5 index 100
       # tc action replace action vlan push id 7 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action vlan
       #
       # rmmod act_vlan
       rmmod: ERROR: Module act_vlan is in use
      
      Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7f20015
    • D
      net/sched: fix idr leak in the error path of __tcf_ipt_init() · 1e46ef17
      Davide Caratti 提交于
      __tcf_ipt_init() can fail after the idr has been successfully reserved.
      When this happens, subsequent attempts to configure xt/ipt rules using
      the same idr value systematically fail with -ENOSPC:
      
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
      when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
      to avoid NULL pointer dereference.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e46ef17
    • D
      net/sched: fix idr leak in the error path of tcp_pedit_init() · 94fa3f92
      Davide Caratti 提交于
      tcf_pedit_init() can fail to allocate 'keys' after the idr has been
      successfully reserved. When this happens, subsequent attempts to configure
      a pedit rule using the same idr value systematically fail with -ENOSPC:
      
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_act_pedit_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94fa3f92
    • D
      net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818
      Davide Caratti 提交于
      tcf_act_police_init() can fail after the idr has been successfully
      reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
      subsequent attempts to configure a police rule using the same idr value
      systematiclly fail with -ENOSPC:
      
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       ...
      
      Fix this in the error path of tcf_act_police_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bf7f818
    • D
      net/sched: fix idr leak in the error path of tcf_simp_init() · 60e10b3a
      Davide Caratti 提交于
      if the kernel fails to duplicate 'sdata', creation of a new action fails
      with -ENOMEM. However, subsequent attempts to install the same action
      using the same value of 'index' systematically fail with -ENOSPC, and
      that value of 'index' will no more be usable by act_simple, until rmmod /
      insmod of act_simple.ko is done:
      
       # tc actions add action simple sdata hello index 100
       # tc actions list action simple
      
              action order 0: Simple <hello>
               index 100 ref 1 bind 0
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60e10b3a
    • D
      net/sched: fix idr leak on the error path of tcf_bpf_init() · bbc09e78
      Davide Caratti 提交于
      when the following command sequence is entered
      
       # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: Invalid argument
       We have an error talking to the kernel
       # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      act_bpf correctly refuses to install the first TC rule, because 31 is not
      a valid instruction. However, it refuses to install the second TC rule,
      even if the BPF code is correct. Furthermore, it's no more possible to
      install any other rule having the same value of 'index' until act_bpf
      module is unloaded/inserted again. After the idr has been reserved, call
      tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbc09e78
  4. 21 3月, 2018 2 次提交
  5. 20 3月, 2018 1 次提交