1. 28 Mar, 2018 2 commits
  2. 27 Mar, 2018 15 commits
  3. 26 Mar, 2018 4 commits
    • K
      net: Drop NETDEV_UNREGISTER_FINAL · 070f2d7e
      Kirill Tkhai committed
      The last user is gone after bdf5bd7f "rds: tcp: remove
      register_netdevice_notifier infrastructure.", so we can
      remove this netdevice command. This allows deleting
      rtnl_lock() in netdev_run_todo(), which is a hot path for
      net namespace unregistration.
      
      dev_change_net_namespace() and netdev_wait_allrefs()
      have rcu_barrier() before the NETDEV_UNREGISTER_FINAL call,
      and the source commits say they were introduced to
      delimit the call from NETDEV_UNREGISTER, but this patch
      leaves them in place, since additional analysis is required
      to determine whether we need them for something else.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • K
      net: Make NETDEV_XXX commands enum { } · ede2762d
      Kirill Tkhai committed
      This patch is preparation for dropping NETDEV_UNREGISTER_FINAL.
      Since the cmd is used in usnic_ib_netdev_event_to_string()
      to get the cmd name, simply removing NETDEV_UNREGISTER_FINAL
      from everywhere would leave holes in event2str[] in this
      function.
      
      Instead, let's make the NETDEV_XXX command names
      available to everyone, and define netdev_cmd_to_name()
      in such a way that we won't have to shuffle names after their
      numbers are changed.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
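The idea of a name lookup that survives renumbering can be sketched with a switch instead of an array indexed by value. This is a minimal illustration, not the kernel's code: the `demo_` enum, its values, and the function name are hypothetical, and the real command list is much longer.

```c
#include <stddef.h>

/* Hypothetical, shortened command list with a deliberate numbering gap. */
enum demo_netdev_cmd {
	DEMO_NETDEV_UP = 1,
	DEMO_NETDEV_DOWN,
	DEMO_NETDEV_REGISTER = 5,
	DEMO_NETDEV_UNREGISTER,
};

/*
 * Map a command to its name through a switch. Unlike a string table
 * indexed by the numeric value, removing or renumbering a command
 * leaves no hole that has to be padded or shifted.
 */
static const char *demo_netdev_cmd_to_name(enum demo_netdev_cmd cmd)
{
#define N(val) case DEMO_NETDEV_##val: return "NETDEV_" #val;
	switch (cmd) {
	N(UP)
	N(DOWN)
	N(REGISTER)
	N(UNREGISTER)
	}
#undef N
	return "UNKNOWN_NETDEV_EVENT";
}
```

Any value that has no case label, such as 3 in this sketch, falls through to the "unknown" string rather than reading an empty table slot.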
    • K
      tipc: tipc_disc_addr_trial_msg() can be static · da18ab32
      kbuild test robot committed
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      net: permit skb_segment on head_frag frag_list skb · 13acc94e
      Yonghong Song committed
      One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
      function skb_segment(), line 3667. The bpf program attaches to
      clsact ingress, calls bpf_skb_change_proto to change protocol
      from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
      to send the changed packet out.
      
      3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
      3473                             netdev_features_t features)
      3474 {
      3475         struct sk_buff *segs = NULL;
      3476         struct sk_buff *tail = NULL;
      ...
      3665                 while (pos < offset + len) {
      3666                         if (i >= nfrags) {
      3667                                 BUG_ON(skb_headlen(list_skb));
      3668
      3669                                 i = 0;
      3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
      3671                                 frag = skb_shinfo(list_skb)->frags;
      3672                                 frag_skb = list_skb;
      ...
      
      call stack:
      ...
       #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
       #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
       #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
       #4 [ffff883ffef03668] die at ffffffff8101deb2
       #5 [ffff883ffef03698] do_trap at ffffffff8101a700
       #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
       #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
       #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
          [exception RIP: skb_segment+3044]
          RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
          RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
          RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
          RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
          R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
          R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 24 Mar, 2018 19 commits
    • D
      net/sched: act_vlan: declare push_vid with host byte order · 94cb5492
      Davide Caratti committed
      Use u16 in place of __be16 to suppress the following sparse warnings:
      
       net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types)
       net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid
       net/sched/act_vlan.c:150:26: got unsigned short
       net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer
       net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types)
       net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid
       net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/sched: remove tcf_idr_cleanup() · affaa0c7
      Davide Caratti committed
      tcf_idr_cleanup() is no longer used, so remove it.
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      tipc: obtain node identity from interface by default · 52dfae5c
      Jon Maloy committed
      Selecting and explicitly configuring a TIPC node identity may be
      unwanted in some cases.
      
      In this commit we introduce a default setting if the identity has not
      been set at the moment the first bearer is enabled. We do this by
      using a raw copy of a unique identifier from the used interface: MAC
      address in the case of an L2 bearer, IPv4/IPv6 address in the case
      of a UDP bearer.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
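Deriving a default identity from a "raw copy" of an interface identifier could look roughly like the sketch below. The function name, the 16-byte identity size constant, and the zero padding are assumptions made for illustration; the commit message does not state how shorter identifiers are padded.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NODE_ID_LEN 16	/* 128-bit identity; padding policy is assumed */

/*
 * Copy a raw interface identifier (6-byte MAC, 4-byte IPv4 address or
 * 16-byte IPv6 address) into a zero-padded identity buffer.
 */
static void demo_id_from_raw(uint8_t id[DEMO_NODE_ID_LEN],
			     const void *raw, size_t len)
{
	memset(id, 0, DEMO_NODE_ID_LEN);
	if (len > DEMO_NODE_ID_LEN)
		len = DEMO_NODE_ID_LEN;	/* never overrun the identity buffer */
	memcpy(id, raw, len);
}
```

With a MAC address as input, the first six bytes of the identity hold the address and the remaining ten stay zero.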
    • J
      tipc: handle collisions of 32-bit node address hash values · 25b0b9c4
      Jon Maloy committed
      When a 32-bit node address is generated from a 128-bit identifier,
      there is a risk of collisions which must be discovered and handled.
      
      We do this as follows:
      - We don't apply the generated address immediately to the node, but do
        instead initiate a 1 sec trial period to allow other cluster members
        to discover and handle such collisions.
      
      - During the trial period the node periodically sends out a new type
        of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
        to all the other nodes in the cluster.
      
      - When a node receives such a message, it must check that the
        presented 32-bit identifier either is unused, or was used by the very
        same peer in a previous session. In both cases it accepts the request
        by not responding to it.
      
      - If it finds that the same node has been up before using a different
        address, it responds with a DSC_TRIAL_FAIL_MSG containing that
        address.
      
      - If it finds that the address has already been taken by some other
        node, it generates a new, unused address and returns it to the
        requester.
      
      - During the trial period the requesting node must always be prepared
        to accept a failure message, i.e., a message where a peer suggests an
        address different from (or equal to) the one tried. In those cases it
        must apply the suggested value as trial address and restart the trial
        period.
      
      This algorithm ensures that in the vast majority of cases a node will
      have the same address before and after a reboot. If a legacy user
      configures the address explicitly, there will be no trial period and
      messages, so this protocol addition is completely backwards compatible.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
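The receiver side of the trial rules above can be sketched as a single decision function. Everything here is hypothetical scaffolding: the `peer` table, the function names, and in particular the linear probe used to pick a replacement address are assumptions, since the commit message does not say how a new unused address is generated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_LEN 16	/* 128-bit node identity */

struct peer {		/* hypothetical record of a previously seen peer */
	uint8_t  id[ID_LEN];
	uint32_t addr;
};

static bool addr_in_use(const struct peer *tbl, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == addr)
			return true;
	return false;
}

/*
 * Returns true when the trial is accepted silently (no response).
 * Returns false and fills *suggest when a failure message carrying
 * a suggested address should be sent back.
 */
static bool trial_accept(const struct peer *tbl, size_t n,
			 const uint8_t id[ID_LEN], uint32_t addr,
			 uint32_t *suggest)
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(tbl[i].id, id, ID_LEN) == 0) {
			if (tbl[i].addr == addr)
				return true;	/* same peer, same address as before */
			*suggest = tbl[i].addr;	/* same peer was up with another address */
			return false;
		}
	}
	if (!addr_in_use(tbl, n, addr))
		return true;			/* address is unused */

	/* Taken by a different node: probe for a free one (assumed policy). */
	do {
		addr++;
	} while (addr == 0 || addr_in_use(tbl, n, addr));
	*suggest = addr;
	return false;
}
```

On the requesting side, a returned suggestion would become the new trial address and the trial period would restart, as the last bullet describes.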
    • J
      tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy committed
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on that value.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
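To make the "hash value generated from the much longer node identity" concrete, here is one simple way to reduce 128 bits to 32. The XOR fold and the function name are purely illustrative assumptions; the commit message does not describe the hash actually used.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fold a 128-bit identity into 32 bits by XOR-ing its four 32-bit words.
 * Illustrative only: any such reduction maps many identities onto the
 * same 32-bit value, which is why collisions have to be handled.
 */
static uint32_t demo_id_to_hash(const uint8_t id[16])
{
	uint32_t hash = 0;

	for (int i = 0; i < 4; i++) {
		uint32_t word;

		memcpy(&word, id + 4 * i, sizeof(word));	/* safe for unaligned input */
		hash ^= word;
	}
	return hash;
}
```

Two different identities whose words cancel out, for example one that repeats the same word twice and the all-zero identity, produce the same 32-bit value under this fold.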
    • J
      tipc: remove direct accesses to own_addr field in struct tipc_net · 23fd3eac
      Jon Maloy committed
      As a preparation to changing the addressing structure of TIPC we replace
      all direct accesses to the tipc_net::own_addr field with the function
      dedicated for this, tipc_own_addr().
      
      There are no changes to program logic in this commit.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11
      Jon Maloy committed
      The removal of an internal structure of the node address has an unwanted
      side effect.
      - Currently, if a user is sending an anycast message with destination
        domain 0, the tipc_nametbl_translate() function will use the
        'closest-first' algorithm to first look for a node local destination,
        and only when none is found will it resort to the cluster global
        'round-robin' lookup algorithm.
      - Current users can get around this, and enforce unconditional use of
        global round-robin by indicating a destination as Z.0.0 or Z.C.0.
      - This option disappears when we make the node address flat, since the
        lookup algorithm has no way of recognizing this case. So, as long as
        there are node local destinations, the algorithm will always select
        one of those, and there is nothing the sender can do to change this.
      
      We solve this by eliminating the 'closest-first' option, which was never
      a good idea anyway, for non-legacy users, but only for those. To
      distinguish between legacy users and non-legacy users we introduce a new
      flag 'legacy_addr_format' in struct tipc_core, to be set when the user
      configures a legacy-style Z.C.N node address. Hence, when a legacy user
      indicates a zero lookup domain 'closest-first' is selected, and in all
      other cases we use 'round-robin'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
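The selection rule in the last paragraph reduces to a two-input decision. The sketch below uses hypothetical names (`lookup_mode`, `pick_lookup_mode`); only the `legacy_addr_format` flag comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

enum lookup_mode { LOOKUP_ROUND_ROBIN, LOOKUP_CLOSEST_FIRST };

/*
 * 'closest-first' is kept only for legacy users that pass a zero
 * lookup domain; every other combination uses 'round-robin'.
 */
static enum lookup_mode pick_lookup_mode(bool legacy_addr_format,
					 uint32_t domain)
{
	if (legacy_addr_format && domain == 0)
		return LOOKUP_CLOSEST_FIRST;
	return LOOKUP_ROUND_ROBIN;
}
```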
    • J
      tipc: remove restrictions on node address values · 20263641
      Jon Maloy committed
      Nominally, TIPC organizes network nodes into a three-level network
      hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
      hierarchy is reflected in the node address format: it is sub-divided
      into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
      
      However, the 'zone' and 'cluster' levels have in reality never been
      fully implemented, and never will be. The result of this has been
      that the first 20 bits of the node identity structure have been wasted,
      and the usable node identity range within a cluster has been limited
      to 12 bits. This is starting to become a problem.
      
      In the following commits, we will need to be able to connect between
      nodes which are using the whole 32-bit value space of the node address.
      We therefore remove the restrictions on which values can be assigned
      to node identity: it is from now on only a 32-bit integer with no
      assumed internal structure.
      
      Isolation between clusters is now achieved only by setting different
      values for the 'network id' field used during neighbor discovery, in
      practice leading to the latter becoming the new cluster identity.
      
      The rules for accepting discovery requests/responses from neighboring
      nodes now become:
      
      - If the user is using legacy address format on both peers, reception
        of discovery messages is subject to the legacy lookup domain check
        in addition to the cluster id check.
      
      - Otherwise, the discovery request/response is always accepted, provided
        both peers have the same network id.
      
      This secures backwards compatibility for users who have been using zone
      or cluster identities as cluster separators, instead of the intended
      'network id'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
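The legacy 8/12/12-bit layout and the new acceptance rules can be illustrated as follows. The helper names are hypothetical, and the three boolean inputs of `accept_discovery()` stand in for checks whose details the commit message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy Z.C.N layout: 8-bit zone, 12-bit cluster, 12-bit node. */
static inline uint32_t legacy_zone(uint32_t addr)    { return addr >> 24; }
static inline uint32_t legacy_cluster(uint32_t addr) { return (addr >> 12) & 0xfff; }
static inline uint32_t legacy_node(uint32_t addr)    { return addr & 0xfff; }

static inline uint32_t legacy_addr(uint32_t z, uint32_t c, uint32_t n)
{
	return (z << 24) | ((c & 0xfff) << 12) | (n & 0xfff);
}

/*
 * Acceptance rule for a discovery message: the network id must always
 * match; the legacy lookup domain check applies only when both peers
 * use the legacy address format.
 */
static bool accept_discovery(bool both_legacy, bool same_net_id,
			     bool in_lookup_domain)
{
	if (!same_net_id)
		return false;
	return both_legacy ? in_lookup_domain : true;
}
```

Under the old layout only the low 12 bits distinguish nodes within a cluster, so at most 4096 values were available; with a flat address the full 32-bit range is usable.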
    • J
      tipc: some cleanups in the file discover.c · b39e465e
      Jon Maloy committed
      To facilitate the coming changes in the neighbor discovery functionality
      we do some renaming and refactoring of that code. The functional changes
      in this commit are trivial, e.g., we move the message sending call in
      tipc_disc_timeout() outside the spinlock protected region.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      tipc: refactor function tipc_enable_bearer() · cb30a633
      Jon Maloy committed
      As a preparation for the next commits we try to reduce the footprint of
      the function tipc_enable_bearer(), while hopefully making it simpler to
      follow.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • K
      net: Convert rxrpc_net_ops · b2864fbd
      Kirill Tkhai committed
      These pernet_operations modify rxrpc_net_id-pointed
      per-net entities. There is an external link to AF_RXRPC
      in fs/afs/Kconfig, but it seems there are no other
      pernet_operations interested in those per-net entities.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • K
      net: Convert udp_sysctl_ops · fc18999e
      Kirill Tkhai committed
      These pernet_operations just initialize udp4 defaults.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      net: bridge: fix direct access to bridge vlan_enabled and use helper · 82792a07
      Nikolay Aleksandrov committed
      We need to use the br_vlan_enabled() helper, otherwise we'll break builds
      without bridge vlans:
      net/bridge//br_if.c: In function ‘br_mtu’:
      net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
      member named ‘vlan_enabled’
        if (br->vlan_enabled)
              ^
      net/bridge//br_if.c:462:1: warning: control reaches end of non-void
      function [-Wreturn-type]
       }
       ^
      scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
      failed
      
      Fixes: 419d14af ("bridge: Allow max MTU when multiple VLANs present")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      tls: RX path for ktls · c46234eb
      Dave Watson committed
      Add rx path for tls software implementation.
      
      recvmsg, splice_read, and poll implemented.
      
      An additional sockopt TLS_RX is added, with the same interface as
      TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
      together (with two different setsockopt calls with appropriate keys).
      
      Control messages are passed via CMSG in a similar way to transmit.
      If no cmsg buffer is passed, then only application data records
      will be passed to userspace, and EIO is returned for other types of
      alerts.
      
      EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
      framing too big, and EBADMSG for framing too small (matching openssl
      semantics). EINVAL is returned for TLS versions that do not match the
      original setsockopt call.  All are unrecoverable.
      
      strparser is used to parse TLS framing.  Decryption is done directly
      into userspace buffers if they are large enough to support it, otherwise
      sk_cow_data is called (similar to ipsec), and buffers are decrypted in
      place and copied.  splice_read always decrypts in place, since no
      buffers are provided to decrypt into.
      
      sk_poll is overridden, and only returns POLLIN if a full TLS message is
      received.  Otherwise we wait for strparser to finish reading a full frame.
      Actual decryption is only done during recvmsg or splice_read calls.
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
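The framing errors listed above can be illustrated with a small header check. This is a sketch under stated assumptions, not the kernel's parser: the `DEMO_` constants, the 24-byte per-record overhead (8-byte explicit nonce plus 16-byte tag, as for AES-GCM in TLS 1.2), and the function name are all chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TLS_HEADER_SIZE 5		/* type(1) + version(2) + length(2) */
#define DEMO_TLS_MAX_PAYLOAD 16384	/* 2^14 bytes of plaintext per record */
#define DEMO_TLS_OVERHEAD    24		/* assumed: 8-byte nonce + 16-byte tag */

/*
 * Validate a TLS record header. Returns the full record size on
 * success, or a negative errno matching the cases named above:
 * -EMSGSIZE for framing too big, -EBADMSG for framing too small,
 * -EINVAL for a version that differs from the configured one.
 */
static int demo_check_record(const uint8_t hdr[DEMO_TLS_HEADER_SIZE],
			     uint16_t expected_version)
{
	uint16_t version = (uint16_t)((hdr[1] << 8) | hdr[2]);
	size_t len = ((size_t)hdr[3] << 8) | hdr[4];

	if (len > DEMO_TLS_MAX_PAYLOAD + DEMO_TLS_OVERHEAD)
		return -EMSGSIZE;
	if (len < DEMO_TLS_OVERHEAD)
		return -EBADMSG;
	if (version != expected_version)
		return -EINVAL;
	return (int)(len + DEMO_TLS_HEADER_SIZE);
}
```

A caller would treat any negative return as unrecoverable for the connection, in line with the "All are unrecoverable" note.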
    • D
      tls: Refactor variable names · 58371585
      Dave Watson committed
      Several config variables are prefixed with tx; drop the prefix
      since these will be used for both tx and rx.
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      tls: Pass error code explicitly to tls_err_abort · f4a8e43f
      Dave Watson committed
      Pass EBADMSG explicitly to tls_err_abort.  Receive path will
      pass additional codes - EMSGSIZE if framing is larger than max
      TLS record size, EINVAL if TLS version mismatch.
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      tls: Move cipher info to a separate struct · dbe42559
      Dave Watson committed
      Move tx crypto parameters into a separate cipher_context struct.
      The same parameters will be used for rx using the same struct.
      
      tls_advance_record_sn is modified to only take the cipher info.
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      tls: Generalize zerocopy_from_iter · 69ca9293
      Dave Watson committed
      Refactor zerocopy_from_iter to take arguments for pages and size,
      such that it can be used for both tx and rx. RX will also support
      zerocopy direct to output iter, as long as the full message can
      be copied at once (a large enough userspace buffer was provided).
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      bridge: Allow max MTU when multiple VLANs present · 419d14af
      Chas Williams committed
      If the bridge is allowing multiple VLANs, some VLANs may have
      different MTUs.  Instead of choosing the minimum MTU for the
      bridge interface, choose the maximum MTU of the bridge members.
      With this the user only needs to set a larger MTU on the member
      ports that are participating in the large MTU VLANs.
      Signed-off-by: Chas Williams <3chas3@gmail.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
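The MTU choice described in this commit can be sketched as a min-or-max scan over the member ports. The function name, the array-based port list, and the 1500-byte fallback for a bridge without ports are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ETH_DATA_LEN 1500	/* assumed default when there are no ports */

/*
 * Pick the bridge MTU from its member ports: the maximum when VLAN
 * filtering is enabled, the minimum otherwise.
 */
static unsigned int demo_bridge_mtu(const unsigned int *port_mtu, size_t n,
				    bool vlan_enabled)
{
	if (n == 0)
		return DEMO_ETH_DATA_LEN;

	unsigned int mtu = port_mtu[0];

	for (size_t i = 1; i < n; i++) {
		bool better = vlan_enabled ? port_mtu[i] > mtu
					   : port_mtu[i] < mtu;
		if (better)
			mtu = port_mtu[i];
	}
	return mtu;
}
```

With ports at 1500, 9000 and 1400 bytes, the VLAN-enabled case yields 9000 while the plain case keeps the conservative 1400.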