1. 24 Oct, 2015 20 commits
    • D
      Merge branch 'tipc-next' · 687f079a
      David S. Miller committed
      Jon Maloy says:
      
      ====================
      tipc: improve broadcast implementation
      
      The TIPC broadcast link implementation is currently complex and hard to
      follow. It also incurs some amount of code and structure duplication,
      something that can be reduced significantly with a little effort.
      
      This commit series introduces a number of improvements which address
      the locking structure, the code/structure duplication issue, and
      the overall readability of the code.
      
      The series consists of three main parts:
      
      1-7: Adaptation to the new link structure, and preparation for the next
           step. In particular, we want the broadcast transmission link to
           have a life cycle that is longer than any of its potential (unicast
           and broadcast receive links) users. This eliminates the need to
           always test for the presence of this link before accessing it.
      
      8-10: This is what is really new in this series. Commit #9 is by far
            the largest and most important one, because it moves most of
            the broadcast functionality into link.c, partially reusing the
            fields and functionality of the unicast link. The removal of
            the "node_map" infrastructure in commit #10 is also an important
            achievement.
      
      11-16: Some improvements leveraging the changes made in the previous
             commits.
      
      The series needs commit 53387c4e ("tipc: extend broadcast link window size")
      and commit e5356794 ("tipc: conditionally expand buffer headroom over udp tunnel")
      which are both present in 'net' but not yet in 'net-next', to apply cleanly.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      687f079a
    • J
      tipc: clean up unused code and structures · 2af5ae37
      Jon Paul Maloy committed
      After the previous changes in this series, we can now remove some
      unused code and structures in the broadcast, link aggregation
      and link code.
      
      There are no functional changes in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2af5ae37
    • J
      tipc: ensure binding table initial distribution is sent via first link · c49a0a84
      Jon Paul Maloy committed
      Correct synchronization of the broadcast link at first contact between
      two nodes is dependent on the assumption that the binding table "bulk"
      update passes via the same link as the initial broadcast synchronization
      message, i.e., via the first link that is established.
      
      This is not guaranteed in the current implementation. If two links
      come up very close to each other in time, the "bulk" may quite well
      pass via the second link, and hence void the guarantee of a correct
      initial synchronization before the broadcast link is opened.
      
      This commit makes two small changes to strengthen this guarantee.
      
      1) We let the second established link occupy slot 1 of the
         "active_links" array, while the first link will retain slot 0.
         (This is in reality a cosmetic change, we could just as well keep
          the current, opposite order)
      
      2) We let the name distributor always use link selector/slot 0 when
         it sends its binding table updates.
      
      The extra traffic bias on the first link caused by this change should
      be negligible, since binding table updates constitute a very small
      fraction of the total traffic.
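
The two changes above can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct node_sketch`, `node_link_up()`, `node_select_link()` and `named_select_link()` are hypothetical names, and the rule that a lone link serves both slots is an assumption made for the example.

```c
#include <assert.h>

#define INVALID_BEARER_ID (-1)

/* Hypothetical, simplified stand-in for the per-node link aggregation. */
struct node_sketch {
	int active_links[2];	/* bearer ids; slot 0 holds the first link */
};

void node_init(struct node_sketch *n)
{
	n->active_links[0] = INVALID_BEARER_ID;
	n->active_links[1] = INVALID_BEARER_ID;
}

/* Called when the link on bearer_id comes up. */
void node_link_up(struct node_sketch *n, int bearer_id)
{
	assert(bearer_id != INVALID_BEARER_ID);
	if (n->active_links[0] == INVALID_BEARER_ID) {
		/* First link: assumed to serve both slots until a second arrives. */
		n->active_links[0] = bearer_id;
		n->active_links[1] = bearer_id;
		return;
	}
	/* Second link takes slot 1 only; the first link retains slot 0. */
	n->active_links[1] = bearer_id;
}

/* Ordinary traffic is spread over the two slots by selector parity. */
int node_select_link(const struct node_sketch *n, unsigned int selector)
{
	return n->active_links[selector & 1];
}

/* Name distribution always uses selector 0, i.e. the first link. */
int named_select_link(const struct node_sketch *n)
{
	return node_select_link(n, 0);
}
```

With this layout, a binding table update sent right after a second link comes up still travels over the link that carried the initial broadcast synchronization message.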
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c49a0a84
    • J
      tipc: eliminate link's reference to owner node · c72fa872
      Jon Paul Maloy committed
      With the recent commit series, we have established a one-way dependency
      between the link aggregation (struct tipc_node) instances and their
      pertaining tipc_link instances. This has enabled quite significant code
      and structure simplifications.
      
      In this commit, we eliminate the field 'owner', which points to an
      instance of struct tipc_node, from struct tipc_link, and replace it with
      a pointer to struct net, which is the only external reference now needed
      by a link instance.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c72fa872
    • J
      tipc: eliminate redundant buffer cloning at transmission · 7214bcf8
      Jon Paul Maloy committed
      Since all packet transmitters (link, bcast, discovery) are now sending
      consumable buffer clones to the bearer layer, we can remove the
      redundant buffer cloning that is performed in the lower level functions
      tipc_l2_send_msg() and tipc_udp_send_msg().
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7214bcf8
    • J
      tipc: let neighbor discoverer tranmsit consumable buffers · 60852d67
      Jon Paul Maloy committed
      The neighbor discovery function currently uses the function
      tipc_bearer_send() for transmitting packets, assuming that the
      sent buffers are not consumed by the called function.
      
      We want to change this, in order to avoid unnecessary buffer cloning
      elsewhere in the code.
      
      This commit introduces a new function tipc_bearer_skb() which consumes
      the sent buffers, and lets the discoverer functions use this new call
      instead. The discoverer now performs the cloning itself when
      that is necessary.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60852d67
    • J
      tipc: introduce jumbo frame support for broadcast · 959e1781
      Jon Paul Maloy committed
      Until now, we have only been supporting a fixed MTU size of 1500 bytes
      for all broadcast media, irrespective of their actual capability.
      
      We now make the broadcast MTU adaptable to the carrying media, i.e.,
      we use the smallest MTU supported by any of the interfaces attached
      to TIPC.
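
As a rough illustration of the rule above, the broadcast MTU can be computed as the minimum over the attached bearers. The function name `bcast_mtu()` and the fallback to 1500 bytes when no bearer is attached are assumptions for this sketch, not the kernel's behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fallback: the old fixed value, used when no bearer is attached. */
#define FALLBACK_BCAST_MTU 1500u

/* Return the smallest MTU among the n attached bearers. */
unsigned int bcast_mtu(const unsigned int *bearer_mtu, size_t n)
{
	unsigned int mtu;
	size_t i;

	if (n == 0)
		return FALLBACK_BCAST_MTU;
	assert(bearer_mtu != NULL);
	mtu = bearer_mtu[0];
	for (i = 1; i < n; i++) {
		if (bearer_mtu[i] < mtu)
			mtu = bearer_mtu[i];
	}
	return mtu;
}
```

Two jumbo-capable interfaces would therefore yield a jumbo broadcast MTU, while a single 1500-byte interface pulls the broadcast MTU back down to 1500.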
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      959e1781
    • J
      tipc: simplify bearer level broadcast · b06b281e
      Jon Paul Maloy committed
      Until now, we have been keeping track of the exact set of broadcast
      destinations through the helper structure tipc_node_map. This forces us
      to maintain a whole infrastructure for supporting this, including
      a pseudo-bearer and a number of functions to manipulate both the bearers
      and the node map correctly. Apart from the complexity, this approach is
      also limiting, as struct tipc_node_map can only support cluster local
      broadcast if we want to avoid it becoming excessively large. We want to
      eliminate this limitation, in order to enable introduction of scoped
      multicast in the future.
      
      A closer analysis reveals that maintaining this "full set" overview is
      unnecessary; it is sufficient to keep a counter per bearer, indicating
      how many nodes can be reached via this bearer at the moment. The protocol
      is now robust enough to handle transitional discrepancies between the
      nominal number of reachable destinations, as expected by the broadcast
      protocol itself, and the number which is actually reachable at the
      moment. The initial broadcast synchronization, in conjunction with the
      retransmission mechanism, ensures that all packets will eventually be
      acknowledged by the correct set of destinations.
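
A minimal sketch of the per-bearer counter idea follows. All names here (`struct bc_base_sketch`, `bc_inc_dest()`, `bc_dec_dest()`, `bc_send()`) and the value of `MAX_BEARERS` are hypothetical; the sketch only shows that a count per bearer is enough to decide where a broadcast has to be sent.

```c
#include <assert.h>

#define MAX_BEARERS 3	/* value assumed for this sketch */

/* Hypothetical broadcast base: one reachable-peer counter per bearer. */
struct bc_base_sketch {
	int dests[MAX_BEARERS];
};

/* A peer became reachable via this bearer. */
void bc_inc_dest(struct bc_base_sketch *bb, int bearer_id)
{
	assert(bearer_id >= 0 && bearer_id < MAX_BEARERS);
	bb->dests[bearer_id]++;
}

/* A peer is no longer reachable via this bearer. */
void bc_dec_dest(struct bc_base_sketch *bb, int bearer_id)
{
	assert(bearer_id >= 0 && bearer_id < MAX_BEARERS);
	assert(bb->dests[bearer_id] > 0);
	bb->dests[bearer_id]--;
}

/*
 * Record the bearers a broadcast must go out on: every bearer with at
 * least one reachable peer. Returns how many bearers were selected.
 */
int bc_send(const struct bc_base_sketch *bb, int sent_on[MAX_BEARERS])
{
	int i, count = 0;

	for (i = 0; i < MAX_BEARERS; i++) {
		if (bb->dests[i] > 0)
			sent_on[count++] = i;
	}
	return count;
}
```

The sender no longer knows which nodes sit behind a bearer, only that some do; as the message explains, synchronization and retransmission are what cover the transitional discrepancies.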
      
      This commit introduces these changes.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b06b281e
    • J
      tipc: let broadcast packet reception use new link receive function · 52666986
      Jon Paul Maloy committed
      The code path for receiving broadcast packets is currently distinct
      from the unicast path. This leads to unnecessary code and data
      duplication, something that can be avoided with some effort.
      
      We now introduce separate per-peer tipc_link instances for handling
      broadcast packet reception. Each receive link keeps a pointer to the
      common, single, broadcast link instance, and can hence handle release
      and retransmission of send buffers as if they belonged to its own
      instance.
      
      Furthermore, we let each unicast link instance keep a reference to both
      the pertaining broadcast receive link, and to the common send link.
      This makes it possible for the unicast links to easily access data for
      broadcast link synchronization, as well as for carrying acknowledgements
      for received broadcast packets.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52666986
    • J
      tipc: introduce capability bit for broadcast synchronization · fd556f20
      Jon Paul Maloy committed
      Until now, we have tried to support both the newer, dedicated broadcast
      synchronization mechanism along with the older, less safe, RESET_MSG/
      ACTIVATE_MSG based one. The latter method has turned out to be a hazard
      in a highly dynamic cluster, so we find it safer to disable it completely
      when we find that the former mechanism is supported by the peer node.
      
      For this purpose, we now introduce a new capability bit,
      TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
      synchronization is supported by the present node. The new bit is conveyed
      between peers in the 'capabilities' field of neighbor discovery messages.
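
The capability check described above amounts to a bit test on the peer's advertised capabilities. In this sketch the bit position chosen for `TIPC_BCAST_SYNCH` and the 16-bit width of the field are assumptions, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position is an assumption for this sketch. */
#define TIPC_BCAST_SYNCH (1u << 1)

/* True if the peer advertised the dedicated synchronization mechanism. */
bool peer_supports_bcast_synch(uint16_t peer_caps)
{
	return (peer_caps & TIPC_BCAST_SYNCH) != 0;
}

/*
 * The older RESET_MSG/ACTIVATE_MSG based synchronization is only used
 * when the peer does not advertise the new capability.
 */
bool use_legacy_bcast_synch(uint16_t peer_caps)
{
	return !peer_supports_bcast_synch(peer_caps);
}
```

A node would make this decision once it has learned the peer's capabilities from a discovery message, so that both mechanisms are never active toward the same peer.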
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd556f20
    • J
      tipc: let broadcast transmission use new link transmit function · 2f566124
      Jon Paul Maloy committed
      This commit simplifies the broadcast link transmission function, by
      leveraging previous changes to the link transmission function and the
      broadcast transmission link life cycle.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f566124
    • J
      tipc: make struct tipc_link generic to support broadcast · c1ab3f1d
      Jon Paul Maloy committed
      Realizing that unicast is just a special case of broadcast, we also see
      that we can go in the other direction, i.e., that modest changes to the
      current unicast link can make it generic enough to support broadcast.
      
      The following changes are introduced here:
      
      - A new counter ("ackers") in struct tipc_link, to indicate how many
        peers need to ack a packet before it can be released.
      - A corresponding counter in the skb user area, to keep track of how
        many peers are left to ack before a buffer can be released.
      - A new counter ("acked"), to keep persistent track of how far a peer
        has acked at the moment, i.e., where in the transmission queue to
        start updating buffers when the next ack arrives. This is to avoid
        double acknowledgements from a peer, with inadvertent release of
        packets as a result.
      - A more generic tipc_link_retrans() function, where retransmit starts
        from a given sequence number, instead of the first packet in the
        transmission queue. This is to minimize the number of retransmitted
        packets on the broadcast media.
      
      When the new functionality is taken into use in the next commits,
      we expect it to have minimal effect on unicast mode performance.
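
The interplay of the "ackers" and "acked" counters listed above can be sketched in userspace as follows. The types and functions are hypothetical, the queue is a fixed array rather than an skb list, and sequence number wraparound is deliberately ignored.

```c
#include <assert.h>

#define QLEN 8

/* One queued packet; 'ackers' counts the peers that still have to ack it. */
struct pkt_sketch {
	unsigned int seqno;
	int ackers;
	int released;
};

/* Hypothetical send link: a transmission queue ordered by seqno. */
struct bc_link_sketch {
	struct pkt_sketch q[QLEN];
	int len;
	int ackers;		/* peers that must ack every packet */
};

/* Per-peer state: the highest sequence number this peer has acked. */
struct peer_sketch {
	unsigned int acked;
};

void bc_enqueue(struct bc_link_sketch *l, unsigned int seqno)
{
	assert(l->len < QLEN);
	l->q[l->len].seqno = seqno;
	l->q[l->len].ackers = l->ackers;
	l->q[l->len].released = 0;
	l->len++;
}

/*
 * Peer p acknowledges everything up to and including 'acked'.
 * Packets the peer has acked before are skipped, so a repeated ack
 * cannot release a packet early. Returns the number of packets
 * released by this call.
 */
int bc_ack(struct bc_link_sketch *l, struct peer_sketch *p, unsigned int acked)
{
	int i, released = 0;

	for (i = 0; i < l->len; i++) {
		struct pkt_sketch *k = &l->q[i];

		if (k->seqno <= p->acked)
			continue;	/* this peer already acked it */
		if (k->seqno > acked)
			break;		/* beyond what is acked now */
		if (--k->ackers == 0) {
			k->released = 1;
			released++;
		}
	}
	if (acked > p->acked)
		p->acked = acked;
	return released;
}
```

With `ackers` set to 1, this degenerates to ordinary unicast acknowledgement, which is the sense in which unicast is a special case of broadcast.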
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1ab3f1d
    • J
      tipc: use explicit allocation of broadcast send link · 32301906
      Jon Paul Maloy committed
      The broadcast link instance (struct tipc_link) used for sending is
      currently aggregated into struct tipc_bclink. This means that we cannot
      use the regular tipc_link_create() function for initiating the link, but
      do instead have to initiate numerous fields directly from the
      bcast_init() function.
      
      We want to reduce dependencies between the broadcast functionality
      and the inner workings of tipc_link. In this commit, we introduce
      a new function tipc_bclink_create() to link.c, and allocate the
      instance of the link separately using this function.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      32301906
    • J
      tipc: make link implementation independent from struct tipc_bearer · 0e05498e
      Jon Paul Maloy committed
      In reality, the link implementation is already independent from
      struct tipc_bearer, in that it doesn't store any reference to it.
      However, we still pass on a pointer to a bearer instance in the
      function tipc_link_create(), just to have it extract some
      initialization information from it.
      
      In later commits, we need to create instances of tipc_link without
      having any associated struct tipc_bearer. To facilitate this, we
      want to extract the initialization data already in the creator
      function in node.c, before calling tipc_link_create(), and pass
      this info on as individual parameters in the call.
      
      This commit introduces this change.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e05498e
    • J
      tipc: create broadcast transmission link at namespace init · 5fd9fd63
      Jon Paul Maloy committed
      The broadcast transmission link is currently instantiated when the
      network subsystem is started, i.e., on order from user space via netlink.
      
      This forces the broadcast transmission code to do unnecessary tests for
      the existence of the transmission link, in single node mode as well as
      in network mode.
      
      In this commit, we do instead create the link during initialization of
      the name space, and remove it when it is stopped. The fact that the
      transmission link now has a guaranteed longer life cycle than any of its
      potential clients paves the way for further code simplifications
      and optimizations.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fd9fd63
    • J
      tipc: move broadcast link lock to struct tipc_net · 0043550b
      Jon Paul Maloy committed
      The broadcast lock will need to be acquired outside bcast.c in a later
      commit. For this reason, we move the lock to struct tipc_net. Consistent
      with the changes in the previous commit, we also introduce two new
      functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
      currently using tipc_bclink_lock()/unlock() will be phased out during
      the coming commits in this series.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0043550b
    • J
      tipc: move bcast definitions to bcast.c · 6beb19a6
      Jon Paul Maloy committed
      Currently, a number of structure and function definitions related
      to the broadcast functionality are unnecessarily exposed in the file
      bcast.h. This obscures the fact that the external interface towards
      the broadcast link in fact is very narrow, and causes unnecessary
      recompilations of other files when anything changes in those
      definitions.
      
      In this commit, we move as many of those definitions as is currently
      possible to the file bcast.c.
      
      We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
      since the name does not correctly describe the contents of this
      struct, and will do so even less in the future, and because we want
      to use the term 'link' more appropriately in the functionality
      introduced later in this series.
      
      Finally, we rename a couple of functions, such as tipc_bclink_xmit()
      and others that will be kept in the future, to include the term 'bcast'
      instead.
      
      There are no functional changes in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6beb19a6
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ba3e2084
      David S. Miller committed
      Conflicts:
      	net/ipv6/xfrm6_output.c
      	net/openvswitch/flow_netlink.c
      	net/openvswitch/vport-gre.c
      	net/openvswitch/vport-vxlan.c
      	net/openvswitch/vport.c
      	net/openvswitch/vport.h
      
      The openvswitch conflicts were overlapping changes.  One was
      the egress tunnel info fix in 'net' and the other was the
      vport ->send() op simplification in 'net-next'.
      
      The xfrm6_output.c conflict was also a simplification
      overlapping a bug fix.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba3e2084
    • D
      Merge branch 'for-upstream' of... · a72c9512
      David S. Miller committed
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2015-10-22
      
      Here's probably the last bluetooth-next pull request for 4.4. Among
      several other changes it contains the rest of the fixes & cleanups from
      the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).
      
       - Refactoring & cleanups to 6lowpan code
       - New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
       - Fix (quirk) for broken Broadcom BCM2045 controllers
       - Support for latest Apple controllers
       - Improvements to the vendor diagnostic message support
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a72c9512
    • M
      bnxt_en: Fix compile errors when CONFIG_BNXT_SRIOV is not set. · 379a80a1
      Michael Chan committed
      struct bnxt_pf_info needs to be always defined.  Move bnxt_update_vf_mac()
      to bnxt_sriov.c and add some missing #ifdef CONFIG_BNXT_SRIOV.
      Reported-by: Jim Hull <jim.hull@hpe.com>
      Tested-by: Jim Hull <jim.hull@hpe.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      379a80a1
  2. 23 Oct, 2015 20 commits
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · bf795860
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-10-23
      
      This series contains updates to i40e, i40evf, if_link, ixgbe and ixgbevf.
      
      Anjali adds a workaround to drop any flow control frames from being
      transmitted from any VSI, so that a malicious VF cannot send flow control
      or PFC packets out on the wire.  Also fixed a bug in debugfs by grabbing
      the filter list lock before adding or deleting a filter.
      
      Akeem fixes an issue where we were unconditionally returning VEB bridge
      mode before allowing LB in the add VSI routine, resolved by checking if
      the bridge is actually in VEB mode first.
      
      Mitch fixed an issue where the incorrect structure was being used for
      VLAN filter list, which meant the VLAN filter list did not get
      processed correctly and VLAN filters would not be re-enabled after any
      kind of reset.
      
      Helin fixed a problem of possibly getting inconsistent flow control
      status after a PF reset.  The issue was requested_mode was being set
      with a default value during probe, but the hardware state could be a
      different value from this mode.
      
      Carolyn fixed a problem where the driver output of the OEM version
      string varied from the other tools.
      
      Jean Sacren fixes up kernel documentation by fixing function header
      comments to match actual variables used in the functions.  Also
      cleaned up variable initialization, when the variable would be
      over-written immediately.
      
      Hiroshi Shimamoto provides three patches to add "trusted" VF by adding
      netlink directives and an NDO entry.  Then implement these new controls
      in ixgbe and ixgbevf.  This series has gone through several iterations
      to address all the suggested community changes and concerns.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf795860
    • D
      Merge branch 'mpls_multipath' · e74f5105
      David S. Miller committed
      Roopa Prabhu says:
      
      ====================
      mpls: multipath support
      
      This patch adds support for MPLS multipath routes.
      
      Includes following changes to support multipath:
      - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.
      
      - struct mpls_nh represents a mpls nexthop label forwarding entry
      
      - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
      multipath routes similar to ipv4/v6 fib
      
      - In the process of restructuring, this patch also consistently changes all
      labels to u8
      
      $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
      		nexthop as 700 via inet 10.1.1.6 dev swp2 \
      		nexthop as 800 via inet 40.1.1.2 dev swp3
      
      $ip  -f mpls route show
      100
      	nexthop as to 200 via inet 10.1.1.2  dev swp1
      	nexthop as to 700 via inet 10.1.1.6  dev swp2
      	nexthop as to 800 via inet 40.1.1.2  dev swp3
      ====================
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e74f5105
    • R
      mpls: flow-based multipath selection · 1c78efa8
      Robert Shearman committed
      Change the selection of a multipath route to use a flow-based
      hash. This is more suitable for traffic sensitive to reordering within a
      flow (e.g. TCP, L2VPN) while still allowing a good distribution
      of traffic given enough flows.
      
      Selection of the path for a multipath route is done using a hash of:
      1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
         including entropy label, whichever is first.
      2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
         payload, if present.
      
      Naturally, a 5-tuple hash using L4 information in addition would be
      possible and be better in some scenarios, but there is a tradeoff
      between looking deeper into the packet to achieve good distribution,
      and packet forwarding performance, and I have erred on the side of the
      latter as the default.
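
The selection scheme described above can be sketched as below. This is only an illustration: it swaps in an FNV-1a hash rather than whatever hash the patch uses, handles IPv4-sized addresses only, assumes a value for `MAX_MP_SELECT_LABELS`, and expects the caller to have already truncated the label stack at the entropy label. `struct flow_key`, `flow_hash()` and `select_nexthop()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MP_SELECT_LABELS 4	/* value assumed for this sketch */

/* Fields that feed the hash; all of them are constant within one flow. */
struct flow_key {
	uint32_t labels[MAX_MP_SELECT_LABELS];
	size_t nlabels;		/* labels actually present */
	int has_ip;		/* payload carries an IP header */
	uint32_t src, dst;	/* L3 addresses (IPv4 only here) */
	uint8_t proto;		/* L4 protocol number */
};

/* Mix one 32-bit value into an FNV-1a hash, byte by byte. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/* Hash the label stack, then the 3-tuple if an IP header is present. */
uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */
	size_t i;

	assert(k->nlabels <= MAX_MP_SELECT_LABELS);
	for (i = 0; i < k->nlabels; i++)
		h = fnv1a_u32(h, k->labels[i]);
	if (k->has_ip) {
		h = fnv1a_u32(h, k->src);
		h = fnv1a_u32(h, k->dst);
		h = fnv1a_u32(h, k->proto);
	}
	return h;
}

/* Map the flow hash onto one of num_nh nexthops. */
size_t select_nexthop(const struct flow_key *k, size_t num_nh)
{
	assert(num_nh > 0);
	return flow_hash(k) % num_nh;
}
```

Because every input to the hash is constant for a given flow, all packets of that flow map to the same nexthop, which is what avoids reordering within the flow.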
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c78efa8
    • R
      mpls: multipath route support · f8efb73c
      Roopa Prabhu committed
      This patch adds support for MPLS multipath routes.
      
      Includes following changes to support multipath:
      - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
      
      - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
      
      - moves mpls route and nexthop structures into internal.h
      
      - A mpls_route can point to multiple mpls_nh structs
      
      - the nexthops are maintained as an array (similar to ipv4 fib)
      
      - In the process of restructuring, this patch also consistently changes
        all labels to u8
      
      - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
      multipath routes similar to ipv4/v6 fib
      
      - In this patch, the multipath route nexthop selection algorithm
      simply returns the first nexthop. It is replaced by a
      hash based algorithm from Robert Shearman in the next patch
      
      - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
      Although mpls_route_update was implemented to update based on dev, it was
      never used that way, and the dev handling gets tricky with multiple
      nexthops, since we cannot match against any single nexthop's dev. So, this
      patch removes the unused 'dev' handling in mpls_route_update.
      
      - dead route/path handling will be implemented in a subsequent patch
      
      Example:
      
      $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
                      nexthop as 700 via inet 10.1.1.6 dev swp2 \
                      nexthop as 800 via inet 40.1.1.2 dev swp3
      
      $ip  -f mpls route show
      100
              nexthop as to 200 via inet 10.1.1.2  dev swp1
              nexthop as to 700 via inet 10.1.1.6  dev swp2
              nexthop as to 800 via inet 40.1.1.2  dev swp3
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8efb73c
    • L
      net: sysctl: fix a kmemleak warning · ce9d9b8e
      Li RongQing committed
      The buffer returned by register_sysctl() is stored in the net_header
      variable, but net_header is not used afterwards, so the compiler may
      optimise the variable out, which leads kmemleak to report the warning below:
      
      	comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s)
      	hex dump (first 32 bytes):
      	90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
      	01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      	backtrace:
      	[<ffffffc00020f134>] create_object+0x10c/0x2a0
      	[<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0
      	[<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8
      	[<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0
      	[<ffffffc00028eef0>] register_sysctl+0x30/0x40
      	[<ffffffc00099c304>] net_sysctl_init+0x20/0x58
      	[<ffffffc000994dd8>] sock_init+0x10/0xb0
      	[<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8
      	[<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0
      	[<ffffffc00070ed6c>] kernel_init+0x1c/0xe8
      	[<ffffffc000083bfc>] ret_from_fork+0xc/0x50
      	[<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>>
      
      Before fix, the objdump result on ARM64:
      0000000000000000 <net_sysctl_init>:
         0:   a9be7bfd        stp     x29, x30, [sp,#-32]!
         4:   90000001        adrp    x1, 0 <net_sysctl_init>
         8:   90000000        adrp    x0, 0 <net_sysctl_init>
         c:   910003fd        mov     x29, sp
        10:   91000021        add     x1, x1, #0x0
        14:   91000000        add     x0, x0, #0x0
        18:   a90153f3        stp     x19, x20, [sp,#16]
        1c:   12800174        mov     w20, #0xfffffff4                // #-12
        20:   94000000        bl      0 <register_sysctl>
        24:   b4000120        cbz     x0, 48 <net_sysctl_init+0x48>
        28:   90000013        adrp    x19, 0 <net_sysctl_init>
        2c:   91000273        add     x19, x19, #0x0
        30:   9101a260        add     x0, x19, #0x68
        34:   94000000        bl      0 <register_pernet_subsys>
        38:   2a0003f4        mov     w20, w0
        3c:   35000060        cbnz    w0, 48 <net_sysctl_init+0x48>
        40:   aa1303e0        mov     x0, x19
        44:   94000000        bl      0 <register_sysctl_root>
        48:   2a1403e0        mov     w0, w20
        4c:   a94153f3        ldp     x19, x20, [sp,#16]
        50:   a8c27bfd        ldp     x29, x30, [sp],#32
        54:   d65f03c0        ret
      After:
      0000000000000000 <net_sysctl_init>:
         0:   a9bd7bfd        stp     x29, x30, [sp,#-48]!
         4:   90000000        adrp    x0, 0 <net_sysctl_init>
         8:   910003fd        mov     x29, sp
         c:   a90153f3        stp     x19, x20, [sp,#16]
        10:   90000013        adrp    x19, 0 <net_sysctl_init>
        14:   91000000        add     x0, x0, #0x0
        18:   91000273        add     x19, x19, #0x0
        1c:   f90013f5        str     x21, [sp,#32]
        20:   aa1303e1        mov     x1, x19
        24:   12800175        mov     w21, #0xfffffff4                // #-12
        28:   94000000        bl      0 <register_sysctl>
        2c:   f9002260        str     x0, [x19,#64]
        30:   b40001a0        cbz     x0, 64 <net_sysctl_init+0x64>
        34:   90000014        adrp    x20, 0 <net_sysctl_init>
        38:   91000294        add     x20, x20, #0x0
        3c:   9101a280        add     x0, x20, #0x68
        40:   94000000        bl      0 <register_pernet_subsys>
        44:   2a0003f5        mov     w21, w0
        48:   35000080        cbnz    w0, 58 <net_sysctl_init+0x58>
        4c:   aa1403e0        mov     x0, x20
        50:   94000000        bl      0 <register_sysctl_root>
        54:   14000004        b       64 <net_sysctl_init+0x64>
        58:   f9402260        ldr     x0, [x19,#64]
        5c:   94000000        bl      0 <unregister_sysctl_table>
        60:   f900227f        str     xzr, [x19,#64]
        64:   2a1503e0        mov     w0, w21
        68:   f94013f5        ldr     x21, [sp,#32]
        6c:   a94153f3        ldp     x19, x20, [sp,#16]
        70:   a8c37bfd        ldp     x29, x30, [sp],#48
        74:   d65f03c0        ret
      
      Add error handling that frees net_header, which removes the
      kmemleak warning.
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce9d9b8e
    • D
      Merge branch 'mdiobus_nested_read_write' · 654c9c54
      David S. Miller committed
      Neil Armstrong says:
      
      ====================
      Refactor nested mdiobus read/write functions
      
      In order to avoid lockdep false positives for nested mdiobus
      read/write calls, nested code was introduced in mv88e6xxx and
      mdio-mux.
      But mv88e6060 also needs such nested mdiobus read/write calls.
      For the sake of refactoring, introduce nested variants of mdiobus read/write
      and make mv88e6xxx and mv88e6060 use them.
      In a later patch, mdio-mux should also use these variant calls.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      654c9c54
    • N
      net: dsa: Make mv88e6060 use nested mdiobus read/write · f0505610
      Neil Armstrong committed
      Like mv88e6xxx and mdio-mux, to avoid lockdep false positives
      caused by nested MDIO buses, switch to the previously introduced
      nested mdiobus_read/write variants.
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0505610
    • N
      net: dsa: Make mv88e6xxx use nested mdiobus read/write · 6e899e6c
      Neil Armstrong committed
      Make the mv88e6xxx driver use the previously introduced nested
      variants of mdiobus_read/write functions.
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e899e6c
    • N
      net: phy: Add nested variants of mdiobus read/write · 21dd19fe
      Neil Armstrong committed
      Since nested variants of mdiobus_read/write are used in multiple
      drivers, add nested variants in the mdiobus core.
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21dd19fe
    • H
      ixgbe, ixgbevf: Add new mbox API xcast mode · 8443c1a4
      Hiroshi Shimamoto committed
      The limit on the number of multicast addresses per VF is too low
      for large-scale servers using SR-IOV. IPv6 requires a multicast
      MAC address for each IP address to handle Neighbor Solicitation
      messages, so we could not assign more than 30 IPv6 addresses to a
      single VF.
      
      This patch introduces a new mailbox API, IXGBE_VF_UPDATE_XCAST_MODE,
      to update the multicast mode of a VF. It adds three modes:
        - NONE     only L2 exact match addresses or Flow Director enabled
        - MULTI    BAM and ROMPE set
        - ALLMULTI BAM, ROMPE and MPE set
      
      If a guest VF user wants more than 30 multicast MAC addresses, it sets
      IFF_ALLMULTI to ask the PF to update the xcast mode and enable VF
      multicast promiscuous mode.
      
      On the other hand, enabling VF multicast promiscuous mode may affect
      security and performance on the NIC's network, so only a trusted VF can
      enable it. The behavior of an untrusted VF is unchanged from the
      previous version.
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      8443c1a4
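The mode selection described in this commit can be sketched as two small functions: the VF derives a requested mode from its interface flags, and the PF decides what to grant. All identifiers below are hypothetical and do not come from the driver. Downgrading an untrusted ALLMULTI request to MULTI is an assumption made for the sketch; the commit message only says that untrusted VFs behave as before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three modes listed in the commit. */
enum xcast_mode {
	XCAST_MODE_NONE,      /* only L2 exact-match addresses */
	XCAST_MODE_MULTI,     /* BAM and ROMPE set */
	XCAST_MODE_ALLMULTI   /* BAM, ROMPE and MPE set */
};

/* Illustrative flag bits; not the values of the real IFF_* flags. */
#define SKETCH_IFF_MULTICAST 0x1u
#define SKETCH_IFF_ALLMULTI  0x2u

/* VF side: pick the mode to request from the interface flags. */
static enum xcast_mode vf_requested_mode(unsigned int flags)
{
	if (flags & SKETCH_IFF_ALLMULTI)
		return XCAST_MODE_ALLMULTI;
	if (flags & SKETCH_IFF_MULTICAST)
		return XCAST_MODE_MULTI;
	return XCAST_MODE_NONE;
}

/* PF side: only a trusted VF may get multicast promiscuous mode.
 * Assumption: an untrusted request is downgraded, not rejected. */
static enum xcast_mode pf_granted_mode(enum xcast_mode requested,
				       bool vf_trusted)
{
	assert(requested <= XCAST_MODE_ALLMULTI);
	if (requested == XCAST_MODE_ALLMULTI && !vf_trusted)
		return XCAST_MODE_MULTI;
	return requested;
}
```

Keeping the trust check on the PF side means a guest cannot grant itself promiscuous mode by setting a flag; the host administrator has to mark the VF as trusted first.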
    • H
      ixgbe: Add new ndo to trust VF · 54011e4d
      Hiroshi Shimamoto committed
      Implement the new netdev op for trusting a VF in ixgbe.
      
      The administrator can turn VF trust on and off with an ip command
      that supports the trust setting:
       # ip link set dev eth0 vf 1 trust on
      or
       # ip link set dev eth0 vf 1 trust off
      
      When the trust status changes, send a ping to reset the VF.
      The VF driver reconfigures its features on reset.
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      54011e4d
    • G
      drivers: net: cpsw: use module_platform_driver · 6fb3b6b5
      Grygorii Strashko committed
      There is no reason to probe cpsw at late_initcall level,
      and doing so is not recommended. Hence, use module_platform_driver()
      to register and probe the cpsw driver at module_init() level.
      
      Cc: Tony Lindgren <tony@atomide.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fb3b6b5
    • H
      if_link: Add control trust VF · dd461d6a
      Hiroshi Shimamoto committed
      Add netlink directives and an ndo entry to trust a VF user.
      
      This controls a special permission for the VF user.
      The administrator explicitly trusts a VF user to use features
      that may impact security and/or performance.
      
      The administrator should never turn it on unless the VF user is
      fully trusted.
      
      CC: Sy Jong Choi <sy.jong.choi@intel.com>
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Acked-by: Greg Rose <gregory.v.rose@intel.com>
      Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      dd461d6a
    • E
      tcp/dccp: fix hashdance race for passive sessions · 5e0724d0
      Eric Dumazet committed
      Multiple cpus can process duplicates of incoming ACK messages
      matching a SYN_RECV request socket. This is a rare event under
      normal operations, but definitely can happen.
      
      Only one must win the race, otherwise corruption would occur.
      
      To fix this without adding new atomic ops, we use logic in
      inet_ehash_nolisten() to detect the request was present in the same
      ehash bucket where we try to insert the new child.
      
      If request socket was not found, we have to undo the child creation.
      
      This actually removes a spin_lock()/spin_unlock() pair in
      reqsk_queue_unlink() for the fast path.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e0724d0
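The "only one must win" rule in this commit can be illustrated with a single-threaded user-space sketch: the child is installed only if the request is still present in the bucket, and a second attempt for the same request fails so its caller can undo the child creation. The types and functions below are hypothetical, and this is not the code of inet_ehash_nolisten(). In the kernel the lookup and insert happen under the bucket lock, which the sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUCKET_SLOTS 8

/* Hypothetical stand-in for a request or child socket. */
struct sock_sketch { int id; };

/* One hash bucket; the real chain is guarded by a per-bucket lock. */
struct bucket_sketch {
	struct sock_sketch *slot[BUCKET_SLOTS];
};

static bool bucket_add(struct bucket_sketch *b, struct sock_sketch *sk)
{
	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		if (!b->slot[i]) {
			b->slot[i] = sk;
			return true;
		}
	}
	return false;
}

/* Install child in place of req, but only if req is still there.
 * Returns false when another path already consumed the request. */
static bool bucket_replace(struct bucket_sketch *b,
			   struct sock_sketch *req,
			   struct sock_sketch *child)
{
	assert(req && child);
	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		if (b->slot[i] == req) {
			b->slot[i] = child;
			return true;
		}
	}
	return false;
}
```

A caller that gets `false` back lost the race and must free the child it created, which corresponds to the "undo the child creation" step in the commit message.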
    • J
      i40e: fix unconditional execution of cpu_to_le16() · 2fc4cd52
      Jean Sacren committed
      Commit 3092e5e4cc79 ("i40e: add little endian conversion for
      checksum") fixed the checksum bug on big-endian architectures.
      
      However, cpu_to_le16() should not be executed unconditionally. Thus,
      execute cpu_to_le16() only conditionally.
      
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      2fc4cd52
    • J
      i40e: clean up local variable initialization · 0e5229c6
      Jean Sacren committed
      In both i40e_calc_nvm_checksum() and i40e_update_nvm_checksum(), the
      local variables designated by 'ret_code' are overwritten immediately. As
      such, they should merely be declared.
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      0e5229c6
    • J
      i40evf: clean up local variable initialization · ed17f7e5
      Jean Sacren committed
      In i40evf_msix_aq(), the first two rd32() calls serve mainly to clear
      the registers. If we initialize 'val' at this point, it will be
      overwritten immediately. We shall simply discard the return value here.
      
      When we initialize 'val', we might as well include the mask in one step.
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      ed17f7e5
    • J
      i40e: add missing kernel-doc argument · 554f4544
      Jean Sacren committed
      The following kernel-doc arguments for their respective functions are
      missing:
      
      1) @cd_type_cmd_tso_mss for i40e_tso();
      2) @cd_type_cmd_tso_mss for i40e_tsyn();
      3) @tx_ring for i40e_tx_enable_csum().
      
      Add them all for the kernel-doc requirement.
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      554f4544
    • J
      i40evf: add missing kernel-doc argument · 69c1d70a
      Jean Sacren committed
      @flush has been missing since the inception of i40evf_irq_enable(). Add
      it for the kernel-doc.
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      69c1d70a
    • A
      i40e: re-use %*ph specifier to hexdump a data · a3524e95
      Andy Shevchenko committed
      Instead of using a custom approach, change the code to use the %*ph
      format specifier.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      a3524e95