1. 27 Jan, 2015 14 commits
  2. 26 Jan, 2015 26 commits
    • ipv6: tcp: fix race in IPV6_2292PKTOPTIONS · 1dc7b90f
      Eric Dumazet committed
      IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
      to charge the skb to socket.
      
      It means that destructor must be called while socket is locked.
      
      Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
      to protect ourselves: kfree_skb() might race with other users
      manipulating sk->sk_forward_alloc.

      Fix this race by holding the socket lock for the duration of
      ip6_datagram_recv_ctl().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: fix rht_for_each_entry_safe() endless loop · 607954b0
      Patrick McHardy committed
      "next" is not updated, causing an endless loop for buckets with more than
      one element.
      
      Fixes: 88d6ed15 ("rhashtable: Convert bucket iterators to take table and index")
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
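The bug class described in this commit can be illustrated outside the kernel. The following is a minimal sketch in plain C, assuming a hypothetical singly linked `struct node` and a hypothetical `for_each_safe` macro; neither is the kernel's rhashtable code. It shows why a deletion-safe iterator has to reload its "next" cursor on every step.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bucket entry; not the kernel's struct rhash_head. */
struct node {
    int value;
    struct node *next;
};

/*
 * Deletion-safe iteration: "nxt" is loaded before the body runs and is
 * reloaded in the step expression. If the reload in the step expression
 * is dropped, "nxt" stays on the second element and the walk never
 * reaches the end of a chain with more than one entry.
 */
#define for_each_safe(pos, nxt, head)                          \
    for ((pos) = (head), (nxt) = (pos) ? (pos)->next : NULL;   \
         (pos) != NULL;                                        \
         (pos) = (nxt), (nxt) = (pos) ? (pos)->next : NULL)

/* Prepend a node; returns the new head (or the old head if malloc fails). */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Free every node in the chain and return how many were visited. */
int free_all(struct node *head)
{
    struct node *pos, *nxt;
    int visited = 0;

    for_each_safe(pos, nxt, head) {
        free(pos);      /* safe: "nxt" was read before the free */
        visited++;
    }
    return visited;
}
```

Calling `free_all()` on a three-element chain built with `push()` visits each node exactly once, which is the behaviour the fix restores for multi-element buckets.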
    • net/fsl: Replace spin_event_timeout() with arch independent in xgmac_mdio · 22f6bba7
      Shaohui Xie committed
      spin_event_timeout() is PPC dependent, use an arch independent
      equivalent instead.
      Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
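The commit text does not show the replacement code, so the following is only a sketch of what an architecture-independent busy-wait with a bounded budget can look like. The function name `wait_until_clear`, its parameters, and the use of a plain memory word in place of a device register are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Poll *reg until all bits in "mask" are clear, giving up after
 * "max_polls" reads. Returns 0 on success and -ETIMEDOUT otherwise.
 * Hypothetical helper: a real driver would read the register through
 * an MMIO accessor and relax the CPU between polls.
 */
int wait_until_clear(const volatile uint32_t *reg, uint32_t mask,
                     unsigned int max_polls)
{
    while (max_polls--) {
        if ((*reg & mask) == 0)
            return 0;
    }
    return -ETIMEDOUT;
}
```

The loop relies only on standard C, which is the property the commit is after; the polling budget plays the role of the timeout.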
    • net/fsl: drop in_be32() & out_be32() in xgmac_mdio · ca43e58c
      Shaohui Xie committed
      Use ioread32be() & iowrite32be() instead.
      Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
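ioread32be() and iowrite32be() treat a 32-bit register as big-endian regardless of the CPU's byte order. The sketch below shows the same idea on an ordinary byte buffer; `read_be32` and `write_be32` are hypothetical helper names, and no real MMIO is involved.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant first. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Store a 32-bit value as four bytes, most significant first. */
void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Because the helpers work byte by byte, they give the same result on little-endian and big-endian hosts.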
    • bonding: handle more gso types · 24f87d4c
      Eric Dumazet committed
      In commit 5a7baa78 ("bonding: Advertize vxlan offload features when
      supported"), Or Gerlitz added conditional support for vxlan offload.

      In this patch I also add support for all kinds of tunnels,
      but we allow a bonding device to not require segmentation,
      as it is always better to perform this segmentation at the very last
      stage, if a particular slave device requires it.
      
      Tested:
      
       Set up a GRE tunnel
       on a physical NIC not having tx-gre-segmentation.
       Results on bnx2x are even better, as we no longer have to segment
       in software.
      
      ethtool -K bond0 tx-gre-segmentation off
      
      super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
      7538.32
      
      ethtool -K bond0 tx-gre-segmentation on
      
      super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
      10200.5
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: simplify br_getlink() a bit · 1b846f92
      Dan Carpenter committed
      Static checkers complain that we should maybe set "ret" before we do the
      "goto out;".  They interpret the NULL return from br_port_get_rtnl() as
      a failure and forgetting to set the error code is a common bug in this
      situation.
      
      The code is confusing but it's actually correct.  We are returning zero
      deliberately.  Let's re-write it a bit to be more clear.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phy_dsa' · 5c66cfe0
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: phy and dsa random fixes/cleanups
      
      These two patches were already present as part of my attempt to make
      DSA modules work properly, these are the only two "valid" patches at
      this point which should not need any further rework.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: factor interrupt disabling in a function · 691c9a8f
      Florian Fainelli committed
      Factor the interrupt disabling in a function: bcm_sf2_intr_disable()
      since we are doing the same thing in the setup and suspend paths.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: fixed: allow setting no update_link callback · 799d4444
      Florian Fainelli committed
      fixed_phy_set_link_update() contains an early check against a NULL
      callback pointer, which basically prevents us from removing any
      previous callback we may have set. The users of the fp->link_update
      callback deal with a NULL callback just fine, so we really want to allow
      "removing" a link_update callback to avoid dangling callback pointers
      during, e.g., module removal.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
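The pattern behind this change can be sketched in a few lines of C. Everything below is hypothetical (`struct fake_phy`, `set_link_update`, `poll_link`, `force_link_down`) and is not the fixed PHY driver's code; it only shows a setter that accepts NULL to clear a callback, paired with a caller that tolerates a NULL callback.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fixed PHY with an optional callback. */
struct fake_phy {
    int link;                                     /* last known link state */
    int (*link_update)(struct fake_phy *phy);     /* may be NULL */
};

/*
 * Install or remove the callback. Passing NULL is allowed and clears any
 * previously installed callback, so no dangling pointer is left behind.
 */
int set_link_update(struct fake_phy *phy,
                    int (*link_update)(struct fake_phy *phy))
{
    if (phy == NULL)
        return -1;
    phy->link_update = link_update;
    return 0;
}

/* The user of the callback copes with NULL by keeping the stored state. */
int poll_link(struct fake_phy *phy)
{
    if (phy->link_update != NULL)
        phy->link = phy->link_update(phy);
    return phy->link;
}

/* Example callback that always reports link down. */
int force_link_down(struct fake_phy *phy)
{
    (void)phy;
    return 0;
}
```

A module-removal path in this sketch would call `set_link_update(phy, NULL)` before its callback code goes away.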
    • net: ipv6: Add sysctl entry to disable MTU updates from RA · c2943f14
      Harout Hedeshian committed
      The kernel forcefully applies MTU values received in router
      advertisements provided the new MTU is less than the current. This
      behavior is undesirable when the user space is managing the MTU. Instead
      a sysctl flag 'accept_ra_mtu' is introduced such that the user space
      can control whether or not RA provided MTU updates should be applied. The
      default behavior is unchanged; user space must explicitly set this flag
      to 0 for RA MTUs to be ignored.
      Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
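The decision described in this commit message can be written as a small predicate. This is a sketch based only on the wording above: the function name `ra_mtu_should_apply` is hypothetical, and any additional validity checks the kernel performs on the advertised MTU are deliberately left out.

```c
#include <stdbool.h>

/*
 * Decide whether an MTU received in a router advertisement is applied.
 * accept_ra_mtu: value of the flag (non-zero keeps the default behaviour)
 * current_mtu:   MTU currently in use
 * ra_mtu:        MTU carried by the router advertisement
 */
bool ra_mtu_should_apply(int accept_ra_mtu, unsigned int current_mtu,
                         unsigned int ra_mtu)
{
    if (accept_ra_mtu == 0)
        return false;               /* user space manages the MTU itself */
    return ra_mtu < current_mtu;    /* only smaller values are applied */
}
```

With the flag left at its default, a smaller advertised MTU is still applied; setting it to 0 makes the predicate false for every advertisement. The flag is presumably exposed per interface alongside the other accept_ra_* settings, but the exact path is not stated in the commit text.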
    • Merge branch 'fib_trie_next' · 46a93af2
      David S. Miller committed
      Alexander Duyck says:
      
      ====================
      Fixes and improvements for recent fib_trie updates
      
      While performing testing and prepping the next round of patches I found a
      few minor issues and improvements that could be made.
      
      These changes should help to reduce the overall code size and improve the
      performance slightly. I noticed a 20ns or so improvement in my worst-case
      testing, which will likely only result in a 1ns difference with a
      standard-sized trie.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Various clean-ups for handling slen · 64c62723
      Alexander Duyck committed
      While doing further work on the fib_trie I noted a few items.
      
      First I was using calls that were far more complicated than they needed to
      be for determining when to push/pull the suffix length.  I have updated the
      code to reflect the simpler logic.
      
      The second issue is that I realised we weren't necessarily handling the
      case of a leaf_info struct surviving a flush.  I have updated the logic so
      that now we will call pull_suffix in the event of having a leaf info value
      left in the leaf after flushing it.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Move fib_find_alias to file where it is used · 02525368
      Alexander Duyck committed
      The function fib_find_alias is only accessed by functions in fib_trie.c.
      As such, it makes sense to relocate it and declare it static so that the
      compiler can apply the optimizations available for a local function.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Use empty_children instead of counting empty nodes in stats collection · 30cfe7c9
      Alexander Duyck committed
      It doesn't make much sense to count the pointers ourselves when
      empty_children already has a count for the number of NULL pointers stored
      in the tnode.  As such save ourselves the cycles and just use
      empty_children.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Add collapse() and should_collapse() to resize · 95f60ea3
      Alexander Duyck committed
      This patch really does two things.
      
      First it pulls the logic for determining if we should collapse one node out
      of the tree and the actual code doing the collapse into a separate pair of
      functions.  This helps to make the changes to these areas more readable.
      
      Second it encodes the upper 32b of the empty_children value onto the
      full_children value in the case of bits == KEYLENGTH.  By doing this we are
      able to handle the case of a 32b node where empty_children would appear to
      be 0 when it was actually 1ul << 32.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
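The second point, a 32-bit counter that would need to hold 1 << 32, can be sketched as a counter with a carry into a neighbouring field. The struct and helper names below are hypothetical and only loosely modelled on the description above; the sketch assumes the second field is otherwise unused for a node where bits == KEYLENGTH.

```c
#include <stdint.h>

/* Hypothetical counters for a node whose child array has 2^32 slots. */
struct fake_tnode_counts {
    uint32_t empty_children;   /* low 32 bits of the empty-slot count */
    uint32_t full_children;    /* assumed free here; holds the overflow */
};

/* Increment the empty count, carrying into full_children on wrap. */
void empty_child_inc(struct fake_tnode_counts *c)
{
    if (++c->empty_children == 0)
        c->full_children++;
}

/* Decrement the empty count, borrowing from full_children on wrap. */
void empty_child_dec(struct fake_tnode_counts *c)
{
    if (c->empty_children-- == 0)
        c->full_children--;
}

/* Recombine both fields into the real number of empty slots. */
uint64_t empty_children_total(const struct fake_tnode_counts *c)
{
    return ((uint64_t)c->full_children << 32) | c->empty_children;
}
```

With this encoding a completely empty 32-bit node reads as empty_children == 0 and full_children == 1, which is distinguishable from a node with no empty slots at all.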
    • fib_trie: Fall back to slen update on inflate/halve failure · a80e89d4
      Alexander Duyck committed
      This change corrects an issue where if inflate or halve fails we were
      exiting the resize function without at least updating the slen for the
      node.  To correct this I have moved the update of max_size into the while
      loop so that it is only decremented on a successful call to either inflate
      or halve.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Fix RCU bug and merge similar bits of inflate/halve · 69fa57b1
      Alexander Duyck committed
      This patch addresses two issues.
      
      The first issue is the fact that I believe I had the RCU freeing sequence
      slightly out of order.  As a result we could get into an issue if a caller
      went into a child of a child of the new node, then backtracked into the
      to-be-freed parent, and then attempted to access a child of a child that may
      have been consumed in a resize of one of the new node's children.  To resolve
      this I have moved the resize after we have freed the oldtnode.  The only side
      effect of this is that we will now be calling resize on more nodes in the case
      of inflate, due to the fact that we don't have a good way to test whether a
      full_tnode on the new node was there before or after the allocation.  This
      should have minimal impact, however, since the node should already be
      correctly sized, so it is just the cost of calling should_inflate that we
      will be taking on the node, which is only a couple of cycles.
      
      The second issue is the fact that inflate and halve were essentially doing
      the same thing after the new node was added to the trie replacing the old
      one.  As such it wasn't really necessary to keep the code in both functions
      so I have split it out into two other functions, called replace and
      update_children.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits · b3832117
      Alexander Duyck committed
      In doing performance testing and analysis of the changes I recently found
      that by shifting the index I had created an unnecessary dependency.
      
      I have updated the code so that we instead shift a mask by bits and then
      just test against that as that should save us about 2 CPU cycles since we
      can generate the mask while the key and pos are being processed.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
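The two tests named in the commit title give the same answer. A small sketch with hypothetical function names, valid for `bits` smaller than the width of `unsigned long`: the mask form can compute `~0ul << bits` without waiting for `index`, whereas the shift form cannot start until `index` is available.

```c
#include <stdbool.h>

/* Shift form: the result depends on index being available first. */
bool index_out_of_range_shift(unsigned long index, unsigned int bits)
{
    return (index >> bits) != 0;
}

/*
 * Mask form: the mask depends only on "bits", so it can be produced
 * while index is still being computed; only the final AND needs both.
 */
bool index_out_of_range_mask(unsigned long index, unsigned int bits)
{
    unsigned long mask = ~0ul << bits;

    return (index & mask) != 0;
}
```

Both functions report whether `index` has any bit set at or above position `bits`, so one can replace the other without changing the outcome.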
    • Merge branch 'mlx4-next' · bc579ae5
      David S. Miller committed
      Or Gerlitz says:
      
      ====================
      mlx4: Fix and enhance the device reset flow
      
      This series from Yishai Hadas fixes the device reset flow and adds SRIOV support.
      
      Reset flows are required whenever a device experiences errors, is unresponsive,
      or is not in a deterministic state. In such cases, the driver is expected to
      reset the HW and continue operation. When SRIOV is enabled, these requirements
      apply both to PF and VF devices.
      
      Currently, the mlx4 reset flow doesn't work properly: when a fatal error is
      detected on the FW internal buffer, the chip is not reset and stays in its
      bad state. Some cases that are assumed to be fatal, such as non-responsive FW
      or errors on closing commands, are not handled today.
      
      The AER mechanism should also be fixed:
      - It should use mlx4_load_one instead of __mlx4_init_one which is done
        upon HCA probing.
      - It must be aligned with concurrent catas flow, mark device to be in
        an error state, reset chip, etc.
      - Port types should be restored to their original values before error occurred.
      
      In addition, the SRIOV use-case isn't supported.
      
      In the above cases, when the device state becomes fatal we must act as follows:
      1) Reset the chip and mark the HW device state as in fatal error.
      2) Wake up any pending commands and prevent new ones from coming in.
      3) Restart the software stack.
      
      We also address the SRIOV mode as follows: if the PF detects a fatal error,
      it lets the VFs know about it, then both the PF and the VFs are restarted
      asynchronously. However, if only a VF encounters a fatal error or is forced
      to reset, it resets its own VF state and then restarts its software.
      
      changes from V0:
      
      No need to call pci_disable_device upon permanent PCI error. This will
      be done as part of mlx4_remove_one which is called later once we
      return PCI_ERS_RESULT_DISCONNECT from the pci error handler.
      
      Initial toggle value should use only the T bit and not the whole byte value.
      Not doing so sometimes broke SRIOV, because the VF saw a junk value and
      treated it as a non-ready comm channel.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Reset flow activation upon SRIOV fatal command cases · 0cd93027
      Yishai Hadas committed
      When SRIOV commands are executed over the comm-channel and get
      a fatal error (e.g. timeout, closing command failure) the VF enters
      into error state and reset flow is activated.
      
      To be able to recognize whether the failure was on a closing command, the
      operation code for the given VHCR command is used. Once the device has
      entered an error state, we prevent redundant error messages from being printed.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Enable device recovery flow with SRIOV · 55ad3592
      Yishai Hadas committed
      In SRIOV, both the PF and the VF may attempt device recovery whenever they
      assume that the device is not functioning.  When the PF driver resets the
      device, the VF should detect this and attempt to reinitialize itself.
      
      The VF must be able to reset itself under all circumstances, even
      if the PF is not responsive.
      
      The VF shall reset itself in the following cases:
      
      1. Commands are not processed within reasonable time over the communication channel.
      This takes into account the device state and the correct return code for the
      command, as in native mode; it is done in the next patch.
      
      2. The VF driver receives an internal error event reported by the PF on the
      communication channel. This occurs when the PF driver resets the device or
      when VF is out of sync with the PF.
      
      Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
      PF is not responsive.
      
      As PF and VF may run their reset flows simultaneously, there are several cases
      that are handled:
      - Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
      - Prevent PF getting VF commands before it has finished initializing its resources.
      - Upon VF startup, check that comm-channel is online before sending
        commands to the PF and getting timed-out.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Handle AER flow properly · 2ba5fbd6
      Yishai Hadas committed
      Fix the AER callbacks to work properly. This includes:
      - Refactoring AER to be aligned with Reset flow support.
      - Sync with concurrent catas flow.
      
      In addition, fix the shutdown PCI callback to sync with
      concurrent catas flow.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Manage interface state for Reset flow cases · c69453e2
      Yishai Hadas committed
      We need to manage the interface state to sync between the reset flow and other
      related cases such as remove_one. This has to be done to prevent certain
      races. For example, if the software stack is already down as a result of an
      unload call, remove_one should skip the unload phase.

      Implement the remove_one case; handling of AER and other cases comes next.

      The interface can be up or down. Upon remove_one, the state will include an
      extra bit indicating that the device is cleaned up, forcing other tasks to
      finish before the final cleanup.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
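The interface-state idea from this commit can be sketched with two flag bits. All names below (`IFACE_UP`, `IFACE_DELETION`, `struct fake_dev`, and the `fake_*` functions) are hypothetical and do not come from the mlx4 driver, and the locking a real driver would need around the state is omitted.

```c
#include <stdbool.h>

/* Hypothetical state bits: one for up/down, one set once removal starts. */
enum {
    IFACE_UP       = 1 << 0,
    IFACE_DELETION = 1 << 1
};

struct fake_dev {
    unsigned int iface_state;
    int unload_calls;          /* counts how often the stack was torn down */
};

/* Tear down the software stack and mark the interface as down. */
void fake_unload(struct fake_dev *dev)
{
    dev->unload_calls++;
    dev->iface_state &= ~(unsigned int)IFACE_UP;
}

/*
 * Removal path: mark the device as being deleted, then unload only if
 * the stack is still up. If a reset flow already brought it down, the
 * unload phase is skipped.
 */
void fake_remove_one(struct fake_dev *dev)
{
    dev->iface_state |= IFACE_DELETION;
    if (dev->iface_state & IFACE_UP)
        fake_unload(dev);
}

/* A reset flow would check this before restarting the software stack. */
bool fake_may_restart(const struct fake_dev *dev)
{
    return (dev->iface_state & IFACE_DELETION) == 0;
}
```

In this sketch a device whose stack is already down goes through `fake_remove_one()` without a second unload, and no restart is permitted once the deletion bit is set.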
    • net/mlx4_core: Activate reset flow upon fatal command cases · f5aef5aa
      Yishai Hadas committed
      We activate the reset flow upon fatal command errors, when the device enters
      an erroneous state and must be reset.
      
      The cases below are assumed to be fatal: FW command timed-out, an error from FW
      on closing commands, pci is offline when posting/pending a command.
      
      In those cases we place the device into an error state: chip is reset, pending
      commands are awakened and completed immediately. Subsequent commands will
      return immediately.
      
      The return code in the above cases will depend on the command. Commands which
      free and close resources will return success (because the chip was reset, so
      callers may safely free their kernel resources). Other commands will return -EIO.
      
      Since the device's state was marked as error, the catas poller will
      detect this and restart the device's software stack (as is done when a FW
      internal error is directly detected). The device state is protected by a
      persistent mutex that lives on its mlx4_dev; as such, the hcr_mutex is no
      longer needed and is removed.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
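The return-code rule described in this commit (success for commands that free or close resources, -EIO for everything else) can be sketched as a small lookup. The opcode names and both functions below are hypothetical placeholders, not the driver's real command set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command opcodes, for illustration only. */
enum fake_cmd {
    FAKE_CMD_QUERY,
    FAKE_CMD_ALLOC_RES,
    FAKE_CMD_FREE_RES,
    FAKE_CMD_CLOSE_PORT
};

/* True for commands whose only job is to release or close something. */
bool fake_cmd_releases_resources(enum fake_cmd op)
{
    switch (op) {
    case FAKE_CMD_FREE_RES:
    case FAKE_CMD_CLOSE_PORT:
        return true;
    default:
        return false;
    }
}

/*
 * Return code used once the device is in the error state. The chip has
 * been reset, so release-type commands report success and callers can
 * free their own resources; every other command fails with -EIO.
 */
int fake_cmd_ret_in_error_state(enum fake_cmd op)
{
    return fake_cmd_releases_resources(op) ? 0 : -EIO;
}
```

This keeps teardown paths from failing halfway through after a reset while still reporting an error for commands that expected the hardware to do real work.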
    • net/mlx4_core: Enhance the catas flow to support device reset · f6bc11e4
      Yishai Hadas committed
      This includes:
      
      - resetting the chip when a fatal error is detected (the current code
        does not do this).
      
      - exposing the ability to enter error state from outside the catas code
        by calling its functionality. (E.g. FW Command timeout, AER error).
      
      - managing a persistent device state. This is needed to sync between
        reset flow cases.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Refactor the catas flow to work per device · ad9a0bf0
      Yishai Hadas committed
      Use a WQ per device instead of a single global WQ. This allows
      independent reset handling per device even when SRIOV is used.

      This comes as a pre-patch for supporting chip reset
      for both native and SRIOV.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>