1. 05 Jan, 2015 5 commits
  2. 14 Dec, 2014 5 commits
  3. 12 Dec, 2014 15 commits
    • M
      Fix race condition between vxlan_sock_add and vxlan_sock_release · 00c83b01
      Marcelo Leitner committed
      Currently, when trying to reuse a socket, vxlan_sock_add() grabs
      vn->sock_lock, locates a reusable socket, increments its refcount and
      releases vn->sock_lock.

      But vxlan_sock_release() first decrements the refcount and only then
      grabs that lock. The refcnt operations are atomic, but because deferred
      works each hold a reference on vs->refcnt, the following interleaving can
      happen, leading to a use after free (especially after vxlan_igmp_leave):
      
        CPU 1                            CPU 2
      
      deferred work                    vxlan_sock_add
        ...                              ...
                                         spin_lock(&vn->sock_lock)
                                         vs = vxlan_find_sock();
        vxlan_sock_release
          dec vs->refcnt, reaches 0
          spin_lock(&vn->sock_lock)
                                         vxlan_sock_hold(vs), refcnt=1
                                         spin_unlock(&vn->sock_lock)
          hlist_del_rcu(&vs->hlist);
          vxlan_notify_del_rx_port(vs)
          spin_unlock(&vn->sock_lock)
      
      So when we look for a reusable socket, we check that it has not
      already been freed before reusing it.
      Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Fixes: 7c47cedf ("vxlan: move IGMP join/leave to work queue")
      Signed-off-by: David S. Miller <davem@davemloft.net>
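The check described above ("take a reference only if the socket has not already dropped to zero") can be modelled in a few lines. This is a minimal Python sketch, not the kernel code: `VxlanSockModel`, `hold_unless_zero` and `find_reusable` are hypothetical names, and a `threading.Lock` stands in for the atomicity of the real refcount operations.

```python
import threading


class VxlanSockModel:
    """Toy stand-in for a refcounted socket (hypothetical, not struct vxlan_sock)."""

    def __init__(self):
        # The lock only models the atomicity of refcount operations.
        self._lock = threading.Lock()
        self.refcnt = 1

    def hold_unless_zero(self):
        """Take a reference only if the object is still live."""
        with self._lock:
            if self.refcnt == 0:
                return False  # release already started; do not resurrect
            self.refcnt += 1
            return True

    def release(self):
        """Drop a reference; return True when the caller must tear down."""
        with self._lock:
            self.refcnt -= 1
            return self.refcnt == 0


def find_reusable(sock):
    """Return sock if it can still be reused, else None (create a new one)."""
    return sock if sock.hold_unless_zero() else None
```

With this shape, a lookup that races with the final release gets `None` back and falls through to creating a fresh socket instead of reviving one that is being torn down.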
    • C
    • M
      net/mlx4: Add support for A0 steering · 7d077cd3
      Matan Barak committed
      Add the required firmware commands for A0 steering and a way to enable
      that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
      QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
      to configure and query the device.
      
      The different A0 DMFS (steering) modes are:
      
      Static - optimized performance, but flow steering rules are
      limited. This mode should be chosen explicitly by the user
      in order to be used.

      Dynamic - this mode should be chosen explicitly by the user.
      In this mode, the FW works in optimized steering mode as long as
      it can and afterwards automatically drops to classic (full) DMFS.

      Disable - this mode should be chosen explicitly by the user.
      The user instructs the system not to use optimized steering, even if
      the FW supports Dynamic A0 DMFS (and thus would be able to use optimized
      steering in Default A0 DMFS mode).

      Default - this mode is chosen implicitly. In this mode, if the FW
      supports Dynamic A0 DMFS, it works in that mode. Otherwise, it
      works in Disable A0 DMFS mode.
      
      Under an SRIOV configuration, when the A0 steering mode is enabled,
      older guest VF drivers that do not use the RX QP allocation flag
      (MLX4_RESERVE_A0_QP) will get a QP from the general range and
      fail when attempting to register a steering rule. To avoid that,
      the PF context behaviour is changed once in A0 static mode, to
      require support for the allocation flag in VF drivers too.
      
      In order to enable A0 steering, we use the log_num_mgm_entry_size
      parameter. If the value of the parameter is not positive, we treat the
      absolute value of log_num_mgm_entry_size as a bit field. Setting bit 2
      of this bit field enables static A0 steering.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
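The parameter decoding described in the last paragraph of this commit can be illustrated as follows. This is a sketch under one stated assumption: "bit 2" is read as zero-indexed, i.e. mask `0x4`. The helper name `static_a0_requested` is hypothetical and is not taken from the driver.

```python
# Assumption: "bit 2" in the commit message is zero-indexed (mask 0x4).
A0_STATIC_BIT = 2


def static_a0_requested(log_num_mgm_entry_size):
    """Return True if the parameter value asks for static A0 steering."""
    if log_num_mgm_entry_size > 0:
        # Positive values are an actual log size, not a bit field.
        return False
    flags = abs(log_num_mgm_entry_size)
    return bool(flags & (1 << A0_STATIC_BIT))
```

Under that reading, a value of `-4` requests static A0 steering, while `-1` or any positive value does not.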
    • M
      net/mlx4: Refactor QUERY_PORT · 431df8c7
      Matan Barak committed
      Currently QUERY_PORT is done as part of the QUERY_DEV_CAP firmware command.

      Since we would like to use it without querying all device capabilities,
      extract this part into a function of its own.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net/mlx4_core: Add explicit error message when rule doesn't meet configuration · 579d059b
      Matan Barak committed
      When a given flow steering rule is invalid with respect to the current
      steering configuration, print the correct error message to the system log.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak committed
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      
      When we create an RSS QP or a raw Ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.

      Meaning, when we run out of raw-eth QPs, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS QPs, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).

      Note that if a raw-eth QP is allocated from the general range, the allocator
      attempts to allocate the range such that bits 6 and 7 (blueflame bits) in
      the QP number are not set.
      
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP" bit. According
      to the combination of these bits, the PF tries to allocate a suitable QP.

      In order to maintain backward compatibility (with older PFs), the PF
      reports which QP attributes it supports via the QUERY_FUNC_CAP command.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
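The fallback order spelled out in this commit (raw-eth falls back to the general range; RSS falls back to raw-eth, then to the general range) can be sketched as a small lookup table. This is a simplified model, not the driver's allocator: the `Zone` enum, the `FALLBACK` table and `allocate` are hypothetical, zones are reduced to free-QP counters, and the general range's own fallback is left out because the message does not give its order.

```python
from enum import Enum


class Zone(Enum):
    GENERAL = "general"
    RSS = "rss"
    RAW_ETH = "raw_eth"


# Order in which zones are tried for each requested zone (from the commit text).
FALLBACK = {
    Zone.RSS: [Zone.RSS, Zone.RAW_ETH, Zone.GENERAL],
    Zone.RAW_ETH: [Zone.RAW_ETH, Zone.GENERAL],
    Zone.GENERAL: [Zone.GENERAL],  # simplification: no fallback modelled
}


def allocate(free, wanted):
    """Take one QP from the first zone in the chain that has any left.

    free maps each Zone to its number of free QPs and is updated in place.
    Returns the zone actually used, or None if every zone in the chain is empty.
    """
    for zone in FALLBACK[wanted]:
        if free[zone] > 0:
            free[zone] -= 1
            return zone
    return None
```

For example, with the RSS zone exhausted, an RSS request is served from the raw-eth zone first and from the general range only after that.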
    • M
      net/mlx4: Add mlx4_bitmap zone allocator · 7a89399f
      Matan Barak committed
      The zone allocator is a mechanism which manages a few mlx4_bitmaps.

      When allocating a resource, the user indicates the desired zone from
      which the resource should be allocated. If possible, the resource
      is allocated from this zone. Otherwise, the resource is allocated
      from a zone of lower, equal, or higher priority, in that order,
      as permitted by the desired zone's properties.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/mlx4: Add a check if there are too many reserved QPs · ab256e5a
      Dotan Barak committed
      The number of reserved QPs is affected by both the firmware's and
      the driver's requirements. This patch adds a check that
      validates that this number is indeed feasible.
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev committed
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 64 will have bits 6/7 set.

      This problem is not specific to the Ethernet driver: any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
         a. param1[23:0]  - number of QPs
         b. param1[31:24] - flags controlling QP reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. To learn which attributes
      are supported, the VF issues the QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which reports which QP allocation attributes
      it supports.
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
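The param1 layout and the BlueFlame eligibility rule from this commit can be checked with a little bit arithmetic. The sketch below is illustrative Python, not driver code; the constant and function names (`FLAG_ETH_BF`, `pack_param1`, `bf_eligible`, and so on) are hypothetical, while the bit positions come from the commit message.

```python
QP_COUNT_MASK = 0x00FFFFFF         # param1[23:0]: number of QPs
FLAG_ETH_BF = 1 << 31              # bit 31: Eth blueflame supported QPs
BF_BITS = (1 << 6) | (1 << 7)      # QPN bits that must be clear for BF


def pack_param1(count, eth_bf=False):
    """Build param1 from a QP count and the Eth BF flag."""
    if not 0 <= count <= QP_COUNT_MASK:
        raise ValueError("count does not fit in 24 bits")
    return count | (FLAG_ETH_BF if eth_bf else 0)


def unpack_param1(param1):
    """Split param1 back into (count, eth_bf)."""
    return param1 & QP_COUNT_MASK, bool(param1 & FLAG_ETH_BF)


def bf_eligible(qpn):
    """A QPN may use BlueFlame only if bits 6 and 7 are both unset."""
    return (qpn & BF_BITS) == 0
```

With a base aligned to 256, `bf_eligible(base + 63)` holds but `bf_eligible(base + 64)` does not, which is why a contiguous range cannot supply more than 64 BF-capable QPs.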
    • M
      net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Matan Barak committed
      Previously, we fired all our completion callbacks straight from our ISR.

      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in an ISR is generally considered wrong
      and can even lead to a hard lockup of the system: when the hardware
      generates a lot of completion events, the loop over those events can
      take so long that the system watchdog reports a hard lockup.

      In order to avoid that, add a new way of invoking completion event
      callbacks. In the interrupt itself, we add the CQs which received a
      completion event to a per-EQ list and schedule a tasklet. In the tasklet
      context we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
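The split between interrupt work and deferred work described above can be modelled without any kernel machinery. In this rough Python sketch, `EqModel` is a hypothetical stand-in for an event queue, each CQ is represented by its callback, and skipping a CQ that is already queued is an assumption of the sketch rather than something the commit message states.

```python
from collections import deque


class EqModel:
    """Toy per-EQ state: a pending list plus a 'tasklet scheduled' flag."""

    def __init__(self):
        self.pending = deque()
        self.tasklet_scheduled = False

    def isr(self, cq_callback):
        """Interrupt path: only queue the CQ and request deferred work."""
        # Assumption: a CQ that is already queued is not queued twice.
        if cq_callback not in self.pending:
            self.pending.append(cq_callback)
        self.tasklet_scheduled = True

    def run_tasklet(self):
        """Deferred path: invoke the callback of every queued CQ."""
        self.tasklet_scheduled = False
        while self.pending:
            callback = self.pending.popleft()
            callback()
```

The point of the shape is that `isr` does a bounded amount of work per event, while the potentially heavy callbacks run later from `run_tasklet`.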
    • O
      net/mlx4_core: Mask out host side virtualization features for guests · 383677da
      Or Gerlitz committed
      When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
      need not be told that host side virtualization features such as VST, FSM
      (MAC anti-spoofing) and running more than 80 VFs are supported by the device.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • O
      net/mlx4_en: Set csum level for encapsulated packets · c58942f2
      Or Gerlitz committed
      This was dropped by mistake for the napi_gro_frags flow; fix that.

      Fixes: dd65beac ('net/mlx4_en: Extend usage of napi_gro_frags')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      be2net: Export tunnel offloads only when a VxLAN tunnel is created · 630f4b70
      Sriharsha Basavapatna committed
      The encapsulated offload flags shouldn't be unconditionally exported
      to the stack. The stack expects offloading to work across all tunnel
      types when those flags are set. This would break other tunnels (like
      GRE) since be2net currently supports tunnel offload for VxLAN only.

      Also, with VxLAN, Skyhawk-R can offload only 1 UDP dport. If more
      than 1 UDP port is added, we should disable offloads in that case too.
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
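The rule stated in this commit (export tunnel offloads only while exactly one VxLAN UDP port is in use) reduces to a one-line predicate. The Python sketch below is only an illustration; `tunnel_offloads_enabled` is a hypothetical name, and counting distinct ports is an assumption about how "more than 1 UDP port" is meant.

```python
def tunnel_offloads_enabled(vxlan_ports):
    """True only when exactly one distinct VxLAN UDP dport is configured.

    Assumption: repeated additions of the same port count once.
    """
    return len(set(vxlan_ports)) == 1
```

So offloads stay off with no VxLAN tunnel, turn on with a single port, and turn off again as soon as a second port is added.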
    • K
      gianfar: Fix dma check map error when DMA_API_DEBUG is enabled · 0a4b5a24
      Kevin Hao committed
      We need to use dma_mapping_error() to check the DMA address returned
      by dma_map_single/page(). Otherwise we would get a warning like this:
        WARNING: at lib/dma-debug.c:1140
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc2-next-20141029 #196
        task: c0834300 ti: effe6000 task.ti: c0874000
        NIP: c02b2c98 LR: c02b2c98 CTR: c030abc4
        REGS: effe7d70 TRAP: 0700   Not tainted  (3.18.0-rc2-next-20141029)
        MSR: 00021000 <CE,ME>  CR: 22044022  XER: 20000000
      
        GPR00: c02b2c98 effe7e20 c0834300 00000098 00021000 00000000 c030b898 00000003
        GPR08: 00000001 00000000 00000001 749eec9d 22044022 1001abe0 00000020 ef278678
        GPR16: ef278670 ef278668 ef278660 070a8040 c087f99c c08cdc60 00029000 c0840d44
        GPR24: c08be6e8 c0840000 effe7e78 ef041340 00000600 ef114e10 00000000 c08be6e0
        NIP [c02b2c98] check_unmap+0x51c/0x9e4
        LR [c02b2c98] check_unmap+0x51c/0x9e4
        Call Trace:
        [effe7e20] [c02b2c98] check_unmap+0x51c/0x9e4 (unreliable)
        [effe7e70] [c02b31d8] debug_dma_unmap_page+0x78/0x8c
        [effe7ed0] [c03d1640] gfar_clean_rx_ring+0x208/0x488
        [effe7f40] [c03d1a9c] gfar_poll_rx_sq+0x3c/0xa8
        [effe7f60] [c04f8714] net_rx_action+0xc0/0x178
        [effe7f90] [c00435a0] __do_softirq+0x100/0x1fc
        [effe7fe0] [c0043958] irq_exit+0xa4/0xc8
        [effe7ff0] [c000d14c] call_do_irq+0x24/0x3c
        [c0875e90] [c00048a0] do_IRQ+0x8c/0xf8
        [c0875eb0] [c000ed10] ret_from_except+0x0/0x18
      
      For TX, we need to unmap the pages that have already been mapped and
      free the skb before returning.

      For RX, move the DMA mapping and error check to gfar_new_skb(). We
      reuse the original skb in the rx ring when either skb allocation or
      DMA mapping fails.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call · 666224d4
      Hariprasad Shenai committed
      Remove calls into t4_fw_hello() with MASTER_MUST, which result in
      FW_HELLO_CMD_MASTERFORCE being set. The firmware doesn't support this, and
      it would badly disrupt any existing PF drivers.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 Dec, 2014 9 commits
  5. 10 Dec, 2014 6 commits
    • F
      dummy: use MODULE_VERSION · 6c702fab
      Flavio Leitner committed
      Use MODULE_VERSION() now that the dummy driver has a version.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      amd-xgbe: Use disable_irq_nosync when in IRQ context · f9c5c62d
      Lendacky, Thomas committed
      The disable_irq_nosync function, not the disable_irq function, must be
      used to disable the DMA channel interrupt from within the interrupt
      service routine. Change the disable_irq call to disable_irq_nosync.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      xen-netfront: use correct linear area after linearizing an skb · 11d3d2a1
      David Vrabel committed
      Commit 97a6d1bb (xen-netfront: Fix handling packets on compound pages
      with skb_linearize) attempted to
      fix a problem where an skb that would have required too many slots
      would be dropped causing TCP connections to stall.
      
      However, it filled in the first slot using the original buffer and not
      the new one and would use the wrong offset and grant access to the
      wrong page.
      
      Netback would notice the malformed request and stop all traffic on the
      VIF, reporting:
      
          vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
          vif vif-3-0 vif3.0: fatal error; disabling device
      Reported-by: Anthony Wright <anthony@overnetdata.com>
      Tested-by: Anthony Wright <anthony@overnetdata.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      netback: don't store invalid vif pointer · f15650b7
      Jan Beulich committed
      When xenvif_alloc() fails, it returns a non-NULL error indicator. To
      avoid eventual races, we shouldn't store that into struct backend_info,
      as readers of it only check for NULL.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      net: fec: avoid kernal crash by NULL pointer when no phy connection · 213a9922
      Nimrod Andy committed
      On the i.MX6SX sabreauto board, when no PHY daughter board is connected,
      the kernel crashes with a NULL pointer dereference:
      
      fec 2188000.ethernet eth0: could not attach to PHY
      Unable to handle kernel NULL pointer dereference at virtual address 00000220
      pgd = 80004000
      [00000220] *pgd=00000000
      Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.24-01042-g27eaeea-dirty #405
      task: d8078000 ti: d8076000 task.ti: d8076000
      PC is at mutex_lock+0x10/0x54
      LR is at phy_start+0x14/0x68
      pc : [<806ad4e4>]    lr : [<803b0f90>]    psr: 60000113
      sp : d8077d80  ip : 00000000  fp : d83cc000
      r10: 0000100c  r9 : d83cc800  r8 : 00000000
      r7 : d83bcd0c  r6 : 00000200  r5 : 00000220  r4 : 00000220
      r3 : 00000000  r2 : 00000000  r1 : d83bcd90  r0 : 00000220
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      Control: 10c5387d  Table: 8000404a  DAC: 00000015
      Process swapper/0 (pid: 1, stack limit = 0xd8076240)
      Stack: (0xd8077d80 to 0xd8078000)
      7d80: 00000000 803b0f90 00000001 00000000 d83bc800 803be034 00000007 805c3fb4
      7da0: 00000003 80d4e0bc 805efcb8 fffffff1 fffffff0 00000000 00000000 d8077dfc
      7dc0: 0000000d 80d6ce80 80d126b0 800499c8 d83bc800 d83bc800 806f0f40 d83bc82c
      7de0: 00000000 00000000 80d6ce80 80d126b0 0000016b 80540250 d8076008 d83bc800
      7e00: 0000016b d83bc800 00001003 00000001 00001002 805404d4 d83bc800 00000120
      7e20: 00001002 00001002 00000000 805405d4 d83bc800 00000001 80d126c0 00001002
      7e40: 80dbc5dc 80d02024 00000000 806ae360 00000002 d6128420 d6127198 12400000
      7e60: 00000000 00000000 00000002 d61271e8 00000000 12400000 d801674c 800e49f0
      7e80: d6127198 d6124e58 00000000 80238848 d61271c4 00000000 00000001 d8016700
      7ea0: 80dd2e00 80d752c0 80d752c0 80cfdaec 0000010c 80239430 806c2e90 d800f080
      7ec0: d800f380 804e46b4 ffffffbc 80d15cb0 00000007 80d752c0 80d752c0 80d01e94
      7ee0: 0000010c d8076030 00000000 800088cc 80dbaba4 80bd411c d80a6f00 806b1e04
      7f00: 00000000 00000000 00000000 80125b84 00000000 80d2c56c 60000113 00000001
      7f20: ef7ff9df 806c80cc 0000010c 80043f5c 80c95eb8 00000007 ef7ffa1d 00000007
      7f40: 80d2c55c 80d15cb0 00000007 80d752c0 80d752c0 80ccc50c 0000010c 80d0a114
      7f60: 80d0a10c 80cccc04 00000007 00000007 80ccc50c 806ae410 00000000 8004cb84
      7f80: 80d17bc0 00000000 806a4bd4 00000000 00000000 00000000 00000000 00000000
      7fa0: 00000000 806a4bdc 00000000 8000e5f8 00000000 00000000 00000000 00000000
      7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 1e79a7bb e5337f77
      [<806ad4e4>] (mutex_lock) from [<803b0f90>] (phy_start+0x14/0x68)
      [<803b0f90>] (phy_start) from [<803be034>] (fec_enet_open+0x448/0x5dc)
      [<803be034>] (fec_enet_open) from [<80540250>] (__dev_open+0xa8/0x110)
      [<80540250>] (__dev_open) from [<805404d4>] (__dev_change_flags+0x88/0x170)
      [<805404d4>] (__dev_change_flags) from [<805405d4>] (dev_change_flags+0x18/0x48)
      [<805405d4>] (dev_change_flags) from [<80d02024>] (ip_auto_config+0x190/0xf94)
      [<80d02024>] (ip_auto_config) from [<800088cc>] (do_one_initcall+0xe8/0x144)
      [<800088cc>] (do_one_initcall) from [<80cccc04>] (kernel_init_freeable+0x104/0x1c8)
      [<80cccc04>] (kernel_init_freeable) from [<806a4bdc>] (kernel_init+0x8/0xec)
      [<806a4bdc>] (kernel_init) from [<8000e5f8>] (ret_from_fork+0x14/0x3c)
      Code: e92d4010 e3a03000 e1a04000 ee073fba (e1903f9f)
      
      Add a phydev check to fix the issue.
      Signed-off-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      net: systemport: allow changing MAC address · fb3b596d
      Florian Fainelli committed
      Hook an ndo_set_mac_address callback, update the internal Ethernet MAC in
      the netdevice structure, and finally write that address down to the
      UniMAC registers. If the interface is down, and most likely clock gated,
      we do not update the registers but just the local copy, so that the next
      ndo_open() call will effectively write down the address.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
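The deferred register write described in this commit can be pictured with a small state model. The Python sketch below is hypothetical throughout: `NetDevModel`, its `hw_mac` field and its methods only mimic the roles of the netdevice copy, the UniMAC registers, `ndo_set_mac_address` and `ndo_open`.

```python
class NetDevModel:
    """Toy model: dev_addr is the local copy, hw_mac the hardware registers."""

    def __init__(self, mac):
        self.dev_addr = mac
        self.hw_mac = None      # nothing programmed until the device is opened
        self.running = False

    def set_mac_address(self, mac):
        """Always update the local copy; touch hardware only when running."""
        self.dev_addr = mac
        if self.running:
            self.hw_mac = mac

    def open(self):
        """Bringing the interface up programs the current local address."""
        self.running = True
        self.hw_mac = self.dev_addr
```

Changing the address while the interface is down therefore costs nothing until `open()` runs, at which point the latest address is the one written.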