1. 26 Aug, 2014 19 commits
    • D
      mvneta: Add missing if_vlan.h include. · 2d39d120
      Authored by David S. Miller
      drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_skb_tx_csum':
      drivers/net/ethernet/marvell/mvneta.c:1374:3: error: implicit declaration of function 'vlan_get_protocol' [-Werror=implicit-function-declaration]
         __be16 l3_proto = vlan_get_protocol(skb);
         ^
      
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d39d120
    • W
      xen-netback: move netif_napi_add before binding interrupt · e24f8191
      Authored by Wei Liu
      The interrupt is enabled when bind_interdomain_evtchn_to_irqhandler returns.
      If an interrupt is pending, the interrupt handler is invoked.
      
      NAPI needs to be initialised before binding the interrupt; otherwise the
      interrupt handler will try to schedule a NAPI instance that is not
      initialised yet, resulting in a kernel oops.
      
      This fixes a regression introduced in ea2c5e13 ("xen-netback: move NAPI
      add/remove calls").
      
      Ideally function calls to create kthreads should also be moved before
      binding, but I intend to fix this regression with minimal changes and
      refactor the code in another patch.
      
      Reported-by: Thomas Leonard <talex5@gmail.com>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e24f8191
    • D
      Merge branch 'tso_fix' · 8dbb200f
      Authored by David S. Miller
      Vladislav Yasevich says:
      
      ====================
      Fix TSO and checksum issues with non-accelerated vlan traffic.
      
      I recently ran across something rather interesting when testing vlans
      from inside VMs.  In some scenarios I was getting awful throughput.
      Some debugging uncovered a very scary packet corruption.  I was
      seeing packets that had original TSO length as IP total length
      and their ip checksum was 0.  This was with e1000e driver.
      
      A bit more debugging uncovered an assumption made by that driver
      that skb->protocol will contain l3 protocol information.  This
      was not the case in my setup since I manually turned off vlan
      tx acceleration for the device.  This caused the driver to not
      initialize the tso information correctly and resulted in
      corrupt TSO frames on the wire.
      
      I decided to do some auditing of the usage of skb->protocol
      in the drivers.  Out of all the drivers, the included 8 appear
      to be affected.  They all allow the user to control vlan acceleration
      settings, all support TSO on vlan devices, and all use
      skb->protocol to decide how to encode TSO information.  Some
      also have similar problems when initializing hw checksum information.
      On such a device, it is simple enough to reproduce the issue.
      Simply turn off TX VLAN acceleration on the device, create a vlan,
      and run your favorite network performance tool.
      
      There is 1 driver I ran across that I believe will trigger a BUG
      in the system when used with vlans.  That driver is tile/tilepro.c.
      I have not changed it in this patch set and would hope that
      the maintainer has time to look at it.
      
      V2: Fix i40ev using the wrong function name.  Full build.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8dbb200f
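The failure mode described in this series can be illustrated outside the kernel. The sketch below is a hypothetical user-space analogue of what vlan_get_protocol() is used for in these patches: it skips any in-band VLAN tags in a raw Ethernet frame and returns the first non-VLAN ethertype. The names `frame_l3_proto` and `read_be16` and the local `ETH_*`/`VLAN_HLEN` constants are assumptions made for illustration; this is not the kernel helper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN      14      /* dst MAC + src MAC + ethertype */
#define VLAN_HLEN     4       /* TCI + encapsulated ethertype */
#define ETH_P_IP      0x0800
#define ETH_P_8021Q   0x8100
#define ETH_P_8021AD  0x88A8

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical analogue of vlan_get_protocol(): walk past any in-band
 * VLAN tags and return the first non-VLAN ethertype in host order.
 * Returns 0 if the frame is too short for the headers it announces.
 */
static uint16_t frame_l3_proto(const uint8_t *frame, size_t len)
{
    size_t off = ETH_HLEN - 2;          /* offset of the outer ethertype */
    uint16_t proto;

    assert(frame != NULL);
    if (len < ETH_HLEN)
        return 0;

    proto = read_be16(frame + off);
    while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
        off += VLAN_HLEN;               /* skip TCI, land on inner ethertype */
        if (off + 2 > len)
            return 0;
        proto = read_be16(frame + off);
    }
    return proto;
}
```

A driver that looks only at the outer ethertype would see ETH_P_8021Q for a tagged frame and pick the wrong TSO or checksum setup; walking the tags first yields the protocol the hardware descriptors actually need.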
    • V
      qlge: Fix TSO for non-accelerated vlan traffic · 1ee1cfe7
      Authored by Vlad Yasevich
      This device claims TSO support for vlans.  It also allows a user to
      control vlan acceleration offloading.  As such, it is possible to turn
      off vlan acceleration and configure a vlan which will continue to send
      TSO traffic.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO information.
      This results in corrupted frames sent on the wire.
      
      This patch extracts the protocol value correctly by using a
      vlan_get_protocol() helper and corrects corrupt TSO frames.
      
      CC: Shahed Shaikh <shahed.shaikh@qlogic.com>
      CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      CC: Ron Mercer <ron.mercer@qlogic.com>
      CC: linux-driver@qlogic.com
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ee1cfe7
    • V
      mvneta: Fix TSO and checksum for non-acceleration vlan traffic · 817dbfa5
      Authored by Vlad Yasevich
      This driver doesn't appear to support vlan acceleration at
      all.  However, it does claim to support TSO and IP checksums
      for vlan devices.  Thus any configured vlan device would
      end up passing down partial checksums or TSO frames.
      
      The driver also uses the value from skb->protocol to
      determine TSO and checksum offload information, but assumes
      that skb->protocol holds the l3 protocol information.
      As a result, vlan traffic with partial checksums or TSO
      will fail those checks and TSO will not happen.
      
      Fix this by using vlan_get_protocol() helper.
      
      CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      817dbfa5
    • V
      i40evf: Fix TSO and hw checksums for non-accelerated vlan packets. · a12c4158
      Authored by Vlad Yasevich
      This device claims TSO and checksum support for vlans.  It also
      allows a user to control vlan acceleration offloading.  As such,
      it is possible to turn off vlan acceleration and configure a vlan
      which will continue to support TSO and hw checksums.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO and checksum information.
      This results in corrupted frames sent on the wire.
      
      This patch extracts the protocol value correctly and corrects TSO
      and checksums for non-accelerated traffic.
      
      Fix this by using vlan_get_protocol() helper.
      
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      CC: Greg Rose <gregory.v.rose@intel.com>
      CC: Alex Duyck <alexander.h.duyck@intel.com>
      CC: John Ronciak <john.ronciak@intel.com>
      CC: Mitch Williams <mitch.a.williams@intel.com>
      CC: Linux NICS <linux.nics@intel.com>
      CC: e1000-devel@lists.sourceforge.net
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a12c4158
    • V
      i40e: Fix TSO and hw checksums for non-accelerated vlan packets. · 3d34dd03
      Authored by Vlad Yasevich
      This device claims TSO and checksum support for vlans.  It also
      allows a user to control vlan acceleration offloading.  As such,
      it is possible to turn off vlan acceleration and configure a vlan
      which will continue to support TSO and hw checksums.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO and checksum information.
      This results in corrupted frames sent on the wire.
      
      This patch extracts the protocol value correctly and corrects TSO
      and checksums for non-accelerated traffic.
      
      Fix this by using vlan_get_protocol() helper.
      
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      CC: Greg Rose <gregory.v.rose@intel.com>
      CC: Alex Duyck <alexander.h.duyck@intel.com>
      CC: John Ronciak <john.ronciak@intel.com>
      CC: Mitch Williams <mitch.a.williams@intel.com>
      CC: Linux NICS <linux.nics@intel.com>
      CC: e1000-devel@lists.sourceforge.net
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d34dd03
    • V
      ehea: Fix TSO and hw checksums with non-accelerated vlan packets. · be1d1486
      Authored by Vlad Yasevich
      The driver claims that it can do TSO and IP checksums on vlan
      devices and also allows the user to control vlan acceleration offloading.
      This makes it possible to push traffic to this driver that has TSO or
      partial checksums set, but also has a non-accelerated vlan
      header.  In this case, the driver will fail to correctly
      identify such traffic and will not correctly perform
      segmentation and checksum calculation.
      
      Fix this by using vlan_get_protocol() helper instead of
      assuming skb->protocol always has this information.
      
      CC: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be1d1486
    • V
      bna: Support TSO and partial checksum with non-accelerated vlans. · 1c53730a
      Authored by Vlad Yasevich
      This device claims TSO and checksum support for vlans.  It also
      allows a user to control vlan acceleration offloading.  As such,
      it is possible to turn off vlan acceleration and configure a vlan
      which will continue to support TSO.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO information.  This results
      in corrupted frames sent on the wire.
      
      This patch extracts the protocol value correctly and corrects TSO
      and checksums for non-accelerated traffic.
      
      CC: Rasesh Mody <rmody@brocade.com>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c53730a
    • V
      e1000: Fix TSO for non-accelerated vlan traffic · 06f4d033
      Authored by Vlad Yasevich
      This device claims TSO and checksum support for vlans.  It also
      allows a user to control vlan acceleration offloading.  As such,
      it is possible to turn off vlan acceleration and configure a vlan
      which will continue to support TSO.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO and checksum information.
      This results in corrupted frames sent on the wire.
      
      This patch extracts the protocol value correctly and corrects TSO
      for non-accelerated traffic.
      
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      CC: Greg Rose <gregory.v.rose@intel.com>
      CC: Alex Duyck <alexander.h.duyck@intel.com>
      CC: John Ronciak <john.ronciak@intel.com>
      CC: Mitch Williams <mitch.a.williams@intel.com>
      CC: Linux NICS <linux.nics@intel.com>
      CC: e1000-devel@lists.sourceforge.net
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06f4d033
    • V
      e1000e: Fix TSO with non-accelerated vlans · 47ccd1ed
      Authored by Vlad Yasevich
      This device claims TSO support for vlans.  It also allows a
      user to control vlan acceleration offloading.  As such, it is
      possible to turn off vlan acceleration and configure a vlan
      which will continue to support TSO.
      
      In such a situation, the packet passed down to the device will contain
      a vlan header and skb->protocol will be set to ETH_P_8021Q.
      The device assumes that skb->protocol contains the network protocol
      value and uses that value to set up TSO information.  This results
      in corrupted frames sent on the wire.  Corruptions include
      incorrect IP total length and invalid IP checksum.
      
      This patch extracts the protocol value correctly and corrects TSO
      for non-accelerated traffic.
      
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      CC: Greg Rose <gregory.v.rose@intel.com>
      CC: Alex Duyck <alexander.h.duyck@intel.com>
      CC: John Ronciak <john.ronciak@intel.com>
      CC: Mitch Williams <mitch.a.williams@intel.com>
      CC: Linux NICS <linux.nics@intel.com>
      CC: e1000-devel@lists.sourceforge.net
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47ccd1ed
    • J
      net: moxa: continue loop on skb allocation failure · 2b7890e7
      Authored by Jonas Jensen
      If netdev_alloc_skb_ip_align() fails, subsequent code will
      try to dereference an invalid pointer.
      
      Continue to next descriptor on error.
      
      While we're at it,
      
      1. eliminate the chance of an endless loop, replace the main
         loop with while(rx < budget)
      
      2. use napi_complete() and remove the explicit napi_gro_flush()
      
      Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b7890e7
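The loop shape described in this commit (bounded by the budget, moving on to the next descriptor when allocation fails) can be sketched in user space. Everything below is a hypothetical simulation: `struct rx_desc`, `rx_poll`, `failing_alloc`, and the injectable allocator are assumptions for illustration and do not reflect the moxart driver's real structures. In particular, this sketch counts every processed descriptor against the budget, which is a simplification chosen here to make termination obvious.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 4
#define BUF_SIZE  64

struct rx_desc {
    int owned_by_cpu;              /* nonzero: a received frame is waiting */
    size_t len;                    /* frame length, at most BUF_SIZE */
    unsigned char buf[BUF_SIZE];   /* ring memory, allocated once and reused */
};

/* Allocator hook so the failure path can be exercised (hypothetical). */
typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, standing in for memory pressure. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Process at most `budget` descriptors; never dereference a failed allocation. */
static int rx_poll(struct rx_desc *ring, int *head, int budget,
                   alloc_fn alloc, int *dropped)
{
    int rx = 0;

    assert(ring != NULL && head != NULL && dropped != NULL);

    while (rx < budget) {
        struct rx_desc *d = &ring[*head];
        void *copy;

        if (!d->owned_by_cpu)
            break;                      /* nothing more to receive */

        copy = alloc(d->len);
        if (copy) {
            memcpy(copy, d->buf, d->len);   /* copy out of the ring */
            free(copy);                     /* stand-in for passing it upstream */
        } else {
            (*dropped)++;                   /* count the drop, keep going */
        }

        d->owned_by_cpu = 0;                /* hand the descriptor back */
        *head = (*head + 1) % RING_SIZE;
        rx++;
    }
    return rx;
}
```

Because the ring buffer is only ever copied from, the ring memory itself is never handed to code that might free it, which is the same concern the build_skb() replacement in this series addresses.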
    • J
      net: moxa: synchronize DMA memory · 777fbc31
      Authored by Jonas Jensen
      DMA memory should be synchronized before data is passed
      to/from controller.
      
      Add dma_sync_single_for_cpu(.., DMA_FROM_DEVICE) to RX path
      and dma_sync_single_for_device(.., DMA_TO_DEVICE) to TX path.
      
      Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      777fbc31
    • J
      net: moxa: replace build_skb() with netdev_alloc_skb_ip_align() / memcpy() · 9fe1b3bc
      Authored by Jonas Jensen
      build_skb() is used to make skbs out of existing RX ring memory,
      which is bad because the RX ring is allocated only once, on probe.
      Memory corruption occurs because said memory is reclaimed, i.e.
      __kfree_skb() (and eventually put_page()).
      
      Replace build_skb() with netdev_alloc_skb_ip_align() and use memcpy().
      
      Remove SKB_DATA_ALIGN() from RX buffer size while we're at it.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=69041
      
      Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fe1b3bc
    • J
      net: moxa: clear DESC1 on ndo_start_xmit() · b853f319
      Authored by Jonas Jensen
      TX buffer length is not cleared on ndo_start_xmit().
      Failing to do so can bug/hang the controller and
      cause TX interrupts to stop altogether.
      
      Remove the readl() and compute a new value for DESC1.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=69031
      
      Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b853f319
    • M
      net: fix checksum features handling in netif_skb_features() · db115037
      Authored by Michal Kubeček
      This is a follow-up to
      
        da08143b ("vlan: more careful checksum features handling")
      
      which introduced more careful feature intersection in vlan code,
      taking into account that HW_CSUM should be considered a superset
      of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
      in order to avoid an offloading mismatch warning when a vlan is
      created on top of a bond consisting of slaves supporting IP/IPv6
      checksumming but not vlan Tx offloading.
      
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db115037
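The superset rule mentioned in this commit can be sketched as a small feature-intersection function. The flag names and bit values below (`F_IP_CSUM`, `F_IPV6_CSUM`, `F_HW_CSUM`) and the function `intersect_features` are hypothetical stand-ins, not the kernel's NETIF_F_* definitions; the sketch only shows why a plain bitwise AND loses information when one side advertises generic checksumming.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the kernel's NETIF_F_* bits differ. */
#define F_IP_CSUM    (1u << 0)   /* can checksum TCP/UDP over IPv4 */
#define F_IPV6_CSUM  (1u << 1)   /* can checksum TCP/UDP over IPv6 */
#define F_HW_CSUM    (1u << 2)   /* can checksum any protocol */
#define F_PROTO_CSUM (F_IP_CSUM | F_IPV6_CSUM)

/*
 * Intersect two feature sets while treating F_HW_CSUM as a superset of
 * the protocol-specific checksum bits.
 */
static uint32_t intersect_features(uint32_t f1, uint32_t f2)
{
    /* Expand the generic bit into everything it implies. */
    if (f1 & F_HW_CSUM)
        f1 |= F_PROTO_CSUM;
    if (f2 & F_HW_CSUM)
        f2 |= F_PROTO_CSUM;

    f1 &= f2;

    /* If generic checksumming survived, the specific bits are redundant. */
    if (f1 & F_HW_CSUM)
        f1 &= ~F_PROTO_CSUM;

    /* Postcondition: generic and specific bits are never both set. */
    assert((f1 & F_HW_CSUM) == 0 || (f1 & F_PROTO_CSUM) == 0);
    return f1;
}
```

With a plain AND, `F_HW_CSUM & (F_IP_CSUM | F_IPV6_CSUM)` is 0, so a device stacked on lower devices that only offer IP/IPv6 checksumming would appear to have no checksum offload at all; the expansion step keeps the common subset.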
    • G
      stmmac: set ptp_clock to NULL while unregister · f95f4045
      Authored by Giuseppe CAVALLARO
      Properly set ptp_clock to NULL when unregistering PTP support.
      
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f95f4045
    • G
      stmmac: fix rx checksum programming · 978aded4
      Authored by Giuseppe CAVALLARO
      This patch fixes the programming of the IPC bit in the GMAC control
      register: it must be done after the core initialization, otherwise it
      will not have any effect.
      
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      978aded4
    • A
      net: prevent of emerging cross-namespace symlinks · 4c75431a
      Authored by Alexander Y. Fomichev
      Code manipulating sysfs symlinks on adjacent net_device(s)
      currently doesn't take into account that devices potentially
      belong to different namespaces.
      
      This patch tries to fix the issue as follows:
      - Check for net_ns before creating / deleting a symlink.
        For now only netdev_adjacent_rename_links and
        __netdev_adjacent_dev_remove are affected; afaics
        __netdev_adjacent_dev_insert implies both net_devs
        belong to the same namespace.
      - Drop all existing symlinks to / from all adj_devs before
        switching namespace and recreate them just after.
      
      Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c75431a
  2. 23 Aug, 2014 15 commits
    • G
      vxlan: fix incorrect initializer in union vxlan_addr · a45e92a5
      Authored by Gerhard Stenzel
      The first initializer in the following
      
              union vxlan_addr ipa = {
                  .sin.sin_addr.s_addr = tip,
                  .sa.sa_family = AF_INET,
              };
      
      is optimised away by the compiler, due to the second initializer,
      therefore initialising .sin.sin_addr.s_addr always to 0.
      As a result, netlink messages indicating an L3 miss never contain the
      missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
      The problem affects user space programs relying on an IP address being
      sent as part of a netlink message indicating an L3 miss.
      
      Changing
                  .sa.sa_family = AF_INET,
      to
                  .sin.sin_family = AF_INET,
      fixes the problem.
      
      Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a45e92a5
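The corrected form of the initializer can be reproduced in user space. The sketch below rebuilds a union with the same shape from standard socket types; `union addr_sketch` and `make_ipv4_addr` are hypothetical names, not the kernel's definitions. It shows only the fixed variant, where both designators go through the same union member so neither can displace the other.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Same shape as the union in the commit, rebuilt from user-space types. */
union addr_sketch {
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
    struct sockaddr     sa;
};

/* Build an IPv4 address value using only the .sin member in the initializer. */
static union addr_sketch make_ipv4_addr(in_addr_t tip)
{
    union addr_sketch ipa = {
        .sin.sin_addr.s_addr = tip,
        .sin.sin_family = AF_INET,   /* same member as the line above */
    };

    /* The generic view overlays the same family field. */
    assert(ipa.sa.sa_family == AF_INET);
    return ipa;
}
```

Initializing through `.sa` after `.sin` names a different union member, and a compiler may then treat the earlier `.sin` initializer as overridden, which matches the behaviour reported for GCC 4.8 and 4.9 in the commit.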
    • L
      Merge tag 'pwm/for-3.17-rc2' of... · 451fd722
      Authored by Linus Torvalds
      Merge tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm fix from Thierry Reding:
       "Just one bugfix for the PWM lookup table code that would cause a PWM
        channel to be set to the wrong period and polarity for non-perfect
        matches"
      
      * tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: Fix period and polarity in pwm_get() for non-perfect matches
      451fd722
    • M
      mac80211: fix channel switch for chanctx-based drivers · 47e4df94
      Authored by Michal Kazior
      The new_ctx pointer is set only for non-chanctx drivers.  This yielded a
      crash for chanctx-based drivers during channel switch finalization:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
        IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
      
      Use an adequate chanctx pointer to fix this.
      
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      47e4df94
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 433ab34d
      Authored by Linus Torvalds
      Pull networking fixes from David Miller:
       "Here are some bug fixes that have piled up during ksummit/linuxcon.
      
         1) Fix endian problems in ibmveth, from Anton Blanchard.
      
         2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from
            Benjamin Block.
      
         3) SCTP association fixes from Daniel Borkmann.
      
         4) When multiple VLAN headers are present we have to make sure the
            second and subsequent ones are pullable in the SKB otherwise we
            blindly dereference garbage.  From Jiri Benc.
      
         5) The argument adjustment of the signature of hlist_add_after*()
            introduced a regression in the batman-adv code, fix from Sven
            Eckelmann.
      
         6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai
            Jain.
      
         7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg.
      
         8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from
            Chas Williams.
      
         9) Truncate packets larger than the TPACKET_V3 format configured
            buffers, otherwise we overwrite past the end of said buffers.
            From Eric Dumazet.
      
        10) Fix endianness bugs in qlcnic firmware handling, from Rajesh
            Borundia and Shahed Shaikh.
      
        11) CXGB4 sometimes doesn't get all of the TX completion events it
            should resulting in SKBs getting stuck in the TX queue, from
            Hariprasad Shenai.
      
        12) When the FEC chip's PTP clock is disabled, you can't access the
            register.  Add necessary checks to avoid the resulting hang, from
            Fugang Duan"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
        drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef
        net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
        net: sctp: spare unnecessary comparison in sctp_trans_elect_best
        net: ethernet: broadcom: bnx2x: Remove redundant #ifdef
        ibmveth: Fix endian issues with rx_no_buffer statistic
        net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings()
        openvswitch: fix panic with multiple vlan headers
        net: ipv6: fib: don't sleep inside atomic lock
        net: fec: ptp: avoid register access when ipg clock is disabled
        cxgb4: Free completed tx skbs promptly
        cxgb4: Fix race condition in cleanup
        sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe
        bnx2x: Revert UNDI flushing mechanism
        qlcnic: Fix endianess issue in firmware load from file operation
        qlcnic: Fix endianess issue in FW dump template header
        qlcnic: Fix flash access interface to application
        MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver
        macvlan: Allow setting multicast filter on all macvlan types
        packet: handle too big packets for PACKET_V3
        MAINTAINERS: add entry for ec_bhf driver
        ...
      433ab34d
    • R
      drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef · faaa5524
      Authored by Rasmus Villemoes
      Test for definedness of the macro which is actually defined (the
      change is hard to see: it is s/SSS/SSA/).
      
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faaa5524
    • D
      net: sctp: fix suboptimal edge-case on non-active active/retrans path selection · aa4a83ee
      Authored by Daniel Borkmann
      In SCTP, selection of active (T.ACT) and retransmission (T.RET)
      transports is being done whenever transport control operations
      (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
      
      Commits 4c47af4d ("net: sctp: rework multihoming retransmission
      path selection to rfc4960") and a7288c4d ("net: sctp: improve
      sctp_select_active_and_retran_path selection") have both improved
      it towards a more fine-grained and optimal path selection.
      
      Currently, the selection algorithm for T.ACT and T.RET is as follows:
      
      1) Elect the two most recently used ACTIVE transports T1, T2 for
         T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
      2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
         T.ACT<-T.PRI and T.RET<-T1
      3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
      4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
         T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
      
      Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
      T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
      still slightly suboptimal as it can lead to the following scenario:
      
      Setup:
              <A>                                <B>
          T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
          T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
      
          net.sctp.rto_min = 1000
          net.sctp.path_max_retrans = 2
          net.sctp.pf_retrans = 0
          net.sctp.hb_interval = 1000
      
      T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
      link flapping). Here, the first time transmission is sent over PF path
      T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
      are sent over the INACTIVE path T1 (T.PRI), which is not good.
      
      After the patch, it's choosing better transports in both cases by
      modifying step 4):
      
      4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
         the most recently used (if avail) in PF, set T.RET<-T.ACT_new
      
      This will still select a best possible path in PF if available (which
      can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
      
      In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
      as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
      for a very short while before going back ACTIVE, it will guarantee that
      this path will be reselected for T.ACT/T.RET since T3 (PF) is not
      available.
      
      Previously, this was not possible, as we would only select between T.PRI
      and T.RET, and a possible T3 would be NULL due to the fact that we have
      just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
      and would select a suboptimal path when T.PRI/T.RET have worse properties.
      
      In the case that T.ACT_old permanently went to INACTIVE during this
      transition and there's no PF path available, plus T.PRI and T.RET are
      INACTIVE as well, we would now camp on T.ACT_old, but if everything is
      being INACTIVE there's really not much we can do except hoping for a
      successful HB to bring one of the transports back up again and, thus
      cause a new selection through sctp_assoc_control_transport().
      
      Now both tests work fine:
      
      Case 1:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(PF)
      
       3. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       5. T1 S(PF) T.ACT, T.RET
          T2 S(INACTIVE)
      
      [ 5.1 T1 S(INACTIVE) T.ACT, T.RET
            T2 S(INACTIVE) ]
      
       6. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
      Case 2:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(PF)
          T2 S(ACTIVE) T.ACT, T.RET
      
       3. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       5. T1 S(INACTIVE)
          T2 S(PF) T.ACT, T.RET
      
      [ 5.1 T1 S(INACTIVE)
            T2 S(INACTIVE) T.ACT, T.RET ]
      
       6. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa4a83ee
    • D
      net: sctp: spare unnecessary comparison in sctp_trans_elect_best · ea4f19c1
      Authored by Daniel Borkmann
      When both transports are the same, we don't have to go down that
      road only to realize that we will return the very same transport.
      We are guaranteed that curr is always non-NULL. Therefore, just
      short-circuit this special case.
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea4f19c1
    • R
      net: ethernet: broadcom: bnx2x: Remove redundant #ifdef · 7d149c52
      Authored by Rasmus Villemoes
      Nothing defines _ASM_GENERIC_INT_L64_H, it is a weird way to check for
      64 bit longs, and u64 should be printed using %llx anyway.
      
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d149c52
    • A
      ibmveth: Fix endian issues with rx_no_buffer statistic · cbd52281
      Authored by Anton Blanchard
      Hidden away in the last 8 bytes of the buffer_list page is a solitary
      statistic. It needs to be byte swapped or else ethtool -S will
      produce numbers that terrify the user.
      
      Since we do this in multiple places, create a helper function with a
      comment explaining what is going on.
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbd52281
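A helper of the kind this commit describes might look like the following user-space sketch. It assumes the statistic is stored big-endian in the final 8 bytes of a 4096-byte page; the name `read_rx_no_buffer` and the `PAGE_BYTES` constant are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed size of the buffer_list page */

/*
 * Decode the counter kept in the last 8 bytes of the page. Assembling it
 * byte by byte gives the same result on little- and big-endian hosts, so
 * no separate byte swap is needed.
 */
static uint64_t read_rx_no_buffer(const uint8_t *page)
{
    const uint8_t *p;
    uint64_t v = 0;
    size_t i;

    assert(page != NULL);
    p = page + PAGE_BYTES - 8;
    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];    /* most significant byte first */
    return v;
}
```

Keeping the decode in one function means every reader of the statistic gets the same byte order handling, which is the reason the commit gives for introducing a helper.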
    • I
      net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings() · c10e4caf
      Authored by Iyappan Subramanian
      A NULL pointer dereference is possible for the argument ring->buf_pool
      which is passed to xgene_enet_free_desc_ring(), as ring could be NULL.
      
      And now since NULL pointers are being checked for before the calls to
      xgene_enet_free_desc_ring(), might as well take advantage of them and
      not call the function if the argument would be NULL.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c10e4caf
    • J
      openvswitch: fix panic with multiple vlan headers · 2ba5af42
      Authored by Jiri Benc
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may be still non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
      Reported-by: Yulong Pei <ypei@redhat.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ba5af42
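The general hazard in the openvswitch commit above is reading header bytes that are not guaranteed to be in the linear part of the buffer. The sketch below is a simplified userspace model, not the openvswitch code: it treats the frame as a flat byte array in which only the first `linear_len` bytes may be read, keeps every tag in the frame data (in the scenario above the outer tag has already been moved into vlan_tci), and uses a hypothetical function `count_vlan_tags`. Each tag is counted only after checking that its TCI and the following EtherType are readable.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN    14     /* Ethernet header length */
#define VLAN_HLEN   4      /* TPID + TCI */
#define ETH_P_8021Q 0x8100 /* 802.1Q tag protocol identifier */

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Count stacked 802.1Q tags, touching only the first linear_len bytes.
 * Returns -1 if a header would extend past the readable region.
 */
static int count_vlan_tags(const uint8_t *frame, size_t linear_len)
{
	size_t off = ETH_HLEN - 2; /* offset of the EtherType field */
	int tags = 0;

	if (linear_len < ETH_HLEN)
		return -1;

	while (read_be16(frame + off) == ETH_P_8021Q) {
		/* The TCI and the next EtherType must both be readable. */
		if (linear_len < off + 2 + VLAN_HLEN)
			return -1;
		off += VLAN_HLEN;
		tags++;
	}
	return tags;
}
```

In the kernel the equivalent step would be making the bytes linear before reading them, rather than returning an error.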
    • B
      net: ipv6: fib: don't sleep inside atomic lock · 793c3b40
      Benjamin Block committed
      The function fib6_commit_metrics() allocates a piece of memory in mode
      GFP_KERNEL while holding an atomic lock from higher up in the stack, in
      the function __ip6_ins_rt(). This produces the following BUG:
      
      > BUG: sleeping function called from invalid context at mm/slub.c:1250
      > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
      > 2 locks held by dhcpcd/2909:
      >  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20
      >  #1:  (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800
      > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
      > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
      >  0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
      >  ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
      >  0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
      > Call Trace:
      >  [<ffffffff81af135a>] dump_stack+0x4e/0x68
      >  [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120
      >  [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190
      >  [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80
      >  [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800
      >  [<ffffffff81a69535>] ip6_route_add+0x675/0x800
      >  [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800
      >  [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80
      >  [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20
      >  [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0
      >  [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40
      >  [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180
      >  [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770
      >  [<ffffffff81103735>] ? local_clock+0x25/0x30
      >  [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90
      >  [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0
      >  [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0
      >  [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0
      >  [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220
      >  [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10
      >  [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0
      >  [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150
      >  [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150
      >  [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230
      >  [<ffffffff811f989a>] ? might_fault+0x5a/0xb0
      >  [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0
      >  [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80
      >  [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20
      >  [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b
      
      Fix this by replacing the mode GFP_KERNEL with GFP_ATOMIC.
      Signed-off-by: Benjamin Block <bebl@mageta.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      793c3b40
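The rule behind the ipv6 fib commit above can be modelled in userspace: an allocation that may sleep is invalid while an atomic lock is held, and a non-sleeping allocation is valid in both contexts. Everything in this sketch is a hypothetical stand-in (`ALLOC_MAY_SLEEP` and `ALLOC_ATOMIC` for GFP_KERNEL and GFP_ATOMIC, `model_lock`/`model_unlock` for the lock); it only records the violation that the kernel would report as the BUG shown above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC (never sleeps). */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

static int atomic_depth; /* > 0 while a modelled atomic lock is held */

static void model_lock(void)   { atomic_depth++; }
static void model_unlock(void) { atomic_depth--; }

/*
 * Allocate zeroed memory and report through *violation whether the
 * request broke the rule "no sleeping allocation in atomic context".
 */
static void *model_alloc(size_t size, enum alloc_mode mode, bool *violation)
{
	*violation = (mode == ALLOC_MAY_SLEEP && atomic_depth > 0);
	return calloc(1, size);
}
```

The trade-off in the real kernel is that an atomic allocation cannot wait for memory to be reclaimed, so it is more likely to fail and the caller must handle a NULL return.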
    • N
      net: fec: ptp: avoid register access when ipg clock is disabled · 91c0d987
      Nimrod Andy committed
      The current kernel hangs on i.MX6SX when the rootfs is mounted from MMC.
      The root cause is that ptp uses a periodic timer to access enet registers
      even when the ipg clock is disabled.
      
      The FEC ptp driver starts a periodic timer to read the 1588 counter register in
      the ptp init function, which is called after the FEC driver is probed.
      
      To save power, once the FEC probe finishes, the FEC driver disables all clocks,
      including the ipg clock that is needed for register access.
      
      On i.MX5x and i.MX6q/dl/sl, FEC register access with the ipg clock disabled does
      not hang the system; it just returns zero. On the i.MX6sx SoC it hangs the system.
      
      To avoid the issue, check the ptp clock status before the ptp timer accesses the counter.
      Signed-off-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91c0d987
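The check described in the fec ptp commit above, as a small userspace model. The struct, its fields and `ptp_timer_tick` are hypothetical, and a plain variable stands in for the 1588 counter register. The timer callback samples the counter only while the clock flag is set. A real driver would also need to serialise the flag against the code that enables and disables the clock, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ptp state touched by the periodic timer. */
struct ptp_model {
	bool clk_on;          /* true while the register clock is enabled */
	uint32_t counter_reg; /* stands in for the 1588 counter register */
	uint64_t last_sample; /* last value read by the timer */
	unsigned int skipped; /* ticks that arrived with the clock off */
};

/* Periodic timer callback: never touch the register with the clock off. */
static void ptp_timer_tick(struct ptp_model *p)
{
	if (!p->clk_on) {
		p->skipped++;
		return;
	}
	p->last_sample = p->counter_reg;
}
```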
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 26d189b8
      Linus Torvalds committed
      Pull arm64 fixes from Will Deacon:
       "This small set of fixes addresses a few issues introduced during the
        merge window, including:
      
         - fix typo in I-cache detection that was causing us to treat all
           I-caches as aliasing
         - hook up memfd_create and getrandom syscalls for native and compat
         - revert a temporary hack for defconfig builds in -next (the audit
           tree changes didn't make it in this merge window)
         - a couple of UEFI fixes for TEXT_OFFSET fuzzing and /memreserve/
         - a simple sparsemem fix for 48-bit physical addressing
         - small defconfig updates to get autotesters working with X-gene"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        Revert "arm64: Do not invoke audit_syscall_* functions if !CONFIG_AUDIT_SYSCALL"
        arm64: mm: update max pa bits to 48
        arm64: ignore DT memreserve entries when booting in UEFI mode
        arm64: configs: Enable X-Gene SATA and ethernet in defconfig
        arm64: align randomized TEXT_OFFSET on 4 kB boundary
        asm-generic: add memfd_create system call to unistd.h
        arm64: compat: wire up memfd_create and getrandom syscalls for aarch32
        arm64: fix typo in I-cache policy detection
      26d189b8
    • L
      Merge tag 'iommu-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 1ae45cf0
      Linus Torvalds committed
      Pull IOMMU fixes from Joerg Roedel:
       "The fixes include:
      
         - fix a crash in the VT-d driver when devices with a driver attached
           are hot-unplugged
      
         - fix an AMD IOMMU driver crash with device assignment of 32 bit PCI
           devices to KVM guests
      
         - fix for a copy&paste error in generic IOMMU code.  Now the right
           function pointer is checked before calling"
      
      * tag 'iommu-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/core: Check for the right function pointer in iommu_map()
        iommu/amd: Fix cleanup_domain for mass device removal
        iommu/vt-d: Defer domain removal if device is assigned to a driver
      1ae45cf0
  3. 22 Aug, 2014 6 commits