1. 15 Mar 2016: 17 commits
    • net: Fix use after free in the recvmmsg exit path · 34b88a68
      Committed by Arnaldo Carvalho de Melo
      The syzkaller fuzzer hit the following use-after-free:
      
        Call Trace:
         [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
         [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
         [<     inline     >] SYSC_recvmmsg net/socket.c:2281
         [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
         [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
        arch/x86/entry/entry_64.S:185
      
      And, as Dmitry rightly assessed, that is because we can drop the
      reference and then touch it when the underlying recvmsg calls return
      some packets and then hit an error, which makes recvmmsg set
      sock->sk->sk_err. Oops; fix it.
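      The ordering problem can be illustrated with a userspace model. This is a minimal sketch, assuming a hypothetical refcounted `struct fake_sock` in place of the kernel's `struct socket`/`struct sock`; it is not the actual patch and ignores details such as special handling of particular error codes. It only shows the rule the fix restores: write `sk_err` while the reference is still held, and never touch the object after the final put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct socket/struct sock (not kernel types). */
struct fake_sock {
    int refcnt;
    int sk_err;
};

/* Records sk_err at the moment the object is freed, for illustration only. */
static int err_seen_at_free;

static struct fake_sock *fake_sock_new(void)
{
    struct fake_sock *sk = calloc(1, sizeof(*sk));
    if (sk)
        sk->refcnt = 1;
    return sk;
}

static void fake_sock_put(struct fake_sock *sk)
{
    if (--sk->refcnt == 0) {
        err_seen_at_free = sk->sk_err;
        free(sk);
    }
}

/*
 * Simplified recvmmsg exit path: if some datagrams were received before an
 * error, return the count and stash the error in sk_err so that a later
 * call can report it. sk is dereferenced only while the reference is held.
 */
static int fake_recvmmsg_exit(struct fake_sock *sk, int datagrams, int err)
{
    int ret = err ? err : datagrams;

    if (err && datagrams > 0) {
        sk->sk_err = -err;   /* safe: the reference is still held */
        ret = datagrams;
    }
    fake_sock_put(sk);       /* may free sk; no access after this point */
    return ret;
}
```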
      Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Fixes: a2e27255 ("net: Introduce recvmmsg socket syscall")
      http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'thunderx-perf' · b6e40382
      Committed by David S. Miller
      Sunil Goutham says:
      
      ====================
      net: thunderx: Performance enhancement changes
      
      The patches below attempt to improve performance by reducing the
      number of atomic operations while allocating new receive buffers,
      and by reducing cache misses by adjusting nicvf structure elements.
      
      Changes from v1:
       No changes, resubmitting afresh as per David's suggestion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: Adjust nicvf structure to reduce cache misses · 1d368790
      Committed by Sunil Goutham
      Adjusted the nicvf structure so that all elements used in the hot
      path, like napi, xmit etc., fall into the same cache line. This reduced
      the number of cache misses and resulted in a ~2% increase in the number
      of packets handled on a core.
      
      Also changed elements declared with :1 (bit-field) notation to bool, to
      be consistent with other element definitions.
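      A minimal sketch of the layout idea, assuming a 64-byte cache line and a hypothetical `struct demo_nic` (the field names are illustrative, not nicvf's): hot-path fields are grouped first, cold fields start on the next cache line, and a compile-time check guards the grouping.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64  /* assumed cache-line size */

/* Hypothetical NIC private struct; field names are illustrative only. */
struct demo_nic {
    /* Hot path: touched for every packet, grouped at the start. */
    void     *napi;
    void     *netdev;
    uint32_t  rx_queues;
    uint32_t  tx_queues;
    bool      hw_tso;        /* bool instead of a `:1` bit-field */
    bool      loopback;

    /* Cold path: configuration and bookkeeping start on a new line. */
    alignas(DEMO_CACHE_LINE) char name[32];
    uint64_t  stats[16];
};

/* Compile-time check that the last hot field still lies in the first line. */
static_assert(offsetof(struct demo_nic, loopback) < DEMO_CACHE_LINE,
              "hot-path fields must share one cache line");
```

      Using bool instead of `:1` costs a byte per flag but avoids read-modify-write of a shared bit-field word.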
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: Set receive buffer page usage count in bulk · 5c2e26f6
      Committed by Sunil Goutham
      Instead of calling get_page() for every receive buffer carved out
      of a page, set the page's usage count at the end, to reduce the
      number of atomic calls.
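      A minimal sketch of the batching idea, using C11 atomics and a hypothetical `struct demo_page` in place of `struct page`. The commit does not state the exact count the driver adds (for example, how the allocation's initial reference is accounted for); this model simply adds one reference per carved buffer in a single atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the usage count matters here. */
struct demo_page {
    atomic_int refcount;
};

/*
 * Carve fixed-size receive buffers out of one page. Instead of one atomic
 * increment per buffer, count the buffers first and apply the total with a
 * single atomic add. Returns the number of buffers carved.
 */
static int demo_carve_rx_buffers(struct demo_page *pg, size_t page_size,
                                 size_t buf_len)
{
    int carved = 0;

    if (buf_len == 0)
        return 0;
    for (size_t off = 0; off + buf_len <= page_size; off += buf_len)
        carved++;   /* buffer at offset `off` would be posted to the ring here */

    if (carved > 0)
        atomic_fetch_add(&pg->refcount, carved);   /* one atomic op in total */
    return carved;
}
```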
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make sure IPv6 header fits in skb headroom · 9bd160bf
      Committed by Richard Alpe
      Expand headroom further in order to fit the larger IPv6 header.
      Prior to this patch, this caused an skb_under_panic for certain
      tipc packets when using IPv6 UDP bearer(s).
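      The arithmetic behind the fix, as a minimal sketch: an IPv6 header is 40 bytes versus 20 for an IPv4 header without options, so headroom sized for IPv4 leaves an IPv6 UDP bearer 20 bytes short. The helper below is hypothetical and counts only Ethernet, IP and UDP headers; the headroom TIPC actually reserves includes other components not described in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed header sizes in bytes (no IPv4 options, no IPv6 extension headers). */
enum {
    DEMO_ETH_HLEN  = 14,
    DEMO_IPV4_HLEN = 20,
    DEMO_IPV6_HLEN = 40,
    DEMO_UDP_HLEN  = 8,
};

/* Headroom needed in front of the payload for an Ethernet/IP/UDP bearer. */
static size_t demo_udp_bearer_headroom(int ipv6)
{
    return DEMO_ETH_HLEN + (ipv6 ? DEMO_IPV6_HLEN : DEMO_IPV4_HLEN) +
           DEMO_UDP_HLEN;
}
```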
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvneta-hwbm' · c9214f50
      Committed by David S. Miller
      Gregory CLEMENT says:
      
      ====================
      API set for HW Buffer management
      
      This is the sixth version of the API set for HW Buffer management (that was
      initially submitted here:
      http://thread.gmane.org/gmane.linux.kernel/2125152).
      
      This version is just a rebase onto the latest net-next. I also added
      the Tested-by flag from Sebastian Careba: "The patch set applies
      successfully and it works well, no more Samba issues any longer".
      
      For the record, in the previous versions I made the following changes:
      v4 -> v5:
      - Added a field holding the buffer size of the pool. This allowed
        fixing some misused sizes in the mvneta_bm code when using the new
        framework.
      
      - Added a new patch from Marcin for sram allowing non-bufferable
        access to the memory to be requested. It was needed for the hardware
        buffer management of the mvneta.
      
      - Fixed the build issue reported by the 0-day builder when building the
        drivers as modules.
      
      v3 -> v4:
      - Fix build issue when HWBM is not selected
      
      v2 -> v3:
      - Make a HWBM and a SWBM version of the mvneta_rx() function in order
        to reduce the conditional code. Kept a condition inside
        mvneta_poll() because specializing this function would have meant
        duplicating 95% of the code.
      
      - Put back the register_netdev() call at the end of the mvneta_probe()
        function. In order to have a unique ID for each port, just used a
        global variable in the driver.
      
      - Added a fix from Marcin in the "net: mvneta: bm: add support for
        hardware buffer management" patch: "when dropping packets, only
        buffer pointers passed from BM to descriptors have to be returned to
        the pool. In submitted version after closing the port and
        mvneta_rxq_deinit(), it was very likely that a lot of fake buffers
        are added to the pool, because all descriptors took part in
        iteration."
      
      - Removed the select MVNETA_BM from the Kconfig, leaving users the
        choice of whether to use it.
      
      v1 -> v2:
      - The hardware buffer management helpers are no longer built by default
        and now depend on a hidden config symbol which has to be selected
        by the driver if needed
      - The hwbm_pool_refill() and hwbm_pool_add() now receive a gfp_t as
        argument allowing the caller to specify the flag it needs.
      - buf_num is now tested to ensure there is no wrapping
      - A spinlock has been added to protect the hwbm_pool_add() function in
        SMP or irq context.
      - used pr_warn instead of pr_debug in case of errors.
      - Fixed the mvneta implementation by returning the buffer to the pool
        at various places instead of ignoring it.
      - Squashed "bus: mvebu-mbus: Fix size test for
        mvebu_mbus_get_dram_win_info" into "bus: mvebu-mbus: provide api for
        obtaining IO and DRAM window information".
      - Added my Signed-off-by on all the patches as submitter of the series.
      - Renamed the dts patches with the pattern "ARM: dts: platform:"
      - Removed the patch "ARM: mvebu: enable SRAM support in
        mvebu_v7_defconfig" from this series; I have already applied it.
      - Modified the order of the patches.
      
      To ease testing, the branch mvneta-BM-framework-v6 is
      available at git@github.com:MISL-EBU-System-SW/mainline-public.git.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Use the new hwbm framework · baa11ebc
      Committed by Gregory CLEMENT
      Now that the hardware buffer management framework has been introduced,
      let's use it.
      Tested-by: Sebastian Careba <nitroshift@yahoo.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a hardware buffer management helper API · 8cb2d8bf
      Committed by Gregory CLEMENT
      This basic implementation allows code to be shared between drivers using
      hardware buffer management. As the code is hardware agnostic, there are
      few helpers; most of the optimization brought by HW BM has to be
      done at the driver level.
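      Two of the v1 -> v2 changes in the cover letter for this series (buf_num tested against wrapping, and a spinlock protecting hwbm_pool_add()) can be modelled in a few lines. This is a minimal sketch, assuming a hypothetical `struct demo_bm_pool` and a C11 `atomic_flag` spinlock; it is not the kernel hwbm API and does not allocate real buffers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical software model of a buffer pool; not the kernel hwbm API. */
struct demo_bm_pool {
    atomic_flag lock;   /* simple spinlock */
    int size;           /* capacity of the pool */
    int buf_num;        /* buffers currently in the pool */
};

static void demo_pool_lock(struct demo_bm_pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_pool_unlock(struct demo_bm_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/*
 * Add `n` buffers under the lock. The request is refused as a whole if
 * buf_num + n would exceed the capacity; the comparison is written as
 * buf_num <= size - n so it cannot overflow. Returns the number added.
 */
static int demo_pool_add(struct demo_bm_pool *p, int n)
{
    int added = 0;

    demo_pool_lock(p);
    if (n > 0 && p->buf_num <= p->size - n) {
        p->buf_num += n;   /* a real pool would also allocate and hand buffers to HW */
        added = n;
    }
    demo_pool_unlock(p);
    return added;
}
```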
      Tested-by: Sebastian Careba <nitroshift@yahoo.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: bm: add support for hardware buffer management · dc35a10f
      Committed by Marcin Wojtas
      Buffer manager (BM) is a dedicated hardware unit that can be used by all
      ethernet ports of Armada XP and 38x SoCs. It offloads the CPU on the RX
      path by sparing DRAM accesses when refilling the buffer pool, by
      hardware-based filling of descriptor ring data, and by better memory
      utilization thanks to HW arbitration for using 'short' pools for small
      packets.
      
      Tests performed with an A388 SoC working as a network bridge between two
      packet generators showed an increase of ~20k in the maximum number of
      processed 64B packets (~555k packets with BM enabled vs ~535k packets
      without BM). Also, when pushing 1500B packets at line rate, CPU load
      decreased from around 25% without BM to 20% with BM.
      
      BM comprises up to 4 buffer pointer (BP) rings kept in DRAM, which
      are called external BP pools (BPPE). Allocating and releasing buffer
      pointers (BP) to/from BPPE is performed indirectly by write/read access
      to a dedicated internal SRAM, where internal BP pools (BPPI) are placed.
      BM hardware controls the status of BPPE automatically, as well as
      assigning proper buffers to RX descriptors. For more details please refer
      to the Functional Specification of the Armada XP or 38x SoC.
      
      In order to enable support for a separate hardware block, common for all
      ports, a new driver has to be implemented ('mvneta_bm'). It provides the
      initialization sequence of address space, clocks, registers, SRAM and
      empty pools' structures, and also obtains optional configuration
      from DT (please refer to the device tree binding documentation).
      mvneta_bm also exposes the necessary API to the mvneta driver, as well as
      a dedicated structure with BM information (bm_priv), whose presence is
      used as a flag indicating that a port uses BM. It has to be ensured that
      the mvneta_bm probe is executed before the probes of the ports' driver.
      In case BM is not used or its probe fails, mvneta falls back to software
      buffer management.
      
      The sequence executed in the mvneta_probe function is modified in order
      to have access to the needed resources before a port's possible BM
      initialization is done. According to the port-pools mapping provided by
      DT, the appropriate registers are configured and the buffer pools are
      filled. The RX path is modified accordingly. Because the hardware allows
      a wide variety of configuration options, the following assumptions are
      made:
      * use of BM mechanisms can be selectively disabled/enabled among the
        ports based on DT configuration
      * the 'long' pool's single buffer size is tied to the port's MTU
      * use of a 'long' pool by a port is obligatory and it cannot be shared
      * use of a 'short' pool for smaller packets is optional
      * one 'short' pool can be shared among all ports
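      The pool assumptions can be expressed as a small selection policy. This is a minimal sketch, assuming a hypothetical `struct demo_port_pools`; the commit attributes the actual arbitration to the BM hardware, so the function below only models which pool a packet of a given length would land in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port pool configuration following the listed assumptions. */
struct demo_port_pools {
    int    long_pool;       /* mandatory and owned by this port */
    int    short_pool;      /* optional, may be shared; -1 if unused */
    size_t short_buf_size;  /* buffer size of the short pool */
};

/* Pool that a received packet of `len` bytes would be placed in. */
static int demo_select_pool(const struct demo_port_pools *pp, size_t len)
{
    if (pp->short_pool >= 0 && len <= pp->short_buf_size)
        return pp->short_pool;
    return pp->long_pool;
}
```

      With one shared short pool, ports 0 and 1 below both place small packets in pool 3, while port 2, configured without a short pool, always uses its own long pool.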
      
      This commit enables hardware buffer management operation cooperating with
      existing mvneta driver. New device tree binding documentation is added and
      the one of mvneta is updated accordingly.
      
      [gregory.clement@free-electrons.com: removed the suspend/resume part]
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bus: mvebu-mbus: provide api for obtaining IO and DRAM window information · f2900ace
      Committed by Marcin Wojtas
      This commit enables finding the appropriate mbus window for a given
      physical address and obtaining its target id and attribute, in two
      separate routines for IO and DRAM windows respectively. This
      functionality is needed for the Armada XP/38x Network Controller's
      Buffer Manager and PnC configuration.
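      The lookup described above amounts to a range search over the window table. This is a minimal sketch, assuming a hypothetical `struct demo_mbus_win` table; the real routines read the windows from hardware registers and their exact signatures are not given here. The containment test is written as `phys - base < size`, which cannot overflow even for a window ending at the top of the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window descriptor; the real driver reads these from registers. */
struct demo_mbus_win {
    uint64_t base;
    uint64_t size;
    uint8_t  target;
    uint8_t  attr;
};

/* Find the window containing `phys` and report its target id and attribute.
 * Returns 0 on success, -1 if no window covers the address. */
static int demo_mbus_win_info(const struct demo_mbus_win *wins, size_t n,
                              uint64_t phys, uint8_t *target, uint8_t *attr)
{
    for (size_t i = 0; i < n; i++) {
        if (phys >= wins[i].base && phys - wins[i].base < wins[i].size) {
            *target = wins[i].target;
            *attr = wins[i].attr;
            return 0;
        }
    }
    return -1;
}
```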
      
      [gregory.clement@free-electrons.com: Fix size test for
      mvebu_mbus_get_dram_win_info]
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      [DRAM window information reference in LKv3.10]
      Signed-off-by: Evan Wang <xswang@marvell.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: armada-xp-openblocks-ax3-4: Add BM support · 293fdc24
      Committed by Gregory CLEMENT
      Allow the OpenBlocks AX3 to use hardware buffer management with mvneta.
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: armada-xp: enable buffer manager support on Armada XP boards · 9dd7a57e
      Committed by Marcin Wojtas
      Since the mvneta driver supports hardware buffer management (BM), board
      files have to be adjusted accordingly in order to use it. This commit
      enables BM on AXP-DB and AXP-GP in the same manner: because the number of
      ports on those boards is the same as the number of possible pools, each
      port is supposed to use a single pool for all kinds of packets.
      
      Moreover, an appropriate entry is added to the 'soc' node ranges, as well
      as "okay" status for the 'bm' and 'bm-bppi' (internal SRAM) nodes.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: armada-xp: add buffer manager nodes · ebae1376
      Committed by Marcin Wojtas
      The Armada XP network controller supports hardware buffer management (BM).
      Since it is now enabled in the mvneta driver, the appropriate nodes can be
      added to armada-xp.dtsi: one for the common BM unit (bm@c0000) and one for
      its internal SRAM (bm-bppi), which is used for indirect access to the
      buffer pointer rings residing in DRAM.
      
      The pool-to-port mapping, the bm-bppi entry in the 'soc' node's ranges and
      optional parameters are supposed to be set in board files.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: armada-38x: enable buffer manager support on Armada 38x boards · c49e99c2
      Committed by Marcin Wojtas
      Since the mvneta driver supports hardware buffer management (BM), board
      files have to be adjusted accordingly in order to use it. This commit
      enables BM on:
      * A385-DB-AP - each port has its own pool for long packets and a common
        pool for short packets,
      * A388-ClearFog - same as above,
      * A388-DB - unique 'short' and 'long' pools are mapped to each port,
      * A388-GP - same as above.
      
      Moreover, an appropriate entry is added to the 'soc' node ranges, as well
      as "okay" status for the 'bm' and 'bm-bppi' (internal SRAM) nodes.
      
      [gregory.clement@free-electrons.com: add support for the ClearFog board]
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: armada-38x: add buffer manager nodes · 4a547a5a
      Committed by Marcin Wojtas
      The Armada 38x network controller supports hardware buffer management (BM).
      Since it is now enabled in the mvneta driver, the appropriate nodes can be
      added to armada-38x.dtsi: one for the common BM unit (bm@c8000) and one for
      its internal SRAM (bm-bppi), which is used for indirect access to the
      buffer pointer rings residing in DRAM.
      
      The pool-to-port mapping, the bm-bppi entry in the 'soc' node's ranges and
      optional parameters are supposed to be set in board files.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • misc: sram: add optional ioremap without write combining · eb43e023
      Committed by Marcin Wojtas
      Some SRAM users may require non-bufferable access to the memory, which is
      currently impossible because devm_ioremap_wc() is used for setting
      sram->virt_base.
      
      This commit adds an optional DT property, 'no-memory-wc', which allows
      choosing the remap method. Documentation is updated accordingly.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2016-03-14' of... · d3bf9b19
      Committed by David S. Miller
      Merge tag 'wireless-drivers-next-for-davem-2016-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers patches for 4.6
      
      Major changes:
      
      rtl8xxxu
      
      * add 8723bu support
      
      wl18xx
      
      * add radar_debug_mode debugfs file for DFS testing
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Mar 2016: 23 commits
    • Merge branch 'ipv4-ipv6-csums' · 20db778e
      Committed by David S. Miller
      Alexander Duyck says:
      
      ====================
      Fix differences between IPv4 and IPv6 TCP/UDP checksum calculation
      
      This patch series is meant to address the differences that exist between
      IPv4 and IPv6 in terms of checksum calculation.  Specifically the IPv6
      function csum_ipv6_magic treated length as a value that could be greater
      than 64K, while csum_tcpudp_magic was truncating the length at 16 bits.
      After looking over the code and giving it some thought I decided it would
      be best to update the IPv4 function so that it worked the same way the IPv6
      one did.  This allows us to get the same results given the same inputs for
      both functions.  As a result we can use the same processes to reverse the
      calculation in the event we need to do something like remove the length of
      the pseudo-header checksum.
      
      I also took the opportunity to standardize things so that the parameters
      for these functions all use the correct types.  IPv4 addresses are __be32,
      length should always be __u32, and protocol is a __u8.
      
      With this change in place it corrects an issue with UDP tunnels in which we
      were getting a checksum that was off by 1 when performing fragmentation on
      inner UDP packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • GSO/UDP: Use skb->len instead of udph->len to determine length of original skb · 08334824
      Committed by Alexander Duyck
      It is possible for tunnels to end up generating IP or IPv6 datagrams that
      are larger than 64K and expecting to be segmented.  As such we need to deal
      with length values greater than 64K.  In order to accommodate this we need
      to update the code to work with a 32b length value instead of a 16b one.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short · 1e940829
      Committed by Alexander Duyck
      This patch updates csum_ipv6_magic so that it correctly recognizes that
      the protocol is an unsigned 8-bit value.
      
      This will allow us to better understand what limitations may or may not be
      present in how we handle the data.  For example there are a number of
      places that call htonl on the protocol value.  This is likely not necessary
      and can be replaced with a multiplication by ntohl(1) which will be
      converted to a shift by the compiler.
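      The claim that htonl() on the protocol can be replaced by a multiplication with ntohl(1) holds because ntohl(1) is either 1 (big-endian host) or 1 << 24 (little-endian host), and an 8-bit value multiplied by either constant cannot overflow 32 bits. A minimal check using the POSIX byte-order functions; the two helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* 8-bit protocol number converted to network byte order the direct way. */
static uint32_t proto_htonl(uint8_t proto)
{
    return htonl(proto);
}

/* Same value via multiplication by the constant ntohl(1), which a compiler
 * can typically reduce to a shift (or to nothing on big-endian hosts). */
static uint32_t proto_mul(uint8_t proto)
{
    return (uint32_t)proto * ntohl(1);
}
```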
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Update parameters for csum_tcpudp_magic to their original types · 01cfbad7
      Committed by Alexander Duyck
      This patch updates all instances of csum_tcpudp_magic and
      csum_tcpudp_nofold to reflect the types that are usually used as the source
      inputs.  For example the protocol field is populated based on nexthdr which
      is actually an unsigned 8 bit value.  The length is usually populated based
      on skb->len which is an unsigned integer.
      
      This addresses an issue in which the IPv6 function csum_ipv6_magic was
      generating a checksum using the full 32b of skb->len while
      csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
      run into issues when attempting to adjust the checksum as there was no
      protocol agnostic way to update it.
      
      With this change the value is still truncated as many architectures use
      "(len + proto) << 8", however this truncation only occurs for values
      greater than 16776960 in length and as such is unlikely to occur as we stop
      the inner headers at ~64K in size.
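      Both points above can be checked numerically. This is a minimal sketch with hypothetical helper names, using host-order addresses and an uncomplemented 16-bit one's-complement sum (not the kernel's per-architecture code). Keeping the full 32-bit length changes the sum for lengths above 64K, and `(len + proto) << 8` in a 32-bit register only loses bits once len + proto exceeds 0xFFFFFF, which with proto = 255 is exactly len > 16776960.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold a wide one's-complement accumulator into 16 bits (end-around carry). */
static uint16_t demo_csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Uncomplemented pseudo-header sum over 16-bit words with a full 32-bit len. */
static uint16_t demo_pseudo_sum(uint32_t saddr, uint32_t daddr,
                                uint32_t len, uint8_t proto)
{
    uint64_t sum = 0;

    sum += (saddr >> 16) + (saddr & 0xffff);
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += (len >> 16) + (len & 0xffff);   /* upper half is not dropped */
    sum += proto;
    return demo_csum_fold(sum);
}

/* True if "(len + proto) << 8" computed in 32 bits loses no bits. */
static bool demo_shift_keeps(uint32_t len, uint8_t proto)
{
    uint32_t v = len + proto;
    return ((uint32_t)(v << 8) >> 8) == v;
}
```

      For example, with addresses 10.0.0.1 and 10.0.0.2 and proto 17, a length of 0x10000 sums to 0x1415, while the same length truncated to 16 bits sums to 0x1414.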
      
      I did have to make a few minor changes in the arm, mn10300, nios2, and
      score versions of the function in order to support these changes as they
      were either using things such as an OR to combine the protocol and length,
      or were using ntohs to convert the length which would have truncated the
      value.
      
      I also updated a few spots in terms of whitespace and type differences for
      the addresses.  Most of this was just to make sure all of the definitions
      were in sync going forward.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Don't do expensive useless work during inetdev destroy. · fbd40ea0
      Committed by David S. Miller
      When an inetdev is destroyed, every address assigned to the interface
      is removed.  And in this scenario we do two pointless things which can
      be very expensive if the number of assigned addresses is large:
      
      1) Address promotion.  We are deleting all addresses, so there is no
         point in doing this.
      
      2) A full nf conntrack table purge for every address.  We only need to
         do this once, as is already caught by the existing
         masq_dev_notifier so masq_inet_event() can skip this.
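      The cost difference can be made concrete with counters. This is a minimal sketch with hypothetical names that models each address deletion as possibly triggering one promotion and one full conntrack purge; it simplifies when promotion actually happens and is not the kernel code. When the whole device is being torn down both steps are skipped per address, and a single purge (in the kernel, already handled by masq_dev_notifier) covers the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost counters for the address-removal path. */
struct demo_costs {
    int promotions;        /* address promotions performed */
    int conntrack_purges;  /* full conntrack table walks */
};

/* Remove one address; `remaining` is how many addresses are left after it. */
static void demo_del_addr(struct demo_costs *c, int remaining, bool dev_dying)
{
    if (dev_dying)
        return;              /* everything is going away: skip both steps */
    if (remaining > 0)
        c->promotions++;
    c->conntrack_purges++;
}

/* Destroy the device: delete every address, then purge conntrack once. */
static void demo_inetdev_destroy(struct demo_costs *c, int naddrs)
{
    for (int left = naddrs - 1; left >= 0; left--)
        demo_del_addr(c, left, true);
    c->conntrack_purges++;   /* single purge for the whole device */
}
```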
      Reported-by: Solar Designer <solar@openwall.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
    • Merge tag 'nfc-next-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · f4fa6e6d
      Committed by David S. Miller
      Samuel Ortiz says:
      
      ====================
      NFC 4.6 pull request
      
      This is a very small one this time, with only 5 patches.
      There are a couple of big items that could not be merged/finished
      on time.
      
      We have:
      
      - 2 LLCP fixes for a race and a potential OOM.
      - 2 cleanups for the pn544 and microread drivers.
      - 1 Maintainer addition for the s3fwrn5 driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'macsec' · 01099881
      Committed by David S. Miller
      Sabrina Dubroca says:
      
      ====================
      MACsec IEEE 802.1AE implementation
      
      MACsec (IEEE 802.1AE [0]) is a protocol that provides security for
      wired ethernet LANs.  MACsec offers two protection modes:
      authentication only, or authenticated encryption.
      
      MACsec defines "secure channels" that allow transmission from one node
      to one or more others.  Communication on a channel is done over a
      succession of "secure associations", that each use a specific key.
      Secure associations are identified by their "association number" in
      the range 0..3.  A secure association is retired when its 32-bit
      packet number would wrap, and the same association number can later be
      reused with a new key and packet number.
      
      The standard mode of encryption is GCM AES with 128-bit keys,
      although an extension allows 256-bit keys [1] (not implemented in
      this submission).
      
      When using MACsec, an extra header, called "SecTAG", is added between
      the ethernet header and the original payload:
      
       +---------------------------------+----------------+----------------+
       |        (MACsec ethertype)       |     TCI_AN     |       SL       |
       +---------------------------------+----------------+----------------+
       |                           Packet Number                           |
       +-------------------------------------------------------------------+
       |                     Secure Channel Identifier                     |
       |                            (optional)                             |
       +-------------------------------------------------------------------+
      
      TCI_AN:
       version
       end_station
       sci_present
       scb
       encrypted
       changed_text
       association_number (2 bits)
      SL:
       short_length (6 bits)
       unused (2 bits)
      
      The ethertype for the packet is set to 0x88E5, and the original
      ethertype becomes part of the secure payload, which may be encrypted.
      The ethernet header and the SecTAG are always transmitted in the
      clear, but are integrity-protected.
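      A parser for the SecTAG layout drawn above, as a minimal sketch. It assumes the TCI_AN flags are listed most significant bit first (version = 0x80 down to a 2-bit association number), that short_length occupies the low six bits of the SL byte, and that the optional SCI is 8 bytes; the `demo_` names are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_MACSEC 0x88E5
/* TCI_AN bits, most significant first, in the order listed above */
#define DEMO_TCI_V   0x80  /* version (must be 0) */
#define DEMO_TCI_ES  0x40  /* end_station */
#define DEMO_TCI_SC  0x20  /* sci_present */
#define DEMO_TCI_SCB 0x10  /* scb */
#define DEMO_TCI_E   0x08  /* encrypted */
#define DEMO_TCI_C   0x04  /* changed_text */
#define DEMO_AN_MASK 0x03  /* association_number */

struct demo_sectag {
    uint8_t  tci;        /* flag bits only, AN masked out */
    uint8_t  an;         /* association number, 0..3 */
    uint8_t  short_len;  /* low 6 bits of the SL byte */
    uint32_t pn;         /* packet number */
    int      has_sci;
    uint64_t sci;        /* valid only if has_sci */
};

/* Parse a SecTAG starting at the MACsec ethertype. Returns the number of
 * bytes consumed (8 without SCI, 16 with SCI) or -1 if malformed. */
static int demo_parse_sectag(const uint8_t *p, size_t len, struct demo_sectag *t)
{
    if (len < 8 || ((p[0] << 8) | p[1]) != DEMO_ETH_P_MACSEC)
        return -1;
    if (p[2] & DEMO_TCI_V)
        return -1;
    t->tci = p[2] & (uint8_t)~DEMO_AN_MASK;
    t->an = p[2] & DEMO_AN_MASK;
    t->short_len = p[3] & 0x3f;
    t->pn = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
            ((uint32_t)p[6] << 8) | p[7];
    t->has_sci = (p[2] & DEMO_TCI_SC) != 0;
    t->sci = 0;
    if (!t->has_sci)
        return 8;
    if (len < 16)
        return -1;
    for (int i = 0; i < 8; i++)
        t->sci = (t->sci << 8) | p[8 + i];
    return 16;
}
```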
      
      MACsec supports optional replay protection with a configurable replay
      window.
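      The replay window check can be sketched as a single comparison. This is a minimal model, assuming a hypothetical `lowest_pn` (the lowest packet number acceptable with a zero window); it does not advance the window, handle PN wrap, or retire associations, and it is not the driver's exact logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a received packet number unless it falls more than `window` below
 * `lowest_pn`. The 64-bit addition avoids overflow near the top of the
 * 32-bit PN space.
 */
static bool demo_replay_ok(uint32_t pn, uint32_t lowest_pn, uint32_t window)
{
    return (uint64_t)pn + window >= lowest_pn;
}
```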
      
      MACsec is designed to be used with the MKA extension to 802.1X (MACsec
      Key Agreement protocol) [2], which provides channel attribution and
      key distribution to the nodes, but can also be used with static keys
      getting fed manually by an administrator.
      
      Optional (not supported yet) features:
       - confidentiality offset: in encryption mode, part of the payload may
         be left unencrypted.
       - choice of cipher suite: GCM AES with 256 bits has been standardised
         [1].
      
      Implementation
      
      A netdevice is created on top of a real device for each TX secure
      channel, like we do for VLANs.  Multiple TX channels can be created on
      top of the same underlying device.
      
      Several other approaches were considered for the RX path:
      
       - dev_add_pack: doesn't work, because we want to filter out
         unprotected packets
       - transparent mode: MACsec would be enabled directly on the real
         netdevice.  For this, we cannot use a rx_handler directly because
         MACsec must be available for underlying devices enslaved in a
         bridge or in a bond, so we need a hook directly in
         __netif_receive_skb_core.  This approach makes it harder to filter
         non-encrypted packets on RX without forcing the user to setup some
         rules, so the "transparent" mode is not so transparent after all.
         It also makes TX more complex than with a dedicated netdevice.
      
      One issue with the proposed implementation is that the qdisc layer for
      the real device operates on already encrypted packets.
      
      Netlink API
      
      This is currently a mix of rtnetlink (to create the device and set up
      the TX channel) and genl (for RX channels, secure associations and
      their keys).  genl provides clean demultiplexing of the {TX,RX}{SC,SA}
      commands.
      
      Use cases
      
      The normal use case is wired LANs, including veth and slave devices
      for bonding/teaming or bridges.
      
      MACsec can also be used on any device that makes a full ethernet
      header visible, for example VXLAN.
      The VXLAN+MACsec setup would be:
      
               hypervisor        |     virtual machine
          <real_dev>---<VXLAN>---|---<dev>---<macsec_dev>
      
      And the packets would look like this:
      
      | eth | IP | UDP | VXLAN | eth | MACsec | IP | ... | MACsec ICV |
      
      One benefit of this approach to encryption in the cloud is that the
      payload is encrypted by the tenant, not by the tunnel provider, thus
      the tenant has full control over the keys.
      
      Changes from v1:
       - rework netlink API after discussion with Johannes Berg
         - nest attributes, rename
         - export stats as separate attributes
         - add some comments
       - misc small fixes (rcu, constants, struct organization)
      
      Changes from RFCv2:
       - fix ENCODING_SA param validation
       - add parent link to netlink ifdumps
      
      Changes from RFCv1:
       - addressed comments from Florian and Paolo + kbuild robot
       - also perform post-decrypt handling after crypto callback
       - fixed ->dellink behavior
      
      Future plans:
       - offload to hardware, on nics that support it
       - implement optional features
      
      [0] http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf
      [1] http://standards.ieee.org/getieee802/download/802.1AEbn-2011.pdf
      [2] http://standards.ieee.org/getieee802/download/802.1X-2010.pdf
      [3] RFCv1: http://www.spinics.net/lists/netdev/msg358151.html
      [4] RFCv2: http://www.spinics.net/lists/netdev/msg362389.html
      [5] v1: http://www.spinics.net/lists/netdev/msg367959.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: introduce IEEE 802.1AE driver · c09440f7
      Committed by Sabrina Dubroca
      This is an implementation of MACsec/IEEE 802.1AE.  This driver
      provides authentication and encryption of traffic in a LAN, typically
      with GCM-AES-128, and optional replay protection.
      
      http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
    •
      dece8d2b
    • net: socket: use pr_info_once to tip the obsolete usage of PF_PACKET · f3c98690
      Committed by liping.zhang
      There is no need to use the static variable here, pr_info_once is more
      concise.
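      The "once" pattern replaces a hand-written static flag at the call site. This is a userspace imitation in the spirit of the kernel's printk_once family, assuming a hypothetical `demo_info_once` macro and message text: each expansion owns a static flag, so the message at that site is printed at most once. Unlike the kernel helpers it makes no attempt at thread safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int demo_prints;   /* number of messages actually emitted */

/* Each expansion gets its own static flag, so the message at that call site
 * is emitted at most once. */
#define demo_info_once(...)                  \
    do {                                     \
        static bool demo_printed_;           \
        if (!demo_printed_) {                \
            demo_printed_ = true;            \
            printf(__VA_ARGS__);             \
            demo_prints++;                   \
        }                                    \
    } while (0)

static void demo_warn_obsolete(const char *comm)
{
    demo_info_once("%s uses an obsolete socket type\n", comm);
}
```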
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • at803x: fix suspend/resume for SGMII link · 98267311
      Committed by Zefir Kurtisi
      When operating the at803x in SGMII mode, resuming the chip
      from power down brings up the copper-side link but leaves
      the SGMII link in unconnected state (tested with at8031
      attached to gianfar). In effect, this caused a permanent
      link loss once the related interface was put down.
      
      This patch ensures that power-down handling in suspend()
      and resume() is also applied to the SGMII link.
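      The shape of the fix, as a minimal sketch: a hypothetical `struct demo_phy` with separate copper-side and SGMII-side control registers, where suspend and resume toggle the standard MII power-down bit (BMCR bit 11, 0x0800) on both sides. How the at803x driver actually reaches its SGMII registers is not described in the commit and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BMCR_PDOWN 0x0800   /* power-down bit of the standard MII BMCR */

/* Hypothetical PHY model with a copper side and an SGMII side. */
struct demo_phy {
    uint16_t copper_bmcr;
    uint16_t sgmii_bmcr;
    int      sgmii_mode;   /* nonzero when the chip runs in SGMII mode */
};

static void demo_suspend(struct demo_phy *phy)
{
    phy->copper_bmcr |= DEMO_BMCR_PDOWN;
    if (phy->sgmii_mode)
        phy->sgmii_bmcr |= DEMO_BMCR_PDOWN;   /* previously left untouched */
}

static void demo_resume(struct demo_phy *phy)
{
    phy->copper_bmcr &= (uint16_t)~DEMO_BMCR_PDOWN;
    if (phy->sgmii_mode)
        phy->sgmii_bmcr &= (uint16_t)~DEMO_BMCR_PDOWN;
}
```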
      Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-more-bulk-free-users' · 5d608414
      Committed by David S. Miller
      Jesper Dangaard Brouer says:
      
      ====================
      net: bulk free adjustment and two driver use-cases
      
      I've split out the bulk free adjustments, from the bulk alloc patches,
      as I want the adjustment to napi_consume_skb be in same kernel cycle
      the API was introduced.
      
      Adjustments based on discussion:
       Subj: "mlx4: use napi_consume_skb API to get bulk free operations"
       http://thread.gmane.org/gmane.linux.network/402503/focus=403386
      
      Patchset based on net-next at commit 3ebeac1d
      
      V4: more nitpicks from Sergei
      V3: spelling fixes from Sergei
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d608414
    • J
      mlx5: use napi_consume_skb API to get bulk free operations · 8ec736e5
      Jesper Dangaard Brouer committed
      Bulk free of SKBs happens transparently via the API call napi_consume_skb().
      The napi budget parameter is needed by napi_consume_skb() to detect
      whether it is called from netpoll.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ec736e5
    • J
      mlx4: use napi_consume_skb API to get bulk free operations · b4a53379
      Jesper Dangaard Brouer committed
      Bulk free of SKBs happens transparently via the API call napi_consume_skb().
      The napi budget parameter is usually needed by napi_consume_skb()
      to detect whether it is called from netpoll.  In this patch it has an
      extra meaning.
      
      In the mlx4 driver, mlx4_en_stop_port() is called outside
      NAPI/softirq context and cleans up the entire TX ring via
      mlx4_en_free_tx_buf().  The SKB-freeing code in
      mlx4_en_free_tx_desc() is shared with the NAPI path.
      
      To handle this shared use, the zero-budget indication is reused
      and handled appropriately in napi_consume_skb(). To reflect this,
      the variable is called napi_mode in the function that needs
      this distinction.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4a53379
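The mlx4 change above passes a napi_mode value through a free routine shared by the NAPI poll loop and the non-NAPI port-stop path. A minimal caller-side sketch, assuming hypothetical names throughout (consume_skb_model only mimics the zero-budget convention; it is not the kernel function):

```c
/* Which freeing path a given call ends up on. */
enum free_path { FREE_ANY_CONTEXT, FREE_NAPI_BULK };

/* Mimics the zero-budget convention of napi_consume_skb(). */
static enum free_path consume_skb_model(int budget)
{
	return budget == 0 ? FREE_ANY_CONTEXT : FREE_NAPI_BULK;
}

/* Shared TX-descriptor free helper; napi_mode == 0 means
 * "not called from NAPI/softirq context". */
static enum free_path free_tx_desc(int napi_mode)
{
	/* ...unmap DMA buffers here, then release the SKB... */
	return consume_skb_model(napi_mode);
}

/* NAPI poll path: forward the real, non-zero budget. */
static enum free_path poll_tx_completion(int budget)
{
	return free_tx_desc(budget);
}

/* Port-stop path: runs outside NAPI, so pass 0. */
static enum free_path stop_port_free_ring(void)
{
	return free_tx_desc(0);
}
```

The design choice is to reuse an existing signal (budget == 0) rather than add a flag parameter, so the shared helper needs no extra argument.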
    • J
      net: adjust napi_consume_skb to handle non-NAPI callers · 885eb0a5
      Jesper Dangaard Brouer committed
      Some drivers reuse/share code paths that free SKBs between NAPI
      and non-NAPI calls. Adjust napi_consume_skb to handle this
      use-case.
      
      Previously, calls from netpoll (w/ IRQs disabled) were handled and
      indicated with a zero budget.  Use the same zero indication to
      handle calls not originating from NAPI/softirq, simply by using
      dev_consume_skb_any().
      
      This adds an extra branch+call for the netpoll case (checking
      in_irq() + irqs_disabled()), but that is okay as this is a slowpath.
      Suggested-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      885eb0a5
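The callee side of the napi_consume_skb adjustment above can be sketched as a simplified userspace model. struct skb_model and the two counters are hypothetical stand-ins: any_context_frees represents handing the SKB to dev_consume_skb_any(), and deferred_frees represents queuing it for NAPI bulk free. Reference counting is reduced to a plain integer.

```c
/* Hypothetical stand-in for an SKB: only the reference count matters here. */
struct skb_model {
	int users;
};

static int any_context_frees; /* stands in for dev_consume_skb_any() */
static int deferred_frees;    /* stands in for the NAPI bulk-free cache */

static void napi_consume_skb_model(struct skb_model *skb, int budget)
{
	if (!skb)
		return;

	/* Zero budget: netpoll or any other non-NAPI caller. Take the
	 * slow path, which is safe in any context. */
	if (budget == 0) {
		any_context_frees++;
		return;
	}

	/* Drop one reference; only the last user actually frees. */
	if (--skb->users > 0)
		return;

	deferred_frees++; /* queue for bulk free */
}
```

The zero-budget check comes first, so a non-NAPI caller never touches the bulk-free machinery, which is only safe from NAPI/softirq context.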
    • C
      r8169: Remove unnecessary phy reset for pcie nic when setting link speed. · c4556975
      Chun-Hao Lin committed
      For PCIe NICs, after the link speed is set and there is no link, the driver
      does not need to reset the PHY until the link comes up.
      
      On some PCIe NICs, doing this reset also resets the PHY speed-down counter
      and prevents the PHY from automatically speeding down.
      
      This patch fixes the issue reported at the following link.
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151
      Signed-off-by: Chunhao Lin <hau@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4556975
    • J
      mlxsw: pci: Implement reset done check · 233fa44b
      Jiri Pirko committed
      Firmware now signals that the reset is done by passing a magic value
      via a register. Use it to shorten the wait when this is supported.
      With old firmware, we still wait until the timeout is reached.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      233fa44b
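The reset-done check above is a poll-until-magic-or-timeout loop. In this sketch RESET_DONE_MAGIC, read_reg_fn and fake_reg are hypothetical; the real mlxsw register layout and magic value are not reproduced, and a real driver would sleep between polls.

```c
#include <stdint.h>

#define RESET_DONE_MAGIC 0x5A5AU /* hypothetical value, not mlxsw's */

typedef uint32_t (*read_reg_fn)(void *ctx);

/* Polls up to max_polls times. Returns the poll on which the magic value
 * appeared, or -1 if it never did (old firmware: the full timeout elapsed). */
static int wait_reset_done(read_reg_fn read_reg, void *ctx, int max_polls)
{
	for (int i = 1; i <= max_polls; i++) {
		if (read_reg(ctx) == RESET_DONE_MAGIC)
			return i; /* firmware signalled completion early */
		/* a real driver would sleep here, e.g. with msleep() */
	}
	return -1;
}

/* Test double: reports "not done" *ctx times, then the magic value. */
static uint32_t fake_reg(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- > 0 ? 0 : RESET_DONE_MAGIC;
}
```

New firmware lets the loop exit as soon as the magic appears; old firmware never writes it, so the behavior degrades to the previous fixed timeout.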
    • M
      sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Marcelo Ricardo Leitner committed
      Currently sctp_sendmsg() triggers some calls that allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit, it allocates a linear skb that is used to
      construct the packet, and this may cause sends to fail with ENOMEM more
      often than anticipated, especially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented, and it made ENOMEM trigger far less often.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cea8768f
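The SCTP change above threads allocation flags down the call chain instead of hard-coding the atomic flag at the bottom. A minimal sketch of that pattern, assuming hypothetical function names; the model_gfp_t values merely stand in for the kernel's GFP_KERNEL and GFP_ATOMIC.

```c
/* Stand-ins for the kernel's allocation flags. */
typedef enum { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC } model_gfp_t;

/* Bottom of the chain: allocates the linear buffer with the caller's
 * flags (returned here so the choice can be observed). */
static model_gfp_t packet_transmit(model_gfp_t gfp)
{
	return gfp;
}

/* Intermediate layer: does not decide, just forwards. */
static model_gfp_t outq_flush(model_gfp_t gfp)
{
	return packet_transmit(gfp);
}

/* Process context (sendmsg-like): sleeping allocations are allowed. */
static model_gfp_t sendmsg_path(void)
{
	return outq_flush(MODEL_GFP_KERNEL);
}

/* BH context (retransmit-like): must not sleep. */
static model_gfp_t retransmit_path(void)
{
	return outq_flush(MODEL_GFP_ATOMIC);
}
```

Only the entry points know their context, so only they choose the flag; everything below just forwards it.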
    • S
      ovs: allow nl 'flow set' to use ufid without flow key · 6f15cdbf
      Samuel Gauthier committed
      When we want to change a flow using netlink, we have to identify it to
      be able to perform a lookup. Both the flow key and the unique flow ID
      (ufid) are valid identifiers, but we always have to specify the flow
      key in the netlink message. When both attributes are present, the ufid
      is used. The flow key is used to validate the actions provided by
      userland.
      
      This commit allows using the ufid without providing the flow key, as
      is already done in the netlink 'flow get' and 'flow del' paths. The
      flow key remains mandatory when an action is provided.
      Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f15cdbf
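The rules in the ovs commit above can be summarized as a small decision function. This is a hypothetical model of the validation logic described in the message, not the openvswitch datapath code; the booleans stand for which netlink attributes are present.

```c
#include <stdbool.h>

enum flow_set_result { FLOW_SET_INVALID, FLOW_SET_BY_UFID, FLOW_SET_BY_KEY };

static enum flow_set_result flow_set_lookup(bool has_key, bool has_ufid,
					    bool has_actions)
{
	/* Actions are validated against the flow key, so the key stays
	 * mandatory whenever actions are supplied. */
	if (has_actions && !has_key)
		return FLOW_SET_INVALID;
	/* When both identifiers are present, the ufid is used. */
	if (has_ufid)
		return FLOW_SET_BY_UFID;
	if (has_key)
		return FLOW_SET_BY_KEY;
	return FLOW_SET_INVALID; /* nothing to identify the flow by */
}
```

The new case enabled by the commit is the first assertion in the test: ufid only, no actions.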
    • N
      net: macb: fix default configuration for GMAC on AT91 · 6bdaa5e9
      Nicolas Ferre committed
      On AT91 SoCs, the User Register (USRIO) exposes a switch to configure the
      "Reduced" or "Traditional" version of the Media Independent Interface
      (RMII vs. MII or RGMII vs. GMII).
      As on the older EMAC, on GMAC this switch defaults to the non-reduced
      type of interface, so use the existing capability and extend it to
      GMII as well. We then keep the current logic in the macb_init() function.
      
      The capabilities of the sama5d2, sama5d4 and sama5d3 GEM interfaces are
      updated in the macb_config structure so that they can be properly enabled
      with a traditional interface (GMII or MII).
      Reported-by: Romain HENRIET <romain.henriet@l-acoustics.com>
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bdaa5e9
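The macb commit above depends on knowing, per SoC, which interface variant the USRIO switch selects by default. A hypothetical model of that choice follows: USRIO_MODE_BIT and the default_is_traditional parameter are illustrative stand-ins for the register bit and the capability flag, not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

enum phy_if { IF_MII, IF_RMII, IF_GMII, IF_RGMII };

#define USRIO_MODE_BIT (1u << 0) /* hypothetical bit position */

static bool is_reduced(enum phy_if i)
{
	return i == IF_RMII || i == IF_RGMII;
}

/* Returns the USRIO value to program. default_is_traditional models the
 * per-SoC capability: true if the hardware default (bit clear) is the
 * non-reduced MII/GMII variant. */
static uint32_t usrio_for(bool default_is_traditional, enum phy_if i)
{
	if (default_is_traditional)
		return is_reduced(i) ? USRIO_MODE_BIT : 0; /* bit selects reduced */
	return is_reduced(i) ? 0 : USRIO_MODE_BIT;     /* bit selects traditional */
}
```

With the capability set, a GMII board needs no bit at all, which is why marking the GEM-based SoCs correctly lets them run with a traditional interface.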
    • L
      phy: remove documentation of removed members of phy_device structure · 470c3822
      LABBE Corentin committed
      Commit e5a03bfd ("phy: Add an mdio_device structure") removed the addr,
      bus and dev members of the phy_device structure.
      This patch removes the documentation for those members.
      Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      470c3822
    • D
      Merge branch 'xen-netback-fix-multiple-extra-info-handling' · 3c4ef851
      David S. Miller committed
      Paul Durrant says:
      
      ====================
      xen-netback: fix multiple extra info handling
      
      If a frontend passes multiple extra info fragments to netback on the guest
      transmit side, xen-netback does not account for them properly and sends
      only a single ack response. This will eventually cause processing
      of the shared ring to wedge.
      
      This series re-imports the canonical netif.h from Xen, where the ring
      protocol documentation has been updated, fixes this issue in xen-netback
      and also adds a patch to reduce log spam.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c4ef851
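The ring wedge described in the xen-netback cover letter above comes from an accounting mismatch: the backend consumes more request slots than it answers. A simplified, hypothetical model (struct and function names are illustrative, not netback's):

```c
#include <stdbool.h>

/* Every request slot the backend consumes must eventually get a response,
 * or the frontend's view of the shared ring never advances. */
struct ring_model {
	unsigned int req_cons; /* request slots consumed by the backend */
	unsigned int rsp_prod; /* responses produced by the backend */
};

/* Consume one packet: its first request slot plus n_extras extra-info slots. */
static void process_tx_packet(struct ring_model *r, unsigned int n_extras,
			      bool respond_to_each_extra)
{
	r->req_cons += 1 + n_extras;
	r->rsp_prod += 1; /* response for the packet itself */
	if (respond_to_each_extra)
		r->rsp_prod += n_extras; /* e.g. a null-status response per extra slot */
}

static bool ring_balanced(const struct ring_model *r)
{
	return r->req_cons == r->rsp_prod;
}
```

With two extra-info slots, the buggy variant leaves the ring two responses short after a single packet; repeated over many packets, the frontend eventually runs out of slots.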