1. 04 Nov, 2016 1 commit
  2. 03 Nov, 2016 18 commits
    • enic: set skb->hash type properly · 17197236
      Govindarajulu Varadarajan committed
      The driver sets the skb L3/L4 hash type based on NIC_CFG_RSS_HASH_TYPE_*,
      which is a bit mask. This is wrong: the hardware actually reports an enum.
      Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set the L3 and L4 hash type.
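
      The difference between the two interpretations can be sketched in
      plain C. This is only an illustration, not the driver's code: the
      enum names and numeric values below are hypothetical stand-ins for
      CQ_ENET_RQ_DESC_RSS_TYPE_* and for the kernel's hash levels.

```c
/* Hypothetical values reported by hardware (stand-ins for
 * CQ_ENET_RQ_DESC_RSS_TYPE_*; the real values may differ). */
enum rss_type {
    RSS_TYPE_NONE     = 0,
    RSS_TYPE_IPV4     = 1,
    RSS_TYPE_TCP_IPV4 = 2,
    RSS_TYPE_IPV6     = 3,
    RSS_TYPE_TCP_IPV6 = 4,
};

/* Hypothetical hash levels (stand-ins for the skb hash types). */
enum hash_level { HASH_NONE, HASH_L3, HASH_L4 };

/* Wrong: treats the enum as a bit mask, so value 3 (IPv6) is
 * misread as "IPv4 | TCP_IPv4" and value 4 matches nothing. */
static enum hash_level hash_level_bitmask(unsigned int type)
{
    if (type & RSS_TYPE_TCP_IPV4)
        return HASH_L4;
    if (type & RSS_TYPE_IPV4)
        return HASH_L3;
    return HASH_NONE;
}

/* Right: compares the reported value against each enum constant. */
static enum hash_level hash_level_enum(unsigned int type)
{
    switch (type) {
    case RSS_TYPE_TCP_IPV4:
    case RSS_TYPE_TCP_IPV6:
        return HASH_L4;
    case RSS_TYPE_IPV4:
    case RSS_TYPE_IPV6:
        return HASH_L3;
    default:
        return HASH_NONE;
    }
}
```

      With these assumed values, the bit-mask version classifies an IPv6
      packet as L4 and a TCP/IPv6 packet as unhashed, while the switch
      version classifies both as intended.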
      
      Fixes: bf751ba8 ("driver/net: enic: record q_number and rss_hash for skb")
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: 3com: typhoon: use new api ethtool_{get|set}_link_ksettings · f7a5537c
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API, {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Reviewed-by: David Dillow <dave@thedillows.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ila: Fix crash caused by rhashtable changes · 1913540a
      Tom Herbert committed
      Commit ca26893f ("rhashtable: Add rhlist interface")
      added a field to rhashtable_iter, so that its length became 56 bytes
      and exceeds the size of args in netlink_callback (which is
      48 bytes). The netlink diag dump function has already been
      allocating an iter structure and storing the pointer to it
      in the args of netlink_callback. ila_xlat also uses
      rhashtable_iter but still puts it directly in
      the arg block. Now that the rhashtable_iter size has increased,
      we write beyond the structure. The next field
      happens to be the cb_mutex pointer in netlink_sock, hence the crash.
      
      The fix is to allocate the rhashtable_iter and save it as a pointer
      in arg.
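
      The pattern described above can be sketched in userspace C. This is
      a minimal illustration under stated assumptions, not the kernel
      code: struct callback and struct big_iter are hypothetical
      stand-ins for netlink_callback and rhashtable_iter, and the sizes
      quoted assume a 64-bit build where long is 8 bytes.

```c
#include <stdlib.h>

/* Hypothetical stand-in for netlink_callback: a fixed scratch area
 * of six longs (48 bytes on a 64-bit build). */
struct callback {
    long args[6];
};

/* Hypothetical iterator that has outgrown the scratch area. */
struct big_iter {
    char state[56];
};

/* The sketch stores a pointer in one slot, so it must fit there. */
_Static_assert(sizeof(long) >= sizeof(void *),
               "pointer must fit in one args slot");

/* Start of a dump: allocate the iterator, stash only the pointer. */
static int dump_start(struct callback *cb)
{
    struct big_iter *iter = calloc(1, sizeof(*iter));

    if (!iter)
        return -1;
    cb->args[0] = (long)iter;
    return 0;
}

/* Later dump calls recover the iterator from the stored pointer. */
static struct big_iter *dump_iter(const struct callback *cb)
{
    return (struct big_iter *)cb->args[0];
}

/* End of a dump: free the iterator and clear the slot. */
static void dump_done(struct callback *cb)
{
    free(dump_iter(cb));
    cb->args[0] = 0;
}
```

      Storing a pointer keeps the callback's footprint constant, so later
      growth of the iterator cannot spill into neighbouring fields.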
      
      Tested:
      
        modprobe ila
        ./ip ila add loc 3333:0:0:0 loc_match 2222:0:0:1,
        ./ip ila list  # NO crash now
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnect · 3de864f8
      Cyrill Gorcunov committed
      While preparing patches for killing raw sockets via the
      diag netlink interface, I noticed that my runs got stuck:
      
       | [root@pcs7 ~]# cat /proc/`pidof ss`/stack
       | [<ffffffff816d1a76>] __lock_sock+0x80/0xc4
       | [<ffffffff816d206a>] lock_sock_nested+0x47/0x95
       | [<ffffffff8179ded6>] udp_disconnect+0x19/0x33
       | [<ffffffff8179b517>] raw_abort+0x33/0x42
       | [<ffffffff81702322>] sock_diag_destroy+0x4d/0x52
      
      which has not been the case before. I narrowed it down to the commit
      
       | commit 286c72de
       | Author: Eric Dumazet <edumazet@google.com>
       | Date:   Thu Oct 20 09:39:40 2016 -0700
       |
       |     udp: must lock the socket in udp_disconnect()
      
      where we started locking the socket for a different reason.
      
      raw_abort escaped the renaming, so we have to
      fix this oversight by using __udp_disconnect instead.
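
      The locked/unlocked split behind this fix can be sketched as
      follows. This is a hedged userspace illustration, assuming (as the
      stack trace suggests) that the abort path already holds the socket
      lock when it disconnects. All names are hypothetical, and the
      non-recursive lock is modeled as a flag that reports failure where
      a real lock would block forever.

```c
#include <stdbool.h>

/* Hypothetical socket with a non-recursive lock modeled as a flag. */
struct sock_sketch {
    bool locked;
    bool connected;
};

/* Returns false where a real non-recursive lock would deadlock. */
static bool lock_sock_sketch(struct sock_sketch *sk)
{
    if (sk->locked)
        return false;
    sk->locked = true;
    return true;
}

static void release_sock_sketch(struct sock_sketch *sk)
{
    sk->locked = false;
}

/* Unlocked helper: the caller must already hold the lock. */
static int disconnect_unlocked(struct sock_sketch *sk)
{
    sk->connected = false;
    return 0;
}

/* Locking wrapper for callers that do not hold the lock. */
static int disconnect_locked(struct sock_sketch *sk)
{
    int err;

    if (!lock_sock_sketch(sk))
        return -1;              /* would hang in real code */
    err = disconnect_unlocked(sk);
    release_sock_sketch(sk);
    return err;
}

/* Abort path: takes the lock itself, so it must call the helper. */
static int abort_sketch(struct sock_sketch *sk)
{
    int err;

    if (!lock_sock_sketch(sk))
        return -1;
    err = disconnect_unlocked(sk);  /* not disconnect_locked() */
    release_sock_sketch(sk);
    return err;
}
```

      Calling disconnect_locked() from inside abort_sketch() would try to
      take the lock a second time, which is the hang shown in the trace.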
      
      Fixes: 286c72de ("udp: must lock the socket in udp_disconnect()")
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrey Vagin <avagin@openvz.org>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: Use irq_domain for phy interrupt from USB Int. EP · cc89c323
      Woojung Huh committed
      To use phylib's interrupt handling fully, rather than handling some of the PHY work in the MAC driver,
      create an irq_domain for the PHY interrupt delivered via the USB interrupt EP and
      pass the irq number to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT.
      
      The idea comes from drivers/gpio/gpio-dl2.c
      Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: enhance tcp collapsing · 2331ccc5
      Eric Dumazet committed
      As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
      time even if the skb at the right has fragments.
      
      We simply have to use the more generic skb_copy_bits() instead of
      skb_copy_from_linear_data() in tcp_collapse_retrans().
      
      We also need to guard this skb_copy_bits() in case there is nothing to
      copy, otherwise skb_put() could panic if the left skb has frags.
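
      The idea of a copy that is not limited to the linear area, plus the
      "nothing to copy" guard, can be sketched in userspace C. This is
      only an illustration under simplifying assumptions: struct
      buf_sketch is a hypothetical buffer with a linear area and a single
      fragment, and copy_bits_sketch() merely mimics the role that
      skb_copy_bits() plays here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet buffer: a linear area plus one fragment. */
struct buf_sketch {
    const unsigned char *linear;
    size_t linear_len;
    const unsigned char *frag;
    size_t frag_len;
};

/* Copy len bytes starting at offset, crossing from the linear area
 * into the fragment when needed. Returns 0, or -1 if out of range. */
static int copy_bits_sketch(const struct buf_sketch *b, size_t offset,
                            unsigned char *to, size_t len)
{
    size_t total = b->linear_len + b->frag_len;
    size_t n;

    if (offset > total || len > total - offset)
        return -1;

    if (offset < b->linear_len) {
        n = b->linear_len - offset;
        if (n > len)
            n = len;
        memcpy(to, b->linear + offset, n);
        to += n;
        len -= n;
        offset = 0;             /* continue at start of fragment */
    } else {
        offset -= b->linear_len;
    }
    if (len)
        memcpy(to, b->frag + offset, len);
    return 0;
}

/* Append next's payload to dst and return the new length, or -1.
 * The copy is skipped entirely when next carries no data. */
static int collapse_sketch(unsigned char *dst, size_t dst_len,
                           size_t dst_cap, const struct buf_sketch *next)
{
    size_t next_len = next->linear_len + next->frag_len;

    if (dst_len > dst_cap || next_len > dst_cap - dst_len)
        return -1;
    if (next_len && copy_bits_sketch(next, 0, dst + dst_len, next_len))
        return -1;
    return (int)(dst_len + next_len);
}
```

      A copy that only understands the linear area would miss the
      fragment bytes, which is why the right-hand buffer previously could
      not have fragments.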
      
      Tested:
      
      Used the following packetdrill test:
      
      // Establish a connection.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      +.100 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
         +0 write(4, ..., 200) = 200
         +0 > P. 1:201(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 201:401(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 401:601(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 601:801(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 801:1001(200) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1001:1101(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1101:1201(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1201:1301(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1301:1401(100) ack 1
      
      +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
      // Check that TCP collapse works :
         +0 > P. 1:1001(1000) ack 1
      Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: 3c509: use new api ethtool_{get|set}_link_ksettings · b646cf29
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API, {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: 3c59x: use new api ethtool_{get|set}_link_ksettings · e19b7883
      Philippe Reynes committed
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API, {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mii: add generic function to support ksetting support · bc8ee596
      Philippe Reynes committed
      The old ethtool API (get_settings and set_settings) has generic mii
      functions mii_ethtool_sset and mii_ethtool_gset.
      
      To support the new ethtool API ({get|set}_link_ksettings), we add
      two generic mii functions, mii_ethtool_{get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-XDP-tx-refactor' · 55454e9e
      David S. Miller committed
      Tariq Toukan says:
      
      ====================
      mlx4 XDP TX refactor
      
      This patchset refactors the XDP forwarding case, so that
      its dedicated transmit queues are managed completely
      separately from the regular ones.
      
      It also adds ethtool counters for XDP cases.
      
      Series generated against net-next commit:
      22ca904a genetlink: fix error return code in genl_register_family()
      
      Thanks,
      Tariq.
      
      v3:
      * Exposed per ring counters.
      
      v2:
      * Added ethtool counters.
      * Rebased, now patch 2 reverts Brenden's fix, as the bug no longer exists:
        958b3d39 ("net/mlx4_en: fixup xdp tx irq to match rx")
      * Updated commit message of patch 2.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Add ethtool statistics for XDP cases · 15fca2c8
      Tariq Toukan committed
      XDP statistics are reported in ethtool, in total and per ring,
      as follows:
      - xdp_drop: the number of packets dropped by xdp.
      - xdp_tx: the number of packets forwarded by xdp.
      - xdp_tx_full: the number of times an xdp forward failed
      	due to a full tx xdp ring.
      
      In addition, packets that are dropped or forwarded by XDP
      are no longer accounted in the ring's rx_packets/rx_bytes,
      so that those counters reflect only traffic that is passed to the stack.
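
      The accounting rule can be sketched as a small C function. This is
      a hypothetical illustration, not the driver's code: the struct,
      the verdict enum, and account_packet() are invented for the
      example. The commit message does not say whether a forward that
      fails on a full ring is also counted as a drop, so this sketch
      counts it only in xdp_tx_full.

```c
/* Hypothetical per-ring counters, named after the ethtool stats. */
struct rx_ring_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long xdp_drop;
    unsigned long xdp_tx;
    unsigned long xdp_tx_full;
};

/* Hypothetical outcome of running the XDP program on one packet. */
enum xdp_verdict { VERDICT_PASS, VERDICT_DROP, VERDICT_TX };

/* Account one received packet of len bytes according to its verdict. */
static void account_packet(struct rx_ring_stats *s, enum xdp_verdict v,
                           unsigned long len, int tx_ring_full)
{
    switch (v) {
    case VERDICT_DROP:
        s->xdp_drop++;
        break;
    case VERDICT_TX:
        if (tx_ring_full)
            s->xdp_tx_full++;   /* forward failed: ring was full */
        else
            s->xdp_tx++;
        break;
    case VERDICT_PASS:
        /* Only traffic handed to the stack counts as RX. */
        s->rx_packets++;
        s->rx_bytes += len;
        break;
    }
}
```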
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Refactor the XDP forwarding rings scheme · 67f8b1dc
      Tariq Toukan committed
      Separately manage the two types of TX rings: regular ones, and XDP.
      Upon an XDP set, do not borrow regular TX rings and convert them
      into XDP ones, but allocate new ones, unless we hit the max number
      of rings.
      This means that on systems with fewer cores, XDP will not consume
      the existing TX rings as long as we stay within the TX ring limit.
      
      XDP TX rings counters are not shown in ethtool statistics.
      Instead, XDP counters will be added to the respective RX rings
      in a downstream patch.
      
      This has no performance implications.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Add TX_XDP for CQ types · ccc109b8
      Tariq Toukan committed
      Support XDP CQ type, and refactor the CQ type enum.
      Rename the is_tx field to match the change.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: clean up sctp_packet_transmit · e4ff952a
      Xin Long committed
      After adding sctp gso, sctp_packet_transmit has become quite a big function.
      
      This patch extracts the code for packing a packet from sctp_packet_transmit
      into sctp_packet_pack, adds some comments, and simplifies the err path by
      freeing the auth chunk when freeing the packet chunk_list in the out path and
      by freeing the head skb early if it fails to pack the packet.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cls_flower-misc' · 92901827
      David S. Miller committed
      Roi Dayan says:
      
      ====================
      misc TC/flower changes
      
      This series includes two small changes to the TC flower classifier.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_flower: merge filter delete/destroy common code · 13fa876e
      Roi Dayan committed
      Move common code from fl_delete and fl_destroy to __fl_delete.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_flower: add missing unbind call when destroying flows · a1a8f7fe
      Roi Dayan committed
      tcf_unbind was called in fl_delete but was missing in fl_destroy when
      force deleting flows.
      
      Fixes: 77b9900e ('tc: introduce Flower classifier')
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4cb551a1
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. This includes better integration with the routing subsystem for
      nf_tables, explicit notrack support and smaller updates. More
      specifically, they are:
      
      1) Add fib lookup expression for nf_tables, from Florian Westphal. This
         new expression provides a native replacement for iptables addrtype
         and rp_filter matches. This is more flexible though, since we can
         populate the kernel flowi representation to query the fib to
         accommodate new use cases, such as RTBH through skb mark.
      
      2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This
         new expression allows you to access skbuff route metadata, more
         specifically nexthop and classid fields.
      
      3) Add notrack support for nf_tables, to skip conntracking, requested by
         many users already.
      
      4) Add boilerplate code to allow to use nf_log infrastructure from
         nf_tables ingress.
      
      5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate
         the xtables cluster match, from Liping Zhang.
      
      6) Move socket lookup code into generic nf_socket_* infrastructure so
         we can provide a native replacement for the xtables socket match.
      
      7) Make sure nfnetlink_queue data that is updated on every packet is
         placed in a different cacheline from read-only data, from Florian Westphal.
      
      8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal.
      
      9) Start round robin number generation in nft_numgen from zero,
         instead of n-1, for consistency with xtables statistics match,
         patch from Liping Zhang.
      
      10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log,
          given we retry with a smaller allocation on failure, from Calvin Owens.
      
      11) Cleanup xt_multiport to use switch(), from Gao feng.
      
      12) Remove superfluous check in nft_immediate and nft_cmp, from
          Liping Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Nov, 2016 7 commits
    • netfilter: nf_queue: place volatile data in own cacheline · 886bc503
      Florian Westphal committed
      As the comment indicates, the data at the end of nfqnl_instance struct is
      written on every queue/dequeue, so it should reside in its own cacheline.
      
      Before this change, 'lock' was in first cacheline so we dirtied both.
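
      The layout idea can be sketched with C11 alignment. This is a
      hedged userspace illustration, not the actual struct: the field
      names are hypothetical, a 64-byte cacheline is assumed, and C11
      alignas stands in for whatever annotation the kernel code uses.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE_SKETCH 64     /* assumed cacheline size in bytes */

/* Hypothetical queue instance. */
struct instance_sketch {
    /* Read-mostly configuration, set up once. */
    unsigned int queue_num;
    unsigned int copy_range;
    unsigned int queue_maxlen;

    /* Written on every queue/dequeue: starts on its own cacheline,
     * so these writes do not dirty the read-mostly line above. */
    alignas(CACHELINE_SKETCH) int lock;
    unsigned int queue_total;
    unsigned int id_sequence;
};
```

      With this layout, offsetof(struct instance_sketch, lock) is a
      multiple of the assumed cacheline size, so the lock and the
      counters updated with it share one line instead of two.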
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: remove useless U8_MAX validation · e41e9d62
      Liping Zhang committed
      After calling nft_data_init, the size is already validated and desc.len will
      not exceed sizeof(struct nft_data), i.e. 16 bytes. So it will never
      exceed U8_MAX.
      
      Furthermore, in nft_immediate_init, we forget to call nft_data_uninit
      when desc.len exceeds U8_MAX. Although this cannot happen, it is
      a logical mistake.
      
      Remove this redundant validation, introduced by commit 36b701fa
      ("netfilter: nf_tables: validate maximum value of u32 netlink attributes").
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: introduce routing expression · 2fa84193
      Anders K. Pedersen committed
      Introduces an nftables rt expression for routing-related data with support
      for nexthop (i.e. the directly connected IP address that an outgoing packet
      is sent to), which can be used either for matching or accounting, e.g.
      
       # nft add rule filter postrouting \
      	ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
      
      This will drop any traffic to 192.168.1.0/24 that is not routed via
      192.168.0.1.
      
       # nft add rule filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule ip6 filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
      
      These rules count outgoing traffic per nexthop. Note that the timeout
      releases an entry if no traffic is seen for this nexthop within 10 minutes.
      
       # nft add rule inet filter postrouting \
      	ether type ip \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule inet filter postrouting \
      	ether type ip6 \
      	flow table acct { rt nexthop timeout 600s counter }
      
      Same as above, but via the inet family, where the ether type must be
      specified explicitly.
      
      "rt classid" is also implemented identically to "meta rtclassid", since it
      is more logical to have this match in the routing expression going forward.
      Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c · 8db4c5be
      Pablo Neira Ayuso committed
      We need this split to reuse the existing codebase for the upcoming nf_tables
      socket expression.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: add packet logging for netdev family · 1fddf4ba
      Pablo Neira Ayuso committed
      Move layer 2 packet logging into nf_log_l2packet() that resides in
      nf_log_common.c, so this can be shared by both bridge and netdev
      families.
      
      This patch adds the boilerplate code to register the netdev logging
      family.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add fib expression · f6d0cbcf
      Florian Westphal committed
      Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
      just dispatches to ipv4 or ipv6 one based on nfproto).
      
      Currently supports fetching output interface index/name and the
      rtm_type associated with an address.
      
      This can be used for adding path filtering. rtm_type is useful
      to e.g. enforce a strong-end host model where packets
      are only accepted if daddr is configured on the interface the
      packet arrived on.
      
      The fib expression is a native nftables alternative to the
      xtables addrtype and rp_filter matches.
      
      FIB result order for oif/oifname retrieval is as follows:
       - if packet is local (skb has rtable, RTF_LOCAL set, this
         will also catch looped-back multicast packets), set oif to
         the loopback interface.
       - if fib lookup returns an error, or result points to local,
         store zero result.  This means '--local' option of -m rpfilter
         is not supported. It is possible to use 'fib type local' or add
         explicit saddr/daddr matching rules to create exceptions if this
         is really needed.
       - store result in the destination register.
         In case of multiple routes, search set for desired oif in case
         strict matching is requested.
      
      The ipv4 and ipv6 fib expressions are supposed to behave the same.
      
      [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
      
      	http://patchwork.ozlabs.org/patch/688615/
      
        to address fallout from this patch after rebasing nf-next, that was
        posted to address compilation warnings. --pablo ]
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • genetlink: fix error return code in genl_register_family() · 22ca904a
      Wei Yongjun committed
      Fix to return a negative error code from the idr_alloc() error handling
      case instead of 0, as done elsewhere in this function.
      
      Also fix the return value check of idr_alloc(), since idr_alloc() returns
      negative errors on failure, not zero.
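
      The two problems can be sketched in userspace C. This is an
      illustration only: alloc_id_sketch() is a hypothetical allocator
      that merely imitates the return convention described above (a
      non-negative ID on success, a negative errno on failure), and the
      register functions are invented for the example.

```c
#include <errno.h>

/* Hypothetical allocator: returns a non-negative ID on success and
 * a negative errno on failure. */
static int alloc_id_sketch(int next_free, int limit)
{
    if (next_free >= limit)
        return -ENOSPC;
    return next_free;
}

/* Buggy pattern: tests for zero, so a negative error is stored as
 * if it were an ID, and the error path would return 0 (success). */
static int register_buggy(int next_free, int limit, int *id_out)
{
    int id = alloc_id_sketch(next_free, limit);

    if (!id)            /* wrong test: failures are negative */
        return 0;       /* wrong result: reports success */
    *id_out = id;
    return 0;
}

/* Fixed pattern: test for a negative value and propagate it. */
static int register_fixed(int next_free, int limit, int *id_out)
{
    int id = alloc_id_sketch(next_free, limit);

    if (id < 0)
        return id;      /* negative error code reaches the caller */
    *id_out = id;
    return 0;
}
```

      In the buggy version a failed allocation still reports success and
      leaves a negative value in the ID, while the fixed version returns
      the error and leaves the ID untouched.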
      
      Fixes: 2ae0f17d ("genetlink: use idr to track families")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 01 Nov, 2016 14 commits