1. 03 Mar, 2016 5 commits
    • F
      netfilter: bridge: register hooks only when bridge interface is added · 5f6c253e
      Florian Westphal authored
      This moves bridge hooks to a register-when-needed scheme.
      
      We use a device notifier to register the 'call-iptables' netfilter hooks
      only once a bridge gets added.
      
      This means that if the initial namespace uses a bridge, newly created
      network namespaces no longer get the PRE_ROUTING ipt_sabotage hook.
      
      It will be registered in that network namespace once a bridge is created
      within that namespace.
      
      A few modules still use global hooks:
      
      - conntrack
      - bridge PF_BRIDGE hooks
      - IPVS
      - CLUSTER match (deprecated)
      - SYNPROXY
      
      As long as these modules are not loaded/used, a new network namespace has
      empty hook list and NF_HOOK() will boil down to single list_empty test even
      if initial namespace does stateless packet filtering.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      5f6c253e
    • F
      netfilter: xtables: don't hook tables by default · b9e69e12
      Florian Westphal authored
      delay hook registration until the table is being requested inside a
      namespace.
      
      Historically, a particular table (iptables mangle, ip6tables filter, etc)
      was registered on module load.
      
      When netns support was added to iptables only the ip/ip6tables ruleset was
      made namespace aware, not the actual hook points.
      
      This means f.e. that when ipt_filter table/module is loaded on a system,
      then each namespace on that system has an (empty) iptables filter ruleset.
      
      In other words, if a namespace sends a packet, such skb is 'caught' by
      netfilter machinery and fed to hooking points for that table (i.e. INPUT,
      FORWARD, etc).
      
      Thanks to Eric Biederman, hooks are no longer global, but per namespace.
      
      This means that we can avoid allocation of empty ruleset in a namespace and
      defer hook registration until we need the functionality.
      
      We register a table's hook entry points ONLY in the initial namespace.
      When an iptables get/setsockopt is issued inside a given namespace, we check
      if the table is found in the per-namespace list.
      
      If not, we attempt to find it in the initial namespace, and, if found,
      create an empty default table in the requesting namespace and register the
      needed hooks.
      
      Hook points are destroyed only once namespace is deleted, there is no
      'usage count' (it makes no sense since there is no 'remove table' operation
      in xtables api).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b9e69e12
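      The lookup order described in b9e69e12 (per-namespace list first, then the
      initial namespace, then lazy creation plus hook registration) can be
      sketched in user-space C. This is a minimal model, not the kernel code:
      `struct netns`, `struct table`, `register_template()` and `table_request()`
      are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TABLES 8

/* Hypothetical stand-in for an xtables table instance inside one namespace. */
struct table {
    const char *name;
    int hooks_registered;   /* 1 once hooks are attached in this namespace */
};

/* Hypothetical stand-in for a network namespace with its own table list. */
struct netns {
    struct table *tables[MAX_TABLES];
    size_t ntables;
};

/* Table names made known at "module load"; models the initial namespace. */
static const char *templates[MAX_TABLES];
static size_t ntemplates;

/* Module load: remember the table, but create nothing in other namespaces. */
static int register_template(const char *name)
{
    if (ntemplates == MAX_TABLES)
        return -1;
    templates[ntemplates++] = name;
    return 0;
}

static int template_exists(const char *name)
{
    for (size_t i = 0; i < ntemplates; i++)
        if (strcmp(templates[i], name) == 0)
            return 1;
    return 0;
}

/* get/setsockopt path: find the table, instantiating it on first request. */
static struct table *table_request(struct netns *ns, const char *name)
{
    struct table *t;

    assert(ns != NULL);

    /* 1) Already present in the per-namespace list? */
    for (size_t i = 0; i < ns->ntables; i++)
        if (strcmp(ns->tables[i]->name, name) == 0)
            return ns->tables[i];

    /* 2) Unknown to the initial namespace as well: nothing to create. */
    if (!template_exists(name) || ns->ntables == MAX_TABLES)
        return NULL;

    /* 3) Create an empty default table and register its hooks. */
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->name = name;
    t->hooks_registered = 1;
    ns->tables[ns->ntables++] = t;
    return t;
}
```

      In this model a fresh namespace holds no tables until the first request, and
      a second request for the same name returns the instance created earlier.
      There is no removal path, mirroring the commit's note that hooks live until
      the namespace is deleted.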
    • F
      netfilter: xtables: prepare for on-demand hook register · a67dd266
      Florian Westphal authored
      This change prepares for upcoming on-demand xtables hook registration.
      
      We change the prototypes of the register/unregister functions.
      A followup patch will then add nf_hook_register/unregister calls
      to the iptables one.
      
      Once a hook is registered packets will be picked up, so all assignments
      of the form
      
      net->ipv4.iptable_$table = new_table
      
      have to be moved to ip(6)t_register_table, else we can see NULL
      net->ipv4.iptable_$table later.
      
      This patch doesn't change functionality; without this the actual change
      simply gets too big.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a67dd266
    • J
      netfilter: nf_defrag_ipv4: Drop redundant ip_send_check() · 5f547391
      Joe Stringer authored
      Since commit 0848f642 ("inet: frags: fix defragmented packet's IP
      header for af_packet"), ip_send_check() would be called twice for
      defragmentation that occurs from netfilter ipv4 defrag hooks. Remove the
      extra call.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      5f547391
    • P
      Merge tag 'ipvs-for-v4.6' of... · a7ed31cf
      Pablo Neira Ayuso authored
      Merge tag 'ipvs-for-v4.6' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next into HEAD
      
      Simon Horman says:
      
      ====================
      please consider these cleanups for IPVS for v4.6.
      
      * Arnd Bergmann has resolved a bunch of unused variable warnings and;
      * Yannick Brosseau has removed a noisy debug message
      ====================
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a7ed31cf
  2. 29 Feb, 2016 3 commits
  3. 28 Feb, 2016 1 commit
  4. 27 Feb, 2016 5 commits
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e1bae75d
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2016-02-24
      
      This series contains updates to e1000e, igb and igbvf.
      
      Raanan provides updates for e1000e, first increases the ULP timer since it
      now takes longer for the ULP exit to complete on Skylake.  Fixes the
      configuration of the internal hardware PHY clock gating mechanism, which was
      causing packet loss due to misconfiguration.  Fixed additional ULP
      configuration settings which were not being properly cleared after cable
      connect in V-Pro capable systems.  Added support for more i219 devices.
      
      Takuma Ueba provides a fix for I210 where IPv6 autoconf test sometimes
      fails because the DAD NS for link-local is not transmitted.  To avoid this
      issue, we need to wait until 1000BASE-T status register "Remote receiver
      status OK".
      
      Todd provides a patch to override EEPROM WoL settings for specific OEM
      devices. Then renamed igb defines to be more generic, since the define
      E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part.
      
      Roland Hii fixes an issue where only the half cycle time of less than or
      equal to 70 millisecond uses the I210 clock output function.  His patch
      adds additional conditions when half cycle time is equal to 125 or 250 or
      500 millisecond to use the clock output function.
      
      Alex Duyck adds support for generic transmit checksums for igb and igbvf.
      
      Jon Maxwell fixes an issue where customer applications are registering
      and un-registering multicast addresses every few seconds which is leading
      to many "Link is up" messages in the logs as a result of the
      netif_carrier_off(netdev) in igbvf_msix_other().  So remove the
      link is up message when registering multicast addresses.
      
      Corinna Vinschen provides a fix for the VLAN interface becoming unusable
      when VLAN offloading is switched off on i350.
      
      Stefan Assmann updates the driver to use ndo_stop() instead of
      dev_close() when running ethtool offline self test, since dev_close()
      causes IFF_UP to be cleared, which removes the interface's routes
      and some addresses.
      
      v2: Dropped patches 6-10 in the original series.  Patch 6-7 added support
          for character device for AVB and based on community feedback, we do not
          want to do this.  Patches 8-10 provided fixes to the problematic code
          added in patches 6 & 7.  So all of them must go!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1bae75d
    • A
      GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876
      Alexander Duyck authored
      On reviewing the code I realized that GRE and UDP tunnels could cause a
      kernel panic if we used GSO to segment a large UDP frame that was sent
      through the tunnel with an outer checksum and hardware offloads were not
      available.
      
      In order to correct this we need to update the feature flags that are
      passed to the skb_segment function so that in the event of UDP
      fragmentation being requested for the inner header the segmentation
      function will correctly generate the checksum for the payload if we cannot
      segment the outer header.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22463876
    • D
      Merge branch 'vrf-saddr-selection' · c3f463ba
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: l3mdev: Fix source address for unnumbered deployments
      
      David Lamparter noted a use case where the source address selection fails
      to pick an address from a VRF interface - unnumbered interfaces. The use
      case has the VRF device as the VRF local loopback with addresses and
      interfaces enslaved without an address themselves. e.g,
      
          ip addr add 9.9.9.9/32 dev lo
          ip link set lo up
      
          ip link add name vrf0 type vrf table 101
          ip rule add oif vrf0 table 101
          ip rule add iif vrf0 table 101
          ip link set vrf0 up
          ip addr add 10.0.0.3/32 dev vrf0
      
          ip link add name dummy2 type dummy
          ip link set dummy2 master vrf0 up
      
          --> note dummy2 has no address - unnumbered device
      
          ip route add 10.2.2.2/32 dev dummy2 table 101
          ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
      
      ping to the 10.2.2.2 through the L3 domain:
      
          $ ping -I vrf0 -c1 10.2.2.2
          ping: Warning: source address might be selected on device other than vrf0.
          PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
      
      picks up the wrong address -- the one from 'lo' not vrf0. And from tcpdump:
          12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
      
      This patch series changes address selection to only consider devices in
      the same L3 domain and to use the VRF device as the L3 domain's loopback.
      
          $ ping -I vrf0 -c1 10.2.2.2
          PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
      
      From tcpdump:
          12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
      
      Now the source address comes from vrf0.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3f463ba
    • D
      net: l3mdev: prefer VRF master for source address selection · 17b693cd
      David Lamparter authored
      When selecting an address in context of a VRF, the vrf master should be
      preferred for address selection.  If it isn't, the user has a hard time
      getting the system to select to their preference - the code will pick
      the address off the first in-VRF interface it can find, which on a
      router could well be a non-routable address.
      Signed-off-by: David Lamparter <equinox@diac24.net>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      [dsa: Fixed comment style and removed extra blank line ]
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17b693cd
    • D
      net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8
      David Ahern authored
      David Lamparter noted a use case where the source address selection fails
      to pick an address from a VRF interface - unnumbered interfaces.
      
      Relevant commands from his script:
          ip addr add 9.9.9.9/32 dev lo
          ip link set lo up
      
          ip link add name vrf0 type vrf table 101
          ip rule add oif vrf0 table 101
          ip rule add iif vrf0 table 101
          ip link set vrf0 up
          ip addr add 10.0.0.3/32 dev vrf0
      
          ip link add name dummy2 type dummy
          ip link set dummy2 master vrf0 up
      
          --> note dummy2 has no address - unnumbered device
      
          ip route add 10.2.2.2/32 dev dummy2 table 101
          ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
      
          tcpdump -ni dummy2 &
      
      And using ping instead of his socat example:
          $ ping -I vrf0 -c1 10.2.2.2
          ping: Warning: source address might be selected on device other than vrf0.
          PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
      
      From tcpdump:
          12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
      
      Note the source address is from lo and is not a VRF local address. With
      this patch:
      
          $ ping -I vrf0 -c1 10.2.2.2
          PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
      
      From tcpdump:
          12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
      
      Now the source address comes from vrf0.
      
      The ipv4 function for selecting source address takes a const argument.
      Removing the const requires touching a lot of places, so instead
      l3mdev_master_ifindex_rcu is changed to take a const argument and then
      do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
      This is similar to what l3mdev_fib_table_rcu does.
      
      IPv6 for unnumbered interfaces appears to be selecting the addresses
      properly.
      
      Cc: David Lamparter <david@opensourcerouting.org>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f2fb9a8
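      The selection policy from the two l3mdev commits above (consider only
      devices in the same L3 domain, and prefer the VRF master within it) can be
      sketched as follows. This is a simplified user-space model under stated
      assumptions: `struct dev`, `l3_domain()` and `select_saddr()` are
      hypothetical, an address of 0 stands for "unnumbered", and the rule that a
      device's own address wins first is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of a network device. */
struct dev {
    int ifindex;
    int master;     /* ifindex of the VRF master, 0 if not enslaved */
    int is_vrf;     /* 1 if this device is itself a VRF master */
    uint32_t addr;  /* IPv4 address in host order, 0 == unnumbered */
};

/* The L3 domain is identified by the ifindex of its VRF master (0 = default). */
static int l3_domain(const struct dev *d)
{
    return d->is_vrf ? d->ifindex : d->master;
}

/* Pick a source address for traffic leaving through 'out'. */
static uint32_t select_saddr(const struct dev *devs, size_t n,
                             const struct dev *out)
{
    int dom;

    assert(out != NULL);
    dom = l3_domain(out);

    /* Assumption of this sketch: the device's own address wins if it has one. */
    if (out->addr)
        return out->addr;

    /* Prefer the VRF master: it acts as the domain's loopback. */
    for (size_t i = 0; i < n; i++)
        if (devs[i].is_vrf && devs[i].ifindex == dom && devs[i].addr)
            return devs[i].addr;

    /* Otherwise any device in the same L3 domain, never one outside it. */
    for (size_t i = 0; i < n; i++)
        if (l3_domain(&devs[i]) == dom && devs[i].addr)
            return devs[i].addr;

    return 0;
}
```

      With the devices from the commit message (lo with 9.9.9.9, vrf0 with
      10.0.0.3, and an unnumbered dummy2 enslaved to vrf0), selecting for dummy2
      yields vrf0's address in this model rather than the one on lo.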
  5. 26 Feb, 2016 26 commits
    • D
      Merge branch 'ethtool-ksettings' · 8d3f2806
      David S. Miller authored
      David Decotigny says:
      
      ====================
      new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API
      
      History:
       v9
       - add 'link' in macro, struct and function names
       - rename ethtool_link_ksettings::parent -> ::base
       - remove un-needed mlx4 en_dbg_enabled() companion patch
       - note: bitmap u32[] API patches were merged separately by Kan Liang
       v8
       - bitmap u32 API returns number of bits copied, unit tests updated
       v7
       - module_exit in test_bitmap
       v6
       - fix copy_from_user in user/kernel handshake
       v5
       note: please see v4 bullets for a question regarding bitmap.c
       - minor fix to make allyesconfig/allmodconfig
       v4
       - removed typedef for link mode bitmaps
       - moved bitmap<->u32[] conversion routines to bitmap.c . This is the
         naive implementation. I have an endian-aware version that uses
         memcpy/memset as much as possible, but I find it harder to follow
         (see http://paste.ubuntu.com/13863722/). Please let me know if I
         should use it instead.
       - fixes suggested by Ben Hutchings
       v3
       - rebased v2 on top of latest net-next, minor checkpatch/printf %*pb
         updates
       v2
       - keep return 0 in get_settings when successful, instead of
         propagating positive result from driver's get_settings callback.
       v1
       - original submission
      
      The main goal of this series is to support ethtool link mode masks
      larger than 32 bits. It implements a new ioctl pair
      (ETHTOOL_GLINKSETTINGS/SLINKSETTINGS), its associated callbacks
      (get/set_link_ksettings) and a new struct ethtool_link_settings, which
      should eventually replace legacy ethtool_cmd. Internally, the kernel
      uses fixed length link mode masks defined at compilation time in
      ethtool.h (for now: 31 bits), that can be increased by changing
      __ETHTOOL_LINK_MODE_LAST in ethtool.h (absolute max is 4064 bits,
      checked at compile time), and the user/kernel interface allows this
      length to be arbitrary within 1..4064. This should allow some
      flexibility without using too much heap/stack space, at the cost of a
      small kernel/user handshake for the user to determine the sizes of
      those bitmaps.
      
      Along the way, I chose to drop in the new structure the 3 ethtool_cmd
      fields marked "deprecated" (transceiver/maxrxpkt/maxtxpkt). They are
      still available for old drivers via the (old) ETHTOOL_GSET/SSET API,
      but are not available to drivers that switch to new API. Of those 3
      fields, ethtool_cmd::transceiver seems to be still actively used by
      several drivers, maybe we should not consider this field deprecated?
      The 2 other fields are basically not used. This transition requires
      some care in the way old and new ethtool talk to the kernel.
      
      More technical details provided in the description for main patch. In
      particular details about backward compatibility properties.
      
      Some open questions:
       - the kernel/interface multiplexes the "tell me the bitmap length"
         handshake and the "give me the settings" inside the new
         ETHTOOL_GLINKSETTINGS cmd. I was thinking of making this into 2
         separate cmds: 1 cmd ETHTOOL_GKERNELPROPERTIES which would be
         kernel-wide rather than device-specific, would return properties
         like "length of the link mode bitmaps", and possibly others. And
         ETHTOOL_GLINKSETTINGS would expect the proper bitmaps
       - the link mode bitmaps are piggybacked at tail of the new struct
         ethtool_link_settings. Since its user-visible definition does not
         assume specific bitmap width, I am using a 0-length array as the
         publicly visible placeholder. But then, the kernel needs to
         specialize it (struct ethtool_link_ksettings) to specify its
         current link mode masks. This means that kernel code is "littered"
         with "ksettings->base.field" to access "field" inside
         ethtool_settings:
         + I could use ethtool_link_settings everywhere (instead of a new
           ethtool_ksettings) and a container_of accessor (or a plain cast)
           to retrieve the link mode masks?
         + or: we could decide to make the link mode masks statically
           bounded again, ie. make their width public, but larger than
           current 32, and unchangeable forever. This would make everything
           straightforward, but we might hit limits later, or have an
           unneeded memory/stack usage for unused bits.
         any preference?
       - I foresee bugs where people use the legacy/deprecated SUPPORTED_x
         macros instead of the new ETHTOOL_LINK_MODE_x_BIT enums in the new
         get/set_link_ksettings callbacks. Not sure how to prevent problems
         with this.
      
      The only driver which was converted for now is mlx4. I am not
      considering fcoe as fully converted, but I updated it a minima to be
      able to remove __ethtool_get_settings, now known as
      __ethtool_get_link_ksettings.
      
      Tested with legacy and "future" ethtool on 64b x86 kernel and 32+64b
      ethtool, and on a 32b x86 kernel + 32b ethtool.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d3f2806
    • D
      3d8f7cc7
    • D
      net: ethtool: remove unused __ethtool_get_settings · 3237fc63
      David Decotigny authored
      replaced by __ethtool_get_link_ksettings.
      Signed-off-by: David Decotigny <decot@googlers.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3237fc63
    • D
      7cad1bac
    • D
      702b26a2
    • D
      57709798
    • D
      17605b96
    • D
      008eb736
    • D
      0ab6b544
    • D
      85f95819
    • D
      314d10d7
    • D
      9856909c
    • D
      96a0c396
    • D
      tx4939: use __ethtool_get_ksettings · 091a9277
      David Decotigny authored
      Signed-off-by: David Decotigny <decot@googlers.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      091a9277
    • D
      net: ethtool: add new ETHTOOL_xLINKSETTINGS API · 3f1ac7a7
      David Decotigny authored
      This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
      handled by the new get_link_ksettings/set_link_ksettings callbacks.
      This API provides support for most legacy ethtool_cmd fields, adds
      support for larger link mode masks (up to 4064 bits, variable length),
      and removes ethtool_cmd deprecated
      fields (transceiver/maxrxpkt/maxtxpkt).
      
      This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
      the following backward compatibility properties:
       - legacy ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks.
       - legacy ethtool with new get/set_link_ksettings drivers: the new
         driver callbacks are used, data internally converted to legacy
         ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
         mode mask. ETHTOOL_SSET will fail if user tries to set the
         ethtool_cmd deprecated fields to
         non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
         driver sets higher bits.
       - future ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks, internally converted to new data
         structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
         ignored and seen as 0 from user space. Note that the "future"
         ethtool tool will not allow changes to these deprecated fields.
       - future ethtool with new drivers: direct call to the new callbacks.
      
      By "future" ethtool, what is meant is:
       - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
         fails
       - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
         ETHTOOL_GSET was successful
         + if ETHTOOL_GLINKSETTINGS was successful, then change config with
           ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
           ETHTOOL_SSET).
         + otherwise ETHTOOL_GSET was successful, change config with
           ETHTOOL_SSET. A failure there is final (do not try
           ETHTOOL_SLINKSETTINGS).
      
      The interaction user/kernel via the new API requires a small
      ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
      mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
      length it is expecting from user as a negative length (and cmd field is
      0). When kernel and user agree, kernel returns valid info in all
      fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).
      
      Data structure crossing user/kernel boundary is 32/64-bit
      agnostic. Converted internally to a legal kernel bitmap.
      
      The internal __ethtool_get_settings kernel helper will gradually be
      replaced by __ethtool_get_link_ksettings by the time the first
      "link_settings" drivers start to appear. So this patch doesn't change
      it, it will be removed before it needs to be changed.
      Signed-off-by: David Decotigny <decot@googlers.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f1ac7a7
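      The bitmap-length handshake described in 3f1ac7a7 can be illustrated with a
      small mock. This sketch does not call the real ioctl: the struct, the
      command constant, the mask length and `mock_kernel_glinksettings()` are all
      hypothetical stand-ins, chosen only to show the two-step exchange (the
      kernel answers a wrong length with the negated expected length and cmd 0,
      then fills everything in once both sides agree).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_CMD_GLINKSETTINGS 1u  /* hypothetical command value */
#define MOCK_KERNEL_NWORDS     1   /* hypothetical mask length in 32-bit words */

/* Hypothetical, cut-down version of the settings structure. */
struct mock_link_settings {
    uint32_t cmd;
    int8_t   nwords;   /* models the link mode bitmap length field */
    uint32_t speed;
};

/* Mock of the kernel side of the query. */
static int mock_kernel_glinksettings(struct mock_link_settings *s)
{
    if (s->cmd != MOCK_CMD_GLINKSETTINGS)
        return -1;

    if (s->nwords != MOCK_KERNEL_NWORDS) {
        /* Disagreement: report the expected length negated, with cmd == 0. */
        memset(s, 0, sizeof(*s));
        s->nwords = -MOCK_KERNEL_NWORDS;
        return 0;
    }

    /* Agreement: cmd and length stay valid, remaining fields are filled in. */
    s->speed = 1000;
    return 0;
}

/* User side: ask for the length first, then repeat the query with it. */
static int query_link_settings(struct mock_link_settings *out)
{
    struct mock_link_settings s;
    int8_t want;

    assert(out != NULL);

    memset(&s, 0, sizeof(s));
    s.cmd = MOCK_CMD_GLINKSETTINGS;      /* nwords == 0: "tell me the length" */
    if (mock_kernel_glinksettings(&s) != 0)
        return -1;
    if (s.nwords >= 0 || s.cmd != 0)
        return -1;                       /* not the handshake reply we expect */

    want = (int8_t)-s.nwords;
    memset(&s, 0, sizeof(s));
    s.cmd = MOCK_CMD_GLINKSETTINGS;
    s.nwords = want;
    if (mock_kernel_glinksettings(&s) != 0)
        return -1;
    if (s.nwords <= 0 || s.cmd != MOCK_CMD_GLINKSETTINGS)
        return -1;                       /* still no agreement: give up */

    *out = s;
    return 0;
}
```

      A caller following the "future ethtool" rules from the commit message would
      treat a failure here as the cue to fall back to the legacy query, and would
      remember which of the two succeeded before attempting a set operation.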
    • D
      48133335
    • D
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert authored
      This patch adds the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a87cb3e4
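      From an application, the option introduced in a87cb3e4 might be used roughly
      as below. This is a sketch: `advise_bad_path()` is a hypothetical helper,
      and the fallback value 53 for SO_CNX_ADVICE is an assumption that matches
      the generic socket header and may differ on some architectures, so prefer
      the definition from the system headers when it is available.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_CNX_ADVICE
#define SO_CNX_ADVICE 53  /* assumption: generic value, architecture dependent */
#endif

/*
 * Tell the kernel that the current path of a connected socket looks bad.
 * Returns 0 on success or a negative errno value on failure.
 */
static int advise_bad_path(int fd)
{
    int advice = 1;  /* 1 == "bad path, please reroute" per the commit message */

    if (setsockopt(fd, SOL_SOCKET, SO_CNX_ADVICE, &advice, sizeof(advice)) < 0)
        return -errno;
    return 0;
}
```

      A connected UDP application could call this helper after noticing repeated
      timeouts. Since the option is set-only, there is no matching getsockopt()
      to read the advice back.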
    • D
      net: ipv6: Make address flushing on ifdown optional · f1705ec1
      David Ahern authored
      Currently, all ipv6 addresses are flushed when the interface is configured
      down, including global, static addresses:
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          << nothing; all addresses have been flushed>>
      
      Add a new sysctl to make this behavior optional. The new setting defaults to
      flushing all addresses to maintain backwards compatibility. When set, global
      addresses with no expire times are not flushed on an admin down. The sysctl
      is per-interface or system-wide for all interfaces:
      
          $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
      or
          $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
      
      Will keep addresses on eth1 on an admin down.
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
              inet6 2100:1::2/120 scope global tentative
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
                 valid_lft forever preferred_lft forever
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1705ec1
    • F
      tipc: fix null deref crash in compat config path · 619b1745
      Florian Westphal authored
      msg.dst_sk needs to be set up with a valid socket because some callbacks
      later derive the netns from it.
      
      Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
      Reported-by: Jon Maloy <maloy@donjonn.com>
      Bisected-by: Jon Maloy <maloy@donjonn.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      619b1745
    • J
      tipc: fix crash during node removal · d25a0125
      Jon Paul Maloy authored
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero and the node instance
      being freed before the node timer is finished with accessing it. This
      leads to occasional crashes, especially in multi-namespace environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3) We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d25a0125
    • J
      tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Jon Paul Maloy authored
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
               list_del_rcu(n. list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_if_not_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b170997a
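      The conditional "get" that b170997a relies on can be modelled in user space
      with C11 atomics. This is a minimal sketch, not the kernel's kref code:
      `struct ref`, `ref_get_if_not_zero()` and `ref_put()` are hypothetical
      names. The point is that a lookup racing with the final put must fail to
      take a reference instead of reviving a count that has already hit zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter. */
struct ref {
    atomic_int count;
};

static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);  /* the creator holds the first reference */
}

/* Take a reference only while the object is still alive (count > 0). */
static bool ref_get_if_not_zero(struct ref *r)
{
    int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure the CAS reloads 'old', so the zero check is repeated. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;  /* already dead: the caller must not touch the object */
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(struct ref *r)
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0);  /* a put without a matching get is a bug */
    return old == 1;
}
```

      In the scenario from the commit message, the lookup on CPU1 would see the
      conditional get fail after CPU0 dropped the last reference, so it would
      treat the node as not found and never reach the second free.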
    • D
      Merge branch 'qed-misc' · 3da7611f
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed*: Driver updates
      
      Usually I try to provide a sensible description of the patch set even if
      it lacks a general 'motif', but this simply contains several small,
      unrelated and self-explanatory tweaks and additions.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3da7611f
    • Y
      qed, qede: rebrand module description · 5abd7e92
      Yuval Mintz authored
      Drop the `QL4xxx 40G/100G' and use `FastLinQ 4xxxx' instead.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5abd7e92
    • Y
      qed: Prevent probe on previous error · 0dfaba6d
      Yuval Mintz authored
      Don't allow driver to probe on an adapter at a failed state;
      Gracefully block the probe instead.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0dfaba6d
    • Y
      qed: add MODULE_FIRMWARE() · d43d3f0f
      Yuval Mintz authored
      Module is using a binary firmware file and so should be marked as such.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d43d3f0f