1. 27 Jan, 2017 3 commits
    • tcp: don't annotate mark on control socket from tcp_v6_send_response() · 92e55f41
      Authored by Pablo Neira
      Unlike ipv4, this control socket is shared by all cpus so we cannot use
      it as scratchpad area to annotate the mark that we pass to ip6_xmit().
      
      Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
      family caches the flowi6 structure in the sctp_transport structure, so
      we cannot use it to carry the mark unless we later reset it, an option
      I discarded because it looks ugly to me.
      
      Fixes: bf99b4de ("tcp: fix mark propagation with fwmark_reflect enabled")
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: unmap descriptors when freeing rings · a47b70ea
      Authored by Kazuya Mizuguchi
      "swiotlb buffer is full" errors occur after repeated initialisation of a
      device, e.g. suspend/resume or ip link set up/down. This is because memory
      mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
      is not released.  Resolve this problem by unmapping descriptors when
      freeing rings.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      [simon: reworked]
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 086cb6a4
      Authored by David S. Miller
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains a large batch with Netfilter fixes for
      your net tree, they are:
      
      1) Two patches to solve conntrack garbage collector cpu hogging, one to
         remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
         vs. evicted entries) to make a decision on whether to reduce or not
         the scanning interval. From Florian Westphal.
      
      2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
         not set. Moreover, don't decrement set->nelems from the abort path
         if -ENFILE, which leaks a spare slot in the set. This includes a
         patch to deconstify the set walk callback to update set->ndeact.
      
      3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
         reply packets both from nf_reject and local stack, from Pau Espin Pedrol.
      
      4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
         fib expression, from Liping Zhang.
      
      5) Fix oops on stateful objects netlink dump, when no filter is specified.
         Also from Liping Zhang.
      
      6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
         to a fix that was applied in the previous batch for net. From Arnd Bergmann.
      
      7) Fix lack of string validation in table, chain, set and stateful
         object names in nf_tables, from Liping Zhang. Moreover, restrict
         maximum log prefix length to 127 bytes, otherwise explicitly bail
         out.
      
      8) Two patches to fix spelling and typos in nf_tables uapi header file
         and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Jan, 2017 22 commits
    • Merge tag 'batadv-net-for-davem-20170125' of git://git.open-mesh.org/linux-merge · 214767fa
      Authored by David S. Miller
      Simon Wunderlich says:
      
      ====================
      Here is a batman-adv bugfix:
      
       - fix reference count handling on fragmentation error, by Sven Eckelmann
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: reject XDP programs using header adjustment · 529ec6ac
      Authored by Jakub Kicinski
      commit 17bedab2 ("bpf: xdp: Allow head adjustment in XDP prog")
      added a new XDP helper to prepend and remove data from a frame.
      Make virtio_net reject programs making use of this helper until
      proper support is added.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: use dev_kfree_skb for small buffer XDP receive · b68df015
      Authored by John Fastabend
      In the small buffer case during driver unload we currently use
      put_page instead of dev_kfree_skb. Resolve this by adding a check
      for virtnet mode when checking XDP queue type. Also name the
      function so that the code reads correctly to match the additional
      check.
      
      Fixes: bb91accf ("virtio-net: XDP support for small buffers")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'r8152-napi-fixes' · 7480888f
      Authored by David S. Miller
      Hayes Wang says:
      
      ====================
      r8152: fix scheduling napi
      
      v3:
      Simplify the argument for patch #3. Replace &tp->napi with napi.
      
      v2:
      Add smp_mb__after_atomic() for patch #1.
      
      v1:
      Scheduling the napi during the following periods would cause it to be
      ignored, and the events wouldn't be handled until the next napi_schedule()
      is called.
      
      1. after napi_disable() and before napi_enable().
      2. after all actions of the napi function are completed and before calling
         napi_complete().
      
      If no further napi_schedule() is called, tx or rx would stop working.
      
      In order to avoid these situations, the following solutions are applied.
      
      1. prevent start_xmit() from calling napi_schedule() during runtime suspend
         or after napi_disable().
      2. re-schedule the napi for tx if it is necessary.
      3. check if any rx is finished or not after napi_enable().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: check rx after napi is enabled · 7489bdad
      Authored by hayeswang
      Schedule the napi after napi_enable() for rx, if it is necessary.
      
      If the rx is completed while napi is disabled, the scheduling of napi
      would be lost. Then, no one handles the rx packet until the next napi
      is scheduled.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: re-schedule napi for tx · 248b213a
      Authored by hayeswang
      Re-schedule napi after napi_complete() for tx, if it is necessary.
      
      In r8152_poll(), if the tx is completed after tx_bottom() and before
      napi_complete(), the scheduling of napi would be lost. Then, no
      one handles the next tx until the next napi_schedule() is called.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: avoid start_xmit to schedule napi when napi is disabled · de9bf29d
      Authored by hayeswang
      Stop the tx when the napi is disabled to prevent napi_schedule() from
      being called.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: avoid start_xmit to call napi_schedule during autosuspend · 26afec39
      Authored by hayeswang
      Adjust the setting of the flag of SELECTIVE_SUSPEND to prevent start_xmit()
      from calling napi_schedule() directly during runtime suspend.
      
      After calling napi_disable() or clearing the flag of WORK_ENABLE,
      scheduling the napi is useless.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Bring back device detaching in dsa_slave_suspend() · f154be24
      Authored by Florian Fainelli
      Commit 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid
      lockdep splat") removed the netif_device_detach() call done in
      dsa_slave_suspend(). That call is necessary and is paired with a
      corresponding netif_device_attach(), so bring it back.
      
      Fixes: 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phy-truncated-led-names' · d5bdc021
      Authored by David S. Miller
      Geert Uytterhoeven says:
      
      ====================
      net: phy: leds: Fix truncated LED trigger names and crashes
      
      I started seeing crashes during s2ram and poweroff on all my ARM boards,
      like:
      
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          ...
          [<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc)
          [<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34)
          [<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74)
          [<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c)
          [<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8)
      
      or:
      
          list_del corruption. prev->next should be dede6540, but was 2e323931
          ------------[ cut here ]------------
          kernel BUG at lib/list_debug.c:52!
          ...
          [<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc)
          [<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34)
          [<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74)
          [<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4)
          [<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0)
      
      As the only clue was a kernel message like
      
          sh-eth ee700000.ethernet eth0: No phy led trigger registered for speed(100)
      
      I had to bisect this, leading to commit 4567d686 ("phy:
      increase size of MII_BUS_ID_SIZE and bus_id").  Reverting that commit
      fixed the issue.
      
      More investigation revealed the crashes are due to the combination of
      two things:
        - Truncated LED trigger names, leading to duplicate names, and
          registration failures,
        - Bad error handling in case of registration failures.
      
      Both are fixed by this patch series.
      
      Changes compared to v1:
        - Add Reviewed-by,
        - New patch "net: phy: leds: Break dependency of phy.h on
          phy_led_triggers.h",
        - Drop moving the include of <linux/phy_led_triggers.h>, as
          <linux/phy.h> no longer includes it,
        - #include <linux/phy.h> from <linux/phy_led_triggers.h>.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: leds: Fix truncated LED trigger names · 3c880eb0
      Authored by Geert Uytterhoeven
      Commit 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and
      bus_id") increased the size of MII bus IDs, but forgot to update the
      private definition in <linux/phy_led_triggers.h>.
      This may cause:
        1. Truncation of LED trigger names,
        2. Duplicate LED trigger names,
        3. Failures registering LED triggers,
        4. Crashes due to bad error handling in the LED trigger failure path.
      
      To fix this, and prevent the definitions going out of sync again in the
      future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
      definition.
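
A small sketch can reproduce the truncation, using the trigger names quoted in this commit message's example. It is a Python model rather than kernel C, and the buffer sizes are assumptions chosen to match the quoted names, not values taken from the patch. `c_truncate` and `trigger_names` are hypothetical helpers.

```python
# Assumed buffer sizes (illustrative, not taken from the patch itself).
OLD_BUS_ID_SIZE = 17      # bus ID buffer before the increase
NEW_BUS_ID_SIZE = 61      # bus ID buffer after the increase
STALE_NAME_SIZE = 31      # trigger-name buffer still derived from the old size
FIXED_NAME_SIZE = NEW_BUS_ID_SIZE + 4 + 10  # name buffer derived from the new size


def c_truncate(text: str, size: int) -> str:
    """Mimic snprintf() into a `size`-byte buffer: size - 1 characters survive."""
    return text[: size - 1]


def trigger_names(bus_id, addr, speeds, bus_id_size, name_size):
    """Build one "<bus>:<addr>:<speed>" trigger name per speed, with truncation."""
    bus = c_truncate(bus_id, bus_id_size)
    return [c_truncate(f"{bus}:{addr:02x}:{speed}", name_size) for speed in speeds]


BUS_ID = "ee700000.ethernet-ffffffff"
SPEEDS = ["100Mbps", "10Mbps"]

before = trigger_names(BUS_ID, 1, SPEEDS, OLD_BUS_ID_SIZE, STALE_NAME_SIZE)
broken = trigger_names(BUS_ID, 1, SPEEDS, NEW_BUS_ID_SIZE, STALE_NAME_SIZE)
fixed = trigger_names(BUS_ID, 1, SPEEDS, NEW_BUS_ID_SIZE, FIXED_NAME_SIZE)

print(before)  # short bus ID, distinct names
print(broken)  # long bus ID in a stale name buffer: the speed suffix is cut off
print(fixed)   # name buffer sized from the bus ID size: distinct again
```

Deriving the name buffer size from the bus ID size, as in `FIXED_NAME_SIZE`, is what keeps the two from drifting apart.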
      
      Example:
        - Before I had triggers "ee700000.etherne:01:100Mbps" and
          "ee700000.etherne:01:10Mbps",
        - After the increase of MII_BUS_ID_SIZE, both became
          "ee700000.ethernet-ffffffff:01:" => FAIL,
        - Now, the triggers are "ee700000.ethernet-ffffffff:01:100Mbps" and
          "ee700000.ethernet-ffffffff:01:10Mbps", which are unique again.
      
      Fixes: 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id")
      Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: leds: Break dependency of phy.h on phy_led_triggers.h · d6f8cfa3
      Authored by Geert Uytterhoeven
      <linux/phy.h> includes <linux/phy_led_triggers.h>, which is not really
      needed.  Drop the include from <linux/phy.h>, and add it to all users
      that didn't include it explicitly.
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash · 8a87fca8
      Authored by Geert Uytterhoeven
      phy_attach_direct() ignores errors returned by
      phy_led_triggers_register(). I think that's OK, as LED triggers can be
      considered a non-critical feature.
      
      However, this causes problems later:
        - phy_led_trigger_change_speed() will access the array
          phy_device.phy_led_triggers, which has been freed in the error path
          of phy_led_triggers_register(), which may lead to a crash.
      
        - phy_led_triggers_unregister() will access the same array, leading to
          crashes during s2ram or poweroff, like:
      
      	Unable to handle kernel NULL pointer dereference at virtual address
      	00000000
      	...
      	[<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc)
      	[<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34)
      	[<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74)
      	[<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c)
      	[<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8)
      
          or:
      
      	list_del corruption. prev->next should be dede6540, but was 2e323931
      	------------[ cut here ]------------
      	kernel BUG at lib/list_debug.c:52!
      	...
      	[<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc)
      	[<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34)
      	[<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74)
      	[<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4)
      	[<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0)
      
      To fix this, clear phy_device.phy_num_led_triggers in the error path of
      phy_led_triggers_register().
      
      Note that the "No phy led trigger registered for speed" message will
      still be printed on link speed changes, which is a good cue that
      something went wrong with the LED triggers.
      
      Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: ethernet: mediatek: change the compatible string · 8b901f6b
      Authored by John Crispin
      When the binding was defined, I was not aware that mt2701 was an earlier
      version of the SoC. For sake of consistency, the ethernet driver should
      use mt2701 inside the compat string as this is the earliest SoC with the
      ethernet core.
      
      The ethernet driver is currently of no real use until we finish and
      upstream the DSA driver. There are no users of this binding yet. It should
      be safe to fix this now before it is too late and we need to provide
      backward compatibility for the mt7623-eth compat string.
      Reported-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: John Crispin <john@phrozen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devicetree: change the mediatek ethernet compatible string · 61976fff
      Authored by John Crispin
      When the binding was defined, I was not aware that mt2701 was an earlier
      version of the SoC. For sake of consistency, the ethernet driver should
      use mt2701 inside the compat string as this is the earliest SoC with the
      ethernet core.
      
      The ethernet driver is currently of no real use until we finish and
      upstream the DSA driver. There are no users of this binding yet. It should
      be safe to fix this now before it is too late and we need to provide
      backward compatibility for the mt7623-eth compat string.
      Reported-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: John Crispin <john@phrozen.org>
      Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-rtnl-fixes' · c0d9665f
      Authored by David S. Miller
      Michael Chan says:
      
      ====================
      bnxt_en: Fix RTNL lock usage in bnxt_sp_task().
      
      There are 2 function calls from bnxt_sp_task() that have buggy RTNL
      usage.  These 2 functions take RTNL lock under some conditions, but
      some callers (such as open, ethtool) have already taken RTNL.  These
      3 patches fix the issue by making it clear that callers must take
      RTNL.  If the caller is bnxt_sp_task() which does not automatically
      take RTNL, we add a common scheme for bnxt_sp_task() to call these
      functions properly under RTNL.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status(). · 90c694bb
      Authored by Michael Chan
      bnxt_get_port_module_status() calls bnxt_update_link() which expects
      RTNL to be held.  In bnxt_sp_task() that does not hold RTNL, we need to
      call it with a prior call to bnxt_rtnl_lock_sp() and the call needs to
      be moved to the end of bnxt_sp_task().
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix RTNL lock usage on bnxt_update_link(). · 0eaa24b9
      Authored by Michael Chan
      bnxt_update_link() is called from multiple code paths.  Most callers,
      such as open, ethtool, already hold RTNL.  Only the caller bnxt_sp_task()
      does not.  So it is a bug to take RTNL inside bnxt_update_link().
      
      Fix it by removing the RTNL inside bnxt_update_link().  The function
      now expects the caller to always hold RTNL.
      
      In bnxt_sp_task(), call bnxt_rtnl_lock_sp() before calling
      bnxt_update_link().  We also need to move the call to the end of
      bnxt_sp_task() since it will be clearing the BNXT_STATE_IN_SP_TASK bit.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix bnxt_reset() in the slow path task. · a551ee94
      Authored by Michael Chan
      In bnxt_sp_task(), we set a bit BNXT_STATE_IN_SP_TASK so that bnxt_close()
      will synchronize and wait for bnxt_sp_task() to finish.  Some functions
      in bnxt_sp_task() require us to clear BNXT_STATE_IN_SP_TASK and then
      acquire rtnl_lock() to prevent race conditions.
      
      There are some bugs related to this logic. This patch refactors the code
      to have common bnxt_rtnl_lock_sp() and bnxt_rtnl_unlock_sp() to handle
      the RTNL and the clearing/setting of the bit.  Multiple functions will
      need the same logic.  We also need to move bnxt_reset() to the end of
      bnxt_sp_task().  Functions that clear BNXT_STATE_IN_SP_TASK must be the
      last functions to be called in bnxt_sp_task().  The common scheme will
      handle the condition properly.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: correct memory barrier usage in tcp_check_space() · 56d80622
      Authored by Jason Baron
      sock_reset_flag() maps to __clear_bit(), not the atomic version clear_bit().
      Thus, we need smp_mb(); smp_mb__after_atomic() is not sufficient.
      
      Fixes: 3c715127 ("tcp: add memory barriers to write space paths")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: sctp gso should set feature with NETIF_F_SG when calling skb_segment · 5207f399
      Authored by Xin Long
      Now sctp gso puts segments into skb's frag_list, then processes these
      segments in skb_segment. But skb_segment handles them only when sg is
      enabled, as it's in the same branch as skb's frags.
      
      Almost all NICs support sg, other than some old ones. However,
      since commit 1e16aa3d ("net: gso: use feature flag argument in all
      protocol gso handlers"), features &= skb->dev->hw_enc_features, and
      xfrm_output_gso calls skb_segment with features = 0, which means sctp
      gso would call skb_segment with sg = 0, and skb_segment would not work
      as expected.
      
      This patch is to fix it by setting features param with NETIF_F_SG when
      calling skb_segment so that it can go the right branch to process the
      skb's frag_list.
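
The effect of the feature mask can be sketched with plain bit operations. This is a Python model, not the kernel code: the bit value of `NETIF_F_SG` is assumed for illustration, and `takes_frag_list_branch` and `sctp_gso_features` are hypothetical stand-ins for the check inside skb_segment and for the caller's choice of features.

```python
NETIF_F_SG = 1 << 0  # assumed bit position, for illustration only


def takes_frag_list_branch(features: int) -> bool:
    """Model of the check described above: frag_list is only handled when sg is set."""
    return bool(features & NETIF_F_SG)


def sctp_gso_features(features: int, fixed: bool) -> int:
    """Return the features handed on for segmentation, with or without the fix."""
    return features | NETIF_F_SG if fixed else features


# A caller that passes features = 0, as described for xfrm_output_gso.
incoming = 0

print(takes_frag_list_branch(sctp_gso_features(incoming, fixed=False)))
print(takes_frag_list_branch(sctp_gso_features(incoming, fixed=True)))
```

Forcing the bit at the call site makes the outcome independent of whatever mask was applied to `features` earlier in the path.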
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: sctp_addr_id2transport should verify the addr before looking up assoc · 6f29a130
      Authored by Xin Long
      sctp_addr_id2transport is a function for sockopt to look up assoc by
      address. As the address is from userspace, it can be a v4-mapped v6
      address. But in sctp protocol stack, it always handles a v4-mapped
      v6 address as a v4 address. So it's necessary to convert it to a v4
      address before looking up assoc by address.
      
      This patch is to fix it by calling sctp_verify_addr in which it can do
      this conversion before calling sctp_endpoint_lookup_assoc, just like
      what sctp_sendmsg and __sctp_connect do for the address from users.
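
Why the conversion matters for a lookup keyed by address can be shown with Python's standard `ipaddress` module. This is only an illustration of v4-mapped normalization: the association table and the helpers `normalize_addr` and `lookup_assoc` are hypothetical and do not mirror the SCTP data structures.

```python
import ipaddress

# Hypothetical association table keyed by the address form the stack uses (plain v4).
ASSOC_BY_ADDR = {ipaddress.ip_address("192.0.2.1"): "assoc-1"}


def normalize_addr(text: str):
    """Convert a v4-mapped v6 address to plain v4; leave other addresses alone."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


def lookup_assoc(text: str, normalize: bool = True):
    """Look up an association, optionally normalizing the user-supplied address first."""
    addr = normalize_addr(text) if normalize else ipaddress.ip_address(text)
    return ASSOC_BY_ADDR.get(addr)


print(lookup_assoc("::ffff:192.0.2.1", normalize=False))  # lookup misses
print(lookup_assoc("::ffff:192.0.2.1"))                   # lookup hits after conversion
```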
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Jan, 2017 15 commits
    • Merge branch 'lwt-module-unload' · ec221a17
      Authored by David S. Miller
      Robert Shearman says:
      
      ====================
      net: Fix oops on state free after lwt module unload
      
      An oops is seen in lwtstate_free after an lwt ops module has been
      unloaded. This patchset fixes this by preventing modules implementing
      lwtunnel ops from being unloaded whilst there's state alive using
      those ops. The first patch adds and fills in a new owner field in all lwt
      ops and the second patch makes use of this to reference count the
      modules as state is built and destroyed using them.
      
      Changes in v3:
       - don't put module reference if try_module_get fails on building state
      
      Changes in v2:
       - specify module owner for all modules as suggested by DaveM
       - reference count all modules building lwt state, not just those ops
         implementing destroy_state, as also suggested by DaveM.
       - rebased on top of David Ahern's lwtunnel changes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwtunnel: Fix oops on state free after encap module unload · 85c81401
      Authored by Robert Shearman
      When attempting to free lwtunnel state after the module for the encap
      has been unloaded an oops occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: lwtstate_free+0x18/0x40
      [..]
      task: ffff88003e372380 task.stack: ffffc900001fc000
      RIP: 0010:lwtstate_free+0x18/0x40
      RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
      [..]
      Call Trace:
       <IRQ>
       free_fib_info_rcu+0x195/0x1a0
       ? rt_fibinfo_free+0x50/0x50
       rcu_process_callbacks+0x2d3/0x850
       ? rcu_process_callbacks+0x296/0x850
       __do_softirq+0xe4/0x4cb
       irq_exit+0xb0/0xc0
       smp_apic_timer_interrupt+0x3d/0x50
       apic_timer_interrupt+0x93/0xa0
      [..]
      Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8
      
      The problem is that after the module for the encap has been unloaded, the
      corresponding ops is removed and is thus NULL here.
      
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so grab the module
      reference for the ops on creating lwtunnel state and of course release
      the reference when freeing the state.
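
The reference-counting scheme described above can be sketched as a small model. It is written in Python, not kernel C, and every name in it (`Module`, `LwtState`, `build_state`, `free_state`) is hypothetical; it only illustrates taking a reference when state is built and dropping it when state is freed.

```python
class Module:
    """Toy model of a module that can only unload when nothing references it."""

    def __init__(self, name: str):
        self.name = name
        self.refcount = 0
        self.loaded = True

    def try_get(self) -> bool:
        """Take a reference; fails if the module is already gone."""
        if not self.loaded:
            return False
        self.refcount += 1
        return True

    def put(self) -> None:
        """Drop a reference taken with try_get()."""
        self.refcount -= 1

    def unload(self) -> bool:
        """Refuse to unload while any state still holds a reference."""
        if self.refcount > 0:
            return False
        self.loaded = False
        return True


class LwtState:
    def __init__(self, owner: Module):
        self.owner = owner


def build_state(owner: Module):
    """Build state only if a module reference could be taken."""
    if not owner.try_get():
        return None  # no reference was taken, so none must be dropped
    return LwtState(owner)


def free_state(state: LwtState) -> None:
    """Free state and release the reference on its owning module."""
    state.owner.put()


encap = Module("encap")
state = build_state(encap)
print(encap.unload())  # refused while state is alive
free_state(state)
print(encap.unload())  # allowed once the last state is freed
```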
      
      Fixes: 1104d9ba ("lwtunnel: Add destroy state operation")
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Specify the owning module for lwtunnel ops · 88ff7334
      Authored by Robert Shearman
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so specify the owning
      module for all lwtunnel ops.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-topology-fixes' · 04d7f1fb
      Authored by David S. Miller
      Parthasarathy Bhuvaragan says:
      
      ====================
      tipc: topology server fixes for nametable soft lockup
      
      In this series, we revert the commit 333f7962 ("tipc: fix a
      race condition leading to subscriber refcnt bug") and provide an
      alternate solution to fix the race conditions in commits 2-4.
      
      We have to do this as the above commit introduced a nametbl soft
      lockup at module exit as described by patch#4.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix cleanup at module unload · 35e22e49
      Authored by Parthasarathy Bhuvaragan
      In tipc_server_stop(), we iterate over the connections with the server's
      idr_in_use as the limiting factor. We ignore the fact that this variable
      is decremented in tipc_close_conn(), leading to premature exit.
      
      In this commit, we iterate until we have no connections left.
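
The premature exit can be reproduced with a toy model. The shape of the buggy loop below is an assumption reconstructed from the description (a counter compared against a limit that shrinks on every close); `Server`, `stop_buggy` and `stop_fixed` are hypothetical names, not the TIPC functions.

```python
class Server:
    """Toy server whose in-use counter drops each time a connection is closed."""

    def __init__(self, conn_ids):
        self.conns = {conn_id: f"conn-{conn_id}" for conn_id in conn_ids}
        self.idr_in_use = len(self.conns)

    def close_conn(self, conn_id: int) -> None:
        del self.conns[conn_id]
        self.idr_in_use -= 1


def stop_buggy(server: Server) -> None:
    """Count closed connections against a limit that shrinks while we iterate."""
    total = 0
    conn_id = 0
    while total < server.idr_in_use:
        if conn_id in server.conns:
            total += 1
            server.close_conn(conn_id)
        conn_id += 1


def stop_fixed(server: Server) -> None:
    """Iterate until no connections are left."""
    conn_id = 0
    while server.idr_in_use:
        if conn_id in server.conns:
            server.close_conn(conn_id)
        conn_id += 1


buggy = Server(range(4))
stop_buggy(buggy)
print(len(buggy.conns))  # connections left behind by the shrinking limit

fixed = Server(range(4))
stop_fixed(fixed)
print(len(fixed.conns))  # nothing left
```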
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Tested-by: John Thompson <thompa.atl@gmail.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: ignore requests when the connection state is not CONNECTED · 4c887aa6
      Authored by Parthasarathy Bhuvaragan
      In tipc_conn_sendmsg(), we first queue the request to the outqueue
      followed by the connection state check. If the connection is not
      connected, we should not queue this message.
      
      In this commit, we reject the messages if the connection state is
      not CF_CONNECTED.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Tested-by: John Thompson <thompa.atl@gmail.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix nametbl_lock soft lockup at module exit · 9dc3abdd
      Authored by Parthasarathy Bhuvaragan
      Commit 333f7962 ("tipc: fix a race condition leading to
      subscriber refcnt bug") reveals a soft lockup while acquiring
      nametbl_lock.
      
      Before commit 333f7962, we call tipc_conn_shutdown() from
      tipc_close_conn() in the context of tipc_topsrv_stop(). In that
      context, we are allowed to grab the nametbl_lock.
      
      Commit 333f7962 moved tipc_conn_release() (renamed from
      tipc_conn_shutdown()) to the connection refcount cleanup. This allows
      either tipc_nametbl_withdraw() or tipc_topsrv_stop() to do the cleanup.
      
      Since tipc_exit_net() first calls tipc_topsrv_stop() and then
      tipc_nametbl_withdraw(), the chances increase that the latter
      performs the connection cleanup.
      
      The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
      when it performs the tipc_conn_kref_release() as it tries to grab
      nametbl_lock again while holding it already.
      tipc_nametbl_withdraw() grabs nametbl_lock
        tipc_nametbl_remove_publ()
          tipc_subscrp_report_overlap()
            tipc_subscrp_send_event()
              tipc_conn_sendmsg()
                << if (con->flags != CF_CONNECTED) we do conn_put(),
                   triggering the cleanup as refcount=0. >>
                tipc_conn_kref_release
                  tipc_sock_release
                    tipc_conn_release
                      tipc_subscrb_delete
                        tipc_subscrp_delete
                          tipc_nametbl_unsubscribe << Soft Lockup >>
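
The core of the call chain above is a non-recursive lock being taken again by the code path that already holds it. A minimal Python sketch of that pattern follows; it uses `threading.Lock` as a stand-in for nametbl_lock and a non-blocking acquire so that the example reports the problem instead of hanging. The function names are hypothetical and collapse the whole chain into one call.

```python
import threading

nametbl_lock = threading.Lock()  # non-reentrant stand-in for the real lock


def nametbl_unsubscribe() -> str:
    """Needs the lock; uses a try-lock so the sketch can report a self-deadlock."""
    if not nametbl_lock.acquire(blocking=False):
        return "would lock up"
    try:
        return "ok"
    finally:
        nametbl_lock.release()


def nametbl_withdraw() -> str:
    """Holds the lock while the cleanup chain ends up needing it again."""
    with nametbl_lock:
        # Stands in for the chain from the withdraw down to the unsubscribe.
        return nametbl_unsubscribe()


print(nametbl_withdraw())     # the nested acquire cannot succeed
print(nametbl_unsubscribe())  # the same call is fine when the lock is not held
```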
      
      The previous changes in this series fix the race conditions fixed
      by commit 333f7962. Hence we can now revert the commit.
      
      Fixes: 333f7962 ("tipc: fix a race condition leading to subscriber refcnt bug")
      Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix connection refcount error · fc0adfc8
      Authored by Parthasarathy Bhuvaragan
      Until now, the generic server framework maintains the connection
      ids per subscriber in the server's conn_idr. At tipc_close_conn(), we
      remove the connection id from the server list, but the connection
      remains valid until we call the refcount cleanup. Hence there is a
      window where the server allocates the same connection id to a new
      subscriber, leading to an inconsistent reference count. We get another
      refcount warning when we grab the refcount in tipc_conn_lookup() for
      connections with the CF_CONNECTED flag not set. This usually occurs at
      shutdown, when we stop the topology server and withdraw the
      TIPC_CFG_SRV publication, thereby triggering a withdraw message to
      subscribers.
      
      In this commit, we:
      1. remove the connection from the server list at refcount cleanup.
      2. grab the refcount for a connection only if CF_CONNECTED is set.
      Tested-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc0adfc8
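      The two rules of the refcount fix above can be sketched in plain C.
      This is a simplified model, not the TIPC implementation: the toy_*
      names, the fixed-size table standing in for conn_idr, and the plain
      integer refcount are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CF_CONNECTED 1u
#define TOY_MAX_CONNS 4

struct toy_conn {
    unsigned int flags;
    int refcount;
};

struct toy_server {
    struct toy_conn *conns[TOY_MAX_CONNS]; /* stands in for conn_idr */
};

/* Rule 2: take a reference only while the connection is marked connected. */
static struct toy_conn *toy_conn_lookup(struct toy_server *s, int id)
{
    struct toy_conn *con;

    if (id < 0 || id >= TOY_MAX_CONNS)
        return NULL;
    con = s->conns[id];
    if (con == NULL || !(con->flags & TOY_CF_CONNECTED))
        return NULL;
    con->refcount++;
    return con;
}

/* Closing clears the flag but leaves the id in the table. */
static void toy_conn_close(struct toy_conn *con)
{
    con->flags &= ~TOY_CF_CONNECTED;
}

/* Rule 1: the id is released only when the last reference goes away,
 * so it cannot be handed to a new subscriber while still in use. */
static void toy_conn_put(struct toy_server *s, int id)
{
    struct toy_conn *con = s->conns[id];

    assert(con != NULL && con->refcount > 0);
    if (--con->refcount == 0)
        s->conns[id] = NULL;
}
```

      In this model, a lookup after toy_conn_close() returns NULL and leaves
      the refcount untouched, and the table slot stays occupied until the
      final toy_conn_put().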
    • P
      tipc: add subscription refcount to avoid invalid delete · d094c4d5
      Parthasarathy Bhuvaragan committed
      Until now, the subscribers keep track of the subscriptions using
      reference count at subscriber level. At subscription cancel or
      subscriber delete, we delete the subscription only if the timer
      was pending for the subscription. This approach is incorrect as:
      1. del_timer() is not SMP safe: on CPU0 the check for a pending
         timer may return true while CPU1 runs the timer callback,
         thereby deleting the subscription. Thus when CPU0 proceeds,
         it deletes an invalid subscription.
      2. We export tipc_subscrp_report_overlap(), which accesses the
         subscription pointer multiple times. Meanwhile the subscription
         timer can expire thereby freeing the subscription and we might
         continue to access the subscription pointer leading to memory
         violations.
      
      In this commit, we introduce subscription refcount to avoid deleting
      an invalid subscription.
      Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d094c4d5
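      The idea of a per-subscription refcount, as described in the commit
      above, can be sketched with C11 atomics. This is an illustration under
      assumptions, not the kernel code: the toy_* names are hypothetical, and
      a counter stands in for the actual free so the "released exactly once"
      property is observable.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_subscription {
    atomic_int refcnt;
    int release_count; /* how often the release path ran; must stay <= 1 */
};

static void toy_sub_init(struct toy_subscription *sub)
{
    atomic_init(&sub->refcnt, 1); /* reference held by the owner */
    sub->release_count = 0;
}

static void toy_sub_get(struct toy_subscription *sub)
{
    atomic_fetch_add(&sub->refcnt, 1);
}

static void toy_sub_put(struct toy_subscription *sub)
{
    /* atomic_fetch_sub returns the previous value, so exactly one caller
     * observes 1 and becomes responsible for the release. */
    int prev = atomic_fetch_sub(&sub->refcnt, 1);

    assert(prev > 0);
    if (prev == 1)
        sub->release_count++; /* a real implementation would free here */
}
```

      Both the timer path and the cancel/delete path simply call
      toy_sub_put(); whichever drops the last reference performs the release,
      regardless of the order in which they run.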
    • P
      tipc: fix nametbl_lock soft lockup at node/link events · 93f955aa
      Parthasarathy Bhuvaragan committed
      We trigger a soft lockup as we grab nametbl_lock twice if the node
      has a pending node up/down or link up/down event while:
      - we process an incoming named message in tipc_named_rcv() and
        perform a tipc_update_nametbl().
      - we have pending backlog items in the name distributor queue
        during a nametable update using tipc_nametbl_publish() or
        tipc_nametbl_withdraw().
      
      The associated call chains are:
      tipc_named_rcv() grabs nametbl_lock
         tipc_update_nametbl() (publish/withdraw)
           tipc_node_subscribe()/unsubscribe()
             tipc_node_write_unlock()
                << lockup occurs if an outstanding node/link event
                   exists, as we grab nametbl_lock again >>
      
      tipc_nametbl_withdraw() grabs nametbl_lock
        tipc_named_process_backlog()
          tipc_update_nametbl()
            << rest as above >>
      
      The function tipc_node_write_unlock(), in addition to releasing the
      lock, processes the outstanding node/link up/down events. To do this,
      it needs to grab nametbl_lock again, leading to the lockup.
      
      In this commit we fix the soft lockup by introducing a fast variant of
      node_unlock(), where we just release the lock. We adapt the
      node_subscribe()/node_unsubscribe() to use the fast variants.
      Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93f955aa
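      The difference between the full unlock and the fast variant described
      in the commit above can be modelled in a few lines. This is a sketch
      only: the toy_* names and the single state struct are hypothetical, and
      a counter records the point where the real code would try to take
      nametbl_lock a second time.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_state {
    bool nametbl_held;          /* caller already holds the name table lock */
    bool node_locked;
    bool pending_event;         /* outstanding node/link up/down event */
    int nametbl_relock_attempts;
};

/* Fast variant: only releases the node lock. */
static void toy_node_write_unlock_fast(struct toy_state *st)
{
    assert(st->node_locked);
    st->node_locked = false;
}

/* Full variant: releases the node lock, then processes pending events,
 * which requires the name table lock. */
static void toy_node_write_unlock(struct toy_state *st)
{
    toy_node_write_unlock_fast(st);
    if (!st->pending_event)
        return;
    if (st->nametbl_held) {
        st->nametbl_relock_attempts++; /* the real code would lock up here */
        return;
    }
    st->pending_event = false; /* event handled under the name table lock */
}

/* Stands in for the subscribe/unsubscribe helpers, which run with the
 * name table lock held in the call chains shown above. */
static void toy_node_subscribe(struct toy_state *st, bool use_fast_unlock)
{
    st->node_locked = true;
    if (use_fast_unlock)
        toy_node_write_unlock_fast(st);
    else
        toy_node_write_unlock(st);
}
```

      With nametbl_held and pending_event both true, the full variant records
      a relock attempt while the fast variant leaves the event pending for a
      later caller that does not hold the name table lock.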
    • P
      netfilter: nf_tables: bump set->ndeact on set flush · b2c11e4b
      Pablo Neira Ayuso committed
      Add missing set->ndeact update on each deactivated element from the set
      flush path. Otherwise, sets with fixed size break after flush since
      accounting breaks.
      
       # nft add set x y { type ipv4_addr\; size 2\; }
       # nft add element x y { 1.1.1.1 }
       # nft add element x y { 1.1.1.2 }
       # nft flush set x y
       # nft add element x y { 1.1.1.1 }
       <cmdline>:1:1-28: Error: Could not process rule: Too many open files in system
      
      Fixes: 8411b644 ("netfilter: nf_tables: support for set flushing")
      Reported-by: Elise Lennion <elise.lennion@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b2c11e4b
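      One way the accounting in the commit above can break is sketched below.
      This is a toy model under stated assumptions, not the nf_tables code:
      it assumes that an add is admitted while nelems is below size + ndeact,
      and that committing a flush decrements both counters once per removed
      element. The toy_* names and the bump_ndeact switch are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_set {
    int size;   /* fixed capacity */
    int nelems; /* elements currently accounted */
    int ndeact; /* elements deactivated but not yet removed */
};

/* Assumed admission rule: deactivated elements do not count against size. */
static int toy_add_elem(struct toy_set *s)
{
    assert(s->nelems >= 0);
    if (s->nelems >= s->size + s->ndeact)
        return -ENFILE;
    s->nelems++;
    return 0;
}

/* Flush deactivates every element, then the commit step removes them.
 * bump_ndeact = true models the fix, false models the missing update. */
static void toy_flush(struct toy_set *s, bool bump_ndeact)
{
    int pending = s->nelems;

    if (bump_ndeact)
        s->ndeact += pending;
    /* assumed commit step: both counters drop once per removed element */
    s->nelems -= pending;
    s->ndeact -= pending;
}
```

      In this model, a size-2 set flushed without the bump ends up with
      ndeact at -2, so the next add is rejected with -ENFILE ("Too many open
      files in system") even though the set is empty.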
    • P
      netfilter: nf_tables: deconstify walk callback function · de70185d
      Pablo Neira Ayuso committed
      The flush operation needs to modify set and element objects, so let's
      deconstify this.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      de70185d
    • P
      netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL · 35d0ac90
      Pablo Neira Ayuso committed
      If the element exists and no NLM_F_EXCL is specified, do not bump
      set->nelems, otherwise we leak one set element slot. This problem
      amplifies if the set is full since the abort path always decrements the
      counter for the -ENFILE case too, giving one spare extra slot.
      
      Fix this by moving set->nelems update to nft_add_set_elem() after
      successful element insertion. Moreover, remove the element if the set is
      full so there is no need to rely on the abort path to undo things
      anymore.
      
      Fixes: c016c7e4 ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      35d0ac90
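      The counting rule from the commit above (bump the counter only after a
      successful insertion, and undo the insertion if the set is full) can be
      sketched as follows. The toy_* names, the array-backed set and the excl
      parameter standing in for NLM_F_EXCL are assumptions for illustration,
      not the nf_tables implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_SET_CAP 8

struct toy_set {
    int size;   /* element limit, 0 means unlimited */
    int nelems; /* accounted elements */
    int keys[TOY_SET_CAP];
    int nkeys;  /* keys actually stored */
};

static bool toy_set_contains(const struct toy_set *set, int key)
{
    for (int i = 0; i < set->nkeys; i++)
        if (set->keys[i] == key)
            return true;
    return false;
}

static int toy_add_set_elem(struct toy_set *set, int key, bool excl)
{
    /* Existing element: fail only if exclusivity was requested, and
     * never touch the counter. */
    if (toy_set_contains(set, key))
        return excl ? -EEXIST : 0;

    assert(set->nkeys < TOY_SET_CAP);
    set->keys[set->nkeys++] = key;

    /* Account only after the insertion succeeded; if the set is full,
     * remove the element right away instead of relying on an abort path. */
    if (set->size && set->nelems >= set->size) {
        set->nkeys--;
        return -ENFILE;
    }
    set->nelems++;
    return 0;
}
```

      Re-adding an existing key without excl returns 0 and leaves nelems
      unchanged, so no slot is leaked.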
    • L
      netfilter: nft_log: restrict the log prefix length to 127 · 5ce6b04c
      Liping Zhang committed
      First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
      at nf_log_packet(), so the extra part is useless.
      
      Second, after adding a log rule with a very long prefix, we will
      fail to dump the nft rules after this _special_ one, although
      they do exist. For example:
        # name_65000=$(printf "%0.sQ" {1..65000})
        # nft add rule filter output log prefix "$name_65000"
        # nft add rule filter output counter
        # nft add rule filter output counter
        # nft list chain filter output
        table ip filter {
            chain output {
                type filter hook output priority 0; policy accept;
            }
        }
      
      So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      5ce6b04c
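      The length restriction from the commit above amounts to rejecting any
      prefix longer than 127 characters up front. A minimal sketch follows.
      The function name, the TOY_LOG_PREFIXLEN constant (set to 128 so that
      the limit is 127, as stated in the message) and the -EINVAL return
      value are assumptions; the kernel may enforce the limit differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_LOG_PREFIXLEN 128 /* limit is TOY_LOG_PREFIXLEN - 1 = 127 */

/* Accept a prefix only if it fits without truncation. */
static int toy_check_log_prefix(const char *prefix)
{
    assert(prefix != NULL);
    if (strlen(prefix) > TOY_LOG_PREFIXLEN - 1)
        return -EINVAL; /* hypothetical error code */
    return 0;
}
```

      A 127-character prefix passes this check, while the 65000-character
      prefix from the example would be rejected when the rule is added.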
    • D
      Merge branch 'alx-mq-fixes' · 294628c1
      David S. Miller committed
      Tobias Regnery says:
      
      ====================
      alx: fix fallout from multi queue conversion
      
      Here are 3 fixes for the multi queue conversion in v4.10.
      
      The first patch fixes a wrong condition in an if statement.
      
      Patches 2 and 3 fix regressions in the corner case when requesting msi-x
      interrupts fails and we fall back to msi or legacy interrupts.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      294628c1