1. 29 9月, 2015 8 次提交
    • E
      tcp: avoid reorders for TFO passive connections · 7c85af88
      Eric Dumazet 提交于
      We found that a TCP Fast Open passive connection was vulnerable
      to reorders, as the exchange might look like
      
      [1] C -> S S <FO ...> <request>
      [2] S -> C S. ack request <options>
      [3] S -> C . <answer>
      
      packets [2] and [3] can be generated at almost the same time.
      
      If C receives the 3rd packet before the 2nd, it will drop it as
      the socket is in SYN_SENT state and expects a SYNACK.
      
      S will have to retransmit the answer.
      
      Current OOO avoidance in linux is defeated because SYNACK
      packets are attached to the LISTEN socket, while DATA packets
      are attached to the children. They might be sent by different cpus,
      and different TX queues might be selected.
      
      It turns out that for TFO, we created a child, which is a
      full blown socket in TCP_SYN_RECV state, and we simply can attach
      the SYNACK packet to this socket.
      
      This means that at the time tcp_sendmsg() pushes DATA packet,
      skb->ooo_okay will be set iff the SYNACK packet had been sent
      and TX completed.
      
      This removes the reorder source at the host level.
      
      We also removed the export of tcp_try_fastopen(), as it is no
      longer called from IPv6.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c85af88
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · eae93fe4
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-09-28
      
      This series contains updates to i40e, i40evf and igb to resolve issues
      seen and reported by Red Hat.
      
      Kiran moves i40e_get_head() in preparation for the refactor of the Tx
      timeout logic, so that it can be used in other areas of the driver.
      Refactored the driver timeout logic by issuing a writeback request via
      a software interrupt to the hardware the first time the driver detects
      a hang.  This was due to the driver being too aggressive in resetting a
      hung queue.
      
      Shannon adds the GRE protocol to the transmit checksum encoding.
      
      Anjali fixes an issue of forcing writeback too often, which caused us to
      not benefit from NAPI.  We now disable force writeback in the clean
      routine for X710 and XL710 adapters.  The X722 adapters do not enable
      interrupt to force a writeback and benefit from WB_ON_ITR and so force
      WB is left enabled for those adapters.  Fixed a possible deadlock issue
      where sync_vsi_filters() can be called directly under RTNL or through
      the timer subtask without RTNL.  So update the flow to see if we are
      already under RTNL before trying to grab it.
      
      Stefan Assmann provides a fix for igb where SR-IOV was not getting
      enabled properly and we ran into a NULL pointer if the max_vfs module
      parameter is specified.  This is prevented by setting the
      IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs().
      
      v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch
          to the series, as it resolves another issues seen and reported by
          Red Hat.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eae93fe4
    • S
      igb: assume MSI-X interrupts during initialization · cbfe360a
      Stefan Assmann 提交于
      In igb_sw_init() the sequence of calls was changed from
      igb_init_queue_configuration()
      igb_init_interrupt_scheme()
      igb_probe_vfs()
      to
      igb_probe_vfs()
      igb_init_queue_configuration()
      igb_init_interrupt_scheme()
      
      This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
      during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
      get enabled properly and we run into a NULL pointer if the max_vfs
      module parameter is specified (adapter->vf_data does not get allocated,
      crash on accessing the structure).
      
      [    7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [    7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
      [    7.419370] PGD 0
      [    7.419373] Oops: 0002 [#1] SMP
      [    7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
      [    7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
      [    7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
      [...]
      [    7.419431] Call Trace:
      [    7.419442]  [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
      [    7.419447]  [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
      
      Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
      igb_probe_vfs(). The real interrupt capabilities will be checked during
      igb_init_interrupt_scheme() so this is safe to do.
      Signed-off-by: NStefan Assmann <sassmann@kpanic.de>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      cbfe360a
    • A
      i40e: Fix for recursive RTNL lock during PROMISC change · 30e2561b
      Anjali Singhai 提交于
      The sync_vsi_filters function can be called directly under RTNL
      or through the timer subtask without one. This was causing a deadlock.
      
      If sync_vsi_filters is called from a thread which held the lock,
      and in another thread the PROMISC setting got changed we would
      be executing the PROMISC change in the thread which already held
      the lock alongside the other filter update. The PROMISC change
      requires a reset if we are on a VEB, which requires it to be called
      under RTNL.
      
      Earlier the driver would call reset for PROMISC change without
      checking if we were already under RTNL and would try to grab it
      causing a deadlock. This patch changes the flow to see if we are
      already under RTNL before trying to grab it.
      Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: NKiran Patil <kiran.patil@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      30e2561b
    • A
      i40e: Fix RS bit update in Tx path and disable force WB workaround · 58044743
      Anjali Singhai 提交于
      This patch fixes the issue of forcing WB too often causing us to not
      benefit from NAPI.
      
      Without this patch we were forcing WB/arming interrupt too often taking
      away the benefits of NAPI and causing a performance impact.
      
      With this patch we disable force WB in the clean routine for X710
      and XL710 adapters. X722 adapters do not enable interrupt to force
      a WB and benefit from WB_ON_ITR and hence force WB is left enabled
      for those adapters.
      For XL710 and X710 adapters if we have less than 4 packets pending
      a software Interrupt triggered from service task will force a WB.
      
      This patch also changes the conditions for setting RS bit as described
      in code comments. This optimizes when the HW does a tail bump and when
      it does a WB. It also optimizes when we do a wmb.
      Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      58044743
    • S
      i40e: add GRE tunnel type to csum encoding · c1d1791d
      Shannon Nelson 提交于
      Make sure the Tx checksum encoder knows about GRE protocol and sets the
      descriptor flag appropriately.
      Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c1d1791d
    • K
      i40e/i40evf: refactor tx timeout logic · b03a8c1f
      Kiran Patil 提交于
      This patch modifies the driver timeout logic by issuing a writeback
      request via a software interrupt to the hardware the first time the
      driver detects a hang. The driver was too aggressive in resetting a hung
      queue, so back that off by removing logic to down the netdevice after
      too many hangs, and move the function to the service task.
      
      Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea
      Signed-off-by: NKiran Patil <kiran.patil@intel.com>
      Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      b03a8c1f
    • K
      i40e: Move i40e_get_head into header file · 1e6d6f8c
      Kiran Patil 提交于
      i40e_get_head needs to be called in multiple files in a further patch,
      prepare by moving the function into a header file.
      Signed-off-by: NKiran Patil <kiran.patil@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      1e6d6f8c
  2. 28 9月, 2015 1 次提交
  3. 27 9月, 2015 6 次提交
  4. 26 9月, 2015 25 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 518a7cb6
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
          instead of cloning them.  From Daniel Borkmann.
      
       2) When converting classical BPF into eBPF, fix the setting of the
          source reg to BPF_REG_X.  From Tycho Andersen.
      
       3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
          Linus Lussing.
      
       4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
      
       5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
          Roopa Prabhu.
      
       6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
      
       7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
          Florian Westphal.
      
       8) Check DMA mapping errors in bna driver, from Ivan Vecera.
      
       9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
          misclearing of interrupt status in TX timeout handler, etc.) from
          David Woodhouse.
      
      10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
          Hugne.
      
      11) Fix autobind races et al. in netlink code, from Herbert Xu with
          help from Tejun Heo and others.
      
      12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
      
      13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
          Eric Dumazet.
      
      14) Fix array overruns in mac80211, from Johannes Berg and Dan
          Carpenter.
      
      15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
      
      16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
      
      17) Fix byte order in geneve tunnel port config, from John W Linville.
      
      18) Fix handling of ARP replies over lightweight tunnels, from Jiri
          Benc.
      
      19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
          Kok and Roopa Prabhu.
      
      20) Several reference count handling bug fixes in the PHY/MDIO layer
          from Russel King.
      
      21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
      
      22) Fix crash in icmp_route_lookup(), from David Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
        net: Fix panic in icmp_route_lookup
        net: update docbook comment for __mdiobus_register()
        ppp: fix lockdep splat in ppp_dev_uninit()
        net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
        phy: marvell: add link partner advertised modes
        net: fix net_device refcounting
        phy: add phy_device_remove()
        phy: fixed-phy: properly validate phy in fixed_phy_update_state()
        net: fix phy refcounting in a bunch of drivers
        of_mdio: fix MDIO phy device refcounting
        phy: add proper phy struct device refcounting
        phy: fix mdiobus module safety
        net: dsa: fix of_mdio_find_bus() device refcount leak
        phy: fix of_mdio_find_bus() device refcount leak
        ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
        ip6_gre: Reduce log level in ip6gre_err() to debug
        fib_rules: fix fib rule dumps across multiple skbs
        bnx2x: byte swap rss_key to comply to Toeplitz specs
        net: revert "net_sched: move tp->root allocation into fw_init()"
        lwtunnel: remove source and destination UDP port config option
        ...
      518a7cb6
    • D
      net: Fix panic in icmp_route_lookup · bdb06cbf
      David Ahern 提交于
      Andrey reported a panic:
      
      [ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
      [ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
      [ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
      [ 7249.865637] Oops: 0000 [#1]
      ...
      [ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
      4.3.0-999-generic #201509220155
      [ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
      [ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
      [ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
      [ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
      [ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
      [ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
      [ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
      [ 7249.867141] Stack:
      [ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00
      f3920240 f0c69180
      [ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
      e659266c f483ba54
      [ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
      c16b0e00 00000064
      [ 7249.867448] Call Trace:
      [ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
      [ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
      [ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
      [ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
      [ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
      [ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
      [ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
      [nf_conntrack]
      [ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
      [ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
      [ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
      [ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
      [ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
      [ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
      [ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
      [ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
      [ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
      [ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
      [ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
      [ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
      [ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
      [ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
      [ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
      [ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
      [ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
      [ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
      ...
      
      Prior to the VRF change the oif was not set in the flow struct, so the
      VRF support should really have only added the vrf_master_ifindex lookup.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Cc: Andrey Melnikov <temnota.am@gmail.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdb06cbf
    • R
      net: update docbook comment for __mdiobus_register() · 59f06978
      Russell King 提交于
      Update the docbook comment for __mdiobus_register() to include the new
      module owner argument.  This resolves a warning found by the 0-day
      builder.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59f06978
    • L
      Merge branch 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · d4a748a1
      Linus Torvalds 提交于
      Pull another cgroup fix from Tejun Heo:
       "The cgroup writeback support got inadvertently enabled for traditional
        hierarchies revealing two regressions which are currently being worked
        on.  It shouldn't have been enabled on traditional hierarchies, so
        disable it on them.  This is enough to make the regressions go away
        for people who aren't experimenting with cgroup"
      
      * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
      d4a748a1
    • D
      Merge branch 'listener-sock-const' · 4d54d865
      David S. Miller 提交于
      Eric Dumazet says:
      
      ====================
      dccp/tcp: constify listener sock
      
      Another patch bomb to prepare lockless TCP/DCCP LISTEN handling.
      
      SYNACK retransmits are built and sent without listener socket
      being locked. Soon, initial SYNACK packets will have same property.
      
      This series makes sure we did not something wrong with this model,
      by adding a const qualifier in all the paths taken from synack building
      and transmit, for IPv4/IPv6 and TCP/dccp.
      
      The only potential problem was the rewrite of ecn bits for connections
      with DCTCP as congestion module, but this was a very minor one.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d54d865
    • E
      inet: constify inet_rtx_syn_ack() sock argument · 1b70e977
      Eric Dumazet 提交于
      SYNACK packets are sent on behalf on unlocked listeners
      or fastopen sockets. Mark socket as const to catch future changes
      that might break the assumption.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b70e977
    • E
      tcp/dccp: constify rtx_synack() and friends · ea3bea3a
      Eric Dumazet 提交于
      This is done to make sure we do not change listener socket
      while sending SYNACK packets while socket lock is not held.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea3bea3a
    • E
      dccp: constify dccp_make_response() socket argument · 802885fc
      Eric Dumazet 提交于
      Like tcp_make_synack() the only time we might change the socket is
      when calling sock_wmalloc(), which is using atomic operation to
      update sk->sk_wmem_alloc
      
      Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      802885fc
    • E
      tcp: constify tcp_v{4|6}_send_synack() socket argument · 0f935dbe
      Eric Dumazet 提交于
      This documents fact that listener lock might not be held
      at the time SYNACK are sent.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f935dbe
    • E
      ipv6: constify ip6_xmit() sock argument · 1c1e9d2b
      Eric Dumazet 提交于
      This is to document that socket lock might not be held at this point.
      
      skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
      or spinlocks, so we promote the socket to non const when calling them.
      
      netfilter hooks should never assume socket lock is held,
      we also promote the socket to non const.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c1e9d2b
    • E
      tcp: constify tcp_make_synack() socket argument · 5d062de7
      Eric Dumazet 提交于
      listener socket is not locked when tcp_make_synack() is called.
      
      We better make sure no field is written.
      
      There is one exception : Since SYNACK packets are attached to the listener
      at this moment (or SYN_RECV child in case of Fast Open),
      sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using
      atomic operations so this is safe.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d062de7
    • E
      tcp: remove tcp_ecn_make_synack() socket argument · 6ac705b1
      Eric Dumazet 提交于
      SYNACK packets might be sent without holding socket lock.
      
      For DCTCP/ECN sake, we should call INET_ECN_xmit() while
      socket lock is owned, and only when we init/change congestion control.
      
      This also fixies a bug if congestion module is changed from
      dctcp to another one on a listener : we now clear ECN bits
      properly.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ac705b1
    • E
      tcp: remove tcp_synack_options() socket argument · 37bfbdda
      Eric Dumazet 提交于
      We do not use the socket in this function.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37bfbdda
    • E
      ip: constify ip_build_and_send_pkt() socket argument · cfe673b0
      Eric Dumazet 提交于
      This function is used to build and send SYNACK packets,
      possibly on behalf of unlocked listener socket.
      
      Make sure we did not miss a write by making this socket const.
      
      We no longer can use ip_select_ident() and have to either
      set iph->id to 0 or directly call __ip_select_ident()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfe673b0
    • E
      tcp: md5: constify tcp_md5_do_lookup() socket argument · b83e3deb
      Eric Dumazet 提交于
      When TCP new listener is done, these functions will be called
      without socket lock being held. Make sure they don't change
      anything.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b83e3deb
    • E
      inet: constify ip_dont_fragment() arguments · 4e3f5d72
      Eric Dumazet 提交于
      ip_dont_fragment() can accept const socket and dst
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e3f5d72
    • E
      ipv6: constify inet6_csk_route_req() socket argument · 30d50c61
      Eric Dumazet 提交于
      socket is not modified, make it const so that callers can
      do the same if they need.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30d50c61
    • E
      ipv6: constify ip6_dst_lookup_{flow|tail}() sock arguments · 3aef934f
      Eric Dumazet 提交于
      ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch
      socket, lets add a const qualifier.
      
      This will permit the same change in inet6_csk_route_req()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3aef934f
    • E
      inet: constify inet_csk_route_req() socket argument · e5895bc6
      Eric Dumazet 提交于
      This is used by TCP listener core, and listener socket shall
      not be modified by inet_csk_route_req().
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5895bc6
    • E
      inet: constify ip_route_output_flow() socket argument · 6f9c9615
      Eric Dumazet 提交于
      Very soon, TCP stack might call inet_csk_route_req(), which
      calls inet_csk_route_req() with an unlocked listener socket,
      so we need to make sure ip_route_output_flow() is not trying to
      change any field from its socket argument.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f9c9615
    • E
      tcp: constify tcp_openreq_init_rwin() · b1964b5f
      Eric Dumazet 提交于
      Soon, listener socket wont be locked when tcp_openreq_init_rwin()
      is called. We need to read socket fields once, as their value
      could change under us.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1964b5f
    • E
      tcp: constify listener socket in tcp_v[46]_init_req() · b40cf18e
      Eric Dumazet 提交于
      Soon, listener socket spinlock will no longer be held,
      add const arguments to tcp_v[46]_init_req() to make clear these
      functions can not mess socket fields.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b40cf18e
    • G
      ppp: fix lockdep splat in ppp_dev_uninit() · 58a89eca
      Guillaume Nault 提交于
      ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
      ppp_create_interface() must then lock these mutexes in that same order
      to avoid possible deadlock.
      
      [  120.880011] ======================================================
      [  120.880011] [ INFO: possible circular locking dependency detected ]
      [  120.880011] 4.2.0 #1 Not tainted
      [  120.880011] -------------------------------------------------------
      [  120.880011] ppp-apitest/15827 is trying to acquire lock:
      [  120.880011]  (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
      [  120.880011]
      [  120.880011] but task is already holding lock:
      [  120.880011]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
      [  120.880011]
      [  120.880011] which lock already depends on the new lock.
      [  120.880011]
      [  120.880011]
      [  120.880011] the existing dependency chain (in reverse order) is:
      [  120.880011]
      [  120.880011] -> #1 (rtnl_mutex){+.+.+.}:
      [  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
      [  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
      [  120.880011]        [<ffffffff812e4255>] rtnl_lock+0x12/0x14
      [  120.880011]        [<ffffffff812d9d94>] register_netdev+0x11/0x27
      [  120.880011]        [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
      [  120.880011]        [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
      [  120.880011]        [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
      [  120.880011]        [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [  120.880011]
      [  120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
      [  120.880011]        [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
      [  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
      [  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
      [  120.880011]        [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
      [  120.880011]        [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
      [  120.880011]        [<ffffffff812d5381>] rollback_registered+0x29/0x38
      [  120.880011]        [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
      [  120.880011]        [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
      [  120.880011]        [<ffffffff8112d9f6>] __fput+0xec/0x192
      [  120.880011]        [<ffffffff8112dacc>] ____fput+0x9/0xb
      [  120.880011]        [<ffffffff8105447a>] task_work_run+0x66/0x80
      [  120.880011]        [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
      [  120.880011]        [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
      [  120.880011]        [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
      [  120.880011]
      [  120.880011] other info that might help us debug this:
      [  120.880011]
      [  120.880011]  Possible unsafe locking scenario:
      [  120.880011]
      [  120.880011]        CPU0                    CPU1
      [  120.880011]        ----                    ----
      [  120.880011]   lock(rtnl_mutex);
      [  120.880011]                                lock(&pn->all_ppp_mutex);
      [  120.880011]                                lock(rtnl_mutex);
      [  120.880011]   lock(&pn->all_ppp_mutex);
      [  120.880011]
      [  120.880011]  *** DEADLOCK ***
      
      Fixes: 8cb775bc ("ppp: fix device unregistration upon netns deletion")
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58a89eca
    • S
      net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected · 21343ac2
      Sudip Mukherjee 提交于
      The builds of allmodconfig of avr32 is failing with:
      
      drivers/net/ethernet/via/via-rhine.c:1098:2: error: implicit declaration
      of function 'pci_iomap' [-Werror=implicit-function-declaration]
      drivers/net/ethernet/via/via-rhine.c:1119:2: error: implicit declaration
      of function 'pci_iounmap' [-Werror=implicit-function-declaration]
      
      The generic empty pci_iomap and pci_iounmap is used only if CONFIG_PCI
      is not defined and CONFIG_GENERIC_PCI_IOMAP is defined.
      
      Add GENERIC_PCI_IOMAP in the dependency list for VIA_RHINE as we are
      getting build failure when CONFIG_PCI and CONFIG_GENERIC_PCI_IOMAP both
      are not defined.
      Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21343ac2
    • M
      net: remove unused argument of __netdev_find_adj() · 6ea29da1
      Michal Kubeček 提交于
      The __netdev_find_adj() helper does not use its first argument, only the
      device to find and list to walk through.
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ea29da1