1. 15 8月, 2017 7 次提交
    • C
      net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · b0ba9d5f
      Casey Leedom 提交于
      cxgb4 Ethernet driver now queries PCIe configuration space to determine
      if it can send TLPs to it with the Relaxed Ordering Attribute set.
      
      Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
      Device Control[Relaxed Ordering Enable] at probe routine, to make sure
      the driver will not send the Relaxed Ordering TLPs to the Root Complex which
      could not deal the Relaxed Ordering TLPs.
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Reviewed-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0ba9d5f
    • D
      PCI: Disable Relaxed Ordering Attributes for AMD A1100 · 077fa19c
      dingtianhong 提交于
      Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
      Root Port where Upstream Transaction Layer Packets with the Relaxed
      Ordering Attribute clear are allowed to bypass earlier TLPs with
      Relaxed Ordering set, it would cause Data Corruption, so we need
      to disable Relaxed Ordering Attribute when Upstream TLPs to the
      Root Port.
      Reported-and-suggested-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Acked-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      077fa19c
    • D
      PCI: Disable Relaxed Ordering for some Intel processors · 87e09cde
      dingtianhong 提交于
      According to the Intel spec section 3.9.1 said:
      
          3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
                and Toward MMIO Regions (P2P)
      
          In order to maximize performance for PCIe devices in the processors
          listed in Table 3-6 below, the soft- ware should determine whether the
          accesses are toward coherent memory (system memory) or toward MMIO
          regions (P2P access to other devices). If the access is toward MMIO
          region, then software can command HW to set the RO bit in the TLP
          header, as this would allow hardware to achieve maximum throughput for
          these types of accesses. For accesses toward coherent memory, software
          can command HW to clear the RO bit in the TLP header (no RO), as this
          would allow hardware to achieve maximum throughput for these types of
          accesses.
      
          Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
                     PCIe Performance
      
          Processor                            CPU RP Device IDs
      
          Intel Xeon processors based on       6F01H-6F0EH
          Broadwell microarchitecture
      
          Intel Xeon processors based on       2F01H-2F0EH
          Haswell microarchitecture
      
      It means some Intel processors has performance issue when use the Relaxed
      Ordering Attribute, so disable Relaxed Ordering for these root port.
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: NAshok Raj <ashok.raj@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87e09cde
    • D
      PCI: Disable PCIe Relaxed Ordering if unsupported · a99b646a
      dingtianhong 提交于
      When bit4 is set in the PCIe Device Control register, it indicates
      whether the device is permitted to use relaxed ordering.
      On some platforms using relaxed ordering can have performance issues or
      due to erratum can cause data-corruption. In such cases devices must avoid
      using relaxed ordering.
      
      The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
      Relaxed Ordering (RO) attribute should not be used for Transaction Layer
      Packets (TLP) targeted towards these affected root complexes.
      
      This patch checks if there is any node in the hierarchy that indicates that
      using relaxed ordering is not safe. In such cases the patch turns off the
      relaxed ordering by clearing the capability for this device.
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Acked-by: NAshok Raj <ashok.raj@intel.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a99b646a
    • J
      tipc: avoid inheriting msg_non_seq flag when message is returned · 59a361bc
      Jon Paul Maloy 提交于
      In the function msg_reverse(), we reverse the header while trying to
      reuse the original buffer whenever possible. Those rejected/returned
      messages are always transmitted as unicast, but the msg_non_seq field
      is not explicitly set to zero as it should be.
      
      We have seen cases where multicast senders set the message type to
      "NOT dest_droppable", meaning that a multicast message shorter than
      one MTU will be returned, e.g., during receive buffer overflow, by
      reusing the original buffer. This has the effect that even the
      'msg_non_seq' field is inadvertently inherited by the rejected message,
      although it is now sent as a unicast message. This again leads the
      receiving unicast link endpoint to steer the packet toward the broadcast
      link receive function, where it is dropped. The affected unicast link is
      thereafter (after 100 failed retransmissions) declared 'stale' and
      reset.
      
      We fix this by unconditionally setting the 'msg_non_seq' flag to zero
      for all rejected/returned messages.
      Reported-by: NCanh Duc Luu <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59a361bc
    • J
      tipc: accept PACKET_MULTICAST packets · fed5f571
      Jon Paul Maloy 提交于
      On L2 bearers, the TIPC broadcast function is sending out packets using
      the corresponding L2 broadcast address. At reception, we filter such
      packets under the assumption that they will also be delivered as
      broadcast packets.
      
      This assumption doesn't always hold true. Under high load, we have seen
      that a switch may convert the destination address and deliver the packet
      as a PACKET_MULTICAST, something leading to inadvertently dropped
      packets and a stale and reset broadcast link.
      
      We fix this by extending the reception filtering to accept packets of
      type PACKET_MULTICAST.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fed5f571
    • F
      ipv4: route: fix inet_rtm_getroute induced crash · 2c87d63a
      Florian Westphal 提交于
      "ip route get $daddr iif eth0 from $saddr" causes:
       BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
       Call Trace:
        ip_route_input_rcu+0x1535/0x1b50
        ip_route_input_noref+0xf9/0x190
        tcp_v4_early_demux+0x1a4/0x2b0
        ip_rcv+0xbcb/0xc05
        __netif_receive_skb+0x9c/0xd0
        netif_receive_skb_internal+0x5a8/0x890
      
      Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
      iif was provided) or ip_route_output_key_hash_rcu.
      
      But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
      associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
      bit (i.e. skb_dst_drop will release/free the entry while it should not).
      
      Thus only set the dst if we called ip_route_output_key_hash_rcu().
      
      I tested this patch by running:
       while true;do ip r get 10.0.1.2;done > /dev/null &
       while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
      ... and saw no crash or memory leak.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David Ahern <dsahern@gmail.com>
      Fixes: ba52d61e ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c87d63a
  2. 14 8月, 2017 1 次提交
    • A
      bonding: ratelimit failed speed/duplex update warning · 11e9d782
      Andreas Born 提交于
      bond_miimon_commit() handles the UP transition for each slave of a bond
      in the case of MII. It is triggered 10 times per second for the default
      MII Polling interval of 100ms. For device drivers that do not implement
      __ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
      fails persistently while the MII status could remain UP. That is, in
      this and other cases where the speed/duplex update keeps failing over a
      longer period of time while the MII state is UP, a warning is printed
      every MII polling interval.
      
      To address these excessive warnings net_ratelimit() should be used.
      Printing a warning once would not be sufficient since the call to
      bond_update_speed_duplex() could recover to succeed and fail again
      later. In that case there would be no new indication what went wrong.
      
      Fixes: b5bf0f5b (bonding: correctly update link status during mii-commit phase)
      Signed-off-by: NAndreas Born <futur.andy@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11e9d782
  3. 12 8月, 2017 10 次提交
    • E
      udp: harden copy_linear_skb() · fd851ba9
      Eric Dumazet 提交于
      syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs.
      
      Issue here is that recvfrom() can be used with user buffer of Z bytes,
      and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following
      condition :
      
      Z < X < Y
      
      kernel BUG at mm/usercopy.c:72!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ #16
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000
      RIP: 0010:report_usercopy mm/usercopy.c:64 [inline]
      RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264
      RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286
      RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000
      RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09
      RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e
      R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000
      FS:  0000000001360880(0000) GS:ffff8801dc000000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       check_object_size include/linux/thread_info.h:108 [inline]
       check_copy_size include/linux/thread_info.h:139 [inline]
       copy_to_iter include/linux/uio.h:105 [inline]
       copy_linear_skb include/net/udp.h:371 [inline]
       udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395
       inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
       sock_recvmsg_nosec net/socket.c:792 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:799
       SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788
       SyS_recvfrom+0x40/0x50 net/socket.c:1760
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd851ba9
    • D
      Merge branch 'bpf-Minor-fix-in-bpf_convert_ctx_access' · 6401f37c
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      bpf: Minor fix in bpf_convert_ctx_access
      
      First one was found while trying to compile the kernel
      with !CONFIG_NET_RX_BUSY_POLL.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6401f37c
    • D
      bpf: fix two missing target_size settings in bpf_convert_ctx_access · 2ed46ce4
      Daniel Borkmann 提交于
      When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and
      we try a narrow __sk_buff load of tc_index or napi_id, respectively,
      then verifier rightfully complains that it's misconfigured, because
      we need to set target_size in each of the two cases. The rewrite
      for the ctx access is just a dummy op, but needs to pass, so fix
      this up.
      
      Fixes: f96da094 ("bpf: simplify narrower ctx access")
      Reported-by: NShubham Bansal <illusionist.neo@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ed46ce4
    • D
      net: fix compilation when busy poll is not enabled · e4dde412
      Daniel Borkmann 提交于
      MIN_NAPI_ID is used in various places outside of
      CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set
      we run into build errors such as:
      
        net/core/dev.c: In function 'dev_get_by_napi_id':
        net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function)
          if (napi_id < MIN_NAPI_ID)
                        ^~~~~~~~~~~
      
      Thus, have MIN_NAPI_ID always defined to fix these errors.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4dde412
    • A
      mISDN: Fix null pointer dereference at mISDN_FsmNew · 54a6a043
      Anton Vasilyev 提交于
      If mISDN_FsmNew() fails to allocate memory for jumpmatrix
      then null pointer dereference will occur on any write to
      jumpmatrix.
      
      The patch adds check on successful allocation and
      corresponding error handling.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: NAnton Vasilyev <vasilyev@ispras.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54a6a043
    • S
      nfp: do not update MTU from BH in flower app · bb3afda4
      Simon Horman 提交于
      The Flower app may receive a request to update the MTU of a representor
      netdev upon receipt of a control message from the firmware. This requires
      the RTNL lock which needs to be taken outside of the packet processing
      path.
      
      As a handling of this correctly seems a little to invasive for a fix simply
      skip setting the MTU for now.
      
      Relevant backtrace:
       [ 1496.288489] BUG: scheduling while atomic: kworker/0:3/373/0x00000100
       [ 1496.294911]  dca syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm mxm_wmi ahci pps_core libahci i2c_algo_bit wmi [last unloaded: nfp]
       [ 1496.294918] CPU: 0 PID: 373 Comm: kworker/0:3 Tainted: G           OE   4.13.0-rc3+ #3
       [ 1496.294919] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
       [ 1496.294923] Workqueue: events work_for_cpu_fn
       [ 1496.294924] Call Trace:
       [ 1496.294927]  <IRQ>
       [ 1496.294931]  dump_stack+0x63/0x82
       [ 1496.294935]  __schedule_bug+0x54/0x70
       [ 1496.294937]  __schedule+0x62f/0x890
       [ 1496.294941]  ? intel_unmap_sg+0x90/0x90
       [ 1496.294942]  schedule+0x36/0x80
       [ 1496.294943]  schedule_preempt_disabled+0xe/0x10
       [ 1496.294945]  __mutex_lock.isra.2+0x445/0x4a0
       [ 1496.294947]  ? device_is_rmrr_locked+0x12/0x50
       [ 1496.294950]  ? kfree+0x162/0x170
       [ 1496.294952]  ? device_is_rmrr_locked+0x12/0x50
       [ 1496.294953]  ? iommu_should_identity_map+0x50/0xe0
       [ 1496.294954]  __mutex_lock_slowpath+0x13/0x20
       [ 1496.294955]  ? iommu_no_mapping+0x48/0xd0
       [ 1496.294956]  ? __mutex_lock_slowpath+0x13/0x20
       [ 1496.294957]  mutex_lock+0x2f/0x40
       [ 1496.294960]  rtnl_lock+0x15/0x20
       [ 1496.294979]  nfp_flower_cmsg_rx+0xc8/0x150 [nfp]
       [ 1496.294986]  nfp_ctrl_poll+0x286/0x350 [nfp]
       [ 1496.294989]  tasklet_action+0xf6/0x110
       [ 1496.294992]  __do_softirq+0xed/0x278
       [ 1496.294993]  irq_exit+0xb6/0xc0
       [ 1496.294994]  do_IRQ+0x4f/0xd0
       [ 1496.294996]  common_interrupt+0x89/0x89
      
      Fixes: 948faa46 ("nfp: add support for control messages for flower app")
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb3afda4
    • R
      net: stmmac: Use the right logging function in stmmac_mdio_register · fbca1647
      Romain Perier 提交于
      Currently, the function stmmac_mdio_register() is only used by
      stmmac_dvr_probe() from stmmac_main.c, in order to register the MDIO bus
      and probe information about the PHY. As this function is called before
      calling register_netdev(), all messages logged from stmmac_mdio_register
      are prefixed by "(unnamed net_device)". The goal of netdev_info or
      netdev_err is to dump useful infos about a net_device, when this data
      structure is partially initialized, there is no point for using these
      functions.
      
      This commit fixes the issue by replacing all netdev_*() by the
      corresponding dev_*() function for logging. The last netdev_info is
      replaced by phy_attached_info(), as a valid phydev can be used at this
      point.
      Signed-off-by: NRomain Perier <romain.perier@collabora.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbca1647
    • K
      net/sched/hfsc: allocate tcf block for hfsc root class · 8d553738
      Konstantin Khlebnikov 提交于
      Without this filters cannot be attached.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d553738
    • A
      bonding: require speed/duplex only for 802.3ad, alb and tlb · ad729bc9
      Andreas Born 提交于
      The patch c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state") puts the link state to down if
      bond_update_speed_duplex() cannot retrieve speed and duplex settings.
      Assumably the patch was written with 802.3ad mode in mind which relies
      on link speed/duplex settings. For other modes like active-backup these
      settings are not required. Thus, only for these other modes, this patch
      reintroduces support for slaves that do not support reporting speed or
      duplex such as wireless devices. This fixes the regression reported in
      bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).
      
      Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state")
      Signed-off-by: NAndreas Born <futur.andy@googlemail.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad729bc9
    • V
      net: dsa: ksz: fix skb freeing · e71cb9e0
      Vivien Didelot 提交于
      The DSA layer frees the original skb when an xmit function returns NULL,
      meaning an error occurred. But if the tagging code copied the original
      skb, it is responsible of freeing the copy if an error occurs.
      
      The ksz tagging code currently has two issues: if skb_put_padto fails,
      the skb copy is not freed, and the original skb will be freed twice.
      
      To fix that, move skb_put_padto inside both branches of the skb_tailroom
      condition, before freeing the original skb, and free the copy on error.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: NWoojung Huh <woojung.huh@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e71cb9e0
  4. 11 8月, 2017 4 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26273939
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.
      
       2) Fix stats handling in bcm_sysport_get_stats(), from Florian
          Fainelli.
      
       3) Reject 16777215 VNI value in geneve_validate(), from Girish
          Moodalbail.
      
       4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.
      
       5) Once a UFO fragmented frame is treated as UFO, we should continue
          doing so. Likewise once a frame has been segmented, we should
          continue doing that and not try to convert it to a UFO frame. From
          Willem de Bruijn.
      
       6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
          prevent races. From Willem de Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        packet: fix tp_reserve race in packet_set_ring
        udp: consistently apply ufo or fragmentation
        net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
        igmp: Fix regression caused by igmp sysctl namespace code.
        geneve: maximum value of VNI cannot be used
        net: systemport: Fix software statistics for SYSTEMPORT Lite
        tipc: remove premature ESTABLISH FSM event at link synchronization
      26273939
    • W
      packet: fix tp_reserve race in packet_set_ring · c27927e3
      Willem de Bruijn 提交于
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c27927e3
    • W
      udp: consistently apply ufo or fragmentation · 85f1bd9a
      Willem de Bruijn 提交于
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85f1bd9a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f213ad38
      Linus Torvalds 提交于
      Pull sparc updates from David Miller:
      
       1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.
      
       2) Prevent crashes when bringing up sunvdc virtual block devices in
          some environments. From Jim Quigley.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
        sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
        sparc64: recognize and support sparc M8 cpu type
        sparc64: properly name the cpu constants
      f213ad38
  5. 10 8月, 2017 12 次提交
    • X
      net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 96d97030
      Xin Long 提交于
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set it's value for ipt_init_target. With unexpected
      value in par.nft_compat, it may return unexpected result in some
      target's checkentry.
      
      This patch is to set all it's fields as 0 and only initialize the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        As Wang Cong's suggestion, fix it by setting all it's fields as
        0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96d97030
    • N
      igmp: Fix regression caused by igmp sysctl namespace code. · 1714020e
      Nikolay Borisov 提交于
      Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
      function is only called as part of per-namespace initialization, only if
      CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
      compiled out, casuing the igmp pernet ops to not be registerd and those sysctl
      being left initialized with 0. However, there are certain functions, such as
      ip_mc_join_group which are always compiled and make use of some of those
      sysctls. Let's do a partial revert of the aforementioned commit and move the
      sysctl initialization into inet_init_net, that way they will always have
      sane values.
      
      Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595Reported-by: NGerardo Exequiel Pozzi <vmlinuz386@gmail.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1714020e
    • G
      geneve: maximum value of VNI cannot be used · 04db70d9
      Girish Moodalbail 提交于
      Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
      of values for it would be from 0 to 16777215 (2^24 -1).  However, one
      cannot create a geneve device with VNI set to 16777215. This patch fixes
      this issue.
      Signed-off-by: NGirish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04db70d9
    • F
      net: systemport: Fix software statistics for SYSTEMPORT Lite · 50ddfbaf
      Florian Fainelli 提交于
      With SYSTEMPORT Lite we have holes in our statistics layout that make us
      skip over the hardware MIB counters, bcm_sysport_get_stats() was not
      taking that into account resulting in reporting 0 for all SW-maintained
      statistics, fix this by skipping accordingly.
      
      Fixes: 44a4524c ("net: systemport: Add support for SYSTEMPORT Lite")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50ddfbaf
    • J
      tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy 提交于
      When a link between two nodes come up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to take up the link
      endpoint before  initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. But, __tipc_node_link_up() is
      itself starting with a test for whether the link is up, and if true,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() is itself issuing such an
      event, but also harmful, since it inhibits tipc_node_link_up() to
      perform the test of its tasks, and the link endpoint in question hence
      is never taken into traffic.
      
      This problem has been exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed43594a
    • J
      sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain · 3ee70591
      Jim Quigley 提交于
      Using mpgroup to define multiple paths for a virtual disk causes multiple
      virtual-device-port ports to be created for that virtual device.
      Each virtual-device-port port then gets a vdisk created for it by the Linux
      sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
      cannot handle multiple ports for a single vdisk, leading to a kernel panic
      at startup.
      
      This fix prevents more than one vdisk per virtual-device-port being created
      until full virtual disk multipathing (mpgroup) support is implemented.
      Signed-off-by: NJim Quigley <Jim.Quigley@oracle.com>
      Reviewed-by: NLiam Merwick <liam.merwick@oracle.com>
      Reviewed-by: NShannon Nelson <shannon.nelson@oracle.com>
      Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: NAaron Young <aaron.young@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ee70591
    • L
      Merge tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 8d31f80e
      Linus Torvalds 提交于
      Pull pin control fixes from Linus Walleij:
       "These are the pin control fixes I have gathered since the return from
        my vacation. They boiled in -next a while so let's get them in.
      
        Apart from the documentation build it is purely driver fixes. Which is
        nice. The Intel fixes seem kind of important.
      
         - Fix the documentation build as the docs were moved
      
         - Correct the UART pin list on the Intel Merrifield
      
         - Fix pin assignment and number of pins on the Marvell Armada 37xx
           pin controller
      
         - Cover the Setzer models in the Chromebook DMI quirk in the Intel
           cheryview driver so they start working
      
         - Add the missing "sim" function to the sunxi driver
      
         - Fix USB pin definitions on Uniphier Pro4
      
         - Smatch fix for invalid reference in the zx pin control driver"
      
      * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: generic: update references to Documentation/pinctrl.txt
        pinctrl: intel: merrifield: Correct UART pin lists
        pinctrl: armada-37xx: Fix number of pin in south bridge
        pinctrl: armada-37xx: Fix the pin 23 on south bridge
        pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
        pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
        pinctrl: uniphier: fix USB3 pin assignment for Pro4
        pinctrl: zte: fix dereference of 'data' in zx_set_mux()
      8d31f80e
    • M
      futex: Remove unnecessary warning from get_futex_key · 48fb6f4d
      Mel Gorman 提交于
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      48fb6f4d
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 358f8c26
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "The main thing is to allow empty id_tables for ACPI to make some
        drivers get probed again. It looks a bit bigger than usual because it
        needs some internal renaming, too.
      
        Other than that, there is a fix for broken DSTDs, a super simple
        enablement for ARM MPS, and two documentation fixes which I'd like to
        see in v4.13 already"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rephrase explanation of I2C_CLASS_DEPRECATED
        i2c: allow i2c-versatile for ARM MPS platforms
        i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
        i2c: designware: Print clock freq on invalid clock freq error
        i2c: core: Allow empty id_table in ACPI case as well
        i2c: mux: pinctrl: mention correct module name in Kconfig help text
      358f8c26
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 31cf92f3
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
       "Three patches that should go into this release.
      
        Two of them are from Paolo and fix up some corner cases with BFQ, and
        the last patch is from Ming and fixes up a potential usage count
        imbalance regression due to the recent NOWAIT work"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
        block, bfq: consider also in_service_entity to state whether an entity is active
        block, bfq: reset in_service_entity if it becomes idle
      31cf92f3
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · d555eb6b
      Linus Torvalds 提交于
      Pull crypto fixes from Herbert Xu:
       "Fix two regressions in the inside-secure driver with respect to
        hmac(sha1)"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
        crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
      d555eb6b
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4530cca1
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "The pull requests are getting smaller, that's progress I suppose :-)
      
         1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.
      
         2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
            from Koichiro Den.
      
         3) Missing u64_stats_init() calls in several drivers, from Florian
            Fainelli.
      
         4) TCP can set the congestion window to an invalid ssthresh value
            after congestion window reductions, from Yuchung Cheng.
      
         5) Fix BPF jit branch generation on s390, from Daniel Borkmann.
      
         6) Correct MIPS ebpf JIT merge, from David Daney.
      
         7) Correct byte order test in BPF test_verifier.c, from Daniel
            Borkmann.
      
         8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.
      
         9) Handle SCTP checksums properly in mlx4 driver, from Davide
            Caratti.
      
        10) We can potentially enter tcp_connect() with a cached route
            already, due to fastopen, so we have to explicitly invalidate it.
      
        11) skb_warn_bad_offload() can bark in legitimate situations, fix from
            Willem de Bruijn"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        net: avoid skb_warn_bad_offload false positives on UFO
        qmi_wwan: fix NULL deref on disconnect
        ppp: fix xmit recursion detection on ppp channels
        rds: Reintroduce statistics counting
        tcp: fastopen: tcp_connect() must refresh the route
        net: sched: set xt_tgchk_param par.net properly in ipt_init_target
        net: dsa: mediatek: add adjust link support for user ports
        net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
        qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
        hysdn: fix to a race condition in put_log_buffer
        s390/qeth: fix L3 next-hop in xmit qeth hdr
        asix: Fix small memory leak in ax88772_unbind()
        asix: Ensure asix_rx_fixup_info members are all reset
        asix: Add rx->ax_skb = NULL after usbnet_skb_return()
        bpf: fix selftest/bpf/test_pkt_md_access on s390x
        netvsc: fix race on sub channel creation
        bpf: fix byte order test in test_verifier
        xgene: Always get clk source, but ignore if it's missing for SGMII ports
        MIPS: Add missing file for eBPF JIT.
        bpf, s390: fix build for libbpf and selftest suite
        ...
      4530cca1
  6. 09 8月, 2017 6 次提交
    • W
      net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6
      Willem de Bruijn 提交于
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d63bee6
    • B
      qmi_wwan: fix NULL deref on disconnect · bbae08e5
      Bjørn Mork 提交于
      qmi_wwan_disconnect is called twice when disconnecting devices with
      separate control and data interfaces.  The first invocation will set
      the interface data to NULL for both interfaces to flag that the
      disconnect has been handled.  But the matching NULL check was left
      out when qmi_wwan_disconnect was added, resulting in this oops:
      
        usb 2-1.4: USB disconnect, device number 4
        qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
        IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Modules linked in: <stripped irrelevant module list>
        CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
        Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
        Workqueue: usb_hub_wq hub_event [usbcore]
        task: ffff8c882b716040 task.stack: ffffb8e800d84000
        RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
        RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
        R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
        R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
        Call Trace:
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
         ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
         ? usbnet_disconnect+0x6c/0xf0 [usbnet]
         ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
      Reported-and-tested-by: NNathaniel Roach <nroach44@gmail.com>
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Cc: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbae08e5
    • G
      ppp: fix xmit recursion detection on ppp channels · 0a0e1a85
      Guillaume Nault 提交于
      Commit e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp
      devices") dropped the xmit_recursion counter incrementation in
      ppp_channel_push() and relied on ppp_xmit_process() for this task.
      But __ppp_channel_push() can also send packets directly (using the
      .start_xmit() channel callback), in which case the xmit_recursion
      counter isn't incremented anymore. If such packets get routed back to
      the parent ppp unit, ppp_xmit_process() won't notice the recursion and
      will call ppp_channel_push() on the same channel, effectively creating
      the deadlock situation that the xmit_recursion mechanism was supposed
      to prevent.
      
      This patch re-introduces the xmit_recursion counter incrementation in
      ppp_channel_push(). Since the xmit_recursion variable is now part of
      the parent ppp unit, incrementation is skipped if the channel doesn't
      have any. This is fine because only packets routed through the parent
      unit may enter the channel recursively.
      
      Finally, we have to ensure that pch->ppp is not going to be modified
      while executing ppp_channel_push(). Instead of taking this lock only
      while calling ppp_xmit_process(), we now have to hold it for the full
      ppp_channel_push() execution. This respects the ppp locks ordering
      which requires locking ->upl before ->downl.
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a0e1a85
    • H
      rds: Reintroduce statistics counting · 05bfd7db
      Håkon Bugge 提交于
      In commit 7e3f2952 ("rds: don't let RDS shutdown a connection
      while senders are present"), refilling the receive queue was removed
      from rds_ib_recv(), along with the increment of
      s_ib_rx_refill_from_thread.
      
      Commit 73ce4317 ("RDS: make sure we post recv buffers")
      re-introduces filling the receive queue from rds_ib_recv(), but does
      not add the statistics counter. rds_ib_recv() was later renamed to
      rds_ib_recv_path().
      
      This commit reintroduces the statistics counting of
      s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.
      Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: NKnut Omang <knut.omang@oracle.com>
      Reviewed-by: NWei Lin Guay <wei.lin.guay@oracle.com>
      Reviewed-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05bfd7db
    • E
      tcp: fastopen: tcp_connect() must refresh the route · 8ba60924
      Eric Dumazet 提交于
      With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
      to call tcp_connect() while socket sk_dst_cache is either NULL
      or invalid.
      
       +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
       +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
       +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
       +0 connect(4, ..., ...) = 0
      
      << sk->sk_dst_cache becomes obsolete, or even set to NULL >>
      
       +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
      
      We need to refresh the route otherwise bad things can happen,
      especially when syzkaller is running on the host :/
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ba60924
    • X
      net: sched: set xt_tgchk_param par.net properly in ipt_init_target · ec0acb09
      Xin Long 提交于
      Now xt_tgchk_param par in ipt_init_target is a local varibale,
      par.net is not initialized there. Later when xt_check_target
      calls target's checkentry in which it may access par.net, it
      would cause kernel panic.
      
      Jaroslav found this panic when running:
      
        # ip link add TestIface type dummy
        # tc qd add dev TestIface ingress handle ffff:
        # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
          action xt -j CONNMARK --set-mark 4
      
      This patch is to pass net param into ipt_init_target and set
      par.net with it properly in there.
      
      v1->v2:
        As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
        it by also passing net_id to __tcf_ipt_init.
      v2->v3:
        Missed the fixes tag, so add it.
      
      Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put")
      Reported-by: NJaroslav Aster <jaster@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0acb09