1. 22 3月, 2017 1 次提交
  2. 17 3月, 2017 6 次提交
    • E
      tcp: tcp_get_info() should read tcp_time_stamp later · db7f00b8
      Eric Dumazet 提交于
      Commit b369e7fd ("tcp: make TCP_INFO more consistent") moved
      lock_sock_fast() earlier in tcp_get_info()
      
      This has the minor effect that jiffies value being sampled at the
      beginning of tcp_get_info() is more likely to be off by one, and we
      report big tcpi_last_data_sent values (like 0xFFFFFFFF).
      
      Since we lock the socket, fetching tcp_time_stamp right before
      doing the jiffies_to_msecs() calls is enough to remove these
      wrong values.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db7f00b8
    • W
      bridge: resolve a false alarm of lockdep · d12c9176
      WANG Cong 提交于
      Andrei reported a false alarm of lockdep at net/bridge/br_fdb.c:109,
      this is because in Andrei's case, a spin_bug() was already triggered
      before this, therefore the debug_locks is turned off, lockdep_is_held()
      is no longer accurate after that. We should use lockdep_assert_held_once()
      instead of lockdep_is_held() to respect debug_locks.
      
      Fixes: 410b3d48 ("bridge: fdb: add proper lock checks in searching functions")
      Reported-by: NAndrei Vagin <avagin@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d12c9176
    • D
      rxrpc: Ignore BUSY packets on old calls · 4d4a6ac7
      David Howells 提交于
      If we receive a BUSY packet for a call we think we've just completed, the
      packet is handed off to the connection processor to deal with - but the
      connection processor doesn't expect a BUSY packet and so flags a protocol
      error.
      
      Fix this by simply ignoring the BUSY packet for the moment.
      
      The symptom of this may appear as a system call failing with EPROTO.  This
      may be triggered by pressing ctrl-C under some circumstances.
      
      This comes about we abort calls due to interruption by a signal (which we
      shouldn't do, but that's going to be a large fix and mostly in fs/afs/).
      What happens is that we abort the call and may also abort follow up calls
      too (this needs offloading somehoe).  So we see a transmission of something
      like the following sequence of packets:
      
      	DATA for call N
      	ABORT call N
      	DATA for call N+1
      	ABORT call N+1
      
      in very quick succession on the same channel.  However, the peer may have
      deferred the processing of the ABORT from the call N to a background thread
      and thus sees the DATA message from the call N+1 coming in before it has
      cleared the channel.  Thus it sends a BUSY packet[*].
      
      [*] Note that some implementations (OpenAFS, for example) mark the BUSY
          packet with one plus the callNumber of the call prior to call N.
          Ordinarily, this would be call N, but there's no requirement for the
          calls on a channel to be numbered strictly sequentially (the number is
          required to increase).
      
          This is wrong and means that the callNumber in the BUSY packet should
          be ignored (it really ought to be N+1 since that's what it's in
          response to).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d4a6ac7
    • D
      net: ipv6: set route type for anycast routes · 4ee39733
      David Ahern 提交于
      Anycast routes have the RTF_ANYCAST flag set, but when dumping routes
      for userspace the route type is not set to RTN_ANYCAST. Make it so.
      
      Fixes: 58c4fb86 ("[IPV6]: Flag RTF_ANYCAST for anycast routes")
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ee39733
    • D
      net: mpls: Fix nexthop alive tracking on down events · 61733c91
      David Ahern 提交于
      Alive tracking of nexthops can account for a link twice if the carrier
      goes down followed by an admin down of the same link rendering multipath
      routes useless. This is similar to 79099aab for UNREGISTER events and
      DOWN events.
      
      Fix by tracking number of alive nexthops in mpls_ifdown similar to the
      logic in mpls_ifup. Checking the flags per nexthop once after all events
      have been processed is simpler than trying to maintian a running count
      through all event combinations.
      
      Also, WRITE_ONCE is used instead of ACCESS_ONCE to set rt_nhn_alive
      per a comment from checkpatch:
          WARNING: Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>
      
      Fixes: c89359a4 ("mpls: support for dead routes")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61733c91
    • K
      openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PAD · 8f3dbfd7
      Kris Murphy 提交于
      Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement
      in ip_tun_from_nlattr in order to prevent the default case
      returning an error.
      
      Fixes: b46f6ded ("libnl: nla_put_be64(): align on a 64-bit area")
      Signed-off-by: NKris Murphy <kriskend@linux.vnet.ibm.com>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f3dbfd7
  3. 16 3月, 2017 4 次提交
  4. 15 3月, 2017 1 次提交
  5. 14 3月, 2017 7 次提交
    • H
      dccp: fix memory leak during tear-down of unsuccessful connection request · 72ef9c41
      Hannes Frederic Sowa 提交于
      This patch fixes a memory leak, which happens if the connection request
      is not fulfilled between parsing the DCCP options and handling the SYN
      (because e.g. the backlog is full), because we forgot to free the
      list of ack vectors.
      Reported-by: NJianwen Ji <jiji@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72ef9c41
    • J
      dccp/tcp: fix routing redirect race · 45caeaa5
      Jon Maxwell 提交于
      As Eric Dumazet pointed out this also needs to be fixed in IPv6.
      v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
      
      We have seen a few incidents lately where a dst_enty has been freed
      with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
      dst_entry. If the conditions/timings are right a crash then ensues when the
      freed dst_entry is referenced later on. A Common crashing back trace is:
      
       #8 [] page_fault at ffffffff8163e648
          [exception RIP: __tcp_ack_snd_check+74]
      .
      .
       #9 [] tcp_rcv_established at ffffffff81580b64
      #10 [] tcp_v4_do_rcv at ffffffff8158b54a
      #11 [] tcp_v4_rcv at ffffffff8158cd02
      #12 [] ip_local_deliver_finish at ffffffff815668f4
      #13 [] ip_local_deliver at ffffffff81566bd9
      #14 [] ip_rcv_finish at ffffffff8156656d
      #15 [] ip_rcv at ffffffff81566f06
      #16 [] __netif_receive_skb_core at ffffffff8152b3a2
      #17 [] __netif_receive_skb at ffffffff8152b608
      #18 [] netif_receive_skb at ffffffff8152b690
      #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
      #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
      #21 [] net_rx_action at ffffffff8152bac2
      #22 [] __do_softirq at ffffffff81084b4f
      #23 [] call_softirq at ffffffff8164845c
      #24 [] do_softirq at ffffffff81016fc5
      #25 [] irq_exit at ffffffff81084ee5
      #26 [] do_IRQ at ffffffff81648ff8
      
      Of course it may happen with other NIC drivers as well.
      
      It's found the freed dst_entry here:
      
       224 static bool tcp_in_quickack_mode(struct sock *sk)
       225 {
       226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);
       227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);
       228 
       229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
       230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
       231 }
      
      But there are other backtraces attributed to the same freed dst_entry in
      netfilter code as well.
      
      All the vmcores showed 2 significant clues:
      
      - Remote hosts behind the default gateway had always been redirected to a
      different gateway. A rtable/dst_entry will be added for that host. Making
      more dst_entrys with lower reference counts. Making this more probable.
      
      - All vmcores showed a postitive LockDroppedIcmps value, e.g:
      
      LockDroppedIcmps                  267
      
      A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
      regardless of whether user space has the socket locked. This can result in a
      race condition where the same dst_entry cached in sk->sk_dst_entry can be
      decremented twice for the same socket via:
      
      do_redirect()->__sk_dst_check()-> dst_release().
      
      Which leads to the dst_entry being prematurely freed with another socket
      pointing to it via sk->sk_dst_cache and a subsequent crash.
      
      To fix this skip do_redirect() if usespace has the socket locked. Instead let
      the redirect take place later when user space does not have the socket
      locked.
      
      The dccp/IPv6 code is very similar in this respect, so fixing it there too.
      
      As Eric Garver pointed out the following commit now invalidates routes. Which
      can set the dst->obsolete flag so that ipv4_dst_check() returns null and
      triggers the dst_release().
      
      Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
      Cc: Eric Garver <egarver@redhat.com>
      Cc: Hannes Sowa <hsowa@redhat.com>
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45caeaa5
    • A
      net: use net->count to check whether a netns is alive or not · 91864f58
      Andrey Vagin 提交于
      The previous idea was to check whether a net namespace is in
      net_exit_list or not. It doesn't work, because net->exit_list is used in
      __register_pernet_operations and __unregister_pernet_operations where
      all namespaces are added to a temporary list to make cleanup in a error
      case, so list_empty(&net->exit_list) always returns false.
      Reported-by: NMantas Mikulėnas <grawity@gmail.com>
      Fixes: 002d8a1a ("net: skip genenerating uevents for network namespaces that are exiting")
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91864f58
    • F
      bridge: drop netfilter fake rtable unconditionally · a13b2082
      Florian Westphal 提交于
      Andreas reports kernel oops during rmmod of the br_netfilter module.
      Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
      
      Problem is that br_netfilter has the nasty concept of adding a fake
      rtable to skb->dst; this happens in a br_netfilter prerouting hook.
      
      A second hook (in bridge LOCAL_IN) is supposed to remove these again
      before the skb is handed up the stack.
      
      However, on module unload hooks get unregistered which means an
      skb could traverse the prerouting hook that attaches the fake_rtable,
      while the 'fake rtable remove' hook gets removed from the hooklist
      immediately after.
      
      Fixes: 34666d46 ("netfilter: bridge: move br_netfilter out of the core")
      Reported-by: NAndreas Karis <akaris@redhat.com>
      Debugged-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a13b2082
    • F
      ipv6: avoid write to a possibly cloned skb · 79e49503
      Florian Westphal 提交于
      ip6_fragment, in case skb has a fraglist, checks if the
      skb is cloned.  If it is, it will move to the 'slow path' and allocates
      new skbs for each fragment.
      
      However, right before entering the slowpath loop, it updates the
      nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
      to account for the fragment header that will be inserted in the new
      ipv6-fragment skbs.
      
      In case original skb is cloned this munges nexthdr value of another
      skb.  Avoid this by doing the nexthdr update for each of the new fragment
      skbs separately.
      
      This was observed with tcpdump on a bridge device where netfilter ipv6
      reassembly is active:  tcpdump shows malformed fragment headers as
      the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NAndreas Karis <akaris@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e49503
    • S
      ipv6: make ECMP route replacement less greedy · 67e19400
      Sabrina Dubroca 提交于
      Commit 27596472 ("ipv6: fix ECMP route replacement") introduced a
      loop that removes all siblings of an ECMP route that is being
      replaced. However, this loop doesn't stop when it has replaced
      siblings, and keeps removing other routes with a higher metric.
      We also end up triggering the WARN_ON after the loop, because after
      this nsiblings < 0.
      
      Instead, stop the loop when we have taken care of all routes with the
      same metric as the route being replaced.
      
        Reproducer:
        ===========
          #!/bin/sh
      
          ip netns add ns1
          ip netns add ns2
          ip -net ns1 link set lo up
      
          for x in 0 1 2 ; do
              ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
              ip -net ns1 link set eth$x up
              ip -net ns2 link set veth$x up
          done
      
          ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
                  nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
          ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
          ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
      
          echo "before replace, 3 routes"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
          echo
      
          ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
                  nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
      
          echo "after replace, only 2 routes, metric 2048 is gone"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67e19400
    • P
      Revert "netfilter: nf_tables: add flush field to struct nft_set_iter" · 04166f48
      Pablo Neira Ayuso 提交于
      This reverts commit 1f48ff6c.
      
      This patch is not required anymore now that we keep a dummy list of
      set elements in the bitmap set implementation, so revert this before
      we forget this code has no clients.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      04166f48
  6. 13 3月, 2017 8 次提交
    • P
      netfilter: nft_set_bitmap: keep a list of dummy elements · e920dde5
      Pablo Neira Ayuso 提交于
      Element comments may come without any prior set flag, so we have to keep
      a list of dummy struct nft_set_ext to keep this information around. This
      is only useful for set dumps to userspace. From the packet path, this
      set type relies on the bitmap representation. This patch simplifies the
      logic since we don't need to allocate the dummy nft_set_ext structure
      anymore on the fly at the cost of increasing memory consumption because
      of the list of dummy struct nft_set_ext.
      
      Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e920dde5
    • S
      netfilter: Force fake conntrack entry to be at least 8 bytes aligned · 170a1fb9
      Steven Rostedt (VMware) 提交于
      Since the nfct and nfctinfo have been combined, the nf_conn structure
      must be at least 8 bytes aligned, as the 3 LSB bits are used for the
      nfctinfo. But there's a fake nf_conn structure to denote untracked
      connections, which is created by a PER_CPU construct. This does not
      guarantee that it will be 8 bytes aligned and can break the logic in
      determining the correct nfctinfo.
      
      I triggered this on a 32bit machine with the following error:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000af4
      IP: nf_ct_deliver_cached_events+0x1b/0xfb
      *pdpt = 0000000031962001 *pde = 0000000000000000
      
      Oops: 0000 [#1] SMP
      [Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport
        OK  ]
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      task: c126ec00 task.stack: c1258000
      EIP: nf_ct_deliver_cached_events+0x1b/0xfb
      EFLAGS: 00010202 CPU: 0
      EAX: 0021cd01 EBX: 00000000 ECX: 27b0c767 EDX: 32bcb17a
      ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c
       DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      CR0: 80050033 CR2: 00000af4 CR3: 309a0440 CR4: 001406f0
      Call Trace:
       <SOFTIRQ>
       ? ipv6_skip_exthdr+0xac/0xcb
       ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6]
       nf_hook_slow+0x22/0xc7
       nf_hook+0x9a/0xad [ipv6]
       ? ip6t_do_table+0x356/0x379 [ip6_tables]
       ? ip6_fragment+0x9e9/0x9e9 [ipv6]
       ip6_output+0xee/0x107 [ipv6]
       ? ip6_fragment+0x9e9/0x9e9 [ipv6]
       dst_output+0x36/0x4d [ipv6]
       NF_HOOK.constprop.37+0xb2/0xba [ipv6]
       ? icmp6_dst_alloc+0x2c/0xfd [ipv6]
       ? local_bh_enable+0x14/0x14 [ipv6]
       mld_sendpack+0x1c5/0x281 [ipv6]
       ? mark_held_locks+0x40/0x5c
       mld_ifc_timer_expire+0x1f6/0x21e [ipv6]
       call_timer_fn+0x135/0x283
       ? detach_if_pending+0x55/0x55
       ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
       __run_timers+0x111/0x14b
       ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
       run_timer_softirq+0x1c/0x36
       __do_softirq+0x185/0x37c
       ? test_ti_thread_flag.constprop.19+0xd/0xd
       do_softirq_own_stack+0x22/0x28
       </SOFTIRQ>
       irq_exit+0x5a/0xa4
       smp_apic_timer_interrupt+0x2a/0x34
       apic_timer_interrupt+0x37/0x3c
      
      By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8 byte
      alignment as all cache line sizes are at least 8 bytes or more.
      
      Fixes: a9e419dc ("netfilter: merge ctinfo into nfct pointer storage area")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      170a1fb9
    • F
      netfilter: bridge: honor frag_max_size when refragmenting · 4ca60d08
      Florian Westphal 提交于
      consider a bridge with mtu 9000, but end host sending smaller
      packets to another host with mtu < 9000.
      
      In this case, after reassembly, bridge+defrag would refragment,
      and then attempt to send the reassembled packet as long as it
      was below 9k.
      
      Instead we have to cap by the largest fragment size seen.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4ca60d08
    • L
      netfilter: nf_tables: fix mismatch in big-endian system · 10596608
      Liping Zhang 提交于
      Currently, there are two different methods to store an u16 integer to
      the u32 data register. For example:
        u32 *dest = &regs->data[priv->dreg];
        1. *dest = 0; *(u16 *) dest = val_u16;
        2. *dest = val_u16;
      
      For method 1, the u16 value will be stored like this, either in
      big-endian or little-endian system:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        |   Value   |     0     |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      
      For method 2, in little-endian system, the u16 value will be the same
      as listed above. But in big-endian system, the u16 value will be stored
      like this:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        |     0     |   Value   |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      
      So later we use "memcmp(&regs->data[priv->sreg], data, 2);" to do
      compare in nft_cmp, nft_lookup expr ..., method 2 will get the wrong
      result in big-endian system, as 0~15 bits will always be zero.
      
      For the similar reason, when loading an u16 value from the u32 data
      register, we should use "*(u16 *) sreg;" instead of "(u16)*sreg;",
      the 2nd method will get the wrong value in the big-endian system.
      
      So introduce some wrapper functions to store/load an u8 or u16
      integer to/from the u32 data register, and use them in the right
      place.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      10596608
    • L
      netfilter: nft_set_bitmap: fetch the element key based on the set->klen · fd89b23a
      Liping Zhang 提交于
      Currently we just assume the element key as a u32 integer, regardless of
      the set key length.
      
      This is incorrect, for example, the tcp port number is only 16 bits.
      So when we use the nft_payload expr to get the tcp dport and store
      it to dreg, the dport will be stored at 0~15 bits, and 16~31 bits
      will be padded with zero.
      
      So the reg->data[dreg] will be looked like as below:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        | tcp dport |      0    |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      But for these big-endian systems, if we treate this register as a u32
      integer, the element key will be larger than 65535, so the following
      lookup in bitmap set will cause out of bound access.
      
      Another issue is that if we add element with comments in bitmap
      set(although the comments will be ignored eventually), the element will
      vanish strangely. Because we treate the element key as a u32 integer, so
      the comments will become the part of the element key, then the element
      key will also be larger than 65535 and out of bound access will happen:
        # nft add element t s { 1 comment test }
      
      Since set->klen is 1 or 2, it's fine to treate the element key as a u8 or
      u16 integer.
      
      Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      fd89b23a
    • D
      mpls: Do not decrement alive counter for unregister events · 79099aab
      David Ahern 提交于
      Multipath routes can be rendered usesless when a device in one of the
      paths is deleted. For example:
      
      $ ip -f mpls ro ls
      100
      	nexthop as to 200 via inet 172.16.2.2  dev virt12
      	nexthop as to 300 via inet 172.16.3.2  dev br0
      101
      	nexthop as to 201 via inet6 2000:2::2  dev virt12
      	nexthop as to 301 via inet6 2000:3::2  dev br0
      
      $ ip li del br0
      
      When br0 is deleted the other hop is not considered in
      mpls_select_multipath because of the alive check -- rt_nhn_alive
      is 0.
      
      rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
      down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
      a 2 hop route, deleting one device drops the alive count to 0. Since
      devices are taken down before unregistering, the decrement on
      NETDEV_UNREGISTER is redundant.
      
      Fixes: c89359a4 ("mpls: support for dead routes")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79099aab
    • D
      mpls: Send route delete notifications when router module is unloaded · e37791ec
      David Ahern 提交于
      When the mpls_router module is unloaded, mpls routes are deleted but
      notifications are not sent to userspace leaving userspace caches
      out of sync. Add the call to mpls_notify_route in mpls_net_exit as
      routes are freed.
      
      Fixes: 0189197f ("mpls: Basic routing support")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e37791ec
    • E
      act_connmark: avoid crashing on malformed nlattrs with null parms · 52491c76
      Etienne Noss 提交于
      tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
      is set, resulting in a null pointer dereference when trying to access it.
      
      [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
      ...
      [501099.044334] Call Trace:
      [501099.044345]  [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
      [501099.044363]  [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
      [501099.044380]  [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
      [501099.044398]  [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
      [501099.044417]  [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
      [501099.044436]  [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
      [501099.044454]  [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
      [501099.044471]  [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
      [501099.044490]  [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
      [501099.044507]  [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
      [501099.044524]  [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
      [501099.044541]  [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
      [501099.044558]  [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
      [501099.044576]  [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
      [501099.044592]  [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
      [501099.044608]  [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
      [501099.044626]  [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b
      
      Fixes: 22a5dc0e ("net: sched: Introduce connmark action")
      Signed-off-by: NÉtienne Noss <etienne.noss@wifirst.fr>
      Signed-off-by: NVictorien Molle <victorien.molle@wifirst.fr>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52491c76
  7. 11 3月, 2017 1 次提交
    • D
      rxrpc: Wake up the transmitter if Rx window size increases on the peer · 702f2ac8
      David Howells 提交于
      The RxRPC ACK packet may contain an extension that includes the peer's
      current Rx window size for this call.  We adjust the local Tx window size
      to match.  However, the transmitter can stall if the receive window is
      reduced to 0 by the peer and then reopened.
      
      This is because the normal way that the transmitter is re-energised is by
      dropping something out of our Tx queue and thus making space.  When a
      single gap is made, the transmitter is woken up.  However, because there's
      nothing in the Tx queue at this point, this doesn't happen.
      
      To fix this, perform a wake_up() any time we see the peer's Rx window size
      increasing.
      
      The observable symptom is that calls start failing on ETIMEDOUT and the
      following:
      
      	kAFS: SERVER DEAD state=-62
      
      appears in dmesg.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      702f2ac8
  8. 10 3月, 2017 7 次提交
    • D
      rxrpc: rxrpc_kernel_send_data() needs to handle failed call better · 6fc166d6
      David Howells 提交于
      If rxrpc_kernel_send_data() is asked to send data through a call that has
      already failed (due to a remote abort, received protocol error or network
      error), then return the associated error code saved in the call rather than
      ESHUTDOWN.
      
      This allows the caller to work out whether to ask for the abort code or not
      based on this.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fc166d6
    • A
      udp: avoid ufo handling on IP payload compression packets · 4b3b45ed
      Alexey Kodanev 提交于
      commit c146066a ("ipv4: Don't use ufo handling on later transformed
      packets") and commit f89c56ce ("ipv6: Don't use ufo handling on
      later transformed packets") added a check that 'rt->dst.header_len' isn't
      zero in order to skip UFO, but it doesn't include IPcomp in transport mode
      where it equals zero.
      
      Packets, after payload compression, may not require further fragmentation,
      and if original length exceeds MTU, later compressed packets will be
      transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
      on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
      
      * IPv4 case, offset is wrong + unnecessary fragmentation
          udp_ipsec.sh -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
            proto Compressed IP (108), length 49)
            10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
          IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
            proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
      
      * IPv6 case, sending small fragments
          udp_ipsec.sh -6 -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
      
      Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
      if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
      
      Fixes: c146066a ("ipv4: Don't use ufo handling on later transformed packets")
      Fixes: f89c56ce ("ipv6: Don't use ufo handling on later transformed packets")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b3b45ed
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells 提交于
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdfbabfb
    • A
      net: initialize msg.msg_flags in recvfrom · 9f138fa6
      Alexander Potapenko 提交于
      KMSAN reports a use of uninitialized memory in put_cmsg() because
      msg.msg_flags in recvfrom haven't been initialized properly.
      The flag values don't affect the result on this path, but it's still a
      good idea to initialize them explicitly.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f138fa6
    • P
      net/tunnel: set inner protocol in network gro hooks · 294acf1c
      Paolo Abeni 提交于
      The gso code of several tunnels type (gre and udp tunnels)
      takes for granted that the skb->inner_protocol is properly
      initialized and drops the packet elsewhere.
      
      On the forwarding path no one is initializing such field,
      so gro encapsulated packets are dropped on forward.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain
      inner header protocol"), this can be reproduced when the
      encapsulated packets use gre as the tunneling protocol.
      
      The issue happens also with vxlan and geneve tunnels since
      commit 8bce6d7d ("udp: Generalize skb_udp_segment"), if the
      forwarding host's ingress nic has h/w offload for such tunnel
      and a vxlan/geneve device is configured on top of it, regardless
      of the configured peer address and vni.
      
      To address the issue, this change initialize the inner_protocol
      field for encapsulated packets in both ipv4 and ipv6 gro complete
      callbacks.
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Fixes: 8bce6d7d ("udp: Generalize skb_udp_segment")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      294acf1c
    • Z
      rds: ib: add error handle · 3b12f73a
      Zhu Yanjun 提交于
      In the function rds_ib_setup_qp, the error handle is missing. When some
      error occurs, it is possible that memory leak occurs. As such, error
      handle is added.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NGuanglei Li <guanglei.li@oracle.com>
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b12f73a
    • D
      net: ipv6: Remove redundant RTA_OIF in multipath routes · 5be083ce
      David Ahern 提交于
      Dinesh reported that RTA_MULTIPATH nexthops are 8-bytes larger with IPv6
      than IPv4. The recent refactoring for multipath support in netlink
      messages does discriminate between non-multipath which needs the OIF
      and multipath which adds a rtnexthop struct for each hop making the
      RTA_OIF attribute redundant. Resolve by adding a flag to the info
      function to skip the oif for multipath.
      
      Fixes: beb1afac ("net: ipv6: Add support to dump multipath routes
             via RTA_MULTIPATH attribute")
      Reported-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5be083ce
  9. 09 3月, 2017 2 次提交
    • Y
      netfilter: nf_nat_sctp: fix ICMP packet to be dropped accidently · 8e05ba7f
      Ying Xue 提交于
      Regarding RFC 792, the first 64 bits of the original SCTP datagram's
      data could be contained in ICMP packet, such as:
      
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |     Type      |     Code      |          Checksum             |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                             unused                            |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |      Internet Header + 64 bits of Original Data Datagram      |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
      However, according to RFC 4960, SCTP datagram header is as below:
      
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |     Source Port Number        |     Destination Port Number   |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                      Verification Tag                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                           Checksum                            |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
      It means only the first three fields of SCTP header can be carried in
      ICMP packet except for Checksum field.
      
      At present in sctp_manip_pkt(), no matter whether the packet is ICMP or
      not, it always calculates SCTP packet checksum. However, not only the
      calculation of checksum is unnecessary for ICMP, but also it causes
      another fatal issue that ICMP packet is dropped. The header size of
      SCTP is used to identify whether the writeable length of skb is bigger
      than skb->len through skb_make_writable() in sctp_manip_pkt(). But
      when it deals with ICMP packet, skb_make_writable() directly returns
      false as the writeable length of skb is bigger than skb->len.
      Subsequently ICMP is dropped.
      
      Now we correct this misbahavior. When sctp_manip_pkt() handles ICMP
      packet, 8 bytes rather than the whole SCTP header size is used to check
      if writeable length of skb is overflowed. Meanwhile, as it's meaningless
      to calculate checksum when packet is ICMP, the computation of checksum
      is ignored as well.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8e05ba7f
    • F
      netfilter: don't track fragmented packets · 7b4fdf77
      Florian Westphal 提交于
      Andrey reports syzkaller splat caused by
      
      NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
      
      in ipv4 nat.  But this assertion (and the comment) are wrong, this function
      does see fragments when IP_NODEFRAG setsockopt is used.
      
      As conntrack doesn't track packets without complete l4 header, only the
      first fragment is tracked.
      
      Because applying nat to first packet but not the rest makes no sense this
      also turns off tracking of all fragments.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7b4fdf77
  10. 08 3月, 2017 3 次提交
    • W
      ipv6: reorder icmpv6_init() and ip6_mr_init() · 15e66807
      WANG Cong 提交于
      Andrey reported the following kernel crash:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88001f311700 task.stack: ffff88001f6e8000
      RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
      RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000
      RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020
      RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000
      R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040
      FS:  00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
       sock_release+0x8d/0x1e0 net/socket.c:597
       __sock_create+0x39d/0x880 net/socket.c:1226
       sock_create_kern+0x3f/0x50 net/socket.c:1243
       inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
       icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
       ops_init+0x10a/0x550 net/core/net_namespace.c:115
       setup_net+0x261/0x660 net/core/net_namespace.c:291
       copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
      9pnet_virtio: no channels available for device ./file1
       create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
       SYSC_unshare kernel/fork.c:2281 [inline]
       SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because net->ipv6.mr6_tables is not initialized at that point,
      ip6mr_rules_init() is not called yet, therefore on the error path when
      we iterator the list, we trigger this oops. Fix this by reordering
      ip6mr_rules_init() before icmpv6_sk_init().
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e66807
    • E
      dccp: fix use-after-free in dccp_feat_activate_values · 62f8f4d9
      Eric Dumazet 提交于
      Dmitry reported crashes in DCCP stack [1]
      
      Problem here is that when I got rid of listener spinlock, I missed the
      fact that DCCP stores a complex state in struct dccp_request_sock,
      while TCP does not.
      
      Since multiple cpus could access it at the same time, we need to add
      protection.
      
      [1]
      BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
      net/dccp/feat.c:1541 at addr ffff88003713be68
      Read of size 8 by task syz-executor2/8457
      CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
       print_address_description mm/kasan/report.c:200 [inline]
       kasan_report_error mm/kasan/report.c:289 [inline]
       kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
       kasan_report mm/kasan/report.c:332 [inline]
       __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
       dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
       do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
       </IRQ>
       do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
       do_softirq kernel/softirq.c:176 [inline]
       __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
       local_bh_enable include/linux/bottom_half.h:31 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
       ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
       ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
       ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
       inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
       dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
       dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
       dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
       dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       SYSC_sendto+0x660/0x810 net/socket.c:1687
       SyS_sendto+0x40/0x50 net/socket.c:1655
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
      RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
      RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
      R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
      R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
      Object at ffff88003713be50, in cache kmalloc-64 size: 64
      Allocated:
      PID = 8446
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
       kmalloc include/linux/slab.h:490 [inline]
       dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
       dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
       __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
       dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
       dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
       dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
       dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
       dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
       dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:893 [inline]
       __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
       dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Freed:
      PID = 15
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1355 [inline]
       slab_free_freelist_hook mm/slub.c:1377 [inline]
       slab_free mm/slub.c:2954 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3874
       dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
       dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
       dccp_feat_list_pop net/dccp/feat.c:541 [inline]
       dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Memory state around the buggy address:
       ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
                                                                ^
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62f8f4d9
    • A
      net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump · 6c4dc75c
      Alexey Khoroshilov 提交于
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c4dc75c