1. 17 3月, 2017 5 次提交
    • S
      tcp: remove per-destination timestamp cache · d82bae12
      Soheil Hassas Yeganeh 提交于
      Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection)
      randomizes TCP timestamps per connection. After this commit,
      there is no guarantee that the timestamps received from the
      same destination are monotonically increasing. As a result,
      the per-destination timestamp cache in TCP metrics (i.e., tcpm_ts
      in struct tcp_metrics_block) is broken and cannot be relied upon.
      
      Remove the per-destination timestamp cache and all related code
      paths.
      
      Note that this cache was already broken for caching timestamps of
      multiple machines behind a NAT sharing the same address.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Cc: Lutz Vieweg <lvml@5t9.de>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d82bae12
    • C
      tcp_westwood: fix tcp_westwood_info() style mistakes · be7164cd
      chun Long 提交于
      replace comma to semi colons in tcp_westwood_info().
      Acked-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be7164cd
    • I
      ipv4: fib_rules: Dump FIB rules when registering FIB notifier · 5d7bfd14
      Ido Schimmel 提交于
      In commit c3852ef7 ("ipv4: fib: Replay events when registering FIB
      notifier") we dumped the FIB tables and replayed the events to the
      passed notification block.
      
      However, we merely sent a RULE_ADD notification in case custom rules
      were in use. As explained in previous patches, this approach won't work
      anymore. Instead, we should notify the caller about all the FIB rules
      and let it act accordingly.
      
      Upon registration to the FIB notification chain, replay a RULE_ADD
      notification for each programmed FIB rule, custom or not. The integrity
      of the dump is ensured by the mechanism introduced in the above
      mentioned commit.
      
      Prevent regressions by making sure current listeners correctly sanitize
      the notified rules.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d7bfd14
    • I
      ipv4: fib_rules: Add notifier info to FIB rules notifications · 6a003a5f
      Ido Schimmel 提交于
      Whenever a FIB rule is added or removed, a notification is sent in the
      FIB notification chain. However, listeners don't have a way to tell
      which rule was added or removed.
      
      This is problematic as we would like to give listeners the ability to
      decide which action to execute based on the notified rule. Specifically,
      offloading drivers should be able to determine if they support the
      reflection of the notified FIB rule and flush their LPM tables in case
      they don't.
      
      Do that by adding a notifier info to these notifications and embed the
      common FIB rule struct in it.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a003a5f
    • I
      ipv4: fib_rules: Check if rule is a default rule · 3c71006d
      Ido Schimmel 提交于
      Currently, when non-default (custom) FIB rules are used, devices capable
      of layer 3 offloading flush their tables and let the kernel do the
      forwarding instead.
      
      When these devices' drivers are loaded they register to the FIB
      notification chain, which lets them know about the existence of any
      custom FIB rules. This is done by sending a RULE_ADD notification based
      on the value of 'net->ipv4.fib_has_custom_rules'.
      
      This approach is problematic when VRF offload is taken into account, as
      upon the creation of the first VRF netdev, a l3mdev rule is programmed
      to direct skbs to the VRF's table.
      
      Instead of merely reading the above value and sending a single RULE_ADD
      notification, we should iterate over all the FIB rules and send a
      detailed notification for each, thereby allowing offloading drivers to
      sanitize the rules they don't support and potentially flush their
      tables.
      
      While l3mdev rules are uniquely marked, the default rules are not.
      Therefore, when they are being notified they might invoke offloading
      drivers to unnecessarily flush their tables.
      
      Solve this by adding an helper to check if a FIB rule is a default rule.
      Namely, its selector should match all packets and its action should
      point to the local, main or default tables.
      
      As noted by David Ahern, uniquely marking the default rules is
      insufficient. When using VRFs, it's common to avoid false hits by moving
      the rule for the local table to just before the main table:
      
      Default configuration:
      $ ip rule show
      0:      from all lookup local
      32766:  from all lookup main
      32767:  from all lookup default
      
      Common configuration with VRFs:
      $ ip rule show
      1000:   from all lookup [l3mdev-table]
      32765:  from all lookup local
      32766:  from all lookup main
      32767:  from all lookup default
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c71006d
  2. 14 3月, 2017 1 次提交
    • J
      dccp/tcp: fix routing redirect race · 45caeaa5
      Jon Maxwell 提交于
      As Eric Dumazet pointed out this also needs to be fixed in IPv6.
      v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
      
      We have seen a few incidents lately where a dst_enty has been freed
      with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
      dst_entry. If the conditions/timings are right a crash then ensues when the
      freed dst_entry is referenced later on. A Common crashing back trace is:
      
       #8 [] page_fault at ffffffff8163e648
          [exception RIP: __tcp_ack_snd_check+74]
      .
      .
       #9 [] tcp_rcv_established at ffffffff81580b64
      #10 [] tcp_v4_do_rcv at ffffffff8158b54a
      #11 [] tcp_v4_rcv at ffffffff8158cd02
      #12 [] ip_local_deliver_finish at ffffffff815668f4
      #13 [] ip_local_deliver at ffffffff81566bd9
      #14 [] ip_rcv_finish at ffffffff8156656d
      #15 [] ip_rcv at ffffffff81566f06
      #16 [] __netif_receive_skb_core at ffffffff8152b3a2
      #17 [] __netif_receive_skb at ffffffff8152b608
      #18 [] netif_receive_skb at ffffffff8152b690
      #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
      #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
      #21 [] net_rx_action at ffffffff8152bac2
      #22 [] __do_softirq at ffffffff81084b4f
      #23 [] call_softirq at ffffffff8164845c
      #24 [] do_softirq at ffffffff81016fc5
      #25 [] irq_exit at ffffffff81084ee5
      #26 [] do_IRQ at ffffffff81648ff8
      
      Of course it may happen with other NIC drivers as well.
      
      It's found the freed dst_entry here:
      
       224 static bool tcp_in_quickack_mode(struct sock *sk)
       225 {
       226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);
       227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);
       228 
       229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
       230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
       231 }
      
      But there are other backtraces attributed to the same freed dst_entry in
      netfilter code as well.
      
      All the vmcores showed 2 significant clues:
      
      - Remote hosts behind the default gateway had always been redirected to a
      different gateway. A rtable/dst_entry will be added for that host. Making
      more dst_entrys with lower reference counts. Making this more probable.
      
      - All vmcores showed a postitive LockDroppedIcmps value, e.g:
      
      LockDroppedIcmps                  267
      
      A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
      regardless of whether user space has the socket locked. This can result in a
      race condition where the same dst_entry cached in sk->sk_dst_entry can be
      decremented twice for the same socket via:
      
      do_redirect()->__sk_dst_check()-> dst_release().
      
      Which leads to the dst_entry being prematurely freed with another socket
      pointing to it via sk->sk_dst_cache and a subsequent crash.
      
      To fix this skip do_redirect() if usespace has the socket locked. Instead let
      the redirect take place later when user space does not have the socket
      locked.
      
      The dccp/IPv6 code is very similar in this respect, so fixing it there too.
      
      As Eric Garver pointed out the following commit now invalidates routes. Which
      can set the dst->obsolete flag so that ipv4_dst_check() returns null and
      triggers the dst_release().
      
      Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
      Cc: Eric Garver <egarver@redhat.com>
      Cc: Hannes Sowa <hsowa@redhat.com>
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45caeaa5
  3. 13 3月, 2017 1 次提交
  4. 11 3月, 2017 2 次提交
  5. 10 3月, 2017 4 次提交
    • A
      udp: avoid ufo handling on IP payload compression packets · 4b3b45ed
      Alexey Kodanev 提交于
      commit c146066a ("ipv4: Don't use ufo handling on later transformed
      packets") and commit f89c56ce ("ipv6: Don't use ufo handling on
      later transformed packets") added a check that 'rt->dst.header_len' isn't
      zero in order to skip UFO, but it doesn't include IPcomp in transport mode
      where it equals zero.
      
      Packets, after payload compression, may not require further fragmentation,
      and if original length exceeds MTU, later compressed packets will be
      transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
      on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
      
      * IPv4 case, offset is wrong + unnecessary fragmentation
          udp_ipsec.sh -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
            proto Compressed IP (108), length 49)
            10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
          IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
            proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
      
      * IPv6 case, sending small fragments
          udp_ipsec.sh -6 -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
      
      Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
      if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
      
      Fixes: c146066a ("ipv4: Don't use ufo handling on later transformed packets")
      Fixes: f89c56ce ("ipv6: Don't use ufo handling on later transformed packets")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b3b45ed
    • A
      tcp: rename *_sequence_number() to *_seq_and_tsoff() · a30aad50
      Alexey Kodanev 提交于
      The functions that are returning tcp sequence number also setup
      TS offset value, so rename them to better describe their purpose.
      
      No functional changes in this patch.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a30aad50
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells 提交于
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdfbabfb
    • P
      net/tunnel: set inner protocol in network gro hooks · 294acf1c
      Paolo Abeni 提交于
      The gso code of several tunnels type (gre and udp tunnels)
      takes for granted that the skb->inner_protocol is properly
      initialized and drops the packet elsewhere.
      
      On the forwarding path no one is initializing such field,
      so gro encapsulated packets are dropped on forward.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain
      inner header protocol"), this can be reproduced when the
      encapsulated packets use gre as the tunneling protocol.
      
      The issue happens also with vxlan and geneve tunnels since
      commit 8bce6d7d ("udp: Generalize skb_udp_segment"), if the
      forwarding host's ingress nic has h/w offload for such tunnel
      and a vxlan/geneve device is configured on top of it, regardless
      of the configured peer address and vni.
      
      To address the issue, this change initialize the inner_protocol
      field for encapsulated packets in both ipv4 and ipv6 gro complete
      callbacks.
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Fixes: 8bce6d7d ("udp: Generalize skb_udp_segment")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      294acf1c
  6. 08 3月, 2017 1 次提交
  7. 03 3月, 2017 1 次提交
  8. 02 3月, 2017 4 次提交
    • I
      sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1
      Ingo Molnar 提交于
      sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      Fix up affected files that include this signal functionality via sched.h.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      174cd4b1
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571
      Ingo Molnar 提交于
      We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and .c files.
      
      Create a trivial placeholder <linux/sched/clock.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e6017571
    • E
      tcp/dccp: block BH for SYN processing · 449809a6
      Eric Dumazet 提交于
      SYN processing really was meant to be handled from BH.
      
      When I got rid of BH blocking while processing socket backlog
      in commit 5413d1ba ("net: do not block BH while processing socket
      backlog"), I forgot that a malicious user could transition to TCP_LISTEN
      from a state that allowed (SYN) packets to be parked in the socket
      backlog while socket is owned by the thread doing the listen() call.
      
      Sure enough syzkaller found this and reported the bug ;)
      
      =================================
      [ INFO: inconsistent lock state ]
      4.10.0+ #60 Not tainted
      ---------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      syz-executor0/5090 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
      [<ffffffff83a6a370>] spin_lock include/linux/spinlock.h:299 [inline]
       (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
      [<ffffffff83a6a370>] inet_ehash_insert+0x240/0xad0
      net/ipv4/inet_hashtables.c:407
      {IN-SOFTIRQ-W} state was registered at:
        mark_irqflags kernel/locking/lockdep.c:2923 [inline]
        __lock_acquire+0xbcf/0x3270 kernel/locking/lockdep.c:3295
        lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
        spin_lock include/linux/spinlock.h:299 [inline]
        inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
        reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
        inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
        tcp_conn_request+0x25cc/0x3310 net/ipv4/tcp_input.c:6399
        tcp_v4_conn_request+0x157/0x220 net/ipv4/tcp_ipv4.c:1262
        tcp_rcv_state_process+0x802/0x4130 net/ipv4/tcp_input.c:5889
        tcp_v4_do_rcv+0x56b/0x940 net/ipv4/tcp_ipv4.c:1433
        tcp_v4_rcv+0x2e12/0x3210 net/ipv4/tcp_ipv4.c:1711
        ip_local_deliver_finish+0x4ce/0xc40 net/ipv4/ip_input.c:216
        NF_HOOK include/linux/netfilter.h:257 [inline]
        ip_local_deliver+0x1ce/0x710 net/ipv4/ip_input.c:257
        dst_input include/net/dst.h:492 [inline]
        ip_rcv_finish+0xb1d/0x2110 net/ipv4/ip_input.c:396
        NF_HOOK include/linux/netfilter.h:257 [inline]
        ip_rcv+0xd90/0x19c0 net/ipv4/ip_input.c:487
        __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4179
        __netif_receive_skb+0x2a/0x170 net/core/dev.c:4217
        netif_receive_skb_internal+0x1d6/0x430 net/core/dev.c:4245
        napi_skb_finish net/core/dev.c:4602 [inline]
        napi_gro_receive+0x4e6/0x680 net/core/dev.c:4636
        e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4033 [inline]
        e1000_clean_rx_irq+0x5e0/0x1490
      drivers/net/ethernet/intel/e1000/e1000_main.c:4489
        e1000_clean+0xb9a/0x2910 drivers/net/ethernet/intel/e1000/e1000_main.c:3834
        napi_poll net/core/dev.c:5171 [inline]
        net_rx_action+0xe70/0x1900 net/core/dev.c:5236
        __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
        invoke_softirq kernel/softirq.c:364 [inline]
        irq_exit+0x19e/0x1d0 kernel/softirq.c:405
        exiting_irq arch/x86/include/asm/apic.h:658 [inline]
        do_IRQ+0x81/0x1a0 arch/x86/kernel/irq.c:250
        ret_from_intr+0x0/0x20
        native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
        arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
        default_idle+0x8f/0x410 arch/x86/kernel/process.c:271
        arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:262
        default_idle_call+0x36/0x60 kernel/sched/idle.c:96
        cpuidle_idle_call kernel/sched/idle.c:154 [inline]
        do_idle+0x348/0x440 kernel/sched/idle.c:243
        cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:345
        start_secondary+0x344/0x440 arch/x86/kernel/smpboot.c:272
        verify_cpu+0x0/0xfc
      irq event stamp: 1741
      hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
      __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
      [inline]
      hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
      _raw_spin_unlock_irqrestore+0xf7/0x1a0 kernel/locking/spinlock.c:191
      hardirqs last disabled at (1740): [<ffffffff84d4a732>]
      __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
      hardirqs last disabled at (1740): [<ffffffff84d4a732>]
      _raw_spin_lock_irqsave+0xa2/0x110 kernel/locking/spinlock.c:159
      softirqs last  enabled at (1738): [<ffffffff84d4deff>]
      __do_softirq+0x7cf/0xb7d kernel/softirq.c:310
      softirqs last disabled at (1571): [<ffffffff84d4b92c>]
      do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&hashinfo->ehash_locks[i])->rlock);
        <Interrupt>
          lock(&(&hashinfo->ehash_locks[i])->rlock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor0/5090:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>] lock_sock
      include/net/sock.h:1460 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>]
      sock_setsockopt+0x233/0x1e40 net/core/sock.c:683
      
      stack backtrace:
      CPU: 1 PID: 5090 Comm: syz-executor0 Not tainted 4.10.0+ #60
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2387
       valid_state kernel/locking/lockdep.c:2400 [inline]
       mark_lock_irq kernel/locking/lockdep.c:2602 [inline]
       mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3065
       mark_irqflags kernel/locking/lockdep.c:2941 [inline]
       __lock_acquire+0x6dc/0x3270 kernel/locking/lockdep.c:3295
       lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
       inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
       dccp_v6_conn_request+0xada/0x11b0 net/dccp/ipv6.c:380
       dccp_rcv_state_process+0x51e/0x1660 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:896 [inline]
       __release_sock+0x127/0x3a0 net/core/sock.c:2052
       release_sock+0xa5/0x2b0 net/core/sock.c:2539
       sock_setsockopt+0x60f/0x1e40 net/core/sock.c:1016
       SYSC_setsockopt net/socket.c:1782 [inline]
       SyS_setsockopt+0x2fb/0x3a0 net/socket.c:1765
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007fe8b26c2b58 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00000000004458b9
      RDX: 000000000000001a RSI: 0000000000000001 RDI: 0000000000000006
      RBP: 00000000006e2110 R08: 0000000000000010 R09: 0000000000000000
      R10: 00000000208c3000 R11: 0000000000000292 R12: 0000000000708000
      R13: 0000000020000000 R14: 0000000000001000 R15: 0000000000000000
      
      Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      449809a6
    • L
      net: route: add missing nla_policy entry for RTA_MARK attribute · 3b45a410
      Liping Zhang 提交于
      This will add stricter validating for RTA_MARK attribute.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b45a410
  9. 28 2月, 2017 2 次提交
  10. 27 2月, 2017 2 次提交
  11. 23 2月, 2017 2 次提交
  12. 22 2月, 2017 2 次提交
  13. 18 2月, 2017 2 次提交
  14. 15 2月, 2017 3 次提交
  15. 14 2月, 2017 1 次提交
    • R
      NET: Fix /proc/net/arp for AX.25 · 4872e57c
      Ralf Baechle 提交于
      When sending ARP requests over AX.25 links the hwaddress in the neighbour
      cache are not getting initialized.  For such an incomplete arp entry
      ax2asc2 will generate an empty string resulting in /proc/net/arp output
      like the following:
      
      $ cat /proc/net/arp
      IP address       HW type     Flags       HW address            Mask     Device
      192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
      172.20.1.99      0x3         0x0              *        bpq0
      
      The missing field will confuse the procfs parsing of arp(8) resulting in
      incorrect output for the device such as the following:
      
      $ arp
      Address                  HWtype  HWaddress           Flags Mask            Iface
      gateway                  ether   52:54:00:00:5d:5f   C                     ens3
      172.20.1.99                      (incomplete)                              ens3
      
      This changes the content of /proc/net/arp to:
      
      $ cat /proc/net/arp
      IP address       HW type     Flags       HW address            Mask     Device
      172.20.1.99      0x3         0x0         *                     *        bpq0
      192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
      
      To do so it change ax2asc to put the string "*" in buf for a NULL address
      argument.  Finally the HW address field is left aligned in a 17 character
      field (the length of an ethernet HW address in the usual hex notation) for
      readability.
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4872e57c
  16. 12 2月, 2017 1 次提交
  17. 11 2月, 2017 4 次提交
  18. 10 2月, 2017 1 次提交
  19. 09 2月, 2017 1 次提交