1. 08 4月, 2015 1 次提交
    • D
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller 提交于
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7026b1dd
  2. 05 4月, 2015 6 次提交
  3. 04 4月, 2015 2 次提交
  4. 03 4月, 2015 6 次提交
  5. 01 4月, 2015 5 次提交
    • J
      netlink: implement nla_get_in_addr and nla_get_in6_addr · 67b61f6c
      Jiri Benc 提交于
      Those are counterparts to nla_put_in_addr and nla_put_in6_addr.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67b61f6c
    • J
      netlink: implement nla_put_in_addr and nla_put_in6_addr · 930345ea
      Jiri Benc 提交于
      IP addresses are often stored in netlink attributes. Add generic functions
      to do that.
      
      For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
      not used universally throughout the kernel, in way too many places __be32 is
      used to store IPv4 address.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      930345ea
    • J
      tcp: simplify inetpeer_addr_base use · 8f55db48
      Jiri Benc 提交于
      In many places, the a6 field is typecasted to struct in6_addr. As the
      fields are in union anyway, just add in6_addr type to the union and get rid
      of the typecasting.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f55db48
    • A
      fib_trie: Cleanup ip_fib_net_exit code path · 6e47d6ca
      Alexander Duyck 提交于
      While fixing a recent issue I noticed that we are doing some unnecessary
      work inside the loop for ip_fib_net_exit.  As such I am pulling out the
      initialization to NULL for the locally stored fib_local, fib_main, and
      fib_default.
      
      In addition I am restoring the original code for flushing the table as
      there is no need to split up the fib_table_flush and hlist_del work since
      the code for packing the tnodes with multiple key vectors was dropped.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e47d6ca
    • A
      fib_trie: Fix warning on fib4_rules_exit · ad88d051
      Alexander Duyck 提交于
      This fixes the following warning:
      
       BUG: sleeping function called from invalid context at mm/slub.c:1268
       in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
       INFO: lockdep is turned off.
       CPU: 3 PID: 6 Comm: kworker/u8:0 Tainted: G        W       4.0.0-rc5+ #895
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: netns cleanup_net
        0000000000000006 ffff88011953fa68 ffffffff81a203b6 000000002c3a2c39
        ffff88011952a680 ffff88011953fa98 ffffffff8109daf0 ffff8801186c6aa8
        ffffffff81fbc9e5 00000000000004f4 0000000000000000 ffff88011953fac8
       Call Trace:
        [<ffffffff81a203b6>] dump_stack+0x4c/0x65
        [<ffffffff8109daf0>] ___might_sleep+0x1c3/0x1cb
        [<ffffffff8109db70>] __might_sleep+0x78/0x80
        [<ffffffff8117a60e>] slab_pre_alloc_hook+0x31/0x8f
        [<ffffffff8117d4f6>] __kmalloc+0x69/0x14e
        [<ffffffff818ed0e1>] ? kzalloc.constprop.20+0xe/0x10
        [<ffffffff818ed0e1>] kzalloc.constprop.20+0xe/0x10
        [<ffffffff818ef622>] fib_trie_table+0x27/0x8b
        [<ffffffff818ef6bd>] fib_trie_unmerge+0x37/0x2a6
        [<ffffffff810b06e1>] ? arch_local_irq_save+0x9/0xc
        [<ffffffff818e9793>] fib_unmerge+0x2d/0xb3
        [<ffffffff818f5f56>] fib4_rule_delete+0x1f/0x52
        [<ffffffff817f1c3f>] ? fib_rules_unregister+0x30/0xb2
        [<ffffffff817f1c8b>] fib_rules_unregister+0x7c/0xb2
        [<ffffffff818f64a1>] fib4_rules_exit+0x15/0x18
        [<ffffffff818e8c0a>] ip_fib_net_exit+0x23/0xf2
        [<ffffffff818e91f8>] fib_net_exit+0x32/0x36
        [<ffffffff817c8352>] ops_exit_list+0x45/0x57
        [<ffffffff817c8d3d>] cleanup_net+0x13c/0x1cd
        [<ffffffff8108b05d>] process_one_work+0x255/0x4ad
        [<ffffffff8108af69>] ? process_one_work+0x161/0x4ad
        [<ffffffff8108b4b1>] worker_thread+0x1cd/0x2ab
        [<ffffffff8108b2e4>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff81090686>] kthread+0xd4/0xdc
        [<ffffffff8109ec8f>] ? local_clock+0x19/0x22
        [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
        [<ffffffff81a2c0c8>] ret_from_fork+0x58/0x90
        [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
      
      The issue was that as a part of exiting the default rules were being
      deleted which resulted in the local trie being unmerged.  By moving the
      freeing of the FIB tables up we can avoid the unmerge since there is no
      local table left when we call the fib4_rules_exit function.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad88d051
  6. 30 3月, 2015 2 次提交
  7. 26 3月, 2015 1 次提交
  8. 25 3月, 2015 7 次提交
  9. 24 3月, 2015 8 次提交
    • M
      tcp: prevent fetching dst twice in early demux code · d0c294c5
      Michal Kubeček 提交于
      On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
      
              struct dst_entry *dst = sk->sk_rx_dst;
      
              if (dst)
                      dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
      
      to code reading sk->sk_rx_dst twice, once for the test and once for
      the argument of ip6_dst_check() (dst_check() is inline). This allows
      ip6_dst_check() to be called with null first argument, causing a crash.
      
      Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
      TCP early demux code.
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Fixes: c7109986 ("ipv6: Early TCP socket demux")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0c294c5
    • F
      inet: fix double request socket freeing · c6973669
      Fan Du 提交于
      Eric Hugne reported following error :
      
      I'm hitting this warning on latest net-next when i try to SSH into a machine
      with eth0 added to a bridge (but i think the problem is older than that)
      
      Steps to reproduce:
      node2 ~ # brctl addif br0 eth0
      [  223.758785] device eth0 entered promiscuous mode
      node2 ~ # ip link set br0 up
      [  244.503614] br0: port 1(eth0) entered forwarding state
      [  244.505108] br0: port 1(eth0) entered forwarding state
      node2 ~ # [  251.160159] ------------[ cut here ]------------
      [  251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
      [  251.162077] Modules linked in:
      [  251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
      [  251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  251.164078]  ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
      [  251.165084]  0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
      [  251.166195]  ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
      [  251.167203] Call Trace:
      [  251.167533]  [<ffffffff8162ace4>] dump_stack+0x45/0x57
      [  251.168206]  [<ffffffff8104da85>] warn_slowpath_common+0x85/0xc0
      [  251.169239]  [<ffffffff8104db65>] warn_slowpath_null+0x15/0x20
      [  251.170271]  [<ffffffff81559d51>] tcp_v4_err+0x6b1/0x720
      [  251.171408]  [<ffffffff81630d03>] ? _raw_read_lock_irq+0x3/0x10
      [  251.172589]  [<ffffffff81534e20>] ? inet_del_offload+0x40/0x40
      [  251.173366]  [<ffffffff81569295>] icmp_socket_deliver+0x65/0xb0
      [  251.174134]  [<ffffffff815693a2>] icmp_unreach+0xc2/0x280
      [  251.174820]  [<ffffffff8156a82d>] icmp_rcv+0x2bd/0x3a0
      [  251.175473]  [<ffffffff81534ea2>] ip_local_deliver_finish+0x82/0x1e0
      [  251.176282]  [<ffffffff815354d8>] ip_local_deliver+0x88/0x90
      [  251.177004]  [<ffffffff815350f0>] ip_rcv_finish+0xf0/0x310
      [  251.177693]  [<ffffffff815357bc>] ip_rcv+0x2dc/0x390
      [  251.178336]  [<ffffffff814f5da3>] __netif_receive_skb_core+0x713/0xa20
      [  251.179170]  [<ffffffff814f7fca>] __netif_receive_skb+0x1a/0x80
      [  251.179922]  [<ffffffff814f97d4>] process_backlog+0x94/0x120
      [  251.180639]  [<ffffffff814f9612>] net_rx_action+0x1e2/0x310
      [  251.181356]  [<ffffffff81051267>] __do_softirq+0xa7/0x290
      [  251.182046]  [<ffffffff81051469>] run_ksoftirqd+0x19/0x30
      [  251.182726]  [<ffffffff8106cc23>] smpboot_thread_fn+0x153/0x1d0
      [  251.183485]  [<ffffffff8106cad0>] ? SyS_setgroups+0x130/0x130
      [  251.184228]  [<ffffffff8106935e>] kthread+0xee/0x110
      [  251.184871]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.185690]  [<ffffffff81631108>] ret_from_fork+0x58/0x90
      [  251.186385]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.187216] ---[ end trace c947fc7b24e42ea1 ]---
      [  259.542268] br0: port 1(eth0) entered forwarding state
      
      Remove the double calls to reqsk_put()
      
      [edumazet] :
      
      I got confused because reqsk_timer_handler() _has_ to call
      reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
      the timer handler holds a reference on req.
      Signed-off-by: NFan Du <fan.du@intel.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NErik Hugne <erik.hugne@ericsson.com>
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6973669
    • A
      fib_trie: Fix regression in handling of inflate/halve failure · b6f15f82
      Alexander Duyck 提交于
      When I updated the code to address a possible null pointer dereference in
      resize I ended up reverting an exception handling fix for the suffix length
      in the event that inflate or halve failed.  This change is meant to correct
      that by reverting the earlier fix and instead simply getting the parent
      again after inflate has been completed to avoid the possible null pointer
      issue.
      
      Fixes: ddb4b9a1 ("fib_trie: Address possible NULL pointer dereference in resize")
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6f15f82
    • E
      ipv4: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets · 26e37360
      Eric Dumazet 提交于
      tcp_v4_err() can restrict lookups to ehash table, and not to listeners.
      
      Note this patch creates the infrastructure, but this means that ICMP
      messages for request sockets are ignored until complete conversion.
      
      New tcp_req_err() helper is exported so that we can use it in IPv6
      in following patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26e37360
    • E
      net: convert syn_wait_lock to a spinlock · b2827053
      Eric Dumazet 提交于
      This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.
      
      We hold syn_wait_lock for such small sections, that it makes no sense to use
      a read/write lock. A spin lock is simply faster.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2827053
    • E
      inet: remove some sk_listener dependencies · 8b929ab1
      Eric Dumazet 提交于
      listener can be source of false sharing. request sock has some
      useful information like : ireq->ir_iif, ireq->ir_num, ireq->ireq_net
      
      This patch does not solve the major problem of having to read
      sk->sk_protocol which is sharing a cache line with sk->sk_wmem_alloc.
      (This same field is read later in ip_build_and_send_pkt())
      
      One idea would be to move sk_protocol close to sk_family
      (using 8 bits instead of 16 for sk_family seems enough)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b929ab1
    • E
      inet: remove sk_listener parameter from syn_ack_timeout() · 42cb80a2
      Eric Dumazet 提交于
      It is not needed, and req->sk_listener points to the listener anyway.
      request_sock argument can be const.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42cb80a2
    • E
      inet: cache listen_sock_qlen() and read rskq_defer_accept once · 2b41fab7
      Eric Dumazet 提交于
      Cache listen_sock_qlen() to limit false sharing, and read
      rskq_defer_accept once as it might change under us.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b41fab7
  10. 23 3月, 2015 1 次提交
  11. 21 3月, 2015 1 次提交