1. 17 5月, 2013 1 次提交
    • E
      tcp: gso: do not generate out of order packets · 6ff50cd5
      Eric Dumazet 提交于
      GSO TCP handler has following issues :
      
      1) ooo_okay from original GSO packet is duplicated to all segments
      2) segments (but the last one) are orphaned, so transmit path can not
      get transmit queue number from the socket. This happens if GSO
      segmentation is done before stacked device for example.
      
      Result is we can send packets from a given TCP flow to different TX
      queues (if using multiqueue NICS). This generates OOO problems and
      spurious SACK & retransmits.
      
      Fix this by keeping socket pointer set for all segments.
      
      This means that every segment must also have a destructor, and the
      original gso skb truesize must be split on all segments, to keep
      precise sk->sk_wmem_alloc accounting.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ff50cd5
  2. 15 5月, 2013 2 次提交
  3. 12 5月, 2013 1 次提交
  4. 09 5月, 2013 1 次提交
  5. 06 5月, 2013 3 次提交
    • A
      fib_trie: no need to delay vfree() · 00203563
      Al Viro 提交于
      Now that vfree() can be called from interrupt contexts, there's no
      need to play games with schedule_work() to escape calling vfree()
      from RCU callbacks.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00203563
    • K
      net: frag, fix race conditions in LRU list maintenance · b56141ab
      Konstantin Khlebnikov 提交于
      This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
      which was introduced in commit 3ef0eb0d
      ("net: frag, move LRU list maintenance outside of rwlock")
      
      One cpu already added new fragment queue into hash but not into LRU.
      Other cpu found it in hash and tries to move it to the end of LRU.
      This leads to NULL pointer dereference inside of list_move_tail().
      
      Another possible race condition is between inet_frag_lru_move() and
      inet_frag_lru_del(): move can happens after deletion.
      
      This patch initializes LRU list head before adding fragment into hash and
      inet_frag_lru_move() doesn't touches it if it's empty.
      
      I saw this kernel oops two times in a couple of days.
      
      [119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
      [119482.140221] Oops: 0000 [#1] SMP
      [119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
      [119482.152692] CPU 3
      [119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
      [119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
      [119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
      [119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
      [119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
      [119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
      [119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
      [119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
      [119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
      [119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
      [119482.223113] Stack:
      [119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
      [119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
      [119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
      [119482.243090] Call Trace:
      [119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
      [119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
      [119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
      [119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
      [119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
      [119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
      [119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
      [119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
      [119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
      [119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
      [119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
      [119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
      [119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
      [119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
      [119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
      [119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
      [119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.348675]  RSP <ffff880216d5db58>
      [119482.353493] CR2: 0000000000000000
      
      Oops happened on this path:
      ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b56141ab
    • E
      tcp: do not expire TCP fastopen cookies · efeaa555
      Eric Dumazet 提交于
      TCP metric cache expires entries after one hour.
      
      This probably make sense for TCP RTT/RTTVAR/CWND, but not
      for TCP fastopen cookies.
      
      Its better to try previous cookie. If it appears to be obsolete,
      server will send us new cookie anyway.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efeaa555
  6. 04 5月, 2013 2 次提交
  7. 02 5月, 2013 1 次提交
  8. 30 4月, 2013 3 次提交
  9. 25 4月, 2013 1 次提交
  10. 20 4月, 2013 2 次提交
  11. 19 4月, 2013 4 次提交
    • F
      netfilter: xt_rpfilter: depend on raw or mangle table · d37d6968
      Florian Westphal 提交于
      rpfilter is only valid in raw/mangle PREROUTING, i.e.
      RPFILTER=y|m is useless without raw or mangle table support.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d37d6968
    • F
      netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too · f83a7ea2
      Florian Westphal 提交于
      Alex Efros reported rpfilter module doesn't match following packets:
      IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
      (netfilter bugzilla #814).
      
      Problem is that network stack arranges for the locally generated broadcasts
      to appear on the interface they were sent out, so the IFF_LOOPBACK check
      doesn't trigger.
      
      As -m rpfilter is restricted to PREROUTING, we can check for existing
      rtable instead, it catches locally-generated broad/multicast case, too.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f83a7ea2
    • E
      tcp: introduce TCPSpuriousRtxHostQueues SNMP counter · 0e280af0
      Eric Dumazet 提交于
      Host queues (Qdisc + NIC) can hold packets so long that TCP can
      eventually retransmit a packet before the first transmit even left
      the host.
      
      Its not clear right now if we could avoid this in the first place :
      
      - We could arm RTO timer not at the time we enqueue packets, but
        at the time we TX complete them (tcp_wfree())
      
      - Cancel the sending of the new copy of the packet if prior one
        is still in queue.
      
      This patch adds instrumentation so that we can at least see how
      often this problem happens.
      
      TCPSpuriousRtxHostQueues SNMP counter is incremented every time
      we detect the fast clone is not yet freed in tcp_transmit_skb()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e280af0
    • P
      netfilter: add my copyright statements · f229f6ce
      Patrick McHardy 提交于
      Add copyright statements to all netfilter files which have had significant
      changes done by myself in the past.
      
      Some notes:
      
      - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
        Core Team when it got split out of nf_conntrack_core.c. The copyrights
        even state a date which lies six years before it was written. It was
        written in 2005 by Harald and myself.
      
      - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright
        statements. I've added the copyright statement from net/netfilter/core.c,
        where this code originated
      
      - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
        it to give the wrong impression
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f229f6ce
  12. 17 4月, 2013 1 次提交
    • E
      net: drop dst before queueing fragments · 97599dc7
      Eric Dumazet 提交于
      Commit 4a94445c (net: Use ip_route_input_noref() in input path)
      added a bug in IP defragmentation handling, as non refcounted
      dst could escape an RCU protected section.
      
      Commit 64f3b9e2 (net: ip_expire() must revalidate route) fixed
      the case of timeouts, but not the general problem.
      
      Tom Parkin noticed crashes in UDP stack and provided a patch,
      but further analysis permitted us to pinpoint the root cause.
      
      Before queueing a packet into a frag list, we must drop its dst,
      as this dst has limited lifetime (RCU protected)
      
      When/if a packet is finally reassembled, we use the dst of the very
      last skb, still protected by RCU and valid, as the dst of the
      reassembled packet.
      
      Use same logic in IPv6, as there is no need to hold dst references.
      Reported-by: NTom Parkin <tparkin@katalix.com>
      Tested-by: NTom Parkin <tparkin@katalix.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97599dc7
  13. 16 4月, 2013 1 次提交
  14. 15 4月, 2013 2 次提交
  15. 14 4月, 2013 1 次提交
  16. 13 4月, 2013 1 次提交
    • E
      tcp: GSO should be TSQ friendly · d6a4a104
      Eric Dumazet 提交于
      I noticed that TSQ (TCP Small queues) was less effective when TSO is
      turned off, and GSO is on. If BQL is not enabled, TSQ has then no
      effect.
      
      It turns out the GSO engine frees the original gso_skb at the time the
      fragments are generated and queued to the NIC.
      
      We should instead call the tcp_wfree() destructor for the last fragment,
      to keep the flow control as intended in TSQ. This effectively limits
      the number of queued packets on qdisc + NIC layers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6a4a104
  17. 12 4月, 2013 2 次提交
  18. 10 4月, 2013 3 次提交
  19. 09 4月, 2013 3 次提交
  20. 08 4月, 2013 2 次提交
  21. 06 4月, 2013 2 次提交
    • G
      netfilter: ipt_ULOG: add net namespace support for ipt_ULOG · 35543067
      Gao feng 提交于
      Add pernet support to ipt_ULOG by means of the new nf_log_set
      function added in (30e0c6a6 netfilter: nf_log: prepare net
      namespace support for loggers).
      
      This patch also make ulog_buffers and netlink socket
      nflognl per netns.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      35543067
    • G
      netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Gao feng 提交于
      This patch adds netns support to nf_log and it prepares netns
      support for existing loggers. It is composed of four major
      changes.
      
      1) nf_log_register has been split to two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register the nf_logger and nf_log_set is used for enabling
         pernet support from nf_loggers.
      
         Per netns is not yet complete after this patch, it comes in
         separate follow up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per netns is not
         yet complete after this patch, it only allows to bind the
         nf_logger to the protocol family from init_net and it skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass netns as parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      30e0c6a6
  22. 05 4月, 2013 1 次提交