1. 10 2月, 2014 1 次提交
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f41f0319
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/nftables/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes, mostly nftables
      fixes, most relevantly they are:
      
      * Fix a crash in the h323 conntrack NAT helper due to expectation list
        corruption, from Alexey Dobriyan.
      
      * A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
        in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
        me.
      
      * Dump direction attribute in nft_ct only if it is set, from Arturo
        Borrero.
      
      * Fix IPVS bug in its own connection tracking system that may lead to
        copying only 4 bytes of the IPv6 address when initializing the
        ip_vs_conn object, from Michal Kubecek.
      
      * Fix -EBUSY errors in nftables when deleting the rules, chain and tables
        in a row due mixture of asynchronous and synchronous object releasing,
        from me.
      
      * Three fixes for the nf_tables set infrastructure when using intervals and
        mappings, from me.
      
      * Four patches to fixing the nf_tables log, reject and ct expressions from
        the new inet table, from Patrick McHardy.
      
      * Fix memory overrun in the map that is used to dynamically allocate names
        from anonymous sets, also from Patrick.
      
      * Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
        name, from Patrick McHardy.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f41f0319
  2. 08 2月, 2014 2 次提交
  3. 07 2月, 2014 21 次提交
  4. 06 2月, 2014 12 次提交
    • P
      netfilter: nf_tables: fix racy rule deletion · 0165d932
      Pablo Neira Ayuso 提交于
      We may lost race if we flush the rule-set (which happens asynchronously
      via call_rcu) and we try to remove the table (that userspace assumes
      to be empty).
      
      Fix this by recovering synchronous rule and chain deletion. This was
      introduced time ago before we had no batch support, and synchronous
      rule deletion performance was not good. Now that we have the batch
      support, we can just postpone the purge of old rule in a second step
      in the commit phase. All object deletions are synchronous after this
      patch.
      
      As a side effect, we save memory as we don't need rcu_head per rule
      anymore.
      
      Cc: Patrick McHardy <kaber@trash.net>
      Reported-by: NArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0165d932
    • P
      netfilter: nf_tables: fix log/queue expressions for NFPROTO_INET · b8ecbee6
      Patrick McHardy 提交于
      The log and queue expressions both store the family during ->init() and
      use it to deliver packets. This is wrong when used in NFPROTO_INET since
      they should both deliver to the actual AF of the packet, not the dummy
      NFPROTO_INET.
      
      Use the family from the hook ops to fix this.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b8ecbee6
    • P
      netfilter: nf_tables: add reject module for NFPROTO_INET · 05513e9e
      Patrick McHardy 提交于
      Add a reject module for NFPROTO_INET. It does nothing but dispatch
      to the AF-specific modules based on the hook family.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      05513e9e
    • P
      netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts · cc4723ca
      Patrick McHardy 提交于
      Currently the nft_reject module depends on symbols from ipv6. This is
      wrong since no generic module should force IPv6 support to be loaded.
      Split up the module into AF-specific and a generic part.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      cc4723ca
    • D
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 028b86b7
      David S. Miller 提交于
      Jesse Gross says:
      
      ====================
      Open vSwitch
      
      A handful of bug fixes for net/3.14. High level fixes are:
       * Regressions introduced by the zerocopy changes, particularly with
         old userspaces.
       * A few bugs lingering from the introduction of megaflows.
       * Overly zealous error checking that is now being triggered frequently
         in common cases.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      028b86b7
    • Z
      xen-netback: Fix Rx stall due to race condition · 9ab9831b
      Zoltan Kiss 提交于
      The recent patch to fix receive side flow control
      (11b57f90: xen-netback: stop vif thread
      spinning if frontend is unresponsive) solved the spinning thread problem,
      however caused an another one. The receive side can stall, if:
      - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
      - [INTERRUPT] interrupt happens, and sets rx_event to true
      - [THREAD] then xenvif_kthread sets rx_event to false
      - [THREAD] rx_work_todo doesn't return true anymore
      
      Also, if interrupt sent but there is still no room in the ring, it take quite a
      long time until xenvif_rx_action realize it. This patch ditch that two variable,
      and rework rx_work_todo. If the thread finds it can't fit more skb's into the
      ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
      kept as 0. Then rx_work_todo will check if:
      - there is something to send to the ring (like before)
      - there is space for the topmost packet in the queue
      
      I think that's more natural and optimal thing to test than two bool which are
      set somewhere else.
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Reviewed-by: NPaul Durrant <paul.durrant@citrix.com>
      Acked-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab9831b
    • P
      netfilter: nf_tables: add AF specific expression support · 64d46806
      Patrick McHardy 提交于
      For the reject module, we need to add AF-specific implementations to
      get rid of incorrect module dependencies. Try to load an AF-specific
      module first and fall back to generic modules.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      64d46806
    • P
      netfilter: nft_ct: fix missing NFT_CT_L3PROTOCOL key in validity checks · 51292c07
      Patrick McHardy 提交于
      The key was missing in the list of valid keys, add it.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      51292c07
    • P
      netfilter: nf_tables: fix potential oops when dumping sets · ec2c9935
      Patrick McHardy 提交于
      Commit c9c8e485 (netfilter: nf_tables: dump sets in all existing families)
      changed nft_ctx_init_from_setattr() to only look up the address family if it
      is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute
      is given, nftables_afinfo_lookup() will dereference the NULL afi pointer.
      
      Fix by checking for non-NULL afi and also move a check added by that commit
      to the proper position.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ec2c9935
    • P
      netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() · 53b70287
      Patrick McHardy 提交于
      The map that is used to allocate anonymous sets is indeed
      BITS_PER_BYTE * PAGE_SIZE long.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      53b70287
    • P
      netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · e53376be
      Pablo Neira Ayuso 提交于
      With this patch, the conntrack refcount is initially set to zero and
      it is bumped once it is added to any of the list, so we fulfill
      Eric's golden rule which is that all released objects always have a
      refcount that equals zero.
      
      Andrey Vagin reports that nf_conntrack_free can't be called for a
      conntrack with non-zero ref-counter, because it can race with
      nf_conntrack_find_get().
      
      A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
      ref-counter says that this conntrack is used. So when we release
      a conntrack with non-zero counter, we break this assumption.
      
      CPU1                                    CPU2
      ____nf_conntrack_find()
                                              nf_ct_put()
                                               destroy_conntrack()
                                              ...
                                              init_conntrack
                                               __nf_conntrack_alloc (set use = 1)
      atomic_inc_not_zero(&ct->use) (use = 2)
                                               if (!l4proto->new(ct, skb, dataoff, timeouts))
                                                nf_conntrack_free(ct); (use = 2 !!!)
                                              ...
                                              __nf_conntrack_alloc (set use = 1)
       if (!nf_ct_key_equal(h, tuple, zone))
        nf_ct_put(ct); (use = 0)
         destroy_conntrack()
                                              /* continue to work with CT */
      
      After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
      race in nf_conntrack_find_get" another bug was triggered in
      destroy_conntrack():
      
      <4>[67096.759334] ------------[ cut here ]------------
      <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
      ...
      <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
      <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
      <4>[67096.760255] Call Trace:
      <4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
      <4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
      <4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
      <4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
      <4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
      <4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
      <4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
      <4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
      <4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
      <4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
      <4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
      <4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
      <4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
      <4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      
      I have reused the original title for the RFC patch that Andrey posted and
      most of the original patch description.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Vagin <avagin@parallels.com>
      Cc: Florian Westphal <fw@strlen.de>
      Reported-by: NAndrew Vagin <avagin@parallels.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NAndrew Vagin <avagin@parallels.com>
      e53376be
    • A
      netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() · 829d9315
      Alexey Dobriyan 提交于
      Similar bug fixed in SIP module in 3f509c68 ("netfilter: nf_nat_sip: fix
      incorrect handling of EBUSY for RTCP expectation").
      
      BUG: unable to handle kernel paging request at 00100104
      IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
      ...
      Call Trace:
        [<c0244bd8>] ? del_timer+0x48/0x70
        [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
        [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
        [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c024442d>] call_timer_fn+0x1d/0x80
        [<c024461e>] run_timer_softirq+0x18e/0x1a0
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c023e6f3>] __do_softirq+0xa3/0x170
        [<c023e650>] ? __local_bh_enable+0x70/0x70
        <IRQ>
        [<c023e587>] ? irq_exit+0x67/0xa0
        [<c0202af6>] ? do_IRQ+0x46/0xb0
        [<c027ad05>] ? clockevents_notify+0x35/0x110
        [<c066ac6c>] ? common_interrupt+0x2c/0x40
        [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
        [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
        [<c02085f8>] ? arch_cpu_idle+0x8/0x30
        [<c027314b>] ? cpu_idle_loop+0x4b/0x140
        [<c0273258>] ? cpu_startup_entry+0x18/0x20
        [<c066056d>] ? rest_init+0x5d/0x70
        [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
        [<c081364f>] ? repair_env_string+0x5b/0x5b
        [<c0813269>] ? i386_start_kernel+0x33/0x35
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      829d9315
  5. 05 2月, 2014 4 次提交
    • A
      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · c6825c09
      Andrey Vagin 提交于
      Lets look at destroy_conntrack:
      
      hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
      ...
      nf_conntrack_free(ct)
      	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
      
      net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
      
      The hash is protected by rcu, so readers look up conntracks without
      locks.
      A conntrack is removed from the hash, but in this moment a few readers
      still can use the conntrack. Then this conntrack is released and another
      thread creates conntrack with the same address and the equal tuple.
      After this a reader starts to validate the conntrack:
      * It's not dying, because a new conntrack was created
      * nf_ct_tuple_equal() returns true.
      
      But this conntrack is not initialized yet, so it can not be used by two
      threads concurrently. In this case BUG_ON may be triggered from
      nf_nat_setup_info().
      
      Florian Westphal suggested to check the confirm bit too. I think it's
      right.
      
      task 1			task 2			task 3
      			nf_conntrack_find_get
      			 ____nf_conntrack_find
      destroy_conntrack
       hlist_nulls_del_rcu
       nf_conntrack_free
       kmem_cache_free
      						__nf_conntrack_alloc
      						 kmem_cache_alloc
      						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
      			 if (nf_ct_is_dying(ct))
      			 if (!nf_ct_tuple_equal()
      
      I'm not sure, that I have ever seen this race condition in a real life.
      Currently we are investigating a bug, which is reproduced on a few nodes.
      In our case one conntrack is initialized from a few tasks concurrently,
      we don't have any other explanation for this.
      
      <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
      ...
      <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
      ...
      <4>[46267.085549] Call Trace:
      <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
      <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
      <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
      <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
      <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
      <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
      <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
      <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
      <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
      <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
      <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
      <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
      <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
      <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
      <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
      <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
      <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
      <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
      <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
      <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
      <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
      <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
      <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
      <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
      <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c6825c09
    • P
      netfilter: nf_tables: fix oops when deleting a chain with references · 3dd7279f
      Patrick McHardy 提交于
      The following commands trigger an oops:
      
       # nft -i
       nft> add table filter
       nft> add chain filter input { type filter hook input priority 0; }
       nft> add chain filter test
       nft> add rule filter input jump test
       nft> delete chain filter test
      
      We need to check the chain use counter before allowing destruction since
      we might have references from sets or jump rules.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341Reported-by: NMatthew Ife <deleriux1@gmail.com>
      Tested-by: NMatthew Ife <deleriux1@gmail.com>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3dd7279f
    • A
      netfilter: nft_ct: fix unconditional dump of 'dir' attr · 2a53bfb3
      Arturo Borrero 提交于
      We want to make sure that the information that we get from the kernel can
      be reinjected without troubles. The kernel shouldn't return an attribute
      that is not required, or even prohibited.
      
      Dumping unconditionally NFTA_CT_DIRECTION could lead an application in
      userspace to interpret that the attribute was originally set, while it
      was not.
      Signed-off-by: NArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2a53bfb3
    • A
      openvswitch: Suppress error messages on megaflow updates · c14e0953
      Andy Zhou 提交于
      With subfacets, we'd expect megaflow updates message to carry
      the original micro flow. If not, EINVAL is returned and kernel
      logs an error message.  Now that the user space subfacet layer is
      removed, it is expected that flow updates can arrive with a
      micro flow other than the original. Change the return code to
      EEXIST and remove the kernel error log message.
      Reported-by: NBen Pfaff <blp@nicira.com>
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      c14e0953