1. 15 11月, 2013 1 次提交
    • W
      sit: fix use after free of fb_tunnel_dev · 9434266f
      Willem de Bruijn 提交于
      Bug: The fallback device is created in sit_init_net and assumed to be
      freed in sit_exit_net. First, it is dereferenced in that function, in
      sit_destroy_tunnels:
      
              struct net *net = dev_net(sitn->fb_tunnel_dev);
      
      Prior to this, rtnl_unlink_register has removed all devices that match
      rtnl_link_ops == sit_link_ops.
      
      Commit 205983c4 added the line
      
      +       sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
      
      which cases the fallback device to match here and be freed before it
      is last dereferenced.
      
      Fix: This commit adds an explicit .delllink callback to sit_link_ops
      that skips deallocation at rtnl_unlink_register for the fallback
      device. This mechanism is comparable to the one in ip_tunnel.
      
      It also modifies sit_destroy_tunnels and its only caller sit_exit_net
      to avoid the offending dereference in the first place. That double
      lookup is more complicated than required.
      
      Test: The bug is only triggered when CONFIG_NET_NS is enabled. It
      causes a GPF only when CONFIG_DEBUG_SLAB is enabled. Verified that
      this bug exists at the mentioned commit, at davem-net HEAD and at
      3.11.y HEAD. Verified that it went away after applying this patch.
      
      Fixes: 205983c4 ("sit: allow to use rtnl ops on fb tunnel")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9434266f
  2. 11 11月, 2013 3 次提交
  3. 09 11月, 2013 4 次提交
  4. 06 11月, 2013 3 次提交
  5. 01 11月, 2013 1 次提交
  6. 31 10月, 2013 1 次提交
  7. 29 10月, 2013 3 次提交
  8. 28 10月, 2013 1 次提交
    • S
      xfrm: Increase the garbage collector threshold · eeb1b733
      Steffen Klassert 提交于
      With the removal of the routing cache, we lost the
      option to tweak the garbage collector threshold
      along with the maximum routing cache size. So git
      commit 703fb94e ("xfrm: Fix the gc threshold value
      for ipv4") moved back to a static threshold.
      
      It turned out that the current threshold before we
      start garbage collecting is much to small for some
      workloads, so increase it from 1024 to 32768. This
      means that we start the garbage collector if we have
      more than 32768 dst entries in the system and refuse
      new allocations if we are above 65536.
      Reported-by: NWolfgang Walter <linux@stwm.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      eeb1b733
  9. 26 10月, 2013 1 次提交
    • H
      ipv6: ip6_dst_check needs to check for expired dst_entries · e3bc10bd
      Hannes Frederic Sowa 提交于
      On receiving a packet too big icmp error we check if our current cached
      dst_entry in the socket is still valid. This validation check did not
      care about the expiration of the (cached) route.
      
      The error path I traced down:
      The socket receives a packet too big mtu notification. It still has a
      valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
      setting RTF_EXPIRE and updates the dst.expiration value (which could
      fail because of not up-to-date expiration values, see previous patch).
      
      In some seldom cases we race with a) the ip6_fib gc or b) another routing
      lookup which would result in a recreation of the cached rt6_info from its
      parent non-cached rt6_info. While copying the rt6_info we reinitialize the
      metrics store by copying it over from the parent thus invalidating the
      just installed pmtu update (both dsts use the same key to the inetpeer
      storage). The dst_entry with the just invalidated metrics data would
      just get its RTF_EXPIRES flag cleared and would continue to stay valid
      for the socket.
      
      We should have not issued the pmtu update on the already expired dst_entry
      in the first placed. By checking the expiration on the dst entry and
      doing a relookup in case it is out of date we close the race because
      we would install a new rt6_info into the fib before we issue the pmtu
      update, thus closing this race.
      
      Not reliably updating the dst.expire value was fixed by the patch "ipv6:
      reset dst.expires value when clearing expire flag".
      Reported-by: NSteinar H. Gunderson <sgunderson@bigfoot.com>
      Reported-by: NValentijn Sessink <valentyn@blub.net>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NValentijn Sessink <valentyn@blub.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3bc10bd
  10. 24 10月, 2013 1 次提交
  11. 23 10月, 2013 1 次提交
    • S
      netfilter: ip6t_REJECT: skip checksum verification for outgoing ipv6 packets · f2020b27
      Stanislav Fomichev 提交于
      Don't verify checksum for outgoing packets because checksum calculation
      may be done by the device.
      
      Without this patch:
      $ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
      $ time telnet ipv6.google.com 80
      Trying 2a00:1450:4010:c03::67...
      telnet: Unable to connect to remote host: Connection timed out
      
      real    0m7.201s
      user    0m0.000s
      sys     0m0.000s
      
      With the patch applied:
      $ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
      $ time telnet ipv6.google.com 80
      Trying 2a00:1450:4010:c03::67...
      telnet: Unable to connect to remote host: Connection refused
      
      real    0m0.085s
      user    0m0.000s
      sys     0m0.000s
      Signed-off-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f2020b27
  12. 22 10月, 2013 6 次提交
    • W
      netfilter: x_tables: fix ordering of jumpstack allocation and table update · b416c144
      Will Deacon 提交于
      During kernel stability testing on an SMP ARMv7 system, Yalin Wang
      reported the following panic from the netfilter code:
      
        1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
        [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
        [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
        [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
        [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
        [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
        [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
        [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
        [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
        [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
        [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
        [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
        [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
        [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
        Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
        ---[ end trace da227214a82491bd ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      This comes about because CPU1 is executing xt_replace_table in response
      to a setsockopt syscall, resulting in:
      
      	ret = xt_jumpstack_alloc(newinfo);
      		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);
      
      	[...]
      
      	table->private = newinfo;
      	newinfo->initial_entries = private->initial_entries;
      
      Meanwhile, CPU0 is handling the network receive path and ends up in
      ipt_do_table, resulting in:
      
      	private = table->private;
      
      	[...]
      
      	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
      
      On weakly ordered memory architectures, the writes to table->private
      and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
      Furthermore, on architectures which don't respect ordering of address
      dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.
      
      This patch adds an smp_wmb() before the assignment to table->private
      (which is essentially publishing newinfo) to ensure that all writes to
      newinfo will be observed before plugging it into the table structure.
      A dependent-read barrier is also added on the consumer sides, to ensure
      the same ordering requirements are also respected there.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: NWang, Yalin <Yalin.Wang@sonymobile.com>
      Tested-by: NWang, Yalin <Yalin.Wang@sonymobile.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b416c144
    • H
      ipv6: probe routes asynchronous in rt6_probe · c2f17e82
      Hannes Frederic Sowa 提交于
      Routes need to be probed asynchronous otherwise the call stack gets
      exhausted when the kernel attemps to deliver another skb inline, like
      e.g. xt_TEE does, and we probe at the same time.
      
      We update neigh->updated still at once, otherwise we would send to
      many probes.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2f17e82
    • E
      ipv6: sit: add GSO/TSO support · 61c1db7f
      Eric Dumazet 提交于
      Now ipv6_gso_segment() is stackable, its relatively easy to
      implement GSO/TSO support for SIT tunnels
      
      Performance results, when segmentation is done after tunnel
      device (as no NIC is yet enabled for TSO SIT support) :
      
      Before patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3168.31   4.81     4.64     2.988   2.877
      
      After patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      5525.00   7.76     5.17     2.763   1.840
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61c1db7f
    • E
      ipv6: gso: make ipv6_gso_segment() stackable · d3e5e006
      Eric Dumazet 提交于
      In order to support GSO on SIT tunnels, we need to make
      inet_gso_segment() stackable.
      
      It should not assume network header starts right after mac
      header.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3e5e006
    • E
      tcp_memcontrol: Remove the per netns control. · a4fe34bf
      Eric W. Biederman 提交于
      The code that is implemented is per memory cgroup not per netns, and
      having per netns bits is just confusing.  Remove the per netns bits to
      make it easier to see what is really going on.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4fe34bf
    • J
      ipv6: fill rt6i_gateway with nexthop address · 550bab42
      Julian Anastasov 提交于
      Make sure rt6i_gateway contains nexthop information in
      all routes returned from lookup or when routes are directly
      attached to skb for generated ICMP packets.
      
      The effect of this patch should be a faster version of
      rt6_nexthop() and the consideration of local addresses as
      nexthop.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      550bab42
  13. 20 10月, 2013 8 次提交
  14. 19 10月, 2013 1 次提交
  15. 15 10月, 2013 3 次提交
    • P
      netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso 提交于
      Register family per netnamespace to ensure that sets are
      only visible in its approapriate namespace.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      99633ab2
    • T
      netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Tomasz Bursztyka 提交于
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: NTomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eb31628e
    • P
      netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Pablo Neira Ayuso 提交于
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility later allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
      to query the x_tables match/target revision, so we don't need to
      use the native x_table getsockopt interface.
      
      * emulate xt structures: this required extending the struct nft_pktinfo
      to include the fragment offset, which is already obtained from
      ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted in the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0ca743a5
  16. 14 10月, 2013 2 次提交
    • P
      netfilter: nf_tables: convert built-in tables/chains to chain types · 9370761c
      Pablo Neira Ayuso 提交于
      This patch converts built-in tables/chains to chain types that
      allows you to deploy customized table and chain configurations from
      userspace.
      
      After this patch, you have to specify the chain type when
      creating a new chain:
      
       add chain ip filter output { type filter hook input priority 0; }
                                    ^^^^ ------
      
      The existing chain types after this patch are: filter, route and
      nat. Note that tables are just containers of chains with no specific
      semantics, which is a significant change with regards to iptables.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9370761c
    • P
      netfilter: add nftables · 96518518
      Patrick McHardy 提交于
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      
      In a nutshell, nftables provides a pseudo-state machine with 4 general
      purpose registers of 128 bits and 1 specific purpose register to store
      verdicts. This pseudo-machine comes with an extensible instruction set,
      a.k.a. "expressions" in the nftables jargon. The expressions included
      in this patch provide the basic functionality, they are:
      
      * bitwise: to perform bitwise operations.
      * byteorder: to change from host/network endianess.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into register.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match metainformation that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into
        registers.
      * reject (IPv4 only): to explicitly close connection, eg. TCP RST.
      
      Using this instruction-set, the userspace utility 'nft' can transform
      the rules expressed in human-readable text representation (using a
      new syntax, inspired by tcpdump) to nftables bytecode.
      
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow up patch (netfilter:
      nf_tables: add netlink set API).
      
      This patch includes the following components:
      
      * the netlink API: net/netfilter/nf_tables_api.c and
        include/uapi/netfilter/nf_tables.h
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
        net/ipv4/netfilter/nf_tables_ipv4.c
        net/ipv6/netfilter/nf_tables_ipv6.c
        net/ipv4/netfilter/nf_tables_arp.c
        net/bridge/netfilter/nf_tables_bridge.c
      * the NAT table (IPv4 only):
        net/ipv4/netfilter/nf_table_nat_ipv4.c
      * the route table (similar to mangle):
        net/ipv4/netfilter/nf_table_route_ipv4.c
        net/ipv6/netfilter/nf_table_route_ipv6.c
      * internal definitions under:
        include/net/netfilter/nf_tables.h
        include/net/netfilter/nf_tables_core.h
      * It also includes an skeleton expression:
        net/netfilter/nft_expr_template.c
        and the preliminary implementation of the meta target
        net/netfilter/nft_meta_target.c
      
      It also includes a change in struct nf_hook_ops to add a new
      pointer to store private data to the hook, that is used to store
      the rule list per chain.
      
      This patch is based on the patch from Patrick McHardy, plus merged
      accumulated cleanups, fixes and small enhancements to the nftables
      code that has been done since 2009, which are:
      
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistation
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: rise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      96518518
新手
引导
客服 返回
顶部