1. 29 Dec, 2013 1 commit
    • netfilter: select NFNETLINK when enabling NF_TABLES · 5f291c28
      Committed by Eric Leblond
      In Kconfig, nf_tables depends on NFNETLINK, so whether nf_tables can
      be built as a module or into the kernel depends on the state of
      NFNETLINK in the kernel config. If someone wants to build nf_tables
      into the kernel, NFNETLINK must also be built into the kernel.
      But NFNETLINK cannot be set in the menu, so it is necessary to
      toggle other nfnetlink subsystems such as logging and nfacct to see
      the nf_tables switch.

      This patch changes the dependency from 'depends on' to 'select' in
      Kconfig so that nf_tables can be built as a module or into the
      kernel independently.
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 28 Dec, 2013 2 commits
  3. 17 Dec, 2013 1 commit
  4. 08 Dec, 2013 3 commits
  5. 05 Nov, 2013 1 commit
  6. 04 Nov, 2013 2 commits
  7. 30 Oct, 2013 1 commit
    • net: ipvs: sctp: do not recalc sctp csum when ports didn't change · 97203abe
      Committed by Daniel Borkmann
      Unlike UDP or TCP, we do not take the pseudo-header into
      account in SCTP checksums. So in case port mapping is the
      very same, we do not need to recalculate the whole SCTP
      checksum in software, which is very expensive.
      
      Also, as in TCP, take into account when a private
      helper mangled the packet. In that case, we also need to
      recalculate the checksum even if the ports are the same.
      
      Thanks for feedback regarding skb->ip_summed checks from
      Julian Anastasov; here's a discussion on these checks for
      snat and dnat:
      
      * For snat_handler(), we can see CHECKSUM_PARTIAL from
        virtual devices, and from LOCAL_OUT, otherwise it
        should be CHECKSUM_UNNECESSARY. In general, in snat it
        is more complex. skb contains the original route and
        ip_vs_route_me_harder() can change the route after
        snat_handler. So, for locally generated replies from a
        local server we cannot preserve the CHECKSUM_PARTIAL
        mode. It is a chicken-and-egg dilemma: snat_handler
        needs the device after rerouting (to check for
        NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
        the snat_handler() to put the new saddr for proper
        rerouting.
      
      * For dnat_handler(), we should not see CHECKSUM_COMPLETE
        for SCTP, in fact the small set of drivers that support
        SCTP offloading return CHECKSUM_UNNECESSARY on correctly
        received SCTP csum. We can see CHECKSUM_PARTIAL from
        local stack or received from virtual drivers. The idea is
        that SCTP decides to avoid csum calculation if hardware
        supports offloading. IPVS can change the device after
        rerouting to real server but we can preserve the
        CHECKSUM_PARTIAL mode if the new device supports
        offloading too. This works because skb dst is changed
        before dnat_handler and we see the new device. So, checks
        in the 'if' part will decide whether it is ok to keep
        CHECKSUM_PARTIAL for the output. If the packet arrived with
        CHECKSUM_NONE, we are dealing with an unknown checksum. As we
        recalculate the sum for the IP header in all cases, it should
        be safe to use CHECKSUM_UNNECESSARY. We may forward a wrong
        checksum in this case (without cp->app). In the case of
        CHECKSUM_UNNECESSARY, the csum was valid on receive.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
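The decision described in commit 97203abe can be sketched in a few lines of user-space C. This is only an illustration of the rule stated in the message (no pseudo-header, so only a port change or a helper-mangled payload invalidates the SCTP checksum); the function name and parameters are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the kernel's: decide whether an SCTP packet
 * needs its checksum recomputed in software after IPVS NAT. */
static bool sctp_csum_needs_update(uint16_t old_port, uint16_t new_port,
                                   bool mangled_by_app)
{
    /* The SCTP checksum does not cover a pseudo-header, so rewriting only
     * the IP address leaves it valid. A port rewrite, or a payload rewrite
     * by an application helper, changes the covered bytes. */
    return old_port != new_port || mangled_by_app;
}
```

A caller would skip the expensive recalculation whenever this returns false, which is the common case when the virtual and real service use the same port.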
  8. 29 Oct, 2013 2 commits
    • netfilter: xt_NFQUEUE: fix --queue-bypass regression · d9547773
      Committed by Holger Eitzenberger
      V3 of the NFQUEUE target ignores the --queue-bypass flag,
      causing packets to be dropped when the userspace listener
      isn't running.
      
      The regression was introduced by 8746ddcf ("netfilter: xt_NFQUEUE:
      introduce CPU fanout").
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
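As a rough illustration of what honouring --queue-bypass means for commit d9547773, the sketch below builds a queue verdict and sets the bypass bit only when the flag was requested. The constant values are written from memory of the uapi netfilter headers and should be treated as assumptions; `nfqueue_verdict()` is a hypothetical helper, not the target's actual code.

```c
#include <stdint.h>

/* Assumed values, as recalled from the uapi netfilter headers. */
#define NF_QUEUE                      3u
#define NF_VERDICT_QMASK              0xffff0000u
#define NF_VERDICT_FLAG_QUEUE_BYPASS  0x00008000u
#define NFQ_FLAG_BYPASS               0x01u

/* Hypothetical helper: verdict that sends a packet to queue `queuenum`. */
static uint32_t nfqueue_verdict(uint16_t queuenum, uint16_t flags)
{
    uint32_t verdict = (((uint32_t)queuenum << 16) & NF_VERDICT_QMASK) | NF_QUEUE;

    /* Without this bit, packets are dropped when nobody listens on the queue. */
    if (flags & NFQ_FLAG_BYPASS)
        verdict |= NF_VERDICT_FLAG_QUEUE_BYPASS;
    return verdict;
}
```

The regression described above corresponds to a code path that returns the verdict without ever testing the bypass flag.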
    • netfilter: nft_nat: Fix endianness issue reported by sparse · 98c37b6b
      Committed by Tomasz Bursztyka
      This patch fixes the following sparse warnings:
      
      CHECK   net/netfilter/nft_nat.c
      net/netfilter/nft_nat.c:50:43: warning: incorrect type in assignment (different base types)
      net/netfilter/nft_nat.c:50:43:    expected restricted __be32 [addressable] [usertype] ip
      net/netfilter/nft_nat.c:50:43:    got unsigned int [unsigned] [usertype] <noident>
      net/netfilter/nft_nat.c:51:43: warning: incorrect type in assignment (different base types)
      net/netfilter/nft_nat.c:51:43:    expected restricted __be32 [addressable] [usertype] ip
      net/netfilter/nft_nat.c:51:43:    got unsigned int [unsigned] [usertype] <noident>
      net/netfilter/nft_nat.c:65:37: warning: incorrect type in assignment (different base types)
      net/netfilter/nft_nat.c:65:37:    expected restricted __be16 [addressable] [assigned] [usertype] all
      net/netfilter/nft_nat.c:65:37:    got unsigned int [unsigned] <noident>
      net/netfilter/nft_nat.c:66:37: warning: incorrect type in assignment (different base types)
      net/netfilter/nft_nat.c:66:37:    expected restricted __be16 [addressable] [assigned] [usertype] all
      net/netfilter/nft_nat.c:66:37:    got unsigned int [unsigned] <noident>
      Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  9. 28 Oct, 2013 3 commits
  10. 22 Oct, 2013 4 commits
    • netfilter: ipset: The unnamed union initialization may lead to compilation error · 1a869205
      Committed by Jozsef Kadlecsik
      It should be possible to initialize the unnamed union directly, but
      unfortunately it is not:

      /usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c: In
      function 'hash_netnet4_kadt':
      /usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c:141:
      error: unknown field 'cidr' specified in initializer
      Reported-by: Husnu Demir <hdemir@metu.edu.tr>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
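The compile error quoted in commit 1a869205 comes from naming a member of an unnamed union in a designated initializer, which some older compilers reject. One portable workaround, shown here as a sketch, is to initialize the rest of the object and then assign the union members. The struct below is a hypothetical stand-in loosely modelled on the error message; it is not the ipset element type.

```c
#include <stdint.h>

/* Hypothetical element type with an unnamed union. */
struct netnet_elem {
    uint32_t ip[2];
    union {
        uint8_t  cidr[2];
        uint16_t ccmp;
    };
};

static struct netnet_elem make_elem(uint8_t cidr0, uint8_t cidr1)
{
    /* Initialize a named member only (the rest is zeroed), then assign.
     * Writing `{ .cidr[0] = cidr0 }` here is the form that can fail. */
    struct netnet_elem e = { .ip = { 0, 0 } };

    e.cidr[0] = cidr0;
    e.cidr[1] = cidr1;
    return e;
}
```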
    • netfilter: ipset: Use netlink callback dump args only · 93302880
      Committed by Jozsef Kadlecsik
      Instead of cb->data, use callback dump args only and introduce symbolic
      names instead of plain numbers when accessing the argument members.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: x_tables: fix ordering of jumpstack allocation and table update · b416c144
      Committed by Will Deacon
      During kernel stability testing on an SMP ARMv7 system, Yalin Wang
      reported the following panic from the netfilter code:
      
        1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
        [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
        [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
        [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
        [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
        [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
        [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
        [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
        [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
        [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
        [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
        [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
        [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
        [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
        Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
        ---[ end trace da227214a82491bd ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      This comes about because CPU1 is executing xt_replace_table in response
      to a setsockopt syscall, resulting in:
      
      	ret = xt_jumpstack_alloc(newinfo);
      		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);
      
      	[...]
      
      	table->private = newinfo;
      	newinfo->initial_entries = private->initial_entries;
      
      Meanwhile, CPU0 is handling the network receive path and ends up in
      ipt_do_table, resulting in:
      
      	private = table->private;
      
      	[...]
      
      	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
      
      On weakly ordered memory architectures, the writes to table->private
      and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
      Furthermore, on architectures which don't respect ordering of address
      dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.
      
      This patch adds an smp_wmb() before the assignment to table->private
      (which is essentially publishing newinfo) to ensure that all writes to
      newinfo will be observed before plugging it into the table structure.
      A dependent-read barrier is also added on the consumer side, to ensure
      the same ordering requirements are also respected there.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
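The publish/consume ordering that commit b416c144 enforces can be illustrated with a user-space analogue in C11 atomics. This is a sketch, not the kernel code: a release store plays the role of smp_wmb() followed by the assignment to table->private, and an acquire load stands in for the dependent-read barrier on the consumer side (acquire is stronger than strictly needed). All type and function names are hypothetical, and reclaiming the previous table is left out.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct table_info {              /* stand-in for struct xt_table_info */
    void **jumpstack;
    unsigned int size;
};

/* Stand-in for table->private; starts out NULL. */
static _Atomic(struct table_info *) table_private;

/* Writer: fully initialize newinfo, then publish it with release ordering
 * so that no reader can see the pointer before the fields it points to. */
static int replace_table(unsigned int size)
{
    struct table_info *newinfo = malloc(sizeof(*newinfo));

    if (!newinfo)
        return -1;
    newinfo->jumpstack = calloc(size, sizeof(void *));
    if (!newinfo->jumpstack) {
        free(newinfo);
        return -1;
    }
    newinfo->size = size;
    atomic_store_explicit(&table_private, newinfo, memory_order_release);
    return 0;
}

/* Reader: the acquire load orders the pointer read before any access
 * to the fields reached through it. */
static struct table_info *read_table(void)
{
    return atomic_load_explicit(&table_private, memory_order_acquire);
}
```

With a plain store and load instead, a weakly ordered CPU could observe the new pointer while jumpstack still appears unset, which matches the panic described in the commit message.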
    • netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper · 56e42441
      Committed by Julian Anastasov
      Now that rt6_nexthop() can return the nexthop address, we can use it
      for proper nexthop comparison of directly connected destinations.
      For more information refer to commit bbb5823c
      ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 20 Oct, 2013 1 commit
  12. 17 Oct, 2013 1 commit
  13. 15 Oct, 2013 10 commits
    • ipvs: improved SH fallback strategy · 1255ce5f
      Committed by Alexander Frolkin
      Improve the SH fallback realserver selection strategy.
      
      With sh and sh-fallback, if a realserver is down, this attempts to
      distribute the traffic that would have gone to that server evenly
      among the remaining servers.
      Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
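One way to get the even redistribution that commit 1255ce5f describes is to rehash the source with an increasing offset until an available server is found, rather than always stepping to the neighbouring bucket. The sketch below shows that idea only. The hash function, table size, and all names are assumptions made for illustration and are not the IPVS implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAB_SIZE 8u

struct server { int id; bool up; };

/* Hypothetical multiplicative hash; the top 3 bits select one of 8 buckets. */
static unsigned int sh_hash(uint32_t addr, unsigned int offset)
{
    return ((uint32_t)(addr + offset) * 2654435761u) >> 29;
}

/* Pick the server for `addr`; if its bucket is down, rehash with
 * successive offsets so different clients land on different fallbacks. */
static const struct server *sh_pick(const struct server **tab, uint32_t addr)
{
    for (unsigned int off = 0; off < TAB_SIZE; off++) {
        const struct server *s = tab[sh_hash(addr, off)];

        if (s && s->up)
            return s;
    }
    return NULL;   /* no available server found */
}
```

Because the fallback bucket depends on the client address, the clients of a failed server are scattered over the remaining servers instead of all moving to one neighbour.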
    • ipvs: avoid rcu_barrier during netns cleanup · 9e4e948a
      Committed by Julian Anastasov
      Commit 578bc3ef ("ipvs: reorganize dest trash") added
      rcu_barrier() on cleanup to wait for dest users and schedulers
      like LBLC and LBLCR to put their last dest reference.
      Using rcu_barrier() with many namespaces is problematic.

      Trying to fix it by freeing dest with kfree_rcu() is not
      a solution: RCU callbacks can run in parallel and their
      execution order is random.
      
      Fix it by creating new function ip_vs_dest_put_and_free()
      which is heavier than ip_vs_dest_put(). We will use it just
      for schedulers like LBLC, LBLCR that can delay their dest
      release.
      
      By default, a dest's reference count is above 0 while it is present
      in a service and it is 0 when deleted but still in the trash list.
      Change the dest trash code to use ip_vs_dest_put_and_free(),
      so that refcnt -1 can be used for freeing. As a result,
      such checks remain in the slow path and the rcu_barrier() from
      netns cleanup can be removed.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
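The refcount convention from commit 9e4e948a (0 while parked in the trash, -1 meaning the object can be freed) can be modelled in user space as follows. This is a sketch using C11 atomics with hypothetical names; a boolean flag stands in for the actual kfree().

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dest {
    atomic_int refcnt;
    bool freed;          /* stand-in for the object having been kfree()d */
};

/* Heavier variant of "put": whoever drops the count below zero frees. */
static void dest_put_and_free(struct dest *d)
{
    /* atomic_fetch_sub() returns the old value, so old - 1 is the new one. */
    if (atomic_fetch_sub(&d->refcnt, 1) - 1 < 0)
        d->freed = true;
}
```

Whichever of the trash code or a late scheduler release runs last sees the count go to -1 and frees the dest, so no barrier is needed to order them.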
    • netfilter: nf_tables: add trace support · b5bc89bf
      Committed by Pablo Neira Ayuso
      This patch adds support for tracing the packet travel through
      the ruleset, in a similar fashion to x_tables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Committed by Pablo Neira Ayuso
      This patch adds batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
        the nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, that results in the invocation of the
        ss->commit callback function. If not specified or an error
        occurred in the batch, the ss->abort function is invoked
        instead.

      The end message represents the commit operation in nftables; the
      lack of an end message results in an abort. This patch also adds the
      .call_batch function that is only called from the batch receive
      path.
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows a set of rule-set updates to be
      committed atomically and incrementally without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry. In order to invalidate dumps, a global
      genctr counter is increased every time nf_tables enters a new
      generation.

      This new operation can be used from the user-space utility
      that controls the firewall, e.g.
      
      nft -f restore
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that rule 1 will be inactive until the transition to the
      next generation; rule 2 will be evicted in the next generation.

      There is a penalty during the rule update due to branch
      misprediction in the packet matching framework. But that should be
      quickly resolved once the iteration over the commit list that
      contains the rules that require updates is finished.

      Event notification happens once the rule-set update has been
      committed. So we skip notifications in case the rule-set update
      is aborted, which can happen when the rule-set is only tested
      to check that it applies correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
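The 1-bit generation cursor and 2-bit per-rule mask described in commit 0628b123 can be modelled with a few helpers. This is a user-space sketch with hypothetical names: a set bit means "inactive in that generation", and a commit simply flips the cursor.

```c
#include <stdbool.h>

struct rule { unsigned int genmask; };   /* only the low 2 bits are used */

static unsigned int gencursor;           /* 1 bit: 0 or 1 */

static unsigned int gencursor_next(void) { return gencursor ^ 1u; }

/* A rule is active in a generation when that generation's bit is clear. */
static bool rule_is_active(const struct rule *r)
{
    return !(r->genmask & (1u << gencursor));
}

static bool rule_is_active_next(const struct rule *r)
{
    return !(r->genmask & (1u << gencursor_next()));
}

/* New rule: inactive now, active in the next generation. */
static void rule_activate_next(struct rule *r) { r->genmask = 1u << gencursor; }

/* Pending delete: active now, inactive in the next generation. */
static void rule_deactivate_next(struct rule *r) { r->genmask = 1u << gencursor_next(); }

/* After a commit, surviving rules zero their mask ("it becomes 00"). */
static void rule_clear(struct rule *r) { r->genmask = 0; }

/* Commit: enter the next generation for all rules at once. */
static void commit(void) { gencursor = gencursor_next(); }
```

Because packet processing only tests one bit against the global cursor, flipping the cursor makes every added rule visible and every deleted rule invisible in a single step.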
    • netfilter: nf_tables: add insert operation · 5e948466
      Committed by Eric Leblond
      This patch adds a new rule attribute NFTA_RULE_POSITION which is
      used to store the position of a rule relative to the others.
      By providing the create command and specifying the position, the
      rule is inserted after the rule with the handle equal to the
      provided position.
      
      Regarding notification, the position attribute specifies the
      handle of the previous rule to make sure we don't point to any
      stale rule in notifications coming from the commit path.
      
      This patch includes the following fix from Pablo:
      
      * nf_tables: fix rule deletion event reporting
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
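The "insert after the rule whose handle equals the position" behaviour from commit 5e948466 amounts to a list insertion keyed by handle. The sketch below uses a plain singly linked list with hypothetical names; the kernel's own list handling and locking are not shown.

```c
#include <stddef.h>
#include <stdint.h>

struct rule {
    uint64_t handle;
    struct rule *next;
};

/* Insert `new_rule` right after the rule whose handle equals `position`.
 * Returns 0 on success, -1 if no rule has that handle. */
static int rule_insert_after(struct rule *head, uint64_t position,
                             struct rule *new_rule)
{
    for (struct rule *r = head; r != NULL; r = r->next) {
        if (r->handle == position) {
            new_rule->next = r->next;
            r->next = new_rule;
            return 0;
        }
    }
    return -1;
}
```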
    • netfilter: nf_tables: complete net namespace support · 99633ab2
      Committed by Pablo Neira Ayuso
      Register family per net namespace to ensure that sets are
      only visible in their appropriate namespace.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Committed by Tomasz Bursztyka
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add support for dormant tables · 9ddf6323
      Committed by Pablo Neira Ayuso
      This patch allows you to temporarily disable an entire table.
      You can change the state of a dormant table via NFT_MSG_NEWTABLE
      messages. Using this operation you can wake up a table, so that its
      chains are registered.

      This provides atomicity at chain level. Thus, the rule-set of one
      chain is applied at once, avoiding any possible intermediate state
      in every chain. Still, the chains that belong to a table are
      registered consecutively. This also allows you to have inactive
      tables in the kernel.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: nft_payload: fix transport header base · c54032e0
      Committed by Pablo Neira Ayuso
      We cannot use skb->transport_header since it's unset; use
      pkt->xt.thoff instead.

      This is now possible using the information made available through
      the x_tables compatibility layer.
      Reported-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Committed by Pablo Neira Ayuso
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility layer allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
        to query the x_tables match/target revision, so we don't need to
        use the native x_tables getsockopt interface.

      * emulate xt structures: this required extending the struct nft_pktinfo
        to include the fragment offset, which is already obtained from
        ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted to the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  14. 14 Oct, 2013 8 commits
    • netfilter: nf_tables: convert built-in tables/chains to chain types · 9370761c
      Committed by Pablo Neira Ayuso
      This patch converts built-in tables/chains to chain types, which
      allows you to deploy customized table and chain configurations from
      userspace.
      
      After this patch, you have to specify the chain type when
      creating a new chain:
      
       add chain ip filter output { type filter hook input priority 0; }
                                    ^^^^ ------
      
      The existing chain types after this patch are: filter, route and
      nat. Note that tables are just containers of chains with no specific
      semantics, which is a significant change with regard to iptables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_payload: add optimized payload implementation for small loads · c29b72e0
      Committed by Patrick McHardy
      Add an optimized payload expression implementation for small (up to 4 bytes)
      aligned data loads from the linear packet area.

      This patch also includes Patrick McHardy's original patch entitled (nf_tables:
      inline nft_payload_fast_eval() into main evaluation loop).
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add optimized data comparison for small values · cb7dbfd0
      Committed by Patrick McHardy
      Add an optimized version of nft_data_cmp() that only handles values of up
      to 4 bytes in length.

      This patch includes Patrick McHardy's original patch entitled (nf_tables:
      inline nft_cmp_fast_eval() into main evaluation loop).
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
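The idea behind the small-value comparison in commit cb7dbfd0 is that a value of at most 4 bytes fits into one machine word, so a single integer compare can replace a general byte-wise compare. The sketch below contrasts the two in user space; both function names are hypothetical and neither reproduces nft_data_cmp().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Generic path: byte-wise comparison of any length. */
static bool data_cmp(const void *reg, const void *val, size_t len)
{
    return memcmp(reg, val, len) == 0;
}

/* Fast path, valid only for len <= 4: copy both operands into zeroed
 * 32-bit words and compare them with one integer comparison. */
static bool data_cmp_fast(const void *reg, const void *val, size_t len)
{
    uint32_t a = 0, b = 0;

    memcpy(&a, reg, len);
    memcpy(&b, val, len);
    return a == b;
}
```

Both operands are packed the same way, so the result does not depend on host byte order; the caller is responsible for only using the fast path when `len` is at most 4.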
    • netfilter: nf_tables: expression ops overloading · ef1f7df9
      Committed by Patrick McHardy
      Split the expression ops into two parts and support overloading of
      the runtime expression ops based on the requested function through
      a ->select_ops() callback.
      
      This can be used to provide optimized implementations, for instance
      for loading small aligned amounts of data from the packet or inlining
      frequently used operations into the main evaluation loop.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
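A ->select_ops() callback as described in commit ef1f7df9 picks the runtime ops from the parameters of the request. The sketch below shows the shape of such a selector for a payload load, choosing a fast variant for small, naturally aligned loads. The struct, the ops objects, and the exact selection condition are assumptions for illustration, not the kernel's definitions.

```c
struct expr_ops {
    const char *name;    /* real ops would also carry eval/init callbacks */
};

static const struct expr_ops payload_ops      = { "payload" };
static const struct expr_ops payload_fast_ops = { "payload_fast" };

/* Hypothetical selector: use the fast ops for loads of 1, 2 or 4 bytes
 * whose offset is a multiple of the load length. */
static const struct expr_ops *payload_select_ops(unsigned int offset,
                                                 unsigned int len)
{
    if (len != 0 && len <= 4 && (len & (len - 1)) == 0 && offset % len == 0)
        return &payload_fast_ops;
    return &payload_ops;
}
```

The selection happens once, when the expression is set up, so the per-packet path pays nothing for having two implementations.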
    • netfilter: nf_tables: add netlink set API · 20a69341
      Committed by Patrick McHardy
      This patch adds the new netlink API for maintaining nf_tables sets
      independently of the ruleset. The API supports the following operations:
      
      - creation of sets
      - deletion of sets
      - querying of specific sets
      - dumping of all sets
      
      - addition of set elements
      - removal of set elements
      - dumping of all set elements
      
      Sets are identified by name, each table defines an individual namespace.
      The name of a set may be allocated automatically, this is mostly useful
      in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
      automatically once the last reference has been released.
      
      Sets can be marked constant, meaning they're not allowed to change while
      linked to a rule. This allows lockless operation for set types that
      would otherwise require locking.
      
      Additionally, if the implementation supports it, sets can (as before) be
      used as maps, associating a data value with each key (or range), by
      specifying the NFT_SET_MAP flag and can be used for interval queries by
      specifying the NFT_SET_INTERVAL flag.
      
      Set elements are added and removed incrementally. All element operations
      support batching, reducing netlink message and set lookup overhead.
      
      The old "set" and "hash" expressions are replaced by a generic "lookup"
      expression, which binds to the specified set. Userspace is not aware
      of the actual set implementation used by the kernel anymore, all
      configuration options are generic.
      
      Currently the implementation selection logic is largely missing and the
      kernel will simply use the first registered implementation supporting the
      requested operation. Eventually, the plan is to have userspace supply a
      description of the data characteristics and select the implementation
      based on expected performance and memory use.
      
      This patch includes the new 'lookup' expression to look up matching
      elements in the set.
      
      This patch includes kernel-doc descriptions for this set API and it
      also includes the following fixes.
      
      From Patrick McHardy:
      * netfilter: nf_tables: fix set element data type in dumps
      * netfilter: nf_tables: fix indentation of struct nft_set_elem comments
      * netfilter: nf_tables: fix oops in nft_validate_data_load()
      * netfilter: nf_tables: fix oops while listing sets of built-in tables
      * netfilter: nf_tables: destroy anonymous sets immediately if binding fails
      * netfilter: nf_tables: propagate context to set iter callback
      * netfilter: nf_tables: add loop detection
      
      From Pablo Neira Ayuso:
      * netfilter: nf_tables: allow to dump all existing sets
      * netfilter: nf_tables: fix wrong type for flags variable in newelem
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: add nftables · 96518518
      Committed by Patrick McHardy
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      
      In a nutshell, nftables provides a pseudo-state machine with 4 general
      purpose registers of 128 bits and 1 specific purpose register to store
      verdicts. This pseudo-machine comes with an extensible instruction set,
      a.k.a. "expressions" in the nftables jargon. The expressions included
      in this patch provide the basic functionality, they are:
      
      * bitwise: to perform bitwise operations.
      * byteorder: to convert between host and network endianness.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into register.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match metainformation that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into
        registers.
      * reject (IPv4 only): to explicitly close a connection, e.g. TCP RST.
      
      Using this instruction-set, the userspace utility 'nft' can transform
      the rules expressed in human-readable text representation (using a
      new syntax, inspired by tcpdump) to nftables bytecode.
      
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow up patch (netfilter:
      nf_tables: add netlink set API).
      
      This patch includes the following components:
      
      * the netlink API: net/netfilter/nf_tables_api.c and
        include/uapi/netfilter/nf_tables.h
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
        net/ipv4/netfilter/nf_tables_ipv4.c
        net/ipv6/netfilter/nf_tables_ipv6.c
        net/ipv4/netfilter/nf_tables_arp.c
        net/bridge/netfilter/nf_tables_bridge.c
      * the NAT table (IPv4 only):
        net/ipv4/netfilter/nf_table_nat_ipv4.c
      * the route table (similar to mangle):
        net/ipv4/netfilter/nf_table_route_ipv4.c
        net/ipv6/netfilter/nf_table_route_ipv6.c
      * internal definitions under:
        include/net/netfilter/nf_tables.h
        include/net/netfilter/nf_tables_core.h
      * It also includes a skeleton expression:
        net/netfilter/nft_expr_template.c
        and the preliminary implementation of the meta target
        net/netfilter/nft_meta_target.c
      
      It also includes a change in struct nf_hook_ops to add a new
      pointer to store private data to the hook, that is used to store
      the rule list per chain.
      
      This patch is based on the patch from Patrick McHardy, plus merged
      accumulated cleanups, fixes and small enhancements to the nftables
      code that have been made since 2009, which are:
      
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistation
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: rise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
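The "pseudo-state machine" from commit 96518518 (four 128-bit general purpose registers plus a verdict register, driven by a list of expressions) can be sketched as a tiny interpreter. Everything below is a toy model with hypothetical types and only three expression kinds, and it does not validate register indices or operand lengths; it is not the nf_tables core.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum verdict { VERDICT_CONTINUE, VERDICT_ACCEPT, VERDICT_DROP };
enum opcode  { OP_PAYLOAD, OP_CMP, OP_IMMEDIATE };

struct reg { uint8_t data[16]; };          /* one 128-bit register */

struct expr {
    enum opcode op;
    unsigned int reg;      /* register index, 0..3 */
    unsigned int offset;   /* OP_PAYLOAD: byte offset into the packet */
    unsigned int len;      /* OP_PAYLOAD and OP_CMP: length, at most 16 */
    uint8_t value[16];     /* OP_CMP: value to compare against */
    enum verdict verdict;  /* OP_IMMEDIATE: verdict to load */
};

/* Evaluate one rule: run its expressions in order; a failed load or
 * comparison ends the rule without setting a verdict. */
static enum verdict run_rule(const struct expr *e, size_t n,
                             const uint8_t *pkt, size_t pkt_len)
{
    struct reg regs[4];
    enum verdict verdict = VERDICT_CONTINUE;   /* the verdict register */

    memset(regs, 0, sizeof(regs));
    for (size_t i = 0; i < n; i++) {
        switch (e[i].op) {
        case OP_PAYLOAD:   /* fetch packet bytes into a register */
            if ((size_t)e[i].offset + e[i].len > pkt_len)
                return VERDICT_CONTINUE;
            memcpy(regs[e[i].reg].data, pkt + e[i].offset, e[i].len);
            break;
        case OP_CMP:       /* compare register contents with a constant */
            if (memcmp(regs[e[i].reg].data, e[i].value, e[i].len) != 0)
                return VERDICT_CONTINUE;
            break;
        case OP_IMMEDIATE: /* load a verdict */
            verdict = e[i].verdict;
            break;
        }
    }
    return verdict;
}
```

A rule such as "drop if the first two bytes equal 0x0050" becomes the three-expression sequence payload, cmp, immediate, which is the kind of bytecode the userspace utility is described as generating.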
    • netfilter: nf_nat: move alloc_null_binding to nf_nat_core.c · f59cb045
      Committed by Pablo Neira Ayuso
      Similar to nat_decode_session, alloc_null_binding is needed for both
      ip_tables and nf_tables, so move it to nf_nat_core.c. This change
      is required by nf_tables.
      
      This is an adapted version of the original patch from Patrick McHardy.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: pass hook ops to hookfn · 795aa6ef
      Committed by Patrick McHardy
      Pass the hook ops to the hookfn to allow for generic hook
      functions. This change is required by nf_tables.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
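Passing the ops structure to the hook function, as commit 795aa6ef does, lets one generic function serve many registrations by reading per-registration data from the ops. The sketch below models that with hypothetical user-space types; the `priv` pointer plays the role of the private data pointer mentioned in the "add nftables" commit, and the signatures are not the kernel's.

```c
#include <stddef.h>

struct hook_ops;

typedef unsigned int (*hookfn_t)(const struct hook_ops *ops, const void *pkt);

struct hook_ops {
    hookfn_t hook;
    void *priv;            /* per-registration data, e.g. a chain */
};

struct chain {
    unsigned int policy;   /* verdict returned when no rule decides */
};

/* One generic hook function for every chain: it finds its chain via ops. */
static unsigned int generic_hook(const struct hook_ops *ops, const void *pkt)
{
    const struct chain *chain = ops->priv;

    (void)pkt;             /* rule evaluation is omitted in this sketch */
    return chain->policy;
}
```

Without the ops argument, each chain would need its own hook function (or a global lookup) to find its rule list.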