1. 22 Oct, 2013 5 commits
  2. 20 Oct, 2013 9 commits
  3. 19 Oct, 2013 4 commits
  4. 18 Oct, 2013 2 commits
  5. 15 Oct, 2013 4 commits
    • P
      netfilter: nf_tables: add ARP filtering support · ed683f13
      Pablo Neira Ayuso committed
      This patch registers the ARP family and the filter chain type
      for this family.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ed683f13
    • P
      netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso committed
      Register the family per net namespace to ensure that sets are
      only visible in their appropriate namespace.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      99633ab2
    • T
      netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Tomasz Bursztyka committed
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      eb31628e
    • P
      netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Pablo Neira Ayuso committed
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility layer allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
      to query the x_tables match/target revision, so we don't need to
      use the native x_tables getsockopt interface.
      
      * emulate xt structures: this required extending the struct nft_pktinfo
      to include the fragment offset, which is already obtained from
      ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted in the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0ca743a5
  6. 14 Oct, 2013 4 commits
    • P
      netfilter: nf_tables: convert built-in tables/chains to chain types · 9370761c
      Pablo Neira Ayuso committed
      This patch converts built-in tables/chains to chain types, which
      allows you to deploy customized table and chain configurations from
      userspace.
      
      After this patch, you have to specify the chain type when
      creating a new chain:
      
       add chain ip filter output { type filter hook input priority 0; }
                                    ^^^^ ------
      
      The existing chain types after this patch are: filter, route and
      nat. Note that tables are just containers of chains with no specific
      semantics, which is a significant change with regard to iptables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      9370761c
    • P
      netfilter: nf_tables: expression ops overloading · ef1f7df9
      Patrick McHardy committed
      Split the expression ops into two parts and support overloading of
      the runtime expression ops based on the requested function through
      a ->select_ops() callback.
      
      This can be used to provide optimized implementations, for instance
      for loading small aligned amounts of data from the packet or inlining
      frequently used operations into the main evaluation loop.
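
The overloading idea can be sketched outside the kernel. The following Python sketch is illustrative only: `PayloadType`, `GenericPayloadOps`, `FastPayloadOps` and `new_expr` are hypothetical names, and the selection rule (small, naturally aligned loads get the specialised ops) is an assumption based on the example given above, not the kernel's actual criteria.

```python
class GenericPayloadOps:
    """General path: copies any length from any offset."""

    def __init__(self, offset, length):
        self.offset, self.length = offset, length

    def eval(self, pkt):
        return pkt[self.offset:self.offset + self.length]


class FastPayloadOps:
    """Specialised path chosen for small, naturally aligned loads.

    It returns the same bytes as the generic path; in this sketch only
    the selection differs, standing in for an optimized implementation.
    """

    def __init__(self, offset, length):
        self.offset, self.length = offset, length

    def eval(self, pkt):
        return pkt[self.offset:self.offset + self.length]


class PayloadType:
    """Expression type: owns the callback that picks the runtime ops."""

    @staticmethod
    def select_ops(offset, length):
        # Hypothetical rule: 1, 2 or 4 byte loads at an aligned offset.
        if length in (1, 2, 4) and offset % length == 0:
            return FastPayloadOps
        return GenericPayloadOps


def new_expr(expr_type, offset, length):
    """Create an expression, letting the type choose its ops up front."""
    ops = expr_type.select_ops(offset, length)
    return ops(offset, length)
```

The point of the split is that the choice is made once, when the expression is created, so the per-packet evaluation path does not have to branch on the parameters again.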
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ef1f7df9
    • P
      netfilter: add nftables · 96518518
      Patrick McHardy committed
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      
      In a nutshell, nftables provides a pseudo-state machine with 4 general
      purpose registers of 128 bits and 1 specific purpose register to store
      verdicts. This pseudo-machine comes with an extensible instruction set,
      a.k.a. "expressions" in the nftables jargon. The expressions included
      in this patch provide the basic functionality, they are:
      
      * bitwise: to perform bitwise operations.
      * byteorder: to change from host/network endianness.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into register.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match metainformation that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into
        registers.
      * reject (IPv4 only): to explicitly close connection, e.g. TCP RST.
      
      Using this instruction-set, the userspace utility 'nft' can transform
      the rules expressed in human-readable text representation (using a
      new syntax, inspired by tcpdump) to nftables bytecode.
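
As a rough illustration of the register machine described above, here is a minimal Python sketch. It is not the kernel code: the names (`Regs`, `payload`, `cmp_eq`, `immediate_verdict`, `run_rule`) and the verdict values are hypothetical. Only the overall shape follows the description: four 128-bit data registers, one verdict register, and a rule evaluated as a sequence of expressions that stops as soon as the verdict changes.

```python
from dataclasses import dataclass, field

# Illustrative verdict codes, not the kernel's values.
CONTINUE, BREAK, ACCEPT, DROP = "continue", "break", "accept", "drop"
REG_BYTES = 16  # one 128-bit register


@dataclass
class Regs:
    verdict: str = CONTINUE
    data: list = field(default_factory=lambda: [bytes(REG_BYTES)] * 4)


def payload(offset, length, dreg):
    """Expression: copy `length` packet bytes at `offset` into register `dreg`."""
    def ev(regs, pkt):
        chunk = pkt[offset:offset + length]
        if len(chunk) < length:
            regs.verdict = BREAK  # packet too short, rule does not match
            return
        regs.data[dreg] = chunk.ljust(REG_BYTES, b"\0")
    return ev


def cmp_eq(sreg, value):
    """Expression: stop evaluating the rule unless register `sreg` holds `value`."""
    def ev(regs, pkt):
        if regs.data[sreg][:len(value)] != value:
            regs.verdict = BREAK
    return ev


def immediate_verdict(verdict):
    """Expression: load a verdict into the verdict register."""
    def ev(regs, pkt):
        regs.verdict = verdict
    return ev


def run_rule(exprs, pkt):
    """Evaluate expressions in order until one sets a non-CONTINUE verdict."""
    regs = Regs()
    for ev in exprs:
        ev(regs, pkt)
        if regs.verdict != CONTINUE:
            break
    return regs.verdict
```

For example, a rule built as `[payload(9, 1, 0), cmp_eq(0, b"\x06"), immediate_verdict(DROP)]` loads the byte at offset 9 into register 0, compares it with 6 and, on a match, sets the drop verdict.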
      
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow up patch (netfilter:
      nf_tables: add netlink set API).
      
      This patch includes the following components:
      
      * the netlink API: net/netfilter/nf_tables_api.c and
        include/uapi/netfilter/nf_tables.h
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
        net/ipv4/netfilter/nf_tables_ipv4.c
        net/ipv6/netfilter/nf_tables_ipv6.c
        net/ipv4/netfilter/nf_tables_arp.c
        net/bridge/netfilter/nf_tables_bridge.c
      * the NAT table (IPv4 only):
        net/ipv4/netfilter/nf_table_nat_ipv4.c
      * the route table (similar to mangle):
        net/ipv4/netfilter/nf_table_route_ipv4.c
        net/ipv6/netfilter/nf_table_route_ipv6.c
      * internal definitions under:
        include/net/netfilter/nf_tables.h
        include/net/netfilter/nf_tables_core.h
      * It also includes a skeleton expression:
        net/netfilter/nft_expr_template.c
        and the preliminary implementation of the meta target
        net/netfilter/nft_meta_target.c
      
      It also includes a change in struct nf_hook_ops to add a new
      pointer to store private data to the hook, that is used to store
      the rule list per chain.
      
      This patch is based on the patch from Patrick McHardy, plus merged
      accumulated cleanups, fixes and small enhancements to the nftables
      code that have been made since 2009, which are:
      
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistation
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: rise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      96518518
    • P
      netfilter: pass hook ops to hookfn · 795aa6ef
      Patrick McHardy committed
      Pass the hook ops to the hookfn to allow for generic hook
      functions. This change is required by nf_tables.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      795aa6ef
  7. 12 Oct, 2013 1 commit
  8. 11 Oct, 2013 1 commit
  9. 10 Oct, 2013 4 commits
  10. 09 Oct, 2013 6 commits
    • E
      udp: fix a typo in __udp4_lib_mcast_demux_lookup · f69b923a
      Eric Dumazet committed
      At this point sk might contain garbage.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f69b923a
    • E
      ipv6: make lookups simpler and faster · efe4208f
      Eric Dumazet committed
      TCP listener refactoring, part 4:
      
      To speed up inet lookups, we moved IPv4 addresses from inet to struct
      sock_common.
      
      Now it is time to do the same for IPv6, because it permits us to have fast
      lookups for all kinds of sockets, including upcoming SYN_RECV.
      
      Getting IPv6 addresses in TCP lookups currently requires two extra cache
      lines, plus a dereference (and memory stall).
      
      inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
      
      This patch is way bigger than its IPv4 counterpart, because for IPv4,
      we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
      it's not easily doable.
      
      inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
      inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
      
      And timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
      at the same offset.
      
      We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
      macro.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efe4208f
    • E
      tcp/dccp: remove twchain · 05dbc7b5
      Eric Dumazet committed
      TCP listener refactoring, part 3:
      
      Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
      and parallel SYN processing.
      
      Current inet_ehash_bucket contains two chains, one for ESTABLISHED (and
      friend states) sockets, another for TIME_WAIT sockets only.
      
      As the hash table is sized to get at most one socket per bucket, it
      makes little sense to have a separate twchain, as it makes the lookup
      slightly more complicated, and doubles hash table memory usage.
      
      If we make sure all socket types have the lookup keys at the same
      offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
      and ESTABLISHED sockets already have common lookup fields for IPv4.
      
      [ INET_TW_MATCH() is no longer needed ]
      
      I'll provide a follow-up to factorize IPv6 lookup as well, to remove
      INET6_TW_MATCH()
      
      This way, SYN_RECV pseudo sockets will be supported the same.
      
      A new sock_gen_put() helper is added, doing either a sock_put() or
      inet_twsk_put() [ and will support SYN_RECV later ].
      
      Note this helper should only be called in real slow path, when rcu
      lookup found a socket that was moved to another identity (freed/reused
      immediately), but could eventually be used in other contexts, like
      sock_edemux()
      
      Before patch :
      
      dmesg | grep "TCP established"
      
      TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
      
      After patch :
      
      TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
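
The two dmesg lines are consistent with each bucket shrinking from two chain heads to one. A quick arithmetic check in Python, assuming 4096-byte pages (the page size is not stated in the message; the helper names are illustrative):

```python
ENTRIES = 524288
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def bytes_per_bucket(total_bytes, entries=ENTRIES):
    """Size of one hash bucket implied by the reported table size."""
    return total_bytes // entries


def order(total_bytes, page_size=PAGE_SIZE):
    """Allocation order: log2 of the number of pages (power of two assumed)."""
    pages = total_bytes // page_size
    return pages.bit_length() - 1


before, after = 8388608, 4194304
print(bytes_per_bucket(before), order(before))  # 16 bytes per bucket, order 11
print(bytes_per_bucket(after), order(after))    # 8 bytes per bucket, order 10
```

With 8-byte pointers, 16 bytes per bucket matches two list heads and 8 bytes matches one, which is the halving of memory usage the message describes.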
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05dbc7b5
    • S
      net: ipv4 only populate IP_PKTINFO when needed · fbf8866d
      Shawn Bohrer committed
      Since the removal of the routing cache, fib_compute_spec_dst() does
      a fib_table lookup for each UDP multicast packet received.  This has
      introduced a performance regression for some UDP workloads.
      
      This change skips populating the packet info for sockets that do not have
      IP_PKTINFO set.
      
      Benchmark results from a netperf UDP_RR test:
      Before 89789.68 transactions/s
      After  90587.62 transactions/s
      
      Benchmark results from a fio 1 byte UDP multicast pingpong test
      (Multicast one way unicast response):
      Before 12.63us RTT
      After  12.48us RTT
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbf8866d
    • S
      udp: ipv4: Add udp early demux · 421b3885
      Shawn Bohrer committed
      The removal of the routing cache introduced a performance regression for
      some UDP workloads since a dst lookup must be done for each packet.
      This change caches the dst per socket in a similar manner to what we do
      for TCP by implementing early_demux.
      
      For UDP multicast we can only cache the dst if there is only one
      receiving socket on the host.  Since caching only works when there is
      one receiving socket we do the multicast socket lookup using RCU.
      
      For UDP unicast we only demux sockets with an exact match in order to
      not break forwarding setups.  Additionally since the hash chains may be
      long we only check the first socket to see if it is a match and not
      waste extra time searching the whole chain when we might not find an
      exact match.
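
The unicast rule above (exact match only, and only the first socket in the chain is examined) can be sketched as follows. This is a simplified Python model, not the kernel lookup: `Sock` and `early_demux_unicast` are hypothetical names, and a hash chain is modelled as a plain list.

```python
from typing import NamedTuple, Optional, Sequence


class Sock(NamedTuple):
    """A connected UDP socket, identified by its full 4-tuple."""
    laddr: str
    lport: int
    raddr: str
    rport: int


def early_demux_unicast(chain: Sequence[Sock], laddr: str, lport: int,
                        raddr: str, rport: int) -> Optional[Sock]:
    """Return the head of the chain if it matches exactly, else None.

    None means "no early demux": the packet simply takes the regular
    receive path, so forwarding setups are not affected.
    """
    if not chain:
        return None
    first = chain[0]
    # Only the first entry is checked; the rest of the chain is not walked.
    if first == Sock(laddr, lport, raddr, rport):
        return first
    return None
```

A miss here is cheap by design: the cost of walking a long chain is avoided, at the price of skipping the cached dst for sockets that are not at the head of their chain.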
      
      Benchmark results from a netperf UDP_RR test:
      Before 87961.22 transactions/s
      After  89789.68 transactions/s
      
      Benchmark results from a fio 1 byte UDP multicast pingpong test
      (Multicast one way unicast response):
      Before 12.97us RTT
      After  12.63us RTT
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      421b3885
    • S
      udp: Only allow busy read/poll on connected sockets · 005ec974
      Shawn Bohrer committed
      UDP sockets can receive packets from multiple endpoints, and thus packets
      may be received on multiple receive queues.  Since packets can arrive
      on multiple receive queues we should not mark the napi_id for all
      packets.  This makes busy read/poll only work for connected UDP sockets.
      
      This additionally enables busy read/poll for UDP multicast packets as
      long as the socket is connected by moving the check into
      __udp_queue_rcv_skb().
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      005ec974