1. 20 10月, 2013 14 次提交
    • H
      inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once · b23a002f
      Hannes Frederic Sowa 提交于
      This patch splits the secret key for syncookies for ipv4 and ipv6 and
      initializes them with net_get_random_once. This change was the reason I
      did this series. I think the initialization of the syncookie_secret is
      way to early.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b23a002f
    • H
      net: introduce new macro net_get_random_once · a48e4292
      Hannes Frederic Sowa 提交于
      net_get_random_once is a new macro which handles the initialization
      of secret keys. It is possible to call it in the fast path. Only the
      initialization depends on the spinlock and is rather slow. Otherwise
      it should get used just before the key is used to delay the entropy
      extration as late as possible to get better randomness. It returns true
      if the key got initialized.
      
      The usage of static_keys for net_get_random_once is a bit uncommon so
      it needs some further explanation why this actually works:
      
      === In the simple non-HAVE_JUMP_LABEL case we actually have ===
      no constrains to use static_key_(true|false) on keys initialized with
      STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of
      the likely case that the initialization is already done. The key is
      initialized like this:
      
      ___done_key = { .enabled = ATOMIC_INIT(0) }
      
      The check
      
                      if (!static_key_true(&___done_key))                     \
      
      expands into (pseudo code)
      
                      if (!likely(___done_key > 0))
      
      , so we take the fast path as soon as ___done_key is increased from the
      helper function.
      
      === If HAVE_JUMP_LABELs are available this depends ===
      on patching of jumps into the prepared NOPs, which is done in
      jump_label_init at boot-up time (from start_kernel). It is forbidden
      and dangerous to use net_get_random_once in functions which are called
      before that!
      
      At compilation time NOPs are generated at the call sites of
      net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we
      need to call net_get_random_once two times in inet6_ehashfn, so two NOPs):
      
            71:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
            76:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      
      Both will be patched to the actual jumps to the end of the function to
      call __net_get_random_once at boot time as explained above.
      
      arch_static_branch is optimized and inlined for false as return value and
      actually also returns false in case the NOP is placed in the instruction
      stream. So in the fast case we get a "return false". But because we
      initialize ___done_key with (enabled != (entries & 1)) this call-site
      will get patched up at boot thus returning true. The final check looks
      like this:
      
                      if (!static_key_true(&___done_key))                     \
                              ___ret = __net_get_random_once(buf,             \
      
      expands to
      
                      if (!!static_key_false(&___done_key))                     \
                              ___ret = __net_get_random_once(buf,             \
      
      So we get true at boot time and as soon as static_key_slow_inc is called
      on the key it will invert the logic and return false for the fast path.
      static_key_slow_inc will change the branch because it got initialized
      with .enabled == 0. After static_key_slow_inc is called on the key the
      branch is replaced with a nop again.
      
      === Misc: ===
      The helper defers the increment into a workqueue so we don't
      have problems calling this code from atomic sections. A seperate boolean
      (___done) guards the case where we enter net_get_random_once again before
      the increment happend.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a48e4292
    • H
      ipv6: split inet6_ehashfn to hash functions per compilation unit · b50026b5
      Hannes Frederic Sowa 提交于
      This patch splits the inet6_ehashfn into separate ones in
      ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of
      seperate secrets keys later.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b50026b5
    • H
      ipv4: split inet_ehashfn to hash functions per compilation unit · 65cd8033
      Hannes Frederic Sowa 提交于
      This duplicates a bit of code but let's us easily introduce
      separate secret keys later. The separate compilation units are
      ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65cd8033
    • E
      ipip: add GSO/TSO support · cb32f511
      Eric Dumazet 提交于
      Now inet_gso_segment() is stackable, its relatively easy to
      implement GSO/TSO support for IPIP
      
      Performance results, when segmentation is done after tunnel
      device (as no NIC is yet enabled for TSO IPIP support) :
      
      Before patch :
      
      lpq83:~# ./netperf -H 7.7.9.84 -Cc
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3357.88   5.09     3.70     2.983   2.167
      
      After patch :
      
      lpq83:~# ./netperf -H 7.7.9.84 -Cc
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      7710.19   4.52     6.62     1.152   1.687
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb32f511
    • E
      ipv4: gso: make inet_gso_segment() stackable · 3347c960
      Eric Dumazet 提交于
      In order to support GSO on IPIP, we need to make
      inet_gso_segment() stackable.
      
      It should not assume network header starts right after mac
      header.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3347c960
    • E
      ipv4: generalize gre_handle_offloads · 2d26f0a3
      Eric Dumazet 提交于
      This patch makes gre_handle_offloads() more generic
      and rename it to iptunnel_handle_offloads()
      
      This will be used to add GSO/TSO support to IPIP tunnels.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d26f0a3
    • E
      net: generalize skb_segment() · 030737bc
      Eric Dumazet 提交于
      While implementing GSO/TSO support for IPIP, I found skb_segment()
      was assuming network header was immediately following mac header.
      
      Its not really true in the case inet_gso_segment() is stacked :
      By the time tcp_gso_segment() is called, network header points
      to the inner IP header.
      
      Let's instead assume nothing and pick the current offsets found in
      original skb, we have skb_headers_offset_update() helper for that.
      
      Also move the csum_start update inside skb_headers_offset_update()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      030737bc
    • E
      ipv6: gso: remove redundant locking · b917eb15
      Eric Dumazet 提交于
      ipv6_gso_send_check() and ipv6_gso_segment() are called by
      skb_mac_gso_segment() under rcu lock, no need to use
      rcu_read_lock() / rcu_read_unlock()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b917eb15
    • J
      net: misc: Remove extern from function prototypes · c1b1203d
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1b1203d
    • J
      net: ipv4/ipv6: Remove extern from function prototypes · 7e58487b
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e58487b
    • J
      net: dccp: Remove extern from function prototypes · a402a5aa
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a402a5aa
    • J
      net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes · 348662a1
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      348662a1
    • E
      ipv4: gso: send_check() & segment() cleanups · 47d27aad
      Eric Dumazet 提交于
      inet_gso_segment() and inet_gso_send_check() are called by
      skb_mac_gso_segment() under rcu lock, no need to use
      rcu_read_lock() / rcu_read_unlock()
      
      Avoid calling ip_hdr() twice per function.
      
      We can use ip_send_check() helper.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47d27aad
  2. 19 10月, 2013 14 次提交
  3. 18 10月, 2013 3 次提交
  4. 15 10月, 2013 9 次提交
    • P
      netfilter: nf_tables: add ARP filtering support · ed683f13
      Pablo Neira Ayuso 提交于
      This patch registers the ARP family and he filter chain type
      for this family.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ed683f13
    • P
      netfilter: nf_tables: add trace support · b5bc89bf
      Pablo Neira Ayuso 提交于
      This patch adds support for tracing the packet travel through
      the ruleset, in a similar fashion to x_tables.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b5bc89bf
    • P
      netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Pablo Neira Ayuso 提交于
      This patch adds a batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
        the nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, that results in the invocation of the
        ss->commit callback function. If not specified or an error
        ocurred in the batch, the ss->abort function is invoked
        instead.
      
      The end message represents the commit operation in nftables, the
      lack of end message results in an abort. This patch also adds the
      .call_batch function that is only called from the batch receival
      path.
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows to atomically commit a set of
      rule-set updates incrementally without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry again. In order to invalidate dumps, a global
      genctr counter is increased everytime nf_tables enters a new
      generation.
      
      This new operation can be used from the user-space utility
      that controls the firewall, eg.
      
      nft -f restore
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that the rule 1 will be inactive until the transition to the
      next generation, the rule 2 will be evicted in the next generation.
      
      There is a penalty during the rule update due to the branch
      misprediction in the packet matching framework. But that should be
      quickly resolved once the iteration over the commit list that
      contain rules that require updates is finished.
      
      Event notification happens once the rule-set update has been
      committed. So we skip notifications is case the rule-set update
      is aborted, which can happen in case that the rule-set is tested
      to apply correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0628b123
    • E
      netfilter: nf_tables: add insert operation · 5e948466
      Eric Leblond 提交于
      This patch adds a new rule attribute NFTA_RULE_POSITION which is
      used to store the position of a rule relatively to the others.
      By providing the create command and specifying the position, the
      rule is inserted after the rule with the handle equal to the
      provided position.
      
      Regarding notification, the position attribute specifies the
      handle of the previous rule to make sure we don't point to any
      stale rule in notifications coming from the commit path.
      
      This patch includes the following fix from Pablo:
      
      * nf_tables: fix rule deletion event reporting
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      5e948466
    • P
      netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso 提交于
      Register family per netnamespace to ensure that sets are
      only visible in its approapriate namespace.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      99633ab2
    • T
      netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Tomasz Bursztyka 提交于
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: NTomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eb31628e
    • P
      netfilter: nf_tables: add support for dormant tables · 9ddf6323
      Pablo Neira Ayuso 提交于
      This patch allows you to temporarily disable an entire table.
      You can change the state of a dormant table via NFT_MSG_NEWTABLE
      messages. Using this operation you can wake up a table, so their
      chains are registered.
      
      This provides atomicity at chain level. Thus, the rule-set of one
      chain is applied at once, avoiding any possible intermediate state
      in every chain. Still, the chains that belongs to a table are
      registered consecutively. This also allows you to have inactive
      tables in the kernel.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9ddf6323
    • P
      netfilter: nf_tables: nft_payload: fix transport header base · c54032e0
      Pablo Neira Ayuso 提交于
      We cannot use skb->transport_header since it's unset, use
      pkt->xt.thoff instead.
      
      Now possible using information made available through the x_tables
      compatibility layer.
      Reported-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c54032e0
    • P
      netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Pablo Neira Ayuso 提交于
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility later allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
      to query the x_tables match/target revision, so we don't need to
      use the native x_table getsockopt interface.
      
      * emulate xt structures: this required extending the struct nft_pktinfo
      to include the fragment offset, which is already obtained from
      ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted in the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0ca743a5