1. 18 Sep, 2015 1 commit
  2. 03 Sep, 2015 1 commit
    • netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled · a82b0e63
      Daniel Borkmann authored
      While testing various Kconfig options on another issue, I found that
      the following one triggers as well on allmodconfig and nf_conntrack
      disabled:
      
        net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
        net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
          if (this_cpu_read(nf_skb_duplicated))
        [...]
        net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’:
        net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
          if (this_cpu_read(nf_skb_duplicated))
      
      Fix it by directly including the header where it is defined.
      
      Fixes: bbde9fc1 ("netfilter: factor out packet duplication for IPv4/IPv6")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Aug, 2015 1 commit
  4. 22 Aug, 2015 1 commit
    • netfilter: nf_dup: fix sparse warnings · 59e26423
      Pablo Neira Ayuso authored
      >> net/ipv4/netfilter/nft_dup_ipv4.c:29:37: sparse: incorrect type in initializer (different base types)
         net/ipv4/netfilter/nft_dup_ipv4.c:29:37:    expected restricted __be32 [user type] s_addr
         net/ipv4/netfilter/nft_dup_ipv4.c:29:37:    got unsigned int [unsigned] <noident>
      
      >> net/ipv6/netfilter/nf_dup_ipv6.c:48:23: sparse: incorrect type in assignment (different base types)
         net/ipv6/netfilter/nf_dup_ipv6.c:48:23:    expected restricted __be32 [addressable] [assigned] [usertype] flowlabel
         net/ipv6/netfilter/nf_dup_ipv6.c:48:23:    got int
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 18 Aug, 2015 3 commits
    • net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert authored
      inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
      the checksum field carries a pseudo header. This argument should be a
      boolean instead of an int.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_conntrack: add efficient mark to zone mapping · 5e8018fc
      Daniel Borkmann authored
      This work adds the possibility of deriving the zone id from the skb->mark
      field in a scalable manner. This allows for having only a single template
      serving hundreds/thousands of different zones, for example, instead of the
      need to have one match for each zone as an extra CT jump target.
      
      Note that we'd need to have this information attached to the template as at
      the time when we're trying to lookup a possible ct object, we already need
      to know zone information for a possible match when going into
      __nf_conntrack_find_get(). This work provides a minimal implementation for
      a possible mapping.
      
      In order to not add/expose an extra ct->status bit, the zone structure has
      been extended to carry a flag for deriving the mark.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
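
As a rough illustration of the mapping described in the commit above, the following Python model (not kernel code) shows a template zone carrying a flag that tells the lookup path to take the zone id from the packet mark. The names `Zone`, `FLAG_MARK` and `zone_for_packet` are hypothetical stand-ins chosen for this sketch, and the flag value is an assumption.

```python
from dataclasses import dataclass, replace

FLAG_MARK = 0x1          # hypothetical flag: "derive the zone id from the mark"
DEFAULT_ZONE_ID = 0      # assumed id of the default zone


@dataclass(frozen=True)
class Zone:
    id: int = DEFAULT_ZONE_ID
    flags: int = 0


def zone_for_packet(template_zone, skb_mark):
    """Return the zone to use for the conntrack lookup of one packet."""
    if template_zone is None:
        return Zone()
    if template_zone.flags & FLAG_MARK:
        # One template serves every tenant: the id comes from the mark.
        return replace(template_zone, id=skb_mark)
    return template_zone


tmpl = Zone(flags=FLAG_MARK)
zones = [zone_for_packet(tmpl, mark).id for mark in (1, 2, 200)]
```

With a single flagged template, packets marked 1, 2 and 200 end up in three different zones without any per-zone rule.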
    • netfilter: nf_conntrack: add direction support for zones · deedb590
      Daniel Borkmann authored
      This work adds a direction parameter to netfilter zones, so identity
      separation can be performed only in original/reply or both directions
      (default). This basically opens up the possibility of doing NAT with
      conflicting IP address/port tuples from multiple, isolated tenants
      on a host (e.g. from a netns) without requiring each tenant to NAT
      twice resp. to use its own dedicated IP address to SNAT to, meaning
      overlapping tuples can be made unique with the zone identifier in
      original direction, where the NAT engine will then allocate a unique
      tuple in the commonly shared default zone for the reply direction.
      In some restricted, local DNAT cases, also port redirection could be
      used for making the reply traffic unique w/o requiring SNAT.
      
      The consensus we've reached and discussed at NFWS and since the initial
      implementation [1] was to directly integrate the direction meta data
      into the existing zones infrastructure, as opposed to the ct->mark
      approach we proposed initially.
      
      As we pass the nf_conntrack_zone object directly around, we don't have
      to touch all call-sites, but only those, that contain equality checks
      of zones. Thus, based on the current direction (original or reply),
      we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
      CT expectations are direction-agnostic entities when expectations are
      being compared among themselves, so we can only use the identifier
      in this case.
      
      Note that zone identifiers can not be included into the hash mix
      anymore as they don't contain a "stable" value that would be equal
      for both directions at all times, f.e. if only zone->id would
      unconditionally be xor'ed into the table slot hash, then replies won't
      find the corresponding conntracking entry anymore.
      
      If no particular direction is specified when configuring zones, the
      behaviour is exactly as we expect currently (both directions).
      
      Support has been added for the CT netlink interface as well as the
      x_tables raw CT target, which both already offer existing interfaces
      to user space for the configuration of zones.
      
      Below is a minimal, simplified collision example (script in [2]) with
      netperf sessions:
      
        +--- tenant-1 ---+   mark := 1
        |    netperf     |--+
        +----------------+  |                CT zone := mark [ORIGINAL]
         [ip,sport] := X   +--------------+  +--- gateway ---+
                           | mark routing |--|     SNAT      |-- ... +
                           +--------------+  +---------------+       |
        +--- tenant-2 ---+  |                                     ~~~|~~~
        |    netperf     |--+                +-----------+           |
        +----------------+   mark := 2       | netserver |------ ... +
         [ip,sport] := X                     +-----------+
                                              [ip,port] := Y
      On the gateway netns, example:
      
        iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
        iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
      
        iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
        iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
      
      conntrack dump from gateway netns:
      
        netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
      
        tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                                 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
                     [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
      
        tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                                 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
                     [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
      
        tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                              src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
                     [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
      
        tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                              src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
                     [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
      
      Taking this further, test script in [2] creates 200 tenants and runs
      original-tuple colliding netperf sessions each. A conntrack -L dump in
      the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
      state as expected.
      
      I also ran various other tests with some permutations of the script,
      to mention some: SNAT in random/random-fully/persistent mode, no zones (no
      overlaps), static zones (original, reply, both directions), etc.
      
        [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
        [2] https://paste.fedoraproject.org/242835/65657871/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
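
The direction-dependent comparison described in the commit above can be modelled in a few lines of Python. This is a simplified sketch, not the kernel implementation: `Zone`, `zone_id` and `same_entry` are names made up here, and the bit layout of the direction mask is an assumption.

```python
from dataclasses import dataclass

DIR_ORIGINAL, DIR_REPLY = 0, 1
ZONE_DIR_ORIG = 1 << DIR_ORIGINAL
ZONE_DIR_REPL = 1 << DIR_REPLY
ZONE_DIR_BOTH = ZONE_DIR_ORIG | ZONE_DIR_REPL   # default when no direction is given
DEFAULT_ZONE_ID = 0


@dataclass(frozen=True)
class Zone:
    id: int
    dir: int = ZONE_DIR_BOTH


def zone_id(zone, direction):
    """Id used when comparing tuples of the given direction."""
    return zone.id if zone.dir & (1 << direction) else DEFAULT_ZONE_ID


def same_entry(tuple_a, zone_a, tuple_b, zone_b, direction):
    """Equality check as a lookup would perform it for one direction."""
    return tuple_a == tuple_b and zone_id(zone_a, direction) == zone_id(zone_b, direction)


# Two tenants with an identical original tuple, separated only by zone.
orig = ("40.1.1.1", 5555, "10.1.1.2", 12865)
t1, t2 = Zone(1, ZONE_DIR_ORIG), Zone(2, ZONE_DIR_ORIG)

collide_in_original = same_entry(orig, t1, orig, t2, DIR_ORIGINAL)
# In the reply direction both tenants fall back to the default zone, so the
# reply tuples must be made unique by other means (SNAT in the example above).
reply_zone_ids = (zone_id(t1, DIR_REPLY), zone_id(t2, DIR_REPLY))
```

Because the id seen in the reply direction differs from the one seen in the original direction, it cannot be mixed into a hash that both directions must agree on, which is the point the commit message makes about the hash mix.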
  6. 11 Aug, 2015 1 commit
  7. 10 Aug, 2015 1 commit
    • netfilter: SYNPROXY: fix sending window update to client · 3c16241c
      Phil Sutter authored
      Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
      finish the server handshake, then calls nf_ct_seqadj_init() to initiate
      sequence number adjustment of forwarded packets to the client and finally sends
      a window update to the client to unblock its TX queue.
      
      Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
      parameter, no sequence number adjustment happens and the client receives the
      window update with incorrect sequence number. Depending on client TCP
      implementation, this leads to a significant delay (until a window probe is
      being sent).
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. 07 Aug, 2015 2 commits
  9. 30 Jul, 2015 1 commit
  10. 16 Jul, 2015 4 commits
    • netfilter: xtables: remove __pure annotation · 6c7941de
      Florian Westphal authored
      sparse complains:
      ip_tables.c:361:27: warning: incorrect type in assignment (different modifiers)
      ip_tables.c:361:27:    expected struct ipt_entry *[assigned] e
      ip_tables.c:361:27:    got struct ipt_entry [pure] *
      
      doesn't change generated code.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: add and use jump label for xt_tee · dcebd315
      Florian Westphal authored
      Don't bother testing if we need to switch to alternate stack
      unless TEE target is used.
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xtables: don't save/restore jumpstack offset · 7814b6ec
      Florian Westphal authored
      In most cases there is no reentrancy into ip/ip6tables.
      
      For skbs sent by REJECT or SYNPROXY targets, there is one level
      of reentrancy, but it's not relevant as those targets issue an absolute
      verdict, i.e. the jumpstack can be clobbered since it's not used
      after the target issues an absolute verdict (ACCEPT, DROP, STOLEN, etc).
      
      So the only special case where it is relevant is the TEE target, which
      returns XT_CONTINUE.
      
      This patch changes ip(6)_do_table to always use the jump stack starting
      from 0.
      
      When we detect we're operating on an skb sent via TEE (percpu
      nf_skb_duplicated is 1) we switch to an alternate stack to leave
      the original one alone.
      
      Since there is no TEE support for arptables, it doesn't need to
      test if tee is active.
      
      The jump stack overflow tests are no longer needed as well --
      since ->stacksize is the largest call depth we cannot exceed it.
      
      A much better alternative to the external jumpstack would be to just
      declare a jumps[32] stack on the local stack frame, but that would mean
      we'd have to reject iptables rulesets that used to work before.
      
      Another alternative would be to start rejecting rulesets with a larger
      call depth, e.g. 1000 -- in this case it would be feasible to allocate the
      entire stack in the percpu area which would avoid one dereference.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
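
The stack handling described in the commit above can be pictured with a small Python model. It is only a sketch under stated assumptions: the per-CPU array is assumed to hold two full call chains, and `make_jumpstack` and `do_table` are illustrative names, not kernel functions.

```python
def make_jumpstack(stacksize):
    """One per-CPU array with room for two full call chains (assumed layout)."""
    return [None] * (stacksize * 2)


def do_table(jumpstack, stacksize, skb_duplicated, chain_calls):
    """Push the return addresses in `chain_calls` and report the slots used.

    Traversal always starts at index 0 of its region. A packet emitted by TEE
    uses the second region, so the interrupted traversal is left untouched.
    """
    base = stacksize if skb_duplicated else 0
    stackidx = 0
    used = []
    for return_addr in chain_calls:
        jumpstack[base + stackidx] = return_addr
        used.append(base + stackidx)
        stackidx += 1
    return used


stack = make_jumpstack(stacksize=3)
outer = do_table(stack, 3, skb_duplicated=False, chain_calls=["A", "B"])
inner = do_table(stack, 3, skb_duplicated=True, chain_calls=["X", "Y", "Z"])
```

After both calls the entries written by the outer traversal are still intact, which is why no offset has to be saved and restored around the reentrant call.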
    • netfilter: xtables: compute exact size needed for jumpstack · 98d1bd80
      Florian Westphal authored
      The {arp,ip,ip6tables} jump stack is currently sized based
      on the number of user chains.
      
      However, it's rather unlikely that every user-defined chain jumps to the
      next, so let's use the existing loop detection logic to also track the
      chain depths.
      
      The stacksize is then set to the largest chain depth seen.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
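
To make the sizing idea in the commit above concrete, here is a hedged Python sketch that walks a chain graph once, rejecting loops and recording the deepest call chain. The graph representation and the function name `max_call_depth` are assumptions for illustration; the kernel works on the rule blob rather than on a dictionary.

```python
def max_call_depth(jumps, entry_points):
    """Largest number of nested jumps reachable from the built-in chains.

    `jumps` maps a chain name to the user chains it jumps to. A loop raises
    ValueError, mirroring the idea that loop detection and depth tracking
    share one traversal.
    """
    depth_cache = {}
    on_path = set()

    def depth_below(chain):
        if chain in on_path:
            raise ValueError(f"loop detected at chain {chain!r}")
        if chain in depth_cache:
            return depth_cache[chain]
        on_path.add(chain)
        deepest = 0
        for target in jumps.get(chain, ()):
            deepest = max(deepest, 1 + depth_below(target))
        on_path.discard(chain)
        depth_cache[chain] = deepest
        return deepest

    return max((depth_below(c) for c in entry_points), default=0)


# Five user chains, but the longest call chain is only two jumps deep.
ruleset = {
    "INPUT": ["web", "ssh", "log"],
    "FORWARD": ["tenant"],
    "web": ["ratelimit"],
}
stacksize = max_call_depth(ruleset, ["INPUT", "FORWARD"])
```

Sizing by the number of user chains would reserve five slots for this ruleset, while the depth-based computation needs two.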
  11. 02 Jul, 2015 1 commit
    • netfilter: arptables: use percpu jumpstack · 3bd22997
      Florian Westphal authored
      commit 482cfc31 ("netfilter: xtables: avoid percpu ruleset duplication")
      
      Unlike ip and ip6tables, arp tables were never converted to use the percpu
      jump stack.
      
      It still uses the rule blob to store return address, which isn't safe
      anymore since we now share this blob among all processors.
      
      Because there is no TEE support for arptables, we don't need to cope
      with reentrancy, so we can use a local variable to hold the stack offset.
      
      Fixes: 482cfc31 ("netfilter: xtables: avoid percpu ruleset duplication")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  12. 24 Jun, 2015 1 commit
    • net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek authored
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will report to userspace that a
      route is dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
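
The selection behaviour shown in the routing tables of the commit above can be approximated with a short Python model. This is a simplification under assumptions: the sysctl is treated as a single boolean rather than a per-interface setting, and `NextHop` and `select_nexthop` are hypothetical names, not kernel structures.

```python
from dataclasses import dataclass


@dataclass
class NextHop:
    gateway: str
    dev: str
    metric: int
    link_up: bool = True
    admin_up: bool = True   # models IFF_UP


def select_nexthop(candidates, ignore_routes_with_linkdown):
    """Pick the lowest-metric usable next hop, or None."""
    for nh in sorted(candidates, key=lambda n: n.metric):
        if not nh.admin_up:
            continue                      # dead regardless of the sysctl
        if ignore_routes_with_linkdown and not nh.link_up:
            continue                      # shown as "dead linkdown"
        return nh
    return None


# The two routes to 90.0.0.0/24 from the example, with p8p1 link-down.
routes_to_90 = [
    NextHop("80.0.0.2", "p8p1", metric=1, link_up=False),
    NextHop("70.0.0.2", "p7p1", metric=2),
]
with_sysctl = select_nexthop(routes_to_90, ignore_routes_with_linkdown=True)
without_sysctl = select_nexthop(routes_to_90, ignore_routes_with_linkdown=False)
```

With the sysctl set the lookup falls through to the metric 2 route via p7p1; without it the metric 1 route via p8p1 is still chosen even though its link is down.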
  13. 16 Jun, 2015 1 commit
  14. 15 Jun, 2015 1 commit
  15. 12 Jun, 2015 2 commits
  16. 27 May, 2015 1 commit
  17. 20 May, 2015 1 commit
    • netfilter: ensure number of counters is >0 in do_replace() · 1086bbe9
      Dave Jones authored
      After improving setsockopt() coverage in trinity, I started triggering
      vmalloc failures pretty reliably from this code path:
      
      warn_alloc_failed+0xe9/0x140
      __vmalloc_node_range+0x1be/0x270
      vzalloc+0x4b/0x50
      __do_replace+0x52/0x260 [ip_tables]
      do_ipt_set_ctl+0x15d/0x1d0 [ip_tables]
      nf_setsockopt+0x65/0x90
      ip_setsockopt+0x61/0xa0
      raw_setsockopt+0x16/0x60
      sock_common_setsockopt+0x14/0x20
      SyS_setsockopt+0x71/0xd0
      
      It turns out we don't validate that the num_counters field in the
      struct we pass in from userspace is initialized.
      
      The same problem also exists in ebtables, arptables, ipv6, and the
      compat variants.
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
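
The kind of check the commit above calls for can be sketched in Python as follows. The function name `validate_num_counters`, the counter size and the accompanying overflow guard are assumptions made for this illustration, not a copy of the kernel code.

```python
import errno

COUNTER_SIZE = 16            # assumed size in bytes of one packet/byte counter pair
INT_MAX = 2**31 - 1


def validate_num_counters(num_counters):
    """Return 0 if the count is acceptable, else a negative errno value."""
    if num_counters == 0:
        return -errno.EINVAL         # nothing to allocate: reject early
    if num_counters >= INT_MAX // COUNTER_SIZE:
        return -errno.ENOMEM         # allocation size would overflow
    return 0


results = [validate_num_counters(n) for n in (0, 4, INT_MAX)]
```

Rejecting a zero count before the allocation keeps an uninitialized userspace field from reaching the allocator at all.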
  18. 18 May, 2015 1 commit
  19. 16 May, 2015 1 commit
    • netfilter: x_tables: add context to know if extension runs from nft_compat · 55917a21
      Pablo Neira Ayuso authored
      Currently, we have four xtables extensions that cannot be used from the
      xt over nft compat layer. The problem is that they need real access to
      the full blown xt_entry to validate that the rule comes with the right
      dependencies. This check was introduced to overcome the lack of
      sufficient userspace dependency validation in iptables.
      
      To resolve this problem, this patch introduces a new field to the
      xt_tgchk_param structure that tells us if the extension is run from
      nft_compat context.
      
      The three affected extensions are:
      
      1) CLUSTERIP, this target has been superseded by xt_cluster. So just
         bail out by returning -EINVAL.
      
      2) TCPMSS. Relax the checking when used from nft_compat. If used with
         the wrong configuration, it will corrupt !syn packets by adding TCP
         MSS option.
      
      3) ebt_stp. Relax the check to make sure it uses the reserved
         destination MAC address for STP.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
  20. 13 Apr, 2015 2 commits
    • netfilter: nf_tables: switch registers to 32 bit addressing · 49499c3e
      Patrick McHardy authored
      Switch the nf_tables registers from 128 bit addressing to 32 bit
      addressing to support so called concatenations, where multiple values
      can be concatenated over multiple registers for O(1) exact matches of
      multiple dimensions using sets.
      
      The old register values are mapped to areas of 128 bits for compatibility.
      When dumping register numbers, values are expressed using the old values
      if they refer to the beginning of a 128 bit area for compatibility.
      
      To support concatenations, register loads of less than a full 32 bit
      value need to be padded. This mainly affects the payload and exthdr
      expressions, which both unconditionally zero the last word before
      copying the data.
      
      Userspace fully passes the testsuite using both old and new register
      addressing.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
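
The compatibility mapping described in the commit above can be illustrated with a small Python model. The numbering used here (old registers 0 to 4, new 32 bit registers starting at 8) is an assumption of this sketch, as are the helper names; the kernel headers are the authority on the actual values.

```python
REG_SIZE = 16        # bytes in one old-style 128 bit register
REG32_SIZE = 4       # bytes in one new-style 32 bit register
WORDS_PER_OLD_REG = REG_SIZE // REG32_SIZE

# Assumed numbering for this sketch: old registers use 0..4 (0 being the
# verdict register), new 32 bit registers start at REG32_BASE.
OLD_REG_MAX = 4
REG32_BASE = 8


def parse_register(user_reg):
    """Map a register number from userspace to a 32 bit slot index."""
    if user_reg <= OLD_REG_MAX:
        return user_reg * WORDS_PER_OLD_REG
    return user_reg - REG32_BASE + WORDS_PER_OLD_REG


def dump_register(slot):
    """Map a slot index back, preferring the old number when it is aligned."""
    if slot % WORDS_PER_OLD_REG == 0:
        return slot // WORDS_PER_OLD_REG
    return slot - WORDS_PER_OLD_REG + REG32_BASE


def load_padded(data):
    """Pad a short load with zero bytes up to a whole number of 32 bit words."""
    pad = -len(data) % REG32_SIZE
    return data + b"\x00" * pad
```

Under these assumptions old register 1 and new register 8 refer to the same slot, and a two byte load such as a port number occupies one zero-padded word, so concatenated keys compare equal byte for byte.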
    • netfilter: nf_tables: get rid of NFT_REG_VERDICT usage · a55e22e9
      Patrick McHardy authored
      Replace the array of registers passed to expressions by a struct nft_regs,
      containing the verdict as a separate member, which aliases to the
      NFT_REG_VERDICT register.
      
      This is needed to separate the verdict from the data registers completely,
      so their size can be changed.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  21. 09 Apr, 2015 1 commit
  22. 08 Apr, 2015 1 commit
  23. 05 Apr, 2015 5 commits
  24. 01 Apr, 2015 2 commits
  25. 25 Mar, 2015 1 commit
  26. 19 Mar, 2015 1 commit
    • netfilter: restore rule tracing via nfnetlink_log · 4017a7ee
      Pablo Neira Ayuso authored
      Since fab4085f ("netfilter: log: nf_log_packet() as real unified
      interface"), the loginfo structure that is passed to nf_log_packet() is
      used to explicitly indicate the logger type you want to use.
      
      This is a problem for people tracing rules through nfnetlink_log since
      packets are always routed to the NF_LOG_TYPE logger after the
      aforementioned patch.
      
      We can fix this by removing the trace loginfo structures, but that still
      changes the log level from 4 to 5 for tracing messages and there may be
      someone relying on this out there. So let's just introduce a new
      nf_log_trace() function that restores the former behaviour.
      Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  27. 18 Mar, 2015 1 commit