1. 01 Jul, 2013 2 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4e144d3a
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      The following batch contains Netfilter/IPVS updates for net-next,
      they are:
      
      * Enforce policy in several nfnetlink subsystems, from Daniel
        Borkmann.
      
      * Use xt_socket to match the third packet (to perform simplistic
        socket-based stateful filtering), from Eric Dumazet.
      
      * Avoid large timeouts for TCP flows picked up from the middle,
        from Florian Westphal.
      
      * Exclude IPVS from struct net if IPVS is disabled, and remove an
        unnecessary header include, from JunweiZhang.
      
      * Release SCTP connection immediately under load, to mimic current
        TCP behaviour, from Julian Anastasov.
      
      * Replace and enhance SCTP state machine, from Julian Anastasov.
      
      * Add tweak to reduce sync traffic in the presence of persistence,
        also from Julian Anastasov.
      
      * Add tweak for the IPVS SH scheduler not to reject connections
        directed to a server, choose a new one instead, from Alexander
        Frolkin.
      
      * Add support for sloppy TCP and SCTP modes, which create state
        information on any packet, not only initial handshake packets,
        from Alexander Frolkin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flag · 496e4ae7
      Florian Westphal committed
      The common case is that TCP/IP checksums have already been
      verified, e.g. by hardware (rx checksum offload), or conntrack.
      
      Userspace can use this flag to determine when the checksum
      has not been validated yet.
      
      If the flag is set, this doesn't necessarily mean that the packet has
      an invalid checksum, e.g. if NIC doesn't support rx checksum.
      
      Userspace that successfully enabled the NFQA_CFG_F_GSO queue feature flag
      can infer that the IP/TCP checksum has already been validated if either
      the SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED
      flag is unset.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
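
The inference rule in the commit message above can be sketched as a small decision function. This is only a model of the described logic, not libnetfilter_queue code: the function name and its parameters are hypothetical, and the bit values are assumed to mirror the NFQA_SKB_* flags in the kernel's nfnetlink_queue UAPI header, so check them against your own headers.

```python
# Assumed to mirror the NFQA_SKB_* bits in <linux/netfilter/nfnetlink_queue.h>.
NFQA_SKB_CSUMNOTREADY = 1 << 0
NFQA_SKB_GSO = 1 << 1
NFQA_SKB_CSUM_NOTVERIFIED = 1 << 2


def checksum_already_verified(skb_info, gso_enabled):
    """Return True only when userspace may skip its own checksum validation.

    skb_info    -- value of the SKB_INFO attribute, or None when it is absent
    gso_enabled -- whether NFQA_CFG_F_GSO was successfully enabled on the queue
    (Hypothetical helper; both parameter names are illustrative.)
    """
    if not gso_enabled:
        # The commit message only gives the guarantee for GSO-enabled queues.
        return False
    if skb_info is None:
        # Attribute not present: checksum was already validated.
        return True
    # Attribute present: validated unless the NOTVERIFIED bit is set.
    return not (skb_info & NFQA_SKB_CSUM_NOTVERIFIED)
```

Note that a False result does not mean the checksum is bad, only that userspace has to verify it itself if it cares.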
  2. 30 Jun, 2013 1 commit
  3. 29 Jun, 2013 13 commits
  4. 28 Jun, 2013 6 commits
    • bonding: when cloning a MAC use NET_ADDR_STOLEN · ae0d6750
      nikolay@redhat.com committed
      A simple semantic change: when a slave's MAC is cloned by the bond
      master, set addr_assign_type to NET_ADDR_STOLEN instead of
      NET_ADDR_SET. Also use bond_set_dev_addr() in BOND_FOM_ACTIVE mode
      to change the bond's MAC address because the assign_type has to be
      set properly.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: remove unnecessary dev_addr_from_first member · 97a1e639
      nikolay@redhat.com committed
      In struct bonding there's a member called dev_addr_from_first which is
      used to denote when the bond dev should clone the first slave's MAC
      address but since we have netdev's addr_assign_type variable that is not
      necessary. We clone the first slave's MAC each time we have a random MAC
      set to the bond device. This has the nice side-effect of also fixing an
      inconsistency - when the MAC address of the bond dev is set after its
      creation, but prior to having slaves, it's not kept and the first slave's
      MAC is cloned. The only way to keep the MAC was to create the bond device
      with the MAC address set (e.g. through ip link). In all cases if the
      bond device is left without any slaves - its MAC gets reset to a random
      one as before.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: remove unnecessary setup_by_slave member · 8d2ada77
      nikolay@redhat.com committed
      We have a member called setup_by_slave in struct bonding to denote if the
      bond dev has different type than ARPHRD_ETHER, but that is already denoted
      in bond's netdev type variable if it was setup by the slave, so use that
      instead of the member.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: fix splat in skb_clone with large messages · 3a36515f
      Pablo Neira committed
      Since (c05cdb1b netlink: allow large data transfers from user-space),
      netlink splats if it invokes skb_clone on large netlink skbs since:
      
      * skb_shared_info was not correctly initialized.
      * skb->destructor is not set in the cloned skb.
      
      This was spotted by trinity:
      
      [  894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
      [  894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
      [...]
      [  894.991034] Call Trace:
      [  894.991034]  [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
      [  894.991034]  [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
      [  894.991034]  [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
      [  894.991034]  [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0
      
      Fix it by:
      
      1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
         that sets our special skb->destructor in the cloned skb. Moreover, handle
         the release of the large cloned skb head area in the destructor path.
      
      2) not allowing large skbuffs in the netlink broadcast path. I cannot find
         any reasonable use of the large data transfer using netlink in that path,
         moreover this helps to skip extra skb_clone handling.
      
      I found two more netlink clients that are cloning the skbs, but they are
      not in the sendmsg path. Therefore, the sole client cloning that I found
      seems to be the fib frontend.
      
      Thanks to Eric Dumazet for helping to address this issue.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: add support of x-netns · 5e6700b3
      Nicolas Dichtel committed
      This patch allows switching the netns when a packet is encapsulated or
      decapsulated. In other words, the encapsulated packet is received in one
      netns, where the lookup is done to find the tunnel. Once the tunnel is
      found, the packet is decapsulated and injected into the corresponding
      interface, which resides in another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dev: introduce skb_scrub_packet() · 621e84d6
      Nicolas Dichtel committed
      The goal of this new function is to perform all needed cleanup before sending
      an skb into another netns.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 27 Jun, 2013 8 commits
  6. 26 Jun, 2013 10 commits
    • netns: exclude ipvs from struct net when IPVS disabled · 8b4d14d8
      JunweiZhang committed
      No real problem is fixed; this just saves a few bytes in the
      net_namespace structure.
      Signed-off-by: JunweiZhang <junwei.zhang@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • kernel: remove unnecessary head file · d0667186
      JunweiZhang committed
      ip_vs.h is not necessary for sysctl_binary.c.
      
      This prepares for the next patch, to avoid a compile issue.
      Signed-off-by: JunweiZhang <junwei.zhang@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: add sync_persist_mode flag · 4d0c875d
      Julian Anastasov committed
      Add sync_persist_mode flag to reduce sync traffic
      by syncing only persistent templates.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: SH fallback and L4 hashing · eba3b5a7
      Alexander Frolkin committed
      By default the SH scheduler rejects connections that are hashed onto a
      realserver of weight 0.  This patch adds a flag to make SH choose a
      different realserver in this case, instead of rejecting the connection.
      
      The patch also adds a flag to make SH include the source port (TCP, UDP,
      SCTP) in the hash as well as the source address.  This basically allows
      for deterministic round-robin load balancing (i.e., where any director
      in a cluster of directors with identical config will send the same
      packet the same way).
      
      The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
      can be set per service.  They are set using a new option to ipvsadm.
      Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
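
The two SH options described in this commit (fall back instead of rejecting, and include the source port in the hash) can be illustrated with a toy model. Everything below is an assumption made for illustration: the hash function, the bucket table layout, and the names sh_hash and sh_select are hypothetical and are not the kernel's implementation.

```python
import ipaddress


def sh_hash(src_ip, src_port, offset, table_size):
    """Toy hash over source address, source port, and a probe offset."""
    key = int(ipaddress.ip_address(src_ip)) + src_port + offset
    # Multiplicative hashing with a large prime; any stable hash would do here.
    return (key * 2654435761) % table_size


def sh_select(table, src_ip, src_port=0, use_port=False, fallback=False):
    """Pick a realserver from table, a list of (name, weight) buckets.

    Returns the server name, or None when the connection is rejected.
    """
    port = src_port if use_port else 0  # the port only counts when requested
    name, weight = table[sh_hash(src_ip, port, 0, len(table))]
    if weight > 0:
        return name
    if not fallback:
        return None  # default behaviour: reject when hashed onto weight 0
    # Fallback: probe other buckets deterministically until one is usable.
    for offset in range(1, len(table)):
        name, weight = table[sh_hash(src_ip, port, offset, len(table))]
        if weight > 0:
            return name
    return None
```

Because the choice depends only on the packet's source address (and optionally port) and the shared table, two directors with identical configuration pick the same realserver, which is the deterministic behaviour the commit message describes.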
    • ipvs: drop SCTP connections depending on state · acaac5d8
      Julian Anastasov committed
      Drop SCTP connections under load (dropentry context) depending
      on the protocol state, just like for TCP: INIT conns are
      dropped immediately, established are dropped randomly while
      connections in progress or shutdown are skipped.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
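
The state-dependent drop policy above can be sketched as a single decision function. This is a simplified model: the state names are illustrative labels, and the fixed drop probability is an assumption standing in for whatever rate logic the kernel applies to established connections.

```python
import random


def should_drop(state, rng=random.random, drop_probability=0.5):
    """Decide whether a connection is dropped under load.

    state            -- illustrative label such as "INIT" or "ESTABLISHED"
    rng              -- callable returning a float in [0, 1); injectable for tests
    drop_probability -- assumed value, not taken from the kernel
    """
    if state == "INIT":
        return True  # handshake not completed: drop immediately
    if state == "ESTABLISHED":
        return rng() < drop_probability  # established: drop randomly
    return False  # in progress or shutting down: skip
```

Passing rng explicitly keeps the random branch testable, for example should_drop("ESTABLISHED", rng=lambda: 0.0).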
    • ipvs: replace the SCTP state machine · 61e7c420
      Julian Anastasov committed
      Convert the SCTP state table so that it is more readable.
      Change the states to follow the diagram in RFC 2960
      and add more states suitable for a middle box. Note that this
      change in states introduces an incompatibility when some systems
      in a sync setup include it and others do not.
      
      With this change we also have proper transitions in INPUT-ONLY
      mode (DR/TUN) where we see packets only from client. Now
      we should not switch to 10-second CLOSED state at a time
      when we should stay in ESTABLISHED state.
      
      The state names are short because we have 16-char space
      in ipvsadm and an 11-char limit for the connection list format.
      This is a consequence of the TCP implementation, where the longest
      state name is ESTABLISHED.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: sloppy TCP and SCTP · c6c96c18
      Alexander Frolkin committed
      This adds support for sloppy TCP and SCTP modes to IPVS.
      
      When enabled (sysctls net.ipv4.vs.sloppy_tcp and
      net.ipv4.vs.sloppy_sctp), this allows IPVS to create connection state
      on any packet, not just a TCP SYN (or SCTP INIT).
      
      This allows connections to fail over from one IPVS director to another
      mid-flight.
      Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
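
To make the failover scenario concrete, here is a minimal sketch of a director that creates connection state either only on the initial packet or, in sloppy mode, on any packet. The Director class, its fields, and the pick_server callback are hypothetical; the model ignores everything else IPVS does.

```python
class Director:
    """Toy model of an IPVS director's connection table (illustrative only)."""

    def __init__(self, sloppy=False):
        self.sloppy = sloppy      # stands in for the sloppy_tcp / sloppy_sctp sysctls
        self.connections = {}     # flow -> chosen realserver

    def handle(self, flow, is_initial_packet, pick_server):
        """Return the realserver for this packet, or None if it is not handled."""
        if flow in self.connections:
            return self.connections[flow]
        if not (is_initial_packet or self.sloppy):
            # Strict mode: only a TCP SYN (or SCTP INIT) may create state.
            return None
        server = pick_server(flow)
        self.connections[flow] = server
        return server
```

A backup director that takes over mid-connection never saw the SYN. In strict mode it returns None for the first packet it sees of that flow, while with sloppy=True it creates state and forwards the packet.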
    • ipvs: provide iph to schedulers · bba54de5
      Julian Anastasov committed
      Until now the schedulers needed access only to IP
      addresses and it was easy to get them from skb by
      using ip_vs_fill_iph_addr_only.
      
      New changes for the SH scheduler will need the protocol
      and ports, which are difficult to get from skb in the
      IPv6 case. As we have all the data in the iph structure,
      provide the iph to schedulers to avoid repeating the same
      slow lookups.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • arc_emac: fix compile-time errors & warnings on PPC64 · a4a1139b
      Alexey Brodkin committed
      As reported by "kbuild test robot", there were some errors and warnings
      when attempting to build the kernel with "make ARCH=powerpc allmodconfig".
      
      This patch addresses both the errors and the warnings.
      Below is a list of the changes introduced:
      1. Fix compile-time errors (misspellings in "dma_unmap_single") on PPC.
      2. Use DMA address instead of "skb->data" as a pointer to data buffer.
      This fixed warnings on pointer to int conversion on 64-bit systems.
      3. Re-implemented initial allocation of Rx buffers in "arc_emac_open" in
      the same way they're re-allocated during operation (receiving packets).
      So once again DMA address could be used instead of "skb->data".
      4. Explicitly use EMAC_BUFFER_SIZE for Rx buffers allocation.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      
      Cc: netdev@vger.kernel.org
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Mischa Jonker <mjonker@synopsys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: Florian Fainelli <florian@openwrt.org>
      Cc: David Laight <david.laight@aculab.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add an option to fail when any of arp_ip_target is inaccessible · 8599b52e
      Veaceslav Falico committed
      Currently, we fail only when all of the ips in arp_ip_target are gone.
      However, in some situations we might need to fail if even one host from
      arp_ip_target becomes unavailable.
      
      All situations, obviously, rely on the idea that we need *completely*
      functional network, with all interfaces/addresses working correctly.
      
      One real world example might be:
      vlans on top of bond (hybrid port). If bond and vlans have ips assigned
      and we have their peers monitored via arp_ip_target - in case of switch
      misconfiguration (trunk/access port), slave driver malfunction or
      tagged/untagged traffic dropped on the way - we will be able to switch
      to another slave.
      
      Any other configuration that needs access to all arp_ip_targets can
      use it as well.
      
      This patch adds this possibility by adding a new parameter -
      arp_all_targets (both as a module parameter and as a sysfs knob). It can be
      set to:
      
      	0 or any (the default) - which works exactly as it's working now -
      	the slave is up if any of the arp_ip_targets are up.
      
      	1 or all - the slave is up if all of the arp_ip_targets are up.
      
      This parameter can be changed on the fly (via sysfs), and requires the mode
      to be active-backup and arp_validate to be enabled (it obeys the
      arp_validate config on which slaves to validate).
      
      Internally it's done through:
      
      1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
         an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
         last time we've received arp from bond->params.arp_targets[i] on this
         slave.
      
      2) If we successfully validate an arp from bond->params.arp_targets[i] in
         bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
         current jiffies value.
      
      3) When getting slave's last_rx via slave_last_rx(), we return the oldest
         time when we've received an arp from any address in
         bond->params.arp_targets[].
      
      If the value of arp_all_targets == 0 - we still work the same way as
      before.
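
The three internal steps above can be modelled in a few lines. This is a simplified sketch: the Slave class and its method names are hypothetical stand-ins for the kernel structures, timestamps are plain integers rather than jiffies (so wrap-around is ignored), and the arp_validate precondition is left out.

```python
class Slave:
    """Toy model of per-slave ARP receive timestamps (illustrative only)."""

    def __init__(self, n_targets, now):
        self.last_arp_rx = now                        # last validated ARP from any target
        self.target_last_arp_rx = [now] * n_targets   # one timestamp per arp_ip_target

    def validate_arp(self, target_index, now):
        """Record a successfully validated ARP from arp_targets[target_index]."""
        self.last_arp_rx = now
        self.target_last_arp_rx[target_index] = now

    def last_rx(self, arp_all_targets):
        """With 'all', report the oldest per-target time; with 'any', the newest ARP."""
        if arp_all_targets:
            return min(self.target_last_arp_rx)
        return self.last_arp_rx
```

Reporting the oldest per-target timestamp means a single silent target makes the slave look stale, which is exactly the "fail if any target is unreachable" behaviour the new option asks for.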
      
      Also, update the documentation to reflect the new parameter.
      
      v3->v4:
      Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
      more clear, don't fail setting arp_all_targets if arp_validate is not set -
      it has no effect anyway but can be easier to set up. Also, print a warning
      if the last arp_ip_target is removed while the arp_interval is on, but not
      the arp_validate.
      
      v2->v3:
      Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
      arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
      use the same initialization value for target_last_arp_rx[] as is used
      for the default last_arp_rx, to avoid useless interface flaps.
      
      Also, instead of failing to remove the last arp_ip_target just print a
      warning - otherwise it might break existing scripts.
      
      v1->v2:
      Correctly handle adding/removing hosts in arp_ip_target - we need to
      shift/initialize all slave's target_last_arp_rx. Also, don't fail module
      loading on arp_all_targets misconfiguration, just disable it, and some
      minor style fixes.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>