1. 16 May, 2012 4 commits
  2. 15 May, 2012 2 commits
  3. 13 May, 2012 16 commits
    • etherdevice: Remove now unused compare_ether_addr_64bits · e550ba1a
      Committed by Joe Perches
      Move and invert the logic from the otherwise unused
      compare_ether_addr_64bits to ether_addr_equal_64bits.

      Neaten the logic in is_etherdev_addr.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fq_codel: Fair Queue Codel AQM · 4b549a2e
      Committed by Eric Dumazet
      Fair Queue Codel packet scheduler
      
      Principles :
      
      - Packets are classified (internal classifier or external) on flows.
      - This is a Stochastic model (as we use a hash, several flows might
                                    be hashed on same slot)
      - Each flow has a CoDel managed queue.
      - Flows are linked onto two (Round Robin) lists,
        so that new flows have priority on old ones.
      
      - For a given flow, packets are not reordered (CoDel uses a FIFO)
      - head drops only.
      - ECN capability is on by default.
      - Very low memory footprint (64 bytes per flow)
      
      tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                            [ target TIME ] [ interval TIME ] [ noecn ]
                            [ quantum BYTES ]
      
      defaults : 1024 flows, 10240 packets limit, quantum : device MTU
                 target : 5ms (CoDel default)
                 interval : 100ms (CoDel default)
      
      Impressive results on load :
      
      class htb 1:1 root leaf 10: prio 0 quantum 1514 rate 200000Kbit ceil 200000Kbit burst 1475b/8 mpu 0b overhead 0b cburst 1475b/8 mpu 0b overhead 0b level 0
       Sent 43304920109 bytes 33063109 pkt (dropped 0, overlimits 0 requeues 0)
       rate 201691Kbit 28595pps backlog 0b 312p requeues 0
       lended: 33063109 borrowed: 0 giants: 0
       tokens: -912 ctokens: -912
      
      class fq_codel 10:1735 parent 10:
       (dropped 1292, overlimits 0 requeues 0)
       backlog 15140b 10p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.1ms
      class fq_codel 10:4524 parent 10:
       (dropped 1291, overlimits 0 requeues 0)
       backlog 16654b 11p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.1ms
      class fq_codel 10:4e74 parent 10:
       (dropped 1290, overlimits 0 requeues 0)
       backlog 6056b 4p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 6.4ms dropping drop_next 92.0ms
      class fq_codel 10:628a parent 10:
       (dropped 1289, overlimits 0 requeues 0)
       backlog 7570b 5p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 5.4ms dropping drop_next 90.9ms
      class fq_codel 10:a4b3 parent 10:
       (dropped 302, overlimits 0 requeues 0)
       backlog 16654b 11p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.1ms
      class fq_codel 10:c3c2 parent 10:
       (dropped 1284, overlimits 0 requeues 0)
       backlog 13626b 9p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 5.9ms
      class fq_codel 10:d331 parent 10:
       (dropped 299, overlimits 0 requeues 0)
       backlog 15140b 10p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.0ms
      class fq_codel 10:d526 parent 10:
       (dropped 12160, overlimits 0 requeues 0)
       backlog 35870b 211p requeues 0
        deficit 1508 count 12160 lastcount 1 ldelay 15.3ms dropping drop_next 247us
      class fq_codel 10:e2c6 parent 10:
       (dropped 1288, overlimits 0 requeues 0)
       backlog 15140b 10p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.1ms
      class fq_codel 10:eab5 parent 10:
       (dropped 1285, overlimits 0 requeues 0)
       backlog 16654b 11p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 5.9ms
      class fq_codel 10:f220 parent 10:
       (dropped 1289, overlimits 0 requeues 0)
       backlog 15140b 10p requeues 0
        deficit 1514 count 1 lastcount 1 ldelay 7.1ms
      
      qdisc htb 1: root refcnt 6 r2q 10 default 1 direct_packets_stat 0 ver 3.17
       Sent 43331086547 bytes 33092812 pkt (dropped 0, overlimits 66063544 requeues 71)
       rate 201697Kbit 28602pps backlog 0b 260p requeues 71
      qdisc fq_codel 10: parent 1:1 limit 10240p flows 65536 target 5.0ms interval 100.0ms ecn
       Sent 43331086547 bytes 33092812 pkt (dropped 949359, overlimits 0 requeues 0)
       rate 201697Kbit 28602pps backlog 189352b 260p requeues 0
        maxpacket 1514 drop_overlimit 0 new_flow_count 5582 ecn_mark 125593
        new_flows_len 0 old_flows_len 11
      
      PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
      64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=0.227 ms
      64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=0.165 ms
      64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=0.166 ms
      64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=0.151 ms
      64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=0.164 ms
      64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=0.172 ms
      64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=0.175 ms
      64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=0.183 ms
      64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=0.158 ms
      64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=0.200 ms
      
      10 packets transmitted, 10 received, 0% packet loss, time 8999ms
      rtt min/avg/max/mdev = 0.151/0.176/0.227/0.022 ms
      
      Much better than SFQ because of the priority given to new flows, and a
      fast path dirtying fewer cache lines.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
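The principles listed in the fq_codel commit (hashing packets onto flow slots, two round-robin lists, priority for new flows) can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `FqSketch`, the use of `zlib.crc32` as the classifier, and the omission of the per-flow CoDel dropping are all simplifications, not the kernel implementation.

```python
import zlib
from collections import deque


class FqSketch:
    """Hypothetical, simplified model of the two-list flow scheduler."""

    def __init__(self, flows=1024, quantum=1514):
        self.nflows = flows
        self.quantum = quantum
        self.queues = [deque() for _ in range(flows)]  # one FIFO per slot
        self.deficit = [0] * flows
        self.active = [False] * flows
        self.new_flows = deque()  # served first
        self.old_flows = deque()

    def classify(self, flow_key: bytes) -> int:
        # Stochastic model: several flows may hash onto the same slot.
        return zlib.crc32(flow_key) % self.nflows

    def enqueue(self, flow_key: bytes, size: int) -> None:
        idx = self.classify(flow_key)
        self.queues[idx].append((flow_key, size))
        if not self.active[idx]:
            # A slot that was idle joins the new-flows list with a full quantum.
            self.active[idx] = True
            self.deficit[idx] = self.quantum
            self.new_flows.append(idx)

    def dequeue(self):
        while True:
            if self.new_flows:
                lst, from_new = self.new_flows, True
            elif self.old_flows:
                lst, from_new = self.old_flows, False
            else:
                return None
            idx = lst[0]
            if self.deficit[idx] <= 0:
                # Quantum used up: refill and move to the tail of old flows.
                self.deficit[idx] += self.quantum
                lst.popleft()
                self.old_flows.append(idx)
                continue
            if not self.queues[idx]:
                lst.popleft()
                if from_new and self.old_flows:
                    # Force a pass through old flows before going idle.
                    self.old_flows.append(idx)
                else:
                    self.active[idx] = False
                continue
            pkt = self.queues[idx].popleft()
            self.deficit[idx] -= pkt[1]
            return pkt
```

In this model a packet from a flow that has just appeared is sent ahead of the backlog of a bulk flow that has already consumed its quantum, which is the behaviour the ping results in the commit message illustrate.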
    • codel: use Newton method instead of sqrt() and divides · 536edd67
      Committed by Eric Dumazet
      As Van pointed out, interval/sqrt(count) can be implemented using
      multiplies only.
      
      http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
      
      This patch implements the Newton method and reciprocal divide.
      
      Total cost is 15 cycles instead of 120 on my Corei5 machine (64bit
      kernel).
      
      There is a small 'error' for count values < 5, but we don't really care.
      
      I reuse a hole in struct codel_vars :
       - pack the dropping boolean into one bit
       - use 31bit to store the reciprocal value of sqrt(count).
      Suggested-by: Van Jacobson <van@pollere.net>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Kathleen Nichols <nichols@pollere.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
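The idea in the Newton-method commit, computing interval/sqrt(count) without a square root or a divide, can be illustrated with a floating-point sketch. It assumes one Newton iteration per increment of count, starting from the previous estimate; the function names are hypothetical and the kernel works in fixed point rather than floats.

```python
import math


def newton_step(inv_sqrt: float, count: int) -> float:
    """One Newton iteration toward 1/sqrt(count).

    Uses multiplies only; the halving is a shift in fixed-point code.
    """
    return inv_sqrt * (3.0 - count * inv_sqrt * inv_sqrt) / 2.0


def inv_sqrt_by_count(max_count: int) -> dict:
    """Estimates of 1/sqrt(count), refined once each time count grows by 1."""
    y = 1.0  # exact for count == 1
    estimates = {1: y}
    for count in range(2, max_count + 1):
        y = newton_step(y, count)
        estimates[count] = y
    return estimates


def relative_error(estimate: float, count: int) -> float:
    """Relative deviation of the estimate from the true 1/sqrt(count)."""
    return abs(estimate * math.sqrt(count) - 1.0)


def next_drop_delay(interval_ms: float, inv_sqrt: float) -> float:
    """Control-law spacing interval/sqrt(count), as a single multiply."""
    return interval_ms * inv_sqrt
```

With a single step per increment, the estimate is visibly off for the first few counts and then tracks 1/sqrt(count) closely, which is consistent with the remark about a small error for count values below 5.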
    • usb/net: rndis: move bus message definition · d5543206
      Committed by Linus Walleij
      This moves the bus message definition to land together with the
      other message types. This message is not used in the kernel but
      I'm keeping it anyway.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: fixup a few name prefixes · e20289ed
      Committed by Linus Walleij
      This switches a horde of NDIS_*-prefixed variables to the RNDIS_*
      prefix. Most of them aren't used much, so this causes no changes.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: merge command codes · 51491167
      Committed by Linus Walleij
      Switch the hyperv filter and rndis gadget driver to use the same command
      enumerators as the other drivers and delete the surplus command codes.
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: move and namespace PnP defines · c80174f3
      Committed by Linus Walleij
      This moves the PnP OID definitions to the RNDIS_* namespace
      and puts them in the next falling slot in the list. The comment
      above the PnP defines referred to some obsolete or out-of-tree
      driver, so I removed it, along with my own comments telling where
      each header segment came from, since we have moved everything
      around by this point anyway.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: delete duplicate packet types · b1019432
      Committed by Linus Walleij
      The NDIS_*-prefixed packet types have equivalent RNDIS_*-prefixed
      types; besides, nothing in the kernel uses these defines.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: merge media type definitions · 17c51b6c
      Committed by Linus Walleij
      Let's have a unified table of RNDIS media. We used to have a similar
      table with NDIS_* prefix from the gadget driver, but since we're only
      using RNDIS in the kernel (IIRC NDIS, non-remote, is for the windows-
      internal network drivers so what do we care) let's prefix everything
      with RNDIS. Some of the definitions were conflicting: in one of the
      defines 0x0B is bearer "CO WAN" and in two others "BPC", so I took
      the majority vote. Two definitions of medium 0x09 call it "wireless
      WAN" and one calls it "wireless LAN", but in this case I am sticking
      with the minority: "Wide Area Network" does not make much sense
      here as far as I can tell.
      
      NOTE: latin singular and plural is so screwed up in these defines
      that it makes my eyes bleed. But I will not attempt to submit a
      patch converting all use of _MEDIA_ to _MEDIUM_ while I can probably
      tell from the semantics of the code that RNDIS_MEDIA_STATE_CONNECTED
      is most probably (erroneously) referring to a singular, unless it
      can return an array of connected media. I suspect these erroneous
      plurals are used in documentation and such so I don't want to
      mess around with things for no functional change.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: group all status codes together · 91d6aef7
      Committed by Linus Walleij
      Move all RNDIS status codes so they appear in rising order and
      in one place of the header file.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: delete surplus defines · c3ef5eae
      Committed by Linus Walleij
      These defines are not used in the kernel, and they have duplicate
      definitions under the RNDIS_* prefix.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: merge duplicate 802_* OIDs · 4cc6c4d5
      Committed by Linus Walleij
      The 802_* network OIDs were duplicated, so let's merge them and
      use the RNDIS_*-prefixed definitions from the HyperV driver.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: eliminate first set of duplicate OIDs · 8cdddc3f
      Committed by Linus Walleij
      The RNDIS protocol contains a vast number of Object IDs (OIDs).
      The current code had multiple definitions of these IDs, so
      let's use the nicely RNDIS_*-prefixed defines from the HyperV
      implementation, rename everywhere they're used, and copy+rename
      the few that were missing from this list of objects.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: remove ambigous status codes · 007e5c8e
      Committed by Linus Walleij
      The RNDIS status codes are redefined with some very strange ifdeffery,
      and only one of these codes was used in the hyperv driver, where
      it very clearly refers to the RNDIS variant, not some
      other status. So clarify this by explicitly using the RNDIS_*-prefixed
      status code in the hyperv driver and delete the
      duplicate defines.
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: break out <linux/rndis.h> defines · 7591157e
      Committed by Linus Walleij
      As a first step to consolidate the RNDIS implementations, break out
      a common file with all the #defines and move it to <linux/rndis.h>.

      This also deletes the immediate duplicated defines in the
      <linux/rndis.h> file that yield a lot of compilation warnings.
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb/net: rndis: inline the cpu_to_le32() macro · 7390e8b0
      Committed by Linus Walleij
      The header file <linux/usb/rndis_host.h> used a number of #defines
      that included the cpu_to_le32() macro to ensure the result will be
      in LE endianness. Inlining this into the code instead of using it
      in the code definitions yields consolidation opportunities later
      on as you will see in the following patches. The individual
      drivers also used local defines - all are switched over to the
      pattern of doing the conversion at the call sites instead.
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 May, 2012 3 commits
    • codel: Controlled Delay AQM · 76e3cc12
      Committed by Eric Dumazet
      An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
      
      http://queue.acm.org/detail.cfm?id=2209336
      
      This AQM's main input is no longer queue size in bytes or packets, but the
      delay packets experience in the (FIFO) queue.

      As we don't have infinite memory, we can still drop packets in enqueue()
      in case of massive load, but the point of CoDel is to drop packets in
      dequeue(), using a control law based on two simple parameters :
      
      target : target sojourn time (default 5ms)
      interval : width of moving time window (default 100ms)
      
      Based on initial work from Dave Taht.
      
      Refactored to help future codel inclusion as a plugin for other linux
      qdisc (FQ_CODEL, ...), like RED.
      
      include/net/codel.h contains the codel algorithm, kept as close as
      possible to Kathleen's reference.
      
      net/sched/sch_codel.c contains the linux qdisc specific glue.
      
      Separate structures permit a memory efficient implementation of fq_codel
      (to be sent as a separate work) : Each flow has its own struct
      codel_vars.
      
      timestamps are taken at enqueue() time with 1024 ns precision, allowing
      a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
      usec as base unit.
      
      Selected packets are dropped, unless ECN is enabled and packets can get
      ECN mark instead.
      
      Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
      tg3 drivers (BQL enabled).
      
      Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                                [ interval TIME ] [ ecn ]
      
      qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
       Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
       rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
        count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
        maxpacket 1514 ecn_mark 84399 drop_overlimit 0
      
      CoDel must be seen as a base module, and should be used keeping in mind
      there is still a FIFO queue. So a typical setup will probably need a
      hierarchy of several qdiscs and packet classifiers to be able to meet
      whatever constraints a user might have.
      
      One possible example would be to use fq_codel, which combines Fair
      Queueing and CoDel, in replacement of sfq / sfq_red.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Kathleen Nichols <nichols@pollere.com>
      Cc: Van Jacobson <van@pollere.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
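The CoDel commit describes dropping at dequeue() time, driven by packet sojourn time and the two parameters target and interval. A deliberately simplified sketch of such a control law is shown below. The class and method names are hypothetical, times are plain milliseconds, and details of the reference algorithm (for example how count is carried over between dropping episodes, or dropping several packets in one dequeue) are left out.

```python
import math


class CoDelSketch:
    """Hypothetical, simplified sojourn-time drop decision."""

    def __init__(self, target_ms=5.0, interval_ms=100.0):
        self.target = target_ms
        self.interval = interval_ms
        self.first_above_time = None  # deadline after delay first exceeded target
        self.dropping = False
        self.count = 0                # drops in the current dropping episode
        self.drop_next = 0.0

    def should_drop(self, sojourn_ms: float, now_ms: float) -> bool:
        """Decide whether the packet dequeued at now_ms should be dropped."""
        if sojourn_ms < self.target:
            # Delay is acceptable again: leave the dropping state.
            self.first_above_time = None
            self.dropping = False
            return False
        if self.first_above_time is None:
            # Delay just went above target: give it one interval to recover.
            self.first_above_time = now_ms + self.interval
            return False
        if not self.dropping:
            if now_ms >= self.first_above_time:
                self.dropping = True
                self.count = 1
                self.drop_next = now_ms + self.interval / math.sqrt(self.count)
                return True
            return False
        if now_ms >= self.drop_next:
            # Drops get closer together: spacing is interval / sqrt(count).
            self.count += 1
            self.drop_next += self.interval / math.sqrt(self.count)
            return True
        return False
```

With ECN enabled, the same decision would mark the packet instead of dropping it, as the commit message notes.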
    • etherdevice.h: Add ether_addr_equal_64bits · baf523c9
      Committed by Joe Perches
      Add an optimized boolean function to check if
      2 ethernet addresses are the same.

      This is to avoid any confusion about compare_ether_addr_64bits
      returning an unsigned, and not being able to use the
      compare_ether_addr_64bits function for sorting a la memcmp.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
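The "64bits" comparison can be pictured as one wide XOR over 8-byte loads, with the two bytes beyond the 6-byte address ignored. The following is a sketch of that idea, assuming little-endian loads and buffers padded to at least 8 bytes; it mirrors the boolean (equal or not) semantics the commit asks for, not the kernel source itself.

```python
def ether_addr_equal_64bits(addr1: bytes, addr2: bytes) -> bool:
    """Return True when the first 6 bytes of both buffers match.

    Both buffers must hold at least 8 bytes (6 address bytes plus 2 of
    padding) so that a single 64-bit load per address is possible.
    """
    if len(addr1) < 8 or len(addr2) < 8:
        raise ValueError("buffers must be padded to at least 8 bytes")
    fold = int.from_bytes(addr1[:8], "little") ^ int.from_bytes(addr2[:8], "little")
    # In a little-endian load the address occupies the low 48 bits;
    # the padding bytes sit in the top 16 bits and are masked away.
    return (fold & 0xFFFFFFFFFFFF) == 0
```

Returning a plain boolean avoids the ambiguity of the older compare-style helper, whose nonzero result meant "different" and could not be used for ordering.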
    • tcp: Move rcvq sending to tcp_input.c · 292e8d8c
      Committed by Pavel Emelyanov
      It actually works on the input queue and will use its read mem
      routines, thus it's better to have it in the tcp_input.c file.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 10 May, 2012 2 commits
  6. 09 May, 2012 13 commits
    • netfilter: hashlimit: byte-based limit mode · 0197dee7
      Committed by Florian Westphal
      This can be used e.g. for ingress traffic policing or
      to detect when a host/port consumes more bandwidth than expected.

      This is done by optionally making cost mean
      "cost per 16-byte-chunk-of-data" instead of "cost per packet".
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
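The difference between "cost per packet" and "cost per 16-byte chunk" can be shown with a tiny credit-bucket sketch. The function names and the rounding rule (every packet costs at least one chunk) are assumptions made for illustration; the commit message only states the 16-byte granularity.

```python
BYTE_SHIFT = 4  # 2**4 == 16-byte chunks


def len_to_chunks(length: int) -> int:
    """Number of 16-byte chunks charged for a packet (assumed rounding)."""
    return (length >> BYTE_SHIFT) + 1


def charge(credit: int, length: int, cost: int, byte_mode: bool):
    """Charge one packet against the remaining credit.

    Returns (new_credit, allowed). In packet mode every packet costs the
    same; in byte mode the cost scales with the packet length.
    """
    total = cost * len_to_chunks(length) if byte_mode else cost
    if credit >= total:
        return credit - total, True
    return credit, False
```

In byte mode a stream of large packets exhausts the credit much faster than the same number of small packets, which is what makes the mode usable for bandwidth policing.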
    • netfilter: add xt_hmark target for hash-based skb marking · cf308a1f
      Committed by Hans Schillstrom
      The target allows you to create rules in the "raw" and "mangle" tables
      which set the skbuff mark by means of hash calculation within a given
      range. The nfmark can influence the routing method (see "Use netfilter
      MARK value as routing key") and can also be used by other subsystems to
      change their behaviour.

      [ Part of this patch has been refactored and modified by Pablo Neira Ayuso ]
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
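Setting a mark "by means of hash calculation within a given range" amounts to hashing some packet fields and folding the result into a range. A minimal sketch follows, with `zlib.crc32` standing in for whatever hash the target really uses and with hypothetical parameter names `modulus` and `offset`.

```python
import zlib


def hash_mark(flow_fields: bytes, modulus: int, offset: int = 0) -> int:
    """Map packet fields to a mark in [offset, offset + modulus)."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    # crc32 is only a stand-in hash for this illustration.
    return zlib.crc32(flow_fields) % modulus + offset
```

Because the same fields always produce the same mark, all packets of a flow can be steered consistently, for example to the same route or backend.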
    • netfilter: ip6_tables: add flags parameter to ipv6_find_hdr() · 84018f55
      Committed by Hans Schillstrom
      This patch adds the flags parameter to ipv6_find_hdr. These flags
      allow us to:

      * know if this is a fragment.
      * stop at the AH header, so the information contained in that header
        can be used for some specific packet handling.

      This patch also adds the offset parameter for inspection of one
      inner IPv6 header that is contained in error messages.
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • {nl,cfg,mac}80211: Allow user to see/configure HT protection mode · 70c33eaa
      Committed by Ashok Nagarajan
      This patch introduces a new mesh configuration parameter "ht_opmode" that
      allows users to check the currently selected HT protection mode. Users can
      configure the protection mode with the command "iw mesh_iface set mesh_param
      mesh_ht_protection_mode=2". The default protection mode of mesh is set to
      non-HT mixed mode.
      Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
      Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • mac80211: Framework to get wifi-driver stats via ethtool. · e352114f
      Committed by Ben Greear
      This adds hooks to call into the driver to get additional
      stats for the ethtool API.
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • B
      d6199218
    • netfilter: remove ip_queue support · d16cf20e
      Committed by Pablo Neira Ayuso
      This patch removes ip_queue support which was marked as obsolete
      years ago. The nfnetlink_queue module provides a more advanced
      user-space packet queueing mechanism.

      This patch also removes capability code included in SELinux that
      refers to ip_queue. Otherwise, we break compilation.

      Several warnings have been sent regarding this to the mailing list
      in the past month without anyone raising a hand to stop this
      with some strong argument.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: fix explicit helper attachment and NAT · 6714cf54
      Committed by Pablo Neira Ayuso
      Explicit helper attachment via the CT target is broken with NAT
      if non-standard ports are used. This problem was hidden behind
      the automatic helper assignment routine. Thus, it becomes more
      noticeable now that we can disable the automatic helper assignment
      with Eric Leblond's:
      
      9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
      
      Basically, nf_conntrack_alter_reply asks for looking up the helper
      up if NAT is enabled. Unfortunately, we don't have the conntrack
      template at that point anymore.
      
      Since we don't want to rely on the automatic helper assignment,
      we can skip the second look-up and stick to the helper that was
      attached by iptables. With the CT target, the user is in full
      control of helper attachment, thus, the policy is to trust what
      the user explicitly configures via iptables (no automatic magic
      anymore).
      
      Interestingly, this bug was hidden by the automatic helper look-up
      code. But it can be easily triggered if you attach the helper to
      a non-standard port, e.g.
      
      iptables -I PREROUTING -t raw -p tcp --dport 8888 \
      	-j CT --helper ftp
      
      And you disabled the automatic helper assignment.
      
      I added the IPS_HELPER_BIT that allows us to differentiate between
      a helper that has been explicitly attached and those that have been
      automatically assigned. I didn't come up with a better solution
      (having backward compatibility in mind).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: add support for sync threads · f73181c8
      Committed by Pablo Neira Ayuso
      	Allow master and backup servers to use many threads
      for sync traffic. Add sysctl var "sync_ports" to define the
      number of threads. Every thread will use single UDP port,
      thread 0 will use the default port 8848 while last thread
      will use port 8848+sync_ports-1.
      
      	The sync traffic for connections is scheduled to many
      master threads based on the cp address but one connection is
      always assigned to same thread to avoid reordering of the
      sync messages.
      
      	Remove ip_vs_sync_switch_mode because this check
      for sync mode change is still risky. Instead, check for mode
      change under sync_buff_lock.
      
      	Make sure the backup socks do not block on reading.
      
      Special thanks to Aleksey Chudov for helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
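The port layout and the "one connection, one thread" rule from the sync-threads commit can be sketched as follows. The port arithmetic follows the commit text (default port 8848, last thread on 8848+sync_ports-1). The thread selection is an assumption: the commit says it is based on the cp address, while this sketch hashes a hypothetical connection key with `zlib.crc32` instead.

```python
import zlib

SYNC_BASE_PORT = 8848  # default port used by thread 0, per the commit text


def sync_port(thread_id: int, sync_ports: int) -> int:
    """UDP port used by a given sync thread."""
    if not 0 <= thread_id < sync_ports:
        raise ValueError("thread_id out of range")
    return SYNC_BASE_PORT + thread_id


def select_thread(conn_key: bytes, sync_ports: int) -> int:
    """Pick a thread for a connection; the same key always maps to the
    same thread, so sync messages for one connection are not reordered."""
    return zlib.crc32(conn_key) % sync_ports
```

Keeping the mapping deterministic per connection is what lets the work be spread over several threads without reordering the updates of any single connection.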
    • ipvs: reduce sync rate with time thresholds · 749c42b6
      Committed by Julian Anastasov
      	Add two new sysctl vars to control the sync rate with the
      main idea to reduce the rate for connection templates because
      currently it depends on the packet rate for controlled connections.
      This mechanism should be useful also for normal connections
      with high traffic.
      
      sync_refresh_period: in seconds, difference in reported connection
      	timer that triggers new sync message. It can be used to
      	avoid sync messages for the specified period (or half of
      	the connection timeout if it is lower) if connection state
      	is not changed from last sync.
      
      sync_retries: integer, 0..3, defines sync retries with period of
      	sync_refresh_period/8. Useful to protect against loss of
      	sync messages.
      
      	Allow sysctl_sync_threshold to be used with
      sysctl_sync_period=0, so that only single sync message is sent
      if sync_refresh_period is also 0.
      
      	Add new field "sync_endtime" in connection structure to
      hold the reported time when connection expires. The 2 lowest
      bits will represent the retry count.
      
      	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
      avoid division by zero.
      
      	Special thanks to Aleksey Chudov for being patient with me,
      for his extensive reports and helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
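Storing the retry count in the 2 lowest bits of "sync_endtime" is a small bit-packing trick. One way it might look is sketched below; the helper names are hypothetical, and the choice to clear the low bits of the time value (losing a little precision) is an assumption rather than something the commit message spells out.

```python
RETRY_MASK = 0x3  # 2 lowest bits hold the retry count (0..3)


def pack_sync_endtime(endtime: int, retries: int) -> int:
    """Combine an expiry time and a retry count into one integer."""
    if not 0 <= retries <= RETRY_MASK:
        raise ValueError("retries must be in 0..3")
    return (endtime & ~RETRY_MASK) | retries


def unpack_sync_endtime(value: int):
    """Split the packed value back into (endtime, retries)."""
    return value & ~RETRY_MASK, value & RETRY_MASK
```

Packing both values into one field keeps the connection structure from growing by an extra member just to count up to three retries.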
    • ipvs: wakeup master thread · 1c003b15
      Committed by Pablo Neira Ayuso
      	High rate of sync messages in master can lead to
      overflowing the socket buffer and dropping the messages.
      Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters.
      
      	Use delayed_work to schedule sending for queued messages
      and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
      reduce the rate of wakeups but to avoid sending long bursts we
      wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
      
      	Add hard limit for the queued messages before sending
      by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
      the memory pages but actually represents number of messages.
      It will protect us from allocating large parts of memory
      when the sending rate is lower than the queuing rate.
      
      	As suggested by Pablo, add new sysctl var
      "sync_sock_size" to configure the SNDBUF (master) or
      RCVBUF (slave) socket limit. Default value is 0 (preserve
      system defaults).
      
      	Change the master thread to detect and block on
      SNDBUF overflow, so that we do not drop messages when
      the socket limit is low but the sync_qlen_max limit is
      not reached. On ENOBUFS or other errors just drop the
      messages.
      
      	Change master thread to enter TASK_INTERRUPTIBLE
      state early, so that we do not miss wakeups due to messages or
      kthread_should_stop event.
      
      Thanks to Pablo Neira Ayuso for his valuable feedback!
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: always update some of the flags bits in backup · cdcc5e90
      Committed by Julian Anastasov
      	As the goal is to mirror the inactconns/activeconns
      counters in the backup server, make sure the cp->flags are
      updated even if cp is still not bound to dest. If cp->flags
      are not updated ip_vs_bind_dest will rely only on the initial
      flags when updating the counters. To avoid mistakes and
      complicated checks for protocol state rely only on the
      IP_VS_CONN_F_INACTIVE bit when updating the counters.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • netfilter: nf_conntrack: use this_cpu_inc() · ac3a546a
      Committed by Eric Dumazet
      this_cpu_inc() is IRQ safe and faster than
      local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>