1. 20 Dec, 2013 1 commit
    • net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc · 10239edf
      Terry Lam committed
      This patch implements the first size-based qdisc that attempts to
      differentiate between small flows and heavy-hitters.  The goal is to
      catch the heavy-hitters and move them to a separate queue with less
      priority so that bulk traffic does not affect the latency of critical
      traffic.  Currently "less priority" means less weight (2:1 in
      particular) in a Weighted Deficit Round Robin (WDRR) scheduler.
      
      In essence, this patch addresses the "delay-bloat" problem due to
      bloated buffers. In some systems, large queues may be necessary for
      obtaining CPU efficiency, or due to the presence of unresponsive
      traffic like UDP, or just a large number of connections with each
      having a small amount of outstanding traffic. In these circumstances,
      HHF aims to reduce the HoL blocking for latency sensitive traffic,
      while not impacting the queues built up by bulk traffic.  HHF can also
      be used in conjunction with other AQM mechanisms such as CoDel.
      
      To capture heavy-hitters, we implement the "multi-stage filter" design
      in the following paper:
      C. Estan and G. Varghese, "New Directions in Traffic Measurement and
      Accounting", in ACM SIGCOMM, 2002.
      
      Some configurable qdisc settings through 'tc':
      - hhf_reset_timeout: period to reset counter values in the multi-stage
                           filter (default 40ms)
      - hhf_admit_bytes:   threshold to classify heavy-hitters
                           (default 128KB)
      - hhf_evict_timeout: threshold to evict idle heavy-hitters
                           (default 1s)
      - hhf_non_hh_weight: Weighted Deficit Round Robin (WDRR) weight for
                           non-heavy-hitters (default 2)
      - hh_flows_limit:    max number of heavy-hitter flow entries
                           (default 2048)
      
      Note that the ratio between hhf_admit_bytes and hhf_reset_timeout
      reflects the bandwidth of heavy-hitters that we attempt to capture
      (25Mbps with the above default settings).
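
The 25Mbps figure follows from dividing the admit threshold by the reset period. A minimal sketch of that arithmetic in C; the helper name `hhf_capture_rate_bps` is made up for illustration and is not part of the patch, and whether "128KB" means 128,000 or 131,072 bytes is left open here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): the sustained rate, in bits per
 * second, at which a flow accumulates admit_bytes within one reset period.
 */
static uint64_t hhf_capture_rate_bps(uint64_t admit_bytes,
                                     uint64_t reset_timeout_ms)
{
    assert(reset_timeout_ms > 0);   /* avoid dividing by zero */
    return admit_bytes * 8 * 1000 / reset_timeout_ms;
}
```

With the defaults, `hhf_capture_rate_bps(128000, 40)` gives 25,600,000 bit/s and `hhf_capture_rate_bps(131072, 40)` gives 26,214,400 bit/s, both in line with the roughly 25Mbps quoted above.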
      
      The false negative rate (heavy-hitter flows getting away unclassified)
      is zero by the design of the multi-stage filter algorithm.
      With 100 heavy-hitter flows, using four hashes and 4000 counters yields
      a false positive rate (non-heavy-hitters mistakenly classified as
      heavy-hitters) of less than 1e-4.
      Signed-off-by: Terry Lam <vtlam@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
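
The multi-stage filter idea referenced in the HHF commit above can be sketched as follows. This is a simplified illustration, not the code added by the patch: the `msf_*` names, the table size, the hash mixing, and the saturating counters are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSF_STAGES  4       /* independent hash stages (assumed) */
#define MSF_BUCKETS 1024    /* counters per stage, power of two (assumed) */

struct msf {
    uint32_t counters[MSF_STAGES][MSF_BUCKETS];
    uint32_t admit_bytes;   /* threshold for calling a flow a heavy-hitter */
};

static void msf_init(struct msf *f, uint32_t admit_bytes)
{
    memset(f, 0, sizeof(*f));
    f->admit_bytes = admit_bytes;
}

/* Illustrative per-stage hash: multiply by a stage-specific odd constant. */
static uint32_t msf_hash(uint32_t flow_id, unsigned int stage)
{
    static const uint32_t seeds[MSF_STAGES] = {
        0x9e3779b1u, 0x85ebca6bu, 0xc2b2ae35u, 0x27d4eb2fu
    };
    uint32_t h = (flow_id ^ (flow_id >> 16)) * seeds[stage];

    h ^= h >> 15;
    return h & (MSF_BUCKETS - 1);
}

/*
 * Charge pkt_len bytes to the flow in every stage. The flow is reported as
 * a heavy-hitter only if its counter in every stage has reached the
 * threshold, i.e. if the minimum over all stages is at least admit_bytes.
 */
static bool msf_account(struct msf *f, uint32_t flow_id, uint32_t pkt_len)
{
    uint32_t min = UINT32_MAX;

    for (unsigned int s = 0; s < MSF_STAGES; s++) {
        uint32_t *c = &f->counters[s][msf_hash(flow_id, s)];

        if (*c <= UINT32_MAX - pkt_len)
            *c += pkt_len;
        else
            *c = UINT32_MAX;        /* saturate instead of wrapping */
        if (*c < min)
            min = *c;
    }
    return min >= f->admit_bytes;
}

/* Called once per reset period to clear all counters. */
static void msf_reset(struct msf *f)
{
    memset(f->counters, 0, sizeof(f->counters));
}
```

A flow's own bytes always land in its own counters, so a flow that sends at least `admit_bytes` within one reset period cannot be missed; that is why the false negative rate is zero. A small flow is misclassified only if it shares a bucket with heavy traffic in every stage at once, which is what keeps the false positive rate low.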
  2. 19 Dec, 2013 1 commit
  3. 18 Dec, 2013 7 commits
  4. 14 Dec, 2013 9 commits
  5. 10 Dec, 2013 2 commits
    • ALSA: compress: Fix 64bit ABI incompatibility · 6733cf57
      Takashi Iwai committed
      snd_pcm_uframes_t is defined as unsigned long so it would take
      different sizes depending on 32 or 64bit architectures.  As we don't
      want this ABI incompatibility, and there is no real 64bit user yet,
      let's make it the fixed size with __u32.
      
      Also bump the protocol version number to 0.1.2.
      Acked-by: Vinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
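
The ABI problem in the ALSA commit above comes from `unsigned long` changing width between 32-bit and 64-bit builds, which changes the layout of any structure shared with userspace. A small sketch of the difference; the two struct names and their fields are invented for the example and do not appear in the ALSA headers:

```c
#include <stddef.h>
#include <linux/types.h>

/* Layout follows the width of long: 8 bytes on ILP32, 16 bytes on LP64. */
struct ex_avail_long {
    unsigned long avail;
    unsigned long position;
};

/* Fixed-width fields give the same 8-byte layout on both ABIs. */
struct ex_avail_fixed {
    __u32 avail;
    __u32 position;
};
```

A 32-bit program talking to a 64-bit kernel would disagree with it about the first structure but not about the second.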
    • packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Daniel Borkmann committed
      This patch introduces a PACKET_QDISC_BYPASS socket option, that
      allows for using a similar xmit() function as in pktgen instead
      of taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications are required to be used in a similar
      scenario as pktgen, but with full, flexible packet payload that
      needs to be provided, for example.
      
      By default, nothing changes in behaviour for normal PF_PACKET
      TX users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed to i) prevent
      their own packets from reentering packet_rcv() and ii) push the
      frame directly to the driver.
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), how probably most people
             seem to use it (hopefully not anymore if not needed)
      
      The test was done using a modified trafgen, sending a simple
      static 64 bytes packet, on all CPUs.  The trick in the fast
      "qdisc path" case, is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      Tradeoffs are documented in this patch as well: if queues are
      busy, we will drop more packets, tc disciplines are ignored, and
      these packets are no longer visible to taps. For a pktgen-like
      scenario, we argue that this is acceptable.
      
      The pointer to the xmit function has been placed in packet
      socket structure hole between cached_dev and prot_hook that
      is hot anyway as we're working on cached_dev in each send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
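
From userspace, the option introduced by the PACKET_QDISC_BYPASS commit above is set with `setsockopt()` at the `SOL_PACKET` level. A minimal sketch, assuming kernel headers that already define `PACKET_QDISC_BYPASS`; the wrapper name `packet_enable_qdisc_bypass` is made up for the example:

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Hypothetical wrapper: ask the kernel to bypass the qdisc layer for frames
 * sent on this PF_PACKET socket. Returns 0 on success, or -1 with errno set
 * (for example on a bad descriptor or a kernel without the option).
 */
static int packet_enable_qdisc_bypass(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
                      &one, sizeof(one));
}
```

A caller would typically create the socket with `socket(PF_PACKET, SOCK_RAW, 0)`, as in the commit message, and then call the wrapper once before sending. Opening the packet socket itself requires CAP_NET_RAW.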
  6. 09 Dec, 2013 1 commit
  7. 07 Dec, 2013 5 commits
    • ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses · 53bd6749
      Jiri Pirko committed
      Creating an address with this flag set will result in the kernel taking
      care of temporary addresses in the same way as if the address had been
      created by the kernel itself (after receiving an RA). This allows
      userspace applications implementing autoconfiguration (NetworkManager,
      for example) to implement IPv6 address privacy.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Thomas Haller <thaller@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6 addrconf: extend ifa_flags to u32 · 479840ff
      Jiri Pirko committed
      There is no more space in u8 ifa_flags. So do what davem suggested and
      add another netlink attr called IFA_FLAGS to carry more flags.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Thomas Haller <thaller@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
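
Taken together, the two addrconf commits above mean that a flag such as IFA_F_MANAGETEMPADDR no longer fits in the 8-bit `ifa_flags` field and has to travel in a separate 32-bit IFA_FLAGS attribute. A rough sketch of encoding that attribute, assuming kernel headers recent enough to define both names; the helper `put_ifa_flags` is invented for the example, and the surrounding RTM_NEWADDR message is not shown:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/*
 * Hypothetical helper: write one IFA_FLAGS attribute carrying a 32-bit flag
 * word at buf. Returns the number of bytes used, or 0 if room is too small.
 */
static size_t put_ifa_flags(void *buf, size_t room, uint32_t flags)
{
    struct rtattr rta = {
        .rta_len  = RTA_LENGTH(sizeof(flags)),
        .rta_type = IFA_FLAGS,
    };

    if (room < RTA_SPACE(sizeof(flags)))
        return 0;
    memcpy(buf, &rta, sizeof(rta));
    /* The payload starts right after the aligned attribute header. */
    memcpy((char *)buf + RTA_LENGTH(0), &flags, sizeof(flags));
    return RTA_SPACE(sizeof(flags));
}
```

A caller building an address request would append this attribute with `IFA_F_MANAGETEMPADDR` set in `flags`, next to the usual address attributes.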
    • tcp: auto corking · f54b3111
      Eric Dumazet committed
      With the introduction of TCP Small Queues, TSO auto sizing, and TCP
      pacing, we can implement Automatic Corking in the kernel, to help
      applications doing small write()/sendmsg() to TCP sockets.
      
      The idea is to change tcp_push() to check whether the current skb
      payload is under the skb optimal size (a multiple of MSS bytes).
      
      If under 'size_goal', and at least one packet is still in Qdisc or
      NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
      will be delayed up to TX completion time.
      
      This delay might allow the application to coalesce more bytes
      in the skb in following write()/sendmsg()/sendfile() system calls.
      
      The exact duration of the delay depends on the dynamics
      of the system, and might be zero if no packet for this flow
      is actually held in Qdisc or NIC TX ring.
      
      Using FQ/pacing is a way to increase the probability of
      autocorking being triggered.
      
      Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
      this feature and default it to 1 (enabled)
      
      Add a new SNMP counter: nstat -a | grep TcpExtTCPAutoCorking
      This counter is incremented every time we detect that an skb was
      underused and its flush was deferred.
      
      Tested:
      
      Interesting effects when using line buffered commands under ssh.
      
      Excellent performance results in terms of CPU usage and total throughput.
      
      lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
      lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
      9410.39
      
       Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      
            35209.439626 task-clock                #    2.901 CPUs utilized
                   2,294 context-switches          #    0.065 K/sec
                     101 CPU-migrations            #    0.003 K/sec
                   4,079 page-faults               #    0.116 K/sec
          97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
          51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
          25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
         102,225,978,536 instructions              #    1.04  insns per cycle
                                                   #    0.51  stalled cycles per insn [83.38%]
          18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
              91,679,646 branch-misses             #    0.49% of all branches         [83.40%]
      
            12.136204899 seconds time elapsed
      
      lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
      lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
      6624.89
      
       Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
            40045.864494 task-clock                #    3.301 CPUs utilized
                     171 context-switches          #    0.004 K/sec
                      53 CPU-migrations            #    0.001 K/sec
                   4,080 page-faults               #    0.102 K/sec
         111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
          61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
          29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
         108,654,349,355 instructions              #    0.98  insns per cycle
                                                   #    0.57  stalled cycles per insn [83.34%]
          19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
             157,875,417 branch-misses             #    0.81% of all branches         [83.34%]
      
            12.130267788 seconds time elapsed
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
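
The corking decision described in the auto corking commit above can be modelled as a simple predicate. This is a userspace sketch of the logic as stated in the commit message, not the kernel implementation; the struct, its fields, and the function name are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state examined when a push is requested. */
struct ex_push_state {
    bool     autocorking_enabled;   /* mirrors the tcp_autocorking sysctl */
    uint32_t skb_len;               /* bytes currently in the tail skb */
    uint32_t size_goal;             /* optimal skb size, a multiple of MSS */
    uint32_t pkts_queued_below;     /* packets of this flow still held in
                                       Qdisc or NIC TX queues */
};

/*
 * Defer the push only when the feature is on, the skb is still under its
 * size goal, and an earlier packet will produce a TX completion that can
 * trigger the deferred push later.
 */
static bool ex_should_autocork(const struct ex_push_state *s)
{
    return s->autocorking_enabled &&
           s->skb_len < s->size_goal &&
           s->pkts_queued_below > 0;
}
```

The third condition is why the delay can be zero: with nothing queued below TCP there is no TX completion to wait for, so the skb is pushed immediately.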
    • netfilter: Fix FSF address in file headers · e664eabd
      Jeff Kirsher committed
      Several files refer to an old address for the Free Software Foundation
      in the file header comment.  Resolve by replacing the address with
      the URL <http://www.gnu.org/licenses/> so that we do not have to keep
      updating the header comments anytime the address changes.
      
      CC: netfilter@vger.kernel.org
      CC: Pablo Neira Ayuso <pablo@netfilter.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Fix FSF address in file headers · 4b2f13a2
      Jeff Kirsher committed
      Several files refer to an old address for the Free Software Foundation
      in the file header comment.  Resolve by replacing the address with
      the URL <http://www.gnu.org/licenses/> so that we do not have to keep
      updating the header comments anytime the address changes.
      
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 06 Dec, 2013 1 commit
  9. 03 Dec, 2013 1 commit
  10. 02 Dec, 2013 1 commit
  11. 01 Dec, 2013 1 commit
  12. 29 Nov, 2013 2 commits
    • genetlink/pmcraid: use proper genetlink multicast API · 5e53e689
      Johannes Berg committed
      The pmcraid driver is abusing the genetlink API and is using its
      family ID as the multicast group ID, which is invalid and may
      belong to somebody else (and likely will.)
      
      Make it use the correct API, but since this may already be used
      as-is by userspace, reserve a family ID for this code and also
      reserve that group ID to not break userspace assumptions.
      
      My previous patch broke event delivery in the driver as I missed
      that it wasn't using the right API and forgot to update it later
      in my series.
      
      While changing this, I noticed that the genetlink code could use
      the static group ID instead of a strcmp(), so also do that for
      the VFS_DQUOT family.
      
      Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • diag: warn about missing first netlink attribute · 31e20bad
      Nicolas Dichtel committed
      The first netlink attribute (value 0) must always be defined as none/unspec.
      This is correctly done in inet_diag.h, but other diag interfaces are wrong.
      
      Because we cannot change an existing API, I add a comment to point out the
      mistake and to avoid propagating it to new diag APIs in the future.
      
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
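
The convention in the diag commit above, that attribute value 0 is reserved as none/unspec, looks like this in an attribute enum. The `EX_DIAG_*` names are invented for the example and do not correspond to any existing diag interface:

```c
/* Hypothetical attribute list for a new diag interface. */
enum {
    EX_DIAG_NONE,       /* value 0: reserved, never carries data */
    EX_DIAG_INFO,       /* first real attribute starts at 1 */
    EX_DIAG_MEMINFO,
    __EX_DIAG_MAX,
};

#define EX_DIAG_MAX (__EX_DIAG_MAX - 1)
```

Starting real attributes at 1 keeps type 0 free for its reserved none/unspec meaning.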
  13. 28 Nov, 2013 2 commits
  14. 27 Nov, 2013 1 commit
  15. 26 Nov, 2013 2 commits
    • cfg80211: consolidate passive-scan and no-ibss flags · 8fe02e16
      Luis R. Rodriguez committed
      These two flags are used for the same purpose, so combine
      them into a single no-ir flag indicating that no initiating
      radiation is allowed.
      
      Old userspace sending either flag will have it treated as
      the no-ir flag. To be considerate to older userspace we
      also send both the no-ir flag and the old no-ibss flags.
      Newer userspace will have to be aware of older kernels.
      
      Update all places in the tree using these flags with the
      following semantic patch:
      
      @@
      @@
      -NL80211_RRF_PASSIVE_SCAN
      +NL80211_RRF_NO_IR
      @@
      @@
      -NL80211_RRF_NO_IBSS
      +NL80211_RRF_NO_IR
      @@
      @@
      -IEEE80211_CHAN_PASSIVE_SCAN
      +IEEE80211_CHAN_NO_IR
      @@
      @@
      -IEEE80211_CHAN_NO_IBSS
      +IEEE80211_CHAN_NO_IR
      @@
      @@
      -NL80211_RRF_NO_IR | NL80211_RRF_NO_IR
      +NL80211_RRF_NO_IR
      @@
      @@
      -IEEE80211_CHAN_NO_IR | IEEE80211_CHAN_NO_IR
      +IEEE80211_CHAN_NO_IR
      @@
      @@
      -(NL80211_RRF_NO_IR)
      +NL80211_RRF_NO_IR
      @@
      @@
      -(IEEE80211_CHAN_NO_IR)
      +IEEE80211_CHAN_NO_IR
      
      Along with some hand-optimisations in documentation, to
      remove duplicates and to fix some indentation.
      Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
      [do all the driver updates in one go]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
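
The compatibility behaviour described in the cfg80211 commit above (either old flag is treated as no-ir on input, and both the new and the old flag are reported on output) can be sketched with two small helpers. The `EX_*` names and bit positions are placeholders rather than the real nl80211 values, and the sketch assumes the old passive-scan flag shares its bit with the new no-ir flag:

```c
#include <stdint.h>

#define EX_FLAG_NO_IR          (1u << 0)    /* also the old passive-scan bit */
#define EX_FLAG_LEGACY_NO_IBSS (1u << 1)    /* kept only for old userspace */

/* Input path: either legacy flag collapses into the single no-ir flag. */
static uint32_t ex_flags_from_userspace(uint32_t flags)
{
    if (flags & (EX_FLAG_NO_IR | EX_FLAG_LEGACY_NO_IBSS)) {
        flags &= ~EX_FLAG_LEGACY_NO_IBSS;
        flags |= EX_FLAG_NO_IR;
    }
    return flags;
}

/* Output path: report the legacy no-ibss bit alongside no-ir. */
static uint32_t ex_flags_to_userspace(uint32_t flags)
{
    if (flags & EX_FLAG_NO_IR)
        flags |= EX_FLAG_LEGACY_NO_IBSS;
    return flags;
}
```

Older userspace that only knows the legacy bits therefore still sees a restriction, while the kernel keeps a single flag internally.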
    • nl80211: better document NL80211_CMD_TDLS_MGMT · c17bff87
      Arik Nemtsov committed
      This command has different semantics depending on the action code sent.
      Document this fact and detail the supported action codes.
      Signed-off-by: Arik Nemtsov <arik@wizery.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  16. 20 Nov, 2013 2 commits
  17. 19 Nov, 2013 1 commit
    • UAPI: include <asm/byteorder.h> in linux/raid/md_p.h · c0f8bd14
      Aurelien Jarno committed
      linux/raid/md_p.h uses conditionals that depend on endianness and fails
      with an error if none of __BIG_ENDIAN, __LITTLE_ENDIAN or
      __BYTE_ORDER is defined, but it doesn't include any header which can
      define these constants. This makes the header unusable on its own.
      
      This patch adds a #include <asm/byteorder.h> at the beginning of this
      header to make it usable alone. This is needed to compile klibc on MIPS.
      Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: NeilBrown <neilb@suse.de>
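
The md_p.h commit above relies on `<asm/byteorder.h>` to provide the endianness macros. A small sketch of a header-style check that works once that include is present; the `EX_HOST_IS_BIG_ENDIAN` macro and the runtime probe are made up for the example, and the conditional is one way to write such a test, not a quote from md_p.h:

```c
#include <asm/byteorder.h>  /* defines __BIG_ENDIAN or __LITTLE_ENDIAN */
#include <stdint.h>
#include <string.h>

/*
 * If a libc header has defined __BYTE_ORDER, compare against it; otherwise
 * fall back to whichever endianness macro the kernel header defined.
 */
#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
# define EX_HOST_IS_BIG_ENDIAN 1
#else
# define EX_HOST_IS_BIG_ENDIAN 0
#endif

/* Runtime probe used only to cross-check the compile-time answer. */
static int ex_runtime_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, sizeof(first));
    return first == 0x01;
}
```

On any host, `EX_HOST_IS_BIG_ENDIAN` should agree with `ex_runtime_is_big_endian()`.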