1. 14 May, 2015 4 commits
  2. 13 May, 2015 1 commit
    • D
      net_sched: gred: add TCA_GRED_LIMIT attribute · a3eb95f8
      Committed by David Ward
      In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop
      parameters configured, then packets for the default VQ are not subjected
      to RED and are only dropped if the queue is larger than the net_device's
      tx_queue_len. This behavior is useful for WRED mode, since these packets
      will still influence the calculated average queue length and (therefore)
      the drop probability for all of the other VQs. However, for some drivers
      tx_queue_len is zero. In other cases the user may wish to make the limit
      the same for all VQs (including the default VQ with no drop parameters).
      
      This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit,
      in bytes, during qdisc setup. (This limit is in bytes to be consistent
      with the drop parameters.) The default limit is the same as for a bfifo
      queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are
      configured with a smaller limit than the GRED queue limit, that VQ will
      still observe the smaller limit instead.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
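The limit rules described in this commit message can be sketched as follows. This is a minimal illustration in Python, not the kernel implementation; the function names and the sample values (a tx_queue_len of 1000 packets and a psched_mtu of 1514 bytes) are assumptions chosen for the example.

```python
from typing import Optional


def default_gred_limit(tx_queue_len: int, psched_mtu: int) -> int:
    """Default GRED queue limit in bytes, the same as for a bfifo queue."""
    return tx_queue_len * psched_mtu


def effective_vq_limit(gred_limit: int, vq_limit: Optional[int]) -> int:
    """Byte limit that a virtual queue (VQ) ends up observing.

    vq_limit is None when the VQ has no drop parameters configured,
    in which case only the GRED queue limit applies.
    """
    if vq_limit is None:
        return gred_limit
    # A VQ configured with a smaller limit still observes the smaller one.
    return min(gred_limit, vq_limit)


if __name__ == "__main__":
    limit = default_gred_limit(tx_queue_len=1000, psched_mtu=1514)
    print(effective_vq_limit(limit, None))      # default VQ without drop parameters
    print(effective_vq_limit(limit, 200_000))   # VQ with a smaller configured limit
```

Note that when tx_queue_len is zero the default computed this way is also zero, which is one reason an explicit limit is useful.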
  3. 11 May, 2015 2 commits
    • A
      bonding: add netlink support for sys prio, actor sys mac, and port key · 171a42c3
      Committed by Andy Gospodarek
      Adds netlink support for the following bonding options:
      * BOND_OPT_AD_ACTOR_SYS_PRIO
      * BOND_OPT_AD_ACTOR_SYSTEM
      * BOND_OPT_AD_USER_PORT_KEY
      
      When setting the actor system mac address we assume the netlink message
      contains a binary mac and not a string representation of a mac.
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      [jt: completed the setting side of the netlink attributes]
      Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
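Because the actor system MAC is expected in binary form, a userspace tool holding a textual address would need to convert it before building the netlink message. The sketch below shows only that conversion step; the helper name is hypothetical and the netlink encoding itself is not shown.

```python
def mac_str_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into the 6 raw bytes of the address."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated octets, got {len(parts)}")
    # int(..., 16) rejects non-hex text; bytes() rejects values above 0xff.
    return bytes(int(part, 16) for part in parts)


if __name__ == "__main__":
    raw = mac_str_to_bytes("00:11:22:aa:bb:cc")
    print(len(raw), raw.hex())
```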
    • E
      codel: add ce_threshold attribute · 80ba92fa
      Committed by Eric Dumazet
      For DCTCP or similar ECN based deployments on fabrics with shallow
      buffers, hosts are responsible for a good part of the buffering.
      
      This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
      so that DCTCP can have feedback from queuing in the host.
      
      A DCTCP-enabled egress port simply has a queue occupancy threshold
      above which ECT packets get CE marked.
      
      In codel language this translates to a sojourn time, so that one doesn't
      have to worry about bytes or bandwidth but delays.
      
      This makes the host an active participant in the health of the whole
      network.
      
      This also helps when experimenting with DCTCP in a setup without a
      DCTCP-compliant fabric.
      
      In the following example, ce_threshold is set to 1ms, and we can see from
      'ldelay xxx us' that TCP is not trying to go around the 5ms codel
      target.
      
      Queue has more capacity to absorb inelastic bursts (say from UDP
      traffic), as queues are maintained to an optimal level.
      
      lpaa23:~# ./tc -s -d qd sh dev eth1
      qdisc mq 1: dev eth1 root
       Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
       backlog 3108242b 364p requeues 42961
      qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
       rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
        count 0 lastcount 0 ldelay 1.0ms drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
      qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
       rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
        count 0 lastcount 0 ldelay 694us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
      qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
       rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
        count 0 lastcount 0 ldelay 889us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
      ...
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
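The marking rule described in this commit message, a sojourn-time threshold above which ECT packets are CE marked, can be modeled roughly as below. This is a simplified sketch and not the codel code: the Packet class, the function name, and the microsecond timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    enqueue_time_us: int   # time the packet entered the queue
    ect: bool              # True if the packet is ECN-capable (ECT)
    ce: bool = False       # set when the packet is CE marked


def maybe_ce_mark(pkt: Packet, now_us: int, ce_threshold_us: int) -> bool:
    """CE-mark an ECT packet whose sojourn time exceeds ce_threshold.

    Returns True if the packet was marked, so the caller can count marks.
    """
    sojourn_us = now_us - pkt.enqueue_time_us
    if pkt.ect and sojourn_us > ce_threshold_us:
        pkt.ce = True
        return True
    return False


if __name__ == "__main__":
    ce_mark = 0
    queue = [Packet(0, ect=True), Packet(900, ect=True), Packet(0, ect=False)]
    for pkt in queue:
        # Dequeue everything at t = 1500us with a 1ms threshold.
        ce_mark += maybe_ce_mark(pkt, now_us=1500, ce_threshold_us=1000)
    print(ce_mark)
```

Expressing the threshold as a delay means the same setting applies regardless of link speed, which is the point made about not having to worry about bytes or bandwidth.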
  4. 10 May, 2015 3 commits
  5. 06 May, 2015 4 commits
  6. 05 May, 2015 1 commit
  7. 03 May, 2015 1 commit
  8. 02 May, 2015 1 commit
  9. 30 Apr, 2015 4 commits
  10. 24 Apr, 2015 1 commit
  11. 22 Apr, 2015 2 commits
  12. 21 Apr, 2015 1 commit
    • M
      KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. · e928e9cb
      Committed by Michael Ellerman
      Some PowerNV systems include a hardware random-number generator.
      This HWRNG is present on POWER7+ and POWER8 chips and is capable of
      generating one 64-bit random number every microsecond.  The random
      numbers are produced by sampling a set of 64 unstable high-frequency
      oscillators and are almost completely entropic.
      
      PAPR defines an H_RANDOM hypercall which guests can use to obtain one
      64-bit random sample from the HWRNG.  This adds a real-mode
      implementation of the H_RANDOM hypercall.  This hypercall was
      implemented in real mode because the latency of reading the HWRNG is
      generally small compared to the latency of a guest exit and entry for
      all the threads in the same virtual core.
      
      Userspace can detect the presence of the HWRNG and the H_RANDOM
      implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
      H_RANDOM hypercall implementation will only be invoked when the guest
      does an H_RANDOM hypercall if userspace first enables the in-kernel
      H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  13. 20 Apr, 2015 2 commits
    • A
      target: Version 2 of TCMU ABI · 0ad46af8
      Committed by Andy Grover
      The initial version of TCMU (in 3.18) does not properly handle
      bidirectional SCSI commands -- those with both an in and out buffer. In
      looking to fix this it also became clear that TCMU's support for adding
      new types of entries (opcodes) to the command ring was broken. We need
      to fix this now, so that future issues can be handled properly by adding
      new opcodes.
      
      We make the most of this ABI break by enabling bidi cmd handling within
      the TCMU_OP_CMD opcode. Add an iov_bidi_cnt field to tcmu_cmd_entry.req.
      This enables TCMU to describe bidi commands, but further kernel work is
      needed for full bidi support.
      
      Enlarge tcmu_cmd_entry_hdr by 32 bits by pulling in cmd_id and __pad1. Turn
      __pad1 into two 8 bit flags fields, for kernel-set and userspace-set flags,
      "kflags" and "uflags" respectively.
      
      Update version fields so userspace can tell the interface is changed.
      
      Update tcmu-design.txt with details of how new stuff works:
      - Specify an additional requirement for userspace to set UNKNOWN_OP
        (bit 0) in hdr.uflags for unknown/unhandled opcodes.
      - Define how Data-In and Data-Out fields are described in req.iov[]
      
      Changed in v2:
      - Change name of SKIPPED bit to UNKNOWN bit
      - PAD op does not set the bit any more
      - Change len_op helper functions to take just len_op, not the whole struct
      - Change version to 2 in missed spots, and use defines
      - Add 16 unused bytes to cmd_entry.req, in case additional SAM cmd
        parameters need to be included
      - Add iov_dif_cnt field to specify buffers used for DIF info in iov[]
      - Rearrange fields to naturally align cdb_off
      - Handle if userspace sets UNKNOWN_OP by indicating failure of the cmd
      - Wrap some overly long UPDATE_HEAD lines
      
      (Add missing req.iov_bidi_cnt + req.iov_dif_cnt zeroing - Ilias)
      Signed-off-by: Andy Grover <agrover@redhat.com>
      Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
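The header changes described in this commit message (len_op helpers that take only len_op, plus 8-bit kflags and uflags fields, with UNKNOWN_OP as bit 0 of uflags) can be sketched as follows. The field order, the 3-bit opcode mask, and the opcode values are assumptions made for this illustration, and all helper names are hypothetical; the authoritative definitions are in the kernel's TCMU header and tcmu-design.txt.

```python
import struct

TCMU_OP_MASK = 0x7            # assumption: opcode lives in the low 3 bits of len_op
TCMU_OP_PAD = 0               # assumption: opcode value for padding entries
TCMU_OP_CMD = 1               # assumption: opcode value for SCSI command entries
TCMU_UFLAG_UNKNOWN_OP = 0x1   # bit 0 of hdr.uflags, as stated in the commit message

# Assumed layout: len_op (32 bits), cmd_id (16 bits), kflags (8), uflags (8).
HDR_FORMAT = "=IHBB"


def hdr_get_op(len_op: int) -> int:
    """Extract the opcode from a len_op value."""
    return len_op & TCMU_OP_MASK


def hdr_get_len(len_op: int) -> int:
    """Extract the entry length from a len_op value."""
    return len_op & ~TCMU_OP_MASK & 0xFFFFFFFF


def make_len_op(length: int, op: int) -> int:
    """Combine an entry length and an opcode into one len_op value."""
    if length & TCMU_OP_MASK:
        raise ValueError("length must leave the low opcode bits clear")
    return length | (op & TCMU_OP_MASK)


def mark_unknown_op(hdr: bytes) -> bytes:
    """What userspace would do for an opcode it does not handle."""
    len_op, cmd_id, kflags, uflags = struct.unpack(HDR_FORMAT, hdr)
    return struct.pack(HDR_FORMAT, len_op, cmd_id, kflags,
                       uflags | TCMU_UFLAG_UNKNOWN_OP)


if __name__ == "__main__":
    hdr = struct.pack(HDR_FORMAT, make_len_op(64, TCMU_OP_CMD), 7, 0, 0)
    print(struct.unpack(HDR_FORMAT, mark_unknown_op(hdr)))
```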
    • P
      media-bus: Fixup RGB444_1X12, RGB565_1X16, and YUV8_1X24 media bus format · cec32a47
      Committed by Philipp Zabel
      Change the constant values for RGB444_1X12, RGB565_1X16, and YUV8_1X24 media
      bus formats in anticipation of a merge conflict with the media tree, where
      the old values are already taken by RBG888_1X24, RGB888_1X32_PADHI, and
      VUY8_1X24, respectively.
      Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
  14. 17 Apr, 2015 1 commit
    • A
      bpf: fix bpf helpers to use skb->mac_header relative offsets · a166151c
      Committed by Alexei Starovoitov
      As a short-term solution, let's fix the bpf helper functions to use
      skb->mac_header relative offsets instead of skb->data, so that the
      same eBPF programs work with cls_bpf and act_bpf on both the ingress
      and egress qdisc paths. We need to ensure that mac_header is set
      before calling into programs. This is effectively the first option
      from the discussion referenced below.
      
      A more long-term solution for LD_ABS|LD_IND instructions will be more
      intrusive but also more beneficial than this, and will be implemented
      later, as it's too risky at this point in time.
      
      I.e., we plan to look into the option of moving skb_pull() out of
      eth_type_trans() and into netif_receive_skb(), as has been suggested
      as the second option. Meanwhile, this solution ensures ingress can be
      used with eBPF, too, and that we won't run into ABI troubles later.
      For dealing with negative offsets inside eBPF helper functions,
      we've implemented bpf_skb_clone_unwritable() to test for unwritable
      headers.
      
      Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
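Why offsets relative to the MAC header behave the same on both paths can be shown with a toy model. The sketch assumes that on ingress the data pointer has already been advanced past the 14-byte Ethernet header, while on egress it still points at it; the buffer, variable names, and helper are purely illustrative and are not kernel code.

```python
ETH_HLEN = 14  # length of an Ethernet header in bytes


def load_byte(buf: bytes, base: int, offset: int) -> int:
    """Read one byte at `offset` relative to `base` within the buffer."""
    return buf[base + offset]


# Toy frame whose byte values equal their own positions, for easy checking.
frame = bytes(range(32))
mac_header = 0

# Assumption for this model: where the data pointer sits on each path.
data_ingress = mac_header + ETH_HLEN
data_egress = mac_header

offset = 12  # first byte of the EtherType field, counted from the MAC header

if __name__ == "__main__":
    # Relative to data, the same offset hits different bytes on each path.
    print(load_byte(frame, data_ingress, offset), load_byte(frame, data_egress, offset))
    # Relative to mac_header, both paths read the same byte.
    print(load_byte(frame, mac_header, offset))
```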
  15. 16 Apr, 2015 1 commit
    • M
      dm: add full blk-mq support to request-based DM · bfebd1cd
      Committed by Mike Snitzer
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue on top of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath on top of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      on top of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices. Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Keith Busch <keith.busch@intel.com>
  16. 15 Apr, 2015 1 commit
  17. 14 Apr, 2015 4 commits
    • P
      netfilter: nft_dynset: dynamic stateful expression instantiation · 3e135cd4
      Committed by Patrick McHardy
      Support instantiating stateful expressions based on a template that
      are associated with dynamically created set entries. The expressions
      are evaluated when adding or updating the set element.
      
      This allows maintaining per-flow state using the existing set
      infrastructure and expression types, with arbitrary definitions of
      a flow.
      
      Usage is currently restricted to anonymous sets, meaning only a single
      binding can exist, since the desired semantics of multiple independent
      bindings haven't been defined so far.
      
      Examples (userspace syntax is still WIP):
      
      1. Limit the rate of new SSH connections per host, similar to iptables
         hashlimit:
      
      	flow ip saddr timeout 60s \
      	limit 10/second \
      	accept
      
      2. Account network traffic between each set of /24 networks:
      
      	flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
      	counter
      
      3. Account traffic to each host per user:
      
      	flow skuid . ip daddr \
      	counter
      
      4. Account traffic for each combination of source address and TCP flags:
      
      	flow ip saddr . tcp flags \
      	counter
      
      The resulting set content after a Xmas-scan looks like this:
      
      {
      	192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
      	192.168.122.1 . ack : counter packets 74 bytes 3848,
      	192.168.122.1 . psh | ack : counter packets 35 bytes 3144
      }
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
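The idea of instantiating a stateful expression from a template for each dynamically created set element can be modeled in a few lines. The sketch below imitates example 4 (a counter per combination of source address and TCP flags) in plain Python; it is a conceptual model only, with hypothetical names, and says nothing about how nf_tables stores or evaluates expressions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

FlowKey = Tuple[str, FrozenSet[str]]


def new_counter() -> Dict[str, int]:
    """Template: every new set element gets its own fresh counter state."""
    return {"packets": 0, "bytes": 0}


def account(packets: Iterable[Tuple[str, Iterable[str], int]]) -> Dict[FlowKey, Dict[str, int]]:
    """Update per-flow counters keyed by (source address, TCP flags)."""
    flows: Dict[FlowKey, Dict[str, int]] = {}
    for saddr, tcp_flags, length in packets:
        key = (saddr, frozenset(tcp_flags))
        if key not in flows:
            # Element is created on first sight, with state from the template.
            flows[key] = new_counter()
        # The expression is evaluated when adding or updating the element.
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return flows


if __name__ == "__main__":
    sample = [
        ("192.168.122.1", ["fin", "psh", "urg"], 40),
        ("192.168.122.1", ["fin", "psh", "urg"], 40),
        ("192.168.122.1", ["ack"], 52),
    ]
    for key, state in account(sample).items():
        print(key[0], sorted(key[1]), state)
```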
    • P
      netfilter: nf_tables: add flag to indicate set contains expressions · 7c6c6e95
      Committed by Patrick McHardy
      Add a set flag to indicate that the set is used as a state table and
      contains expressions for evaluation. This operation is mutually
      exclusive with the mapping operation, so sets specifying both are
      rejected. The lookup expression also rejects binding to state tables
      since it only deals with lookup and map operations.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
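The two checks described in this commit message, rejecting sets that request both mapping and expression evaluation and refusing lookup bindings to state tables, can be sketched as below. The flag names and numeric values are assumptions used for illustration, and the function names are hypothetical.

```python
NFT_SET_MAP = 0x8    # assumption: flag for sets used as maps
NFT_SET_EVAL = 0x20  # assumption: flag for sets that contain expressions


def validate_set_flags(flags: int) -> None:
    """Reject sets that ask for both the map and the eval operation."""
    both = NFT_SET_MAP | NFT_SET_EVAL
    if (flags & both) == both:
        raise ValueError("map and eval are mutually exclusive")


def lookup_can_bind(flags: int) -> bool:
    """The lookup expression handles only lookup and map sets, not state tables."""
    return not (flags & NFT_SET_EVAL)


if __name__ == "__main__":
    validate_set_flags(NFT_SET_EVAL)          # accepted
    print(lookup_can_bind(NFT_SET_MAP))       # binding to a map is allowed
    print(lookup_can_bind(NFT_SET_EVAL))      # binding to a state table is refused
```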
    • P
      netfilter: nf_tables: prepare for expressions associated to set elements · f25ad2e9
      Committed by Patrick McHardy
      Preparation to attach expressions to set elements: add a set extension
      type to hold an expression and dump the expression information with the
      set element.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      uapi: ebtables: don't include linux/if.h · 24477e57
      Committed by Pablo Neira Ayuso
      linux/if.h creates conflicts in userspace with net/if.h
      
      By using it here we force userspace to use linux/if.h while
      net/if.h may be needed.
      
      Note that:
      
      include/linux/netfilter_ipv4/ip_tables.h and
      include/linux/netfilter_ipv6/ip6_tables.h
      
      don't include linux/if.h and they also refer to IFNAMSIZ, so they are
      expecting userspace to include net/if.h from the client program.
      Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  18. 13 Apr, 2015 3 commits
  19. 11 Apr, 2015 1 commit
  20. 08 Apr, 2015 2 commits