1. 12 2月, 2015 2 次提交
  2. 10 2月, 2015 4 次提交
  3. 08 2月, 2015 2 次提交
    • N
      tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks · 032ee423
      Neal Cardwell 提交于
      Helpers for mitigating ACK loops by rate-limiting dupacks sent in
      response to incoming out-of-window packets.
      
      This patch includes:
      
      - rate-limiting logic
      - sysctl to control how often we allow dupacks to out-of-window packets
      - SNMP counter for cases where we rate-limited our dupack sending
      
      The rate-limiting logic in this patch decides to not send dupacks in
      response to out-of-window segments if (a) they are SYNs or pure ACKs
      and (b) the remote endpoint is sending them faster than the configured
      rate limit.
      
      We rate-limit our responses rather than blocking them entirely or
      resetting the connection, because legitimate connections can rely on
      dupacks in response to some out-of-window segments. For example, zero
      window probes are typically sent with a sequence number that is below
      the current window, and ZWPs thus expect to thus elicit a dupack in
      response.
      
      We allow dupacks in response to TCP segments with data, because these
      may be spurious retransmissions for which the remote endpoint wants to
      receive DSACKs. This is safe because segments with data can't
      realistically be part of ACK loops, which by their nature consist of
      each side sending pure/data-less ACKs to each other.
      
      The dupack interval is controlled by a new sysctl knob,
      tcp_invalid_ratelimit, given in milliseconds, in case an administrator
      needs to dial this upward in the face of a high-rate DoS attack. The
      name and units are chosen to be analogous to the existing analogous
      knob for ICMP, icmp_ratelimit.
      
      The default value for tcp_invalid_ratelimit is 500ms, which allows at
      most one such dupack per 500ms. This is chosen to be 2x faster than
      the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
      2.4). We allow the extra 2x factor because network delay variations
      can cause packets sent at 1 second intervals to be compressed and
      arrive much closer.
      Reported-by: NAvery Fay <avery@mixpanel.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      032ee423
    • J
      net: openvswitch: Support masked set actions. · 83d2b9ba
      Jarno Rajahalme 提交于
      OVS userspace already probes the openvswitch kernel module for
      OVS_ACTION_ATTR_SET_MASKED support.  This patch adds the kernel module
      implementation of masked set actions.
      
      The existing set action sets many fields at once.  When only a subset
      of the IP header fields, for example, should be modified, all the IP
      fields need to be exact matched so that the other field values can be
      copied to the set action.  A masked set action allows modification of
      an arbitrary subset of the supported header bits without requiring the
      rest to be matched.
      
      Masked set action is now supported for all writeable key types, except
      for the tunnel key.  The set tunnel action is an exception as any
      input tunnel info is cleared before action processing starts, so there
      is no tunnel info to mask.
      
      The kernel module converts all (non-tunnel) set actions to masked set
      actions.  This makes action processing more uniform, and results in
      less branching and duplicating the action processing code.  When
      returning actions to userspace, the fully masked set actions are
      converted back to normal set actions.  We use a kernel internal action
      code to be able to tell the userspace provided and converted masked
      set actions apart.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83d2b9ba
  4. 05 2月, 2015 1 次提交
    • E
      pkt_sched: fq: better control of DDOS traffic · 06eb395f
      Eric Dumazet 提交于
      FQ has a fast path for skb attached to a socket, as it does not
      have to compute a flow hash. But for other packets, FQ being non
      stochastic means that hosts exposed to random Internet traffic
      can allocate million of flows structure (104 bytes each) pretty
      easily. Not only host can OOM, but lookup in RB trees can take
      too much cpu and memory resources.
      
      This patch adds a new attribute, orphan_mask, that is adding
      possibility of having a stochastic hash for orphaned skb.
      
      Its default value is 1024 slots, to mimic SFQ behavior.
      
      Note: This does not apply to locally generated TCP traffic,
      and no locally generated traffic will share a flow structure
      with another perfect or stochastic flow.
      
      This patch also handles the specific case of SYNACK messages:
      
      They are attached to the listener socket, and therefore all map
      to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
      hoping to have new accepted socket inherit this rate, SYNACK
      might be paced and even dropped.
      
      This is very similar to an internal patch Google have used more
      than one year.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06eb395f
  5. 03 2月, 2015 2 次提交
    • W
      net-timestamp: no-payload option · 49ca0d8b
      Willem de Bruijn 提交于
      Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
      timestamps, this loops timestamps on top of empty packets.
      
      Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
      cmsg reception (aside from timestamps) are no longer possible. This
      works together with a follow on patch that allows administrators to
      only allow tx timestamping if it does not loop payload or metadata.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      
      ----
      
      Changes (rfc -> v1)
        - add documentation
        - remove unnecessary skb->len test (thanks to Richard Cochran)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49ca0d8b
    • C
      NFC: Forward NFC_EVT_TRANSACTION to user space · 447b27c4
      Christophe Ricard 提交于
      NFC_EVT_TRANSACTION is sent through netlink in order for a
      specific application running on a secure element to notify
      userspace of an event. Typically the secure element application
      counterpart on the host could interpret that event and act
      upon it.
      
      Forwarded information contains:
      - SE host generating the event
      - Application IDentifier doing the operation
      - Applications parameters
      Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com>
      Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
      447b27c4
  6. 02 2月, 2015 1 次提交
  7. 29 1月, 2015 2 次提交
  8. 28 1月, 2015 1 次提交
  9. 27 1月, 2015 2 次提交
    • R
      PCI: Add defines for PCIe Max_Read_Request_Size · 5929b8a3
      Rafał Miłecki 提交于
      There are a few drivers using magic numbers when operating with PCIe
      capabilities and PCI_EXP_DEVCTL_READRQ.  Define known values to allow
      cleaning their code a bit.
      Signed-off-by: NRafał Miłecki <zajec5@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      5929b8a3
    • J
      openvswitch: Add support for unique flow IDs. · 74ed7ab9
      Joe Stringer 提交于
      Previously, flows were manipulated by userspace specifying a full,
      unmasked flow key. This adds significant burden onto flow
      serialization/deserialization, particularly when dumping flows.
      
      This patch adds an alternative way to refer to flows using a
      variable-length "unique flow identifier" (UFID). At flow setup time,
      userspace may specify a UFID for a flow, which is stored with the flow
      and inserted into a separate table for lookup, in addition to the
      standard flow table. Flows created using a UFID must be fetched or
      deleted using the UFID.
      
      All flow dump operations may now be made more terse with OVS_UFID_F_*
      flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
      omit the flow key from a datapath operation if the flow has a
      corresponding UFID. This significantly reduces the time spent assembling
      and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
      enabled, the datapath only returns the UFID and statistics for each flow
      during flow dump, increasing ovs-vswitchd revalidator performance by 40%
      or more.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74ed7ab9
  10. 26 1月, 2015 1 次提交
  11. 24 1月, 2015 1 次提交
  12. 23 1月, 2015 2 次提交
  13. 22 1月, 2015 1 次提交
  14. 20 1月, 2015 4 次提交
  15. 18 1月, 2015 1 次提交
  16. 16 1月, 2015 1 次提交
    • J
      cfg80211: change bandwidth reporting to explicit field · b51f3bee
      Johannes Berg 提交于
      For some reason, we made the bandwidth separate flags, which
      is rather confusing - a single rate cannot have different
      bandwidths at the same time.
      
      Change this to no longer be flags but use a separate field
      for the bandwidth ('bw') instead.
      
      While at it, add support for 5 and 10 MHz rates - these are
      reported as regular legacy rates with their real bitrate,
      but tagged as 5/10 now to make it easier to distinguish them.
      
      In the nl80211 API, the flags are preserved, but the code
      now can also clearly only set a single one of the flags.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b51f3bee
  17. 15 1月, 2015 6 次提交
    • O
      can: m_can: tag current CAN FD controllers as non-ISO · 6cfda7fb
      Oliver Hartkopp 提交于
      During the CAN FD standardization process within the ISO it turned out that
      the failure detection capability has to be improved.
      
      The CAN in Automation organization (CiA) defined the already implemented CAN
      FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
      'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937
      
      Finally there will be three types of CAN FD controllers in the future:
      
      1. ISO compliant (fixed)
      2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
      3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)
      
      So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
      implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
      As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
      must not be set in ctrlmode_supported of the current M_CAN driver.
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      6cfda7fb
    • J
      cfg80211: remove 80+80 MHz rate reporting · 97d910d0
      Johannes Berg 提交于
      These rates are treated the same as 160 MHz in the spec, so
      it makes no sense to distinguish them. As no driver uses them
      yet, this is also not a problem, just remove them.
      
      In the userspace API the field remains reserved to preserve
      API and ABI.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      97d910d0
    • T
      openvswitch: Support VXLAN Group Policy extension · 1dd144cf
      Thomas Graf 提交于
      Introduces support for the group policy extension to the VXLAN virtual
      port. The extension is disabled by default and only enabled if the user
      has provided the respective configuration.
      
        ovs-vsctl add-port br0 vxlan0 -- \
           set Interface vxlan0 type=vxlan options:exts=gbp
      
      The configuration interface to enable the extension is based on a new
      attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
      which can carry additional extensions as needed in the future.
      
      The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
      internally just like Geneve options but transported as nested Netlink
      attributes to user space.
      
      Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
      binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.
      
      The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
      OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dd144cf
    • T
      vxlan: Group Policy extension · 3511494c
      Thomas Graf 提交于
      Implements supports for the Group Policy VXLAN extension [0] to provide
      a lightweight and simple security label mechanism across network peers
      based on VXLAN. The security context and associated metadata is mapped
      to/from skb->mark. This allows further mapping to a SELinux context
      using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
      tc, etc.
      
      The group membership is defined by the lower 16 bits of skb->mark, the
      upper 16 bits are used for flags.
      
      SELinux allows to manage label to secure local resources. However,
      distributed applications require ACLs to implemented across hosts. This
      is typically achieved by matching on L2-L4 fields to identify the
      original sending host and process on the receiver. On top of that,
      netlabel and specifically CIPSO [1] allow to map security contexts to
      universal labels.  However, netlabel and CIPSO are relatively complex.
      This patch provides a lightweight alternative for overlay network
      environments with a trusted underlay. No additional control protocol
      is required.
      
                 Host 1:                       Host 2:
      
            Group A        Group B        Group B     Group A
            +-----+   +-------------+    +-------+   +-----+
            | lxc |   | SELinux CTX |    | httpd |   | VM  |
            +--+--+   +--+----------+    +---+---+   +--+--+
      	  \---+---/                     \----+---/
      	      |                              |
      	  +---+---+                      +---+---+
      	  | vxlan |                      | vxlan |
      	  +---+---+                      +---+---+
      	      +------------------------------+
      
      Backwards compatibility:
      A VXLAN-GBP socket can receive standard VXLAN frames and will assign
      the default group 0x0000 to such frames. A Linux VXLAN socket will
      drop VXLAN-GBP  frames. The extension is therefore disabled by default
      and needs to be specifically enabled:
      
         ip link add [...] type vxlan [...] gbp
      
      In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
      must run on a separate port number.
      
      Examples:
       iptables:
        host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
        host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
      
       OVS:
        # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
        # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
      
      [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
      [1] http://lwn.net/Articles/204905/Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3511494c
    • T
      openvswitch: packet messages need their own probe attribtue · 1ba39804
      Thomas Graf 提交于
      User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
      and packet messages. This leads to an out-of-bounds access in
      ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
      OVS_PACKET_ATTR_MAX.
      
      Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
      as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
      while maintaining to be binary compatible with existing OVS binaries.
      
      Fixes: 05da5898 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Tracked-down-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Reviewed-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ba39804
    • T
      vxlan: Remote checksum offload · dfd8645e
      Tom Herbert 提交于
      Add support for remote checksum offload in VXLAN. This uses a
      reserved bit to indicate that RCO is being done, and uses the low order
      reserved eight bits of the VNI to hold the start and offset values in a
      compressed manner.
      
      Start is encoded in the low order seven bits of VNI. This is start >> 1
      so that the checksum start offset is 0-254 using even values only.
      Checksum offset (transport checksum field) is indicated in the high
      order bit in the low order byte of the VNI. If the bit is set, the
      checksum field is for UDP (so offset = start + 6), else checksum
      field is for TCP (so offset = start + 16). Only TCP and UDP are
      supported in this implementation.
      
      Remote checksum offload for VXLAN is described in:
      
      https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
      
      Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
      
      With UDP checksums and Remote Checksum Offload
        IPv4
            Client
              11.84% CPU utilization
            Server
              12.96% CPU utilization
            9197 Mbps
        IPv6
            Client
              12.46% CPU utilization
            Server
              14.48% CPU utilization
            8963 Mbps
      
      With UDP checksums, no remote checksum offload
        IPv4
            Client
              15.67% CPU utilization
            Server
              14.83% CPU utilization
            9094 Mbps
        IPv6
            Client
              16.21% CPU utilization
            Server
              14.32% CPU utilization
            9058 Mbps
      
      No UDP checksums
        IPv4
            Client
              15.03% CPU utilization
            Server
              23.09% CPU utilization
            9089 Mbps
        IPv6
            Client
              16.18% CPU utilization
            Server
              26.57% CPU utilization
             8954 Mbps
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfd8645e
  18. 14 1月, 2015 2 次提交
  19. 13 1月, 2015 2 次提交
  20. 12 1月, 2015 1 次提交
  21. 09 1月, 2015 1 次提交
    • W
      ipv6: fix redefinition of in6_pktinfo and ip6_mtuinfo · 3b50d902
      WANG Cong 提交于
      Both netinet/in.h and linux/ipv6.h define these two structs,
      if we include both of them, we got:
      
      	/usr/include/linux/ipv6.h:19:8: error: redefinition of ‘struct in6_pktinfo’
      	 struct in6_pktinfo {
      		^
      	In file included from /usr/include/arpa/inet.h:22:0,
      			 from txtimestamp.c:33:
      	/usr/include/netinet/in.h:524:8: note: originally defined here
      	 struct in6_pktinfo
      		^
      	In file included from txtimestamp.c:40:0:
      	/usr/include/linux/ipv6.h:24:8: error: redefinition of ‘struct ip6_mtuinfo’
      	 struct ip6_mtuinfo {
      		^
      	In file included from /usr/include/arpa/inet.h:22:0,
      			 from txtimestamp.c:33:
      	/usr/include/netinet/in.h:531:8: note: originally defined here
      	 struct ip6_mtuinfo
      		^
      So similarly to what we did for in6_addr, we need to sync with
      libc header on their definitions.
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b50d902