1. 20 Jan, 2015 1 commit
  2. 18 Jan, 2015 1 commit
  3. 15 Jan, 2015 4 commits
    • openvswitch: Support VXLAN Group Policy extension · 1dd144cf
      Committed by Thomas Graf
      Introduces support for the group policy extension to the VXLAN virtual
      port. The extension is disabled by default and only enabled if the user
      has provided the respective configuration.
      
        ovs-vsctl add-port br0 vxlan0 -- \
           set Interface vxlan0 type=vxlan options:exts=gbp
      
      The configuration interface to enable the extension is based on a new
      attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
      which can carry additional extensions as needed in the future.
      
      The group policy metadata is stored as a binary blob (struct ovs_vxlan_opts)
      internally just like Geneve options but transported as nested Netlink
      attributes to user space.
      
      Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
      binary value kept intact; a new flag TUNNEL_VXLAN_OPT is introduced.
      
      The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and the existing
      OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented as mutually exclusive.
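The mutual exclusion described above can be pictured with a small Python model. This is only a sketch, not the kernel code: the function name `tunnel_opt_flags` is hypothetical, and the two flag values are assumptions used for illustration.

```python
# Assumed flag values, for illustration only.
TUNNEL_GENEVE_OPT = 0x0800  # renamed from TUNNEL_OPTIONS_PRESENT, value unchanged
TUNNEL_VXLAN_OPT = 0x1000   # new flag introduced alongside it


def tunnel_opt_flags(attrs):
    """Return tunnel flags for the option attributes present in `attrs`.

    `attrs` is a set of attribute names found in a tunnel key.
    Geneve and VXLAN options may not be combined.
    """
    has_geneve = "OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS" in attrs
    has_vxlan = "OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS" in attrs
    if has_geneve and has_vxlan:
        raise ValueError("Geneve and VXLAN options are mutually exclusive")
    flags = 0
    if has_geneve:
        flags |= TUNNEL_GENEVE_OPT
    if has_vxlan:
        flags |= TUNNEL_VXLAN_OPT
    return flags
```

A tunnel key carrying only one of the two attributes yields exactly one flag bit; a key carrying both is rejected.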
      
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Group Policy extension · 3511494c
      Committed by Thomas Graf
      Implements support for the Group Policy VXLAN extension [0] to provide
      a lightweight and simple security label mechanism across network peers
      based on VXLAN. The security context and associated metadata are mapped
      to/from skb->mark. This allows further mapping to a SELinux context
      using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
      tc, etc.
      
      The group membership is defined by the lower 16 bits of skb->mark; the
      upper 16 bits are used for flags.
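The split of the 32-bit mark into a group ID and flags can be sketched as follows. The helper names `pack_mark` and `unpack_mark` are hypothetical, and the meaning of individual flag bits is deliberately left open because the text above does not define them.

```python
GBP_ID_MASK = 0xFFFF  # lower 16 bits carry the group ID
GBP_FLAGS_SHIFT = 16  # upper 16 bits carry the flags


def pack_mark(group_id, flags=0):
    """Combine a 16-bit group ID and 16 bits of flags into one 32-bit mark."""
    if not 0 <= group_id <= 0xFFFF:
        raise ValueError("group_id must fit in 16 bits")
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("flags must fit in 16 bits")
    return (flags << GBP_FLAGS_SHIFT) | group_id


def unpack_mark(mark):
    """Split a 32-bit mark into (group_id, flags)."""
    return mark & GBP_ID_MASK, (mark >> GBP_FLAGS_SHIFT) & 0xFFFF
```

With no flags set, the mark equals the group ID, which is why a plain value such as 0x200 can be used directly as a mark to select group 0x200.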
      
      SELinux allows labels to be managed to secure local resources. However,
      distributed applications require ACLs to be implemented across hosts. This
      is typically achieved by matching on L2-L4 fields to identify the
      original sending host and process on the receiver. On top of that,
      netlabel and specifically CIPSO [1] allow security contexts to be mapped
      to universal labels. However, netlabel and CIPSO are relatively complex.
      This patch provides a lightweight alternative for overlay network
      environments with a trusted underlay. No additional control protocol
      is required.
      
                 Host 1:                       Host 2:
      
            Group A        Group B        Group B     Group A
            +-----+   +-------------+    +-------+   +-----+
            | lxc |   | SELinux CTX |    | httpd |   | VM  |
            +--+--+   +--+----------+    +---+---+   +--+--+
                \---+---/                     \----+---/
                    |                              |
                +---+---+                      +---+---+
                | vxlan |                      | vxlan |
                +---+---+                      +---+---+
                    +------------------------------+
      
      Backwards compatibility:
      A VXLAN-GBP socket can receive standard VXLAN frames and will assign
      the default group 0x0000 to such frames. A Linux VXLAN socket will
      drop VXLAN-GBP frames. The extension is therefore disabled by default
      and needs to be specifically enabled:
      
         ip link add [...] type vxlan [...] gbp
      
      In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
      must run on a separate port number.
      
      Examples:
       iptables:
        host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
        host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
      
       OVS:
        # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
        # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
      
      [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
      [1] http://lwn.net/Articles/204905/
      
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: packet messages need their own probe attribute · 1ba39804
      Committed by Thomas Graf
      User space is currently sending an OVS_FLOW_ATTR_PROBE for both flow
      and packet messages. This leads to an out-of-bounds access in
      ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
      OVS_PACKET_ATTR_MAX.
      
      Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
      as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
      while remaining binary compatible with existing OVS binaries.
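The failure mode and the fix can be modelled in a few lines of Python. This is a sketch under stated assumptions: the numeric attribute values below are hypothetical and chosen only so that the probe attribute lies above the old packet maximum, and `parse_attrs` and `probe_requested` are illustrative helpers, not kernel functions.

```python
# Hypothetical attribute numbers; only the relation PROBE > old MAX matters.
OLD_PACKET_ATTR_MAX = 4
OVS_FLOW_ATTR_PROBE = 5
OVS_PACKET_ATTR_PROBE = OVS_FLOW_ATTR_PROBE  # same numeric value, per the commit
NEW_PACKET_ATTR_MAX = OVS_PACKET_ATTR_PROBE


def parse_attrs(message, max_type):
    """Build a table indexed by attribute type; types above max_type are skipped."""
    table = [None] * (max_type + 1)
    for attr_type, value in message:
        if 0 < attr_type <= max_type:
            table[attr_type] = value
    return table


def probe_requested(table, probe_attr):
    """Look up the probe attribute by index.

    In C an unchecked index beyond the table reads out of bounds;
    Python raises IndexError instead.
    """
    return table[probe_attr] is not None
```

Parsing with the old maximum produces a table too short to index with the probe attribute. Raising the maximum to include a probe attribute with the same number makes the lookup valid without changing what user space sends.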
      
      Fixes: 05da5898 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Tracked-down-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Reviewed-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Remote checksum offload · dfd8645e
      Committed by Tom Herbert
      Add support for remote checksum offload in VXLAN. This uses a
      reserved bit to indicate that RCO is being done, and uses the low order
      reserved eight bits of the VNI to hold the start and offset values in a
      compressed manner.
      
      Start is encoded in the low order seven bits of VNI. This is start >> 1
      so that the checksum start offset is 0-254 using even values only.
      Checksum offset (transport checksum field) is indicated in the high
      order bit in the low order byte of the VNI. If the bit is set, the
      checksum field is for UDP (so offset = start + 6), else checksum
      field is for TCP (so offset = start + 16). Only TCP and UDP are
      supported in this implementation.
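The encoding of the low-order byte described above can be sketched in Python. The names `encode_rco` and `decode_rco` and the two mask constants are hypothetical; the arithmetic follows the description (start stored as start >> 1 in seven bits, high bit selecting UDP or TCP).

```python
RCO_START_MASK = 0x7F  # low seven bits: checksum start >> 1
RCO_UDP_BIT = 0x80     # high bit of the byte: set for UDP, clear for TCP


def encode_rco(start, is_udp):
    """Pack checksum start and protocol into the low-order byte of the VNI field."""
    if start % 2 or not 0 <= start <= 254:
        raise ValueError("start must be an even value in 0..254")
    byte = (start >> 1) & RCO_START_MASK
    if is_udp:
        byte |= RCO_UDP_BIT
    return byte


def decode_rco(byte):
    """Recover (start, offset) from the low-order byte of the VNI field."""
    start = (byte & RCO_START_MASK) << 1
    # The UDP checksum field sits 6 bytes into the header, the TCP one 16 bytes in.
    offset = start + (6 if byte & RCO_UDP_BIT else 16)
    return start, offset
```

For example, a TCP header starting at byte 40 is encoded as 20, and decoding 20 gives a start of 40 and a checksum field at byte 56.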
      
      Remote checksum offload for VXLAN is described in:
      
      https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
      
      Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
      
      With UDP checksums and Remote Checksum Offload
        IPv4
            Client
              11.84% CPU utilization
            Server
              12.96% CPU utilization
            9197 Mbps
        IPv6
            Client
              12.46% CPU utilization
            Server
              14.48% CPU utilization
            8963 Mbps
      
      With UDP checksums, no remote checksum offload
        IPv4
            Client
              15.67% CPU utilization
            Server
              14.83% CPU utilization
            9094 Mbps
        IPv6
            Client
              16.21% CPU utilization
            Server
              14.32% CPU utilization
            9058 Mbps
      
      No UDP checksums
        IPv4
            Client
              15.03% CPU utilization
            Server
              23.09% CPU utilization
            9089 Mbps
        IPv6
            Client
              16.18% CPU utilization
            Server
              26.57% CPU utilization
            8954 Mbps
      
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 14 Jan, 2015 2 commits
  5. 13 Jan, 2015 2 commits
  6. 09 Jan, 2015 1 commit
    • ipv6: fix redefinition of in6_pktinfo and ip6_mtuinfo · 3b50d902
      Committed by WANG Cong
      Both netinet/in.h and linux/ipv6.h define these two structs;
      if we include both of them, we get:
      
      	/usr/include/linux/ipv6.h:19:8: error: redefinition of ‘struct in6_pktinfo’
      	 struct in6_pktinfo {
      		^
      	In file included from /usr/include/arpa/inet.h:22:0,
      			 from txtimestamp.c:33:
      	/usr/include/netinet/in.h:524:8: note: originally defined here
      	 struct in6_pktinfo
      		^
      	In file included from txtimestamp.c:40:0:
      	/usr/include/linux/ipv6.h:24:8: error: redefinition of ‘struct ip6_mtuinfo’
      	 struct ip6_mtuinfo {
      		^
      	In file included from /usr/include/arpa/inet.h:22:0,
      			 from txtimestamp.c:33:
      	/usr/include/netinet/in.h:531:8: note: originally defined here
      	 struct ip6_mtuinfo
      		^
      
      So, similarly to what we did for in6_addr, we need to sync with
      the libc headers on their definitions.
      
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 08 Jan, 2015 7 commits
  8. 07 Jan, 2015 1 commit
  9. 06 Jan, 2015 4 commits
  10. 05 Jan, 2015 1 commit
  11. 01 Jan, 2015 1 commit
  12. 29 Dec, 2014 1 commit
  13. 23 Dec, 2014 1 commit
  14. 19 Dec, 2014 1 commit
  15. 18 Dec, 2014 2 commits
  16. 17 Dec, 2014 6 commits
  17. 14 Dec, 2014 3 commits
    • virtio_pci: add VIRTIO_PCI_NO_LEGACY · 0dce3771
      Committed by Michael S. Tsirkin
      Add macro to disable all legacy register defines.
      Helpful to make sure legacy macros don't leak
      through into modern code.
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • ipc/msg: increase MSGMNI, remove scaling · 0050ee05
      Committed by Manfred Spraul
      SysV can be abused to allocate locked kernel memory.  For most systems, a
      small limit doesn't make sense, see the discussion with regards to SHMMAX.
      
      Therefore: increase MSGMNI to the maximum supported.
      
      And: If we ignore the risk of locking too much memory, then an automatic
      scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.
      
      The code preserves auto_msgmni to avoid breaking any user space applications
      that expect that the value exists.
      
      Notes:
      1) If an administrator must limit the memory allocations, then he can set
      MSGMNI as necessary.
      
      Or he can disable sysv entirely (as e.g. done by Android).
      
      2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
      to control latency vs. throughput:
      If MSGMNB is large, then msgsnd() just returns and more messages can be queued
      before a task switch to a task that calls msgrcv() is forced.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM · e843e7d2
      Committed by Manfred Spraul
      a)
      
      SysV can be abused to allocate locked kernel memory.  For most systems, a
      small limit doesn't make sense, see the discussion with regards to SHMMAX.
      
      Therefore: Increase the sysv sem limits so that all known applications
      will work with these defaults.
      
      b)
      
      With regards to the maximum supported:
      Some of the specified hard limits are not correct anymore, therefore the
      patch updates the documentation.
      
      - SEMMNI must stay below IPCMNI, which is 32768.
        As for SHMMAX: Stay a bit below this limit.
      
      - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array
        was limited to 16 kB (order=2)
      
        This doesn't apply anymore:
         - the allocation size isn't sizeof(short)*nsems anymore.
         - ipc_alloc falls back to vmalloc
      
      - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an
        order=1 allocation.
        Therefore: Leave it at 500 (order=0 allocation).
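The allocation orders quoted in the list above can be checked with simple arithmetic. The sketch below assumes a 4096-byte page, a 6-byte struct sembuf (one unsigned short and two shorts), and reads "8k" as 8000 entries; `alloc_order` is a hypothetical helper that approximates the order as the smallest power-of-two number of pages that holds the request.

```python
PAGE_SIZE = 4096    # assumed page size
SEMBUF_SIZE = 6     # struct sembuf: unsigned short sem_num, short sem_op, short sem_flg
SHORT_SIZE = 2      # sizeof(short)


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) bytes hold nbytes."""
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


semopm_500 = alloc_order(500 * SEMBUF_SIZE)    # 3000 bytes
semopm_1000 = alloc_order(1000 * SEMBUF_SIZE)  # 6000 bytes
semmsl_8k = alloc_order(8000 * SHORT_SIZE)     # 16000 bytes
```

Under these assumptions, 500 operations fit in a single page (order 0), 1000 operations need two pages (order 1), and the old 8k SEMMSL array of shorts needed four pages (order 2), matching the figures in the text.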
      
      Note:
      If an administrator must limit the memory allocations, then he can set the
      values as necessary.
      
      Or he can disable sysv entirely (as e.g. done by Android).
      
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 12 Dec, 2014 1 commit