1. 18 1月, 2015 2 次提交
  2. 16 1月, 2015 2 次提交
    • Y
      rhashtable: Fix race in rhashtable_destroy() and use regular work_struct · 57699a40
      Ying Xue 提交于
      When we put our declared work task in the global workqueue with
      schedule_delayed_work(), its delay parameter is always zero.
      Therefore, we should define a regular work in rhashtable structure
      instead of a delayed work.
      
      By the way, we add a condition to check whether resizing functions
      are NULL before cancelling the work, avoiding to cancel an
      uninitialized work.
      
      Lastly, while we wait for all work items we submitted before to run
      to completion with cancel_delayed_work(), ht->mutex has been taken in
      rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
      until all work items are accomplished, and when work items are
      scheduled, the work's function - rht_deferred_worker() will be called.
      However, as rht_deferred_worker() also needs to acquire the lock,
      deadlock might happen at the moment as the lock is already held before.
      So if the cancel work function is moved out of the lock covered scope,
      this will avoid the deadlock.
      
      Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57699a40
    • E
      ipv4: per cpu uncached list · 5055c371
      Eric Dumazet 提交于
      RAW sockets with hdrinc suffer from contention on rt_uncached_lock
      spinlock.
      
      One solution is to use percpu lists, since most routes are destroyed
      by the cpu that created them.
      
      It is unclear why we even have to put these routes in uncached_list,
      as all outgoing packets should be freed when a device is dismantled.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: caacf05e ("ipv4: Properly purge netdev references on uncached routes.")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5055c371
  3. 15 1月, 2015 7 次提交
    • T
      openvswitch: Support VXLAN Group Policy extension · 1dd144cf
      Thomas Graf 提交于
      Introduces support for the group policy extension to the VXLAN virtual
      port. The extension is disabled by default and only enabled if the user
      has provided the respective configuration.
      
        ovs-vsctl add-port br0 vxlan0 -- \
           set Interface vxlan0 type=vxlan options:exts=gbp
      
      The configuration interface to enable the extension is based on a new
      attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
      which can carry additional extensions as needed in the future.
      
      The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
      internally just like Geneve options but transported as nested Netlink
      attributes to user space.
      
      Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
      binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.
      
      The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
      OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dd144cf
    • T
      vxlan: Only bind to sockets with compatible flags enabled · ac5132d1
      Thomas Graf 提交于
      A VXLAN net_device looking for an appropriate socket may only consider
      a socket which has a matching set of flags/extensions enabled. If
      incompatible flags are enabled, return a conflict to have the caller
      create a distinct socket with distinct port.
      
      The OVS VXLAN port is kept unaware of extensions at this point.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac5132d1
    • T
      vxlan: Group Policy extension · 3511494c
      Thomas Graf 提交于
      Implements supports for the Group Policy VXLAN extension [0] to provide
      a lightweight and simple security label mechanism across network peers
      based on VXLAN. The security context and associated metadata is mapped
      to/from skb->mark. This allows further mapping to a SELinux context
      using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
      tc, etc.
      
      The group membership is defined by the lower 16 bits of skb->mark, the
      upper 16 bits are used for flags.
      
      SELinux allows to manage label to secure local resources. However,
      distributed applications require ACLs to implemented across hosts. This
      is typically achieved by matching on L2-L4 fields to identify the
      original sending host and process on the receiver. On top of that,
      netlabel and specifically CIPSO [1] allow to map security contexts to
      universal labels.  However, netlabel and CIPSO are relatively complex.
      This patch provides a lightweight alternative for overlay network
      environments with a trusted underlay. No additional control protocol
      is required.
      
                 Host 1:                       Host 2:
      
            Group A        Group B        Group B     Group A
            +-----+   +-------------+    +-------+   +-----+
            | lxc |   | SELinux CTX |    | httpd |   | VM  |
            +--+--+   +--+----------+    +---+---+   +--+--+
      	  \---+---/                     \----+---/
      	      |                              |
      	  +---+---+                      +---+---+
      	  | vxlan |                      | vxlan |
      	  +---+---+                      +---+---+
      	      +------------------------------+
      
      Backwards compatibility:
      A VXLAN-GBP socket can receive standard VXLAN frames and will assign
      the default group 0x0000 to such frames. A Linux VXLAN socket will
      drop VXLAN-GBP  frames. The extension is therefore disabled by default
      and needs to be specifically enabled:
      
         ip link add [...] type vxlan [...] gbp
      
      In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
      must run on a separate port number.
      
      Examples:
       iptables:
        host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
        host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
      
       OVS:
        # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
        # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
      
      [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
      [1] http://lwn.net/Articles/204905/Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3511494c
    • T
      openvswitch: packet messages need their own probe attribtue · 1ba39804
      Thomas Graf 提交于
      User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
      and packet messages. This leads to an out-of-bounds access in
      ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
      OVS_PACKET_ATTR_MAX.
      
      Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
      as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
      while maintaining to be binary compatible with existing OVS binaries.
      
      Fixes: 05da5898 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Tracked-down-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Reviewed-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ba39804
    • B
      netdevice: Add missing parentheses in macro · 4ccce02e
      Benjamin Poirier 提交于
      For example, one could conceivably call
      	for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
      and get an unexpected result.
      Signed-off-by: NBenjamin Poirier <bpoirier@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ccce02e
    • T
      vxlan: Remote checksum offload · dfd8645e
      Tom Herbert 提交于
      Add support for remote checksum offload in VXLAN. This uses a
      reserved bit to indicate that RCO is being done, and uses the low order
      reserved eight bits of the VNI to hold the start and offset values in a
      compressed manner.
      
      Start is encoded in the low order seven bits of VNI. This is start >> 1
      so that the checksum start offset is 0-254 using even values only.
      Checksum offset (transport checksum field) is indicated in the high
      order bit in the low order byte of the VNI. If the bit is set, the
      checksum field is for UDP (so offset = start + 6), else checksum
      field is for TCP (so offset = start + 16). Only TCP and UDP are
      supported in this implementation.
      
      Remote checksum offload for VXLAN is described in:
      
      https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
      
      Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
      
      With UDP checksums and Remote Checksum Offload
        IPv4
            Client
              11.84% CPU utilization
            Server
              12.96% CPU utilization
            9197 Mbps
        IPv6
            Client
              12.46% CPU utilization
            Server
              14.48% CPU utilization
            8963 Mbps
      
      With UDP checksums, no remote checksum offload
        IPv4
            Client
              15.67% CPU utilization
            Server
              14.83% CPU utilization
            9094 Mbps
        IPv6
            Client
              16.21% CPU utilization
            Server
              14.32% CPU utilization
            9058 Mbps
      
      No UDP checksums
        IPv4
            Client
              15.03% CPU utilization
            Server
              23.09% CPU utilization
            9089 Mbps
        IPv6
            Client
              16.18% CPU utilization
            Server
              26.57% CPU utilization
             8954 Mbps
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfd8645e
    • T
      udp: pass udp_offload struct to UDP gro callbacks · a2b12f3c
      Tom Herbert 提交于
      This patch introduces udp_offload_callbacks which has the same
      GRO functions (but not a GSO function) as offload_callbacks,
      except there is an argument to a udp_offload struct passed to
      gro_receive and gro_complete functions. This additional argument
      can be used to retrieve the per port structure of the encapsulation
      for use in gro processing (mostly by doing container_of on the
      structure).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2b12f3c
  4. 14 1月, 2015 10 次提交
  5. 13 1月, 2015 5 次提交
    • J
      x86/xen: properly retrieve NMI reason · f221b04f
      Jan Beulich 提交于
      Using the native code here can't work properly, as the hypervisor would
      normally have cleared the two reason bits by the time Dom0 gets to see
      the NMI (if passed to it at all). There's a shared info field for this,
      and there's an existing hook to use - just fit the two together. This
      is particularly relevant so that NMIs intended to be handled by APEI /
      GHES actually make it to the respective handler.
      
      Note that the hook can (and should) be used irrespective of whether
      being in Dom0, as accessing port 0x61 in a DomU would be even worse,
      while the shared info field would just hold zero all the time. Note
      further that hardware NMI handling for PVH doesn't currently work
      anyway due to missing code in the hypervisor (but it is expected to
      work the native rather than the PV way).
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      f221b04f
    • W
      mm: mmu_gather: use tlb->end != 0 only for TLB invalidation · 721c21c1
      Will Deacon 提交于
      When batching up address ranges for TLB invalidation, we check tlb->end
      != 0 to indicate that some pages have actually been unmapped.
      
      As of commit f045bbb9 ("mmu_gather: fix over-eager
      tlb_flush_mmu_free() calling"), we use the same check for freeing these
      pages in order to avoid a performance regression where we call
      free_pages_and_swap_cache even when no pages are actually queued up.
      
      Unfortunately, the range could have been reset (tlb->end = 0) by
      tlb_end_vma, which has been shown to cause memory leaks on arm64.
      Furthermore, investigation into these leaks revealed that the fullmm
      case on task exit no longer invalidates the TLB, by virtue of tlb->end
       == 0 (in 3.18, need_flush would have been set).
      
      This patch resolves the problem by reverting commit f045bbb9, using
      instead tlb->local.nr as the predicate for page freeing in
      tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
      non-zero value in the fullmm case.
      Tested-by: NMark Langsdorf <mlangsdo@redhat.com>
      Tested-by: NDave Hansen <dave@sr71.net>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      721c21c1
    • R
      rtnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED · 35a27cee
      Roopa Prabhu 提交于
      This filter is same as RTEXT_FILTER_BRVLAN except that it tries
      to compress the consecutive vlans into ranges.
      
      This helps on systems with large number of configured vlans.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35a27cee
    • R
      bridge: support for multiple vlans and vlan ranges in setlink and dellink requests · bdced7ef
      Roopa Prabhu 提交于
      This patch changes bridge IFLA_AF_SPEC netlink attribute parser to
      look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
      userspace to pack more than one vlan in the setlink msg.
      
      The dumps were already sending more than one vlan info in the getlink msg.
      
      This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
      BRIDGE_VLAN_INFO_RANGE_END to indicate start and end of vlan range
      
      This patch also deletes unused ifla_br_policy.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdced7ef
    • T
      vxlan: Improve support for header flags · 3bf39475
      Tom Herbert 提交于
      This patch cleans up the header flags of VXLAN in anticipation of
      defining some new ones:
      
      - Move header related definitions from vxlan.c to vxlan.h
      - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
      - Move check for unknown flags to after we find vxlan_sock, this
        assumes that some flags may be processed based on tunnel
        configuration
      - Add a comment about why the stack treating unknown set flags as an
        error instead of ignoring them
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bf39475
  6. 12 1月, 2015 1 次提交
    • A
      mmc: sdhci: Disable re-tuning for HS400 · b5540ce1
      Adrian Hunter 提交于
      Re-tuning for HS400 mode must be done in HS200
      mode. Currently there is no support for that.
      That needs to be reflected in the code.
      Specifically, if tuning is executed in HS400 mode
      then return an error, and do not start the
      tuning timer if HS200 tuning is being done prior
      to switching to HS400.
      
      Note that periodic re-tuning is not expected
      to be needed for HS400 but re-tuning is still
      needed after the host controller has lost power.
      In the case of suspend/resume that is not necessary
      because the card is fully re-initialised. That
      just leaves runtime suspend/resume with no support
      for HS400 re-tuning.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      b5540ce1
  7. 10 1月, 2015 1 次提交
  8. 09 1月, 2015 8 次提交
  9. 08 1月, 2015 4 次提交