1. 20 Oct, 2014: 1 commit
  2. 15 Oct, 2014: 1 commit
  3. 14 Oct, 2014: 1 commit
  4. 10 Oct, 2014: 2 commits
    • mm/balloon_compaction: add vmstat counters and kpageflags bit · 09316c09
      Committed by Konstantin Khlebnikov
      Always mark pages with PageBalloon even if balloon compaction is disabled
      and expose this mark in /proc/kpageflags as KPF_BALLOON.
      
      This patch also adds three counters to /proc/vmstat: "balloon_inflate",
      "balloon_deflate" and "balloon_migrate".  They accumulate balloon
      activity.  The current size of the balloon is (balloon_inflate -
      balloon_deflate) pages.
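      As a minimal sketch of how these counters could be consumed from user
      space, the C function below scans the text of /proc/vmstat and derives
      the current balloon size as (balloon_inflate - balloon_deflate) pages.
      The function name and the -1 "counter missing" convention are our own
      illustrative choices, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: compute the balloon size in pages from the text of /proc/vmstat,
 * as balloon_inflate - balloon_deflate.  Returns -1 if either counter is
 * missing (e.g. a kernel without these counters). */
long balloon_size_pages(const char *vmstat_text)
{
    long inflate = -1, deflate = -1;
    const char *line = vmstat_text;

    while (line != NULL && *line != '\0') {
        char name[64];
        long value;

        /* Each line has the form "<counter-name> <value>". */
        if (sscanf(line, "%63s %ld", name, &value) == 2) {
            if (strcmp(name, "balloon_inflate") == 0)
                inflate = value;
            else if (strcmp(name, "balloon_deflate") == 0)
                deflate = value;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    if (inflate < 0 || deflate < 0)
        return -1;
    return inflate - deflate;
}
```

      A caller would read /proc/vmstat into a NUL-terminated buffer (for
      example with fopen() and fread()) and pass it in; multiplying the result
      by the page size gives the balloon size in bytes.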
      
      All generic balloon code is now gathered under the option
      CONFIG_MEMORY_BALLOON.  It should be selected by any ballooning driver
      that wants to use this feature.  Currently virtio-balloon is the only user.
      Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation · f606b77f
      Committed by Cyrill Gorcunov
      During development of c/r (checkpoint/restore) we noticed that, if we
      need to support user namespaces, we face a problem with capabilities in
      the prctl(PR_SET_MM, ...) call: in particular, once a new user namespace
      is created, capable(CAP_SYS_RESOURCE) no longer passes.
      
      One approach is to eliminate the CAP_SYS_RESOURCE check but pass all new
      values in one bundle, which allows the kernel to perform more thorough
      sanity checks on the values and, at the same time, allows us to support
      checkpoint/restore of user namespaces.
      
      Thus a new command, PR_SET_MM_MAP, is introduced. It takes a pointer to
      a prctl_mm_map structure which carries all the members to be updated.
      
      	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)
      
      	struct prctl_mm_map {
      		__u64	start_code;
      		__u64	end_code;
      		__u64	start_data;
      		__u64	end_data;
      		__u64	start_brk;
      		__u64	brk;
      		__u64	start_stack;
      		__u64	arg_start;
      		__u64	arg_end;
      		__u64	env_start;
      		__u64	env_end;
      		__u64	*auxv;
      		__u32	auxv_size;
      		__u32	exe_fd;
      	};
      
      All members except @exe_fd correspond to members of struct mm_struct.
      To clarify which values these members may take, here are their
      meanings:
      
       - start_code, end_code: represent bounds of executable code area
       - start_data, end_data: represent bounds of data area
       - start_brk, brk: used to calculate bounds for brk() syscall
       - start_stack: used when accounting space needed for command
         line arguments, environment and shmat() syscall
       - arg_start, arg_end, env_start, env_end: represent memory area
         supplied for command line arguments and environment variables
       - auxv, auxv_size: carry the auxiliary vector (ELF format specifics)
       - exe_fd: file descriptor number for executable link (/proc/self/exe)
      
      Thus we apply the following requirements to the values:
      
      1) Any member except @auxv, @auxv_size and @exe_fd is an address
         in user space, so it must lie inside the [mmap_min_addr, mmap_max_addr)
         interval.
      
      2) While @[start|end]_code and @[start|end]_data may point to nonexistent
         VMAs (say, a program maps its own new .text and .data segments during
         execution), the rest of the members must belong to an existing VMA.
      
      3) Addresses must be ordered, i.e. a @start_ member must be less than
         the corresponding @end_ member.
      
      4) As in the regular ELF loading procedure, we require that @start_brk
         and @brk be greater than @end_data.
      
      5) If the RLIMIT_DATA rlimit is set to a finite value, the new values must
         not exceed the existing limit. The same applies to RLIMIT_STACK.
      
      6) The auxiliary vector size must not exceed the existing one (which is
         predefined as AT_VECTOR_SIZE and depends on the architecture).
      
      7) The file descriptor passed in @exe_fd should point to an executable
         file (because we use the existing prctl_set_mm_exe_file_locked
         helper, it is ensured that the file we are going to use as the exe
         link has all required permissions granted).
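      The address rules 1, 3 and 4 above can be sketched as a pure check over
      the address members. This is a simplified illustration, not the kernel's
      code: struct mm_map_addrs and check_mm_map_sketch() are hypothetical
      names, the rules are implemented exactly as worded above (the kernel's
      actual checks may differ in detail), and rules 2, 5, 6 and 7 are left out
      because they depend on kernel state (VMAs, rlimits, AT_VECTOR_SIZE, open
      files).

```c
#include <stdint.h>

/* Hypothetical, simplified copy of the address fields of struct
 * prctl_mm_map (auxv, auxv_size and exe_fd are omitted). */
struct mm_map_addrs {
    uint64_t start_code, end_code;
    uint64_t start_data, end_data;
    uint64_t start_brk, brk;
    uint64_t start_stack;
    uint64_t arg_start, arg_end;
    uint64_t env_start, env_end;
};

/* Rule 1: an address must lie in [min_addr, max_addr). */
static int in_range(uint64_t a, uint64_t min_addr, uint64_t max_addr)
{
    return a >= min_addr && a < max_addr;
}

/* Returns 0 if rules 1, 3 and 4 hold, -1 otherwise. */
int check_mm_map_sketch(const struct mm_map_addrs *m,
                        uint64_t min_addr, uint64_t max_addr)
{
    const uint64_t addrs[] = {
        m->start_code, m->end_code, m->start_data, m->end_data,
        m->start_brk, m->brk, m->start_stack,
        m->arg_start, m->arg_end, m->env_start, m->env_end,
    };

    for (unsigned i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
        if (!in_range(addrs[i], min_addr, max_addr))
            return -1;

    /* Rule 3: each @start_ member must be below its @end_ member. */
    if (m->start_code >= m->end_code || m->start_data >= m->end_data ||
        m->arg_start >= m->arg_end || m->env_start >= m->env_end)
        return -1;

    /* Rule 4: @start_brk and @brk must be above @end_data. */
    if (m->start_brk <= m->end_data || m->brk <= m->end_data)
        return -1;

    return 0;
}
```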
      
      Now, about where these members are used inside the kernel code:
      
       - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;
      
       - @start_data and @end_data are used in /proc/$pid/[stat|statm] output;
         they are also used to check whether there is enough space for the
         brk() syscall result if RLIMIT_DATA is set;
      
       - @start_brk is shown in /proc/$pid/stat output and accounted in the
         brk() syscall if RLIMIT_DATA is set; this member is also tested to
         find a symbolic name of an mmap event for the perf system (to
         determine whether the event is generated for the "heap" area); one
         more application is SELinux: we test whether a process has the
         PROCESS__EXECHEAP permission when it tries to make the heap area
         executable with the mprotect() syscall;
      
       - @brk is the current value for the brk() syscall, which lies inside
         the heap area; it's shown in /proc/$pid/stat. When the brk() syscall
         successfully provides a new memory area to user space, mm::brk is
         updated to carry the new value upon brk() completion;
      
         Both @start_brk and @brk are actively used in /proc/$pid/maps
         and /proc/$pid/smaps output to find the symbolic name "heap" for
         the VMA being scanned;
      
       - @start_stack is printed out in /proc/$pid/stat and used to
         find the symbolic name "stack" for tasks and threads in
         /proc/$pid/maps and /proc/$pid/smaps output; as with @start_brk,
         the perf system uses it for event naming. The kernel also treats
         this member as the start address of where to map vDSO pages and
         uses it to check whether there is enough space for the shmat()
         syscall;
      
       - @arg_start, @arg_end, @env_start and @env_end are printed out
         in /proc/$pid/stat. The data these members represent can also be
         accessed by reading /proc/$pid/environ or /proc/$pid/cmdline.
         The kernel checks any attempt to read these areas with the
         access_process_vm helper, so the user must have sufficient rights
         for this action;
      
       - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly
         speaking, the kernel doesn't care much about exactly what data is
         stored there because it is solely for userspace;
      
       - @exe_fd is referenced from /proc/$pid/exe and when generating
         a coredump. We use the prctl_set_mm_exe_file_locked helper to update
         this member, so exe-file link modification remains a one-shot
         action.
      
      Note, however, that updating the exe-file link no longer requires the
      CAP_SYS_RESOURCE capability. After all, there is not much benefit in
      preventing a process from setting up its own file link (there are a
      number of ways to execute one's own code, such as ptrace or LD_PRELOAD,
      so the only reliable way to find out exactly which code is executed is
      to inspect the running program's memory). We still require the caller
      to be at least the user-namespace root user.
      
      I believe the old interface should be deprecated and removed in a
      couple of kernel releases if no one objects.
      
      To test whether the new interface is implemented in the kernel, one can
      pass the PR_SET_MM_MAP_SIZE opcode, and the kernel returns the size of
      the currently supported struct prctl_mm_map.
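      A probe for the new interface might look like the sketch below. It
      assumes, as we understand the implementation, that the size is written
      through the pointer passed as the third prctl() argument while prctl()
      itself returns 0 on success. The fallback constant values mirror
      include/uapi/linux/prctl.h and should be checked against your headers;
      the helper name is hypothetical. The call can fail on kernels without
      this patch or built without checkpoint/restore support.

```c
#include <sys/prctl.h>

/* Fallbacks for libc headers that predate these constants. */
#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_MAP_SIZE
#define PR_SET_MM_MAP_SIZE 15
#endif

/* Ask the kernel for sizeof(struct prctl_mm_map).  Returns the size,
 * or -1 if the interface is unavailable. */
long probe_prctl_mm_map_size(void)
{
    unsigned int size = 0;

    /* arg5 must be zero; the size is stored into 'size'. */
    if (prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)&size, 0, 0) != 0)
        return -1;
    return (long)size;
}
```

      A checkpoint/restore tool could compare the returned size with the
      sizeof(struct prctl_mm_map) it was compiled against before attempting
      PR_SET_MM_MAP.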
      
      [akpm@linux-foundation.org: fix 80-col wordwrap in macro definitions]
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: Andrew Vagin <avagin@openvz.org>
      Tested-by: Andrew Vagin <avagin@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Julien Tinnes <jln@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 Oct, 2014: 1 commit
    • s390: add support for vector extension · 80703617
      Committed by Martin Schwidefsky
      The vector extension introduces 32 128-bit vector registers and a set of
      instructions to operate on the vector registers.
      
      The kernel can control the use of vector registers for the problem state
      program with a bit in control register 0. Once enabled for a process, the
      kernel needs to retain the contents of the vector registers on context
      switch. The signal frame is extended to include the vector registers.
      Two new register sets NT_S390_VXRS_LOW and NT_S390_VXRS_HIGH are added
      to the regset interface for the debugger and core dumps.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  6. 08 Oct, 2014: 1 commit
  7. 06 Oct, 2014: 5 commits
  8. 04 Oct, 2014: 2 commits
  9. 03 Oct, 2014: 2 commits
    • wil6210: atomic I/O for the card memory · dba4b74d
      Committed by Vladimir Kondratiev
      Introduce netdev IOCTLs, to be used by the debug tools.
      
      Allows reading/writing a single dword value or a
      dword-aligned memory block.
      Different address modes are supported:
      - BAR offset
      - Firmware "linker" address
      - target's AHB bus
      Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • netfilter: nft_reject: introduce icmp code abstraction for inet and bridge · 51b0a5d8
      Committed by Pablo Neira Ayuso
      This patch introduces the NFT_REJECT_ICMPX_UNREACH type, which provides
      an abstraction over the ICMP and ICMPv6 codes that you can use from the
      inet and bridge tables. The codes are:
      
      * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
      * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
      * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
      * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratively prohibited
      
      You can still use the specific codes when restricting the rule to match
      the corresponding layer 3 protocol.
      
      I decided not to overload the existing NFT_REJECT_ICMP_UNREACH with
      different semantics depending on the table family, and to allow the
      user to specify ICMP family-specific codes if they restrict the rule
      to the corresponding family.
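      To make the abstraction concrete, here is a sketch of how the four
      ICMPX codes could be translated into per-family codes. The local enum
      names and the specific mapping are our assumptions, built from the
      standard Destination Unreachable code values (ICMP: 0 net, 1 host,
      3 port, 13 administratively prohibited; ICMPv6: 0 no route,
      1 administratively prohibited, 3 address unreachable, 4 port
      unreachable); they are not copied from the patch.

```c
#include <stdint.h>

/* Hypothetical local names mirroring the four abstract codes. */
enum icmpx_code {
    ICMPX_NO_ROUTE,
    ICMPX_PORT_UNREACH,
    ICMPX_HOST_UNREACH,
    ICMPX_ADMIN_PROHIBITED,
};

/* ICMP (IPv4) Destination Unreachable codes. */
static const uint8_t icmpx_to_icmpv4[] = {
    [ICMPX_NO_ROUTE]         = 0,  /* net unreachable */
    [ICMPX_PORT_UNREACH]     = 3,  /* port unreachable */
    [ICMPX_HOST_UNREACH]     = 1,  /* host unreachable */
    [ICMPX_ADMIN_PROHIBITED] = 13, /* communication administratively prohibited */
};

/* ICMPv6 Destination Unreachable codes. */
static const uint8_t icmpx_to_icmpv6[] = {
    [ICMPX_NO_ROUTE]         = 0,  /* no route to destination */
    [ICMPX_PORT_UNREACH]     = 4,  /* port unreachable */
    [ICMPX_HOST_UNREACH]     = 3,  /* address unreachable */
    [ICMPX_ADMIN_PROHIBITED] = 1,  /* administratively prohibited */
};

/* Translate an abstract code for L3 family 4 or 6.
 * Returns -1 for an unknown code or family. */
int icmpx_translate(enum icmpx_code code, int family)
{
    if ((unsigned)code > ICMPX_ADMIN_PROHIBITED)
        return -1;
    if (family == 4)
        return icmpx_to_icmpv4[code];
    if (family == 6)
        return icmpx_to_icmpv6[code];
    return -1;
}
```

      In an inet or bridge table the packet's actual family is only known per
      packet, which is why a family-neutral code plus a per-family table is a
      natural fit.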
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 30 Sep, 2014: 2 commits
    • macvlan: add source mode · 79cf79ab
      Committed by Michael Braun
      This patch adds a new mode of operation to macvlan, called "source".
      It allows one to set a list of allowed MAC addresses, which is used
      to match against the source MAC address of frames received on the
      underlying interface.
      This enables creating MAC-based VLAN associations, instead of standard
      port- or tag-based ones. The feature is useful for deploying 802.1x
      MAC-based behavior where the drivers of underlying interfaces don't
      allow that.
      
      Configuration is done through the netlink interface using e.g.:
       ip link add link eth0 name macvlan0 type macvlan mode source
       ip link add link eth0 name macvlan1 type macvlan mode source
       ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
       ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22
       ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33
       ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33
       ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44
      
      This allows clients with MAC addresses 00:11:11:11:11:11 and
      00:22:22:22:22:22 to be part of only the VLAN associated with the
      macvlan0 interface, the client with MAC address 00:44:44:44:44:44 to be
      part of only the VLAN associated with the macvlan1 interface, and the
      client with MAC address 00:33:33:33:33:33 to be associated with both VLANs.
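      The matching behavior in this example can be modelled with a small
      sketch: a table of (interface, allowed source MAC) rules and a lookup
      that returns every macvlan interface accepting a frame's source address.
      struct macvlan_src_rule and match_source_mode() are hypothetical names;
      per the v10 notes below, the real driver hashes addresses with hash_64
      rather than using the linear scan shown here for clarity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical rule: one macvlan interface allows one source MAC. */
struct macvlan_src_rule {
    const char *ifname;
    uint8_t mac[6];
};

/* Collect the names of all macvlan interfaces that accept a frame with
 * source address src.  At most max names are stored in out[]; the
 * return value is the number stored. */
size_t match_source_mode(const struct macvlan_src_rule *rules, size_t n,
                         const uint8_t src[6], const char **out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < max; i++) {
        if (memcmp(rules[i].mac, src, 6) == 0)
            out[found++] = rules[i].ifname;
    }
    return found;
}
```

      Loading the table with the five "macaddr add" entries from the commands
      above, a frame from 00:33:33:33:33:33 matches both macvlan0 and
      macvlan1, while a frame from an unlisted address matches neither.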
      
      Based on work of Stefan Gula <steweg@gmail.com>
      
      v8: last version of Stefan Gula for Kernel 3.2.1
      v9: rework onto linux-next 2014-03-12 by Michael Braun
          add MACADDR_SET command, enable to configure mac for source mode
          while creating interface
      v10:
        - reduce indentation level
        - rename source_list to source_entry
        - use aligned 64bit ether address
        - use hash_64 instead of addr[5]
      v11:
        - rebase for 3.14 / linux-next 20.04.2014
      v12:
        - rebase for linux-next 2014-09-25
      Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type · f5c9eceb
      Committed by Will Deacon
      VFIO allows devices to be safely handed off to userspace by putting
      them behind an IOMMU configured to ensure DMA and interrupt isolation.
      This enables userspace KVM clients, such as kvmtool and qemu, to further
      map the device into a virtual machine.
      
      With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU
      translation services to the guest operating system, which are nested
      with the existing translation installed by VFIO. However, enabling this
      feature means that the IOMMU driver must be informed that the VFIO domain
      is being created for the purposes of nested translation.
      
      This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO
      type-1 driver. The new IOMMU type acts identically to the
      VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING
      attribute on its IOMMU domains.
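      From user space, selecting the new type would presumably follow the
      usual VFIO container flow. The sketch below first asks the container
      whether the extension is supported (VFIO_CHECK_EXTENSION) and then
      selects it (VFIO_SET_IOMMU). It assumes a container fd that already has
      a group attached; the helper name is hypothetical, and the fallback
      value 6 for VFIO_TYPE1_NESTING_IOMMU is taken from the uapi header of
      this era and may not be defined by newer headers.

```c
#include <sys/ioctl.h>
#include <linux/vfio.h>

#ifndef VFIO_TYPE1_NESTING_IOMMU
#define VFIO_TYPE1_NESTING_IOMMU 6   /* value assumed from linux/vfio.h */
#endif

/* Hypothetical helper.  container_fd comes from open("/dev/vfio/vfio")
 * and must already have a group attached (VFIO_GROUP_SET_CONTAINER).
 * Returns 0 if the nesting type-1 IOMMU was selected, -1 otherwise. */
int select_nesting_iommu(int container_fd)
{
    /* VFIO_CHECK_EXTENSION returns a positive value when supported. */
    if (ioctl(container_fd, VFIO_CHECK_EXTENSION,
              VFIO_TYPE1_NESTING_IOMMU) <= 0)
        return -1;
    /* Behaves like VFIO_TYPE1v2_IOMMU, plus DOMAIN_ATTR_NESTING. */
    if (ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU) != 0)
        return -1;
    return 0;
}
```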
      
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  11. 29 Sep, 2014: 1 commit
    • net: tcp: add DCTCP congestion control algorithm · e3118e83
      Committed by Daniel Borkmann
      This work adds the DataCenter TCP (DCTCP) congestion control
      algorithm [1], which was first published at SIGCOMM 2010 [2], with a
      follow-up analysis at SIGMETRICS 2011 [3] (and, more recently, as an
      informational IETF draft available at [4]).
      
      DCTCP is an enhancement to the TCP congestion control algorithm for
      data center networks. Typical data center workloads are
      i) partition/aggregate (queries; bursty, delay sensitive), ii) short
      messages, e.g. 50KB-1MB (for coordination and control state; delay
      sensitive), and iii) large flows, e.g. 1MB-100MB (data update;
      throughput sensitive). DCTCP has therefore been designed for such
      environments to meet the following three requirements:
      
        * High burst tolerance (incast due to partition/aggregate)
        * Low latency (short flows, queries)
        * High throughput (continuous data updates, large file
          transfers) with commodity, shallow buffered switches
      
      The basic idea of its design consists of two fundamentals: i) on the
      switch side, packets are marked when the internal queue length
      exceeds a threshold K (K is chosen so that a large enough headroom
      for marked traffic is still available in the switch queue); ii) the
      sender/host side maintains a moving average of the fraction of marked
      packets, so each RTT, F is updated as follows:
      
       F := X / Y, where X is # of marked ACKs, Y is total # of ACKs
       alpha := (1 - g) * alpha + g * F, where g is a smoothing constant
      
      The resulting alpha (i.e., an estimate of the probability that the
      switch queue is congested) is then used to adaptively decrease the
      congestion window W:
      
       W := (1 - (alpha / 2)) * W
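      The two update rules above can be written out directly. This
      floating-point sketch (the struct and function names are illustrative,
      not the module's) applies one RTT of the sender-side law: it updates
      alpha from the marked and total ACK counts and then scales the window
      by (1 - alpha / 2).

```c
/* Illustrative DCTCP sender state (names are not the kernel module's). */
struct dctcp_sketch {
    double alpha;  /* estimated fraction of marked packets, 0..1 */
    double g;      /* smoothing constant, e.g. 1.0 / 16 */
};

/* Once per RTT: F := X / Y, alpha := (1 - g) * alpha + g * F. */
void dctcp_update_alpha(struct dctcp_sketch *s,
                        unsigned marked_acks, unsigned total_acks)
{
    double f = total_acks ? (double)marked_acks / total_acks : 0.0;

    s->alpha = (1.0 - s->g) * s->alpha + s->g * f;
}

/* On congestion: W := (1 - (alpha / 2)) * W. */
double dctcp_reduce_cwnd(const struct dctcp_sketch *s, double cwnd)
{
    return (1.0 - s->alpha / 2.0) * cwnd;
}
```

      With alpha = 1 (every packet marked) the window is halved, as in
      standard TCP; a small alpha yields a correspondingly gentle reduction.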
      
      DCTCP relies on ECN both for marking packets on the switch side and for
      receiving those marks.
      
      RFC3168 describes a mechanism for using Explicit Congestion Notification
      from the switch for early detection of congestion, rather than waiting
      for segment loss to occur.
      
      However, this method only detects the presence of congestion, not
      the *extent*. In the presence of mild congestion, it reduces the TCP
      congestion window too aggressively and unnecessarily affects the
      throughput of long flows [4].
      
      DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN)
      processing to estimate the fraction of bytes that encounter congestion,
      rather than simply detecting that some congestion has occurred. DCTCP
      then scales the TCP congestion window based on this estimate [4].
      It thereby derives multibit feedback from the information present in
      the single-bit sequence of marks in its control law, and thus acts in
      *proportion* to the extent of congestion, not its *presence*.
      
      Switches therefore set the Congestion Experienced (CE) codepoint in
      packets when internal queue lengths exceed threshold K. As a result,
      DCTCP delivers the same or better throughput than normal TCP, while
      using 90% less buffer space.
      
      It was found in [2] that DCTCP enables the applications to handle 10x
      the current background traffic without impacting foreground traffic.
      Moreover, a 10x increase in foreground traffic did not cause any
      timeouts, thus largely eliminating TCP incast collapse problems.
      
      The algorithm itself has already seen deployments in large production
      data centers since then.
      
      We did a long-term stress test and analysis in a data center. Here is a
      short summary of our TCP incast tests with iperf compared to CUBIC:
      
      This test measured DCTCP throughput and latency and compared it with
      CUBIC throughput and latency for an incast scenario. In this test, 19
      senders sent at maximum rate to a single receiver. The receiver simply
      ran iperf -s.
      
      The senders ran iperf -c <receiver> -t 30. All senders started
      simultaneously (using local clocks synchronized by ntp).
      
      This test was repeated multiple times. Below are the results from a
      single test; other tests are similar. (DCTCP results were extremely
      consistent; CUBIC results show some variance induced by the TCP timeouts
      that CUBIC encountered.)
      
      For this test, we report statistics on the number of TCP timeouts,
      flow throughput, and traffic latency.
      
      1) Timeouts (total over all flows, and per flow summaries):
      
                  CUBIC            DCTCP
        Total     3227             25
        Mean       169.842          1.316
        Median     183              1
        Max        207              5
        Min        123              0
        Stddev      28.991          1.600
      
      Timeout data is taken by measuring the net change in the "other TCP
      timeouts" count reported by netstat -s. As a result, the timeout
      measurements above are not restricted to the test traffic, and we
      believe it is likely that all of the "DCTCP timeouts" are actually
      timeouts for non-test traffic. We report them nevertheless. CUBIC will
      also include some non-test timeouts, but they are dwarfed by bona fide
      test traffic timeouts for CUBIC. Clearly DCTCP does an excellent job of
      preventing TCP timeouts. DCTCP reduces timeouts by at least two orders
      of magnitude and may well have eliminated them in this scenario.
      
      2) Throughput (per flow in Mbps):
      
                  CUBIC            DCTCP
        Mean      521.684          521.895
        Median    464              523
        Max       776              527
        Min       403              519
        Stddev    105.891            2.601
        Fairness    0.962            0.999
      
      Throughput data was simply the average throughput for each flow
      reported by iperf. By avoiding TCP timeouts, DCTCP is able to
      achieve much better per-flow results. In CUBIC, many flows
      experience TCP timeouts which makes flow throughput unpredictable and
      unfair. DCTCP, on the other hand, provides very clean predictable
      throughput without incurring TCP timeouts. Thus, the standard deviation
      of CUBIC throughput is dramatically higher than the standard deviation
      of DCTCP throughput.
      
      Mean throughput is nearly identical because even though CUBIC flows
      suffer TCP timeouts, other flows step in and fill the unused
      bandwidth. Note that this test is something of a best-case scenario
      for incast under CUBIC: it allows other flows to fill in for flows
      experiencing a timeout. Under situations where the receiver is issuing
      requests and then waiting for all flows to complete, flows cannot fill
      in for timed out flows and throughput will drop dramatically.
      
      3) Latency (in ms):
      
                  CUBIC            DCTCP
        Mean      4.0088           0.04219
        Median    4.055            0.0395
        Max       4.2              0.085
        Min       3.32             0.028
        Stddev    0.1666           0.01064
      
      Latency for each protocol was computed by running "ping -i 0.2
      <receiver>" from a single sender to the receiver during the incast
      test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure
      that traffic traversed the DCTCP queue and was not dropped when the
      queue size was greater than the marking threshold. The summary
      statistics above are over all ping metrics measured between the single
      sender, receiver pair.
      
      The latency results for this test show a dramatic difference between
      CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer,
      which incurs the maximum queue latency (more buffer memory leads
      to higher latency). DCTCP, on the other hand, deliberately attempts to
      keep queue occupancy low. The result is a two orders of magnitude
      reduction of latency with DCTCP - even with a switch with relatively
      little RAM. Switches with larger amounts of RAM will incur increasing
      amounts of latency for CUBIC, but not for DCTCP.
      
      4) Convergence and stability test:
      
      This test measured the time that DCTCP took to fairly redistribute
      bandwidth when a new flow commences. It also measured DCTCP's ability
      to remain stable at a fair bandwidth distribution. DCTCP is compared
      with CUBIC for this test.
      
      At the commencement of this test, a single flow is sending at maximum
      rate (near 10 Gbps) to a single receiver. One second after that first
      flow commences, a new flow from a distinct server begins sending to
      the same receiver as the first flow. After the second flow has sent
      data for 10 seconds, the second flow is terminated. The first flow
      sends for an additional second. Ideally, the bandwidth would be evenly
      shared as soon as the second flow starts, and recover as soon as it
      stops.
      
      The results of this test are shown below. Note that the flow bandwidth
      for the two flows was measured near the same time, but not
      simultaneously.
      
      DCTCP performs nearly perfectly within the measurement limitations
      of this test: bandwidth is quickly distributed fairly between the two
      flows, remains stable throughout the duration of the test, and
      recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth
      fairly, and has trouble remaining stable.
      
        CUBIC                      DCTCP
      
        Seconds  Flow 1  Flow 2    Seconds  Flow 1  Flow 2
         0       9.93    0          0       9.92    0
         0.5     9.87    0          0.5     9.86    0
         1       8.73    2.25       1       6.46    4.88
         1.5     7.29    2.8        1.5     4.9     4.99
         2       6.96    3.1        2       4.92    4.94
         2.5     6.67    3.34       2.5     4.93    5
         3       6.39    3.57       3       4.92    4.99
         3.5     6.24    3.75       3.5     4.94    4.74
         4       6       3.94       4       5.34    4.71
         4.5     5.88    4.09       4.5     4.99    4.97
         5       5.27    4.98       5       4.83    5.01
         5.5     4.93    5.04       5.5     4.89    4.99
         6       4.9     4.99       6       4.92    5.04
         6.5     4.93    5.1        6.5     4.91    4.97
         7       4.28    5.8        7       4.97    4.97
         7.5     4.62    4.91       7.5     4.99    4.82
         8       5.05    4.45       8       5.16    4.76
         8.5     5.93    4.09       8.5     4.94    4.98
         9       5.73    4.2        9       4.92    5.02
         9.5     5.62    4.32       9.5     4.87    5.03
        10       6.12    3.2       10       4.91    5.01
        10.5     6.91    3.11      10.5     4.87    5.04
        11       8.48    0         11       8.49    4.94
        11.5     9.87    0         11.5     9.9     0
      
      SYN/ACK ECT test:
      
      This test demonstrates the importance of ECT on SYN and SYN-ACK packets
      by measuring the connection probability in the presence of competing
      flows for a DCTCP connection attempt *without* ECT in the SYN packet.
      The test was repeated five times for each number of competing flows.
      
                    Competing Flows  1 |    2 |    4 |    8 |   16
                                     ------------------------------
      Mean Connection Probability    1 | 0.67 | 0.45 | 0.28 |    0
      Median Connection Probability  1 | 0.65 | 0.45 | 0.25 |    0
      
      As the number of competing flows moves beyond 1, the connection
      probability drops rapidly.
      
      Enabling DCTCP with this patch requires the following steps:
      
      DCTCP must be running both on the sender and receiver side in your
      data center, i.e.:
      
        sysctl -w net.ipv4.tcp_congestion_control=dctcp
      
      Also, ECN functionality must be enabled on all switches in your
      data center for DCTCP to work. The default ECN marking threshold (K)
      heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at
      1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]).
      
      In the above tests, for each switch port, traffic was segregated into two
      queues. For any packet with a DSCP of 0x01 (or, equivalently, a TOS of
      0x04), the packet was placed into the DCTCP queue. All other packets
      were placed into the default drop-tail queue. For the DCTCP queue,
      RED/ECN marking was enabled, here with a marking threshold of 75 KB.
      For more details, we refer you to section 3 of the paper [2].
      
      There are no code changes required to applications running in user
      space. DCTCP has been implemented in full *isolation* of the rest of
      the TCP code as its own congestion control module, so that it can run
      without a need to expose code to the core of the TCP stack, and thus
      nothing changes for non-DCTCP users.
      
      Changes in the CA framework code are minimal, and the DCTCP algorithm
      operates on mechanisms that are already available in most silicon.
      The gain (dctcp_shift_g) is currently a fixed constant (1/16) from
      the paper, but we leave the user the option of carefully choosing a
      different value.
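      Because kernel code avoids floating point, the same control law is more
      naturally kept in fixed point, with the gain expressed as a shift (a
      shift of 4 gives g = 1/16). In this sketch alpha is scaled so that 1024
      represents 1.0; the scale, the helper names and the generic
      marked/total inputs are our assumptions and may not match the module's
      exact arithmetic.

```c
#include <stdint.h>

#define DCTCP_ALPHA_SHIFT 10                     /* 1.0 == 1 << 10 */
#define DCTCP_ALPHA_ONE   (1U << DCTCP_ALPHA_SHIFT)

/* alpha := alpha - alpha * g + g * F, with g = 1 / 2^shift_g and
 * F = marked / total (ACKs or bytes), everything scaled by 1024.
 * Assumes 0 < shift_g <= DCTCP_ALPHA_SHIFT. */
uint32_t dctcp_alpha_fixed(uint32_t alpha, uint32_t marked,
                           uint32_t total, unsigned shift_g)
{
    alpha -= alpha >> shift_g;
    if (total != 0)
        alpha += (uint32_t)(((uint64_t)marked
                             << (DCTCP_ALPHA_SHIFT - shift_g)) / total);
    return alpha < DCTCP_ALPHA_ONE ? alpha : DCTCP_ALPHA_ONE;
}

/* W := W - W * alpha / 2, i.e. subtract (W * alpha) >> 11. */
uint32_t dctcp_cwnd_fixed(uint32_t cwnd, uint32_t alpha)
{
    return cwnd - (uint32_t)(((uint64_t)cwnd * alpha)
                             >> (DCTCP_ALPHA_SHIFT + 1));
}
```

      A larger shift_g makes alpha react more slowly to changes in the
      marking rate; a smaller one makes it track recent RTTs more closely.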
      
      If DCTCP is in use and ECN support on the peer side is off, DCTCP
      falls back after the three-way handshake (3WHS) to normal TCP Reno mode.
      
      ss {-4,-6} -t -i diag interface:
      
        ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054
        ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584
        send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15
        reordering:101 rcv_space:29200
      
        ... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448
        cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate
        325.5Mbps rcv_rtt:1.5 rcv_space:29200
      
      More information about DCTCP can be found in [1-4].
      
        [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html
        [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
        [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
        [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00
      
      Joint work with Florian Westphal and Glenn Judd.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 27 Sep, 2014: 5 commits
    • bpf: verifier (add ability to receive verification log) · cbd35700
      Committed by Alexei Starovoitov
      Add optional attributes for the BPF_PROG_LOAD syscall:
      union bpf_attr {
          struct {
      	...
      	__u32         log_level; /* verbosity level of eBPF verifier */
      	__u32         log_size;  /* size of user buffer */
      	__aligned_u64 log_buf;   /* user supplied 'char *buffer' */
          };
      };
      
      When log_level > 0, the verifier will return its verification log in the
      user-supplied buffer 'log_buf', which the program author can use to
      analyze why the verifier rejected a given program.
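      As a sketch of how a loader might request the log, the function below
      fills in the new fields alongside the usual BPF_PROG_LOAD attributes
      and issues the raw syscall. The helper name is hypothetical; loading
      typically needs privileges, and a reasonably large buffer (say 64 KiB)
      is advisable because the kernel may reject very small log sizes.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: load a program with the verifier log enabled.
 * Returns the program fd, or -1 with errno set; on rejection the
 * verifier's explanation is left in log_buf (log_size bytes). */
int load_prog_with_log(enum bpf_prog_type type,
                       const struct bpf_insn *insns, uint32_t insn_cnt,
                       char *log_buf, uint32_t log_size)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));        /* unused fields must be zero */
    attr.prog_type = type;
    attr.insns     = (uint64_t)(unsigned long)insns;
    attr.insn_cnt  = insn_cnt;
    attr.license   = (uint64_t)(unsigned long)"GPL";
    attr.log_level = 1;                    /* > 0 enables the log */
    attr.log_size  = log_size;
    attr.log_buf   = (uint64_t)(unsigned long)log_buf;

    if (log_size > 0)
        log_buf[0] = '\0';                 /* empty log if nothing is written */
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

      When the call fails, printing log_buf shows messages in the format of
      the example below.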
      
      'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
      provides several examples of these messages, like the program:
      
        BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
        BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
        BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
        BPF_LD_MAP_FD(BPF_REG_1, 0),
        BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
        BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
        BPF_EXIT_INSN(),
      
      will be rejected with the following multi-line message in log_buf:
      
        0: (7a) *(u64 *)(r10 -8) = 0
        1: (bf) r2 = r10
        2: (07) r2 += -8
        3: (b7) r1 = 0
        4: (85) call 1
        5: (15) if r0 == 0x0 goto pc+1
         R0=map_ptr R10=fp
        6: (7a) *(u64 *)(r0 +4) = 0
        misaligned access off 4 size 8
      
      The format of the output can change at any time as verifier evolves.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: expand BPF syscall with program load/unload · 09756af4
      Committed by Alexei Starovoitov
      eBPF programs are similar to kernel modules. They are loaded by the user
      process and automatically unloaded when the process exits. Each eBPF program is
      a safe run-to-completion set of instructions. eBPF verifier statically
      determines that the program terminates and is safe to execute.
      
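      The following syscall wrapper can be used to load the program (the two
      helpers it relies on are included; <linux/bpf.h>, <sys/syscall.h> and
      <unistd.h> are assumed to be included):
      static __u64 ptr_to_u64(const void *ptr)
      {
          return (__u64) (unsigned long) ptr;
      }

      static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
      {
          return syscall(__NR_bpf, cmd, attr, size);
      }

      int bpf_prog_load(enum bpf_prog_type prog_type,
                        const struct bpf_insn *insns, int insn_cnt,
                        const char *license)
      {
          union bpf_attr attr = {
              .prog_type = prog_type,
              .insns = ptr_to_u64(insns),
              .insn_cnt = insn_cnt,
              .license = ptr_to_u64(license),
          };

          return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
      }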
      where 'insns' is an array of eBPF instructions and 'license' is a string
      that must be GPL-compatible in order to call helper functions marked
      gpl_only.

      Upon successful load, the syscall returns prog_fd.
      Use close(prog_fd) to unload the program.
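      The wrapper above relies on two helpers, ptr_to_u64() and bpf(), that
      the message does not define. Below is a minimal sketch of them. It
      assumes a Linux libc that provides syscall() and the SYS_bpf number in
      <sys/syscall.h>. These definitions are illustrative and may differ from
      the ones shipped with the later example patches:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: union bpf_attr stores user pointers as 64-bit
 * integers, so the pointer is widened through uintptr_t first. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Hypothetical thin wrapper around the raw system call. The attribute
 * block is taken as void * so this sketch does not need <linux/bpf.h>. */
static int bpf(int cmd, void *attr, unsigned int size)
{
#ifdef SYS_bpf
	return (int)syscall(SYS_bpf, cmd, attr, size);
#else
	(void)cmd;
	(void)attr;
	(void)size;
	errno = ENOSYS; /* no bpf(2) on this platform */
	return -1;
#endif
}
```

      With these in scope, bpf_prog_load() above should build against
      <linux/bpf.h>, because &attr converts implicitly to void *. On a system
      without bpf(2), the call fails with errno set to ENOSYS.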
      
      User-space tests and examples follow in later patches.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      bpf: add lookup/update/delete/iterate methods to BPF maps · db20fd2b
      Alexei Starovoitov committed
      'maps' is generic storage of different types for sharing data between
      the kernel and user space.
      
      The maps are accessed from user space via the BPF syscall, which has the
      following commands:

      - create a map with the given type and attributes
        fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
        returns fd or a negative error

      - look up a key in the map referenced by fd
        err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->value
        returns zero and stores the found element in value, or a negative error

      - create or update a key/value pair in the given map
        err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->value
        returns zero or a negative error

      - find and delete an element by key in the given map
        err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key

      - iterate over map elements (given an input key, return next_key)
        err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->next_key
      
      - close(fd) deletes the map
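      GET_NEXT_KEY suggests a simple dump loop. Start from a key that is not in
      the map, then keep asking for the key that follows the previous one until
      the call fails. The sketch below models that loop on a small in-memory
      table instead of a real map fd. It assumes that an absent key yields the
      first key and that -ENOENT marks the end. toy_map and its helpers are
      hypothetical:

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX 8 /* capacity of the hypothetical table */

/* Hypothetical stand-in for a BPF map: keys kept in insertion order. */
struct toy_map {
	int keys[TOY_MAX];
	size_t n;
};

/* Modeled on BPF_MAP_GET_NEXT_KEY (assumed semantics): an absent 'key'
 * yields the first key, a present one yields the key stored after it.
 * Returns 0 on success or -ENOENT when there is no next key. */
static int toy_get_next_key(const struct toy_map *m, int key, int *next_key)
{
	size_t i = 0;

	while (i < m->n && m->keys[i] != key)
		i++;
	i = (i == m->n) ? 0 : i + 1; /* absent key restarts at slot 0 */
	if (i >= m->n)
		return -ENOENT;
	*next_key = m->keys[i];
	return 0;
}

/* User-space style dump loop: copy every key into 'out', return the count. */
static size_t toy_dump_keys(const struct toy_map *m, int absent_key, int *out)
{
	int key = absent_key, next;
	size_t count = 0;

	while (toy_get_next_key(m, key, &next) == 0) {
		out[count++] = next;
		key = next;
	}
	return count;
}
```

      Starting the dump from a key that is actually present would skip every
      key up to and including it. For that reason, the loop begins from a key
      known to be absent.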
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      bpf: introduce BPF syscall and maps · 99c55f7d
      Alexei Starovoitov committed
      The BPF syscall is a multiplexor for a range of different operations on
      eBPF. This patch introduces the syscall with a single command to create
      a map. The next patch adds commands to access maps.

      'maps' is generic storage of different types for sharing data between
      the kernel and user space.
      
      Userspace example:
      /* this syscall wrapper creates a map with given type and attributes
       * and returns map_fd on success.
       * use close(map_fd) to delete the map
       */
      int bpf_create_map(enum bpf_map_type map_type, int key_size,
                         int value_size, int max_entries)
      {
          union bpf_attr attr = {
              .map_type = map_type,
              .key_size = key_size,
              .value_size = value_size,
              .max_entries = max_entries
          };
      
          return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
      }
      
      'union bpf_attr' is backwards compatible with future extensions.
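      One way to keep such a union extensible is to accept an attribute block
      larger than the kernel knows about only when every extra byte is zero.
      That way, newer user space cannot silently depend on a field an older
      kernel would ignore. This sketch assumes that approach and zero-fills
      shorter blocks. The kernel's own check may differ in detail, and
      struct attr_v1 and copy_attr() are hypothetical names:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "kernel-side" view of the attribute block. */
struct attr_v1 {
	unsigned int map_type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Copy a caller-supplied attribute block of 'size' bytes into 'out'.
 * Shorter blocks are zero-extended; longer ones are accepted only if every
 * byte beyond sizeof(*out) is zero. Returns 0 or -E2BIG. */
static int copy_attr(struct attr_v1 *out, const void *uattr, size_t size)
{
	const unsigned char *p = uattr;
	size_t i;

	if (size > sizeof(*out)) {
		for (i = sizeof(*out); i < size; i++)
			if (p[i] != 0)
				return -E2BIG; /* caller uses a field we don't know */
		size = sizeof(*out);
	}
	memset(out, 0, sizeof(*out));
	memcpy(out, uattr, size);
	return 0;
}
```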
      
      More details are in Documentation/networking/filter.txt and in the man
      page.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      next: openrisc: Fix build · 29075fea
      Guenter Roeck committed
      openrisc:defconfig fails to build in next-20140926 with the following error.
      
      In file included from arch/openrisc/kernel/signal.c:31:0:
      ./arch/openrisc/include/asm/syscall.h: In function 'syscall_get_arch':
      ./arch/openrisc/include/asm/syscall.h:77:9: error: 'EM_OPENRISC' undeclared
      
      Fix by moving EM_OPENRISC to include/uapi/linux/elf-em.h.
      
      Fixes: ce5d1128 ("ARCH: AUDIT: implement syscall_get_arch for all arches")
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Eric Paris <eparis@redhat.com>
  13. 26 Sep, 2014: 1 commit
    • M
      [media] v4l2-dv-timings: fix a sparse warning · 5c2cacc1
      Mauro Carvalho Chehab committed
      This was detected with:
      	gcc-4.8.3-7.fc20.x86_64
      	sparse-0.5.0-3.fc20.x86_64
      
      drivers/media/v4l2-core/v4l2-dv-timings.c:34:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:35:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:36:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:37:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:38:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:39:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:40:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:41:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:42:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:43:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:44:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:45:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:46:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:47:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:48:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:49:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:50:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:51:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:52:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:53:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:54:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:55:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:56:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:57:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:58:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:59:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:60:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:61:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:62:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:63:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:64:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:65:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:66:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:67:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:68:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:69:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:70:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:71:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:72:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:73:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:74:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:75:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:76:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:77:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:78:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:79:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:80:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:81:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:82:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:83:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:84:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:85:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:86:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:87:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:88:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:89:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:90:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:91:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:92:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:93:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:94:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:95:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:96:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:97:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:98:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:99:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:100:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:101:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:102:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:103:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:104:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:105:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:106:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:107:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:108:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:109:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:110:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:111:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:112:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:113:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:114:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:115:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:116:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:117:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:118:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:119:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:120:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:121:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:122:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:123:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:124:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:125:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:126:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:127:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:128:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:129:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:130:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:131:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:132:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:133:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:134:9: error: unknown field name in initializer
      drivers/media/v4l2-core/v4l2-dv-timings.c:135:9: error: too many errors
      drivers/media/usb/hdpvr/hdpvr-video.c:42:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:43:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:44:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:45:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:46:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:47:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:48:9: error: unknown field name in initializer
      drivers/media/usb/hdpvr/hdpvr-video.c:49:9: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:484:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:485:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:486:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:487:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:488:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:489:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:490:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:491:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:492:18: error: unknown field name in initializer
      drivers/media/platform/s5p-tv/hdmi_drv.c:493:18: error: unknown field name in initializer
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
  14. 25 Sep, 2014: 1 commit
  15. 24 Sep, 2014: 4 commits
  16. 22 Sep, 2014: 2 commits
  17. 20 Sep, 2014: 3 commits
  18. 19 Sep, 2014: 1 commit
    • P
      netfilter: nf_tables: export rule-set generation ID · 84d7fce6
      Pablo Neira Ayuso committed
      This patch exposes the ruleset generation ID in three ways:

      1) The new command NFT_MSG_GETGEN, which exposes the 32-bit ruleset
         generation ID. This ID is incremented on every commit, and it
         should be large enough to avoid wraparound problems.

      2) The least significant 16 bits of the generation ID are exposed
         through the nfgenmsg->res_id header field. This allows us to quickly
         detect whether the ruleset has changed between two consecutive list
         dumps of different object lists (in this specific case I think the
         risk of wraparound is low).

      3) Userspace subscribers may receive notifications of the new rule-set
         generation after every commit. This also provides an alternative
         way to monitor the generation ID. If events are lost, the
         userspace process hits an overrun error, so it knows that it is
         working with a stale ruleset anyway.
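      The relationship between the IDs in items 1) and 2) can be sketched as
      follows. The sketch assumes res_id is simply the low 16 bits of the
      32-bit generation ID, and the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* The 16-bit value carried in nfgenmsg->res_id: the least significant half
 * of the 32-bit generation ID (assumed mapping). */
static uint16_t genid_to_res_id(uint32_t genid)
{
	return (uint16_t)(genid & 0xffff);
}

/* Two dumps are treated as consistent when their res_id values match; a
 * mismatch means at least one commit happened in between. */
static bool dumps_consistent(uint16_t res_id_a, uint16_t res_id_b)
{
	return res_id_a == res_id_b;
}
```

      Only 16 bits are compared, so two dumps separated by a multiple of 65536
      commits would still look consistent. The full 32-bit ID from
      NFT_MSG_GETGEN guards against that wraparound case.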
      
      Patrick spotted that rule-set transformations in userspace may take
      quite some time. In that case, userspace records the 32-bit generation
      ID before fetching the rule-set, and then:

      1) it compares it to the ID obtained after the transformation, to
         make sure it is not working with a stale rule-set and that no
         wraparound has occurred.

      2) it subscribes to ruleset notifications, so it can watch for new
         generation IDs.
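      The first step above is an optimistic check: record the generation ID,
      do the slow work, then read the ID again. Below is a minimal sketch. The
      netlink round trips are replaced by callbacks, and a hypothetical
      simulated rule-set (sim_ruleset) lets the logic run in isolation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Callbacks standing in for the real netlink requests (hypothetical). */
typedef uint32_t (*get_genid_fn)(void *ctx);
typedef void (*transform_fn)(void *ctx);

/* Record the 32-bit generation ID, run the slow fetch-and-transform step,
 * then read the ID again. Returns true if the ID is unchanged; otherwise
 * the caller should refetch the rule-set and retry. */
static bool transform_if_unchanged(void *ctx, get_genid_fn get_genid,
				   transform_fn transform)
{
	uint32_t before = get_genid(ctx);

	transform(ctx);
	return get_genid(ctx) == before;
}

/* Hypothetical simulation used to exercise the check. */
struct sim_ruleset {
	uint32_t genid;              /* current generation ID */
	uint32_t concurrent_commits; /* commits made by others meanwhile */
};

static uint32_t sim_get_genid(void *ctx)
{
	return ((struct sim_ruleset *)ctx)->genid;
}

static void sim_transform(void *ctx)
{
	struct sim_ruleset *rs = ctx;

	rs->genid += rs->concurrent_commits; /* every commit bumps the ID */
}
```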
      
      This is complementary to the NLM_F_DUMP_INTR approach, which allows
      us to detect interference in the middle of a single list dump.
      There is no way to explicitly check that interference has occurred
      between two list dumps from the kernel, since it doesn't know how
      many lists the userspace client is actually going to dump.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  19. 17 Sep, 2014: 1 commit
  20. 16 Sep, 2014: 3 commits
    • R
      usb: gadget: f_fs: virtual endpoint address mapping · 1b0bf88f
      Robert Baldyga committed
      This patch introduces virtual endpoint address mapping. It separates
      function logic from physical endpoint addresses, making it more
      hardware independent.

      The following modifications change the user-space API, so to enable
      them the user has to set the FUNCTIONFS_VIRTUAL_ADDR flag in
      descriptors.

      Endpoints are now referred to by virtual endpoint addresses chosen by
      the user in endpoint descriptors. This applies to every context in which
      an endpoint address can be used:
      - when accessing endpoint files in the FunctionFS filesystem (in the file name),
      - in setup requests directed to a specific endpoint (in the wIndex field),
      - in descriptors returned by the FUNCTIONFS_ENDPOINT_DESC ioctl.

      In endpoint file names, the endpoint address is formatted as a
      two-digit hexadecimal value ("ep%02x"), which has a few advantages:
      it is easy to parse, and it makes the endpoint direction easy to
      recognize from the name (IN endpoint numbers start with the digit 8,
      and OUT with 0), which can be useful for debugging. It also makes it
      easier to introduce further features that allow each endpoint number
      to be used in both directions, giving a function more endpoints if the
      hardware supports this (for example, we could have ep01, which is
      endpoint 1 with OUT direction, and ep81, which is endpoint 1 with IN
      direction).
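      The naming scheme can be sketched in a few lines. The direction and
      number extraction below follows the standard USB endpoint address layout,
      where bit 7 is set for IN and the low four bits hold the endpoint number.
      The helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>

/* Format an endpoint address as an "ep%02x" file name, e.g. 0x81 -> "ep81".
 * 'buf' must hold at least 5 bytes ("epXX" plus the terminating NUL). */
static void ep_file_name(unsigned char addr, char buf[5])
{
	snprintf(buf, 5, "ep%02x", addr);
}

/* Bit 7 of a USB endpoint address is the direction: set means IN. */
static bool ep_is_in(unsigned char addr)
{
	return (addr & 0x80) != 0;
}

/* The low four bits are the endpoint number. */
static unsigned ep_number(unsigned char addr)
{
	return addr & 0x0f;
}
```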
      
      The physical endpoint address can still be obtained with the
      FUNCTIONFS_ENDPOINT_REVMAP ioctl, but it is no longer necessary for
      handling USB transactions properly.
      Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
    • A
      openvswitch: Add recirc and hash action. · 971427f3
      Andy Zhou committed
      The recirc action allows a packet to re-enter openvswitch processing.
      Currently, openvswitch looks up the flow for a received packet and
      executes a set of actions on that packet; with the recirc action, we
      can process/modify the packet and recirculate it back into openvswitch
      for another pass.

      The OVS hash action calculates a 5-tuple hash and stores it in the
      flow key's hash field. This can be used along with recirculation to
      distribute packets among different ports of bond devices.
      For example, OVS bonding can use the following actions:
      Match on: bond flow; Action: hash, recirc(id)
      Match on: recirc-id == id and hash lower bits == a;
                Action: output port_bond_a
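      In the bonding example, each flow is hashed once, and the rules that
      match after recirculation pick a port from the hash's lower bits. The
      sketch below makes three assumptions. tuple_hash() uses FNV-1a purely
      for illustration; it is not the hash openvswitch computes. The number of
      bond ports is a power of two. All names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical 5-tuple layout. */
struct five_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

/* FNV-1a over each field's bytes (a stand-in, not the OVS hash). Hashing
 * field by field avoids feeding struct padding into the hash. */
static uint32_t tuple_hash(const struct five_tuple *t)
{
	uint32_t fields[5] = { t->src_ip, t->dst_ip, t->src_port,
			       t->dst_port, t->proto };
	uint32_t h = 2166136261u;

	for (int i = 0; i < 5; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (fields[i] >> (8 * b)) & 0xff;
			h *= 16777619u;
		}
	}
	return h;
}

/* "hash lower bits == a": with n_ports a power of two, the low bits of the
 * hash select the bond member, so a given flow always uses the same port. */
static unsigned bond_pick_port(uint32_t hash, unsigned n_ports)
{
	return hash & (n_ports - 1);
}
```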
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Acked-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    • A
      ipvs: Add destination address family to netlink interface · 6cff339b
      Alex Gartrell committed
      This is necessary to support heterogeneous pools. For example, if you
      have an IPv6-addressed network, you'll want to be able to forward IPv4
      traffic into it.
      
      This patch enforces that destination address family is the same as service
      family, as none of the forwarding mechanisms support anything else.
      
      For the old setsockopt mechanism, we simply set the dest address family to
      AF_INET as we do with the service.
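      The enforced rule reduces to a simple comparison. The sketch below
      assumes -EINVAL as the error code; the message does not say which error
      is returned. The helper names are hypothetical:

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical check: reject a destination whose address family differs
 * from its service's, since no forwarding method mixes IPv4 and IPv6. */
static int ipvs_check_dest_af(int svc_af, int dest_af)
{
	if (dest_af != svc_af)
		return -EINVAL; /* assumed error code */
	return 0;
}

/* The legacy setsockopt interface carries no family field, so the
 * destination family is taken to be AF_INET, as for the service. */
static int ipvs_legacy_dest_af(void)
{
	return AF_INET;
}
```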
      Signed-off-by: Alex Gartrell <agartrell@fb.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>