1. 13 10月, 2015 1 次提交
    • A
      can: avoid using timeval for uapi · ba61a8d9
      Arnd Bergmann 提交于
      The can subsystem communicates with user space using a bcm_msg_head
      header, which contains two timestamps. This is problematic for
      multiple reasons:
      
      a) The structure layout is currently incompatible between 64-bit
         user space and 32-bit user space, and cannot work in compat
         mode (other than x32).
      
      b) The timeval structure layout will change in 32-bit user
         space when we fix the y2038 overflow problem by redefining
         time_t to 64-bit, making new 32-bit user space incompatible
         with the current kernel interface.
         Cars last a long time and often use old kernels, so the actual
         users of this code are the most likely ones to migrate to y2038
         safe user space.
      
      This tries to work around part of the problem by changing the
      publicly visible user interface in the header, but not the binary
      interface. Fortunately, the values passed around in the structure
      are relative times and do not actually suffer from the y2038
      overflow, so 32-bit is enough here.
      
      We replace the use of 'struct timeval' with a newly defined
      'struct bcm_timeval' that uses the exact same binary layout
      as before and that still suffers from problem a) but not problem
      b).
      
      The downside of this approach is that any user space program
      that currently assigns a timeval structure to these members
      rather than writing the tv_sec/tv_usec portions individually
      will suffer a compile-time error when built with an updated
      kernel header. Fixing this error makes it work fine with old
      and new headers though.
      
      We could address problem a) by using '__u32' or 'int' members
      rather than 'long', but that would have a more significant
      downside in also breaking support for all existing 64-bit user
      binaries that might be using this interface, which is likely
      not acceptable.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-can@vger.kernel.org
      Cc: linux-api@vger.kernel.org
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      ba61a8d9
  2. 07 10月, 2015 9 次提交
  3. 05 10月, 2015 21 次提交
  4. 03 10月, 2015 1 次提交
    • D
      sched, bpf: add helper for retrieving routing realms · c46646d0
      Daniel Borkmann 提交于
      Using routing realms as part of the classifier is quite useful, it
      can be viewed as a tag for one or multiple routing entries (think of
      an analogy to net_cls cgroup for processes), set by user space routing
      daemons or via iproute2 as an indicator for traffic classifiers and
      later on processed in the eBPF program.
      
      Unlike actions, the classifier can inspect device flags and enable
      netif_keep_dst() if necessary. tc actions don't have that possibility,
      but in case people know what they are doing, it can be used from there
      as well (e.g. via devs that must keep dsts by design anyway).
      
      If a realm is set, the handler returns the non-zero realm. User space
      can set the full 32bit realm for the dst.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c46646d0
  5. 02 10月, 2015 1 次提交
  6. 30 9月, 2015 2 次提交
    • D
      net: Add support for filtering neigh dump by master device · 21fdd092
      David Ahern 提交于
      Add support for filtering neighbor dumps by master device by adding
      the NDA_MASTER attribute to the dump request. A new netlink flag,
      NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
      request and output is filtered as requested.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21fdd092
    • N
      bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906
      Nikolay Aleksandrov 提交于
      This patch changes the bridge vlan implementation to use rhashtables
      instead of bitmaps. The main motivation behind this change is that we
      need extensible per-vlan structures (both per-port and global) so more
      advanced features can be introduced and the vlan support can be
      extended. I've tried to break this up but the moment net_port_vlans is
      changed and the whole API goes away, thus this is a larger patch.
      A few short goals of this patch are:
      - Extensible per-vlan structs stored in rhashtables and a sorted list
      - Keep user-visible behaviour (compressed vlans etc)
      - Keep fastpath ingress/egress logic the same (optimizations to come
        later)
      
      Here's a brief list of some of the new features we'd like to introduce:
      - per-vlan counters
      - vlan ingress/egress mapping
      - per-vlan igmp configuration
      - vlan priorities
      - avoid fdb entries replication (e.g. local fdb scaling issues)
      
      The structure is kept single for both global and per-port entries so to
      avoid code duplication where possible and also because we'll soon introduce
      "port0 / aka bridge as port" which should simplify things further
      (thanks to Vlad for the suggestion!).
      
      Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
      rhashtable, if an entry is added to a port it'll get a pointer to its
      global context so it can be quickly accessed later. There's also a
      sorted vlan list which is used for stable walks and some user-visible
      behaviour such as the vlan ranges, also for error paths.
      VLANs are stored in a "vlan group" which currently contains the
      rhashtable, sorted vlan list and the number of "real" vlan entries.
      A good side-effect of this change is that it resembles how hw keeps
      per-vlan data.
      One important note after this change is that if a VLAN is being looked up
      in the bridge's rhashtable for filtering purposes (or to check if it's an
      existing usable entry, not just a global context) then the new helper
      br_vlan_should_use() needs to be used if the vlan is found. In case the
      lookup is done only with a port's vlan group, then this check can be
      skipped.
      
      Things tested so far:
      - basic vlan ingress/egress
      - pvids
      - untagged vlans
      - undef CONFIG_BRIDGE_VLAN_FILTERING
      - adding/deleting vlans in different scenarios (with/without global ctx,
        while transmitting traffic, in ranges etc)
      - loading/removing the module while having/adding/deleting vlans
      - extracting bridge vlan information (user ABI), compressed requests
      - adding/deleting fdbs on vlans
      - bridge mac change, promisc mode
      - default pvid change
      - kmemleak ON during the whole time
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2594e906
  7. 25 9月, 2015 1 次提交
    • J
      lwtunnel: remove source and destination UDP port config option · b194f30c
      Jiri Benc 提交于
      The UDP tunnel config is asymmetric wrt. to the ports used. The source and
      destination ports from one direction of the tunnel are not related to the
      ports of the other direction. We need to be able to respond to ARP requests
      using the correct ports without involving routing.
      
      As the consequence, UDP ports need to be fixed property of the tunnel
      interface and cannot be set per route. Remove the ability to set ports per
      route. This is still okay to do, as no kernel has been released with these
      attributes yet.
      
      Note that the ability to specify source and destination ports is preserved
      for other users of the lwtunnel API which don't use routes for tunnel key
      specification (like openvswitch).
      
      If in the future we rework ARP handling to allow port specification, the
      attributes can be added back.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b194f30c
  8. 23 9月, 2015 1 次提交
  9. 18 9月, 2015 2 次提交
    • A
      bpf: add bpf_redirect() helper · 27b29f63
      Alexei Starovoitov 提交于
      Existing bpf_clone_redirect() helper clones skb before redirecting
      it to RX or TX of destination netdev.
      Introduce bpf_redirect() helper that does that without cloning.
      
      Benchmarked with two hosts using 10G ixgbe NICs.
      One host is doing line rate pktgen.
      Another host is configured as:
      $ tc qdisc add dev $dev ingress
      $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
         action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
      so it receives the packet on $dev and immediately xmits it on $dev + 1
      The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
      that does bpf_clone_redirect() and performance is 2.0 Mpps
      
      $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
         action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
      which is using bpf_redirect() - 2.4 Mpps
      
      and using cls_bpf with integrated actions as:
      $ tc filter add dev $dev root pref 10 \
        bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
      performance is 2.5 Mpps
      
      To summarize:
      u32+act_bpf using clone_redirect - 2.0 Mpps
      u32+act_bpf using redirect - 2.4 Mpps
      cls_bpf using redirect - 2.5 Mpps
      
      For comparison linux bridge in this setup is doing 2.1 Mpps
      and ixgbe rx + drop in ip_rcv - 7.8 Mpps
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27b29f63
    • D
      cls_bpf: introduce integrated actions · 045efa82
      Daniel Borkmann 提交于
      Often cls_bpf classifier is used with single action drop attached.
      Optimize this use case and let cls_bpf return both classid and action.
      For backwards compatibility reasons enable this feature under
      TCA_BPF_FLAG_ACT_DIRECT flag.
      
      Then more interesting programs like the following are easier to write:
      int cls_bpf_prog(struct __sk_buff *skb)
      {
        /* classify arp, ip, ipv6 into different traffic classes
         * and drop all other packets
         */
        switch (skb->protocol) {
        case htons(ETH_P_ARP):
          skb->tc_classid = 1;
          break;
        case htons(ETH_P_IP):
          skb->tc_classid = 2;
          break;
        case htons(ETH_P_IPV6):
          skb->tc_classid = 3;
          break;
        default:
          return TC_ACT_SHOT;
        }
      
        return TC_ACT_OK;
      }
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      045efa82
  10. 16 9月, 2015 1 次提交