1. 09 6月, 2015 9 次提交
  2. 08 6月, 2015 15 次提交
  3. 07 6月, 2015 6 次提交
    • A
      bpf: allow programs to write to certain skb fields · d691f9e8
      Alexei Starovoitov 提交于
      allow programs read/write skb->mark, tc_index fields and
      ((struct qdisc_skb_cb *)cb)->data.
      
      mark and tc_index are generically useful in TC.
      cb[0]-cb[4] are primarily used to pass arguments from one
      program to another called via bpf_tail_call() which can
      be seen in sockex3_kern.c example.
      
      All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
      mark, tc_index are writeable from tc_cls_act only.
      cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
      
      Add verifier tests and improve sample code.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d691f9e8
    • A
      bpf: make programs see skb->data == L2 for ingress and egress · 3431205e
      Alexei Starovoitov 提交于
      eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
      For ingress L2 header is already pulled, whereas for egress it's present.
      This is known to program writers which are currently forced to use
      BPF_LL_OFF workaround.
      Since programs don't change skb internal pointers it is safe to do
      pull/push right around invocation of the program and earlier taps and
      later pt->func() will not be affected.
      Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
      around run_filter/BPF_PROG_RUN even if skb_shared.
      
      This fix finally allows programs to use optimized LD_ABS/IND instructions
      without BPF_LL_OFF for higher performance.
      tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
             w/o JIT   w/JIT
      before  20.5     23.6 Mpps
      after   21.8     26.6 Mpps
      
      Old programs with BPF_LL_OFF will still work as-is.
      
      We can now undo most of the earlier workaround commit:
      a166151c ("bpf: fix bpf helpers to use skb->mac_header relative offsets")
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3431205e
    • E
      tcp: remove redundant checks II · 98da81a4
      Eric Dumazet 提交于
      For same reasons than in commit 12e25e10 ("tcp: remove redundant
      checks"), we can remove redundant checks done for timewait sockets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98da81a4
    • C
      fddi: print an address with %p format specifier rather than %x · 908e80d6
      Colin Ian King 提交于
      The debug is printing the struct smt_header * address using
      the %x format specifier. Fix it to use %p instead.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      908e80d6
    • N
      wan: dscc4: fix build warning Wunused-but-set-variable · c5726d26
      Nicholas Mc Guire 提交于
      Fix:
      drivers/net/wan/dscc4.c: In function 'dscc4_open':
      drivers/net/wan/dscc4.c:1049:25: warning: variable 'ppriv' set but not used
      [-Wunused-but-set-variable]
      
      This has been in there unused since 1da177e4 (Linux-2.6.12-rc2) simply
      remove it.
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5726d26
    • E
      inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations · 90c337da
      Eric Dumazet 提交于
      When an application needs to force a source IP on an active TCP socket
      it has to use bind(IP, port=x).
      
      As most applications do not want to deal with already used ports, x is
      often set to 0, meaning the kernel is in charge to find an available
      port.
      But kernel does not know yet if this socket is going to be a listener or
      be connected.
      It has very limited choices (no full knowledge of final 4-tuple for a
      connect())
      
      With limited ephemeral port range (about 32K ports), it is very easy to
      fill the space.
      
      This patch adds a new SOL_IP socket option, asking kernel to ignore
      the 0 port provided by application in bind(IP, port=0) and only
      remember the given IP address.
      
      The port will be automatically chosen at connect() time, in a way
      that allows sharing a source port as long as the 4-tuples are unique.
      
      This new feature is available for both IPv4 and IPv6 (Thanks Neal)
      
      Tested:
      
      Wrote a test program and checked its behavior on IPv4 and IPv6.
      
      strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
      connect().
      Also getsockname() show that the port is still 0 right after bind()
      but properly allocated after connect().
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
      setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      
      IPv6 test :
      
      socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
      setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      
      I was able to bind()/connect() a million concurrent IPv4 sockets,
      instead of ~32000 before patch.
      
      lpaa23:~# ulimit -n 1000010
      lpaa23:~# ./bind --connect --num-flows=1000000 &
      1000000 sockets
      
      lpaa23:~# grep TCP /proc/net/sockstat
      TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
      
      Check that a given source port is indeed used by many different
      connections :
      
      lpaa23:~# ss -t src :40000 | head -10
      State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
      ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
      ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
      ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
      ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
      ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
      ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
      ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90c337da
  4. 06 6月, 2015 8 次提交
  5. 05 6月, 2015 2 次提交
    • D
      Merge branch 'mlx5-next' · bb62f791
      David S. Miller 提交于
      Or Gerlitz says:
      
      ====================
      mlx5: Add Interface Step Sequence ID support
      
      ISSI (Interface Step Sequence ID) defines the step sequence ID of the
      interface between the driver to the firmware and is incremented by
      steps of one. ISSI is used to enable deprecating/modifying features,
      command interfaces and such, while maintaining compatibility.
      
      As the driver serves both ConnectIB (CIB) and ConnectX4, we carefully
      made sure that the IB functionality keeps running also on older CIB
      firmware releases that don't support ISSI.
      
      The Ethernet functionailty is available only on ConnectX4 where all
      firmware releases support the feature since the very basic ISSI level.
      So at this point no need for compatility code there.
      
      As done prior to this series, when the Ethernet functionlity is enabled,
      during the initialization flow, the core driver performs a query of the
      supported ISSIs using the QUERY_ISSI command, and then, if ISSI is supported,
      sets the actual issi value informing the firmware on which ISSI level to run,
      using SET_ISSI command.
      
      Previously, the IB driver wasn't ready to work on that mode, and hence
      building both the IB driver and the Ethernet functionality in the core
      driver were disallowed by Kconfigs, with this series, we allow users to
      enable them both.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb62f791
    • H
      mlx5: Enable mutual support for IB and Ethernet · 4aa17b28
      Haggai Abramonvsky 提交于
      Ethernet functionality is only available when working in ISSI > 0 mode.
      
      Previously, the IB driver wasn't ready to work on that mode, and hence
      building both the IB driver and the Ethernet functionality in the core
      driver were disallowed by Kconfigs.
      
      Now, once we have all the pre-steps in place, we can remove this limitation.
      
      The last steps in the IB driver for getting that setup to work are:
      create dummy SRQ for the driver's use (until now we could use XRC_SRQ
      as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from
      basic SRQs) and adapt the create QP function to be compatible with ISSI > 0.
      Signed-off-by: NHaggai Abramovsky <hagaya@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4aa17b28