1. 28 May 2015: 11 commits
    • pci: Add Cavium PCI vendor id · e5c4708b
      Sunil Goutham committed
      This vendor id will be used by network (vNIC), USB (xHCI),
      SATA (AHCI), GPIO, I2C, MMC and maybe other drivers
      for ThunderX SoC.
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • test_bpf: add similarly conflicting jump test case only for classic · bde28bc6
      Daniel Borkmann committed
      While 3b529602 ("test_bpf: add more eBPF jump torture cases")
      added the int3 bug test case only for eBPF, which needs exactly 11
      passes to converge, here's a version for classic BPF with 11 passes,
      and one that would need 70 passes on x86_64 to actually converge and
      be successfully JITed. Effectively, all jumps are optimized out,
      resulting in a JIT image of just 89 bytes (from a program of the
      maximum number of BPF insns) that only returns K.
      
      This might be useful as a recipe for folks wanting to craft a test case
      when backporting the fix in commit 3f7352bf ("x86: bpf_jit: fix
      compilation of large bpf programs") without having eBPF. The second
      one is delegated to the interpreter because the last pass still
      results in shrinking; in other words, this one won't be JITed on
      x86_64.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
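      To illustrate what it means for a JIT to need several passes to "converge", here is a toy Python model of multi-pass jump sizing. It is not the kernel's x86 bpf_jit code: only the 2-byte and 5-byte sizes (x86 `jmp rel8` / `jmp rel32`) are real, while the instruction format, the all-long starting layout and the `max_passes` bound are assumptions of this sketch. Each pass lays out the image using the addresses from the previous pass and stops once the image length no longer changes; giving up after `max_passes` stands in for falling back to the interpreter.

```python
SHORT_JMP, LONG_JMP = 2, 5  # byte sizes of x86 "jmp rel8" and "jmp rel32"


def jmp_size(rel):
    """Use the short form when the displacement fits in a signed byte."""
    return SHORT_JMP if -128 <= rel <= 127 else LONG_JMP


def jit_converge(prog, max_passes=10):
    """Toy multi-pass JIT sizing loop (a model, not the kernel's code).

    prog is a list of ("jmp", target_index) or ("insn", size_in_bytes);
    a jump lands on the start of instruction target_index.
    Returns (passes, image_len) once the image length stops changing,
    or None if it has not converged within max_passes.
    """
    # Assumed pessimistic starting layout: every jump uses the long form.
    addrs, pos = [], 0
    for op, arg in prog:
        pos += LONG_JMP if op == "jmp" else arg
        addrs.append(pos)  # addrs[i] = offset just past instruction i

    old_len = None
    for passes in range(1, max_passes + 1):
        new_addrs, pos = [], 0
        for i, (op, arg) in enumerate(prog):
            if op == "jmp":
                target_start = addrs[arg - 1] if arg > 0 else 0
                # Displacement from the end of this jump, using the previous pass's layout.
                size = jmp_size(target_start - addrs[i])
            else:
                size = arg
            pos += size
            new_addrs.append(pos)
        if pos == old_len:
            return passes, pos  # converged: two consecutive passes agree on length
        addrs, old_len = new_addrs, pos
    return None  # not converged; a real JIT would fall back to the interpreter
```

      In this model, `jit_converge([("jmp", 3), ("jmp", 3), ("insn", 123), ("insn", 1)])` returns `(3, 128)`: the first jump only fits the short form once the second one has shrunk, a small instance of the cascading shrinkage that makes some programs need many passes. With `max_passes=2` the same program returns `None`.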
    • Merge branch 'sfc-next' · 5474b132
      David S. Miller committed
      Edward Cree says:
      
      ====================
      sfc: add MCDI tracing
      
      This patchset adds support for logging MCDI (Management-Controller-to-Driver
      Interface) interactions between the sfc driver and a bound device, to aid
      in debugging.

      Solarflare has a tool to decode the resulting traces and will look to
      open-source it if there is any external interest, but the protocol is
      already detailed in drivers/net/ethernet/sfc/mcdi_pcol.h.

      The logging buffer we allocate per MCDI context is a work area for
      constructing each individual message before logging it with netif_info.
      The buffer is long-lived simply to avoid the overhead of allocating and
      freeing it on every MCDI call, since MCDIs are already known to be
      serialised for other reasons.
      
      --
      v4: remove patch #4, which has already been applied via sshah
      v3: add some explanations to cover letter and patch #4
      v2: avoid long lines in cover letter; fix multiline comment style
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: add module parameter to enable MCDI logging on new functions · 42ca087f
      Edward Cree committed
      As many issues are encountered at probe time, where MCDI logging can't be
      enabled through the sysfs node, this change adds a module parameter
      'mcdi_logging_default', which defaults to false. When it is set to true,
      newly probed functions will have MCDI logging enabled. The setting can
      subsequently be changed as normal through the sysfs node.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
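      Because the parameter matters at probe time, one way to make it persistent is a modprobe.d entry (a one-off `modprobe sfc mcdi_logging_default=1` also works). The sketch below writes such an entry. `options <module> <param>=<value>` is standard modprobe.d syntax, but the helper name and the file path you pass it (for example a file under /etc/modprobe.d/) are assumptions.

```python
from pathlib import Path


def write_sfc_modprobe_conf(path, enable=True):
    """Write a modprobe.d line that sets sfc's mcdi_logging_default parameter.

    Hypothetical helper: `path` is wherever you keep modprobe.d snippets.
    Returns the line that was written.
    """
    # Boolean module parameters accept 1/0.
    line = "options sfc mcdi_logging_default=%d\n" % (1 if enable else 0)
    Path(path).write_text(line)
    return line
```

      The entry only takes effect the next time the sfc module is loaded; functions that are already probed keep their current setting, which can still be changed through sysfs.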
    • sfc: add sysfs entry to control MCDI tracing · e7fef9b4
      Edward Cree committed
      MCDI tracing is enabled per-function with a sysfs file
          /sys/class/net/<NET_DEV>/device/mcdi_logging
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
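      A minimal Python sketch for toggling the attribute, assuming it accepts "1" and "0" (the commit does not say which values it takes) and that you run it as root. `set_mcdi_logging` and its `sysfs_root` parameter are hypothetical; `sysfs_root` only exists so the path can be redirected, for example in a test.

```python
from pathlib import Path


def set_mcdi_logging(ifname, enable, sysfs_root="/sys"):
    """Enable or disable MCDI tracing for the function behind `ifname`.

    Assumes the mcdi_logging attribute accepts "1"/"0"; writing under the
    real /sys requires root. Returns the path that was written.
    """
    attr = Path(sysfs_root, "class", "net", ifname, "device", "mcdi_logging")
    attr.write_text("1\n" if enable else "0\n")
    return attr
```

      For example, `set_mcdi_logging("eth0", True)` would write to /sys/class/net/eth0/device/mcdi_logging, where `eth0` is a placeholder for an sfc interface name.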
    • sfc: add tracing of MCDI commands · 75aba2a5
      Edward Cree committed
      MCDI tracing is conditional on CONFIG_SFC_MCDI_LOGGING, which is enabled
      by default.
      
      Each MCDI command will produce a console line like
          sfc dom:bus:dev:fn ifname: MCDI RPC REQ: xxxxxxxx [yyyyyyyy...]
      where xxxxxxxx etc. are the raw MCDI payload in 32-bit hex chunks.
      The response will then produce a similar line with "RESP" instead of "REQ",
      containing the MCDI response payload (if any).
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
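      To make the line format concrete, here is a hypothetical Python helper that renders a payload the way the REQ/RESP lines are described. Treating the payload as little-endian 32-bit words is an assumption, and the "sfc dom:bus:dev:fn ifname:" prefix is left out because it identifies the device rather than being part of the payload.

```python
import struct


def format_mcdi_line(direction, payload):
    """Render an MCDI payload as space-separated 32-bit hex words.

    direction is "REQ" or "RESP"; payload is bytes whose length is a
    multiple of 4. Words are decoded as little-endian (an assumption).
    """
    if direction not in ("REQ", "RESP"):
        raise ValueError("direction must be 'REQ' or 'RESP'")
    if len(payload) % 4:
        raise ValueError("payload length must be a multiple of 4 bytes")
    words = struct.unpack("<%dI" % (len(payload) // 4), payload)
    return "MCDI RPC %s:%s" % (direction, "".join(" %08x" % w for w in words))
```

      An empty response payload yields just "MCDI RPC RESP:", matching the "(if any)" case above.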
    • vxlan: release lock after each bucket in vxlan_cleanup · 14e1d0fa
      Sorin Dumitru committed
      We're seeing some softlockups from this function when there
      are a lot of fdb entries on a vxlan device. Taking the lock for
      each bucket instead of the whole table is enough to fix that.
      Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
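      The locking change can be sketched in Python: instead of holding one lock across the walk of every bucket, the cleanup takes and releases it once per bucket, so other users of the table can get in between buckets. This is a model, not the driver code; `FdbTable`, the bucket layout and the ageing arithmetic are assumptions, and `threading.Lock` stands in for the kernel lock.

```python
import threading


class FdbTable:
    """Toy forwarding table: a fixed array of hash buckets guarded by one lock."""

    def __init__(self, nbuckets=256):
        self.lock = threading.Lock()
        self.buckets = [{} for _ in range(nbuckets)]

    def add(self, mac, last_used):
        with self.lock:
            self.buckets[hash(mac) % len(self.buckets)][mac] = last_used

    def cleanup(self, now, ageing):
        """Expire entries idle for longer than `ageing`; return how many were removed."""
        removed = 0
        for bucket in self.buckets:
            # Lock per bucket and release before the next one, so the lock is
            # never held for the whole table walk.
            with self.lock:
                stale = [mac for mac, used in bucket.items() if now - used > ageing]
                for mac in stale:
                    del bucket[mac]
                removed += len(stale)
        return removed
```

      The total work is unchanged; what changes is the longest time any single lock hold can take, which is now bounded by the size of one bucket.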
    • tcp/dccp: try to not exhaust ip_local_port_range in connect() · 07f4c900
      Eric Dumazet committed
      A long-standing problem on busy servers is the tiny available TCP port
      range (/proc/sys/net/ipv4/ip_local_port_range) and the default
      sequential allocation of source ports in the connect() system call.
      
      If a host has a lot of active TCP sessions, chances are
      very high that all ports are in use by at least one flow,
      and subsequent bind(0) attempts fail, or have to scan a big portion of
      the space to find a slot.
      
      In this patch, I changed the starting point in __inet_hash_connect()
      so that we try to favor even [1] ports, leaving odd ports for bind()
      users.
      
      We still perform a sequential search, so there is no guarantee, but
      if connect() targets are very different, the end result is that we leave
      more ports available to bind(), and we spread them all over the range,
      lowering the time for both connect() and bind() to find a slot.
      
      This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
      spans an even number of ports, i.e. if the start and end values have
      different parity.
      
      Therefore, the default /proc/sys/net/ipv4/ip_local_port_range was changed
      to 32768 - 60999 (instead of 32768 - 61000).
      
      There is no change to security aspects here; only some poor hashing
      schemes could eventually be impacted by this change.
      
      [1]: The odd/even property depends on the parity of the ip_local_port_range values.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
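      A Python sketch of the parity-first search, loosely modeled on the description above rather than copied from __inet_hash_connect(). `pick_connect_port` is hypothetical, `hint` stands in for the per-destination starting offset the kernel derives, and in this sketch an odd-sized range is trimmed by one port so that wrapping around preserves parity.

```python
def pick_connect_port(low, high, hint, in_use):
    """Return a free port in [low, high], trying ports with low's parity first.

    Sketch only: `hint` is a stand-in for the kernel's per-destination
    starting offset, and `in_use` is a set of busy ports.
    Returns None when every port in the window is taken.
    """
    remaining = high - low + 1
    if remaining > 1:
        remaining &= ~1  # even-sized window, so wrapping around keeps parity
    offset = (hint % remaining) & ~1  # start on a port with the same parity as low
    for _ in range(2):  # pass 1: low's parity; pass 2: the other parity
        port = low + offset
        for _ in range(0, remaining, 2):
            if port >= low + remaining:
                port -= remaining  # wrap to the start of the window
            if port not in in_use:
                return port
            port += 2
        offset += 1
    return None
```

      With the new default range of 32768 - 60999 and nothing in use, hints of 10 and 11 both yield port 32778 (even); the search only moves on to odd ports once every even port in the window is taken, which leaves the odd half for bind() users.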
    • Merge branch 'ip_frag_next' · 837b9955
      David S. Miller committed
      Florian Westphal says:
      
      ====================
      net: force refragmentation for DF reassembled skbs
      
      The output path tests:
      
          if (skb->len > mtu) ip_fragment()
      
      This breaks connectivity in one corner case:
       If the skb was reassembled, but has the DF bit set and ..
       .. its reassembled size is <= the outdev mtu ..
       .. we will forward a DF packet larger than what the sender
          transmitted on the wire.
      
      If a router later in the path can't forward this packet, it will send an
      icmp error citing an mtu that the original sender never exceeded.
      
      This changes the ipv4 defrag/output path to

      a) force refragmentation for DF reassembled skbs and
      b) set the DF bit on all fragments when refragmenting if it was set on
         the original frags.
      
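      Tested via:

          from scapy.all import IP, UDP, fragment, send

          dip = "10.23.42.2"  # destination on the far side of the forwarding router
          payload = "A" * 1400
          # UDP datagram with DF set and a fixed IP id so all fragments belong together
          packet = IP(dst=dip, id=12345, flags="DF") / UDP(sport=42, dport=42) / payload
          # Pre-fragment the datagram so each fragment stays around 1200 bytes
          frags = fragment(packet, fragsize=1200)
          for frag in frags:  # avoid shadowing scapy's fragment()
              send(frag)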
      
      Without this patch, we generate fragments without the DF bit set, based
      on the outgoing device mtu, when fragmenting after forwarding, i.e.
      
      IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
          192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
      IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
          192.168.7.1 > 10.23.42.2: ip-proto-17
      
      on ingress will either turn into
      
      IP (ttl 63, id 12345, offset 0, flags [+], proto UDP (17), length 1396)
          192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
      IP (ttl 63, id 12345, offset 1376, flags [none], proto UDP (17), length 52)
      
      (mtu 1400: we strip DF and send a larger fragment), or
      
      IP (ttl 63, id 12345, offset 0, flags [DF], proto UDP (17), length 1428)
          192.168.7.1.42 > 10.23.42.2.42: [udp sum ok] UDP, length 1400
      
      if the mtu is 1500. In this case things break: a router with a smaller mtu
      will send an icmp error, but the original sender only sent packets <= 1204 bytes.
      
      With the patch, we preserve the intent of such fragments and emit DF
      fragments that won't exceed 1204 bytes in size.
      
      Joint work with Hannes Frederic Sowa.
      
      Changes since v2:
       - split unrelated patches from series
       - rework changelog of patch #2 to better illustrate breakage
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
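      The two changes can be checked against the traces in the cover letter with a small Python model of the forwarding router. It assumes a 20-byte IPv4 header with no options, a 1408-byte IP payload (8-byte UDP header plus 1400 bytes of data), and that `frag_max_size` is the total length of the largest received fragment (1204 here); `forward_datagram` and its parameters are hypothetical.

```python
IP_HDR = 20  # assumes a 20-byte IPv4 header with no options


def forward_datagram(payload_len, mtu, df, frag_max_size, patched):
    """Model how a router emits a reassembled IPv4 datagram after forwarding.

    payload_len:   bytes after the IP header (UDP header + data)
    frag_max_size: total length of the largest fragment seen during reassembly
    patched:       apply changes a) and b) described above
    Returns a list of (fragment_offset, total_length, more_fragments, df).
    """
    total = IP_HDR + payload_len
    force_refrag = patched and df  # a) DF reassembled datagrams are always refragmented
    if total <= mtu and not force_refrag:
        return [(0, total, False, df)]
    limit = min(mtu, frag_max_size) if patched else mtu
    chunk = (limit - IP_HDR) // 8 * 8  # fragment offsets are in 8-byte units
    frag_df = df and patched  # b) DF kept on every fragment only with the patch
    frags, offset = [], 0
    while offset < payload_len:
        size = min(chunk, payload_len - offset)
        frags.append((offset, IP_HDR + size, offset + size < payload_len, frag_df))
        offset += size
    return frags
```

      Unpatched, the model gives fragments of 1396 and 52 bytes without DF (offsets 0 and 1376) at mtu 1400, and a single 1428-byte DF packet at mtu 1500, as in the traces. Patched, at mtu 1500 it gives DF fragments of 1204 and 244 bytes at offsets 0 and 1184.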
    • ip_fragment: don't forward defragmented DF packet · d6b915e2
      Florian Westphal committed
      We currently always send fragments without the DF bit set.
      
      Thus, given the following setup:
      
      mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
         A           R1              R2         B
      
      where R1 and R2 run Linux with netfilter defragmentation/conntrack
      enabled, if Host A sends a fragmented packet _with_ DF set to B, R1
      will respond with an icmp too big error if one of these fragments
      exceeds 1400 bytes.
      
      However, if R1 receives fragments of sizes 1200 and 100, it will
      forward the reassembled packet without refragmenting, i.e.
      R2 will send an icmp error in response to a packet that was never sent,
      citing an mtu that the original sender never exceeded.
      
      Another minor issue is that refragmentation on R1 will conceal the
      MTU of R2-B, since refragmentation does not set the DF bit on the fragments.
      
      This modifies ip_fragment so that we track the largest fragment size seen
      for both DF and non-DF packets, and set frag_max_size to the largest
      value.
      
      If the DF fragment size is larger than or equal to the non-DF one, we
      consider the packet a path mtu probe:
      we set the DF bit on the reassembled skb and also tag it with a new IPCB
      flag to force refragmentation even if the skb fits the outdev mtu.
      
      We will also set the DF bit on each fragment in this case.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
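      The bookkeeping described above can be sketched as a pure function over the received fragments. This is an illustration, not the kernel's ip_fragment.c; `reassembly_result` and its (total_length, df) input format are assumptions.

```python
def reassembly_result(fragments):
    """Summarise a completed reassembly queue.

    fragments: iterable of (total_length, df_flag) for each received fragment.
    Returns (frag_max_size, pmtu_probe): the largest fragment seen, and whether
    the reassembled datagram should keep DF and be force-refragmented.
    """
    max_df = max_nodf = 0
    for length, df in fragments:
        if df:
            max_df = max(max_df, length)
        else:
            max_nodf = max(max_nodf, length)
    if max_df == 0 and max_nodf == 0:
        raise ValueError("no fragments")
    # Path MTU probe: the largest DF fragment is at least as large as the
    # largest non-DF one.
    return max(max_df, max_nodf), max_df >= max_nodf
```

      For the R1 example (DF fragments of 1200 and 100 bytes) this returns `(1200, True)`: in this model the reassembled packet keeps DF and is refragmented to at most 1200 bytes, which fits the 1280-byte R2-B link.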
    • net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper · c5501eb3
      Florian Westphal committed
      ip_skb_dst_mtu is a small inline helper, but it's called in several
      places; calling it once and reusing the result shrinks ip_output.o:

               text    data     bss     dec     hex filename
      before: 17061      44       0   17105    42d1 net/ipv4/ip_output.o
      after:  16805      44       0   16849    41d1 net/ipv4/ip_output.o
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
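      The before/after numbers are in the Berkeley output format of size(1) (text, data, bss, dec, hex, filename), so they can be sanity-checked: dec is text + data + bss, and hex is the same total in base 16. A small Python helper (hypothetical name) that parses and checks one such line:

```python
def parse_size_line(line):
    """Parse one Berkeley-format `size` line: text data bss dec hex filename."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    text, data, bss, dec = (int(f) for f in fields[:4])
    hexval, filename = int(fields[4], 16), fields[5]
    # dec must equal text + data + bss, and hex must be the same total in base 16.
    if dec != text + data + bss or hexval != dec:
        raise ValueError("inconsistent size line: %r" % line)
    return {"text": text, "data": data, "bss": bss, "total": dec, "file": filename}
```

      Applied to the two lines above, it shows the text section shrinking by 256 bytes (17061 to 16805) while data and bss stay the same, so the total also drops by 256 bytes.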
  2. 27 May 2015: 7 commits
  3. 26 May 2015: 22 commits