1. 06 9月, 2014 4 次提交
    • G
      ethtool: Add generic options for tunables · f0db9b07
      Govindarajulu Varadarajan 提交于
      This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting
      tunable values from driver.
      
      Add get_tunable and set_tunable to ethtool_ops. Driver implements these
      functions for getting/setting tunable value.
      Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0db9b07
    • G
      enic: implement rx_copybreak · a03bb56e
      Govindarajulu Varadarajan 提交于
      Calling dma_map_single()/dma_unmap_single() is quite expensive compared
      to copying a small packet. So let's copy short frames and keep the buffers
      mapped.
      Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a03bb56e
    • D
      dev_ioctl: remove dev_load() CAP_SYS_MODULE message · e020836d
      Daniel Borkmann 提交于
      Marcel reported to see the following message when autoloading
      is being triggered when adding nlmon device:
      
        Loading kernel module for a network device with
        CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias
        netdev-nlmon instead.
      
      This false-positive happens despite with having correct
      capabilities set, e.g. through issuing `ip link del dev nlmon`
      more than once on a valid device with name nlmon, but Marcel
      has also seen it on creation time when no nlmon module is
      previously compiled-in or loaded as module and the device
      name equals a link type name (e.g. nlmon, vxlan, team).
      
      Stephen says:
      
        The netdev module alias is a hold over from the past. For
        normal devices, people used to create a alias eth0 to and
        point it to the type of network device used, that was back
        in the bad old ISA days before real discovery.
      
        Also, the tunnels create module alias for the control device
        and ip used to use this to autoload the tunnel device.
      
        The message is bogus and should just be removed, I also see
        it in a couple of other cases where tap devices are renamed
        for other usese.
      
      As mentioned in 8909c9ad ("net: don't allow CAP_NET_ADMIN
      to load non-netdev kernel modules"), we nevertheless still
      might want to leave the old autoloading behaviour in place
      as it could break old scripts, so for now, lets just remove
      the log message as Stephen suggests.
      
      Reference: http://thread.gmane.org/gmane.linux.kernel/1105168Reported-by: NMarcel Holtmann <marcel@holtmann.org>
      Suggested-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e020836d
    • D
      net: bpf: make eBPF interpreter images read-only · 60a3b225
      Daniel Borkmann 提交于
      With eBPF getting more extended and exposure to user space is on it's way,
      hardening the memory range the interpreter uses to steer its command flow
      seems appropriate.  This patch moves the to be interpreted bytecode to
      read-only pages.
      
      In case we execute a corrupted BPF interpreter image for some reason e.g.
      caused by an attacker which got past a verifier stage, it would not only
      provide arbitrary read/write memory access but arbitrary function calls
      as well. After setting up the BPF interpreter image, its contents do not
      change until destruction time, thus we can setup the image on immutable
      made pages in order to mitigate modifications to that code. The idea
      is derived from commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit
      against spraying attacks").
      
      This is possible because bpf_prog is not part of sk_filter anymore.
      After setup bpf_prog cannot be altered during its life-time. This prevents
      any modifications to the entire bpf_prog structure (incl. function/JIT
      image pointer).
      
      Every eBPF program (including classic BPF that are migrated) have to call
      bpf_prog_select_runtime() to select either interpreter or a JIT image
      as a last setup step, and they all are being freed via bpf_prog_free(),
      including non-JIT. Therefore, we can easily integrate this into the
      eBPF life-time, plus since we directly allocate a bpf_prog, we have no
      performance penalty.
      
      Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
      inspection of kernel_page_tables.  Brad Spengler proposed the same idea
      via Twitter during development of this patch.
      
      Joint work with Hannes Frederic Sowa.
      Suggested-by: NBrad Spengler <spender@grsecurity.net>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Kees Cook <keescook@chromium.org>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60a3b225
  2. 05 9月, 2014 3 次提交
  3. 04 9月, 2014 4 次提交
  4. 03 9月, 2014 21 次提交
  5. 02 9月, 2014 8 次提交
    • W
      sock: deduplicate errqueue dequeue · 364a9e93
      Willem de Bruijn 提交于
      sk->sk_error_queue is dequeued in four locations. All share the
      exact same logic. Deduplicate.
      
      Also collapse the two critical sections for dequeue (at the top of
      the recv handler) and signal (at the bottom).
      
      This moves signal generation for the next packet forward, which should
      be harmless.
      
      It also changes the behavior if the recv handler exits early with an
      error. Previously, a signal for follow-up packets on the errqueue
      would then not be scheduled. The new behavior, to always signal, is
      arguably a bug fix.
      
      For rxrpc, the change causes the same function to be called repeatedly
      for each queued packet (because the recv handler == sk_error_report).
      It is likely that all packets will fail for the same reason (e.g.,
      memory exhaustion).
      
      This code runs without sk_lock held, so it is not safe to trust that
      sk->sk_err is immutable inbetween releasing q->lock and the subsequent
      test. Introduce int err just to avoid this potential race.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      364a9e93
    • W
      net-timestamp: expand documentation · 8fe2f761
      Willem de Bruijn 提交于
      Expand Documentation/networking/timestamping.txt with new
      interfaces and bytestream timestamping. Also minor
      cleanup of the other text.
      
      Import txtimestamp.c test of the new features.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8fe2f761
    • D
      Merge branch 'csums-next' · c5a65680
      David S. Miller 提交于
      Tom Herbert says:
      
      ====================
      net: Checksum offload changes - Part VI
      
      I am working on overhauling RX checksum offload. Goals of this effort
      are:
      
      - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
      - Preserve CHECKSUM_COMPLETE through encapsulation layers
      - Don't do skb_checksum more than once per packet
      - Unify GRO and non-GRO csum verification as much as possible
      - Unify the checksum functions (checksum_init)
      - Simplify code
      
      What is in this seventh patch set:
      
      - Add skb->csum. This allows a device or GRO to indicate that an
        invalid checksum was detected.
      - Checksum unncessary to checksum complete conversions.
      
      With these changes, I believe that the third goal of the overhaul is
      now mostly achieved. In the case of no encapsulation or one layer of
      encapsulation, there should only be at most one skb_checksum over
      each packet (between GRO and normal path). In the case of two layers
      of encapsulation, it is still possible with the right combination of
      non-zero and zero UDP checksums to have >1 skb_checksum. For instance:
      IP>GRE(with csum)>IP>UDP(zero csum)>VXLAN>IP>UDP(non-zero csum),
      would likely necessiate an skb_checksum in GRO and normal path.
      This doesn't seem like a common scenario at all so I'm inclined to
      not address this now, if multiple layers of encapsulation becomes
      popular we can reassess.
      
      Note that checksum conversion shows a nice improvement for RX VXLAN when
      outer UDP checksum is enabled (12.65% CPU compared to 20.94%). This
      is not only from the fact that we don't need checksum calculation on
      the host, but also allows GRO for VXLAN in this case. Checksum
      conversion does not help send side (which still needs to perform
      a checksum on host). For that we will implement remote checksum offload
      in a later patch
      (http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00).
      
      Please review carefully and test if possible, mucking with basic
      checksum functions is always a little precarious :-)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5a65680
    • T
    • T
    • T
      gre: Add support for checksum unnecessary conversions · 884d338c
      Tom Herbert 提交于
      Call skb_checksum_try_convert and skb_gro_checksum_try_convert
      after checksum is found present and validated in the GRE header
      for normal and GRO paths respectively.
      
      In GRO path, call skb_gro_checksum_try_convert
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      884d338c
    • T
      udp: Add support for doing checksum unnecessary conversion · 2abb7cdc
      Tom Herbert 提交于
      Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE
      conversion in UDP tunneling path.
      
      In the normal UDP path, we call skb_checksum_try_convert after locating
      the UDP socket. The check is that checksum conversion is enabled for
      the socket (new flag in UDP socket) and that checksum field is
      non-zero.
      
      In the UDP GRO path, we call skb_gro_checksum_try_convert after
      checksum is validated and checksum field is non-zero. Since this is
      already in GRO we assume that checksum conversion is always wanted.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2abb7cdc
    • T
      net: Infrastructure for checksum unnecessary conversions · d96535a1
      Tom Herbert 提交于
      For normal path, added skb_checksum_try_convert which is called
      to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The
      primary condition to allow this is that ip_summed is CHECKSUM_NONE
      and csum_valid is true, which will be the state after consuming
      a CHECKSUM_UNNECESSARY.
      
      For GRO path, added skb_gro_checksum_try_convert which is the GRO
      analogue of skb_checksum_try_convert. The primary condition to allow
      this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and
      NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed
      all available CHECKSUM_UNNECESSARY checksums in the GRO path.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d96535a1