1. 28 12月, 2017 1 次提交
  2. 22 12月, 2017 1 次提交
  3. 19 12月, 2017 1 次提交
    • Y
      bpf/cgroup: fix a verification error for a CGROUP_DEVICE type prog · 06ef0ccb
      Yonghong Song 提交于
      The tools/testing/selftests/bpf test program
      test_dev_cgroup fails with the following error
      when compiled with llvm 6.0. (I did not try
      with earlier versions.)
      
        libbpf: load bpf program failed: Permission denied
        libbpf: -- BEGIN DUMP LOG ---
        libbpf:
        0: (61) r2 = *(u32 *)(r1 +4)
        1: (b7) r0 = 0
        2: (55) if r2 != 0x1 goto pc+8
         R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv1 R10=fp0
        3: (69) r2 = *(u16 *)(r1 +0)
        invalid bpf_context access off=0 size=2
        ...
      
      The culprit is the following statement in dev_cgroup.c:
        short type = ctx->access_type & 0xFFFF;
      This code is typical as the ctx->access_type is assigned
      as below in kernel/bpf/cgroup.c:
        struct bpf_cgroup_dev_ctx ctx = {
              .access_type = (access << 16) | dev_type,
              .major = major,
              .minor = minor,
        };
      
      The compiler converts it to u16 access while
      the verifier cgroup_dev_is_valid_access rejects
      any non u32 access.
      
      This patch permits the field access_type to be accessible
      with type u16 and u8 as well.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Tested-by: NRoman Gushchin <guro@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      06ef0ccb
  4. 18 12月, 2017 1 次提交
    • A
      bpf: introduce function calls (function boundaries) · cc8b0b92
      Alexei Starovoitov 提交于
      Allow arbitrary function calls from bpf function to another bpf function.
      
      Since the beginning of bpf all bpf programs were represented as a single function
      and program authors were forced to use always_inline for all functions
      in their C code. That was causing llvm to unnecessary inflate the code size
      and forcing developers to move code to header files with little code reuse.
      
      With a bit of additional complexity teach verifier to recognize
      arbitrary function calls from one bpf function to another as long as
      all of functions are presented to the verifier as a single bpf program.
      New program layout:
      r6 = r1    // some code
      ..
      r1 = ..    // arg1
      r2 = ..    // arg2
      call pc+1  // function call pc-relative
      exit
      .. = r1    // access arg1
      .. = r2    // access arg2
      ..
      call pc+20 // second level of function call
      ...
      
      It allows for better optimized code and finally allows to introduce
      the core bpf libraries that can be reused in different projects,
      since programs are no longer limited by single elf file.
      With function calls bpf can be compiled into multiple .o files.
      
      This patch is the first step. It detects programs that contain
      multiple functions and checks that calls between them are valid.
      It splits the sequence of bpf instructions (one program) into a set
      of bpf functions that call each other. Calls to only known
      functions are allowed. In the future the verifier may allow
      calls to unresolved functions and will do dynamic linking.
      This logic supports statically linked bpf functions only.
      
      Such function boundary detection could have been done as part of
      control flow graph building in check_cfg(), but it's cleaner to
      separate function boundary detection vs control flow checks within
      a subprogram (function) into logically indepedent steps.
      Follow up patches may split check_cfg() further, but not check_subprogs().
      
      Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs.
      These restrictions can be relaxed in the future.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      cc8b0b92
  5. 16 12月, 2017 5 次提交
  6. 13 12月, 2017 2 次提交
    • J
      bpf: add a bpf_override_function helper · 9802d865
      Josef Bacik 提交于
      Error injection is sloppy and very ad-hoc.  BPF could fill this niche
      perfectly with it's kprobe functionality.  We could make sure errors are
      only triggered in specific call chains that we care about with very
      specific situations.  Accomplish this with the bpf_override_funciton
      helper.  This will modify the probe'd callers return value to the
      specified value and set the PC to an override function that simply
      returns, bypassing the originally probed function.  This gives us a nice
      clean way to implement systematic error injection for all of our code
      paths.
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      9802d865
    • Y
      bpf/tracing: allow user space to query prog array on the same tp · f371b304
      Yonghong Song 提交于
      Commit e87c6bc3 ("bpf: permit multiple bpf attachments
      for a single perf event") added support to attach multiple
      bpf programs to a single perf event.
      Although this provides flexibility, users may want to know
      what other bpf programs attached to the same tp interface.
      Besides getting visibility for the underlying bpf system,
      such information may also help consolidate multiple bpf programs,
      understand potential performance issues due to a large array,
      and debug (e.g., one bpf program which overwrites return code
      may impact subsequent program results).
      
      Commit 2541517c ("tracing, perf: Implement BPF programs
      attached to kprobes") utilized the existing perf ioctl
      interface and added the command PERF_EVENT_IOC_SET_BPF
      to attach a bpf program to a tracepoint. This patch adds a new
      ioctl command, given a perf event fd, to query the bpf program
      array attached to the same perf tracepoint event.
      
      The new uapi ioctl command:
        PERF_EVENT_IOC_QUERY_BPF
      
      The new uapi/linux/perf_event.h structure:
        struct perf_event_query_bpf {
             __u32	ids_len;
             __u32	prog_cnt;
             __u32	ids[0];
        };
      
      User space provides buffer "ids" for kernel to copy to.
      When returning from the kernel, the number of available
      programs in the array is set in "prog_cnt".
      
      The usage:
        struct perf_event_query_bpf *query =
          malloc(sizeof(*query) + sizeof(u32) * ids_len);
        query.ids_len = ids_len;
        err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);
        if (err == 0) {
          /* query.prog_cnt is the number of available progs,
           * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt
           */
        } else if (errno == ENOSPC) {
          /* query.ids_len number of progs copied,
           * query.prog_cnt is the number of available progs
           */
        } else {
            /* other errors */
        }
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      f371b304
  7. 12 12月, 2017 2 次提交
  8. 06 12月, 2017 2 次提交
  9. 05 12月, 2017 2 次提交
    • H
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner 提交于
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
    • L
      bpf: Add access to snd_cwnd and others in sock_ops · f19397a5
      Lawrence Brakmo 提交于
      Adds read access to snd_cwnd and srtt_us fields of tcp_sock. Since these
      fields are only valid if the socket associated with the sock_ops program
      call is a full socket, the field is_fullsock is also added to the
      bpf_sock_ops struct. If the socket is not a full socket, reading these
      fields returns 0.
      
      Note that in most cases it will not be necessary to check is_fullsock to
      know if there is a full socket. The context of the call, as specified by
      the 'op' field, can sometimes determine whether there is a full socket.
      
      The struct bpf_sock_ops has the following fields added:
      
        __u32 is_fullsock;      /* Some TCP fields are only valid if
                                 * there is a full socket. If not, the
                                 * fields read as zero.
      			   */
        __u32 snd_cwnd;
        __u32 srtt_us;          /* Averaged RTT << 3 in usecs */
      
      There is a new macro, SOCK_OPS_GET_TCP32(NAME), to make it easier to add
      read access to more 32 bit tcp_sock fields.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f19397a5
  10. 02 12月, 2017 1 次提交
  11. 28 11月, 2017 2 次提交
    • M
      USB: core: Add type-specific length check of BOS descriptors · 81cf4a45
      Masakazu Mokuno 提交于
      As most of BOS descriptors are longer in length than their header
      'struct usb_dev_cap_header', comparing solely with it is not sufficient
      to avoid out-of-bounds access to BOS descriptors.
      
      This patch adds descriptor type specific length check in
      usb_get_bos_descriptor() to fix the issue.
      Signed-off-by: NMasakazu Mokuno <masakazu.mokuno@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81cf4a45
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds 提交于
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
  12. 26 11月, 2017 2 次提交
    • D
      uapi: fix linux/kfd_ioctl.h userspace compilation errors · b4d08520
      Dmitry V. Levin 提交于
      Consistently use types provided by <linux/types.h> via <drm/drm.h>
      to fix the following linux/kfd_ioctl.h userspace compilation errors:
      
      /usr/include/linux/kfd_ioctl.h:236:2: error: unknown type name 'uint64_t'
        uint64_t va_addr; /* to KFD */
      /usr/include/linux/kfd_ioctl.h:237:2: error: unknown type name 'uint32_t'
        uint32_t gpu_id; /* to KFD */
      /usr/include/linux/kfd_ioctl.h:238:2: error: unknown type name 'uint32_t'
        uint32_t pad;
      /usr/include/linux/kfd_ioctl.h:243:2: error: unknown type name 'uint64_t'
        uint64_t tile_config_ptr;
      /usr/include/linux/kfd_ioctl.h:245:2: error: unknown type name 'uint64_t'
        uint64_t macro_tile_config_ptr;
      /usr/include/linux/kfd_ioctl.h:249:2: error: unknown type name 'uint32_t'
        uint32_t num_tile_configs;
      /usr/include/linux/kfd_ioctl.h:253:2: error: unknown type name 'uint32_t'
        uint32_t num_macro_tile_configs;
      /usr/include/linux/kfd_ioctl.h:255:2: error: unknown type name 'uint32_t'
        uint32_t gpu_id;  /* to KFD */
      /usr/include/linux/kfd_ioctl.h:256:2: error: unknown type name 'uint32_t'
        uint32_t gb_addr_config; /* from KFD */
      /usr/include/linux/kfd_ioctl.h:257:2: error: unknown type name 'uint32_t'
        uint32_t num_banks;  /* from KFD */
      /usr/include/linux/kfd_ioctl.h:258:2: error: unknown type name 'uint32_t'
        uint32_t num_ranks;  /* from KFD */
      
      Fixes: 6a1c9510 ("drm/amdkfd: Adding new IOCTL for scratch memory v2")
      Fixes: 5d71dbc3 ("drm/amdkfd: Implement image tiling mode support v2")
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
      b4d08520
    • S
      uapi: add SPDX identifier to vm_sockets_diag.h · 7bbefcfa
      Stephen Hemminger 提交于
      New file seems to have missed the SPDX license scan and update.
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bbefcfa
  13. 25 11月, 2017 1 次提交
  14. 24 11月, 2017 1 次提交
    • D
      rxrpc: Fix call timeouts · a158bdd3
      David Howells 提交于
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at a duration that's substantially less than the normal
           timeout so is to keep both sides alive.  This is set at 1/6 of normal
           timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specified the maximum lifetime of that
           call.  It should not be specified by default.  Some operations (such
           as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from packet Rx */
      	u32 normal_timeout;	/* msec from data Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a158bdd3
  15. 21 11月, 2017 2 次提交
  16. 18 11月, 2017 1 次提交
  17. 16 11月, 2017 1 次提交
    • A
      ipv6: sr: update the struct ipv6_sr_hdr · 6ab6a0dd
      Ahmed Abdelsalam 提交于
      The IPv6 Segment Routing Header (SRH) format has been updated (revision 6
      of the SRH ietf draft). The update includes the following SRH fields:
      
      (1) The "First Segment" field changed to be "Last Entry" which contains
      the index, in the Segment List, of the last element of the Segment List.
      
      (2) The 16 bit "reserved" field now is used as a "tag" which tags a packet
      as part of a class or group of packets, e.g.,packets sharing the same
      set of properties.
      
      This patch updates the struct ipv6_sr_hdr, so it complies with the updated
      SRH draft. The 16 bit "reserved" field is changed to be "tag", In addition
      a comment is added to the "first_segment" field, showing that it represents
      the "Last Entry" field of the SRH.
      Signed-off-by: NAhmed Abdelsalam <amsalam20@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ab6a0dd
  18. 15 11月, 2017 2 次提交
    • D
      uapi: fix linux/tls.h userspace compilation error · b9f3eb49
      Dmitry V. Levin 提交于
      Move inclusion of a private kernel header <net/tcp.h>
      from uapi/linux/tls.h to its only user - net/tls.h,
      to fix the following linux/tls.h userspace compilation error:
      
      /usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
      
      As to this point uapi/linux/tls.h was totaly unusuable for userspace,
      cleanup this header file further by moving other redundant includes
      to net/tls.h.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Cc: <stable@vger.kernel.org> # v4.13+
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9f3eb49
    • D
      uapi: fix linux/rxrpc.h userspace compilation errors · 0eef304b
      Dmitry V. Levin 提交于
      Consistently use types provided by <linux/types.h> to fix the following
      linux/rxrpc.h userspace compilation errors:
      
      /usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16'
        u16  srx_service; /* service desired */
      /usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16'
        u16  transport_type; /* type of transport socket (SOCK_DGRAM) */
      /usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16'
        u16  transport_len; /* length of transport address */
      
      Use __kernel_sa_family_t instead of sa_family_t the same way
      as uapi/linux/in.h does, to fix the following
      linux/rxrpc.h userspace compilation errors:
      
      /usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t'
        sa_family_t srx_family; /* address family */
      /usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t'
        sa_family_t family;  /* transport address family */
      
      Fixes: 727f8914 ("rxrpc: Expose UAPI definitions to userspace")
      Cc: <stable@vger.kernel.org> # v4.14
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eef304b
  19. 14 11月, 2017 2 次提交
  20. 13 11月, 2017 5 次提交
    • D
      afs: Lay the groundwork for supporting network namespaces · f044c884
      David Howells 提交于
      Lay the groundwork for supporting network namespaces (netns) to the AFS
      filesystem by moving various global features to a network-namespace struct
      (afs_net) and providing an instance of this as a temporary global variable
      that everything uses via accessor functions for the moment.
      
      The following changes have been made:
      
       (1) Store the netns in the superblock info.  This will be obtained from
           the mounter's nsproxy on a manual mount and inherited from the parent
           superblock on an automount.
      
       (2) The cell list is made per-netns.  It can be viewed through
           /proc/net/afs/cells and also be modified by writing commands to that
           file.
      
       (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
           This is unset by default.
      
       (4) The 'rootcell' module parameter, which sets a cell and VL server list
           modifies the init net namespace, thereby allowing an AFS root fs to be
           theoretically used.
      
       (5) The volume location lists and the file lock manager are made
           per-netns.
      
       (6) The AF_RXRPC socket and associated I/O bits are made per-ns.
      
      The various workqueues remain global for the moment.
      
      Changes still to be made:
      
       (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
           from the old name.
      
       (2) A per-netns subsys needs to be registered for AFS into which it can
           store its per-netns data.
      
       (3) Rather than the AF_RXRPC socket being opened on module init, it needs
           to be opened on the creation of a superblock in that netns.
      
       (4) The socket needs to be closed when the last superblock using it is
           destroyed and all outstanding client calls on it have been completed.
           This prevents a reference loop on the namespace.
      
       (5) It is possible that several namespaces will want to use AFS, in which
           case each one will need its own UDP port.  These can either be set
           through /proc/net/afs/cm_port or the kernel can pick one at random.
           The init_ns gets 7001 by default.
      
      Other issues that need resolving:
      
       (1) The DNS keyring needs net-namespacing.
      
       (2) Where do upcalls go (eg. DNS request-key upcall)?
      
       (3) Need something like open_socket_in_file_ns() syscall so that AFS
           command line tools attempting to operate on an AFS file/volume have
           their RPC calls go to the right place.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f044c884
    • A
      openvswitch: Add meter action support · cd8a6c33
      Andy Zhou 提交于
      Implements OVS kernel meter action support.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd8a6c33
    • A
      openvswitch: Add meter netlink definitions · 57940406
      Andy Zhou 提交于
      Meter has its own netlink family. Define netlink messages and attributes
      for communicating with the user space programs.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57940406
    • D
      netem: support delivering packets in delayed time slots · 836af83b
      Dave Taht 提交于
      Slotting is a crude approximation of the behaviors of shared media such
      as cable, wifi, and LTE, which gather up a bunch of packets within a
      varying delay window and deliver them, relative to that, nearly all at
      once.
      
      It works within the existing loss, duplication, jitter and delay
      parameters of netem. Some amount of inherent latency must be specified,
      regardless.
      
      The new "slot" parameter specifies a minimum and maximum delay between
      transmission attempts.
      
      The "bytes" and "packets" parameters can be used to limit the amount of
      information transferred per slot.
      
      Examples of use:
      
      tc qdisc add dev eth0 root netem delay 200us \
               slot 800us 10ms bytes 64k packets 42
      
      A more correct example, using stacked netem instances and a packet limit
      to emulate a tail drop wifi queue with slots and variable packet
      delivery, with a 200Mbit isochronous underlying rate, and 20ms path
      delay:
      
      tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
               limit 10000
      tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
               slot 800us 10ms bytes 64k packets 42 limit 512
      Signed-off-by: NDave Taht <dave.taht@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      836af83b
    • D
      netem: add uapi to express delay and jitter in nanoseconds · 99803171
      Dave Taht 提交于
      netem userspace has long relied on a horrible /proc/net/psched hack
      to translate the current notion of "ticks" to nanoseconds.
      
      Expressing latency and jitter instead, in well defined nanoseconds,
      increases the dynamic range of emulated delays and jitter in netem.
      
      It will also ease a transition where reducing a tick to nsec
      equivalence would constrain the max delay in prior versions of
      netem to only 4.3 seconds.
      Signed-off-by: NDave Taht <dave.taht@gmail.com>
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99803171
  21. 11 11月, 2017 3 次提交
    • Y
      tcp: retire FACK loss detection · 713bafea
      Yuchung Cheng 提交于
      FACK loss detection has been disabled by default and the
      successor RACK subsumed FACK and can handle reordering better.
      This patch removes FACK to simplify TCP loss recovery.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Reviewed-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      713bafea
    • D
      bpf: Revert bpf_overrid_function() helper changes. · f3edacbd
      David S. Miller 提交于
      NACK'd by x86 maintainer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3edacbd
    • M
      net: ipv6: sysctl to specify IPv6 ND traffic class · 2210d6b2
      Maciej Żenczykowski 提交于
      Add a per-device sysctl to specify the default traffic class to use for
      kernel originated IPv6 Neighbour Discovery packets.
      
      Currently this includes:
      
        - Router Solicitation (ICMPv6 type 133)
          ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Solicitation (ICMPv6 type 135)
          ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Advertisement (ICMPv6 type 136)
          ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Redirect (ICMPv6 type 137)
          ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
      
      and if the kernel ever gets around to generating RA's,
      it would presumably also include:
      
        - Router Advertisement (ICMPv6 type 134)
          (radvd daemon could pick up on the kernel setting and use it)
      
      Interface drivers may examine the Traffic Class value and translate
      the DiffServ Code Point into a link-layer appropriate traffic
      prioritization scheme.  An example of mapping IETF DSCP values to
      IEEE 802.11 User Priority values can be found here:
      
          https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
      
      The expected primary use case is to properly prioritize ND over wifi.
      
      Testing:
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        0
        jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        255
        jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        34
      
        jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
        tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
        IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
        jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
        length 24, tgt is jzem22.pgc, Flags [solicited]
      
      (based on original change written by Erik Kline, with minor changes)
      
      v2: fix 'suspicious rcu_dereference_check() usage'
          by explicitly grabbing the rcu_read_lock.
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NErik Kline <ek@google.com>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2210d6b2