1. 22 Oct, 2017 7 commits
  2. 21 Oct, 2017 7 commits
  3. 20 Oct, 2017 6 commits
  4. 19 Oct, 2017 1 commit
  5. 18 Oct, 2017 17 commits
    • bpf: move knowledge about post-translation offsets out of verifier · 4f9218aa
      Committed by Jakub Kicinski
      Use the fact that verifier ops are now separate from program
      ops to define a separate set of callbacks for verification of
      already translated programs.
      
      Since we expect the analyzer ops to be defined only for
      a small subset of all program types, initialize their array
      by hand (don't use linux/bpf_types.h).
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: remove the verifier ops from program structure · 00176a34
      Committed by Jakub Kicinski
      Since the verifier ops don't have to be associated with
      the program for its entire lifetime, we can move them to
      the verifier's struct bpf_verifier_env.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: split verifier and program ops · 7de16e3a
      Committed by Jakub Kicinski
      struct bpf_verifier_ops contains both verifier ops and operations
      used later during program's lifetime (test_run).  Split the runtime
      ops into a different structure.
      
      BPF_PROG_TYPE() will now append ## _prog_ops or ## _verifier_ops
      to the names.
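
The name-appending described above can be sketched in plain userspace C. This is only an illustration of token pasting with `##`; the structs, the `PROG_TYPES` list, and the table names are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-ins for the two kernel op tables. */
struct prog_ops     { int (*test_run)(int input); };
struct verifier_ops { int (*is_valid_access)(int off); };

enum prog_type { TYPE_SOCKET_FILTER, TYPE_XDP, TYPE_MAX };

static int sk_filter_test_run(int input) { return input; }
static int sk_filter_is_valid_access(int off) { return off >= 0; }
static int xdp_test_run(int input) { return input + 1; }
static int xdp_is_valid_access(int off) { return off >= 0 && off < 64; }

static const struct prog_ops sk_filter_prog_ops = { sk_filter_test_run };
static const struct verifier_ops sk_filter_verifier_ops = { sk_filter_is_valid_access };
static const struct prog_ops xdp_prog_ops = { xdp_test_run };
static const struct verifier_ops xdp_verifier_ops = { xdp_is_valid_access };

/* One list of program types, expanded twice below (an X-macro). */
#define PROG_TYPES(X) \
	X(TYPE_SOCKET_FILTER, sk_filter) \
	X(TYPE_XDP, xdp)

/* Runtime table: the macro appends _prog_ops to each name. */
#define BPF_PROG_TYPE(_id, _name) [_id] = &_name ## _prog_ops,
static const struct prog_ops *const prog_ops_table[TYPE_MAX] = {
	PROG_TYPES(BPF_PROG_TYPE)
};
#undef BPF_PROG_TYPE

/* Verifier table: same list, but _verifier_ops is appended. */
#define BPF_PROG_TYPE(_id, _name) [_id] = &_name ## _verifier_ops,
static const struct verifier_ops *const verifier_ops_table[TYPE_MAX] = {
	PROG_TYPES(BPF_PROG_TYPE)
};
#undef BPF_PROG_TYPE
```

Redefining the macro before each expansion lets a single list of types populate both tables, so the runtime and verifier sides cannot drift apart.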
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Check daddr_cache before use in tracepoint · 386fd5da
      Committed by David Ahern
      Running perf in one window to capture tcp_retransmit_skb tracepoint:
          $ perf record -e tcp:tcp_retransmit_skb -a
      
      And causing a retransmission on an active TCP session (e.g., dropping
      packets in the receiver, changing MTU on the interface to 500 and back
      to 1500) triggers a panic:
      
      [   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
      [   58.546770] PGD 0 P4D 0
      [   58.547472] Oops: 0000 [#1] SMP
      [   58.548328] Modules linked in: vrf
      [   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
      [   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
      [   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
      [   58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
      [   58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
      [   58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
      [   58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
      [   58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
      [   58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
      [   58.563845] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [   58.564873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
      [   58.566538] Call Trace:
      [   58.566865]  <IRQ>
      [   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
      [   58.567704]  ? tcp_set_ca_state+0x22/0x3f
      [   58.568231]  tcp_retransmit_skb+0x14/0xa3
      [   58.568754]  tcp_retransmit_timer+0x472/0x5e3
      [   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
      [   58.569946]  tcp_write_timer_handler+0x95/0x1e9
      [   58.570548]  tcp_write_timer+0x2a/0x58
      
      Check that daddr_cache is non-NULL before de-referencing.
      
      Fixes: e086101b ("tcp: add a tracepoint for tcp retransmission")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Use pI6c in tcp tracepoint · fb6ff75e
      Committed by David Ahern
      The compact form for IPv6 addresses is more user friendly than the full
      version. For example:
         compact: 2001:db8:1::1
            full: 2001:0db8:0001:0000:0000:0000:0000:0001
      
      Update the tcp tracepoint to show the compact form.
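
The difference between the two forms can be reproduced in userspace. The sketch below is an analogue only: it uses the POSIX `inet_ntop()` for the compact form and a hand-written formatter for the full form, and the helper names `format_full` and `format_compact` are made up for this example. It does not exercise the kernel's printk format specifiers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Full form: eight zero-padded 16-bit groups, no zero compression. */
static void format_full(const struct in6_addr *a, char *buf, size_t len)
{
	const unsigned char *b = a->s6_addr;

	snprintf(buf, len,
		 "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
		 "%02x%02x:%02x%02x:%02x%02x:%02x%02x",
		 b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
		 b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}

/* Compact form: leading zeros dropped, longest zero run collapsed to "::". */
static const char *format_compact(const struct in6_addr *a, char *buf, size_t len)
{
	return inet_ntop(AF_INET6, a, buf, (socklen_t)len);
}
```

Both helpers expect a buffer of at least `INET6_ADDRSTRLEN` bytes.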
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: Convert timers to use timer_setup() · 78802011
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
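
The pattern behind this conversion can be sketched in userspace, assuming a mock timer type. Here `struct timer_list`, `mock_timer_setup()`, `mock_fire()`, and `struct frag_queue` are simplified stand-ins written for this example; only the idea, that the callback receives the timer pointer and recovers its enclosing object from it, mirrors what the message describes.

```c
#include <stddef.h>

/* Mock of the new-style timer: the callback receives the timer itself. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Hypothetical object that embeds its timer. */
struct frag_queue {
	int expired;
	struct timer_list timer;
};

/* Userspace container_of(): step back from a member to its owner. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock of timer_setup(): only records the callback (flags omitted). */
static void mock_timer_setup(struct timer_list *t,
			     void (*fn)(struct timer_list *))
{
	t->function = fn;
}

/* The callback recovers its frag_queue from the timer pointer. */
static void frag_expire(struct timer_list *t)
{
	struct frag_queue *q = container_of(t, struct frag_queue, timer);

	q->expired = 1;
}

/* Mock of the timer core firing an expired timer. */
static void mock_fire(struct timer_list *t)
{
	t->function(t);
}
```

Because the owner is derived from the timer's address, no separate opaque data argument has to be stored or cast in the callback.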
      
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: linux-wpan@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: netfilter-devel@vger.kernel.org
      Cc: coreteam@netfilter.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Stefan Schmidt <stefan@osg.samsung.com> # for ieee802154
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet/connection_sock: Convert timers to use timer_setup() · 59f379f9
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Cc: dccp@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/decnet: Convert timers to use timer_setup() · eb4ddaf4
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: linux-decnet-user@lists.sourceforge.net
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: add dsa_to_port helper · c8652c83
      Committed by Vivien Didelot
      The dsa_port structure is part of DSA core data and must only be updated
      by the latter. It is OK and sometimes necessary for the DSA drivers to
      access this data, but this has to be read only.
      
      For that purpose, add a dsa_to_port() helper which returns a const
      pointer to a dsa_port structure which must be used by DSA drivers from
      now on instead of digging into ds->ports[] themselves.
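
A read-only accessor of this kind might look like the sketch below. The struct layouts and `MOCK_MAX_PORTS` are simplified assumptions made for this example, and `driver_port_type()` is a hypothetical driver-side caller; only the idea of handing drivers a const pointer comes from the message.

```c
#define MOCK_MAX_PORTS 4	/* assumed array size for this sketch */

/* Simplified stand-ins for the core structures. */
struct dsa_port {
	int index;
	int type;
};

struct dsa_switch {
	struct dsa_port ports[MOCK_MAX_PORTS];
};

/* Read-only accessor: drivers get a const view of the core's port data. */
static inline const struct dsa_port *dsa_to_port(const struct dsa_switch *ds, int p)
{
	return &ds->ports[p];
}

/* Hypothetical driver-side use: reading works, "dp->type = 0" would not compile. */
static int driver_port_type(const struct dsa_switch *ds, int p)
{
	const struct dsa_port *dp = dsa_to_port(ds, p);

	return dp->type;
}
```

The const qualifier turns an accidental write from a driver into a compile-time error rather than a convention to remember.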
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: split dsa_port's netdev member · f8b8b1cd
      Committed by Vivien Didelot
      The dsa_port structure has a "netdev" member, which can be used for
      either the master device, or the slave device, depending on its type.
      
      It is true that today, CPU ports are not exposed to userspace, thus the
      port's netdev member can be used to point to its master interface.
      
      But it is still slightly confusing, so split it into more explicit
      "master" and "slave" members inside an anonymous union.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: cpumap add tracepoints · f9419f7b
      Committed by Jesper Dangaard Brouer
      This adds two tracepoints to the cpumap: one for the enqueue side,
      trace_xdp_cpumap_enqueue(), and one for the kthread dequeue side,
      trace_xdp_cpumap_kthread().
      
      To mitigate the tracepoint overhead, these are invoked during the
      enqueue/dequeue bulking phases, thus amortizing the cost.
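
The amortization idea can be sketched as follows: items are batched, and the trace hook fires once per flushed batch instead of once per item. Everything here is a mock written for this example; the batch size of 8 and the names `bulk_queue`, `bulk_enqueue`, `bulk_flush`, and `mock_trace_enqueue` are assumptions, not the cpumap code.

```c
#define BULK_SIZE 8	/* assumed batch size for this sketch */

struct bulk_queue {
	void *items[BULK_SIZE];
	unsigned int count;
};

static unsigned int trace_calls;	/* how often the mock tracepoint fired */
static unsigned int total_processed;	/* items reported across all events */

/* Mock tracepoint: reports a whole batch in one call. */
static void mock_trace_enqueue(unsigned int processed)
{
	trace_calls++;
	total_processed += processed;
}

/* Flush the batch and emit one trace event covering all of it. */
static void bulk_flush(struct bulk_queue *bq)
{
	if (bq->count == 0)
		return;
	mock_trace_enqueue(bq->count);
	bq->count = 0;
}

/* Queue one item, flushing first when the batch is already full. */
static void bulk_enqueue(struct bulk_queue *bq, void *item)
{
	if (bq->count == BULK_SIZE)
		bulk_flush(bq);
	bq->items[bq->count++] = item;
}
```

With this shape, enqueueing 20 items followed by a final flush produces three trace events rather than twenty.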
      
      The obvious use-cases are for debugging and monitoring.  The
      non-intuitive use-case is using these as a feedback loop to know the
      system load.  One can imagine auto-scaling by reducing, adding or
      activating more worker CPUs on demand.
      
      V4: tracepoint remove time_limit info, instead add sched info
      
      V8: intro struct bpf_cpu_map_entry members cpu+map_id in this patch
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: cpumap xdp_buff to skb conversion and allocation · 1c601d82
      Committed by Jesper Dangaard Brouer
      This patch makes cpumap functional, by adding SKB allocation and
      invoking the network stack on the dequeuing CPU.
      
      For constructing the SKB on the remote CPU, the xdp_buff is converted
      into a struct xdp_pkt, which is mapped into the top headroom of the
      packet to avoid allocating separate memory.  For now, struct xdp_pkt is
      just a cpumap internal data structure, with info carried from
      enqueue to dequeue.

      If a driver doesn't provide enough headroom, the packet is simply dropped
      with return code -EOVERFLOW.  This will be picked up by the xdp tracepoint
      infrastructure, to allow users to catch this.
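
The headroom check can be illustrated with a small userspace sketch. The `mock_xdp_buff` and `mock_xdp_pkt` layouts and the function `convert_to_xdp_pkt()` are assumptions made for this example (for instance, `data_meta` is ignored); it only shows the idea of storing per-packet info at the start of the buffer's headroom and failing with `-EOVERFLOW` when it does not fit.

```c
#include <errno.h>
#include <stddef.h>

/* Mock of the XDP buffer view: data_hard_start <= data <= data_end. */
struct mock_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/* Mock of the per-packet info carried from enqueue to dequeue. */
struct mock_xdp_pkt {
	void *data;
	unsigned short len;
	unsigned short headroom;
};

/*
 * Place the packet info at the start of the headroom. Returns 0 and sets
 * *out on success, or -EOVERFLOW when the headroom is too small.
 * The buffer is assumed to be suitably aligned for struct mock_xdp_pkt.
 */
static int convert_to_xdp_pkt(const struct mock_xdp_buff *xdp,
			      struct mock_xdp_pkt **out)
{
	size_t headroom = (size_t)(xdp->data - xdp->data_hard_start);
	struct mock_xdp_pkt *pkt;

	if (headroom < sizeof(*pkt))
		return -EOVERFLOW;

	pkt = (struct mock_xdp_pkt *)xdp->data_hard_start;
	pkt->data = xdp->data;
	pkt->len = (unsigned short)(xdp->data_end - xdp->data);
	pkt->headroom = (unsigned short)(headroom - sizeof(*pkt));
	*out = pkt;
	return 0;
}
```

Reusing the headroom means no extra allocation per packet, at the cost of rejecting buffers whose driver left too little space in front of the data.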
      
      V2: take into account xdp->data_meta
      
      V4:
       - Drop busypoll tricks, keeping it more simple.
       - Skip RPS and Generic-XDP-recursive-reinjection, suggested by Alexei
      
      V5: correct RCU read protection around __netif_receive_skb_core.
      
      V6: Setting TASK_RUNNING vs TASK_INTERRUPTIBLE based on talk with Rik van Riel
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: XDP_REDIRECT enable use of cpumap · 9c270af3
      Committed by Jesper Dangaard Brouer
      This patch connects cpumap to the xdp_do_redirect_map infrastructure.
      
      No SKB allocation is done yet.  The XDP frames are transferred
      to the other CPU, but their refcnt is simply decremented on the remote
      CPU.  This served as a good benchmark for measuring the overhead of
      remote refcnt decrement.  If the driver's page recycle cache is not
      efficient, then this exposes a bottleneck in the page allocator.

      A shout-out to MST's ptr_ring, which is the secret behind this being so
      efficient at transferring memory pointers between CPUs, without constantly
      bouncing cache-lines between CPUs.
      
      V3: Handle !CONFIG_BPF_SYSCALL pointed out by kbuild test robot.
      
      V4: Make Generic-XDP aware of cpumap type, but don't allow redirect yet,
       as the implementation requires a separate upstream discussion.
      
      V5:
       - Fix a maybe-uninitialized pointed out by kbuild test robot.
       - Restrict bpf-prog side access to cpumap, open when use-cases appear
       - Implement cpu_map_enqueue() as a more simple void pointer enqueue
      
      V6:
       - Allow cpumap type for usage in helper bpf_redirect_map,
         general bpf-prog side restriction moved to earlier patch.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP · 6710e112
      Committed by Jesper Dangaard Brouer
      The 'cpumap' is primarily used as a backend map for XDP BPF helper
      call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
      
      This patch implements the main part of the map.  It is not connected to
      the XDP redirect system yet, and no SKB allocation is done yet.

      The main concern in this patch is to ensure the datapath can run
      without any locking.  This adds complexity to the setup and tear-down
      procedure, whose assumptions are extra carefully documented in the
      code comments.
      
      V2:
       - make sure array isn't larger than NR_CPUS
       - make sure each CPU added is a valid possible CPU
      
      V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
      
      V5:
       - Restrict map allocation to root / CAP_SYS_ADMIN
       - WARN_ON_ONCE if queue is not empty on tear-down
       - Return -EPERM on memlock limit instead of -ENOMEM
       - Error code in __cpu_map_entry_alloc() also handle ptr_ring_cleanup()
       - Moved cpu_map_enqueue() to next patch
      
      V6: all noticed by Daniel Borkmann
       - Fix err return code in cpu_map_alloc() introduced in V5
       - Move cpu_possible() check after max_entries boundary check
       - Forbid usage initially in check_map_func_compatibility()
      
      V7:
       - Fix alloc error path spotted by Daniel Borkmann
       - Did stress test adding+removing CPUs from the map concurrently
       - Fixed refcnt issue on cpu_map_entry, kthread started too soon
       - Make sure packets are flushed during tear-down, involved use of
         rcu_barrier() and kthread_run only exit after queue is empty
       - Fix alloc error path in __cpu_map_entry_alloc() for ptr_ring
      
      V8:
       - Nitpicking comments and grammar by Edward Cree
       - Fix missing semi-colon introduced in V7 due to rebasing
       - Move struct bpf_cpu_map_entry members cpu+map_id to tracepoint patch
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Provide functions for allowing cleaner handling of signals · f4d15fb6
      Committed by David Howells
      Provide a couple of functions to allow cleaner handling of signals in a
      kernel service.  They are:
      
       (1) rxrpc_kernel_get_rtt()
      
           This allows the kernel service to find out the RTT time for a call, so
           as to better judge how large a timeout to employ.
      
           Note, though, that whilst this returns a value in nanoseconds, the
           timeouts can only actually be in jiffies.
      
       (2) rxrpc_kernel_check_life()
      
           This returns a number that is updated when ACKs are received from the
           peer (notably including PING RESPONSE ACKs which we can elicit by
           sending PING ACKs to see if the call still exists on the server).
      
           The caller should compare the numbers returned by two successive calls
           to see if the call is still alive.
      
      These can be used to provide an extending timeout rather than returning
      immediately in the case that a signal occurs that would otherwise abort an
      RPC operation.  The timeout would be extended if the server is still
      responsive and the call is still apparently alive on the server.
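
The extending-timeout decision can be sketched as a comparison of two successive liveness readings. This is a mock under stated assumptions: `struct mock_call`, `mock_check_life()`, and `should_keep_waiting()` are invented for this example and do not reproduce the rxrpc API or its signatures.

```c
#include <stdbool.h>

/* Mock call state: 'life' stands in for the counter that advances on ACKs. */
struct mock_call {
	unsigned int life;
};

/* Hypothetical stand-in for the liveness query. */
static unsigned int mock_check_life(const struct mock_call *call)
{
	return call->life;
}

/*
 * Decide what to do when a signal interrupts the wait. If the life
 * counter moved since the previous check, the server is still responding,
 * so the wait is extended; otherwise the caller should give up.
 */
static bool should_keep_waiting(const struct mock_call *call,
				unsigned int *last_life)
{
	unsigned int life = mock_check_life(call);

	if (life == *last_life)
		return false;
	*last_life = life;
	return true;
}
```

A caller would record the counter when the signal first arrives, wait for a further interval sized from the RTT, and then call the check again.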
      
      For most operations this isn't that necessary - but for FS.StoreData it is:
      OpenAFS writes the data to storage as it comes in without making a backup,
      so if we immediately abort it when partially complete on a CTRL+C, say, we
      have no idea of the state of the file after the abort.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Support service upgrade from a kernel service · a68f4a27
      Committed by David Howells
      Provide support for a kernel service to make use of the service upgrade
      facility.  This involves:
      
       (1) Pass an upgrade request flag to rxrpc_kernel_begin_call().
      
       (2) Make rxrpc_kernel_recv_data() return the call's current service ID so
           that the caller can detect service upgrade and see what the service
           was upgraded to.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • ethtool: add ethtool_intersect_link_masks · 5a6cd6de
      Committed by Alan Brady
      This function provides a way to intersect two link masks together to
      find the common ground between them.  For example in i40e, the driver
      first generates link masks for what is supported by the PHY type.  The
      driver then gets the link masks for what the NVM supports.  The
      resulting intersection between them yields what can truly be supported.
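
The intersection amounts to a bitwise AND over each mask. The sketch below assumes a simplified two-word bitmap and a made-up `struct mock_link_masks`; it is not the ethtool data layout, only an illustration of the operation described.

```c
#include <stdint.h>

#define MASK_WORDS 2	/* assumed bitmap width for this sketch */

/* Mock of a link-settings pair: supported and advertised link modes. */
struct mock_link_masks {
	uint64_t supported[MASK_WORDS];
	uint64_t advertising[MASK_WORDS];
};

/* Keep in dst only the modes that are set in both dst and src. */
static void intersect_link_masks(struct mock_link_masks *dst,
				 const struct mock_link_masks *src)
{
	for (int i = 0; i < MASK_WORDS; i++) {
		dst->supported[i] &= src->supported[i];
		dst->advertising[i] &= src->advertising[i];
	}
}
```

In the i40e scenario from the message, `dst` would hold the masks derived from the PHY type and `src` the masks derived from the NVM.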
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  6. 17 Oct, 2017 2 commits
    • tracing: bpf: Hide bpf trace events when they are not used · 9185a610
      Committed by Steven Rostedt (VMware)
      All the trace events defined in include/trace/events/bpf.h are only
      used when CONFIG_BPF_SYSCALL is defined. But this file gets included by
      include/linux/bpf_trace.h which is included by the networking code with
      CREATE_TRACE_POINTS defined.
      
      If a trace event is created but not used it still has data structures
      and functions created for its use, even though nothing is using them.
      To not waste space, do not define the BPF trace events in bpf.h unless
      CONFIG_BPF_SYSCALL is defined.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: only update __use and lastusetime once per jiffy at most · 0da4af00
      Committed by Wei Wang
      In order to not dirty the cacheline too often, we try to only update
      dst->__use and dst->lastusetime at most once per jiffy.
      As dst->lastusetime is only used by ipv6 garbage collector, it should
      be good enough time resolution.
      And __use is only used in ipv6_route_seq_show() to show how many times a
      dst has been used. And as __use is not atomic_t right now, it does not
      show the precise number of usage times anyway. So we think it should be
      OK to only update it at most once per jiffy.
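
The cacheline-friendly update can be sketched as a compare-before-write. `struct mock_dst` and `dst_use_once_per_jiffy()` are names invented for this example, with `now` standing in for the current jiffies value; the real structure and helper may differ.

```c
/* Mock of the two fields the message talks about. */
struct mock_dst {
	unsigned long use_count;	/* stands in for __use */
	unsigned long last_use;		/* stands in for the last-use timestamp */
};

/* Write to the entry only when the jiffy value has changed. */
static void dst_use_once_per_jiffy(struct mock_dst *dst, unsigned long now)
{
	if (now != dst->last_use) {
		dst->use_count++;
		dst->last_use = now;
	}
}
```

Within one jiffy, every call after the first only reads the entry, so the cacheline is not dirtied and does not have to bounce between CPUs handling the same route.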
      
      According to my latest SYN flood test on a machine with an Intel Xeon 6th
      gen processor and 2 10G mlx NICs bonded together, each with 8 rx queues
      on 2 NUMA nodes:
      With this patch, the packet processing rate increases from ~3.49Mpps to
      ~3.75Mpps, an increase of about 7%.
      
      Note: dst_use() is being renamed to dst_hold_and_use() to better specify
      the purpose of the function.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>