1. 31 3月, 2021 1 次提交
  2. 27 3月, 2021 1 次提交
  3. 24 2月, 2021 1 次提交
  4. 03 2月, 2021 1 次提交
    • A
      net: ipv6: Emit notification when fib hardware flags are changed · 907eea48
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel,
      but not necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead
      to a routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      It is also possible for a route already installed in hardware to change
      its action and therefore its flags. For example, a host route that is
      trapping packets can be "promoted" to perform decapsulation following
      the installation of an IPinIP/VXLAN tunnel.
      
      Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. The aim is to provide an indication to user-space
      (e.g., routing daemons) about the state of the route in hardware.
      
      Introduce a sysctl that controls this behavior.
      
      Keep the default value at 0 (i.e., do not emit notifications) for several
      reasons:
      - Multiple RTM_NEWROUTE notification per-route might confuse existing
        routing daemons.
      - Convergence reasons in routing daemons.
      - The extra notifications will negatively impact the insertion rate.
      - Not all users are interested in these notifications.
      
      Move fib6_info_hw_flags_set() to C file because it is no longer a short
      function.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      907eea48
  5. 28 1月, 2021 1 次提交
  6. 21 1月, 2021 1 次提交
  7. 03 12月, 2020 1 次提交
  8. 24 11月, 2020 1 次提交
  9. 01 9月, 2020 1 次提交
  10. 25 8月, 2020 1 次提交
  11. 20 7月, 2020 1 次提交
  12. 20 5月, 2020 1 次提交
    • D
      bpf: Add get{peer, sock}name attach types for sock_addr · 1b66d253
      Daniel Borkmann 提交于
      As stated in 983695fa ("bpf: fix unconnected udp hooks"), the objective
      for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
      transparent to applications. In Cilium we make use of these hooks [0] in
      order to enable E-W load balancing for existing Kubernetes service types
      for all Cilium managed nodes in the cluster. Those backends can be local
      or remote. The main advantage of this approach is that it operates as close
      as possible to the socket, and therefore allows to avoid packet-based NAT
      given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.
      
      This also allows to expose NodePort services on loopback addresses in the
      host namespace, for example. As another advantage, this also efficiently
      blocks bind requests for applications in the host namespace for exposed
      ports. However, one missing item is that we also need to perform reverse
      xlation for inet{,6}_getname() hooks such that we can return the service
      IP/port tuple back to the application instead of the remote peer address.
      
      The vast majority of applications does not bother about getpeername(), but
      in a few occasions we've seen breakage when validating the peer's address
      since it returns unexpectedly the backend tuple instead of the service one.
      Therefore, this trivial patch allows to customise and adds a getpeername()
      as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order
      to address this situation.
      
      Simple example:
      
        # ./cilium/cilium service list
        ID   Frontend     Service Type   Backend
        1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80
      
      Before; curl's verbose output example, no getpeername() reverse xlation:
      
        # curl --verbose 1.2.3.4
        * Rebuilt URL to: 1.2.3.4/
        *   Trying 1.2.3.4...
        * TCP_NODELAY set
        * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
        > GET / HTTP/1.1
        > Host: 1.2.3.4
        > User-Agent: curl/7.58.0
        > Accept: */*
        [...]
      
      After; with getpeername() reverse xlation:
      
        # curl --verbose 1.2.3.4
        * Rebuilt URL to: 1.2.3.4/
        *   Trying 1.2.3.4...
        * TCP_NODELAY set
        * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
        > GET / HTTP/1.1
        >  Host: 1.2.3.4
        > User-Agent: curl/7.58.0
        > Accept: */*
        [...]
      
      Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
      peer to the context similar as in inet{,6}_getname() fashion, but API-wise
      this is suboptimal as it always enforces programs having to test for ctx->peer
      which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
      Similarly, the checked return code is on tnum_range(1, 1), but if a use case
      comes up in future, it can easily be changed to return an error code instead.
      Helper and ctx member access is the same as with connect/sendmsg/etc hooks.
      
        [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.cSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NAndrey Ignatov <rdna@fb.com>
      Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
      1b66d253
  13. 19 5月, 2020 2 次提交
  14. 09 5月, 2020 2 次提交
  15. 06 5月, 2020 1 次提交
  16. 28 4月, 2020 2 次提交
  17. 30 3月, 2020 1 次提交
  18. 05 12月, 2019 2 次提交
  19. 27 11月, 2019 1 次提交
    • M
      net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port) · 82f31ebf
      Maciej Żenczykowski 提交于
      Note that the sysctl write accessor functions guarantee that:
        net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0]
      invariant is maintained, and as such the max() in selinux hooks is actually spurious.
      
      ie. even though
        if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) {
      per logic is the same as
        if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) {
      it is actually functionally equivalent to:
        if (snum < low || snum > high) {
      which is equivalent to:
        if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) {
      even though the first clause is spurious.
      
      But we want to hold on to it in case we ever want to change what what
      inet_port_requires_bind_service() means (for example by changing
      it from a, by default, [0..1024) range to some sort of set).
      
      Test: builds, git 'grep inet_prot_sock' finds no other references
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82f31ebf
  20. 04 7月, 2019 2 次提交
  21. 02 7月, 2019 1 次提交
  22. 06 6月, 2019 1 次提交
  23. 31 5月, 2019 1 次提交
  24. 23 5月, 2019 3 次提交
  25. 20 4月, 2019 1 次提交
  26. 19 4月, 2019 1 次提交
    • S
      ipv6: Add rate limit mask for ICMPv6 messages · 0bc19985
      Stephen Suryaputra 提交于
      To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP
      message types use larger numeric values, a simple bitmask doesn't fit.
      I use large bitmap. The input and output are the in form of list of
      ranges. Set the default to rate limit all error messages but Packet Too
      Big. For Packet Too Big, use ratemask instead of hard-coded.
      
      There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
      aren't called. This patch only adds them to icmpv6_echo_reply().
      
      Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
      that it is also acceptable to rate limit informational messages. Thus,
      I removed the current hard-coded behavior of icmpv6_mask_allow() that
      doesn't rate limit informational messages.
      
      v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
          isn't defined, expand the description in ip-sysctl.txt and remove
          unnecessary conditional before kfree().
      v3: Inline the bitmap instead of dynamically allocated. Still is a
          pointer to it is needed because of the way proc_do_large_bitmap work.
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bc19985
  27. 18 4月, 2019 1 次提交
    • D
      ipv6: Rename fib6_multipath_select and pass fib6_result · b1d40991
      David Ahern 提交于
      Add 'struct fib6_result' to hold the fib entry and fib6_nh from a fib
      lookup as separate entries, similar to what IPv4 now has with fib_result.
      
      Rename fib6_multipath_select to fib6_select_path, pass fib6_result to
      it, and set f6i and nh in the result once a path selection is done.
      Call fib6_select_path unconditionally for path selection which means
      moving the sibling and oif check to fib6_select_path. To handle the two
      different call paths (2 only call multipath_select if flowi6_oif == 0 and
      the other always calls it), add a new have_oif_match that controls the
      sibling walk if relevant.
      
      Update callers of fib6_multipath_select accordingly and have them use the
      fib6_info and fib6_nh from the result.
      
      This is needed for multipath nexthop objects where a single f6i can
      point to multiple fib6_nh (similar to IPv4).
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1d40991
  28. 09 4月, 2019 1 次提交
  29. 30 3月, 2019 1 次提交
  30. 21 3月, 2019 1 次提交
  31. 20 3月, 2019 1 次提交
  32. 14 2月, 2019 1 次提交
  33. 06 1月, 2019 1 次提交