1. 04 6月, 2018 1 次提交
  2. 20 4月, 2018 1 次提交
  3. 18 4月, 2018 6 次提交
    • D
      net/ipv6: Flip FIB entries to fib6_info · 8d1c802b
      David Ahern 提交于
      Convert all code paths referencing a FIB entry from
      rt6_info to fib6_info.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d1c802b
    • D
      net/ipv6: separate handling of FIB entries from dst based routes · 93531c67
      David Ahern 提交于
      Last step before flipping the data type for FIB entries:
      - use fib6_info_alloc to create FIB entries in ip6_route_info_create
        and addrconf_dst_alloc
      - use fib6_info_release in place of dst_release, ip6_rt_put and
        rt6_release
      - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
      - when purging routes, drop per-cpu routes
      - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
      - use rt->from since it points to the FIB entry
      - drop references to exception bucket, fib6_metrics and per-cpu from
        dst entries (those are relevant for fib entries only)
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93531c67
    • D
      net/ipv6: Create a neigh_lookup for FIB entries · f8a1b43b
      David Ahern 提交于
      The router discovery code has a FIB entry and wants to validate the
      gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
      for IPv6 and create a new function that takes the gateway and device
      and returns a neighbor entry. Use the new function in
      ndisc_router_discovery to validate the gateway.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8a1b43b
    • D
      net/ipv6: move expires into rt6_info · 14895687
      David Ahern 提交于
      Add expires to rt6_info for FIB entries, and add fib6 helpers to
      manage it. Data path use of dst.expires remains.
      
      The transition is fairly straightforward: when working with fib entries,
      rt->dst.expires is just rt->expires, rt6_clean_expires is replaced with
      fib6_clean_expires, rt6_set_expires becomes fib6_set_expires, and
      rt6_check_expired becomes fib6_check_expired, where the fib6 versions
      are added by this patch.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14895687
    • D
      net/ipv6: move metrics from dst to rt6_info · d4ead6b3
      David Ahern 提交于
      Similar to IPv4, add fib metrics to the fib struct, which at the moment
      is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics
      into dst by reference using refcount.
      
      To make the transition:
      - add dst_metrics to rt6_info. Default to dst_default_metrics if no
        metrics are passed during route add. No need for a separate pmtu
        entry; it can reference the MTU slot in fib6_metrics
      
      - ip6_convert_metrics allocates memory in the FIB entry and uses
        ip_metrics_convert to copy from netlink attribute to metrics entry
      
      - the convert metrics call is done in ip6_route_info_create simplifying
        the route add path
        + fib6_commit_metrics and fib6_copy_metrics and the temporary
          mx6_config are no longer needed
      
      - add fib6_metric_set helper to change the value of a metric in the
        fib entry since dst_metric_set can no longer be used
      
      - cow_metrics for IPv6 can drop to dst_cow_metrics_generic
      
      - rt6_dst_from_metrics_check is no longer needed
      
      - rt6_fill_node needs the FIB entry and dst as separate arguments to
        keep compatibility with existing output. Current dst address is
        renamed to dest.
        (to be consistent with IPv4 rt6_fill_node really should be split
        into 2 functions similar to fib_dump_info and rt_fill_info)
      
      - rt6_fill_node no longer needs the temporary metrics variable
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4ead6b3
    • D
      net/ipv6: Pass net namespace to route functions · afb1d4b5
      David Ahern 提交于
      Pass network namespace reference into route add, delete and get
      functions.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afb1d4b5
  4. 28 3月, 2018 1 次提交
  5. 16 3月, 2018 1 次提交
    • D
      net/ipv6: Change address check to always take a device argument · 232378e8
      David Ahern 提交于
      ipv6_chk_addr_and_flags determines if an address is a local address and
      optionally if it is an address on a specific device. For example, it is
      called by ip6_route_info_create to determine if a given gateway address
      is a local address. The address check currently does not consider L3
      domains and as a result does not allow a route to be added in one VRF
      if the nexthop points to an address in a second VRF. e.g.,
      
          $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
          Error: Invalid gateway address.
      
      where 2001:db8:102::23 is an address on an interface in vrf r1.
      
      ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
      with a separate argument to not limit the address to the specific device.
      The device is used used to determine the L3 domain of interest.
      
      To that end add an argument to skip the device check and update callers
      to always pass a device where possible and use the new argument to mean
      any address in the domain.
      
      Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
      patch handles the change to these callers without adding the domain check.
      
      ip6_validate_gw needs to handle 2 cases - one where the device is given
      as part of the nexthop spec and the other where the device is resolved.
      There is at least 1 VRF case where deferring the check to only after
      the route lookup has resolved the device fails with an unintuitive error
      "RTNETLINK answers: No route to host" as opposed to the preferred
      "Error: Gateway can not be a local address." The 'no route to host'
      error is because of the fallback to a full lookup. The check is done
      twice to avoid this error.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      232378e8
  6. 10 3月, 2018 1 次提交
    • L
      ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option() · 9f62c15f
      Lorenzo Bianconi 提交于
      Fix the following slab-out-of-bounds kasan report in
      ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not
      linear and the accessed data are not in the linear data region of orig_skb.
      
      [ 1503.122508] ==================================================================
      [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990
      [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932
      
      [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124
      [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [ 1503.123527] Call Trace:
      [ 1503.123579]  <IRQ>
      [ 1503.123638]  print_address_description+0x6e/0x280
      [ 1503.123849]  kasan_report+0x233/0x350
      [ 1503.123946]  memcpy+0x1f/0x50
      [ 1503.124037]  ndisc_send_redirect+0x94e/0x990
      [ 1503.125150]  ip6_forward+0x1242/0x13b0
      [...]
      [ 1503.153890] Allocated by task 1932:
      [ 1503.153982]  kasan_kmalloc+0x9f/0xd0
      [ 1503.154074]  __kmalloc_track_caller+0xb5/0x160
      [ 1503.154198]  __kmalloc_reserve.isra.41+0x24/0x70
      [ 1503.154324]  __alloc_skb+0x130/0x3e0
      [ 1503.154415]  sctp_packet_transmit+0x21a/0x1810
      [ 1503.154533]  sctp_outq_flush+0xc14/0x1db0
      [ 1503.154624]  sctp_do_sm+0x34e/0x2740
      [ 1503.154715]  sctp_primitive_SEND+0x57/0x70
      [ 1503.154807]  sctp_sendmsg+0xaa6/0x1b10
      [ 1503.154897]  sock_sendmsg+0x68/0x80
      [ 1503.154987]  ___sys_sendmsg+0x431/0x4b0
      [ 1503.155078]  __sys_sendmsg+0xa4/0x130
      [ 1503.155168]  do_syscall_64+0x171/0x3f0
      [ 1503.155259]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      [ 1503.155436] Freed by task 1932:
      [ 1503.155527]  __kasan_slab_free+0x134/0x180
      [ 1503.155618]  kfree+0xbc/0x180
      [ 1503.155709]  skb_release_data+0x27f/0x2c0
      [ 1503.155800]  consume_skb+0x94/0xe0
      [ 1503.155889]  sctp_chunk_put+0x1aa/0x1f0
      [ 1503.155979]  sctp_inq_pop+0x2f8/0x6e0
      [ 1503.156070]  sctp_assoc_bh_rcv+0x6a/0x230
      [ 1503.156164]  sctp_inq_push+0x117/0x150
      [ 1503.156255]  sctp_backlog_rcv+0xdf/0x4a0
      [ 1503.156346]  __release_sock+0x142/0x250
      [ 1503.156436]  release_sock+0x80/0x180
      [ 1503.156526]  sctp_sendmsg+0xbb0/0x1b10
      [ 1503.156617]  sock_sendmsg+0x68/0x80
      [ 1503.156708]  ___sys_sendmsg+0x431/0x4b0
      [ 1503.156799]  __sys_sendmsg+0xa4/0x130
      [ 1503.156889]  do_syscall_64+0x171/0x3f0
      [ 1503.156980]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600
                      which belongs to the cache kmalloc-1024 of size 1024
      [ 1503.157444] The buggy address is located 176 bytes inside of
                      1024-byte region [ffff8800298ab600, ffff8800298aba00)
      [ 1503.157702] The buggy address belongs to the page:
      [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
      [ 1503.158053] flags: 0x4000000000008100(slab|head)
      [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e
      [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000
      [ 1503.158523] page dumped because: kasan: bad access detected
      
      [ 1503.158698] Memory state around the buggy address:
      [ 1503.158816]  ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 1503.158988]  ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1503.159338]                    ^
      [ 1503.159436]  ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1503.159610]  ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1503.159785] ==================================================================
      [ 1503.159964] Disabling lock debugging due to kernel taint
      
      The test scenario to trigger the issue consists of 4 devices:
      - H0: data sender, connected to LAN0
      - H1: data receiver, connected to LAN1
      - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an
        ethernet connection on LAN0 and LAN1
      On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for
      data from LAN0 to LAN1.
      Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent
      data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send
      buffer size is set to 16K). While data streams are active flush the route
      cache on HA multiple times.
      I have not been able to identify a given commit that introduced the issue
      since, using the reproducer described above, the kasan report has been
      triggered from 4.14 and I have not gone back further.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f62c15f
  7. 08 3月, 2018 1 次提交
  8. 20 2月, 2018 1 次提交
    • K
      net: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops · 1a2e9332
      Kirill Tkhai 提交于
      These pernet_operations create and destroy net::ipv6.icmp_sk
      socket, used to send ICMP or error reply.
      
      Nobody can dereference the socket to handle a packet before
      net is initialized, as there is no routing; nobody can do
      that in parallel with exit, as all of devices are moved
      to init_net or destroyed and there are no packets it-flight.
      So, it's possible to mark these pernet_operations as async.
      
      The same for ndisc_net_ops and for igmp6_net_ops. The last
      one also creates and destroys /proc entries.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2e9332
  9. 30 1月, 2018 1 次提交
  10. 11 11月, 2017 1 次提交
    • M
      net: ipv6: sysctl to specify IPv6 ND traffic class · 2210d6b2
      Maciej Żenczykowski 提交于
      Add a per-device sysctl to specify the default traffic class to use for
      kernel originated IPv6 Neighbour Discovery packets.
      
      Currently this includes:
      
        - Router Solicitation (ICMPv6 type 133)
          ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Solicitation (ICMPv6 type 135)
          ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Advertisement (ICMPv6 type 136)
          ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Redirect (ICMPv6 type 137)
          ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
      
      and if the kernel ever gets around to generating RA's,
      it would presumably also include:
      
        - Router Advertisement (ICMPv6 type 134)
          (radvd daemon could pick up on the kernel setting and use it)
      
      Interface drivers may examine the Traffic Class value and translate
      the DiffServ Code Point into a link-layer appropriate traffic
      prioritization scheme.  An example of mapping IETF DSCP values to
      IEEE 802.11 User Priority values can be found here:
      
          https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
      
      The expected primary use case is to properly prioritize ND over wifi.
      
      Testing:
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        0
        jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        255
        jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        34
      
        jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
        tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
        IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
        jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
        length 24, tgt is jzem22.pgc, Flags [solicited]
      
      (based on original change written by Erik Kline, with minor changes)
      
      v2: fix 'suspicious rcu_dereference_check() usage'
          by explicitly grabbing the rcu_read_lock.
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NErik Kline <ek@google.com>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2210d6b2
  11. 01 11月, 2017 1 次提交
  12. 30 8月, 2017 1 次提交
  13. 10 8月, 2017 1 次提交
    • D
      net: ipv6: lower ndisc notifier priority below addrconf · 6eb79393
      David Ahern 提交于
      ndisc_notify is used to send unsolicited neighbor advertisements
      (e.g., on a link up). Currently, the ndisc notifier is run before the
      addrconf notifer which means NA's are not sent for link-local addresses
      which are added by the addrconf notifier.
      
      Fix by lowering the priority of the ndisc notifier. Setting the priority
      to ADDRCONF_NOTIFY_PRIORITY - 5 means it runs after addrconf and before
      the route notifier which is ADDRCONF_NOTIFY_PRIORITY - 10.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6eb79393
  14. 16 6月, 2017 1 次提交
    • J
      networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4df864c1
  15. 25 4月, 2017 1 次提交
  16. 18 4月, 2017 1 次提交
  17. 23 3月, 2017 2 次提交
  18. 04 12月, 2016 1 次提交
  19. 11 9月, 2016 1 次提交
  20. 16 6月, 2016 3 次提交
  21. 11 2月, 2016 1 次提交
  22. 24 12月, 2015 1 次提交
  23. 02 12月, 2015 1 次提交
  24. 13 10月, 2015 2 次提交
  25. 08 10月, 2015 1 次提交
  26. 25 9月, 2015 1 次提交
  27. 18 9月, 2015 3 次提交
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • E
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman 提交于
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29a26a56
    • E
      net: Merge dst_output and dst_output_sk · 5a70649e
      Eric W. Biederman 提交于
      Add a sock paramter to dst_output making dst_output_sk superfluous.
      Add a skb->sk parameter to all of the callers of dst_output
      Have the callers of dst_output_sk call dst_output.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a70649e
  28. 01 9月, 2015 2 次提交