1. 21 Jun, 2017 13 commits
    • net: introduce SO_PEERGROUPS getsockopt · 28b5ba2a
      David Herrmann committed
      This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
      retrieve the auxiliary groups of the remote peer. It is designed to
      naturally extend SO_PEERCRED. That is, the underlying data is from the
      same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
      is, if the provided buffer is too small, ERANGE is returned and @optlen
      is updated. Otherwise, the information is copied, @optlen is set to the
      actual size, and 0 is returned.
      
      While SO_PEERCRED (and thus `struct ucred') already returns the primary
      group, it lacks the auxiliary group vector. However, nearly all access
      controls (including kernel side VFS and SYSVIPC, but also user-space
      polkit, DBus, ...) consider the entire set of groups, rather than just
      the primary group. But this is currently not possible with pure
      SO_PEERCRED. Instead, user-space has to work around this and query the
      system database for the auxiliary groups of a UID retrieved via
      SO_PEERCRED.
      
      Unfortunately, there is no race-free way to query the auxiliary groups
      of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
      solution is to use getgrouplist(3p), which itself falls back to NSS and
      whatever is configured in nsswitch.conf(5). This effectively checks
      which groups we *would* assign to the user if it logged in *now*. On
      normal systems it is as easy as reading /etc/group, but with NSS it can
      resort to querying network databases (e.g., LDAP), using IPC or network
      communication.
      
      Long story short: Whenever we want to use auxiliary groups for access
      checks on IPC, we need further IPC to talk to the user/group databases,
      rather than just relying on SO_PEERCRED and the incoming socket. This
      is unfortunate, and might even result in dead-locks if the database
      query uses the same IPC as the original request.
      
      So far, those recursions / dead-locks have been avoided by using
      primitive IPC for all crucial NSS modules. However, we want to avoid
      re-inventing the wheel for each NSS module that might be involved in
      user/group queries. Hence, we would preferably make DBus (and other IPC
      that supports access-management based on groups) work without resorting
      to the user/group database. This new SO_PEERGROUPS option would allow us
      to make dbus-daemon work without ever calling into NSS.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
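As an illustration of the ERANGE/@optlen contract described in this commit message, a user-space caller might look roughly like the sketch below. The helper name `peer_groups()` is hypothetical, and the fallback definition of `SO_PEERGROUPS` (59, the asm-generic value) is an assumption for libc headers that predate the option; some architectures use a different number.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SO_PEERGROUPS
/* Assumption: asm-generic value; only used if the libc headers lack it. */
#define SO_PEERGROUPS 59
#endif

/*
 * Hypothetical helper: fetch the auxiliary groups of the peer of a
 * connected AF_UNIX socket. Returns the number of groups and stores a
 * malloc()ed array in *out (caller frees), or returns -1 with errno set.
 */
static int peer_groups(int fd, gid_t **out)
{
	socklen_t len = 0;
	gid_t *buf = NULL;

	*out = NULL;
	for (;;) {
		if (getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, buf, &len) == 0) {
			/* Success: len was set to the actual size in bytes. */
			*out = buf;
			return (int)(len / sizeof(gid_t));
		}
		if (errno != ERANGE) {
			int saved = errno;

			free(buf);
			errno = saved;
			return -1;
		}
		/* ERANGE: the kernel updated len to the required size. */
		gid_t *tmp = realloc(buf, len);
		if (!tmp) {
			free(buf);
			errno = ENOMEM;
			return -1;
		}
		buf = tmp;
	}
}
```

Starting with a zero-length buffer lets the kernel report the required size, so the caller never has to guess an upper bound on the number of groups.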
    • udp: prefetch rmem_alloc in udp_queue_rcv_skb() · dd99e425
      Paolo Abeni committed
      When processing UDP packets, if the BH is the bottleneck, it
      always sees a cache miss while updating rmem_alloc; try to
      avoid it by prefetching the value as soon as we have the socket
      available.

      Performance under flood with multiple NIC rx queues in use is
      unaffected, but when a single NIC rx queue is in use, this
      gives a ~10% performance improvement.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: add netlink notifications on mrt6msg cache reports · dd12d15c
      Julien Gomes committed
      Add Netlink notifications on cache reports in ip6mr, in addition to the
      existing mrt6msg sent to mroute6_sk.
      Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.
      
      MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
      same data as their equivalent fields in the mrt6msg header.
      PKT attribute is the packet sent to mroute6_sk, without the added
      mrt6msg header.
      Suggested-by: Ryan Halbrook <halbrook@arista.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: add netlink notifications on igmpmsg cache reports · 5a645dd8
      Julien Gomes committed
      Add Netlink notifications on cache reports in ipmr, in addition to the
      existing igmpmsg sent to mroute_sk.
      Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.
      
      MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
      same data as their equivalent fields in the igmpmsg header.
      PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
      header.
      Suggested-by: Ryan Halbrook <halbrook@arista.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute · 5f729eaa
      Julien Gomes committed
      Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
      NETLINK_ROUTE family.
      Binding to these groups specifically requires CAP_NET_ADMIN to allow
      multicast of sensitive messages (e.g. mroute cache reports).
      Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: hide unused variable · 083a0326
      Arnd Bergmann committed
      Changing from a memcpy to per-member comparison left the
      size variable unused:
      
      net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
      net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]
      
      This does not show up when CONFIG_IPV6 is enabled, but the
      variable can be removed either way, along with the now unused
      assignment.
      
      Fixes: 6797318e ("tcp: md5: add an address prefix for key lookup")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: handle errors when updating asoc · 5ee8aa68
      Xin Long committed
      Errors must be handled when updating an asoc: a memory allocation
      failure in any of the functions called in sctp_assoc_update() would
      cause sctp to behave unexpectedly.

      This patch fixes it by aborting the asoc and reporting the error when
      any of these functions fails.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: uncork the old asoc before changing to the new one · 8cd5c25f
      Xin Long committed
      local_cork is used to decide whether the asoc outq should be uncorked
      after processing some cmds, and it is set when replying or sending msgs.
      local_cork should always mirror the current asoc's q->cork.

      The problem is that when changing to a new asoc via the SET_ASOC cmd,
      local_cork may no longer be consistent with the current asoc. The cmd
      sequence can be:
      
        SCTP_CMD_UPDATE_ASSOC (asoc)
        SCTP_CMD_REPLY (asoc)
        SCTP_CMD_SET_ASOC (new_asoc)
        SCTP_CMD_DELETE_TCB (new_asoc)
        SCTP_CMD_SET_ASOC (asoc)
        SCTP_CMD_REPLY (asoc)
      
      The 1st REPLY sets both the OLD asoc's q->cork and local_cork to 1, and
      the DELETE_TCB cmd clears the NEW asoc's q->cork and local_cork. After
      switching back to the OLD asoc, q->cork is still 1 while local_cork is 0.
      The 2nd REPLY will not set local_cork because q->cork is already set, so
      the queue can't be uncorked and the reply is not sent out.

      To keep local_cork consistent with the current asoc's q->cork, this patch
      uncorks the old asoc, if local_cork is set, before changing to the new one.
      
      Note that the above cmd seqs will be used in the next patch when updating
      asoc and handling errors in it.
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: call inet_add_protocol after register_pernet_subsys in dccp_v6_init · a0f9a4c2
      Xin Long committed
      Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init"
      fixed a null pointer dereference issue for dccp_ipv4 module.
      
      The same fix is needed for dccp_ipv6 module.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init · d5494acb
      Xin Long committed
      Now dccp_ipv4 works as a kernel module. While this module is loading, if
      a dccp packet is received after inet_add_protocol but before
      register_pernet_subsys, in which v4_ctl_sk is initialized, a null pointer
      dereference may be triggered because init_net.dccp.v4_ctl_sk is still NULL.
      
      Jianlin found this issue when the following call trace occurred:
      
      [  171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
      [  171.951007] IP: [<ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
      [...]
      [  171.984629] Call Trace:
      [  171.984859]  <IRQ>
      [  171.985061]
      [  171.985213]  [<ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
      [  171.985711]  [<ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0
      [  171.986309]  [<ffffffff815ca339>] ip_local_deliver+0x59/0xd0
      [  171.986852]  [<ffffffff810cd7a4>] ? update_curr+0x104/0x190
      [  171.986956]  [<ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350
      [  171.986956]  [<ffffffff815ca666>] ip_rcv+0x2b6/0x410
      [  171.986956]  [<ffffffff810c83b4>] ? task_cputime+0x44/0x80
      [  171.986956]  [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
      [  171.986956]  [<ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0
      [  171.986956]  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
      [  171.986956]  [<ffffffff8158841e>] process_backlog+0xae/0x180
      [  171.986956]  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
      [  171.986956]  [<ffffffff81090b7f>] __do_softirq+0xef/0x280
      [  171.986956]  [<ffffffff816b6a1c>] call_softirq+0x1c/0x30
      
      This patch moves inet_add_protocol after register_pernet_subsys in
      dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
      packets are processed.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
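The ordering issue in this commit can be modelled in a few lines of user-space C. This is only a toy sketch: every `toy_*` name is hypothetical and stands in for the corresponding kernel step (register_pernet_subsys, inet_add_protocol, the receive handler); it is not the dccp code itself.

```c
#include <stdbool.h>
#include <stddef.h>

static int ctl_sk_storage;
static int *ctl_sk;           /* NULL until the per-net init has run */
static bool proto_registered; /* true once packets can reach toy_rcv() */

/* Stands in for register_pernet_subsys(): sets up the control socket. */
static int toy_register_pernet(void)
{
	ctl_sk = &ctl_sk_storage;
	return 0;
}

static void toy_unregister_pernet(void)
{
	ctl_sk = NULL;
}

/* Stands in for inet_add_protocol(): from here on, packets are delivered. */
static int toy_add_protocol(void)
{
	proto_registered = true;
	return 0;
}

/*
 * Stands in for the receive path, which needs ctl_sk.
 * Returns 0 if no packet can arrive yet, 1 if handled,
 * -1 where the real code would dereference a NULL pointer.
 */
static int toy_rcv(void)
{
	if (!proto_registered)
		return 0;
	return ctl_sk ? 1 : -1;
}

/* Fixed ordering: make ctl_sk valid first, expose the handler last. */
static int toy_init(void)
{
	int err = toy_register_pernet();

	if (err)
		return err;
	err = toy_add_protocol();
	if (err)
		toy_unregister_pernet();
	return err;
}
```

With this ordering there is no window in which `toy_rcv()` can run while `ctl_sk` is still NULL; swapping the two calls in `toy_init()` reopens that window.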
    • vxlan: get rid of redundant vxlan_dev.flags · dc5321d7
      Matthias Schiffer committed
      There is no good reason to keep the flags twice in vxlan_dev and
      vxlan_config.
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce __skb_put_[zero, data, u8] · de77b966
      yuan linyu committed
      Following Johannes Berg, the semantic patch file is as below:
      @@
      identifier p, p2;
      expression len;
      expression skb;
      type t, t2;
      @@
      (
      -p = __skb_put(skb, len);
      +p = __skb_put_zero(skb, len);
      |
      -p = (t)__skb_put(skb, len);
      +p = __skb_put_zero(skb, len);
      )
      ... when != p
      (
      p2 = (t2)p;
      -memset(p2, 0, len);
      |
      -memset(p, 0, len);
      )
      
      @@
      identifier p;
      expression len;
      expression skb;
      type t;
      @@
      (
      -t p = __skb_put(skb, len);
      +t p = __skb_put_zero(skb, len);
      )
      ... when != p
      (
      -memset(p, 0, len);
      )
      
      @@
      type t, t2;
      identifier p, p2;
      expression skb;
      @@
      t *p;
      ...
      (
      -p = __skb_put(skb, sizeof(t));
      +p = __skb_put_zero(skb, sizeof(t));
      |
      -p = (t *)__skb_put(skb, sizeof(t));
      +p = __skb_put_zero(skb, sizeof(t));
      )
      ... when != p
      (
      p2 = (t2)p;
      -memset(p2, 0, sizeof(*p));
      |
      -memset(p, 0, sizeof(*p));
      )
      
      @@
      expression skb, len;
      @@
      -memset(__skb_put(skb, len), 0, len);
      +__skb_put_zero(skb, len);
      
      @@
      expression skb, len, data;
      @@
      -memcpy(__skb_put(skb, len), data, len);
      +__skb_put_data(skb, data, len);
      
      @@
      expression SKB, C, S;
      typedef u8;
      identifier fn = {__skb_put};
      fresh identifier fn2 = fn ## "_u8";
      @@
      - *(u8 *)fn(SKB, S) = C;
      + fn2(SKB, C);
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
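To make the before/after patterns in the semantic patch concrete, here is a minimal user-space sketch of what the three helpers do. `struct toy_skb` and the `toy_put*()` functions are hypothetical stand-ins for struct sk_buff and the `__skb_put*()` family; like `__skb_put()`, the toy version does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a buffer plus a used length. */
struct toy_skb {
	uint8_t data[64];
	size_t len;
};

/* Models __skb_put(): reserve len bytes at the tail and return them. */
static void *toy_put(struct toy_skb *skb, size_t len)
{
	void *tail = skb->data + skb->len;

	skb->len += len;
	return tail;
}

/* Replaces the "p = __skb_put(...); memset(p, 0, len);" pattern. */
static void *toy_put_zero(struct toy_skb *skb, size_t len)
{
	void *p = toy_put(skb, len);

	memset(p, 0, len);
	return p;
}

/* Replaces the "memcpy(__skb_put(skb, len), data, len);" pattern. */
static void *toy_put_data(struct toy_skb *skb, const void *data, size_t len)
{
	void *p = toy_put(skb, len);

	memcpy(p, data, len);
	return p;
}

/* Replaces the "*(u8 *)__skb_put(skb, 1) = val;" pattern. */
static void toy_put_u8(struct toy_skb *skb, uint8_t val)
{
	*(uint8_t *)toy_put(skb, 1) = val;
}
```

Each helper folds a two-step idiom into one call, which is what lets the semantic patch delete the separate memset(), memcpy(), or cast-and-assign at every call site.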
  2. 20 Jun, 2017 2 commits
  3. 18 Jun, 2017 21 commits
    • net: dsa: Fix legacy probing · 06d4d450
      Florian Fainelli committed
      After commit 6d3c8c0d ("net: dsa: Remove master_netdev and
      use dst->cpu_dp->netdev") and a29342e7 ("net: dsa: Associate
      slave network device with CPU port") we would be seeing NULL pointer
      dereferences when accessing dst->cpu_dp->netdev too early. In the legacy
      code, we actually know early in advance the master network device, so
      pass it down to the relevant functions.
      
      Fixes: 6d3c8c0d ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev")
      Fixes: a29342e7 ("net: dsa: Associate slave network device with CPU port")
      Reported-by: Jason Cobham <jcobham@questertangent.com>
      Tested-by: Jason Cobham <jcobham@questertangent.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: update Kconfig · d807ec65
      Dave Watson committed
      Missing crypto deps for some platforms.
      Default to n for new module.
      
      config: m68k-amcore_defconfig (attached as .config)
      compiler: m68k-linux-gcc (GCC) 4.9.0
      
      make.cross ARCH=m68k
      All errors (new ones prefixed by >>):
      
         net/built-in.o: In function `tls_set_sw_offload':
      >> (.text+0x732f8): undefined reference to `crypto_alloc_aead'
         net/built-in.o: In function `tls_set_sw_offload':
      >> (.text+0x7333c): undefined reference to `crypto_aead_setkey'
         net/built-in.o: In function `tls_set_sw_offload':
      >> (.text+0x73354): undefined reference to `crypto_aead_setauthsize'
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove DST_NOCACHE flag · a4c2fd7f
      Wei Wang committed
      The DST_NOCACHE flag check was removed from dst_release() and
      dst_hold_safe() in a previous patch because all the dst are now ref
      counted properly and can be released based on refcnt only.
      Every remaining use of DST_NOCACHE can now be removed or replaced
      with other checks.
      So this patch gets rid of all the DST_NOCACHE usage and removes this flag
      completely.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove DST_NOGC flag · b2a9c0ed
      Wei Wang committed
      Now that all the components have been changed to release dst based on
      refcnt only and not depend on dst gc anymore, we can remove the
      temporary flag DST_NOGC.
      
      Note that we also need to remove the DST_NOCACHE check in dst_release()
      and dst_hold_safe() because now all the dst are released based on refcnt
      and behave as DST_NOCACHE.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove dst gc related code · 5b7c9a8f
      Wei Wang committed
      This patch removes all dst gc related code and all the dst free
      functions.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • decnet: take dst->__refcnt when struct dn_route is created · 560fd93b
      Wei Wang committed
      struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt
      is taken.
      This patch makes sure the dn_rt_hash_table's reference to the dst is ref
      counted.
      
      As the dst is always ref counted properly, we can safely mark
      DST_NOGC flag so dst_release() will release dst based on refcnt only.
      And dst gc is no longer needed and all dst_free() or its related
      function calls should be replaced with dst_release() or
      dst_release_immediate(). And dst_dev_put() is called when removing dst
      from the hash table to release the reference on dst->dev before we lose
      pointer to it.
      
      Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to
      check dst->__refcnt to be > 1 to indicate it is referenced by other
      users.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: take refcnt of dst when creating struct xfrm_dst bundle · 52df157f
      Wei Wang committed
      During the creation of xfrm_dst bundle, always take ref count when
      allocating the dst. This way, xfrm_bundle_create() will form a linked
      list of dst with dst->child pointing to a ref counted dst child. And
      the returned dst pointer is also ref counted. This makes the link from
      the flow cache to this dst now ref counted properly.
      As the dst is always ref counted properly, we can safely mark
      DST_NOGC flag so dst_release() will release dst based on refcnt only.
      And dst gc is no longer needed and all dst_free() and its related
      function calls should be replaced with dst_release() or
      dst_release_immediate().
      
      The special handling logic for dst->child in dst_destroy() can be
      replaced with a simple dst_release_immediate() call on the child to
      release the whole list linked by dst->child pointer.
      The previously used DST_NOHASH flag is no longer needed either. The
      reason that DST_NOHASH is used in the existing code is mainly to prevent
      the dst inserted in the fib tree to be wrongly destroyed during the
      deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
      flag is marked in all the dst children except the one which is in the
      fib tree.
      However, with this patch series to remove dst gc logic and release dst
      only based on ref count, it is safe to release all the children from a
      xfrm_dst bundle as long as the dst children are all ref counted
      properly which is already the case in the existing code.
      So, this patch removes the use of DST_NOHASH flag.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: get rid of icmp6 dst garbage collector · db916649
      Wei Wang committed
      icmp6 dst route is currently ref counted during creation and will be
      freed by user during its call of dst_release(). So no need of a garbage
      collector for it.
      Remove all icmp6 dst garbage collector related code.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: mark DST_NOGC and remove the operation of dst_free() · 587fea74
      Wei Wang committed
      With the previous preparation patches, we are ready to get rid of the
      dst gc operation in ipv6 code and release dst based on refcnt only.
      So this patch adds DST_NOGC flag for all IPv6 dst and removes the calls
      to dst_free() and its related functions.
      At this point, all dst created in ipv6 code do not use the dst gc
      anymore and will be destroyed at the point when refcnt drops to 0.
      
      Also, as icmp6 dst route is refcounted during creation and will be freed
      by user during its call of dst_release(), there is no need to add this
      dst to the icmp6 gc list as well.
      Instead, we need to add it into uncached list so that when a
      NETDEV_DOWN/NETDEV_UNREGISTER event comes, we can properly go through
      these icmp6 dst as well and release the net device properly.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: call dst_hold_safe() properly · ad65a2f0
      Wei Wang committed
      As in ipv4, the ipv6 path also needs to call dst_hold_safe() when
      necessary to avoid a double free issue on the dst.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: call dst_dev_put() properly · 9514528d
      Wei Wang committed
      As the intent of this patch series is to completely remove dst gc,
      we need to call dst_dev_put() to release the reference to dst->dev
      when removing routes from fib because we won't keep the gc list anymore
      and will lose the dst pointer right after removing the routes.
      Without the gc list, there is no way to find all the dst's that have
      dst->dev pointing to the going-down dev.
      Hence, we are doing dst_dev_put() immediately before we lose the last
      reference of the dst from the routing code. The next dst_check() will
      trigger a route re-lookup to find another route (if there is any).
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: take dst->__refcnt for insertion into fib6 tree · 1cfb71ee
      Wei Wang committed
      In IPv6 routing code, struct rt6_info is created for each static route
      and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
      ref count is not taken.
      As explained in the previous patch, this leads to the need of the dst
      garbage collector.
      
      This patch holds ref count of dst before inserting the route into fib6
      tree and properly releases the dst when deleting it from the fib6 tree
      as a preparation in order to fully get rid of dst gc later.
      
      Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
      no user is referencing the dst.
      
      And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
      dst->__refcnt to 1.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: mark DST_NOGC and remove the operation of dst_free() · b838d5e1
      Wei Wang committed
      With the previous preparation patches, we are ready to get rid of the
      dst gc operation in ipv4 code and release dst based on refcnt only.
      So this patch adds DST_NOGC flag for all IPv4 dst and removes the calls
      to dst_free().
      At this point, all dst created in ipv4 code do not use the dst gc
      anymore and will be destroyed at the point when refcnt drops to 0.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: call dst_hold_safe() properly · 9df16efa
      Wei Wang committed
      This patch checks all the calls to
      dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if
      dst_hold_safe() is needed to avoid double free issue if dst
      gc is removed and dst_release() directly destroys dst when
      dst->__refcnt drops to 0.
      
      In tx path, TCP holds the sk->sk_rx_dst ref count and also holds
      sock_lock(). UDP and other similar protocols always hold a refcount for
      skb->_skb_refdst. So both paths seem to be safe.
      
      In rx path, as it is lockless and skb_dst_set_noref() is likely to be
      used, dst_hold_safe() should always be used when trying to hold dst.
      
      In the routing code, if dst is held during an rcu protected session, it
      is necessary to call dst_hold_safe() as the current dst might be in its
      rcu grace period.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: call dst_dev_put() properly · 95c47f9c
      Wei Wang committed
      As the intent of this patch series is to completely remove dst gc,
      we need to call dst_dev_put() to release the reference to dst->dev
      when removing routes from fib because we won't keep the gc list anymore
      and will lose the dst pointer right after removing the routes.
      Without the gc list, there is no way to find all the dst's that have
      dst->dev pointing to the going-down dev.
      Hence, we are doing dst_dev_put() immediately before we lose the last
      reference of the dst from the routing code. The next dst_check() will
      trigger a route re-lookup to find another route (if there is any).
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: take dst->__refcnt when caching dst in fib · 0830106c
      Wei Wang committed
      In IPv4 routing code, fib_nh and fib_nh_exception can hold pointers
      to struct rtable but they never increment dst->__refcnt.
      This leads to the need of the dst garbage collector because when user
      is done with this dst and calls dst_release(), it can only decrement
      dst->__refcnt and can not free the dst even if it sees dst->__refcnt
      drops from 1 to 0 (unless DST_NOCACHE flag is set) because the routing
      code might still hold reference to it.
      And when the routing code tries to delete a route, it has to put the
      dst to the gc_list if dst->__refcnt is not yet 0 and have a gc thread
      running periodically to check on dst->__refcnt and finally to free dst
      when refcnt becomes 0.
      
      This patch increments dst->__refcnt when
      fib_nh/fib_nh_exception holds a reference to this dst and properly releases
      the dst when fib_nh/fib_nh_exception has been updated with a new dst.
      
      This patch is a preparation in order to fully get rid of dst gc later.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce a new function dst_dev_put() · 4a6ce2b6
      Wei Wang committed
      This function should be called when removing routes from fib tree after
      the dst gc is no longer in use.
      We first mark DST_OBSOLETE_DEAD on this dst to make sure next
      dst_ops->check() fails and returns NULL.
      Secondly, as we no longer keep the gc_list, we need to properly
      release dst->dev right at the moment when the dst is removed from
      the fib/fib6 tree.
      It does the following:
      1. change dst->input and output pointers to dst_discard/dst_discard_out to
         discard all packets
      2. replace dst->dev with loopback interface
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
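The steps listed in this commit message can be sketched with a toy model. All `mini_*` names below are hypothetical: `struct mini_dst` stands in for struct dst_entry, the plain integer `refcnt` stands in for the device reference count, and `MINI_OBSOLETE_DEAD` is an illustrative marker rather than the kernel constant.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct dst_entry. */
struct mini_dev {
	const char *name;
	int refcnt;
};

struct mini_dst {
	int obsolete;
	struct mini_dev *dev;
	int (*input)(void);
	int (*output)(void);
};

enum { MINI_OBSOLETE_DEAD = 2 }; /* illustrative "dead" marker */

/* Models dst_discard()/dst_discard_out(): drop every packet. */
static int mini_discard(void)
{
	return -1;
}

/*
 * Models dst_dev_put(): mark the dst dead so the next check fails,
 * discard all traffic, and swap the device for loopback so the
 * original device is no longer pinned by this dst.
 */
static void mini_dst_dev_put(struct mini_dst *dst, struct mini_dev *loopback)
{
	struct mini_dev *old = dst->dev;

	dst->obsolete = MINI_OBSOLETE_DEAD;
	dst->input = mini_discard;
	dst->output = mini_discard;
	loopback->refcnt++;   /* take a reference on the replacement */
	dst->dev = loopback;
	old->refcnt--;        /* drop the reference on the old device */
}
```

After the call, users that still hold the dst can keep using it safely, but it no longer keeps the original device alive.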
    • net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt · 5f56f409
      Wei Wang committed
      The current mechanism of freeing dst is a bit complicated. dst has its
      ref count and when user grabs the reference to the dst, the ref count is
      properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing
      code due to some historic reasons.
      
      If the reference to dst is always taken properly, we should be able to
      simplify the logic in dst_release() to destroy dst when dst->__refcnt
      drops from 1 to 0. And this should be the only condition to determine
      if we can call dst_destroy().
      And as dst is always ref counted, there is no need for a dst garbage
      list to hold the dst entries that already get removed by the routing
      code but are still held by other users. And the task to periodically
      check the list to free dst if ref count becomes 0 is also not needed
      anymore.
      
      This patch introduces a temporary flag DST_NOGC(no garbage collector).
      If it is set in the dst, dst_release() will call dst_destroy() when
      dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag
      and do atomic_inc_not_zero(), similar to DST_NOCACHE, to avoid a double
      free issue.
      This temporary flag is mainly used so that we can make the transition
      component by component without breaking other parts.
      This flag will be removed after all components are properly transitioned.
      
      This patch also introduces a new function dst_release_immediate() which
      destroys dst without waiting on the rcu when refcnt drops to 0. It will
      be used in later patches.
      
      Follow-up patches will correct all the places to properly take ref count
      on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will
      be used to release the dst instead of dst_free() and its related
      functions.
      A final clean-up patch will remove the DST_NOGC flag.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
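The refcount-only lifetime rule described in this commit can be illustrated with C11 atomics. This is a user-space sketch under stated assumptions, not the kernel code: `struct toy_dst`, `toy_hold_safe()` and `toy_release()` are hypothetical names, and the `destroyed` flag stands in for dst_destroy() having run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dst_entry. */
struct toy_dst {
	atomic_int refcnt;
	bool destroyed; /* set where the real code would call dst_destroy() */
};

/*
 * Models dst_hold_safe() with DST_NOGC set: an inc-not-zero loop.
 * A reference is taken only if the count has not already reached 0,
 * so a dst that is being destroyed cannot be revived and freed twice.
 */
static bool toy_hold_safe(struct toy_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Models dst_release() with DST_NOGC set: destroy on the 1 -> 0 drop. */
static void toy_release(struct toy_dst *dst)
{
	if (atomic_fetch_sub(&dst->refcnt, 1) == 1)
		dst->destroyed = true;
}
```

Because the last release is the only place that destroys the object, no garbage list or periodic scan is needed, which is the simplification the series is working towards.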
    • net: use loopback dev when generating blackhole route · 1dbe3252
      Wei Wang committed
      Existing ipv4/6_blackhole_route() code generates a blackhole route
      with dst->dev pointing to the passed in dst->dev.
      It is not necessary to hold reference to the passed in dst->dev
      because the packets going through this route are dropped anyway.
      A loopback interface is good enough so that we don't need to worry about
      releasing this dst->dev when this dev is going down.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: call dst_hold_safe() in udp_sk_rx_set_dst() · d24406c8
      Wei Wang committed
      In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for
      dst with DST_NOCACHE flag. This is because later in udp_sk_rx_dst_set()
      function, we will try to cache this dst in sk for connected case.
      However, a better way to achieve this is to not try to hold dst in
      early_demux(), but in udp_sk_rx_dst_set(), call dst_hold_safe(). This
      approach is also more consistent with how tcp is handling it. And it
      will make later changes simpler.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: remove unnecessary dst_hold() in ip6_fragment() · 1758fd46
      Wei Wang committed
      In ipv6 tx path, rcu_read_lock() is taken so that dst won't get freed
      during the execution of ip6_fragment(). Hence, no need to hold dst in
      it.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Jun, 2017 4 commits