1. 16 Nov 2016 · 1 commit
    • igmp: do not remove igmp source list info when set link down · 24803f38
      Committed by Hangbin Liu
      In commit 24cf3af3 ("igmp: call ip_mc_clear_src..."), we forgot to remove
      igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src().
      This makes us clear all IGMPv3 source filter info after NETDEV_DOWN.
      Move igmpv3_clear_delrec() to ip_mc_destroy_dev(); ip_mc_clear_src() is
      then no longer needed in ip_mc_destroy_dev().
      
      On the other hand, igmpv3_del_delrec() should restore the source filter
      info instead of freeing it all; otherwise we will not be able to restore
      IGMPv3 source filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE.
      
      Fixes: 24cf3af3 ("igmp: call ip_mc_clear_src() only when ...")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Aug 2016 · 1 commit
  3. 04 Mar 2016 · 1 commit
    • mld, igmp: Fix reserved tailroom calculation · 1837b2e2
      Committed by Benjamin Poirier
      The current reserved_tailroom calculation fails to take hlen and tlen into
      account.
      
      skb:
      [__hlen__|__data____________|__tlen___|__extra__]
      ^                                               ^
      head                                            skb_end_offset
      
      In this representation, hlen + data + tlen is the size passed to alloc_skb.
      "extra" is the extra space made available in __alloc_skb because of
      rounding up by kmalloc. We can reorder the representation like so:
      
      [__hlen__|__data____________|__extra__|__tlen___]
      ^                                               ^
      head                                            skb_end_offset
      
      The maximum space available for ip headers and payload without
      fragmentation is min(mtu, data + extra). Therefore,
      reserved_tailroom
      = data + extra + tlen - min(mtu, data + extra)
      = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
      = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
      
      Compare the second line to the current expression:
      reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
      and we can see that hlen and tlen are not taken into account.
      
      The min() in the third line can be expanded into:
      if mtu < skb_tailroom - tlen:
      	reserved_tailroom = skb_tailroom - mtu
      else:
      	reserved_tailroom = tlen
      
      Depending on hlen, tlen, mtu and the number of multicast address records,
      the current code may output skbs that have less tailroom than
      dev->needed_tailroom or it may output more skbs than needed because not all
      space available is used.
      
      Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Feb 2016 · 1 commit
  5. 11 Feb 2016 · 4 commits
  6. 04 Dec 2015 · 1 commit
    • ipv4: igmp: Allow removing groups from a removed interface · 4eba7bb1
      Committed by Andrew Lunn
      When a multicast group is joined on a socket, a struct ip_mc_socklist
      is appended to the socket's mc_list, containing information about the
      joined group.
      
      If the interface is hot unplugged, this entry becomes stale. Prior to
      commit 52ad353a ("igmp: fix the problem when mc leave group") it
      was possible to remove the stale entry by performing an
      IP_DROP_MEMBERSHIP, passing either the old ifindex or IP address of
      the interface. However, this fix enforces that the interface must
      still exist. Thus, over time, the number of stale entries grows until
      sysctl_igmp_max_memberships is reached, and then it is no longer
      possible to join any more groups.
      
      The previous patch fixes an issue where an IP_DROP_MEMBERSHIP is
      performed without specifying the interface, either by ifindex or IP
      address. However, here we do supply one of these, so loosen the
      restriction on device existence to apply only when the interface has
      not been specified. This restores the ability to clean up the
      stale entries.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Fixes: 52ad353a ("igmp: fix the problem when mc leave group")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Nov 2015 · 1 commit
  8. 08 Oct 2015 · 2 commits
  9. 30 Sep 2015 · 1 commit
  10. 29 Aug 2015 · 1 commit
    • IGMP: Inhibit reports for local multicast groups · df2cf4a7
      Committed by Philip Downey
      The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is
      reserved for the use of routing protocols and other low-level topology
      discovery or maintenance protocols, such as gateway discovery and
      group membership reporting.  Multicast routers should not forward any
      multicast datagram with destination addresses in this range,
      regardless of its TTL.
      
      Currently, IGMP reports are generated for this reserved range of
      addresses even though a router will ignore this information since it
      has no purpose.  However, the presence of reserved group addresses in
      an IGMP membership report uses up network bandwidth and can also
      obscure addresses of interest when inspecting membership reports using
      packet inspection or debug messages.
      
      Although the RFCs for the various versions of IGMP (e.g. RFC 3376 for
      v3) do not specify that the reserved addresses be excluded from
      membership reports, doing so should do no harm. In particular,
      there should be no adverse effect in any IGMP snooping functionality
      since 224.0.0.x is specifically excluded as per RFC 4541 (IGMP and MLD
      Snooping Switches Considerations) section 2.1.2. Data Forwarding
      Rules:
      
          2) Packets with a destination IP (DIP) address in the 224.0.0.X
             range which are not IGMP must be forwarded on all ports.
      
      IGMP reports for local multicast groups can now be optionally
      inhibited by means of a system control variable (by setting the value
      to zero) e.g.:
          echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
      
      For backwards compatibility, the previous behaviour is retained by
      default on system boot, and it can be restored by setting the value back
      to non-zero, e.g.:
          echo 1 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
      Signed-off-by: Philip Downey <pdowney@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 14 Aug 2015 · 1 commit
  12. 05 May 2015 · 1 commit
    • net: Export IGMP/MLD message validation code · 9afd85c9
      Committed by Linus Lüssing
      With this patch, the IGMP and MLD message validation functions are moved
      from the bridge code to IPv4/IPv6 multicast files. Some small
      refactoring was done to enhance readability and to iron out some
      differences in behaviour between the IGMP and MLD parsing code (e.g. the
      skb-cloning of MLD messages is now only done if necessary, just like the
      IGMP part always did).
      
      Finally, these IGMP and MLD message validation functions are exported so
      that not only the bridge but also, later, batman-adv can use them.
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 04 Apr 2015 · 2 commits
  14. 26 Mar 2015 · 1 commit
  15. 19 Mar 2015 · 1 commit
  16. 28 Feb 2015 · 1 commit
    • multicast: Extend ip address command to enable multicast group join/leave on · 93a714d6
      Committed by Madhu Challa
      Joining a multicast group at the Ethernet level via the "ip maddr"
      command does not work if there is an Ethernet switch doing IGMP
      snooping, since the switch will not replicate multicast packets on
      ports that have not sent IGMP reports for the multicast addresses.

      Linux vxlan interfaces created via "ip link add vxlan" have the group
      option, which enables them to do the required join.

      By extending the ip address command with an "autojoin" option, we can
      get similar functionality for openvswitch vxlan interfaces, as well as
      for other tunneling mechanisms that need to receive multicast traffic.
      The kernel code is structured similarly to how the vxlan driver does a
      group join / leave.
      
      example:
      ip address add 224.1.1.10/24 dev eth5 autojoin
      ip address del 224.1.1.10/24 dev eth5
      Signed-off-by: Madhu Challa <challa@noironetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 21 Feb 2015 · 1 commit
  18. 17 Nov 2014 · 1 commit
    • ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Committed by Daniel Borkmann
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size; since commit 72e09ad1 ("ipv6: avoid high order allocations"),
      we have changed the limit so that the allocation is less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: Wei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 06 Nov 2014 · 2 commits
    • ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · 4c672e4b
      Committed by Daniel Borkmann
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size; since commit 72e09ad1 ("ipv6: avoid high order allocations"),
      we have changed the limit so that the allocation is less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: Wei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert SEQ_START_TOKEN/seq_printf to seq_puts · 1744bea1
      Committed by Joe Perches
      Using a single fixed string produces smaller code than using
      a format and many string arguments.
      
      Reduces overall code size a little.
      
      $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
         text	   data	    bss	    dec	    hex	filename
        34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
        34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
        30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
        30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
        11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
        11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 05 Nov 2014 · 1 commit
  21. 07 Oct 2014 · 1 commit
  22. 05 Sep 2014 · 1 commit
  23. 23 Aug 2014 · 1 commit
  24. 25 Jul 2014 · 1 commit
  25. 08 Jul 2014 · 1 commit
    • igmp: fix the problem when mc leave group · 52ad353a
      Committed by dingtianhong
      The problem was triggered by these steps:
      
      1) Create a socket, bind it, and then call setsockopt to add the mc group.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
         setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
      
      2) drop the mc group for this socket.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
         setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
      
      3) Then close the socket. I found the mc group was still in use by the dev:
      
         netstat -g
      
         Interface       RefCnt Group
         --------------- ------ ---------------------
         eth2		   1	  255.0.0.37
      
      Normally, even if IP_DROP_MEMBERSHIP returns an error, the mc group still
      needs to be released for the netdev when the socket is dropped, but this
      process was broken when the default route is NULL. The reason is:

      ip_mc_leave_group() chooses the in_dev by imr_interface.s_addr. If the
      input addr is NULL, the default route dev is chosen, the ifindex is taken
      from that dev, and then inet->mc_list is polled and -ENODEV is returned.
      But if the default route dev is NULL, the in_dev and ifindex are both
      NULL; when polling inet->mc_list, the mc group is released from the
      mc_list, but the dev does not decrement the refcnt for this mc group, so
      when the socket is dropped, the mc_list is empty and the dev still keeps
      this group.
      
      v1->v2: Following Hideaki's suggestion, we should align with IPv6 (RFC 3493)
      	and the BSDs, so I added a check for the in_dev before polling the mc_list.
      	This makes sure that when we remove the mc group, we decrement the refcnt
      	on the real dev that was using the mc address, so the problem cannot
      	happen again.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 03 Jun 2014 · 1 commit
    • inetpeer: get rid of ip_id_count · 73f156a6
      Committed by Eric Dumazet
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      Linux kernels used the inet_peer cache for this purpose, but this had a
      huge cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If a server deals with many TCP flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, and inserting it in
         the tree with the same initial ip_id_count (cf. secure_ip_id()).
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation does not have to be 'perfect'.

      The goal is to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkins hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file).

      secure_ip_id() and secure_ipv6_id() are no longer needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 09 May 2014 · 1 commit
  28. 15 Jan 2014 · 2 commits
  29. 27 Dec 2013 · 1 commit
  30. 01 Oct 2013 · 1 commit
  31. 20 Sep 2013 · 1 commit
    • ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Committed by Ansis Atteka
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have the local_df bit set, then all IP fragments that belonged
      to different ESP datagrams would have used the same identifier.
      If one of these IP fragments got lost or reordered, the peer
      could stitch together wrong IP fragments that did not belong
      to the same datagram. This would lead to packet loss or data
      corruption.
      Signed-off-by: Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 10 Aug 2013 · 2 commits
    • net: igmp: Allow user-space configuration of igmp unsolicited report interval · 2690048c
      Committed by William Manley
      Adds the new procfs knobs:
      
          /proc/sys/net/ipv4/conf/*/igmpv2_unsolicited_report_interval
          /proc/sys/net/ipv4/conf/*/igmpv3_unsolicited_report_interval
      
      These allow userspace configuration of the IGMP unsolicited report
      interval (see below) in milliseconds. The defaults are 10000 ms for IGMPv2
      and 1000 ms for IGMPv3, in accordance with RFC 2236 and RFC 3376.
      
      Background:
      
      If an IGMP join packet is lost, you will not receive data sent to the
      multicast group; so, if no data arrives from that multicast group within
      a period of time after the IGMP join, a second IGMP join will be sent.
      The delay between joins is the "IGMP Unsolicited Report Interval".
      
      Prior to this patch, this value was hard-coded in the kernel to 10s for
      IGMPv2 and 1s for IGMPv3. 10s is unsuitable for some use-cases, such as
      IPTV, as it can cause channel changes to be slow in the presence of packet
      loss.
      
      This patch allows the value to be overridden from userspace for both
      IGMPv2 and IGMPv3 so that it can be tuned according to the network.
      
      Tested with Wireshark and a simple program to join a (non-existent)
      multicast group. The distribution of timings for the second join differs
      based upon the setting of the procfs knobs.
      
      igmpvX_unsolicited_report_interval is intended to follow the pattern
      established by force_igmp_version, and while a procfs entry has been
      added, a corresponding sysctl knob has not, as it is my understanding
      that sysctl is deprecated [1].
      
      [1]: http://lwn.net/Articles/247243/
      Signed-off-by: William Manley <william.manley@youview.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3 · cab70040
      Committed by William Manley
      If an IGMP join packet is lost, you will not receive data sent to the
      multicast group; so, if no data arrives from that multicast group within
      a period of time after the IGMP join, a second IGMP join will be sent.
      The delay between joins is the "IGMP Unsolicited Report Interval".
      
      Previously this value was hard coded to be chosen randomly between 0-10s.
      This can be too long for some use-cases, such as IPTV, as it can cause
      channel change to be slow in the presence of packet loss.
      
      The 10s value comes from IGMPv2 (RFC 2236) and was reduced to 1s in
      IGMPv3 (RFC 3376). This patch makes the kernel use the 1s value from the
      later RFC if we are operating in IGMPv3 mode. IGMPv2 behaviour is
      unaffected.
      
      Tested with Wireshark and a simple program to join a (non-existent)
      multicast group. The distribution of timings for the second join differs
      based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version.
      Signed-off-by: William Manley <william.manley@youview.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>