1. 16 6月, 2017 1 次提交
    • J
      networking: convert many more places to skb_put_zero() · b080db58
      Johannes Berg 提交于
      There were many places that my previous spatch didn't find,
      as pointed out by yuan linyu in various patches.
      
      The following spatch found many more and also removes the
      now unnecessary casts:
      
          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )
      
          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);
      
      Apply it to the tree (with one manual fixup to keep the
      comment in vxlan.c, which spatch removed.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b080db58
  2. 29 3月, 2017 1 次提交
  3. 10 2月, 2017 1 次提交
  4. 17 1月, 2017 1 次提交
    • H
      mld: do not remove mld souce list info when set link down · 1666d49e
      Hangbin Liu 提交于
      This is an IPv6 version of commit 24803f38 ("igmp: do not remove igmp
      souce list..."). In mld_del_delrec(), we will restore back all source filter
      info instead of flush them.
      
      Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
      we should not remove source list info when set link down. Remove
      igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
      ipv6_mc_down().
      
      Also clear all source info after igmp6_group_dropped() instead of in it
      because ipv6_mc_down() will call igmp6_group_dropped().
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1666d49e
  5. 21 10月, 2016 1 次提交
  6. 09 8月, 2016 1 次提交
  7. 04 3月, 2016 1 次提交
    • B
      mld, igmp: Fix reserved tailroom calculation · 1837b2e2
      Benjamin Poirier 提交于
      The current reserved_tailroom calculation fails to take hlen and tlen into
      account.
      
      skb:
      [__hlen__|__data____________|__tlen___|__extra__]
      ^                                               ^
      head                                            skb_end_offset
      
      In this representation, hlen + data + tlen is the size passed to alloc_skb.
      "extra" is the extra space made available in __alloc_skb because of
      rounding up by kmalloc. We can reorder the representation like so:
      
      [__hlen__|__data____________|__extra__|__tlen___]
      ^                                               ^
      head                                            skb_end_offset
      
      The maximum space available for ip headers and payload without
      fragmentation is min(mtu, data + extra). Therefore,
      reserved_tailroom
      = data + extra + tlen - min(mtu, data + extra)
      = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
      = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
      
      Compare the second line to the current expression:
      reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
      and we can see that hlen and tlen are not taken into account.
      
      The min() in the third line can be expanded into:
      if mtu < skb_tailroom - tlen:
      	reserved_tailroom = skb_tailroom - mtu
      else:
      	reserved_tailroom = tlen
      
      Depending on hlen, tlen, mtu and the number of multicast address records,
      the current code may output skbs that have less tailroom than
      dev->needed_tailroom or it may output more skbs than needed because not all
      space available is used.
      
      Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
      Signed-off-by: NBenjamin Poirier <bpoirier@suse.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1837b2e2
  8. 17 11月, 2015 1 次提交
  9. 08 10月, 2015 1 次提交
  10. 18 9月, 2015 3 次提交
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • E
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman 提交于
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29a26a56
    • E
      net: Merge dst_output and dst_output_sk · 5a70649e
      Eric W. Biederman 提交于
      Add a sock paramter to dst_output making dst_output_sk superfluous.
      Add a skb->sk parameter to all of the callers of dst_output
      Have the callers of dst_output_sk call dst_output.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a70649e
  11. 08 4月, 2015 1 次提交
    • D
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller 提交于
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7026b1dd
  12. 01 4月, 2015 2 次提交
  13. 19 3月, 2015 1 次提交
  14. 28 2月, 2015 2 次提交
  15. 17 11月, 2014 1 次提交
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feb91a02
  16. 06 11月, 2014 2 次提交
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · 4c672e4b
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c672e4b
    • J
      net: Convert SEQ_START_TOKEN/seq_printf to seq_puts · 1744bea1
      Joe Perches 提交于
      Using a single fixed string is smaller code size than using
      a format and many string arguments.
      
      Reduces overall code size a little.
      
      $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
         text	   data	    bss	    dec	    hex	filename
        34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
        34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
        30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
        30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
        11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
        11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1744bea1
  17. 23 9月, 2014 1 次提交
    • D
      ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback · 35f7aa53
      Daniel Borkmann 提交于
      RFC2710 (MLDv1), section 3.7. says:
      
        The length of a received MLD message is computed by taking the
        IPv6 Payload Length value and subtracting the length of any IPv6
        extension headers present between the IPv6 header and the MLD
        message. If that length is greater than 24 octets, that indicates
        that there are other fields present *beyond* the fields described
        above, perhaps belonging to a *future backwards-compatible* version
        of MLD. An implementation of the version of MLD specified in this
        document *MUST NOT* send an MLD message longer than 24 octets and
        MUST ignore anything past the first 24 octets of a received MLD
        message.
      
      RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
      presence of MLDv1 routers:
      
        In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
        operate in version 1 compatibility mode. [...] When Host
        Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
        on that interface. When Host Compatibility Mode is MLDv1, a host
        acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
        on that interface. [...]
      
      While section 8.3.1. specifies *router* behaviour regarding presence
      of MLDv1 routers:
      
        MLDv2 routers may be placed on a network where there is at least
        one MLDv1 router. The following requirements apply:
      
        If an MLDv1 router is present on the link, the Querier MUST use
        the *lowest* version of MLD present on the network. This must be
        administratively assured. Routers that desire to be compatible
        with MLDv1 MUST have a configuration option to act in MLDv1 mode;
        if an MLDv1 router is present on the link, the system administrator
        must explicitly configure all MLDv2 routers to act in MLDv1 mode.
        When in MLDv1 mode, the Querier MUST send periodic General Queries
        truncated at the Multicast Address field (i.e., 24 bytes long),
        and SHOULD also warn about receiving an MLDv2 Query (such warnings
        must be rate-limited). The Querier MUST also fill in the Maximum
        Response Delay in the Maximum Response Code field, i.e., the
        exponential algorithm described in section 5.1.3. is not used. [...]
      
      That means that we should not get queries from different versions of
      MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
      and MRC == MRD (both fields are overlapping within the 24 octet range).
      
      Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
      address *listeners*:
      
        MLDv2 routers may be placed on a network where there are hosts
        that have not yet been upgraded to MLDv2. In order to be compatible
        with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
        mode. MLDv2 routers keep a compatibility mode per multicast address
        record. The compatibility mode of a multicast address is determined
        from the Multicast Address Compatibility Mode variable, which can be
        in one of the two following states: MLDv1 or MLDv2.
      
        The Multicast Address Compatibility Mode of a multicast address
        record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
        *received* for that multicast address. At the same time, the Older
        Version Host Present timer for the multicast address is set to Older
        Version Host Present Timeout seconds. The timer is re-set whenever a
        new MLDv1 Report is received for that multicast address. If the Older
        Version Host Present timer expires, the router switches back to
        Multicast Address Compatibility Mode of MLDv2 for that multicast
        address. [...]
      
      That means, what can happen is the following scenario, that hosts can
      act in MLDv1 compatibility mode when they previously have received an
      MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same
      time, an MLDv2 router could start up and transmits MLDv2 startup query
      messages while being unaware of the current operational mode.
      
      Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
      listener report, so that the router according to RFC3810, section 8.3.2.
      would receive that and internally switch to MLDv1 compatibility as well.
      
      Right now, I believe since the initial implementation of MLDv2, Linux
      hosts would just silently drop such MLDv2 queries instead of replying
      with an MLDv1 listener report, which would prevent a MLDv2 router going
      into fallback mode (until it receives other MLDv1 queries).
      
      Since the mapping of MRC to MRD in exactly such cases can make use of
      the exponential algorithm from 5.1.3, we cannot [strictly speaking] be
      aware in MLDv1 of the encoding in MRC, it seems also not mentioned by
      the RFC. Since encodings are the same up to 32767, assume in such a
      situation this value as a hard upper limit we would clamp. We have asked
      one of the RFC authors on that regard, and he mentioned that there seem
      not to be any implementations that make use of that exponential algorithm
      on startup messages. In any case, this patch fixes this MLD
      interoperability issue.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f7aa53
  18. 14 9月, 2014 4 次提交
  19. 10 9月, 2014 1 次提交
  20. 06 9月, 2014 1 次提交
  21. 05 9月, 2014 1 次提交
  22. 25 8月, 2014 1 次提交
    • I
      ipv6: White-space cleansing : Line Layouts · 67ba4152
      Ian Morris 提交于
      This patch makes no changes to the logic of the code but simply addresses
      coding style issues as detected by checkpatch.
      
      Both objdump and diff -w show no differences.
      
      A number of items are addressed in this patch:
      * Multiple spaces converted to tabs
      * Spaces before tabs removed.
      * Spaces in pointer typing cleansed (char *)foo etc.
      * Remove space after sizeof
      * Ensure spacing around comparators such as if statements.
      Signed-off-by: NIan Morris <ipm@chirality.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67ba4152
  23. 27 6月, 2014 1 次提交
  24. 01 4月, 2014 1 次提交
  25. 18 1月, 2014 1 次提交
    • F
      ipv6: send Change Status Report after DAD is completed · 6a7cc418
      Flavio Leitner 提交于
      The RFC 3810 defines two type of messages for multicast
      listeners. The "Current State Report" message, as the name
      implies, refreshes the *current* state to the querier.
      Since the querier sends Query messages periodically, there
      is no need to retransmit the report.
      
      On the other hand, any change should be reported immediately
      using "State Change Report" messages. Since it's an event
      triggered by a change and that it can be affected by packet
      loss, the rfc states it should be retransmitted [RobVar] times
      to make sure routers will receive timely.
      
      Currently, we are sending "Current State Reports" after
      DAD is completed.  Before that, we send messages using
      unspecified address (::) which should be silently discarded
      by routers.
      
      This patch changes to send "State Change Report" messages
      after DAD is completed fixing the behavior to be RFC compliant
      and also to pass TAHI IPv6 testsuite.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a7cc418
  26. 15 1月, 2014 1 次提交
  27. 01 10月, 2013 1 次提交
  28. 05 9月, 2013 5 次提交