1. 06 1月, 2015 3 次提交
  2. 23 12月, 2014 1 次提交
  3. 11 12月, 2014 1 次提交
  4. 10 12月, 2014 5 次提交
  5. 09 12月, 2014 2 次提交
    • J
      udp: Neaten and reduce size of compute_score functions · 60c04aec
      Joe Perches 提交于
      The compute_score functions are a bit difficult to read.
      
      Neaten them a bit to reduce object sizes and make them a
      bit more intelligible.
      
      Return early to avoid indentation and avoid unnecessary
      initializations.
      
      (allyesconfig, but w/ -O2 and no profiling)
      
      $ size net/ipv[46]/udp.o.*
         text    data     bss     dec     hex filename
        28680    1184      25   29889    74c1 net/ipv4/udp.o.new
        28756    1184      25   29965    750d net/ipv4/udp.o.old
        17600    1010       2   18612    48b4 net/ipv6/udp.o.new
        17632    1010       2   18644    48d4 net/ipv6/udp.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60c04aec
    • W
      net-timestamp: allow reading recv cmsg on errqueue with origin tstamp · 829ae9d6
      Willem de Bruijn 提交于
      Allow reading of timestamps and cmsg at the same time on all relevant
      socket families. One use is to correlate timestamps with egress
      device, by asking for cmsg IP_PKTINFO.
      
      on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
      avoid changing legacy expectations, only do so if the caller sets a
      new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
      
      on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
      returned for all origins. only change is to set ifindex, which is
      not initialized for all error origins.
      
      In both cases, only generate the pktinfo message if an ifindex is
      known. This is not the case for ACK timestamps.
      
      The difference between the protocol families is probably a historical
      accident as a result of the different conditions for generating cmsg
      in the relevant ip(v6)_recv_error function:
      
      ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
      ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
      
      At one time, this was the same test bar for the ICMP/ICMP6
      distinction. This is no longer true.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      
      ----
      
      Changes
        v1 -> v2
          large rewrite
          - integrate with existing pktinfo cmsg generation code
          - on ipv4: only send with new flag, to maintain legacy behavior
          - on ipv6: send at most a single pktinfo cmsg
          - on ipv6: initialize fields if not yet initialized
      
      The recv cmsg interfaces are also relevant to the discussion of
      whether looping packet headers is problematic. For v6, cmsgs that
      identify many headers are already returned. This patch expands
      that to v4. If it sounds reasonable, I will follow with patches
      
      1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
         (http://patchwork.ozlabs.org/patch/366967/)
      2. sysctl to conditionally drop all timestamps that have payload or
         cmsg from users without CAP_NET_RAW.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      829ae9d6
  6. 29 11月, 2014 1 次提交
  7. 27 11月, 2014 2 次提交
  8. 26 11月, 2014 2 次提交
  9. 25 11月, 2014 1 次提交
  10. 24 11月, 2014 4 次提交
    • A
      new helper: skb_copy_and_csum_datagram_msg() · 227158db
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      227158db
    • L
      ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic · 20ea60ca
      lucien 提交于
      Now the vti_link_ops do not point the .dellink, for fb tunnel device
      (ip_vti0), the net_device will be removed as the default .dellink is
      unregister_netdevice_queue,but the tunnel still in the tunnel list,
      then if we add a new vti tunnel, in ip_tunnel_find():
      
              hlist_for_each_entry_rcu(t, head, hash_node) {
                      if (local == t->parms.iph.saddr &&
                          remote == t->parms.iph.daddr &&
                          link == t->parms.link &&
      ==>                 type == t->dev->type &&
                          ip_tunnel_key_match(&t->parms, flags, key))
                              break;
              }
      
      the panic will happen, cause dev of ip_tunnel *t is null:
      [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
      [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
      [ 3835.073008] Oops: 0000 [#1] SMP
      .....
      [ 3835.073008] Stack:
      [ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
      [ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
      [ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
      [ 3835.073008] Call Trace:
      [ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
      [ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
      [ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
      [ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
      [ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
      [ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
      [ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
      [ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
      [ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
      [ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
      [ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
      ....
      
      modprobe ip_vti
      ip link del ip_vti0 type vti
      ip link add ip_vti0 type vti
      rmmod ip_vti
      
      do that one or more times, kernel will panic.
      
      fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
      which we skip the unregister of fb tunnel device. do the same on ip6_vti.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20ea60ca
    • I
      ipv6: coding style improvements (remove assignment in if statements) · e5d08d71
      Ian Morris 提交于
      This change has no functional impact and simply addresses some coding
      style issues detected by checkpatch. Specifically this change
      adjusts "if" statements which also include the assignment of a
      variable.
      
      No changes to the resultant object files result as determined by objdiff.
      Signed-off-by: NIan Morris <ipm@chirality.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5d08d71
    • A
      ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid · b6fef4c6
      Alexander Duyck 提交于
      This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
      the IPv6 GSO offloads.  Without this change VXLAN tunnels running over IPv6
      do not currently handle IPv4 TCP TSO requests correctly and end up handing
      the non-segmented frame off to the device.
      
      Below is the before and after for a simple netperf TCP_STREAM test between
      two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
      a 1Gb/s network adapter.
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.29       0.88      Before
       87380  16384  16384    10.03     895.69      After
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6fef4c6
  11. 20 11月, 2014 2 次提交
  12. 19 11月, 2014 1 次提交
  13. 17 11月, 2014 1 次提交
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feb91a02
  14. 13 11月, 2014 1 次提交
    • F
      netfilter: fix various sparse warnings · 56768644
      Florian Westphal 提交于
      net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
        no; add include
      net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
        yes
      net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
        no; add include
      net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
        yes
      net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
        no; add include
      net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
      net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
        add __force, 3 is what we want.
      net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
        yes
      net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
        no; add include
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      56768644
  15. 12 11月, 2014 4 次提交
    • W
      neigh: remove dynamic neigh table registration support · d7480fd3
      WANG Cong 提交于
      Currently there are only three neigh tables in the whole kernel:
      arp table, ndisc table and decnet neigh table. What's more,
      we don't support registering multiple tables per family.
      Therefore we can just make these tables statically built-in.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7480fd3
    • J
      net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Joe Perches 提交于
      Use the more common dynamic_debug capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were
      emitted at KERN_INFO that are not not enabled at all unless
      DEBUG is defined or dynamic_debug is enabled.  Even so,
      these messages are now _not_ emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it has no function.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba7a46f1
    • E
      net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet 提交于
      Alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of million of sockets into worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering on socket structure which cpu delivered last packet
      is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing around
      processes, applications can use :
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, as all networking stack should run
      on the appropriate cpu, without need to send IPI (RPS/RFS).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c8c56e1
    • E
      tcp: move sk_mark_napi_id() at the right place · 3d97379a
      Eric Dumazet 提交于
      sk_mark_napi_id() is used to record for a flow napi id of incoming
      packets for busypoll sake.
      We should do this only on established flows, not on listeners.
      
      This was 'working' by virtue of the socket cloning, but doing
      this on SYN packets in unecessary cache line dirtying.
      
      Even if we move sk_napi_id in the same cache line than sk_lock,
      we are working to make SYN processing lockless, so it is desirable
      to set sk_napi_id only for established flows.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d97379a
  16. 11 11月, 2014 1 次提交
  17. 08 11月, 2014 1 次提交
  18. 07 11月, 2014 4 次提交
  19. 06 11月, 2014 3 次提交
    • P
      net: Remove MPLS GSO feature. · 59b93b41
      Pravin B Shelar 提交于
      Device can export MPLS GSO support in dev->mpls_features same way
      it export vlan features in dev->vlan_features. So it is safe to
      remove NETIF_F_GSO_MPLS redundant flag.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      59b93b41
    • T
      fou: Fix typo in returning flags in netlink · e1b2cb65
      Tom Herbert 提交于
      When filling netlink info, dport is being returned as flags. Fix
      instances to return correct value.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1b2cb65
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · 4c672e4b
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c672e4b