1. 21 9月, 2013 4 次提交
  2. 20 9月, 2013 1 次提交
    • A
      ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Ansis Atteka 提交于
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have local_df bit set, then all IP fragments that belonged
      to different ESP datagrams would have used the same identificator.
      If one of these IP fragments would get lost or reordered, then
      peer could possibly stitch together wrong IP fragments that did
      not belong to the same datagram. This would lead to a packet loss
      or data corruption.
      Signed-off-by: NAnsis Atteka <aatteka@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      703133de
  3. 13 9月, 2013 1 次提交
  4. 12 9月, 2013 1 次提交
    • M
      ipv6: don't call fib6_run_gc() until routing is ready · 2c861cc6
      Michal Kubeček 提交于
      When loading the ipv6 module, ndisc_init() is called before
      ip6_route_init(). As the former registers a handler calling
      fib6_run_gc(), this opens a window to run the garbage collector
      before necessary data structures are initialized. If a network
      device is initialized in this window, adding MAC address to it
      triggers a NETDEV_CHANGEADDR event, leading to a crash in
      fib6_clean_all().
      
      Take the event handler registration out of ndisc_init() into a
      separate function ndisc_late_init() and move it after
      ip6_route_init().
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c861cc6
  5. 06 9月, 2013 1 次提交
    • J
      vxlan: Notify drivers for listening UDP port changes · 53cf5275
      Joseph Gasparakis 提交于
      This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
      ndo_del_rx_vxlan_port().
      
      Drivers can get notifications through the above functions about changes
      of the UDP listening port of VXLAN. Also, when physical ports come up,
      now they can call vxlan_get_rx_port() in order to obtain the port number(s)
      of the existing VXLAN interface in case they already up before them.
      
      This information about the listening UDP port would be used for VXLAN
      related offloads.
      
      A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
      input and his suggestions on this patch set.
      
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJoseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53cf5275
  6. 05 9月, 2013 2 次提交
    • D
      net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170
      Daniel Borkmann 提交于
      Get rid of MLDV2_MRC and use our new macros for mantisse and
      exponent to calculate Maximum Response Delay out of the Maximum
      Response Code.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3f5b170
    • D
      net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c
      Daniel Borkmann 提交于
      i) RFC3810, 9.2. Query Interval [QI] says:
      
         The Query Interval variable denotes the interval between General
         Queries sent by the Querier. Default value: 125 seconds. [...]
      
      ii) RFC3810, 9.3. Query Response Interval [QRI] says:
      
        The Maximum Response Delay used to calculate the Maximum Response
        Code inserted into the periodic General Queries. Default value:
        10000 (10 seconds) [...] The number of seconds represented by the
        [Query Response Interval] must be less than the [Query Interval].
      
      iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:
      
        The Older Version Querier Present Timeout is the time-out for
        transitioning a host back to MLDv2 Host Compatibility Mode. When an
        MLDv1 query is received, MLDv2 hosts set their Older Version Querier
        Present Timer to [Older Version Querier Present Timeout].
      
        This value MUST be ([Robustness Variable] times (the [Query Interval]
        in the last Query received)) plus ([Query Response Interval]).
      
      Hence, on *default* the timeout results in:
      
        [RV] = 2, [QI] = 125sec, [QRI] = 10sec
        [OVQPT] = [RV] * [QI] + [QRI] = 260sec
      
      Having that said, we currently calculate [OVQPT] (here given as 'switchback'
      variable) as ...
      
        switchback = (idev->mc_qrv + 1) * max_delay
      
      RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
      section "9.14. Configuring timers", it is said:
      
        This section is meant to provide advice to network administrators on
        how to tune these settings to their network. Ambitious router
        implementations might tune these settings dynamically based upon
        changing characteristics of the network. [...]
      
      iv) RFC38010, 9.14.2. Query Interval:
      
        The overall level of periodic MLD traffic is inversely proportional
        to the Query Interval. A longer Query Interval results in a lower
        overall level of MLD traffic. The value of the Query Interval MUST
        be equal to or greater than the Maximum Response Delay used to
        calculate the Maximum Response Code inserted in General Query
        messages.
      
      I assume that was why switchback is calculated as is (3 * max_delay), although
      this setting seems to be meant for routers only to configure their [QI]
      interval for non-default intervals. So usage here like this is clearly wrong.
      
      Concluding, the current behaviour in IPv6's multicast code is not conform
      to the RFC as switch back is calculated wrongly. That is, it has a too small
      value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
      instead of ~260secs on default.
      
      Hence, introduce necessary helper functions and fix this up properly as it
      should be.
      
      Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
      Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
      who did initial testing.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: David Stevens <dlstevens@us.ibm.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89225d1c
  7. 04 9月, 2013 6 次提交
  8. 03 9月, 2013 1 次提交
  9. 01 9月, 2013 6 次提交
  10. 31 8月, 2013 1 次提交
    • S
      qdisc: allow setting default queuing discipline · 6da7c8fc
      stephen hemminger 提交于
      By default, the pfifo_fast queue discipline has been used by default
      for all devices. But we have better choices now.
      
      This patch allow setting the default queueing discipline with sysctl.
      This allows easy use of better queueing disciplines on all devices
      without having to use tc qdisc scripts. It is intended to allow
      an easy path for distributions to make fq_codel or sfq the default
      qdisc.
      
      This patch also makes pfifo_fast more of a first class qdisc, since
      it is now possible to manually override the default and explicitly
      use pfifo_fast. The behavior for systems who do not use the sysctl
      is unchanged, they still get pfifo_fast
      
      Also removes leftover random # in sysctl net core.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6da7c8fc
  11. 30 8月, 2013 2 次提交
    • E
      tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet 提交于
      After hearing many people over past years complaining against TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses an heuristic
      relying on upcoming ACKS instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
      
      v2: Neal Cardwell reported a suspect deferring of last two segments on
      initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
      into account tp->xmit_size_goal_segs
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95bd09eb
    • D
      net: sctp: reorder sctp_globals to reduce cacheline usage · 76bfd898
      Daniel Borkmann 提交于
      Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
      reordering elements, we can close gaps and simply achieve the following:
      
      Current situation:
        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 57, holes: 4, sum holes: 16 */
        /* padding: 7 */
        /* last cacheline: 16 bytes */
      
      Afterwards:
        /* size: 64, cachelines: 1, members: 10 */
        /* padding: 7 */
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76bfd898
  12. 29 8月, 2013 2 次提交
  13. 28 8月, 2013 4 次提交
  14. 26 8月, 2013 2 次提交
    • W
      fs/9p: avoid accessing utsname after namespace has been torn down · 50192abe
      Will Deacon 提交于
      During trinity fuzzing in a kvmtool guest, I stumbled across the
      following:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000004
      PC is at v9fs_file_do_lock+0xc8/0x1a0
      LR is at v9fs_file_do_lock+0x48/0x1a0
      [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124)
      [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4)
      [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8)
      [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8)
      [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0)
      [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18)
      
      I believe this is due to an attempt to access utsname()->nodename, after
      exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
      as NULL and causing the above dereference.
      
      A similar issue was fixed for lockd in 9a1b6bf8 ("LOCKD: Don't call
      utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
      something similar for 9pfs.
      
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      50192abe
    • H
      xfrm: revert ipv4 mtu determination to dst_mtu · 5a25cf1e
      Hannes Frederic Sowa 提交于
      In commit 0ea9d5e3 ("xfrm: introduce
      helper for safe determination of mtu") I switched the determination of
      ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in
      case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is
      never correct for ipv4 ipsec.
      
      This patch partly reverts 0ea9d5e3
      ("xfrm: introduce helper for safe determination of mtu").
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      5a25cf1e
  15. 24 8月, 2013 1 次提交
  16. 23 8月, 2013 2 次提交
  17. 21 8月, 2013 3 次提交