1. 04 5月, 2015 2 次提交
    • T
      ipv6: Flow label state ranges · 82a584b7
      Tom Herbert 提交于
      This patch divides the IPv6 flow label space into two ranges:
      0-7ffff is reserved for flow label manager, 80000-fffff will be
      used for creating auto flow labels (per RFC6438). This only affects how
      labels are set on transmit, it does not affect receive. This range split
      can be disbaled by systcl.
      
      Background:
      
      IPv6 flow labels have been an unmitigated disappointment thus far
      in the lifetime of IPv6. Support in HW devices to use them for ECMP
      is lacking, and OSes don't turn them on by default. If we had these
      we could get much better hashing in IPv6 networks without resorting
      to DPI, possibly eliminating some of the motivations to to define new
      encaps in UDP just for getting ECMP.
      
      Unfortunately, the initial specfications of IPv6 did not clarify
      how they are to be used. There has always been a vague concept that
      these can be used for ECMP, flow hashing, etc. and we do now have a
      good standard how to this in RFC6438. The problem is that flow labels
      can be either stateful or stateless (as in RFC6438), and we are
      presented with the possibility that a stateless label may collide
      with a stateful one.  Attempts to split the flow label space were
      rejected in IETF. When we added support in Linux for RFC6438, we
      could not turn on flow labels by default due to this conflict.
      
      This patch splits the flow label space and should give us
      a path to enabling auto flow labels by default for all IPv6 packets.
      This is an API change so we need to consider compatibility with
      existing deployment. The stateful range is chosen to be the lower
      values in hopes that most uses would have chosen small numbers.
      
      Once we resolve the stateless/stateful issue, we can proceed to
      look at enabling RFC6438 flow labels by default (starting with
      scaled testing).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82a584b7
    • M
      ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flags · 7035870d
      Martin KaFai Lau 提交于
      In my earlier commit:
      653437d0 ("ipv6: Stop /128 route from disappearing after pmtu update"),
      there was a horrible typo.  Instead of checking RTF_LOCAL on
      rt->rt6i_flags, it was checked on rt->dst.flags.  This patch fixes
      it.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Hajime Tazaki <tazaki@sfc.wide.ad.jp>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7035870d
  2. 03 5月, 2015 3 次提交
  3. 02 5月, 2015 14 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6c3c1eb3
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Receive packet length needs to be adjust by 2 on RX to accomodate
          the two padding bytes in altera_tse driver.  From Vlastimil Setka.
      
       2) If rx frame is dropped due to out of memory in macb driver, we leave
          the receive ring descriptors in an undefined state.  From Punnaiah
          Choudary Kalluri
      
       3) Some netlink subsystems erroneously signal NLM_F_MULTI.  That is
          only for dumps.  Fix from Nicolas Dichtel.
      
       4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via
          the ipv4_mtu() helper.  From Herbert Xu.
      
       5) Fix null deref in bridge netfilter, and miscalculated lengths in
          jump/goto nf_tables verdicts.  From Florian Westphal.
      
       6) Unhash ping sockets properly.
      
       7) Software implementation of BPF divide did 64/32 rather than 64/64
          bit divide.  The JITs got it right.  Fix from Alexei Starovoitov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        ipv4: Missing sk_nulls_node_init() in ping_unhash().
        net: fec: Fix RGMII-ID mode
        net/mlx4_en: Schedule napi when RX buffers allocation fails
        netxen_nic: use spin_[un]lock_bh around tx_clean_lock
        net/mlx4_core: Fix unaligned accesses
        mlx4_en: Use correct loop cursor in error path.
        cxgb4: Fix MC1 memory offset calculation
        bnx2x: Delay during kdump load
        net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
        net: dsa: Fix scope of eeprom-length property
        net: macb: Fix race condition in driver when Rx frame is dropped
        hv_netvsc: Fix a bug in netvsc_start_xmit()
        altera_tse: Correct rx packet length
        mlx4: Fix tx ring affinity_mask creation
        tipc: fix problem with parallel link synchronization mechanism
        tipc: remove wrong use of NLM_F_MULTI
        bridge/nl: remove wrong use of NLM_F_MULTI
        bridge/mdb: remove wrong use of NLM_F_MULTI
        net: sched: act_connmark: don't zap skb->nfct
        trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
        ...
      6c3c1eb3
    • S
      virtio: fix typo in vring_need_event() doc comment · e412d3a3
      Stefan Hajnoczi 提交于
      Here the "other side" refers to the guest or host.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e412d3a3
    • R
      virtio: pass baton to Michael Tsirkin · feda5f93
      Rusty Russell 提交于
      With my job change kernel work will be "own time"; I'm keeping lguest
      and modules (and the virtio standards work), but virtio kernel has to
      go.
      
      This makes it clear that Michael is in charge.  He's good, but having
      me watch over his shoulder won't help.
      
      Good luck Michael!
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      feda5f93
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 6fa72720
      Linus Torvalds 提交于
      Pull Ceph RBD fix from Sage Weil.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        rbd: end I/O the entire obj_request on error
      6fa72720
    • D
      ipv4: Missing sk_nulls_node_init() in ping_unhash(). · a134f083
      David S. Miller 提交于
      If we don't do that, then the poison value is left in the ->pprev
      backlink.
      
      This can cause crashes if we do a disconnect, followed by a connect().
      Tested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NWen Xu <hotdog3645@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a134f083
    • S
      net: rocker: Use ether_addr_equal · 629161f6
      Simon Horman 提交于
      A small cleanup to make use of the ether_addr_equal helper.
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      629161f6
    • D
      Merge branch 'rt6_pmtu' · 36c82963
      David S. Miller 提交于
      Martin KaFai Lau says:
      
      ====================
      ipv6: Stop /128 route from disappearing after pmtu update
      
      The series is separated from another patch series,
      'ipv6: Only create RTF_CACHE route after encountering pmtu exception',
      which can be found here:
      http://thread.gmane.org/gmane.linux.network/359140
      
      This series focus on fixing the /128 route issues.  It is currently targeted
      for net-next due to the number of code churn but it is also applicable
      to net (should be without conflict).  The original reported problem can be
      found here:
      http://thread.gmane.org/gmane.linux.network/348138
      
      Patch 01 and 02 are to prepare the fib6 search to expect both the
      RTF_CACHE clone and its original route exist at the same fib6_node.
      
      Patch 03 fixes the /128 route disappearing bug.
      
      Patch 04 and 05 stop rt6_info from using the inet_peer's metrics to
      avoid the /128 routes (like the /128 clone and its original route)
      from stepping on each others' metrics.
      
      The second patch is by 'Steffen Klassert <steffen.klassert@secunet.com>'
      which I pulled off from netdev.  The third patch is also mostly by
      Steffen with one minor optimization.
      
      Many thanks to Hannes Frederic Sowa <hannes@stressinduktion.org> on
      reviewing the patches and giving advice.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36c82963
    • M
      ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer · afc4eef8
      Martin KaFai Lau 提交于
      _rt6i_peer is no longer needed after the last patch,
      'ipv6: Stop rt6_info from using inet_peer's metrics'.
      
      DST_METRICS_FORCE_OVERWRITE is added by
      commit e5fd387a ("ipv6: do not overwrite inetpeer metrics prematurely").
      Since inetpeer is no longer used for metrics, this bit is also not needed.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afc4eef8
    • M
      ipv6: Stop rt6_info from using inet_peer's metrics · 4b32b5ad
      Martin KaFai Lau 提交于
      inet_peer is indexed by the dst address alone.  However, the fib6 tree
      could have multiple routing entries (rt6_info) for the same dst. For
      example,
      1. A /128 dst via multiple gateways.
      2. A RTF_CACHE route cloned from a /128 route.
      
      In the above cases, all of them will share the same metrics and
      step on each other.
      
      This patch will steer away from inet_peer's metrics and use
      dst_cow_metrics_generic() for everything.
      
      Change Highlights:
      1. Remove rt6_cow_metrics() which currently acquires metrics from
         inet_peer for DST_HOST route (i.e. /128 route).
      2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a
         full size metrics just to override the RTAX_MTU.
      3. After (2), the RTF_CACHE route can also share the metrics with its
         dst.from route, by:
         dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true);
      4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route.  Instead,
         directly clone from rt->dst.
      
         [ Currently, cloning from another RTF_CACHE is only possible during
           rt6_do_redirect().  Also, the old clone is removed from the tree
           immediately after the new clone is added. ]
      
         In case of cloning from an older redirect RTF_CACHE, it should work as
         before.
      
         In case of cloning from an older pmtu RTF_CACHE, this patch will forget
         the pmtu and re-learn it (if there is any) from the redirected route.
      
      The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed
      in the next cleanup patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b32b5ad
    • M
      ipv6: Stop /128 route from disappearing after pmtu update · 653437d0
      Martin KaFai Lau 提交于
      This patch is mostly from Steffen Klassert <steffen.klassert@secunet.com>.
      I only removed the (rt6->rt6i_dst.plen == 128) check from
      ip6_rt_update_pmtu() because the (rt6->rt6i_flags & RTF_CACHE) test
      has already implied it.
      
      This patch:
      1. Create RTF_CACHE route for /128 non local route
      2. After (1), all routes that allow pmtu update should have a RTF_CACHE
         clone.  Hence, stop updating MTU for any non RTF_CACHE route.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      653437d0
    • S
      ipv6: Extend the route lookups to low priority metrics. · 9fbdcfaf
      Steffen Klassert 提交于
      We search only for routes with highest priority metric in
      find_rr_leaf(). However if one of these routes is marked
      as invalid, we may fail to find a route even if there is
      a appropriate route with lower priority. Then we loose
      connectivity until the garbage collector deletes the
      invalid route. This typically happens if a host route
      expires afer a pmtu event. Fix this by searching also
      for routes with a lower priority metric.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fbdcfaf
    • M
      ipv6: Consider RTF_CACHE when searching the fib6 tree · 1f56a01f
      Martin KaFai Lau 提交于
      It is a prep work for the later bug-fix patch which will stop /128 route
      from disappearing after pmtu update.
      
      The later bug-fix patch will allow a /128 route and its RTF_CACHE clone
      both exist at the same fib6_node.  To do this, we need to prepare the
      existing fib6 tree search to expect RTF_CACHE for /128 route.
      
      Note that the fn->leaf is sorted by rt6i_metric.  Hence,
      RTF_CACHE (if there is any) is always at the front.  This property
      leads to the following:
      
      1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which
         the caller is used to ask for deleting clone or non-clone.
         The rtm_to_fib6_config() should also check the RTM_F_CLONED and
         then set RTF_CACHE accordingly so that:
         - 'ip -6 r del...' will make ip6_route_del() to delete a route
           and all its clones. Note that its clones is flushed by fib6_del()
         - 'ip -6 r flush table cache' will make ip6_route_del() to
            only delete clone(s).
      
      2. Exclude RTF_CACHE from addrconf_get_prefix_route() which
         should not configure on a cloned route.
      
      3. No change is need for rt6_device_match() since it currently could
         return a RTF_CACHE clone route, so the later bug-fix patch will not
         affect it.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f56a01f
    • I
      rbd: end I/O the entire obj_request on error · 082a75da
      Ilya Dryomov 提交于
      When we end I/O struct request with error, we need to pass
      obj_request->length as @nr_bytes so that the entire obj_request worth
      of bytes is completed.  Otherwise block layer ends up confused and we
      trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      in rbd_img_obj_callback() due to more being true no matter what.  We
      already do it in most cases but we are missing some, in particular
      those where we don't even get a chance to submit any obj_requests, due
      to an early -ENOMEM for example.
      
      A number of obj_request->xferred assignments seem to be redundant but
      I haven't touched any of obj_request->xferred stuff to keep this small
      and isolated.
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+
      Reported-by: NShawn Edwards <lesser.evil@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      082a75da
    • E
      ipv4: speedup ip_idents_reserve() · 355b590c
      Eric Dumazet 提交于
      Under stress, ip_idents_reserve() is accessing a contended
      cache line twice, with non optimal MESI transactions.
      
      If we place timestamps in separate location, we reduce this
      pressure by ~50% and allow atomic_add_return() to issue
      a Request for Ownership.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      355b590c
  4. 01 5月, 2015 21 次提交