1. 17 Nov, 2015 2 commits
    • net: switchdev: fix return code of fdb_dump stub · 24cb7055
      Dragos Tatulea committed
      rtnl_fdb_dump always expects an index to be returned by the ndo_fdb_dump op,
      but when CONFIG_NET_SWITCHDEV is off, the stub returns an error.
      
      Fix that by returning the given unmodified idx.
      
      A similar fix was 0890cf6c ("switchdev: fix return value of
      switchdev_port_fdb_dump in case of error") but for the CONFIG_NET_SWITCHDEV=y
      case.
      
      Fixes: 45d4122c ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
      Signed-off-by: Dragos Tatulea <dragos@endocode.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
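
The contract described in the switchdev commit above can be illustrated in plain userspace C. This is a minimal sketch, not the kernel code: `fdb_dump_fn`, `fdb_dump_stub_buggy`, `fdb_dump_stub_fixed`, and `walk_devices` are hypothetical names, and `-EOPNOTSUPP` is only an assumed example of "an error". The one point taken from the commit message is that the caller expects an index back, so a stub that dumps nothing should return the given idx unmodified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature: a per-device dump op receives the running
 * index and returns the index the caller should continue from. */
typedef int (*fdb_dump_fn)(int idx);

/* Assumed pre-fix stub behaviour: an error code is returned where the
 * caller expects an index. */
static int fdb_dump_stub_buggy(int idx)
{
    (void)idx;
    return -EOPNOTSUPP;
}

/* Post-fix stub behaviour: nothing was dumped, so idx is unchanged. */
static int fdb_dump_stub_fixed(int idx)
{
    return idx;
}

/* Simplified caller: threads idx through the op of every device. */
static int walk_devices(fdb_dump_fn op, int ndevs, int idx)
{
    assert(op != NULL);
    for (int i = 0; i < ndevs; i++)
        idx = op(idx);
    return idx;
}
```

With the fixed stub the index survives the walk untouched, while the buggy stub replaces it with a negative value that the caller would then treat as a position.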
    • ip_tunnel: disable preemption when updating per-cpu tstats · b4fe85f9
      Jason A. Donenfeld committed
      Drivers like vxlan use the recently introduced
      udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
      makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
      packet, updates the struct stats using the usual
      u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
      udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
      tstats, so drivers like vxlan, immediately after, call
      iptunnel_xmit_stats, which does the same thing - calls
      u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).
      
      While vxlan is probably fine (I don't know?), calling a similar function
      from, say, an unbound workqueue, on a fully preemptable kernel causes
      real issues:
      
      [  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
      [  188.435579] caller is debug_smp_processor_id+0x17/0x20
      [  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
      [  188.435607] Call Trace:
      [  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
      [  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
      [  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20
      
      The solution would be to protect the whole
      this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
      disabling preemption and then reenabling it.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
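
The fix proposed in the ip_tunnel commit above follows a common pattern: disable preemption, look up the per-CPU slot, update it, then re-enable preemption. The sketch below models that pattern in userspace C under loud assumptions: `preempt_disable()` and `preempt_enable()` are stand-ins implemented as a nesting counter, `this_cpu_tstats()` is a hypothetical substitute for `this_cpu_ptr(dev->tstats)`, and `current_cpu` is a pretend scheduler state. None of this is the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins (hypothetical): preemption is modelled by a
 * simple nesting counter instead of real scheduler state. */
static int preempt_count;

static void preempt_disable(void)
{
    preempt_count++;
}

static void preempt_enable(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

struct tstats {
    uint64_t tx_packets;
    uint64_t tx_bytes;
};

#define NR_CPUS 4
static struct tstats per_cpu_tstats[NR_CPUS];
static int current_cpu; /* pretend "CPU we are running on" */

/* Models this_cpu_ptr(): the result is only stable while preemption is
 * off, because otherwise the task could migrate after the lookup. */
static struct tstats *this_cpu_tstats(void)
{
    assert(preempt_count > 0);
    return &per_cpu_tstats[current_cpu];
}

/* The pattern from the commit message: the whole lookup-and-update
 * block runs with preemption disabled. */
static void tunnel_xmit_stats(uint64_t len)
{
    preempt_disable();
    struct tstats *s = this_cpu_tstats();
    /* In the kernel, u64_stats_update_begin()/end() bracket these. */
    s->tx_packets++;
    s->tx_bytes += len;
    preempt_enable();
}
```

The assertion inside `this_cpu_tstats()` plays the role of the `smp_processor_id()` debug check in the trace: calling it outside a preempt-disabled section would abort.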
  2. 16 Nov, 2015 2 commits
  3. 11 Nov, 2015 1 commit
  4. 09 Nov, 2015 1 commit
  5. 07 Nov, 2015 1 commit
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Mel Gorman committed
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT, leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
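
The flag semantics described in the page_alloc commit above can be sketched as plain bitmask logic. This is a userspace illustration under stated assumptions: the bit values are made up for the example and are not the kernel's, the compositions of `GFP_ATOMIC`, `GFP_NOWAIT`, and `GFP_KERNEL` are assumed from the description, and `legacy_wait_check()` and `gfp_no_direct_reclaim()` are hypothetical helpers. Only `gfpflags_allow_blocking()` is named in the commit message, and its body here is an assumed equivalent.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the kernel assigns its own. */
#define __GFP_HIGH            0x01u
#define __GFP_ATOMIC          0x02u
#define __GFP_DIRECT_RECLAIM  0x04u
#define __GFP_KSWAPD_RECLAIM  0x08u
#define __GFP_IO              0x10u
#define __GFP_FS              0x20u

/* Redefined meaning: may enter direct reclaim and may wake kswapd. */
#define __GFP_WAIT (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Assumed compositions, following the description above. */
#define GFP_ATOMIC (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* A caller may block only if it is willing to enter direct reclaim. */
static bool gfpflags_allow_blocking(gfp_t flags)
{
    return (flags & __GFP_DIRECT_RECLAIM) != 0;
}

/* Hypothetical historical check: testing any __GFP_WAIT bit. */
static bool legacy_wait_check(gfp_t flags)
{
    return (flags & __GFP_WAIT) != 0;
}

/* Mempool-style caller (hypothetical helper): refuse to sleep but
 * keep __GFP_KSWAPD_RECLAIM so kswapd is still woken. */
static gfp_t gfp_no_direct_reclaim(gfp_t flags)
{
    gfp_t out = flags & ~__GFP_DIRECT_RECLAIM;
    assert(!gfpflags_allow_blocking(out));
    return out;
}
```

The false positive mentioned in the commit message shows up directly: `GFP_ATOMIC` carries `__GFP_KSWAPD_RECLAIM`, so a test against the two-bit `__GFP_WAIT` mask reports "can wait" even though the caller cannot sleep, whereas the helper answers correctly.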
  6. 05 Nov, 2015 2 commits
  7. 03 Nov, 2015 6 commits
  8. 02 Nov, 2015 3 commits
  9. 28 Oct, 2015 1 commit
  10. 27 Oct, 2015 6 commits
  11. 26 Oct, 2015 9 commits
  12. 23 Oct, 2015 3 commits
    • mpls: multipath route support · f8efb73c
      Roopa Prabhu committed
      This patch adds support for MPLS multipath routes.
      
      Includes the following changes to support multipath:
      - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
      
      - 'struct mpls_nh' represents an mpls nexthop label forwarding entry
      
      - moves mpls route and nexthop structures into internal.h
      
      - An mpls_route can point to multiple mpls_nh structs
      
      - the nexthops are maintained as an array (similar to ipv4 fib)
      
      - In the process of restructuring, this patch also consistently changes
        all labels to u8
      
      - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
      multipath routes similar to ipv4/v6 fib
      
      - In this patch, the multipath route nexthop selection algorithm
      simply returns the first nexthop. It is replaced by a
      hash based algorithm from Robert Shearman in the next patch
      
      - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
      Although mpls_route_update was implemented to update based on dev, it was
      never used that way. The dev handling also gets tricky with multiple
      nexthops, because it cannot match against any single nexthop's dev. So, this
      patch removes the unused 'dev' handling in mpls_route_update.
      
      - dead route/path handling will be implemented in a subsequent patch
      
      Example:
      
      $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
                      nexthop as 700 via inet 10.1.1.6 dev swp2 \
                      nexthop as 800 via inet 40.1.1.2 dev swp3
      
      $ip  -f mpls route show
      100
              nexthop as to 200 via inet 10.1.1.2  dev swp1
              nexthop as to 700 via inet 10.1.1.6  dev swp2
              nexthop as to 800 via inet 40.1.1.2  dev swp3
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
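
The route and nexthop split described in the mpls commit above can be pictured with a small data-structure sketch. Everything here is a simplified assumption for illustration: the field names, the fixed-size arrays, `MAX_LABELS`, `MAX_NH`, and `mpls_select_nh()` are hypothetical and do not reproduce the definitions in internal.h. The selection function only mirrors the stated placeholder behaviour of returning the first nexthop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LABELS 2 /* illustrative limit */
#define MAX_NH     3 /* illustrative limit */

/* One nexthop label forwarding entry (fields are hypothetical). */
struct mpls_nh {
    uint32_t labels[MAX_LABELS]; /* outgoing label stack */
    uint8_t nlabels;             /* number of labels in use */
    const char *dev;             /* egress device name */
};

/* A route owns an array of nexthops, similar in spirit to the
 * IPv4 FIB layout mentioned in the commit message. */
struct mpls_route {
    uint8_t nhn;                 /* number of nexthops in use */
    struct mpls_nh nh[MAX_NH];
};

/* Placeholder selection as described above: always the first
 * nexthop, or NULL when the route has none. */
static const struct mpls_nh *mpls_select_nh(const struct mpls_route *rt)
{
    assert(rt != NULL);
    return rt->nhn ? &rt->nh[0] : NULL;
}
```

For the example route in the commit message (label 100 with nexthops 200, 700, and 800), this placeholder would always forward via the 200/swp1 entry until a hash-based selection replaces it.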
    • tcp/dccp: fix hashdance race for passive sessions · 5e0724d0
      Eric Dumazet committed
      Multiple cpus can process duplicates of incoming ACK messages
      matching a SYN_RECV request socket. This is a rare event under
      normal operations, but definitely can happen.
      
      Only one must win the race, otherwise corruption would occur.
      
      To fix this without adding new atomic ops, we use logic in
      inet_ehash_nolisten() to detect the request was present in the same
      ehash bucket where we try to insert the new child.
      
      If request socket was not found, we have to undo the child creation.
      
      This actually removes a spin_lock()/spin_unlock() pair in
      reqsk_queue_unlink() for the fast path.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
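
The "only one must win" rule from the tcp/dccp commit above can be sketched as a replace-if-present operation on a hash bucket. This is a single-threaded userspace model with hypothetical names (`sock_node`, `ehash_bucket`, `ehash_replace_req`); it is not the logic of `inet_ehash_nolisten()` itself, and the bucket lock that would serialise the two steps in the kernel is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_node {
    struct sock_node *next;
    int id;
};

struct ehash_bucket {
    struct sock_node *head; /* the kernel guards this with a bucket lock */
};

static void bucket_insert(struct ehash_bucket *b, struct sock_node *sk)
{
    sk->next = b->head;
    b->head = sk;
}

/* Unlink sk if it is present and report whether it was found. */
static bool bucket_remove(struct ehash_bucket *b, struct sock_node *sk)
{
    for (struct sock_node **pp = &b->head; *pp; pp = &(*pp)->next) {
        if (*pp == sk) {
            *pp = sk->next;
            sk->next = NULL;
            return true;
        }
    }
    return false;
}

/* Insert the child only if the request socket is still in the bucket.
 * A false return means another CPU already consumed the request, so
 * the caller has to undo the child creation. */
static bool ehash_replace_req(struct ehash_bucket *b,
                              struct sock_node *req,
                              struct sock_node *child)
{
    assert(b != NULL && req != NULL && child != NULL);
    /* lock bucket here in a concurrent implementation */
    if (!bucket_remove(b, req))
        return false;
    bucket_insert(b, child);
    return true;
}
```

Because finding the request and inserting the child happen under the same bucket lock in a real implementation, no extra atomic operation is needed to pick the winner; the loser simply observes that the request is gone.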
    • openvswitch: Fix egress tunnel info. · fc4099f1
      Pravin B Shelar committed
      While transitioning to netdev based vport we broke the OVS
      feature which allows users to retrieve tunnel packet egress
      information for lwtunnel devices.  The following patch fixes it
      by introducing an ndo operation to get the tunnel egress info.
      The same ndo operation can be used for lwtunnel devices and compat
      ovs-tnl-vport devices. So after adding such a device operation
      we can remove the similar operation from ovs-vport.
      
      Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device").
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 22 Oct, 2015 3 commits