1. 24 Jan, 2017 1 commit
    • net: mpls: Fix multipath selection for LSR use case · 9f427a0e
      David Ahern committed
      MPLS multipath for LSR is broken -- always selecting the first nexthop
      in the one label case. For example:
      
          $ ip -f mpls ro ls
          100
                  nexthop as to 200 via inet 172.16.2.2  dev virt12
                  nexthop as to 300 via inet 172.16.3.2  dev virt13
          101
                  nexthop as to 201 via inet6 2000:2::2  dev virt12
                  nexthop as to 301 via inet6 2000:3::2  dev virt13
      
      In this example incoming packets have a single MPLS label, which means
      the BOS bit is set. The BOS bit is passed from mpls_forward down to
      mpls_multipath_hash, which never processes the hash loop because BOS is 1.
      
      Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
      tracks the total mpls header length on each pass (on pass N mpls_hdr_len
      is N * sizeof(mpls_shim_hdr)). When the label with the BOS bit set is
      found, it verifies that the skb has sufficient header for ipv4 or ipv6,
      and finds the IPv4 or IPv6 header by taking the last mpls_hdr pointer
      and adding 1 to advance past it.
      
      With these changes I have verified the code correctly sees the label,
      BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp
      traffic for ipv4 and ipv6 are distributed across the nexthops.
      
      Fixes: 1c78efa8 ("mpls: flow-based multipath selection")
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
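The label-stack walk described in this commit can be sketched in Python. This is a simplified model, not the kernel code: `zlib.crc32` stands in for the kernel's hash function, and `build_packet`, `multipath_hash` and `select_nexthop` are hypothetical names. It only illustrates the control flow: hash every label, and when the label with BOS set is reached, step past the shim headers and mix in the IP addresses.

```python
import struct
import zlib

MPLS_SHIM_LEN = 4  # one shim header: label(20) | TC(3) | BOS(1) | TTL(8)


def build_packet(labels, src, dst):
    """Build a toy MPLS packet: a label stack followed by a bare IPv4 header."""
    shims = b""
    for i, label in enumerate(labels):
        bos = 1 if i == len(labels) - 1 else 0
        shims += struct.pack("!I", (label << 12) | (bos << 8) | 64)
    ipv4 = bytes([0x45]) + bytes(11) + src + dst  # version/IHL, filler, addresses
    return shims + ipv4


def multipath_hash(packet):
    """Hash the whole label stack, then the IP addresses behind the BOS label."""
    h = 0
    hdr_len = 0
    while True:
        hdr_len += MPLS_SHIM_LEN  # on pass N this is N * MPLS_SHIM_LEN
        if len(packet) < hdr_len:
            break  # truncated stack: keep what was hashed so far
        (word,) = struct.unpack_from("!I", packet, hdr_len - MPLS_SHIM_LEN)
        label, bos = word >> 12, (word >> 8) & 1
        h = zlib.crc32(label.to_bytes(4, "big"), h)
        if not bos:
            continue  # more labels follow
        payload = packet[hdr_len:]  # first byte after the last shim header
        if len(payload) >= 20 and payload[0] >> 4 == 4:
            h = zlib.crc32(payload[12:20], h)  # IPv4 source + destination
        elif len(payload) >= 40 and payload[0] >> 4 == 6:
            h = zlib.crc32(payload[8:40], h)  # IPv6 source + destination
        break
    return h


def select_nexthop(packet, nexthop_count):
    """Pick a nexthop index from the flow hash."""
    return multipath_hash(packet) % nexthop_count
```

With a single-label packet the BOS bit is set on the first pass, yet the loop body still runs once, so the IP addresses contribute to the hash and different flows can land on different nexthops.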
  2. 21 Jan, 2017 5 commits
  3. 20 Jan, 2017 2 commits
    • tcp: initialize max window for a new fastopen socket · 0dbd7ff3
      Alexey Kodanev committed
      Found that if we run LTP netstress test with large MSS (65K),
      the first attempt from server to send data comparable to this
      MSS on fastopen connection will be delayed by the probe timer.
      
      Here is an example:
      
           < S  seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
           > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
           < .  ack 1 win 342 length 0
      
      Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
      as well as in 'size_goal'. As a result, the segment is not queued for
      transmission until all the data has been copied from the user buffer.
      Then, inside __tcp_push_pending_frames(), it breaks on the send window
      test and continues with the probe timer check.
      
      Fragmentation occurs in tcp_write_wakeup()...
      
      +0.2 > P. seq 1:43777 ack 1 win 342 length 43776
           < .  ack 43777, win 1365 length 0
           > P. seq 43777:65001 ack 1 win 342 options [...] length 21224
           ...
      
      This also contradicts the fact that we should bound to half of the
      window if it is large.
      
      Fix this flaw by correctly initializing max_window. Before that, it
      could have large values that affect further calculations of 'size_goal'.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
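The effect of max_window on the segment size can be illustrated with a small Python model. It is loosely based on the "bound to half of the window" rule mentioned in the commit message; the function name, the 536-byte default MSS threshold, and the omission of any minimum-size floor are assumptions made for this sketch.

```python
TCP_MSS_DEFAULT = 536  # assumed threshold below which the window is not halved


def bound_to_half_wnd(max_window, pktsize):
    """Cap a segment size at half of the largest window seen from the peer."""
    if max_window > TCP_MSS_DEFAULT:
        cutoff = max_window >> 1
    else:
        cutoff = max_window
    if cutoff and pktsize > cutoff:
        return cutoff
    return pktsize


# Window from the trace above: "win 342" with wscale 7.
peer_window = 342 << 7

# With a sane max_window the 65495-byte MSS is cut down to half the window.
sane = bound_to_half_wnd(peer_window, 65495)

# With a stale, very large max_window nothing is capped.
stale = bound_to_half_wnd(1 << 30, 65495)
```

In this model `sane` is 21888 while `stale` stays at 65495, which matches the symptom described: an oversized size goal that cannot fit into the send window.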
    • ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock · 03e4deff
      Kefeng Wang committed
      Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling
      netdevice notifiers with RCU read-side lock"), it is unnecessary
      to make addrconf_disable_change() use RCU iteration over the
      netdev list, since it already holds the RTNL lock. Otherwise we may
      hit "Illegal context switch in RCU read-side critical section".
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 19 Jan, 2017 3 commits
    • lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern committed
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state is
      called, it cannot drop the rtnl lock to load the module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
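The hang is a classic lock-ordering problem and can be reproduced in miniature. The Python sketch below is only an analogy: `rtnl` is an ordinary non-reentrant lock, `module_init` plays the role of mpls_init registering its notifier, and all function names are hypothetical. A timeout replaces the indefinite hang so the example terminates. The model simply performs the load at a point where the lock is not held; how the kernel arranges that is not shown here.

```python
import threading

rtnl = threading.Lock()  # stand-in for the rtnl lock (not re-entrant)
loaded_modules = set()


def module_init(name, timeout=0.1):
    """Module init needs the rtnl lock to register its notifier."""
    if not rtnl.acquire(timeout=timeout):
        return False  # the real modprobe would block here indefinitely
    try:
        loaded_modules.add(name)
        return True
    finally:
        rtnl.release()


def add_route_before_fix(encap):
    """Autoload is attempted while the route code already holds the lock."""
    with rtnl:
        return module_init(encap)


def add_route_after_fix(encap):
    """Validate the encap type, loading the module if needed, before locking."""
    if encap not in loaded_modules and not module_init(encap):
        return False
    with rtnl:
        return encap in loaded_modules
```

Calling `add_route_before_fix("mpls")` fails after the timeout because the loader can never get the lock, while `add_route_after_fix("mpls")` succeeds because the load happens first.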
    • net: fix harmonize_features() vs NETIF_F_HIGHDMA · 7be2c82c
      Eric Dumazet committed
      Ashizuka reported a highmem oddity and sent a patch for the freescale
      fec driver.
      
      But the root cause of the problem is that the core networking stack
      must ensure no skb with a highmem fragment is ever sent through
      a device that does not assert NETIF_F_HIGHDMA in its features.
      
      We need to call illegal_highdma() from harmonize_features()
      regardless of CSUM checks.
      
      Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Reported-by: "Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
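The phrase "regardless of CSUM checks" amounts to making the highmem test an independent step rather than a branch of the checksum test. A rough Python model follows; the flag bit values and the boolean parameters are invented for illustration and do not correspond to the kernel's NETIF_F_* constants or function signature.

```python
# Hypothetical bit values, chosen only for this sketch.
F_SG = 1 << 0
F_CSUM = 1 << 1
F_GSO = 1 << 2
F_HIGHDMA = 1 << 3


def harmonize_features(features, needs_csum, can_csum, has_highmem_frag):
    """Return the feature set that is safe to use for one packet."""
    if needs_csum and not can_csum:
        features &= ~(F_CSUM | F_GSO)
    # Independent check: runs whether or not the checksum test above fired.
    if has_highmem_frag and not (features & F_HIGHDMA):
        features &= ~F_SG
    return features
```

For a device without F_HIGHDMA, a packet carrying a highmem fragment loses F_SG even when its checksum can be offloaded, which would force the stack to linearize it before handing it to the driver.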
    • net: ethtool: Initialize buffer when querying device channel settings · 31a86d13
      Eran Ben Elisha committed
      The ethtool channels response struct was uninitialized when querying the
      device's channel boundary settings. As a result, fields not reported by the
      driver hold garbage. This may cause unsupported params to be sent to the driver.
      
      Fixes: 8bf36862 ('ethtool: ensure channel counts are within bounds ...')
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      CC: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 17 Jan, 2017 6 commits
    • net sched actions: fix refcnt when GETing of action after bind · 0faa9cb5
      Jamal Hadi Salim committed
      Demonstrating the issue:
      
      .. add a drop action
      $ sudo $TC actions add action drop index 10
      
      .. retrieve it
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... bug 1 above: reference is two.
          Reference is actually 1 but we forget to subtract 1.
      
      ... do a GET again and we see the same issue
          try a few times and nothing changes
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... let's try to bind the action to a filter..
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      ... and now a few GETs:
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      .... as can be observed the reference count keeps going up.
      
      After the fix
      
      $ sudo $TC actions add action drop index 10
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      Fixes: aecc5cef ("net sched actions: fix GETing actions")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: Fix segfault after sock connection timeout · 8a367e74
      Basil Gunn committed
      The ax.25 socket connection timed out and the sock struct has
      previously been taken down, i.e. the sock struct is now a NULL pointer.
      Checking the sock_flag causes the segfault. Check if the socket struct
      pointer is NULL before checking sock_flag. This segfault is seen in
      timed out netrom connections.
      
      Please submit to -stable.
      Signed-off-by: Basil Gunn <basil@pacabunga.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: rework prog_digest into prog_tag · f1f7714e
      Daniel Borkmann committed
      Commit 7bd509e3 ("bpf: add prog_digest and expose it via
      fdinfo/netlink") was recently discussed, partially due to
      admittedly suboptimal name of "prog_digest" in combination
      with sha1 hash usage, thus inevitably and rightfully concerns
      about its security in terms of collision resistance were
      raised with regards to use-cases.
      
      The intended use cases are debugging and introspection only, providing
      a stable "tag" over the instruction sequence that both kernel and user
      space can calculate independently.
      It's not usable at all for making a security relevant decision.
      So collisions where two different instruction sequences generate
      the same tag can happen, but ideally at a rather low rate. The
      "tag" will be dumped in hex and is short enough to introspect
      in tracepoints or kallsyms output along with other data such
      as stack trace, etc. Thus, this patch performs a rename into
      prog_tag and truncates the tag to a short output (64 bits) to
      make it obvious it's not collision-free.
      
      Should in future a hash or facility be needed with a security
      relevant focus, then we can think about requirements, constraints,
      etc that would fit to that situation. For now, rework the exposed
      parts for the current use cases as long as nothing has been
      released yet. Tested on x86_64 and s390x.
      
      Fixes: 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
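The idea of a short tag derived from a hash can be shown in a few lines of Python. The sketch assumes the tag is simply the first 64 bits of a SHA-1 over the raw instruction bytes; how the kernel serializes or normalizes the instructions before hashing is not covered here, and `prog_tag` is a hypothetical helper name.

```python
import hashlib


def prog_tag(insns: bytes) -> str:
    """Return a 64-bit tag (16 hex digits) over an instruction byte string."""
    return hashlib.sha1(insns).digest()[:8].hex()


# A made-up 8-byte instruction encoding, used only as sample input.
sample = bytes([0x95, 0, 0, 0, 0, 0, 0, 0])
tag = prog_tag(sample)
```

Because only 8 of the 20 digest bytes are kept, the tag is convenient to print next to a stack trace but is plainly not suitable for any security decision.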
    • tipc: allocate user memory with GFP_KERNEL flag · 57d5f64d
      Parthasarathy Bhuvaragan committed
      Until now, we have always allocated memory with the GFP_ATOMIC flag.
      When the system is under memory pressure and a user tries to send,
      the send fails due to low memory. However, the user application
      can wait for free memory if we allocate it using the GFP_KERNEL flag.
      
      In this commit, we allocate memory with GFP_KERNEL for all user
      allocations.
      Reported-by: Rune Torgersen <runet@innovsys.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: Account for tunnel header in tunnel MTU · 02ca0423
      Jakub Sitnicki committed
      With ip6gre we have a tunnel header which also makes the tunnel MTU
      smaller. We need to reserve room for it. Previously we were using up
      space reserved for the Tunnel Encapsulation Limit option
      header (RFC 2473).
      
      Also, after commit b05229f4 ("gre6: Cleanup GREv6 transmit path,
      call common GRE functions") our contract with the caller has
      changed. Now we check if the packet length exceeds the tunnel MTU after
      the tunnel header has been pushed, unlike before.
      
      This is reflected in the check where we look at the packet length minus
      the size of the tunnel header, which is already accounted for in tunnel
      MTU.
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
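The MTU accounting described here is plain arithmetic, sketched below in Python. The constants are assumptions for the example: a 40-byte IPv6 header, 8 bytes for the extension header carrying the Tunnel Encapsulation Limit option, and a floor at the IPv6 minimum MTU of 1280. The function names are hypothetical.

```python
IPV6_HDR_LEN = 40
ENCAP_LIMIT_OPT_LEN = 8  # assumed size of the header carrying the option
IPV6_MIN_MTU = 1280


def tunnel_mtu(path_mtu, tun_hlen, use_encap_limit):
    """Payload room left after the outer IPv6 header and the tunnel header."""
    mtu = path_mtu - IPV6_HDR_LEN - tun_hlen
    if use_encap_limit:
        mtu -= ENCAP_LIMIT_OPT_LEN
    return max(mtu, IPV6_MIN_MTU)


def exceeds_tunnel_mtu(skb_len, tun_hlen, mtu):
    """skb_len already includes the pushed tunnel header, so take it back out."""
    return skb_len - tun_hlen > mtu


GRE_HLEN = 4  # a GRE header without optional fields
mtu = tunnel_mtu(1500, GRE_HLEN, use_encap_limit=True)
```

With a 1500-byte path MTU this gives 1448 bytes for ip6gre, compared with 1452 when no tunnel header is reserved. A 1448-byte inner packet measured after the GRE header is pushed (1452 bytes) passes the check, while a 1449-byte one does not.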
    • mld: do not remove mld souce list info when set link down · 1666d49e
      Hangbin Liu committed
      This is an IPv6 version of commit 24803f38 ("igmp: do not remove igmp
      souce list..."). In mld_del_delrec(), we will restore all source filter
      info instead of flushing it.
      
      Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
      we should not remove source list info when the link is set down. Remove
      igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
      ipv6_mc_down().
      
      Also clear all source info after igmp6_group_dropped() instead of in it
      because ipv6_mc_down() will call igmp6_group_dropped().
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 Jan, 2017 1 commit
    • openvswitch: maintain correct checksum state in conntrack actions · 75f01a4c
      Lance Richardson committed
      When executing conntrack actions on skbuffs with checksum mode
      CHECKSUM_COMPLETE, the checksum must be updated to account for
      header pushes and pulls. Otherwise we get "hw csum failure"
      logs similar to this (ICMP packet received on geneve tunnel
      via ixgbe NIC):
      
      [  405.740065] genev_sys_6081: hw csum failure
      [  405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G          I     4.10.0-rc3+ #1
      [  405.740108] Call Trace:
      [  405.740110]  <IRQ>
      [  405.740113]  dump_stack+0x63/0x87
      [  405.740116]  netdev_rx_csum_fault+0x3a/0x40
      [  405.740118]  __skb_checksum_complete+0xcf/0xe0
      [  405.740120]  nf_ip_checksum+0xc8/0xf0
      [  405.740124]  icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
      [  405.740132]  nf_conntrack_in+0xe1/0x550 [nf_conntrack]
      [  405.740137]  ? find_bucket.isra.2+0x62/0x70 [openvswitch]
      [  405.740143]  __ovs_ct_lookup+0x95/0x980 [openvswitch]
      [  405.740145]  ? netif_rx_internal+0x44/0x110
      [  405.740149]  ovs_ct_execute+0x147/0x4b0 [openvswitch]
      [  405.740153]  do_execute_actions+0x22e/0xa70 [openvswitch]
      [  405.740157]  ovs_execute_actions+0x40/0x120 [openvswitch]
      [  405.740161]  ovs_dp_process_packet+0x84/0x120 [openvswitch]
      [  405.740166]  ovs_vport_receive+0x73/0xd0 [openvswitch]
      [  405.740168]  ? udp_rcv+0x1a/0x20
      [  405.740170]  ? ip_local_deliver_finish+0x93/0x1e0
      [  405.740172]  ? ip_local_deliver+0x6f/0xe0
      [  405.740174]  ? ip_rcv_finish+0x3a0/0x3a0
      [  405.740176]  ? ip_rcv_finish+0xdb/0x3a0
      [  405.740177]  ? ip_rcv+0x2a7/0x400
      [  405.740180]  ? __netif_receive_skb_core+0x970/0xa00
      [  405.740185]  netdev_frame_hook+0xd3/0x160 [openvswitch]
      [  405.740187]  __netif_receive_skb_core+0x1dc/0xa00
      [  405.740194]  ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
      [  405.740197]  __netif_receive_skb+0x18/0x60
      [  405.740199]  netif_receive_skb_internal+0x40/0xb0
      [  405.740201]  napi_gro_receive+0xcd/0x120
      [  405.740204]  gro_cell_poll+0x57/0x80 [geneve]
      [  405.740206]  net_rx_action+0x260/0x3c0
      [  405.740209]  __do_softirq+0xc9/0x28c
      [  405.740211]  irq_exit+0xd9/0xf0
      [  405.740213]  do_IRQ+0x51/0xd0
      [  405.740215]  common_interrupt+0x93/0x93
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
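Why a header pull must be reflected in a CHECKSUM_COMPLETE value can be seen with the 16-bit ones' complement sum itself. The Python sketch below is a standalone illustration with hypothetical helper names, not the kernel's checksum code: the sum over the remaining bytes equals the sum over the whole buffer minus the sum over the pulled header, provided the header has an even length.

```python
def fold(total):
    """Fold carries back into 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data):
    """Ones' complement sum of big-endian 16-bit words, zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    words = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return fold(sum(words))


def csum_sub(total, part):
    """Remove the contribution of 'part' from 'total'."""
    return fold(total + (~part & 0xFFFF))


header = bytes(range(14))  # stand-in for a 14-byte Ethernet header
payload = b"conntrack payload"
full = csum(header + payload)
after_pull = csum_sub(full, csum(header))
```

Here `after_pull` matches `csum(payload)`, while the stale value `full` does not. Using the stale value after the pull is the kind of mismatch that surfaces as "hw csum failure".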
  7. 14 Jan, 2017 2 commits
    • tcp: fix tcp_fastopen unaligned access complaints on sparc · 003c9410
      Shannon Nelson committed
      Fix up a data alignment issue on sparc by swapping the order
      of the cookie byte array field with the length field in
      struct tcp_fastopen_cookie, and making it a proper union
      to clean up the typecasting.
      
      This addresses log complaints like these:
          log_unaligned: 113 callbacks suppressed
          Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
          Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
          Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
          Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
          Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
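The layout change can be inspected with Python's `ctypes`, which follows the platform's C struct layout rules. The field names, the 16-byte cookie size, and the use of four 32-bit words as a stand-in for an IPv6 address are assumptions for this sketch; only the reordering and the union are taken from the description above.

```python
import ctypes

COOKIE_MAX = 16  # assumed size of the cookie byte array


class CookieBefore(ctypes.Structure):
    """Length first: the byte array starts at an odd offset."""
    _fields_ = [
        ("len", ctypes.c_int8),
        ("val", ctypes.c_uint8 * COOKIE_MAX),
        ("exp", ctypes.c_bool),
    ]


class CookieData(ctypes.Union):
    """Byte array overlaid with a word-aligned view of the same storage."""
    _fields_ = [
        ("val", ctypes.c_uint8 * COOKIE_MAX),
        ("addr", ctypes.c_uint32 * 4),  # stand-in for an IPv6 address
    ]


class CookieAfter(ctypes.Structure):
    """Cookie storage first, so it inherits the union's alignment."""
    _fields_ = [
        ("data", CookieData),
        ("len", ctypes.c_int8),
        ("exp", ctypes.c_bool),
    ]


before_offset = CookieBefore.val.offset
after_offset = CookieAfter.data.offset
```

In the first layout the cookie bytes start at offset 1, so reading them through a wider pointer is an unaligned access on strict architectures such as sparc. In the second layout they start at offset 0 of a struct whose alignment is that of a 32-bit word.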
    • ipv6: sr: fix several BUGs when preemption is enabled · fa79581e
      David Lebrun committed
      When CONFIG_PREEMPT=y, CONFIG_IPV6=m and CONFIG_SEG6_HMAC=y,
      seg6_hmac_init() is called during the initialization of the ipv6 module.
      This causes a subsequent call to smp_processor_id() with preemption
      enabled, resulting in the following trace.
      
      [   20.451460] BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1
      [   20.452556] caller is debug_smp_processor_id+0x17/0x19
      [   20.453304] CPU: 0 PID: 1 Comm: systemd Not tainted 4.9.0-rc5-00973-g46738b13 #1
      [   20.454406]  ffffc9000062fc18 ffffffff813607b2 0000000000000000 ffffffff81a7f782
      [   20.455528]  ffffc9000062fc48 ffffffff813778dc 0000000000000000 00000000001dcf98
      [   20.456539]  ffffffffa003bd08 ffffffff81af93e0 ffffc9000062fc58 ffffffff81377905
      [   20.456539] Call Trace:
      [   20.456539]  [<ffffffff813607b2>] dump_stack+0x63/0x7f
      [   20.456539]  [<ffffffff813778dc>] check_preemption_disabled+0xd1/0xe3
      [   20.456539]  [<ffffffff81377905>] debug_smp_processor_id+0x17/0x19
      [   20.460260]  [<ffffffffa0061f3b>] seg6_hmac_init+0xfa/0x192 [ipv6]
      [   20.460260]  [<ffffffffa0061ccc>] seg6_init+0x39/0x6f [ipv6]
      [   20.460260]  [<ffffffffa006121a>] inet6_init+0x21a/0x321 [ipv6]
      [   20.460260]  [<ffffffffa0061000>] ? 0xffffffffa0061000
      [   20.460260]  [<ffffffff81000457>] do_one_initcall+0x8b/0x115
      [   20.460260]  [<ffffffff811328a3>] do_init_module+0x53/0x1c4
      [   20.460260]  [<ffffffff8110650a>] load_module+0x1153/0x14ec
      [   20.460260]  [<ffffffff81106a7b>] SYSC_finit_module+0x8c/0xb9
      [   20.460260]  [<ffffffff81106a7b>] ? SYSC_finit_module+0x8c/0xb9
      [   20.460260]  [<ffffffff81106abc>] SyS_finit_module+0x9/0xb
      [   20.460260]  [<ffffffff810014d1>] do_syscall_64+0x62/0x75
      [   20.460260]  [<ffffffff816834f0>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Moreover, dst_cache_* functions also call smp_processor_id(), generating
      a similar trace.
      
      This patch uses raw_cpu_ptr() in seg6_hmac_init() rather than this_cpu_ptr()
      and disables preemption when using dst_cache_* functions.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 13 Jan, 2017 7 commits
    • mac80211: prevent skb/txq mismatch · dbef5362
      Michal Kazior committed
      Station structure is considered as not uploaded
      (to driver) until drv_sta_state() finishes. This
      call is however done after the structure is
      attached to mac80211 internal lists and hashes.
      This means mac80211 can lookup (and use) station
      structure before it is uploaded to a driver.
      
      If this happens (structure exists, but
      sta->uploaded is false) fast_tx path can still be
      taken. Deep in the fastpath call, sta->uploaded
      is checked to derive the "pubsta" argument for
      ieee80211_get_txq(). If sta->uploaded is false
      (and sta is actually non-NULL) ieee80211_get_txq()
      effectively downgrades to vif->txq.
      
      At first glance this may look innocent, but it coerces
      mac80211 into a state that is almost guaranteed
      (codel may drop the offending skb) to crash because a
      station-oriented skb gets queued up on a
      vif-oriented txq. ieee80211_tx_dequeue() ends
      up looking at info->control.flags and tries to use
      txq->sta, which in the failure case is NULL.
      
      It's probably pointless to pretend one can
      downgrade skb from sta-txq to vif-txq.
      
      Since downgrading unicast traffic to vif->txq must
      not be done there's no txq to put a frame on if
      sta->uploaded is false. Therefore the code is made
      to fall back to regular tx() op path if the
      described condition is hit.
      
      Only drivers using wake_tx_queue were affected.
      
      Example crash dump before fix:
      
       Unable to handle kernel paging request at virtual address ffffe26c
       PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211]
       [<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from
       [<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core])
       [<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from
       [<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core])
       [<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core])
       [<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci])
       [<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from
       [<c0572e90>] (net_rx_action+0xac/0x160)
      Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: initialize SMPS field in HT capabilities · 43071d8f
      Felix Fietkau committed
      ibss and mesh modes copy the ht capabilities from the band without
      overriding the SMPS state. Unfortunately the default value 0 for the
      SMPS field means static SMPS instead of disabled.
      
      This results in HT ibss and mesh setups using only single-stream rates,
      even though SMPS is not supposed to be active.
      
      Initialize SMPS to disabled for all bands on ieee80211_hw_register to
      ensure that the value is sane where it is not overridden with the real
      SMPS state.
      Reported-by: Elektra Wagenrad <onelektra@gmx.net>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      [move VHT TODO comment to a better place]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
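The pitfall is that an all-zero capability word already encodes an SMPS mode. A short Python sketch of the bit handling follows. The mask, shift, and mode numbers are assumptions based on the usual HT capability layout (a 2-bit SM power save field at bits 2 and 3, with 0 meaning static and 3 meaning disabled); only the meaning of the value 0 is confirmed by the text above, and the helper names are hypothetical.

```python
HT_CAP_SM_PS_MASK = 0x000C  # assumed position of the 2-bit SMPS field
HT_CAP_SM_PS_SHIFT = 2
SM_PS_STATIC = 0            # the default when the field is left at zero
SM_PS_DISABLED = 3          # assumed encoding of "SMPS disabled"


def smps_mode(ht_cap):
    """Extract the SMPS mode from an HT capability word."""
    return (ht_cap & HT_CAP_SM_PS_MASK) >> HT_CAP_SM_PS_SHIFT


def init_smps_disabled(ht_cap):
    """Set the SMPS field to 'disabled', leaving the other bits alone."""
    cleared = ht_cap & ~HT_CAP_SM_PS_MASK
    return cleared | (SM_PS_DISABLED << HT_CAP_SM_PS_SHIFT)


band_cap = 0x0000  # capability word a driver left untouched
fixed_cap = init_smps_disabled(band_cap)
```

An untouched capability word reads back as static SMPS, which explains the single-stream rates seen in ibss and mesh setups. After initialization the same word reads back as disabled.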
    • svcrdma: avoid duplicate dma unmapping during error recovery · ce1ca7d2
      Sriharsha Basavapatna committed
      In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
      invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
      svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
      ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
      problems when the iova being unmapped is subsequently reused. Remove
      the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
      handle it.
      
      Fixes: 412a15c0 ("svcrdma: Port to new memory registration API")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: don't call sleeping functions from the notifier block callbacks · 546125d1
      Scott Mayhew committed
      The inet6addr_chain is an atomic notifier chain, so we can't call
      anything that might sleep (like lock_sock)... instead of closing the
      socket from svc_age_temp_xprts_now (which is called by the notifier
      function), just have the rpc service threads do it instead.
      
      Cc: stable@vger.kernel.org
      Fixes: c3d4879e "sunrpc: Add a function to close..."
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrpc: don't leak contexts on PROC_DESTROY · 78794d18
      J. Bruce Fields committed
      Context expiry times are in units of seconds since boot, not unix time.
      
      The use of get_seconds() here therefore sets the expiry time decades in
      the future.  This prevents timely freeing of contexts destroyed by
      client RPC_GSS_PROC_DESTROY requests.  We'd still free them eventually
      (when the module is unloaded or the container shut down), but a lot of
      contexts could pile up before then.
      
      Cc: stable@vger.kernel.org
      Fixes: c5b29f88 "sunrpc: use seconds since boot in expiry cache"
      Reported-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
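The size of the error is easy to estimate. In the Python sketch below, the uptime and the sample date are hypothetical values chosen for illustration. It compares an expiry written as unix time against a clock that counts seconds since boot.

```python
SECONDS_PER_YEAR = 365.25 * 86400


def is_expired(expiry, now_since_boot):
    """The cache compares expiry against seconds since boot."""
    return now_since_boot >= expiry


uptime = 3600             # hypothetical: the machine booted one hour ago
unix_now = 1_484_265_600  # 2017-01-13 00:00:00 UTC as unix time

wrong_expiry = unix_now   # unix time stored in a seconds-since-boot field
right_expiry = uptime     # "expire now" in the cache's own units

years_until_freed = (wrong_expiry - uptime) / SECONDS_PER_YEAR
```

With these numbers the destroyed context would not be considered expired for roughly 47 years of uptime, whereas the correctly scaled expiry is hit immediately.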
    • net: ipv4: fix table id in getroute response · 8a430ed5
      David Ahern committed
      rtm_table is an 8-bit field while table ids are allowed up to u32. Commit
      709772e6 ("net: Fix routing tables with id > 255 for legacy software")
      added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the
      table id is > 255. The table id returned on get route requests should do
      the same.
      
      Fixes: c36ba660 ("net: Allow user to get table id from route lookup")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
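The rule for the 8-bit field can be written as a one-line clamp. In this Python sketch the value 252 for RT_TABLE_COMPAT and the idea that the full id travels in a separate 32-bit attribute (called RTA_TABLE here) are assumptions; `fill_table` is a hypothetical helper.

```python
RT_TABLE_COMPAT = 252  # assumed value of the kernel constant


def fill_table(table_id):
    """Return the header field and the full-width attribute for a table id."""
    rtm_table = table_id if table_id < 256 else RT_TABLE_COMPAT
    return {"rtm_table": rtm_table, "RTA_TABLE": table_id}


small = fill_table(254)
large = fill_table(1000)
```

Ids that fit in 8 bits are reported unchanged in both places. A larger id such as 1000 is reported as the compat value in the header field while the attribute keeps the real id, which avoids silently truncating it.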
    • net: lwtunnel: Handle lwtunnel_fill_encap failure · ea7a8085
      David Ahern committed
      Handle failure in lwtunnel_fill_encap adding attributes to skb.
      
      Fixes: 571e7226 ("ipv4: support for fib route lwtunnel encap attributes")
      Fixes: 19e42e45 ("ipv6: support for fib route lwtunnel encap attributes")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 11 Jan, 2017 13 commits