1. 27 2月, 2017 15 次提交
    • S
      mac80211: shorten debug message · 2595d259
      Sara Sharon 提交于
      Tracing is limited to 100 characters and this message passes
      the limit when there are a few buffered frames. Shorten it.
      Signed-off-by: NSara Sharon <sara.sharon@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      2595d259
    • E
      mac80211: fix power saving clients handling in iwlwifi · d98937f4
      Emmanuel Grumbach 提交于
      iwlwifi now supports RSS and can't let mac80211 track the
      PS state based on the Rx frames since they can come out of
      order. iwlwifi is now advertising AP_LINK_PS, and uses
      explicit notifications to teach mac80211 about the PS state
      of the stations and the PS poll / uAPSD trigger frames
      coming our way from the peers.
      
      Because of that, the TIM stopped being maintained in
      mac80211. I tried to fix this in commit c68df2e7
      ("mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE")
      but that was later reverted by Felix in commit 6c18a6b4
      ("Revert "mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE")
      since it broke drivers that do not implement set_tim.
      
      Since none of the drivers that set AP_LINK_PS have the
      set_tim() handler set besides iwlwifi, I can bail out in
      __sta_info_recalc_tim if AP_LINK_PS AND .set_tim is not
      implemented.
      Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      d98937f4
    • F
      mac80211: don't handle filtered frames within a BA session · 890030d3
      Felix Fietkau 提交于
      When running a BA session, the driver (or the hardware) already takes
      care of retransmitting failed frames, since it has to keep the receiver
      reorder window in sync.
      
      Adding another layer of retransmit around that does not improve
      anything. In fact, it can only lead to some strong reordering with huge
      latency.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      890030d3
    • S
      mac80211: don't reorder frames with SN smaller than SSN · b7540d8f
      Sara Sharon 提交于
      When RX aggregation starts, transmitter may continue send frames
      with SN smaller than SSN until the AddBA response is received.
      However, the reorder buffer is already initialized at this point,
      which will cause the drop of such frames as duplicates since the
      head SN of the reorder buffer is set to the SSN, which is bigger.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSara Sharon <sara.sharon@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b7540d8f
    • M
      mac80211: flush delayed work when entering suspend · a9e9200d
      Matt Chen 提交于
      The issue was found when entering suspend and resume.
      It triggers a warning in:
      mac80211/key.c: ieee80211_enable_keys()
      ...
      WARN_ON_ONCE(sdata->crypto_tx_tailroom_needed_cnt ||
                   sdata->crypto_tx_tailroom_pending_dec);
      ...
      
      It points out sdata->crypto_tx_tailroom_pending_dec isn't cleaned up successfully
      in a delayed_work during suspend. Add a flush_delayed_work to fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMatt Chen <matt.chen@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      a9e9200d
    • J
      mac80211: fix packet statistics for fast-RX · 0328edc7
      Johannes Berg 提交于
      When adding per-CPU statistics, which added statistics back
      to mac80211 for the fast-RX path, I evidently forgot to add
      the "stats->packets++" line. The reason for that is likely
      that I didn't see it since it's done in defragmentation for
      the regular RX path.
      
      Add the missing line to properly count received packets in
      the fast-RX case.
      
      Fixes: c9c5962b ("mac80211: enable collecting station statistics per-CPU")
      Reported-by: NOren Givon <oren.givon@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0328edc7
    • P
      l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv · 51fb60eb
      Paul Hüber 提交于
      l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
      The return value is passed up to ip_local_deliver_finish, which treats
      negative values as an IP protocol number for resubmission.
      Signed-off-by: NPaul Hüber <phueber@kernsp.in>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51fb60eb
    • J
      xfrm: provide correct dst in xfrm_neigh_lookup · 1ecc9ad0
      Julian Anastasov 提交于
      Fix xfrm_neigh_lookup to provide dst->path to the
      neigh_lookup dst_ops method.
      
      When skb is provided, the IP address in packet should already
      match the dst->path address family. But for the non-skb case,
      we should consider the last tunnel address as nexthop address.
      
      Fixes: f894cbf8 ("net: Add optional SKB arg to dst_ops->neigh_lookup().")
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ecc9ad0
    • R
      net sched actions: do not overwrite status of action creation. · 37f1c63e
      Roman Mashak 提交于
      nla_memdup_cookie was overwriting err value, declared at function
      scope and earlier initialized with result of ->init(). At success
      nla_memdup_cookie() returns 0, and thus module refcnt decremented,
      although the action was installed.
      
      $ sudo tc actions add action pass index 1 cookie 1234
      $ sudo tc actions ls action gact
      
              action order 0: gact action pass
               random type none pass val 0
               index 1 ref 1 bind 0
      $
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  0
      ...
      $
      $ sudo rmmod act_gact
      [   52.310283] ------------[ cut here ]------------
      [   52.312551] WARNING: CPU: 1 PID: 455 at kernel/module.c:1113
      module_put+0x99/0xa0
      [   52.316278] Modules linked in: act_gact(-) crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel psmouse pcbc evbug aesni_intel aes_x86_64 crypto_simd
      serio_raw glue_helper pcspkr cryptd
      [   52.322285] CPU: 1 PID: 455 Comm: rmmod Not tainted 4.10.0+ #11
      [   52.324261] Call Trace:
      [   52.325132]  dump_stack+0x63/0x87
      [   52.326236]  __warn+0xd1/0xf0
      [   52.326260]  warn_slowpath_null+0x1d/0x20
      [   52.326260]  module_put+0x99/0xa0
      [   52.326260]  tcf_hashinfo_destroy+0x7f/0x90
      [   52.326260]  gact_exit_net+0x27/0x40 [act_gact]
      [   52.326260]  ops_exit_list.isra.6+0x38/0x60
      [   52.326260]  unregister_pernet_operations+0x90/0xe0
      [   52.326260]  unregister_pernet_subsys+0x21/0x30
      [   52.326260]  tcf_unregister_action+0x68/0xa0
      [   52.326260]  gact_cleanup_module+0x17/0xa0f [act_gact]
      [   52.326260]  SyS_delete_module+0x1ba/0x220
      [   52.326260]  entry_SYSCALL_64_fastpath+0x1e/0xad
      [   52.326260] RIP: 0033:0x7f527ffae367
      [   52.326260] RSP: 002b:00007ffeb402a598 EFLAGS: 00000202 ORIG_RAX:
      00000000000000b0
      [   52.326260] RAX: ffffffffffffffda RBX: 0000559b069912a0 RCX: 00007f527ffae367
      [   52.326260] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b06991308
      [   52.326260] RBP: 0000000000000003 R08: 00007f5280264420 R09: 00007ffeb4029511
      [   52.326260] R10: 000000000000087b R11: 0000000000000202 R12: 00007ffeb4029580
      [   52.326260] R13: 0000000000000000 R14: 0000000000000000 R15: 0000559b069912a0
      [   52.354856] ---[ end trace 90d89401542b0db6 ]---
      $
      
      With the fix:
      
      $ sudo modprobe act_gact
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  0
      ...
      $ sudo tc actions add action pass index 1 cookie 1234
      $ sudo tc actions ls action gact
      
              action order 0: gact action pass
               random type none pass val 0
               index 1 ref 1 bind 0
      $
      $ lsmod
      Module                  Size  Used by
      act_gact               16384  1
      ...
      $ sudo rmmod act_gact
      rmmod: ERROR: Module act_gact is in use
      $
      $ sudo /home/mrv/bin/tc actions del action gact index 1
      $ sudo rmmod act_gact
      $ lsmod
      Module                  Size  Used by
      $
      
      Fixes: 1045ba77 ("net sched actions: Add support for user cookies")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37f1c63e
    • D
      rxrpc: Kernel calls get stuck in recvmsg · d7e15835
      David Howells 提交于
      Calls made through the in-kernel interface can end up getting stuck because
      of a missed variable update in a loop in rxrpc_recvmsg_data().  The problem
      is like this:
      
       (1) A new packet comes in and doesn't cause a notification to be given to
           the client as there's still another packet in the ring - the
           assumption being that if the client will keep drawing off data until
           the ring is empty.
      
       (2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
           iterates through the packets.  This copies the window pointers into
           variables rather than using the information in the call struct
           because:
      
           (a) MSG_PEEK might be in effect;
      
           (b) we need a barrier after reading call->rx_top to pair with the
           	 barrier in the softirq routine that loads the buffer.
      
       (3) The reading of call->rx_top is done outside of the loop, and top is
           never updated whilst we're in the loop.  This means that even through
           there's a new packet available, we don't see it and may return -EFAULT
           to the caller - who will happily return to the scheduler and await the
           next notification.
      
       (4) No further notifications are forthcoming until there's an abort as the
           ring isn't empty.
      
      The fix is to move the read of call->rx_top inside the loop - but it needs
      to be done before the condition is checked.
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e15835
    • R
      net sched actions: decrement module reference count after table flush. · edb9d1bf
      Roman Mashak 提交于
      When tc actions are loaded as a module and no actions have been installed,
      flushing them would result in actions removed from the memory, but modules
      reference count not being decremented, so that the modules would not be
      unloaded.
      
      Following is example with GACT action:
      
      % sudo modprobe act_gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions ls action gact
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  1
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  2
      % sudo rmmod act_gact
      rmmod: ERROR: Module act_gact is in use
      ....
      
      After the fix:
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions add action pass index 1
      % sudo tc actions add action pass index 2
      % sudo tc actions add action pass index 3
      % lsmod
      Module                  Size  Used by
      act_gact               16384  3
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      %
      % sudo tc actions flush action gact
      % lsmod
      Module                  Size  Used by
      act_gact               16384  0
      % sudo rmmod act_gact
      % lsmod
      Module                  Size  Used by
      %
      
      Fixes: f97017cd ("net-sched: Fix actions flushing")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edb9d1bf
    • X
      ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt · 99253eb7
      Xin Long 提交于
      Commit 5e1859fb ("ipv4: ipmr: various fixes and cleanups") fixed
      the issue for ipv4 ipmr:
      
        ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
        access/set raw_sk(sk)->ipmr_table before making sure the socket
        is a raw socket, and protocol is IGMP
      
      The same fix should be done for ipv6 ipmr as well.
      
      This patch can fix the panic caused by overwriting the same offset
      as ipmr_table as in raw_sk(sk) when accessing other type's socket
      by ip_mroute_setsockopt().
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99253eb7
    • X
      sctp: set sin_port for addr param when checking duplicate address · 2e3ce5bc
      Xin Long 提交于
      Commit b8607805 ("sctp: not copying duplicate addrs to the assoc's
      bind address list") tried to check for duplicate address before copying
      to asoc's bind_addr list from global addr list.
      
      But all the addrs' sin_ports in global addr list are 0 while the addrs'
      sin_ports are bp->port in asoc's bind_addr list. It means even if it's
      a duplicate address, af->cmp_addr will still return 0 as the their
      sin_ports are different.
      
      This patch is to fix it by setting the sin_port for addr param with
      bp->port before comparing the addrs.
      
      Fixes: b8607805 ("sctp: not copying duplicate addrs to the assoc's bind address list")
      Reported-by: NWei Chen <weichen@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e3ce5bc
    • J
      ipv4: mask tos for input route · 6e28099d
      Julian Anastasov 提交于
      Restore the lost masking of TOS in input route code to
      allow ip rules to match it properly.
      
      Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>
      
      [1] http://marc.info/?t=137331755300040&r=1&w=2
      
      Fixes: 89aef892 ("ipv4: Delete routing cache.")
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e28099d
    • J
      ipv4: add missing initialization for flowi4_uid · 8bcfd092
      Julian Anastasov 提交于
      Avoid matching of random stack value for uid when rules
      are looked up on input route or when RP filter is used.
      Problem should affect only setups that use ip rules with
      uid range.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bcfd092
  2. 25 2月, 2017 6 次提交
    • Z
      rds: fix memory leak error · 3b5923f0
      Zhu Yanjun 提交于
      When the function register_netdevice_notifier fails, the memory
      allocated by kmem_cache_create should be freed by the function
      kmem_cache_destroy.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b5923f0
    • D
      vti6: return GRE_KEY for vti6 · 7dcdf941
      David Forster 提交于
      Align vti6 with vti by returning GRE_KEY flag. This enables iproute2
      to display tunnel keys on "ip -6 tunnel show"
      Signed-off-by: NDavid Forster <dforster@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7dcdf941
    • M
      rxrpc: Fix an assertion in rxrpc_read() · 774521f3
      Marc Dionne 提交于
      In the rxrpc_read() function, which allows a user to read the contents of a
      key, we miscalculate the expected length of an encoded rxkad token by not
      taking into account the key length.  However, the data is stored later
      anyway with an ENCODE_DATA() call - and an assertion failure then ensues
      when the lengths are checked at the end.
      
      Fix this by including the key length in the token size estimation.
      
      The following assertion is produced:
      
      Assertion failed - 384(0x180) == 380(0x17c) is false
      ------------[ cut here ]------------
      kernel BUG at ../net/rxrpc/key.c:1221!
      invalid opcode: 0000 [#1] SMP
      Modules linked in:
      CPU: 2 PID: 2957 Comm: keyctl Not tainted 4.10.0-fscache+ #483
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804013a8500 task.stack: ffff8804013ac000
      RIP: 0010:rxrpc_read+0x10de/0x11b6
      RSP: 0018:ffff8804013afe48 EFLAGS: 00010296
      RAX: 000000000000003b RBX: 0000000000000003 RCX: 0000000000000000
      RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
      RBP: ffff8804013afed8 R08: 0000000000000001 R09: 0000000000000001
      R10: ffff8804013afd90 R11: 0000000000000002 R12: 00005575f7c911b4
      R13: 00005575f7c911b3 R14: 0000000000000157 R15: ffff880408a5d640
      FS:  00007f8dfbc73700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005575f7c91008 CR3: 000000040120a000 CR4: 00000000001406e0
      Call Trace:
       keyctl_read_key+0xb6/0xd7
       SyS_keyctl+0x83/0xe7
       do_syscall_64+0x80/0x191
       entry_SYSCALL64_slow_path+0x25/0x25
      Signed-off-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      774521f3
    • J
      tipc: move premature initilalization of stack variables · 681a55d7
      Jon Paul Maloy 提交于
      In the function tipc_rcv() we initialize a couple of stack variables
      from the message header before that same header has been validated.
      In rare cases when the arriving header is non-linar, the validation
      function itself may linearize the buffer by calling skb_may_pull(),
      while the wrongly initialized stack fields are not updated accordingly.
      
      We fix this in this commit.
      Reported-by: NMatthew Wong <mwong@sonusnet.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      681a55d7
    • W
      RDS: IB: fix ifnullfree.cocci warnings · 77cc7aee
      Wu Fengguang 提交于
      net/rds/ib.c:115:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
      
       NULL check before some freeing functions is not needed.
      
       Based on checkpatch warning
       "kfree(NULL) is safe this check is probably not required"
       and kfreeaddr.cocci by Julia Lawall.
      
      Generated by: scripts/coccinelle/free/ifnullfree.cocci
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77cc7aee
    • M
      sctp: deny peeloff operation on asocs with threads sleeping on it · dfcb9f4f
      Marcelo Ricardo Leitner 提交于
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, then in such case it would return without
      locking back the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfcb9f4f
  3. 24 2月, 2017 1 次提交
  4. 23 2月, 2017 4 次提交
    • A
      tcp: account for ts offset only if tsecr not zero · eee2faab
      Alexey Kodanev 提交于
      We can get SYN with zero tsecr, don't apply offset in this case.
      
      Fixes: ee684b6f ("tcp: send packets with a socket timestamp")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eee2faab
    • A
      tcp: setup timestamp offset when write_seq already set · 00355fa5
      Alexey Kodanev 提交于
      Found that when randomized tcp offsets are enabled (by default)
      TCP client can still start new connections without them. Later,
      if server does active close and re-uses sockets in TIME-WAIT
      state, new SYN from client can be rejected on PAWS check inside
      tcp_timewait_state_process(), because either tw_ts_recent or
      rcv_tsval doesn't really have an offset set.
      
      Here is how to reproduce it with LTP netstress tool:
          netstress -R 1 &
          netstress -H 127.0.0.1 -lr 1000000 -a1
      
          [...]
          < S  seq 1956977072 win 43690 TS val 295618 ecr 459956970
          > .  ack 1956911535 win 342 TS val 459967184 ecr 1547117608
          < R  seq 1956911535 win 0 length 0
      +1. < S  seq 1956977072 win 43690 TS val 296640 ecr 459956970
          > S. seq 657450664 ack 1956977073 win 43690 TS val 459968205 ecr 296640
      
      Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00355fa5
    • A
      net/dccp: fix use after free in tw_timer_handler() · ec7cb62d
      Andrey Ryabinin 提交于
      DCCP doesn't purge timewait sockets on network namespace shutdown.
      So, after net namespace destroyed we could still have an active timer
      which will trigger use after free in tw_timer_handler():
      
          BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10
          Read of size 8 by task swapper/1/0
          Call Trace:
           __asan_load8+0x54/0x90
           tw_timer_handler+0x4a/0xa0
           call_timer_fn+0x127/0x480
           expire_timers+0x1db/0x2e0
           run_timer_softirq+0x12f/0x2a0
           __do_softirq+0x105/0x5b4
           irq_exit+0xdd/0xf0
           smp_apic_timer_interrupt+0x57/0x70
           apic_timer_interrupt+0x90/0xa0
      
          Object at ffff88010e0d1bc0, in cache net_namespace size: 6848
          Allocated:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xee/0x180
           kasan_slab_alloc+0x12/0x20
           kmem_cache_alloc+0x134/0x310
           copy_net_ns+0x8d/0x280
           create_new_namespaces+0x23f/0x340
           unshare_nsproxy_namespaces+0x75/0xf0
           SyS_unshare+0x299/0x4f0
           entry_SYSCALL_64_fastpath+0x18/0xad
          Freed:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xae/0x180
           kmem_cache_free+0xb4/0x350
           net_drop_ns+0x3f/0x50
           cleanup_net+0x3df/0x450
           process_one_work+0x419/0xbb0
           worker_thread+0x92/0x850
           kthread+0x192/0x1e0
           ret_from_fork+0x2e/0x40
      
      Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
      timewait sockets on net namespace destruction and prevent above issue.
      
      Fixes: f2bf415c ("mib: add net to NET_ADD_STATS_BH")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec7cb62d
    • R
      l2tp: Avoid schedule while atomic in exit_net · 12d656af
      Ridge Kennedy 提交于
      While destroying a network namespace that contains a L2TP tunnel a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: NRidge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d656af
  5. 22 2月, 2017 7 次提交
  6. 21 2月, 2017 2 次提交
  7. 20 2月, 2017 5 次提交
    • X
      sctp: add support for MSG_MORE · 4ea0c32f
      Xin Long 提交于
      This patch is to add support for MSG_MORE on sctp.
      
      It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
      creating datamsg according to the send flag. sctp_packet_can_append_data
      then uses it to decide if the chunks of this msg will be sent at once or
      delay it.
      
      Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
      in assoc. As sctp enqueues the chunks first, then dequeue them one by
      one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
      affect other chunks' bundling.
      
      Since last patch, sctp flush out queue once assoc state falls into
      SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
      solved as well.
      
      [1] https://patchwork.ozlabs.org/patch/372404/Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ea0c32f
    • X
      sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING · b7018d0b
      Xin Long 提交于
      This patch is to flush out queue when assoc state falls into
      SHUTDOWN_PENDING if there are still chunks in it, so that the
      data can be sent out as soon as possible before sending SHUTDOWN
      chunk.
      
      When sctp supports MSG_MORE flag in next patch, this improvement
      can also solve the problem that the chunks with MSG_MORE flag
      may be stuck in queue when closing an assoc.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7018d0b
    • J
      openvswitch: Set event bit after initializing labels. · 2317c6b5
      Jarno Rajahalme 提交于
      Connlabels are included in conntrack netlink event messages only if
      the IPCT_LABEL bit is set in the event cache (see
      ctnetlink_conntrack_event()).  Set it after initializing labels for a
      new connection.
      
      Found upon further system testing, where it was noticed that labels
      were missing from the conntrack events.
      
      Fixes: 193e3096 ("openvswitch: Do not trigger events for unconfirmed connections.")
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2317c6b5
    • X
      sctp: check duplicate node before inserting a new transport · cd2b7087
      Xin Long 提交于
      sctp has changed to use rhlist for transport rhashtable since commit
      7fda702f ("sctp: use new rhlist interface on sctp transport
      rhashtable").
      
      But rhltable_insert_key doesn't check the duplicate node when inserting
      a node, unlike rhashtable_lookup_insert_key. It may cause duplicate
      assoc/transport in rhashtable. like:
      
       client (addr A, B)                 server (addr X, Y)
          connect to X           INIT (1)
                              ------------>
          connect to Y           INIT (2)
                              ------------>
                               INIT_ACK (1)
                              <------------
                               INIT_ACK (2)
                              <------------
      
      After sending INIT (2), one transport will be created and hashed into
      rhashtable. But when receiving INIT_ACK (1) and processing the address
      params, another transport will be created and hashed into rhashtable
      with the same addr Y and EP as the last transport. This will confuse
      the assoc/transport's lookup.
      
      This patch is to fix it by returning err if any duplicate node exists
      before inserting it.
      
      Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
      Reported-by: NFabio M. Di Nitto <fdinitto@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd2b7087
    • X
      sctp: add reconf chunk event · d884aa63
      Xin Long 提交于
      This patch is to add reconf chunk event based on the sctp event
      frame in rx path, it will call sctp_sf_do_reconf to process the
      reconf chunk.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d884aa63