  1. 17 Dec, 2013 1 commit
  2. 12 Dec, 2013 5 commits
  3. 11 Dec, 2013 10 commits
    • udp: ipv4: fix an use after free in __udp4_lib_rcv() · 8afdd99a
      Committed by Eric Dumazet
      Dave Jones reported a use after free in the UDP stack:
      
      [ 5059.434216] =========================
      [ 5059.434314] [ BUG: held lock freed! ]
      [ 5059.434420] 3.13.0-rc3+ #9 Not tainted
      [ 5059.434520] -------------------------
      [ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
      [ 5059.434815]  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435012] 3 locks held by named/863:
      [ 5059.435086]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940
      [ 5059.435295]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410
      [ 5059.435500]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435734]
      stack backtrace:
      [ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
      [ 5059.436052] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
      [ 5059.436223]  0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
      [ 5059.436390]  ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
      [ 5059.436554]  ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
      [ 5059.436718] Call Trace:
      [ 5059.436769]  <IRQ>  [<ffffffff8153a372>] dump_stack+0x4d/0x66
      [ 5059.436904]  [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160
      [ 5059.437037]  [<ffffffff8141c490>] ? __sk_free+0x110/0x230
      [ 5059.437147]  [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150
      [ 5059.437260]  [<ffffffff8141c490>] __sk_free+0x110/0x230
      [ 5059.437364]  [<ffffffff8141c5c9>] sk_free+0x19/0x20
      [ 5059.437463]  [<ffffffff8141cb25>] sock_edemux+0x25/0x40
      [ 5059.437567]  [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280
      [ 5059.437685]  [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.437805]  [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240
      [ 5059.437925]  [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70
      [ 5059.438038]  [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0
      [ 5059.438155]  [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00
      [ 5059.438269]  [<ffffffff8149d7f5>] udp_rcv+0x15/0x20
      [ 5059.438367]  [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410
      [ 5059.438492]  [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410
      [ 5059.438621]  [<ffffffff81468653>] ip_local_deliver+0x43/0x80
      [ 5059.438733]  [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0
      [ 5059.438843]  [<ffffffff81468926>] ip_rcv+0x296/0x3f0
      [ 5059.438945]  [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940
      [ 5059.439074]  [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940
      [ 5059.442231]  [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10
      [ 5059.442231]  [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60
      [ 5059.442231]  [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0
      [ 5059.442231]  [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0
      [ 5059.442231]  [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169]
      [ 5059.442231]  [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0
      [ 5059.442231]  [<ffffffff810478cd>] __do_softirq+0xed/0x240
      [ 5059.442231]  [<ffffffff81047e25>] irq_exit+0x125/0x140
      [ 5059.442231]  [<ffffffff81004241>] do_IRQ+0x51/0xc0
      [ 5059.442231]  [<ffffffff81542bef>] common_interrupt+0x6f/0x6f
      
      We need to keep a reference on the socket, by using skb_steal_sock()
      at the right place.
      
      Note that another patch is needed to fix a race in
      udp_sk_rx_dst_set(), as we hold no lock protecting the dst.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix up a spacing · b486b228
      Committed by wangweidong
      Fix up the spacing of proc_sctp_do_hmac_alg to match
      proc_sctp_do_rto_min[max] in sysctl.c.
      Suggested-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add check rto_min and rto_max in sysctl · 4f3fdf3b
      Committed by wangweidong
      rto_min must not exceed rto_max, and rto_max must not fall below
      rto_min. Add two proc_handlers to enforce this.
      Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: check the rto_min and rto_max in setsockopt · 85f935d4
      Committed by wangweidong
      When rto_min or rto_max is set to 0, leave the current value unchanged.
      Also reject the case where rto_min > rto_max.
      Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
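The rule described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct rto_cfg` and `rto_update()` are hypothetical names, and returning `-EINVAL` for an inverted range is an assumption about how the rejection is reported.

```c
#include <errno.h>

/* Hypothetical container for the two timeouts (not a kernel type). */
struct rto_cfg {
    unsigned long rto_min;
    unsigned long rto_max;
};

/*
 * Apply new RTO bounds. A value of 0 means "keep the current value".
 * Returns 0 on success, or -EINVAL (assumed error code) if the resulting
 * minimum would exceed the resulting maximum. On failure cfg is untouched.
 */
int rto_update(struct rto_cfg *cfg, unsigned long new_min, unsigned long new_max)
{
    unsigned long rto_min = new_min ? new_min : cfg->rto_min;
    unsigned long rto_max = new_max ? new_max : cfg->rto_max;

    if (rto_min > rto_max)
        return -EINVAL;

    cfg->rto_min = rto_min;
    cfg->rto_max = rto_max;
    return 0;
}
```

Validating against the merged values, rather than only the values passed in, is what catches a request that changes just one bound and leaves it on the wrong side of the other.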
    • ipv6: do not erase dst address with flow label destination · ce7a3bdf
      Committed by Florent Fourcot
      This patch follows b579035f
          "ipv6: remove old conditions on flow label sharing"

      Since there is no reason to restrict a label to a
      destination, we should not overwrite the destination value of a
      socket with the value contained in the flow label storage.

      This patch makes it possible to actually use the same flow label for
      more than one destination.
      Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: properly latch and use autoclose value from sock to association · 9f70f46b
      Committed by Neil Horman
      Currently, sctp associations latch a socket's autoclose value to an association
      at association init time, subject to capping constraints from the max_autoclose
      sysctl value.  This leads to an odd situation where an application may set a
      socket level autoclose timeout, but sctp will silently limit the autoclose
      timeout to something less than that.

      Fix this by modifying the autoclose setsockopt function to check the limit, cap
      it and warn the user via syslog that the timeout is capped.  This will allow
      getsockopt to return valid autoclose timeout values that reflect what subsequent
      associations actually use.

      While we're at it, also eliminate the assoc->autoclose variable. It duplicates
      what's in the timeout array, which leads to multiple sources for the same
      information that may differ (as the former isn't subject to any capping).  This
      gives us the timeout information in a canonical place and saves some space in
      the association structure as well.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      CC: Wang Weidong <wangweidong1@huawei.com>
      CC: David Miller <davem@davemloft.net>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: protect handler_enabled variable with qitem_lock spin lock · 00ede977
      Committed by Ying Xue
      'handler_enabled' is a global flag indicating whether the TIPC
      signal handling service is enabled or not. The lack of lock
      protection for this flag creates a race in which
      a tipc_k_signal() call might queue a signal handler to a destroyed
      signal queue, with unpredictable results. To correct this, we let
      the already existing 'qitem_lock' protect the flag, as it already
      does with the queue itself. This way, we ensure that the flag
      is always consistent across all cores.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
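The locking pattern described here, one lock covering both the enable flag and the queue, can be illustrated with a small userspace sketch. It uses a POSIX mutex in place of the kernel spin lock, a counter in place of the real signal queue, and hypothetical `sketch_` function names; none of this is the TIPC code itself.

```c
#include <pthread.h>
#include <stdbool.h>

/* One lock guards both the flag and the queue, mirroring the commit's idea. */
static pthread_mutex_t qitem_lock = PTHREAD_MUTEX_INITIALIZER;
static bool handler_enabled;   /* is the signal service running? */
static int queued_items;       /* stand-in for the real signal queue */

/* Enable the service. */
void sketch_handler_start(void)
{
    pthread_mutex_lock(&qitem_lock);
    handler_enabled = true;
    pthread_mutex_unlock(&qitem_lock);
}

/* Disable the service and drop anything still queued, under the same lock. */
void sketch_handler_stop(void)
{
    pthread_mutex_lock(&qitem_lock);
    handler_enabled = false;
    queued_items = 0;
    pthread_mutex_unlock(&qitem_lock);
}

/*
 * Queue one item. The flag is tested while holding the lock, so a
 * concurrent stop cannot slip in between the test and the enqueue.
 * Returns 0 on success, -1 if the service is disabled.
 */
int sketch_k_signal(void)
{
    int ret = -1;

    pthread_mutex_lock(&qitem_lock);
    if (handler_enabled) {
        queued_items++;
        ret = 0;
    }
    pthread_mutex_unlock(&qitem_lock);
    return ret;
}
```

Because the check and the enqueue happen inside one critical section, an item can never be added after the stop path has cleared the queue.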
    • tipc: correct the order of stopping services at rmmod · 993b858e
      Committed by Jon Paul Maloy
      The 'signal handler' service in TIPC is a mechanism that makes it
      possible to postpone execution of functions, by launching them into
      a job queue for execution in a separate tasklet, independent of
      the launching execution thread.

      When we do rmmod on the tipc module, this service is stopped after
      the network service. At the same time, the stopping of the network
      service may itself launch jobs for execution, with the risk that these
      functions may be scheduled for execution after the data structures
      meant to be accessed by the job have already been deleted. We have
      seen this happen, most often resulting in an oops.

      This commit ensures that the signal handler is the very first to be
      stopped when TIPC is shut down, so there are no surprises during
      the cleanup of the other services.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: unix: allow set_peek_off to fail · 12663bfc
      Committed by Sasha Levin
      unix_dgram_recvmsg() holds the readlock of the socket until recv
      is complete.

      Meanwhile, a setsockopt(SO_PEEK_OFF) call will hang until
      unix_dgram_recvmsg() completes (which can take a while), with no way
      to break out of it, triggering a hung task spew.

      Instead, allow set_peek_off to fail, so that userspace will not hang.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: fix NULL pointer Oops in fib(6)_rule_suppress · 673498b8
      Committed by Stefan Tomanek
      This change ensures that the routing entry investigated by the suppress
      function actually does point to a device struct before following that pointer,
      fixing a possible kernel oops when verifying the interface group
      associated with a routing table entry.

      According to Daniel Golle, this Oops can be triggered by a user process trying
      to establish an outgoing IPv6 connection while having no real IPv6 connectivity
      set up (only autoassigned link-local addresses).

      Fixes: 6ef94cfa ("fib_rules: add route suppression based on ifgroup")
      Reported-by: Daniel Golle <daniel.golle@gmail.com>
      Tested-by: Daniel Golle <daniel.golle@gmail.com>
      Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
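The guard this commit describes amounts to testing the device pointer before reading the interface group through it. The sketch below shows that shape in userspace C. The `sketch_` types and function are hypothetical stand-ins rather than the kernel structures, and treating `-1` as "no group filter configured" is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device and routing entry. */
struct sketch_dev {
    int group;                      /* interface group of the device */
};

struct sketch_route {
    const struct sketch_dev *dev;   /* may be NULL for some routes */
};

/*
 * Decide whether a route should be suppressed because of its interface
 * group. The device pointer is checked before it is dereferenced, so a
 * route without a device is simply not matched instead of crashing.
 * Returns 1 to suppress, 0 otherwise.
 */
int sketch_suppress_by_ifgroup(const struct sketch_route *rt, int suppress_ifgroup)
{
    const struct sketch_dev *dev = rt->dev;

    if (suppress_ifgroup != -1 && dev && dev->group == suppress_ifgroup)
        return 1;
    return 0;
}
```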
  4. 10 Dec, 2013 3 commits
  5. 08 Dec, 2013 2 commits
    • netfilter: nf_tables: fix missing rules flushing per table · cf9dc09d
      Committed by Pablo Neira Ayuso
      This patch allows you to atomically remove all rules stored in
      a table via the NFT_MSG_DELRULE command. To flush all rules stored
      in a table, specify only the table and no chain.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_hashlimit: fix proc entry leak in netns destroy path · b4ef4ce0
      Committed by Sergey Popovich
      In (32263dd1 netfilter: xt_hashlimit: fix namespace destroy path)
      the hashlimit_net_exit() function is always called right before
      hashlimit_mt_destroy() to release netns data. If you use xt_hashlimit
      with IPv4 and IPv6 together, this produces the following splat via
      netconsole in the netns destroy path:
      
       Pid: 9499, comm: kworker/u:0 Tainted: G        WC O 3.2.0-5-netctl-amd64-core2
       Call Trace:
        [<ffffffff8104708d>] ? warn_slowpath_common+0x78/0x8c
        [<ffffffff81047139>] ? warn_slowpath_fmt+0x45/0x4a
        [<ffffffff81144a99>] ? remove_proc_entry+0xd8/0x22e
        [<ffffffff810ebbaa>] ? kfree+0x5b/0x6c
        [<ffffffffa043c501>] ? hashlimit_net_exit+0x45/0x8d [xt_hashlimit]
        [<ffffffff8128ab30>] ? ops_exit_list+0x1c/0x44
        [<ffffffff8128b28e>] ? cleanup_net+0xf1/0x180
        [<ffffffff810369fc>] ? should_resched+0x5/0x23
        [<ffffffff8105b8f9>] ? process_one_work+0x161/0x269
        [<ffffffff8105aea5>] ? cwq_activate_delayed_work+0x3c/0x48
        [<ffffffff8105c8c2>] ? worker_thread+0xc2/0x145
        [<ffffffff8105c800>] ? manage_workers.isra.25+0x15b/0x15b
        [<ffffffff8105fa01>] ? kthread+0x76/0x7e
        [<ffffffff813581f4>] ? kernel_thread_helper+0x4/0x10
        [<ffffffff8105f98b>] ? kthread_worker_fn+0x139/0x139
        [<ffffffff813581f0>] ? gs_change+0x13/0x13
       ---[ end trace d8c3cc0ad163ef79 ]---
       ------------[ cut here ]------------
       WARNING: at /usr/src/linux-3.2.52/debian/build/source_netctl/fs/proc/generic.c:849
       remove_proc_entry+0x217/0x22e()
       Hardware name:
       remove_proc_entry: removing non-empty directory 'net/ip6t_hashlimit', leaking at least 'IN-REJECT'
      
      This is due to the missing removal of net/ip6t_hashlimit/* entries in
      hashlimit_proc_net_exit(), since only IPv4 entries are deleted. Fix
      it by always removing the IPv4 and IPv6 entries and their parent
      directories in the netns destroy path.
      Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 07 Dec, 2013 1 commit
  7. 06 Dec, 2013 8 commits
  8. 04 Dec, 2013 1 commit
    • rds: prevent BUG_ON triggered on congestion update to loopback · 18fc25c9
      Committed by Venkat Venkatsubra
      After a congestion update on a local connection, when rds_ib_xmit returns
      fewer bytes than there are in the message, rds_send_xmit calls
      rds_ib_xmit again with an offset that causes BUG_ON(off % RDS_FRAG_SIZE)
      to trigger.

      For a 4 KB PAGE_SIZE, rds_ib_xmit returns min(8240,4096)=4096 when the
      message actually contains 8240 bytes. rds_send_xmit thinks there is more
      to send and calls rds_ib_xmit again with a data offset "off" of
      4096-48 (rds header) = 4048 bytes, thus hitting the
      BUG_ON(off % RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
      
      The commit 6094628b
      "rds: prevent BUG_ON triggering on congestion map updates" introduced
      this regression. That change was addressing the triggering of a different
      BUG_ON in rds_send_xmit() on the PowerPC architecture with a 64 KB PAGE_SIZE:
          BUG_ON(ret != 0 &&
                 conn->c_xmit_sg == rm->data.op_nents);
      This was the sequence it was going through:
      (rds_ib_xmit)
      /* Do not send cong updates to IB loopback */
      if (conn->c_loopback
          && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
          rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
          return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
      }
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
         		 = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
      rds_ib_xmit returns 8240
      On this iteration this sequence causes the BUG_ON in rds_send_xmit:
          while (ret) {
          	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
          	[tmp = 65536 - 57632 = 7904]
          	conn->c_xmit_data_off += tmp;
          	[c_xmit_data_off = 57632 + 7904 = 65536]
          	ret -= tmp;
          	[ret = 8240 - 7904 = 336]
          	if (conn->c_xmit_data_off == sg->length) {
          		conn->c_xmit_data_off = 0;
          		sg++;
          		conn->c_xmit_sg++;
          		BUG_ON(ret != 0 &&
          			conn->c_xmit_sg == rm->data.op_nents);
          		[c_xmit_sg = 1, rm->data.op_nents = 1]
      
      What the current fix does:
      Since the congestion update over loopback is not actually transmitted
      as a message, all that rds_ib_xmit needs to do is let the caller think
      the full message has been transmitted and not return partial bytes.
      It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4 KB,
      and 64 KB+48 when PAGE_SIZE is 64 KB.
      Reported-by: Josh Hunt <joshhunt00@gmail.com>
      Tested-by: Honggang Li <honli@redhat.com>
      Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
      Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
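The byte accounting walked through in this commit message can be replayed with a small userspace model. This is a sketch under stated assumptions, not the RDS code: `first_bug_call()` is a hypothetical helper, every scatterlist entry is assumed to have the same length, and each call to the transmit function is assumed to return the same byte count, with the 48-byte header subtracted only on the first call as the message describes.

```c
#define RDS_HEADER_BYTES 48 /* header size quoted in the commit message */

/*
 * Model the offset bookkeeping that rds_send_xmit() performs after each
 * rds_ib_xmit() call, using plain integers.
 *
 * sg_len         length of each scatterlist entry (assumed uniform)
 * nents          number of scatterlist entries in the message
 * bytes_per_call value every rds_ib_xmit() call is assumed to return
 * leftover       receives the unconsumed bytes when the check fails
 *
 * Returns the 1-based call on which
 *   ret != 0 && c_xmit_sg == op_nents
 * would hold, 0 if the message completes cleanly, -1 on invalid input.
 */
int first_bug_call(int sg_len, int nents, int bytes_per_call, int *leftover)
{
    int off = 0;   /* models conn->c_xmit_data_off */
    int sg = 0;    /* models conn->c_xmit_sg */
    int call = 0;

    *leftover = 0;
    if (sg_len <= 0 || nents <= 0 || bytes_per_call <= RDS_HEADER_BYTES)
        return -1;

    while (sg < nents) {
        int ret = bytes_per_call;

        call++;
        if (call == 1)
            ret -= RDS_HEADER_BYTES; /* header accounted only once */

        while (ret) {
            int room = sg_len - off;
            int tmp = ret < room ? ret : room;

            off += tmp;
            ret -= tmp;
            if (off == sg_len) {
                off = 0;
                sg++;
                if (ret != 0 && sg == nents) {
                    *leftover = ret;
                    return call; /* the BUG_ON condition holds here */
                }
            }
        }
    }
    return 0;
}
```

With one 65536-byte entry and 8240 bytes per call, the model reaches the failing check on the eighth call with 336 bytes left over, matching the figures in the message. Returning 65536 + 48 in a single call, as the fix does for 64 KB pages, completes cleanly, and so does 8240 with two 4096-byte entries.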
  9. 03 Dec, 2013 3 commits
  10. 02 Dec, 2013 3 commits
  11. 01 Dec, 2013 3 commits