1. 30 8月, 2012 12 次提交
  2. 27 8月, 2012 1 次提交
  3. 23 8月, 2012 8 次提交
    • P
      packet: Protect packet sk list with mutex (v2) · 0fa7fa98
      Pavel Emelyanov 提交于
      Change since v1:
      
      * Fixed inuse counters access spotted by Eric
      
      In patch eea68e2f (packet: Report socket mclist info via diag module) I've
      introduced a "scheduling in atomic" problem in packet diag module -- the
      socket list is traversed under rcu_read_lock() while performed under it sk
      mclist access requires rtnl lock (i.e. -- mutex) to be taken.
      
      [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
      [152363.820573] 4 locks held by crtools/12517:
      [152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
      [152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
      [152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
      [152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
      
      Similar thing was then re-introduced by further packet diag patches (fanount
      mutex and pgvec mutex for rings) :(
      
      Apart from being terribly sorry for the above, I propose to change the packet
      sk list protection from spinlock to mutex. This lock currently protects two
      modifications:
      
      * sklist
      * prot inuse counters
      
      The sklist modifications can be just reprotected with mutex since they already
      occur in a sleeping context. The inuse counters modifications are trickier -- the
      __this_cpu_-s are used inside, thus requiring the caller to handle the potential
      issues with contexts himself. Since packet sockets' counters are modified in two
      places only (packet_create and packet_release) we only need to protect the context
      from being preempted. BH disabling is not required in this case.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fa7fa98
    • D
      af_packet: use define instead of constant · 9e67030a
      danborkmann@iogearbox.net 提交于
      Instead of using a hard-coded value for the status variable, it would make
      the code more readable to use its destined define from linux/if_packet.h.
      
      Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e67030a
    • Y
      rds: Don't disable BH on BH context · bfdc587c
      Ying Xue 提交于
      Since we have already in BH context when *_write_space(),
      *_data_ready() as well as *_state_change() are called, it's
      unnecessary to disable BH.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfdc587c
    • E
      ipv6: gre: fix ip6gre_err() · b87fb39e
      Eric Dumazet 提交于
      ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
      not 40 as expected), and should take into account 'offset' parameter.
      
      Also uses pskb_may_pull() to cope with some fragged skbs
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b87fb39e
    • E
      xfrm: fix RCU bugs · ef8531b6
      Eric Dumazet 提交于
      This patch reverts commit 56892261 (xfrm: Use rcu_dereference_bh to
      deference pointer protected by rcu_read_lock_bh), and fixes bugs
      introduced in commit 418a99ac ( Replace rwlock on xfrm_policy_afinfo
      with rcu )
      
      1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH
      
      2) We must defer some writes after the synchronize_rcu() call or a reader
       can crash dereferencing NULL pointer.
      
      3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
      context, we no longer need to block BH in xfrm_policy_register_afinfo()
      and xfrm_policy_unregister_afinfo()
      
      4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
      xfrm_policy_unregister_afinfo()
      
      5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
        and also move xfrm_policy_get_afinfo() declaration.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Fan Du <fan.du@windriver.com>
      Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef8531b6
    • E
      net: remove delay at device dismantle · 0115e8e3
      Eric Dumazet 提交于
      I noticed extra one second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (but one second) in netdev_wait_allrefs() before
      kicking again NETDEV_UNREGISTER.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care !
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0115e8e3
    • J
      netfilter: remove unnecessary goto statement for error recovery · 90efbed1
      Jean Sacren 提交于
      Usually it's a good practice to use goto statement for error recovery
      when initializing the module. This approach could be an overkill if:
      
       1) there is only one fail case;
       2) success and failure use the same return statement.
      
      For a cleaner approach, remove the unnecessary goto statement and
      directly implement error recovery.
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      90efbed1
    • M
      netfilter: replace list_for_each_continue_rcu with new interface · 6705e867
      Michael Wang 提交于
      This patch replaces list_for_each_continue_rcu() with
      list_for_each_entry_continue_rcu() to allow removing
      list_for_each_continue_rcu().
      Signed-off-by: NMichael Wang <wangyun@linux.vnet.ibm.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      6705e867
  4. 22 8月, 2012 4 次提交
    • E
      af_netlink: force credentials passing [CVE-2012-3520] · e0e3cea4
      Eric Dumazet 提交于
      Pablo Neira Ayuso discovered that avahi and
      potentially NetworkManager accept spoofed Netlink messages because of a
      kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
      to the receiver if the sender did not provide such data, instead of not
      including any such data at all or including the correct data from the
      peer (as it is the case with AF_UNIX).
      
      This bug was introduced in commit 16e57262
      (af_unix: dont send SCM_CREDENTIALS by default)
      
      This patch forces passing credentials for netlink, as
      before the regression.
      
      Another fix would be to not add SCM_CREDENTIALS in
      netlink messages if not provided by the sender, but it
      might break some programs.
      
      With help from Florian Weimer & Petr Matousek
      
      This issue is designated as CVE-2012-3520
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0e3cea4
    • E
      ipv4: fix ip header ident selection in __ip_make_skb() · a9915a1b
      Eric Dumazet 提交于
      Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
      memory in __ip_select_ident().
      
      It turns out that __ip_make_skb() called ip_select_ident() before
      properly initializing iph->daddr.
      
      This is a bug uncovered by commit 1d861aa4 (inet: Minimize use of
      cached route inetpeer.)
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131Reported-by: NChristian Casteyde <casteyde.christian@free.fr>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9915a1b
    • C
      ipv4: Use newinet->inet_opt in inet_csk_route_child_sock() · 1a7b27c9
      Christoph Paasch 提交于
      Since 0e734419 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
      TCP."), inet_csk_route_child_sock() is called instead of
      inet_csk_route_req().
      
      However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
      ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
      Thus, inside inet_csk_route_child_sock() opt is always NULL and the
      SRR-options are not respected anymore.
      Packets sent by the server won't have the correct destination-IP.
      
      This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
      inside inet_csk_route_child_sock().
      Reported-by: NLuca Boccassi <luca.boccassi@gmail.com>
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a7b27c9
    • E
      tcp: fix possible socket refcount problem · 144d56e9
      Eric Dumazet 提交于
      Commit 6f458dfb (tcp: improve latencies of timer triggered events)
      added bug leading to following trace :
      
      [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
      [ 2866.131726]
      [ 2866.132188] =========================
      [ 2866.132281] [ BUG: held lock freed! ]
      [ 2866.132281] 3.6.0-rc1+ #622 Not tainted
      [ 2866.132281] -------------------------
      [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
      [ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
      [ 2866.132281] 4 locks held by kworker/0:1/652:
      [ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
      [ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
      [ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
      [ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
      [ 2866.132281]
      [ 2866.132281] stack backtrace:
      [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
      [ 2866.132281] Call Trace:
      [ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
      [ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
      [ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
      [ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
      [ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
      [ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
      [ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
      [ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
      [ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
      [ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
      [ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
      [ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
      [ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
      [ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
      [ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
      [ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
      [ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
      [ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
      [ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
      [ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
      [ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
      [ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
      [ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
      [ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
      [ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
      [ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
      [ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
      [ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
      [ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
      [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
      [ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
      [ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
      [ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
      [ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
      [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
      [ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
      [ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
      [ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
      [ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
      [ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
      [ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
      [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
      [ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
      [ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
      [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
      [ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
      [ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
      [ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
      [ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
      [ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
      [ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
      [ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
      [ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
      [ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
      [ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
      [ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
      [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
      [ 2866.309689] =============================================================================
      [ 2866.310254] BUG TCP (Not tainted): Object already free
      [ 2866.310254] -----------------------------------------------------------------------------
      [ 2866.310254]
      
      The bug comes from the fact that timer set in sk_reset_timer() can run
      before we actually do the sock_hold(). socket refcount reaches zero and
      we free the socket too soon.
      
      timer handler is not allowed to reduce socket refcnt if socket is owned
      by the user, or we need to change sk_reset_timer() implementation.
      
      We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
      or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags
      
      Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
      was used instead of TCP_DELACK_TIMER_DEFERRED.
      
      For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
      even if not fired from a timer.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      144d56e9
  5. 20 8月, 2012 15 次提交
    • P
      netfilter: nf_conntrack: remove unnecessary RTNL locking · 2834a638
      Patrick McHardy 提交于
      Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
      for walking the network namespace list. This is not done anymore since
      we have proper namespace support in the protocols now, so we don't
      need to take the RTNL anymore.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2834a638
    • P
      netfilter: sparse endian fixes · fe31d1a8
      Patrick McHardy 提交于
      Fix a couple of endian annotation in net/netfilter:
      
      net/netfilter/nfnetlink_acct.c:82:30: warning: cast to restricted __be64
      net/netfilter/nfnetlink_acct.c:86:30: warning: cast to restricted __be64
      net/netfilter/nfnetlink_cthelper.c:77:28: warning: cast to restricted __be16
      net/netfilter/xt_NFQUEUE.c:46:16: warning: restricted __be32 degrades to integer
      net/netfilter/xt_NFQUEUE.c:60:34: warning: restricted __be32 degrades to integer
      net/netfilter/xt_NFQUEUE.c:68:34: warning: restricted __be32 degrades to integer
      net/netfilter/xt_osf.c:272:55: warning: cast to restricted __be16
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      fe31d1a8
    • N
      net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() · fae6ef87
      Neal Cardwell 提交于
      This commit removes the sk_rx_dst_set calls from
      tcp_create_openreq_child(), because at that point the icsk_af_ops
      field of ipv6_mapped TCP sockets has not been set to its proper final
      value.
      
      Instead, to make sure we get the right sk_rx_dst_set variant
      appropriate for the address family of the new connection, we have
      tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
      shortly after the call to tcp_create_openreq_child() returns.
      
      This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
      with the new approach.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Reported-by: NArtem Savkov <artem.savkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fae6ef87
    • R
      net/core/dev.c: fix kernel-doc warning · 3de7a37b
      Randy Dunlap 提交于
      Fix kernel-doc warning:
      
      Warning(net/core/dev.c:5745): No description found for parameter 'dev'
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Cc:	"David S. Miller" <davem@davemloft.net>
      Cc:	netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3de7a37b
    • P
      net: ipv6: fix oops in inet_putpeer() · 9d7b0fc1
      Patrick McHardy 提交于
      Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
      a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
      initialized, causing a false positive result from inetpeer_ptr_is_peer(),
      which in turn causes a NULL pointer dereference in inet_putpeer().
      
      Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
      EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
      EIP is at inet_putpeer+0xe/0x16
      EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
      ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
       DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
      CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
      DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      DR6: ffff0ff0 DR7: 00000400
       f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
       f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
       f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
       [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
       [<c038d5f7>] dst_destroy+0x1d/0xa4
       [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
       [<c0396870>] flow_cache_gc_task+0x85/0x9f
       [<c0142d2b>] process_one_work+0x122/0x441
       [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
       [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
       [<c0143e2d>] worker_thread+0x113/0x3cc
      
      Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
      properly initialize the dst's peer pointer.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d7b0fc1
    • J
      caif: Do not dereference NULL in chnl_recv_cb() · d92c7f8a
      Jesper Juhl 提交于
      In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
      which may return NULL, but we do not check for a NULL pointer before
      dereferencing it.
      This patch adds such a NULL check and properly free's allocated memory
      and return an error (-EINVAL) on failure - much better than crashing..
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Acked-by: NSjur Brændeland <sjur.brandeland@stericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d92c7f8a
    • E
      af_packet: don't emit packet on orig fanout group · c0de08d0
      Eric Leblond 提交于
      If a packet is emitted on one socket in one group of fanout sockets,
      it is transmitted again. It is thus read again on one of the sockets
      of the fanout group. This result in a loop for software which
      generate packets when receiving one.
      This retransmission is not the intended behavior: a fanout group
      must behave like a single socket. The packet should not be
      transmitted on a socket if it originates from a socket belonging
      to the same fanout group.
      
      This patch fixes the issue by changing the transmission check to
      take fanout group info account.
      Reported-by: NAleksandr Kotov <a1k@mail.ru>
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0de08d0
    • Y
      tipc: eliminate configuration for maximum number of name publications · e6a04b1d
      Ying Xue 提交于
      Gets rid of the need for users to specify the maximum number of
      name publications supported by TIPC. TIPC now automatically provides
      support for the maximum number of name publications to 65535.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6a04b1d
    • Y
      tipc: eliminate configuration for maximum number of name subscriptions · 34f256cc
      Ying Xue 提交于
      Gets rid of the need for users to specify the maximum number of
      name subscriptions supported by TIPC. TIPC now automatically provides
      support for the maximum number of name subscriptions to 65535.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34f256cc
    • Y
      tipc: add __read_mostly annotations to several global variables · 61cdd4d8
      Ying Xue 提交于
      Added to the following:
      
       - tipc_random
       - tipc_own_addr
       - tipc_max_ports
       - tipc_net_id
       - tipc_remote_management
       - handler_enabled
      
      The above global variables are read often, but written rarely. Use
      __read_mostly to prevent them being on the same cacheline as another
      variable which is written to often, which would cause cacheline
      bouncing.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61cdd4d8
    • Y
      tipc: convert tipc_nametbl_size type from variable to macro · f046e7d9
      Ying Xue 提交于
      There is nothing changing this variable dynamically, so change
      it to a macro to make that more obvious when reading the code.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f046e7d9
    • Y
      tipc: change tipc_net_start routine return value type · 379c0456
      Ying Xue 提交于
      Since now tipc_net_start() always returns a success code - 0, its
      return value type should be changed from integer to void, which can
      avoid unnecessary check for its return value.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      379c0456
    • Y
      tipc: manually inline single use media_name_valid routine · 38129433
      Ying Xue 提交于
      After eliminating the mechanism which checks whether all letters
      in media name string are within a given character set, the
      media_name_valid routine becomes trivial.  It is also only
      used once, so it is unnecessary to keep it as a separate function.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38129433
    • Y
      tipc: remove pointless name sanity check and tipc_alphabet array · fc073938
      Ying Xue 提交于
      There is no real reason to check whether all letters in the given
      media name and network interface name are within the character set
      defined in tipc_alphabet array. Even if we eliminate the checking,
      the rest of checking conditions in tipc_enable_bearer() can ensure
      we do not enable an invalid or illegal bearer.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc073938
    • Y
      tipc: fix lockdep warning during bearer initialization · 4225a398
      Ying Xue 提交于
      When the lockdep validator is enabled, it will report the below
      warning when we enable a TIPC bearer:
      
      [ INFO: possible irq lock inversion dependency detected ]
      ---------------------------------------------------------
      Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(ptype_lock);
                                      local_irq_disable();
                                      lock(tipc_net_lock);
                                      lock(ptype_lock);
         <Interrupt>
         lock(tipc_net_lock);
      
        *** DEADLOCK ***
      
      the shortest dependencies between 2nd lock and 1st lock:
        -> (ptype_lock){+.+...} ops: 10 {
      [...]
      SOFTIRQ-ON-W at:
                            [<c1089418>] __lock_acquire+0x528/0x13e0
                            [<c108a360>] lock_acquire+0x90/0x100
                            [<c1553c38>] _raw_spin_lock+0x38/0x50
                            [<c14651ca>] dev_add_pack+0x3a/0x60
                            [<c182da75>] arp_init+0x1a/0x48
                            [<c182dce5>] inet_init+0x181/0x27e
                            [<c1001114>] do_one_initcall+0x34/0x170
                            [<c17f7329>] kernel_init+0x110/0x1b2
                            [<c155b6a2>] kernel_thread_helper+0x6/0x10
      [...]
         ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
         ... acquired at:
          [<c108a360>] lock_acquire+0x90/0x100
          [<c1553c38>] _raw_spin_lock+0x38/0x50
          [<c14651ca>] dev_add_pack+0x3a/0x60
          [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
          [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
          [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
          [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
          [<c148e802>] genl_rcv_msg+0x232/0x280
          [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
          [<c148e5bc>] genl_rcv+0x1c/0x30
          [<c148d144>] netlink_unicast+0x174/0x1f0
          [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
          [<c1456bc1>] sock_aio_write+0x161/0x170
          [<c1135a7c>] do_sync_write+0xac/0xf0
          [<c11360f6>] vfs_write+0x156/0x170
          [<c11361e2>] sys_write+0x42/0x70
          [<c155b0df>] sysenter_do_call+0x12/0x38
      [...]
      }
        -> (tipc_net_lock){+..-..} ops: 4 {
      [...]
          IN-SOFTIRQ-R at:
                           [<c108953a>] __lock_acquire+0x64a/0x13e0
                           [<c108a360>] lock_acquire+0x90/0x100
                           [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                           [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                           [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                           [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                           [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                           [<c13c43d2>] pcnet32_poll+0x292/0x780
                           [<c146b00a>] net_rx_action+0xfa/0x1e0
                           [<c103a4be>] __do_softirq+0xae/0x1e0
      [...]
      }
      
      >From the log, we can see three different call chains between
      CPU0 and CPU1:
      
      Time 0 on CPU0:
      
        kernel_init()->inet_init()->dev_add_pack()
      
      At time 0, the ptype_lock is held by CPU0 in dev_add_pack();
      
      Time 1 on CPU1:
      
        tipc_enable_bearer()->enable_bearer()->dev_add_pack()
      
      At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
      wants to take ptype_lock to register TIPC protocol handler into the
      networking stack.  But the ptype_lock has been taken by dev_add_pack()
      on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
      busy looping.
      
      Time 2 on CPU0:
      
        netif_receive_skb()->recv_msg()->tipc_recv_msg()
      
      At time 2, an incoming TIPC packet arrives at CPU0, hence
      tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
      to hold tipc_net_lock.  At the moment, below scenario happens:
      
      On CPU0, below is our sequence of taking locks:
      
        lock(ptype_lock)->lock(tipc_net_lock)
      
      On CPU1, our sequence of taking locks looks like:
      
        lock(tipc_net_lock)->lock(ptype_lock)
      
      Obviously deadlock may happen in this case.
      
      But please note the deadlock possibly doesn't occur at all when the
      first TIPC bearer is enabled.  Before enable_bearer() -- running on
      CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
      recv_msg()) is not registered successfully via dev_add_pack(), so
      the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
      message comes to CPU0. But when the second TIPC bearer is
      registered, the deadlock can perhaps really happen.
      
      To fix it, we will push the work of registering TIPC protocol
      handler into workqueue context. After the change, both paths taking
      ptype_lock are always in process contexts, thus, the deadlock should
      never occur.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4225a398