1. 13 Sep, 2012 1 commit
    • KEYS: Add payload preparsing opportunity prior to key instantiate or update · d4f65b5d
      Committed by David Howells
      Give the key type the opportunity to preparse the payload prior to the
      instantiation and update routines being called.  This is done with the
      provision of two new key type operations:
      
      	int (*preparse)(struct key_preparsed_payload *prep);
      	void (*free_preparse)(struct key_preparsed_payload *prep);
      
      If the first operation is present, then it is called before key creation (in
      the add/update case) or before the key semaphore is taken (in the update and
      instantiate cases).  The second operation is called to clean up if the first
      was called.
      
      preparse() is given the opportunity to fill in the following structure:
      
      	struct key_preparsed_payload {
      		char		*description;
      		void		*type_data[2];
      		void		*payload;
      		const void	*data;
      		size_t		datalen;
      		size_t		quotalen;
      	};
      
      Before the preparser is called, the first three fields will have been cleared,
      the payload pointer and size will be stored in data and datalen and the default
      quota size from the key_type struct will be stored into quotalen.
      
      The preparser may parse the payload in any way it likes and may store data in
      the type_data[] and payload fields for use by the instantiate() and update()
      ops.
      
      The preparser may also propose a description for the key by attaching it as a
      string to the description field.  This can be used by passing a NULL or ""
      description to the add_key() system call or the key_create_or_update()
      function.  This cannot work with request_key() as that requires the description
      to tell the upcall about the key to be created.
      
      This, for example, permits keys that store PGP public keys to generate their own
      name from the user ID and public key fingerprint in the key.
      
      The instantiate() and update() operations are then modified to look like this:
      
      	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
      	int (*update)(struct key *key, struct key_preparsed_payload *prep);
      
      and the new payload data is passed in *prep, whether or not it was preparsed.
      Signed-off-by: David Howells <dhowells@redhat.com>
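The preparse flow described in the KEYS commit above can be sketched in userspace C. This is only an illustration under stated assumptions: the structure mirrors the one in the commit message, but `example_preparse()`, `example_free_preparse()`, and the "first line of the payload is the description" format are hypothetical, and `malloc`/`free` stand in for kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the structure shown in the commit message. */
struct key_preparsed_payload {
	char		*description;
	void		*type_data[2];
	void		*payload;
	const void	*data;
	size_t		datalen;
	size_t		quotalen;
};

/* Hypothetical preparser: proposes the payload's first line as the description. */
static int example_preparse(struct key_preparsed_payload *prep)
{
	const char *p = prep->data;
	const char *nl;
	size_t len;

	if (!p || prep->datalen == 0)
		return -EINVAL;

	/* Everything up to the first newline (or the whole payload) is the name. */
	nl = memchr(p, '\n', prep->datalen);
	len = nl ? (size_t)(nl - p) : prep->datalen;

	prep->description = malloc(len + 1);
	if (!prep->description)
		return -ENOMEM;
	memcpy(prep->description, p, len);
	prep->description[len] = '\0';

	prep->quotalen = prep->datalen;	/* charge quota for the raw payload */
	return 0;
}

/* Hypothetical cleanup: releases whatever example_preparse() attached. */
static void example_free_preparse(struct key_preparsed_payload *prep)
{
	free(prep->description);
	prep->description = NULL;
}
```

A caller would zero the first three fields, fill in `data`, `datalen`, and the default `quotalen`, call the preparser, hand `prep` to instantiate() or update(), and finally call the cleanup routine.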
  2. 06 Sep, 2012 1 commit
    • Fix order of arguments to compat_put_time[spec|val] · ed6fe9d6
      Committed by Mikulas Patocka
      Commit 644595f8 ("compat: Handle COMPAT_USE_64BIT_TIME in
      net/socket.c") introduced a bug where the helper functions to take
      either a 64-bit or compat time[spec|val] got the arguments in the wrong
      order, passing the kernel stack pointer off as a user pointer (and vice
      versa).
      
      Because of the user address range check, that in turn causes an
      EFAULT, since the range check on the user pointer fails for the kernel
      address.  The result is an incorrectly failed system call for 32-bit
      processes on a 64-bit kernel.
      
      On odder architectures like HP-PA (with separate user/kernel address
      spaces), it can be used to read kernel memory.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Sep, 2012 2 commits
  4. 31 Aug, 2012 5 commits
    • netfilter: nf_conntrack: fix racy timer handling with reliable events · 5b423f6a
      Committed by Pablo Neira Ayuso
      Existing code assumes that del_timer returns true for alive conntrack
      entries. However, this is not true if reliable events are enabled.
      In that case, del_timer may return true for entries that were
      just inserted in the dying list. Note that packets / ctnetlink may
      hold references to conntrack entries that were just inserted to such
      list.
      
      This patch fixes the issue by adding an independent timer for
      event delivery. This increases the size of the ecache extension.
      Still we can revisit this later and use variable size extensions
      to allocate this area on demand.
      Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipv4: must use rcu protection while calling fib_lookup · c5ae7d41
      Committed by Eric Dumazet
      The following lockdep splat was reported by Pavel Roskin:
      
      [ 1570.586223] ===============================
      [ 1570.586225] [ INFO: suspicious RCU usage. ]
      [ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
      [ 1570.586229] -------------------------------
      [ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
      [ 1570.586233]
      [ 1570.586233] other info that might help us debug this:
      [ 1570.586233]
      [ 1570.586236]
      [ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
      [ 1570.586238] 2 locks held by Chrome_IOThread/4467:
      [ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
      [ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
      [ 1570.586260]
      [ 1570.586260] stack backtrace:
      [ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
      [ 1570.586265] Call Trace:
      [ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
      [ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
      [ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
      [ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
      [ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
      [ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
      [ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
      [ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
      [ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
      [ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
      [ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
      [ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
      [ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
      [ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
      [ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
      [ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
      [ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
      [ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
      [ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Pavel Roskin <proski@gnu.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: ipmr_expire_timer causes crash when removing net namespace · acbb219d
      Committed by Francesco Ruggeri
      When tearing down a net namespace, ipv4 mr_table structures are freed
      without first deactivating their timers. This can result in a crash in
      run_timer_softirq.
      This patch mimics the corresponding behaviour in ipv6.
      Locking and synchronization seem to be adequate.
      We are about to kfree mrt, so existing code should already make sure that
      no other references to mrt are pending or can be created by incoming traffic.
      The functions invoked here do not cause new references to mrt or other
      race conditions to be created.
      Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
      Both ipmr_expire_process (whose completion we may have to wait in
      del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
      or other synchronizations when needed, and they both only modify mrt.
      
      Tested in Linux 3.4.8.
      Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: avoid to use synchronize_rcu in tunnel free function · 99469c32
      Committed by xeb@mail.ru
      Avoid using synchronize_rcu in l2tp_tunnel_free because the context may be
      atomic.
      Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation · 3f509c68
      Committed by Pablo Neira Ayuso
      We're hitting a bug while trying to reinsert an already existing
      expectation:
      
      kernel BUG at kernel/timer.c:895!
      invalid opcode: 0000 [#1] SMP
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
       [<ffffffff812d423a>] ? in4_pton+0x72/0x131
       [<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
       [<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
       [<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
       [<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
       [<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
      
      We have to remove the RTP expectation if the RTCP expectation hits EBUSY
      since we keep trying with other ports until we succeed.
      Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 30 Aug, 2012 4 commits
  6. 25 Aug, 2012 2 commits
    • tcp: fix cwnd reduction for non-sack recovery · 7c4a56fe
      Committed by Yuchung Cheng
      The cwnd reduction in fast recovery is based on the number of packets
      newly delivered per ACK. For non-sack connections every DUPACK
      signifies a packet has been delivered, but the sender mistakenly
      skips counting them for cwnd reduction.
      
      The fix is to compute newly_acked_sacked after DUPACKs are accounted
      in sacked_out for non-sack connections.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Nandita Dukkipati <nanditad@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
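The ordering problem in the non-SACK cwnd commit above can be illustrated with a toy model. This is not the kernel code: `struct toy_tp` and the `toy_*` helpers are hypothetical, and only the field names `packets_out` and `sacked_out` and the idea of computing the newly delivered count after the DUPACK is accounted come from the commit message.

```c
#include <stdbool.h>

/* Toy sender state; field names echo the commit message but this is not kernel code. */
struct toy_tp {
	int packets_out;	/* packets sent and not yet cumulatively ACKed */
	int sacked_out;		/* packets believed delivered out of order */
};

/* For a non-SACK connection, each DUPACK is taken as one delivered packet. */
static void toy_add_dupack(struct toy_tp *tp)
{
	if (tp->sacked_out < tp->packets_out)
		tp->sacked_out++;
}

/* Packets newly delivered since the snapshot (prior_packets, prior_sacked). */
static int toy_newly_delivered(const struct toy_tp *tp, int prior_packets,
			       int prior_sacked)
{
	return (prior_packets - tp->packets_out) + (tp->sacked_out - prior_sacked);
}

/* Process one DUPACK; count_first selects the buggy ordering. */
static int toy_on_dupack(struct toy_tp *tp, bool count_first)
{
	int prior_packets = tp->packets_out;
	int prior_sacked = tp->sacked_out;
	int newly;

	if (count_first) {
		/* Buggy order: delta taken before the DUPACK is accounted. */
		newly = toy_newly_delivered(tp, prior_packets, prior_sacked);
		toy_add_dupack(tp);
	} else {
		/* Fixed order: account the DUPACK, then take the delta. */
		toy_add_dupack(tp);
		newly = toy_newly_delivered(tp, prior_packets, prior_sacked);
	}
	return newly;
}
```

With the buggy ordering a DUPACK contributes nothing to the delivered count, so a reduction algorithm driven by that count would not see the delivery; with the fixed ordering each DUPACK contributes one packet.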
    • netlink: fix possible spoofing from non-root processes · 20e1db19
      Committed by Pablo Neira Ayuso
      Non-root user-space processes can send Netlink messages to other
      processes that are well-known for being subscribed to Netlink
      asynchronous notifications. This allows illegitimate non-root
      processes to send forged messages to Netlink subscribers.
      
      The userspace process usually verifies the legitimate origin in
      two ways:
      
      a) Socket credentials. If UID != 0, then the message comes from
         some illegitimate process and the message needs to be dropped.
      
      b) Netlink portID. In general, portID == 0 means that the message
         originates from the kernel, so any message not coming from the
         kernel is discarded.
      
      However, ctnetlink sets the portID in event messages that have
      been triggered by some user-space process, e.g. the conntrack utility.
      So other processes subscribed to ctnetlink events, e.g. conntrackd,
      know that the event was triggered by some user-space action.
      
      Neither of the two ways to discard illegitimate messages coming
      from non-root processes can help for ctnetlink.
      
      This patch adds capability validation in case that dst_pid is set
      in netlink_sendmsg(). This approach is aggressive since existing
      applications using any Netlink bus to deliver messages between
      two user-space processes will break. Note that the exception is
      NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
      userspace communication.
      
      Still, anyone who wants their Netlink bus to allow netlink-to-netlink
      userspace communication can set NL_NONROOT_SEND. However, by default,
      I don't think it makes sense to allow NETLINK_ROUTE to be used for
      communication between two processes exchanging arbitrary information
      that is not related to link/neighbouring/routing. They should be using
      NETLINK_USERSOCK instead for that.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
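The two receiver-side checks listed in the netlink spoofing commit above can be sketched as one predicate. The function name and the `require_kernel_origin` switch are hypothetical, chosen for illustration. The sketch also shows why a ctnetlink listener is stuck: it has to accept non-zero portIDs, which leaves only the UID check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical receiver-side filter combining the two checks listed in the
 * commit message: (a) sender UID from socket credentials, (b) Netlink portID.
 */
static bool toy_netlink_msg_trusted(uid_t sender_uid, uint32_t sender_portid,
				    bool require_kernel_origin)
{
	if (sender_uid != 0)
		return false;	/* (a) not sent with root credentials */
	if (require_kernel_origin && sender_portid != 0)
		return false;	/* (b) portID 0 is the kernel */
	return true;
}
```

In a real receiver the UID would typically come from `SCM_CREDENTIALS` ancillary data and the portID from the `nl_pid` field of the `struct sockaddr_nl` filled in by `recvmsg()`.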
  7. 24 Aug, 2012 2 commits
  8. 23 Aug, 2012 1 commit
    • ipv4: properly update pmtu · 9b04f350
      Committed by Eric Dumazet
      Sylvain Munaut reported the following:
      
       - TCP connection get "stuck" with data in send queue when doing
         "large" transfers ( like typing 'ps ax' on a ssh connection )
       - Only happens on path where the PMTU is lower than the MTU of
         the interface
       - Is not present right after boot, it only appears 10-20min after
         boot or so. (and that's inside the _same_ TCP connection, it works
         fine at first and then in the same ssh session, it'll get stuck)
       - Definitely seems related to fragments somehow since I see a router
         sending ICMP message saying fragmentation is needed.
       - Exact same setup works fine with kernel 3.5.1
      
      The problem happens when the 10-minute (ip_rt_mtu_expires) expiration
      period is over.
      
      ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
      but dst_set_expires() does nothing because dst.expires is already set.
      
      It seems we want to set the expires field to a new value, regardless
      of prior one.
      
      With help from Julian Anastasov.
      Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      CC: Julian Anastasov <ja@ssi.bg>
      Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
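The expiry problem described in the "ipv4: properly update pmtu" commit above can be modelled in a few lines. This is a simplified sketch, not the kernel helpers: `struct toy_dst` and both `toy_set_expires_*` functions are hypothetical, and the first one models only the behaviour the commit message describes (an expiry that is already set is left untouched).

```c
/* Toy stand-in for a route cache entry; 0 means "no expiry set". */
struct toy_dst {
	unsigned long expires;
};

/* Models the behaviour described above: an expiry that is already set is left alone. */
static void toy_set_expires_if_unset(struct toy_dst *dst, unsigned long now,
				     unsigned long timeout)
{
	if (dst->expires == 0)
		dst->expires = now + timeout;
}

/* Models the intended behaviour: always rearm, regardless of the prior value. */
static void toy_set_expires_always(struct toy_dst *dst, unsigned long now,
				   unsigned long timeout)
{
	unsigned long expires = now + timeout;

	dst->expires = expires ? expires : 1;	/* keep 0 reserved for "unset" */
}
```

With the first helper, a second PMTU update arriving after the original deadline leaves the stale deadline in place, which matches the reported stall appearing only after the expiration period; the second helper pushes the deadline forward on every update.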
  9. 22 Aug, 2012 6 commits
    • mac80211: fix DS to MBSS address translation · 27f01124
      Committed by Thomas Pedersen
      The destination address of unicast frames forwarded through a mesh gate
      was being replaced with the broadcast address. Instead leave the
      original destination address as the mesh DA. If the nexthop address is
      not in the mpath table it will be resolved. If that fails, the frame
      will be forwarded to known mesh gates.
      Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
      Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • libceph: avoid truncation due to racing banners · 6d4221b5
      Committed by Jim Schutt
      Because the Ceph client messenger uses a non-blocking connect, it is
      possible for the sending of the client banner to race with the
      arrival of the banner sent by the peer.
      
      When ceph_sock_state_change() notices the connect has completed, it
      schedules work to process the socket via con_work().  During this
      time the peer is writing its banner, and arrival of the peer banner
      races with con_work().
      
      If con_work() calls try_read() before the peer banner arrives, there
      is nothing for it to do, after which con_work() calls try_write() to
      send the client's banner.  In this case Ceph's protocol negotiation
      can complete successfully.
      
      The server-side messenger immediately sends its banner and addresses
      after accepting a connect request, *before* actually attempting to
      read or verify the banner from the client.  As a result, it is
      possible for the banner from the server to arrive before con_work()
      calls try_read().  If that happens, try_read() will read the banner
      and prepare protocol negotiation info via prepare_write_connect().
      prepare_write_connect() calls con_out_kvec_reset(), which discards
      the as-yet-unsent client banner.  Next, con_work() calls
      try_write(), which sends the protocol negotiation info rather than
      the banner that the peer is expecting.
      
      The result is that the peer sees an invalid banner, and the client
      reports "negotiation failed".
      
      Fix this by moving con_out_kvec_reset() out of
      prepare_write_connect() to its callers at all locations except the
      one where the banner might still need to be sent.
      
      [elder@inktank.com: added note about server-side behavior]
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
    • af_netlink: force credentials passing [CVE-2012-3520] · e0e3cea4
      Committed by Eric Dumazet
      Pablo Neira Ayuso discovered that avahi and
      potentially NetworkManager accept spoofed Netlink messages because of a
      kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
      to the receiver if the sender did not provide such data, instead of not
      including any such data at all or including the correct data from the
      peer (as it is the case with AF_UNIX).
      
      This bug was introduced in commit 16e57262
      (af_unix: dont send SCM_CREDENTIALS by default)
      
      This patch forces passing credentials for netlink, as
      before the regression.
      
      Another fix would be to not add SCM_CREDENTIALS in
      netlink messages if not provided by the sender, but it
      might break some programs.
      
      With help from Florian Weimer & Petr Matousek
      
      This issue is designated as CVE-2012-3520
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fix ip header ident selection in __ip_make_skb() · a9915a1b
      Committed by Eric Dumazet
      Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
      memory in __ip_select_ident().
      
      It turns out that __ip_make_skb() called ip_select_ident() before
      properly initializing iph->daddr.
      
      This is a bug uncovered by commit 1d861aa4 (inet: Minimize use of
      cached route inetpeer.)
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131
      Reported-by: Christian Casteyde <casteyde.christian@free.fr>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Use newinet->inet_opt in inet_csk_route_child_sock() · 1a7b27c9
      Committed by Christoph Paasch
      Since 0e734419 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
      TCP."), inet_csk_route_child_sock() is called instead of
      inet_csk_route_req().
      
      However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
      ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
      Thus, inside inet_csk_route_child_sock() opt is always NULL and the
      SRR-options are not respected anymore.
      Packets sent by the server won't have the correct destination-IP.
      
      This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
      inside inet_csk_route_child_sock().
      Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
      Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix possible socket refcount problem · 144d56e9
      Committed by Eric Dumazet
      Commit 6f458dfb (tcp: improve latencies of timer triggered events)
      added a bug leading to the following trace:
      
      [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
      [ 2866.131726]
      [ 2866.132188] =========================
      [ 2866.132281] [ BUG: held lock freed! ]
      [ 2866.132281] 3.6.0-rc1+ #622 Not tainted
      [ 2866.132281] -------------------------
      [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
      [ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
      [ 2866.132281] 4 locks held by kworker/0:1/652:
      [ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
      [ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
      [ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
      [ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
      [ 2866.132281]
      [ 2866.132281] stack backtrace:
      [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
      [ 2866.132281] Call Trace:
      [ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
      [ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
      [ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
      [ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
      [ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
      [ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
      [ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
      [ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
      [ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
      [ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
      [ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
      [ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
      [ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
      [ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
      [ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
      [ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
      [ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
      [ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
      [ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
      [ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
      [ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
      [ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
      [ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
      [ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
      [ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
      [ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
      [ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
      [ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
      [ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
      [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
      [ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
      [ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
      [ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
      [ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
      [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
      [ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
      [ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
      [ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
      [ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
      [ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
      [ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
      [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
      [ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
      [ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
      [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
      [ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
      [ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
      [ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
      [ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
      [ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
      [ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
      [ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
      [ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
      [ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
      [ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
      [ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
      [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
      [ 2866.309689] =============================================================================
      [ 2866.310254] BUG TCP (Not tainted): Object already free
      [ 2866.310254] -----------------------------------------------------------------------------
      [ 2866.310254]
      
      The bug comes from the fact that the timer set in sk_reset_timer() can run
      before we actually do the sock_hold(). The socket refcount reaches zero and
      we free the socket too soon.
      
      The timer handler is not allowed to reduce the socket refcnt if the socket is
      owned by the user, or we need to change the sk_reset_timer() implementation.
      
      We should take a reference on the socket in case the TCP_WRITE_TIMER_DEFERRED
      or TCP_DELACK_TIMER_DEFERRED bits are set in tsq_flags.
      
      Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
      was used instead of TCP_DELACK_TIMER_DEFERRED.
      
      For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
      even if not fired from a timer.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 21 Aug, 2012 4 commits
    • svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping · d10f27a7
      Committed by J. Bruce Fields
      The rpc server tries to ensure that there will be room to send a reply
      before it receives a request.
      
      It does this by tracking, in xpt_reserved, an upper bound on the total
      size of the replies that it has already committed to for the socket.
      
      Currently it is adding in the estimate for a new reply *before* it
      checks whether there is space available.  If it finds that there is not
      space, it then subtracts the estimate back out.
      
      This may lead the subsequent svc_xprt_enqueue to decide that there is
      space after all.
      
      The result is a svc_recv() that will repeatedly return -EAGAIN, causing
      server threads to loop without doing any actual work.
      
      Cc: stable@vger.kernel.org
      Reported-by: Michael Tokarev <mjt@tls.msk.ru>
      Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrpc: sends on closed socket should stop immediately · f06f00a2
      Committed by J. Bruce Fields
      svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
      However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
      threads could send further replies before the socket is really shut
      down.  This can manifest as data corruption: for example, if a truncated
      read reply is followed by another rpc reply, that second reply will look
      to the client like further read data.
      
      Symptoms were data corruption preceded by svc_tcp_sendto logging
      something like
      
      	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
      
      Cc: stable@vger.kernel.org
      Reported-by: Malahal Naineni <malahal@us.ibm.com>
      Tested-by: Malahal Naineni <malahal@us.ibm.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrpc: fix BUG() in svc_tcp_clear_pages · be1e4444
      Committed by J. Bruce Fields
      Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
      consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
      sk_tcplen would lead us to expect n pages of data).
      
      svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
      This is OK, since both functions are serialized by XPT_BUSY.  However,
      that means the inconsistency must be repaired before dropping XPT_BUSY.
      
      Therefore we should be ensuring that svc_tcp_save_pages repairs the
      problem before exiting svc_tcp_recv_record on error.
      
      Symptoms were a BUG() in svc_tcp_clear_pages.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • libceph: delay debugfs initialization until we learn global_id · d1c338a5
      Committed by Sage Weil
      The debugfs directory includes the cluster fsid and our unique global_id.
      We need to delay the initialization of the debug entry until we have
      learned both the fsid and our global_id from the monitor or else the
      second client can't create its debugfs entry and will fail (and multiple
      client instances aren't properly reflected in debugfs).
      
      Reported by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
  11. 20 Aug, 2012 6 commits
    • netfilter: nfnetlink_log: fix NLA_PUT macro removal bug · 2dba62c3
      Committed by Patrick McHardy
      Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
      converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:
      
      -               NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
      +               if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() · fae6ef87
      Committed by Neal Cardwell
      This commit removes the sk_rx_dst_set calls from
      tcp_create_openreq_child(), because at that point the icsk_af_ops
      field of ipv6_mapped TCP sockets has not been set to its proper final
      value.
      
      Instead, to make sure we get the right sk_rx_dst_set variant
      appropriate for the address family of the new connection, we have
      tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
      shortly after the call to tcp_create_openreq_child() returns.
      
      This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
      with the new approach.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reported-by: Artem Savkov <artem.savkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/core/dev.c: fix kernel-doc warning · 3de7a37b
      Committed by Randy Dunlap
      Fix kernel-doc warning:
      
      Warning(net/core/dev.c:5745): No description found for parameter 'dev'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc:	"David S. Miller" <davem@davemloft.net>
      Cc:	netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3de7a37b
    • P
      net: ipv6: fix oops in inet_putpeer() · 9d7b0fc1
      Patrick McHardy committed
      Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
      a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
      initialized, causing a false positive result from inetpeer_ptr_is_peer(),
      which in turn causes a NULL pointer dereference in inet_putpeer().
      
      Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
      EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
      EIP is at inet_putpeer+0xe/0x16
      EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
      ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
       DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
      CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
      DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      DR6: ffff0ff0 DR7: 00000400
       f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
       f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
       f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
       [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
       [<c038d5f7>] dst_destroy+0x1d/0xa4
       [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
       [<c0396870>] flow_cache_gc_task+0x85/0x9f
       [<c0142d2b>] process_one_work+0x122/0x441
       [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
       [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
       [<c0143e2d>] worker_thread+0x113/0x3cc
      
      Fix by adding an init_dst() callback to struct xfrm_policy_afinfo to
      properly initialize the dst's peer pointer.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d7b0fc1
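      The false positive described in 9d7b0fc1 can be sketched in userspace under
      one assumption: that the peer word uses a low-bit tag to distinguish a
      "base" pointer from a real peer pointer. All demo_* names are hypothetical
      and this is not the kernel's inetpeer code. With such a scheme, a zeroed or
      otherwise uninitialized word has the tag bit clear, so it is classified as
      a peer even though the pointer it carries is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag: bit set means "base pointer", bit clear means "peer pointer". */
#define DEMO_BASE_BIT ((uintptr_t)0x1)

struct demo_peer {
    int refcnt;
};

/* Classify a tagged word; a clear tag bit is read as "this is a peer". */
static bool demo_ptr_is_peer(uintptr_t v)
{
    return !(v & DEMO_BASE_BIT);
}

static struct demo_peer *demo_ptr_to_peer(uintptr_t v)
{
    return (struct demo_peer *)v;
}

/* Proper initialization: store the base pointer with the tag bit set. */
static void demo_init_base(uintptr_t *slot, const void *base)
{
    /* The low bit must be free, so base needs at least 2-byte alignment. */
    assert(((uintptr_t)base & DEMO_BASE_BIT) == 0);
    *slot = (uintptr_t)base | DEMO_BASE_BIT;
}
```

      A destroy path that tests demo_ptr_is_peer() and then releases the peer
      would dereference NULL for a zeroed slot, which is why the sketch, like the
      fix, initializes the slot before first use.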
    • J
      caif: Do not dereference NULL in chnl_recv_cb() · d92c7f8a
      Jesper Juhl committed
      In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
      which may return NULL, but we do not check for a NULL pointer before
      dereferencing it.
      This patch adds such a NULL check, properly frees allocated memory,
      and returns an error (-EINVAL) on failure, which is much better than crashing.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d92c7f8a
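      The pattern behind d92c7f8a, checking a header lookup for NULL, freeing the
      buffer, and returning -EINVAL, can be sketched in userspace as follows.
      demo_header_pointer() is a hypothetical stand-in with a simplified
      signature, not the kernel's skb_header_pointer(), and reading a version
      nibble from the first byte is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_pkt {
    unsigned char *data;
    size_t len;
};

/* Return a pointer to len bytes at off, or NULL if the range is out of bounds. */
static const unsigned char *demo_header_pointer(const struct demo_pkt *p,
                                                size_t off, size_t len)
{
    if (off > p->len || len > p->len - off)
        return NULL;
    return p->data + off;
}

/* Allocate a packet holding a copy of bytes; returns NULL on allocation failure. */
static struct demo_pkt *demo_pkt_new(const unsigned char *bytes, size_t len)
{
    struct demo_pkt *p;

    assert(bytes != NULL || len == 0);
    p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->data = len ? malloc(len) : NULL;
    if (len && p->data == NULL) {
        free(p);
        return NULL;
    }
    if (len)
        memcpy(p->data, bytes, len);
    p->len = len;
    return p;
}

/* Consume pkt; return the version nibble of its first byte, or -EINVAL. */
static int demo_recv(struct demo_pkt *pkt)
{
    const unsigned char *hdr;
    int ret;

    if (pkt == NULL)
        return -EINVAL;
    hdr = demo_header_pointer(pkt, 0, 1);
    /* Check before dereferencing: a too-short packet yields NULL. */
    ret = (hdr == NULL) ? -EINVAL : (*hdr >> 4);
    free(pkt->data);
    free(pkt);
    return ret;
}
```

      Both the success and the failure path release the packet, so the error
      return does not leak memory.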
    • E
      af_packet: don't emit packet on orig fanout group · c0de08d0
      Eric Leblond committed
      If a packet is emitted on one socket in one group of fanout sockets,
      it is transmitted again. It is thus read again on one of the sockets
      of the fanout group. This results in a loop for software that
      generates packets when receiving one.
      This retransmission is not the intended behavior: a fanout group
      must behave like a single socket. The packet should not be
      transmitted on a socket if it originates from a socket belonging
      to the same fanout group.
      
      This patch fixes the issue by changing the transmission check to
      take fanout group info into account.
      Reported-by: Aleksandr Kotov <a1k@mail.ru>
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0de08d0
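      The check described in c0de08d0 can be sketched as a predicate that treats
      a fanout group like a single socket. The demo_* types and the function
      below are hypothetical and do not reproduce the kernel's af_packet
      structures; the sketch only shows the rule that a packet is not looped back
      to the sending socket or to any socket in the sender's fanout group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fanout group and socket types, for illustration only. */
struct demo_fanout {
    int id;
};

struct demo_sock {
    struct demo_fanout *fanout;   /* NULL when the socket is in no group */
};

/*
 * Return true if a packet sent by origin must NOT be delivered to listener:
 * either listener is the sender itself, or both belong to the same group.
 */
static bool demo_skip_loopback(const struct demo_sock *listener,
                               const struct demo_sock *origin)
{
    assert(listener != NULL);
    if (origin == NULL)
        return false;             /* not locally generated by a socket */
    if (listener == origin)
        return true;              /* same socket: never echo back */
    return listener->fanout != NULL && listener->fanout == origin->fanout;
}
```

      Comparing only listener == origin, without the group test on the last
      line, reproduces the loop the commit message describes.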
  12. 17 Aug, 2012 4 commits
  13. 16 Aug, 2012 2 commits
    • P
      netfilter: nf_ct_expect: fix possible access to uninitialized timer · 2614f864
      Pablo Neira Ayuso committed
      In __nf_ct_expect_check, the function refresh_timer returns 1
      if a matching expectation is found and its timer is successfully
      refreshed. This results in nf_ct_expect_related returning 0.
      Note that at this point:
      
      - the passed expectation is not inserted in the expectation table
        and its timer was not initialized, since we have refreshed one
        matching/existing expectation.
      
      - nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
        timer is in some undefined state just after the allocation,
        until it is appropriately initialized.
      
      This can be a problem for the SIP helper during the expectation
      addition:
      
       ...
       if (nf_ct_expect_related(rtp_exp) == 0) {
               if (nf_ct_expect_related(rtcp_exp) != 0)
                       nf_ct_unexpect_related(rtp_exp);
       ...
      
      Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
      case that is detailed above. Then, if nf_ct_expect_related(rtcp_exp)
      returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
      
       spin_lock_bh(&nf_conntrack_lock);
       if (del_timer(&exp->timeout)) {
               nf_ct_unlink_expect(exp);
               nf_ct_expect_put(exp);
       }
       spin_unlock_bh(&nf_conntrack_lock);
      
      Note that del_timer always returns false for a timer that has been
      initialized but is not pending.  However, the timer was not initialized
      since setup_timer was not called; therefore, the expectation timer remains
      in some undefined state. If I'm not missing anything, this may lead to the
      removal of a nonexistent expectation.
      
      To fix this, the optimization that allows refreshing an expectation
      is removed. Now nf_conntrack_expect_related looks more consistent
      to me since it always adds the expectation when it returns
      success.
      
      Thanks to Patrick McHardy for participating in the discussion of
      this patch.
      
      I think this may be the source of the problem described by:
      http://marc.info/?l=netfilter-devel&m=134073514719421&w=2
      Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2614f864
    • M
      net: fix info leak in compat dev_ifconf() · 43da5f2e
      Mathias Krause committed
      The implementation of dev_ifconf() for the compat ioctl interface uses
      an intermediate ifc structure allocated in userland for the duration of
      the syscall. However, it fails to initialize the padding bytes inserted
      for alignment and therefore leaks four bytes of kernel stack. Add an
      explicit memset(0) before filling the structure to avoid the info leak.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43da5f2e
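      The fix in 43da5f2e follows a common pattern: zero the whole structure
      before filling its fields, so that alignment padding never carries stale
      stack contents to the recipient. A minimal userspace sketch, assuming a
      hypothetical struct demo_ifconf with an int followed by a pointer (which
      has 4 padding bytes on typical 64-bit ABIs); this is not the kernel's
      dev_ifconf() code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: on typical 64-bit ABIs, 4 padding bytes follow len. */
struct demo_ifconf {
    int   len;
    void *buf;
};

/*
 * Build the structure on the stack and copy its raw bytes to out, which must
 * hold at least sizeof(struct demo_ifconf) bytes. The memset() clears the
 * padding as well as the named fields before they are assigned.
 */
static void demo_fill(unsigned char *out, int len, void *buf)
{
    struct demo_ifconf c;

    assert(out != NULL);
    memset(&c, 0, sizeof(c));   /* without this, padding holds stack garbage */
    c.len = len;
    c.buf = buf;
    memcpy(out, &c, sizeof(c));
}
```

      Copying sizeof(c) bytes is what makes the padding observable; assigning the
      fields alone leaves those bytes untouched.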