1. 17 5月, 2013 3 次提交
  2. 06 5月, 2013 1 次提交
    • K
      net: frag, fix race conditions in LRU list maintenance · b56141ab
      Konstantin Khlebnikov 提交于
      This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
      which was introduced in commit 3ef0eb0d
      ("net: frag, move LRU list maintenance outside of rwlock")
      
      One cpu already added new fragment queue into hash but not into LRU.
      Other cpu found it in hash and tries to move it to the end of LRU.
      This leads to NULL pointer dereference inside of list_move_tail().
      
      Another possible race condition is between inet_frag_lru_move() and
      inet_frag_lru_del(): move can happens after deletion.
      
      This patch initializes LRU list head before adding fragment into hash and
      inet_frag_lru_move() doesn't touches it if it's empty.
      
      I saw this kernel oops two times in a couple of days.
      
      [119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
      [119482.140221] Oops: 0000 [#1] SMP
      [119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
      [119482.152692] CPU 3
      [119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
      [119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
      [119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
      [119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
      [119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
      [119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
      [119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
      [119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
      [119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
      [119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
      [119482.223113] Stack:
      [119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
      [119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
      [119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
      [119482.243090] Call Trace:
      [119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
      [119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
      [119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
      [119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
      [119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
      [119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
      [119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
      [119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
      [119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
      [119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
      [119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
      [119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
      [119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
      [119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
      [119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
      [119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
      [119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.348675]  RSP <ffff880216d5db58>
      [119482.353493] CR2: 0000000000000000
      
      Oops happened on this path:
      ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b56141ab
  3. 02 5月, 2013 1 次提交
    • E
      af_unix: fix a fatal race with bit fields · 60bc851a
      Eric Dumazet 提交于
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to fatal deadlock, as one/several cpus spin forever
      on a spinlock that will never be available again.
      
      A safer way would be to use a long to store flags.
      This way we are sure compiler/arch wont do bad things.
      
      As we own unix_gc_lock spinlock when clearing or setting bits,
      we can use the non atomic __set_bit()/__clear_bit().
      
      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
      
      [1] bug fixed in gcc-4.8.0 :
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080Reported-by: NAmbrose Feinstein <ambrose@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60bc851a
  4. 30 4月, 2013 5 次提交
    • D
      hostap: Don't use create_proc_read_entry() · 6bbefe86
      David Howells 提交于
      Don't use create_proc_read_entry() as that is deprecated, but rather use
      proc_create_data() and seq_file instead.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc: Jouni Malinen <j@w1.fi>
      cc: John W. Linville <linville@tuxdriver.com>
      cc: Johannes Berg <johannes@sipsolutions.net>
      cc: linux-wireless@vger.kernel.org
      cc: netdev@vger.kernel.org
      cc: devel@driverdev.osuosl.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6bbefe86
    • E
      net: defer net_secret[] initialization · aebda156
      Eric Dumazet 提交于
      Instead of feeding net_secret[] at boot time, defer the init
      at the point first socket is created.
      
      This permits some platforms to use better entropy sources than
      the ones available at boot time.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aebda156
    • S
      sctp: Correct type and usage of sctp_end_cksum() · eee1d5a1
      Simon Horman 提交于
      Change the type of the crc32 parameter of sctp_end_cksum()
      from __be32 to __u32 to reflect that fact that it is passed
      to cpu_to_le32().
      
      There are five in-tree users of sctp_end_cksum().
      The following four had warnings flagged by sparse which are
      no longer present with this change.
      
      net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum()
      net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check()
      net/sctp/input.c:sctp_rcv_checksum()
      net/sctp/output.c:sctp_packet_transmit()
      
      The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt().
      It has been updated to pass a __u32 instead of a __be32,
      the value in question was already calculated in cpu byte-order.
      
      net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also
      been updated to assign the return value of sctp_end_cksum()
      directly to a variable of type __le32, matching the
      type of the return value. Previously the return value
      was assigned to a variable of type __be32 and then that variable
      was finally assigned to another variable of type __le32.
      
      Problems flagged by sparse.
      Compile and sparse tested only.
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eee1d5a1
    • F
      netfilter: move skb_gso_segment into nfnetlink_queue module · a5fedd43
      Florian Westphal 提交于
      skb_gso_segment is expensive, so it would be nice if we could
      avoid it in the future. However, userspace needs to be prepared
      to receive larger-than-mtu-packets (which will also have incorrect
      l3/l4 checksums), so we cannot simply remove it.
      
      The plan is to add a per-queue feature flag that userspace can
      set when binding the queue.
      
      The problem is that in nf_queue, we only have a queue number,
      not the queue context/configuration settings.
      
      This patch should have no impact other than the skb_gso_segment
      call now being in a function that has access to the queue config
      data.
      
      A new size attribute in nf_queue_entry is needed so
      nfnetlink_queue can duplicate the entry of the gso skb
      when segmenting the skb while also copying the route key.
      
      The follow up patch adds switch to disable skb_gso_segment when
      queue config says so.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a5fedd43
    • J
      net: increase frag hash size · a4c4009f
      Jesper Dangaard Brouer 提交于
      Increase fragmentation hash bucket size to 1024 from old 64 elems.
      
      After we increased the frag mem limits commit c2a93660 (net: increase
      fragment memory usage limits) the hash size of 64 elements is simply
      too small.  Also considering the mem limit is per netns and the hash
      table is shared for all netns.
      
      For the embedded people, note that this increase will change the hash
      table/array from using approx 1 Kbytes to 16 Kbytes.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4c4009f
  5. 25 4月, 2013 2 次提交
  6. 24 4月, 2013 1 次提交
  7. 23 4月, 2013 3 次提交
  8. 22 4月, 2013 3 次提交
  9. 21 4月, 2013 1 次提交
  10. 20 4月, 2013 2 次提交
  11. 19 4月, 2013 1 次提交
  12. 18 4月, 2013 12 次提交
  13. 17 4月, 2013 5 次提交
    • D
      Bluetooth: l2cap: add l2cap_user sub-modules · 2c8e1411
      David Herrmann 提交于
      Several sub-modules like HIDP, rfcomm, ... need to track l2cap
      connections. The l2cap_conn->hcon->dev object is used as parent for sysfs
      devices so the sub-modules need to be notified when the hci_conn object is
      removed from sysfs.
      
      As submodules normally use the l2cap layer, the l2cap_user objects are
      registered there instead of on the underlying hci_conn object. This avoids
      any direct dependency on the HCI layer and lets the l2cap core handle any
      specifics.
      
      This patch introduces l2cap_user objects which contain a "probe" and
      "remove" callback. You can register them on any l2cap_conn object and if
      it is active, the "probe" callback will get called. Otherwise, an error is
      returned.
      
      The l2cap_conn object will call your "remove" callback directly before it
      is removed from user-space. This allows you to remove your submodules
      _before_ the parent l2cap_conn and hci_conn object is removed.
      
      At any time you can asynchronously unregister your l2cap_user object if
      your submodule vanishes before the l2cap_conn object does.
      
      There is no way around l2cap_user. If we want wire-protocols in the
      kernel, we always want the hci_conn object as parent in the sysfs tree. We
      cannot use a channel here since we might need multiple channels for a
      single protocol.
      But the problem is, we _must_ get notified when an l2cap_conn object is
      removed. We cannot use reference-counting for object-removal! This is not
      how it works. If a hardware is removed, we should immediately remove the
      object from sysfs. Any other behavior would be inconsistent with the rest
      of the system. Also note that device_del() might sleep, but it doesn't
      wait for user-space or block very long. It only _unlinks_ the object from
      sysfs and the whole device-tree. Everything else is handled by ref-counts!
      This is exactly what the other sub-modules must do: unlink their devices
      when the "remove" l2cap_user callback is called. They should not do any
      cleanup or synchronous shutdowns.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      2c8e1411
    • D
      Bluetooth: l2cap: introduce l2cap_conn ref-counting · 9c903e37
      David Herrmann 提交于
      If we want to use l2cap_conn outside of l2cap_core.c, we need refcounting
      for these objects. Otherwise, we cannot synchronize l2cap locks with
      outside locks and end up with deadlocks.
      
      Hence, introduce ref-counting for l2cap_conn objects. This doesn't affect
      l2cap internals at all, as they use a direct synchronization.
      We also keep a reference to the parent hci_conn for locking purposes as
      l2cap_conn depends on this. This doesn't affect the connection itself but
      only the lifetime of the (dead) object.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      9c903e37
    • D
      Bluetooth: allow constant arguments for bacmp()/bacpy() · f53c20e9
      David Herrmann 提交于
      There is no reason to require the source arguments to be writeable so fix
      this to allow constant source addresses.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      f53c20e9
    • D
      Bluetooth: introduce hci_conn ref-counting · 8d12356f
      David Herrmann 提交于
      We currently do not allow using hci_conn from outside of HCI-core.
      However, several other users could make great use of it. This includes
      HIDP, rfcomm and all other sub-protocols that rely on an active
      connection.
      
      Hence, we now introduce hci_conn ref-counting. We currently never call
      get_device(). put_device() is exclusively used in hci_conn_del_sysfs().
      Hence, we currently never have a greater device-refcnt than 1.
      Therefore, it is safe to move the put_device() call from
      hci_conn_del_sysfs() to hci_conn_del() (it's the only caller). In fact,
      this even fixes a "use-after-free" bug as we access hci_conn after calling
      hci_conn_del_sysfs() in hci_conn_del().
      
      From now on we can add references to hci_conn objects in other layers
      (like l2cap_sock, HIDP, rfcomm, ...) and grab a reference via
      hci_conn_get(). This does _not_ guarantee, that the connection is still
      alive. But, this isn't what we want. We can simply lock the hci_conn
      device and use "device_is_registered(hci_conn->dev)" to test that.
      However, this is hardly necessary as outside users should never rely on
      the HCI connection to be alive, anyway. Instead, they should solely rely
      on the device-object to be available.
      But if sub-devices want the hci_conn object as sysfs parent, they need to
      be notified when the connection drops. This will be introduced in later
      patches with l2cap_users.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      8d12356f
    • D
      Bluetooth: remove unneeded hci_conn_hold/put_device() · fc225c3f
      David Herrmann 提交于
      hci_conn_hold/put_device() is used to control when hci_conn->dev is no
      longer needed and can be deleted from the system. Lets first look how they
      are currently used throughout the code (excluding HIDP!).
      
      All code that uses hci_conn_hold_device() looks like this:
          ...
          hci_conn_hold_device();
          hci_conn_add_sysfs();
          ...
      On the other side, hci_conn_put_device() is exclusively used in
      hci_conn_del().
      
      So, considering that hci_conn_del() must not be called twice (which would
      fail horribly), we know that hci_conn_put_device() is only called _once_
      (which is in hci_conn_del()).
      On the other hand, hci_conn_add_sysfs() must not be called twice, either
      (it would call device_add twice, which breaks the device, see
      drivers/base/core.c). So we know that hci_conn_hold_device() is also
      called only once (it's only called directly before hci_conn_add_sysfs()).
      
      So hold and put are known to be called only once. That means we can safely
      remove them and directly call hci_conn_del_sysfs() in hci_conn_del().
      
      But there is one issue left: HIDP also uses hci_conn_hold/put_device().
      However, this case can be ignored and simply removed as it is totally
      broken. The issue is, the only thing HIDP delays with
      hci_conn_hold_device() is the removal of the hci_conn->dev from sysfs.
      But, the hci_conn device has no mechanism to get notified when its own
      parent (hci_dev) gets removed from sysfs. hci_dev_hold/put() does _not_
      control when it is removed but only when the device object is created
      and destroyed.
      And hci_dev calls hci_conn_flush_*() when it removes itself from sysfs,
      which itself causes hci_conn_del() to be called, but it does _not_ cause
      hci_conn_del_sysfs() to be called, which is wrong.
      
      Hence, we fix it to call hci_conn_del_sysfs() in hci_conn_del(). This
      guarantees that a hci_conn object is removed from sysfs _before_ its
      parent hci_dev is removed.
      
      The changes to HIDP look scary, wrong and broken. However, if you look at
      the HIDP session management, you will notice they're already broken in the
      exact _same_ way (ever tried "unplugging" HIDP devices? Breaks _all_ the
      time).
      So this patch only makes HIDP look _scary_ and _obviously broken_. It does
      not break HIDP itself, it already is!
      
      See later patches in this series which fix HIDP to use proper
      session-management.
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      fc225c3f