1. 25 Oct, 2010 · 1 commit
    • netlink: fix netlink_change_ngroups() · 5c398dc8
      Committed by Eric Dumazet
      commit 6c04bb18 (netlink: use call_rcu for netlink_change_ngroups)
      used a somewhat convoluted and racy way to perform call_rcu().
      
      The old block of memory is freed after a grace period, but the rcu_head
      used to track it is located in the new block.
      
      This can clash if netlink_change_ngroups() is called two or more times
      and one block is freed before another: call_rcu() called on different
      CPUs makes no guarantee about the order of callbacks.
      
      Fix this using a more standard way of handling this: each block of
      memory contains its own rcu_head, so that no use-after-free can
      happen.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Johannes Berg <johannes@sipsolutions.net>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
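A minimal userspace sketch of the pattern the fix adopts, under stated assumptions: struct rcu_head, fake_call_rcu() and run_grace_period() are hypothetical toy stand-ins (no real grace period, no concurrency), and listeners_block / change_ngroups() are illustrative names, not the kernel's. Each retired block is queued through its own embedded rcu_head, and the callback uses container_of() to free exactly that block, so the callbacks may run in any order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the kernel's struct rcu_head (hypothetical). */
struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Each block embeds the rcu_head that tracks its own deferred free. */
struct listeners_block {
    struct rcu_head rcu;
    size_t nlongs;
    unsigned long masks[];  /* multicast group bitmap */
};

static struct rcu_head *pending;  /* callbacks awaiting a "grace period" */
static int freed_blocks;

/* Toy call_rcu(): queue the callback instead of waiting for readers. */
static void fake_call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Run queued callbacks LIFO, i.e. the most recently retired block first:
 * the ordering that broke the old scheme. */
static void run_grace_period(void)
{
    while (pending) {
        struct rcu_head *head = pending;
        pending = head->next;  /* read before func() frees the block */
        head->func(head);
    }
}

static void listeners_free_rcu(struct rcu_head *head)
{
    /* Frees exactly the block that contains this rcu_head. */
    free(container_of(head, struct listeners_block, rcu));
    freed_blocks++;
}

static struct listeners_block *listeners_alloc(size_t nlongs)
{
    struct listeners_block *b =
        calloc(1, sizeof(*b) + nlongs * sizeof(unsigned long));
    assert(b != NULL);
    b->nlongs = nlongs;
    return b;
}

/* Replace the bitmap with a larger one and defer freeing the old block
 * through the old block's own rcu_head, never the new block's. */
static struct listeners_block *change_ngroups(struct listeners_block *old,
                                              size_t nlongs)
{
    struct listeners_block *nb = listeners_alloc(nlongs);

    for (size_t i = 0; i < old->nlongs && i < nlongs; i++)
        nb->masks[i] = old->masks[i];
    fake_call_rcu(&old->rcu, listeners_free_rcu);
    return nb;
}
```

Calling change_ngroups() twice and then run_grace_period() frees both retired blocks even though the callbacks run newest-first; in the old scheme, where a block's rcu_head lived in its successor, that order would read freed memory.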
  2. 01 Sep, 2010 · 1 commit
  3. 19 Aug, 2010 · 1 commit
    • netlink: fix compat recvmsg · 68d6ac6d
      Committed by Johannes Berg
      Since
      commit 1dacc76d
      Author: Johannes Berg <johannes@sipsolutions.net>
      Date:   Wed Jul 1 11:26:02 2009 +0000
      
          net/compat/wext: send different messages to compat tasks
      
      we had a race condition when setting and then
      restoring frag_list. Eric attempted to fix it,
      but the fix created even worse problems.
      
      However, the original motivation I had when I
      added the code that turned out to be racy is
      no longer clear to me, since we only copy up
      to skb->len to userspace, which doesn't include
      the frag_list length. As a result, not doing
      any frag_list clearing and restoring avoids
      the race condition, while not introducing any
      other problems.
      
      Additionally, while preparing this patch I found
      that since none of the remaining netlink code is
      really aware of the frag_list, we need to use the
      original skb's information for packet information
      and credentials. This fixes, for example, the
      group information received by compat tasks.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
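A toy model of the receive path after this fix, assuming a hypothetical struct toy_skb in place of struct sk_buff: the compat payload is chosen by reading frag_list without ever clearing or restoring it (the race the commit removes), and the port id and multicast group are always taken from the original skb. Field and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified skb carrying an optional compat copy. */
struct toy_skb {
    const char *data;
    size_t len;                 /* bytes that would be copied to userspace */
    struct toy_skb *frag_list;  /* compat variant of the message, if any */
    unsigned int portid;        /* sender information (metadata) */
    unsigned int dst_group;     /* multicast group (metadata) */
};

struct toy_msg {
    char buf[64];
    size_t copied;
    unsigned int portid;
    unsigned int group;
};

/* Pick the payload to copy; the original skb is never modified. */
static const struct toy_skb *payload_skb(const struct toy_skb *skb, int compat)
{
    if (compat && skb->frag_list)
        return skb->frag_list;
    return skb;
}

static void toy_recvmsg(const struct toy_skb *skb, int compat, struct toy_msg *msg)
{
    const struct toy_skb *data_skb = payload_skb(skb, compat);
    size_t n = data_skb->len < sizeof(msg->buf) ? data_skb->len : sizeof(msg->buf);

    memcpy(msg->buf, data_skb->data, n);
    msg->copied = n;
    /* Metadata always comes from the original skb, because the rest of
     * the receive path knows nothing about the frag_list copy. */
    msg->portid = skb->portid;
    msg->group = skb->dst_group;
}
```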
  4. 16 Aug, 2010 · 1 commit
  5. 27 Jul, 2010 · 1 commit
  6. 21 Jul, 2010 · 1 commit
    • drop_monitor: convert some kfree_skb call sites to consume_skb · 70d4bf6d
      Committed by Neil Horman
      Convert a few calls from kfree_skb to consume_skb.
      
      While working on dropwatch, I noticed that I was detecting lots of internal
      skb drops in several places. While some are legitimate, several were not:
      they were freeing skbs that had reached the end of their life, rather than
      discarding them due to an error. This patch converts those call sites from
      kfree_skb to consume_skb, which stops the in-kernel drop_monitor code from
      reporting them as drops. Tested successfully by myself.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
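A toy illustration of the distinction, assuming hypothetical toy_kfree_skb()/toy_consume_skb() helpers and a counter standing in for the drop_monitor hook: only the free on the error path counts as a drop, while a buffer that simply reached the end of its life is consumed silently.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_skb { int id; };

static int drops_seen;  /* what a drop monitor would report */

/* Analogue of kfree_skb(): the buffer is being discarded, e.g. on error. */
static void toy_kfree_skb(struct toy_skb *skb)
{
    if (!skb)
        return;
    drops_seen++;  /* the drop-monitor hook fires here */
    free(skb);
}

/* Analogue of consume_skb(): the buffer reached the normal end of its life. */
static void toy_consume_skb(struct toy_skb *skb)
{
    if (!skb)
        return;
    free(skb);     /* no drop is recorded */
}

/* Returns 0 on successful hand-off, -1 if the buffer had to be discarded. */
static int deliver(struct toy_skb *skb, int ok)
{
    if (!ok) {
        toy_kfree_skb(skb);
        return -1;
    }
    /* ... payload handed off or copied ... */
    toy_consume_skb(skb);
    return 0;
}
```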
  7. 17 Jun, 2010 · 1 commit
  8. 22 May, 2010 · 1 commit
  9. 02 Apr, 2010 · 1 commit
  10. 27 Mar, 2010 · 1 commit
  11. 21 Mar, 2010 · 1 commit
  12. 28 Feb, 2010 · 1 commit
  13. 04 Feb, 2010 · 1 commit
    • netlink: fix for too early rmmod · 974c37e9
      Committed by Alexey Dobriyan
      Netlink code autoloads a module if the protocol userspace is asking for
      is not ready. However, the module can disappear right after it was
      autoloaded. Example: modprobe/rmmod stress-testing with xfrm_user.ko,
      which provides NETLINK_XFRM.
      
      In such a situation, netlink_create() _will_ create a userspace socket
      and _will_not_ pin the module. Now, if the module was removed, we are
      going to call ->netlink_rcv into nothing:
      
      BUG: unable to handle kernel paging request at ffffffffa02f842a
      					       ^^^^^^^^^^^^^^^^
      	modules are loaded near these addresses here
      
      IP: [<ffffffffa02f842a>] 0xffffffffa02f842a
      PGD 161f067 PUD 1623063 PMD baa12067 PTE 0
      Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
      CPU 1
      Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E
      RIP: 0010:[<ffffffffa02f842a>]  [<ffffffffa02f842a>] 0xffffffffa02f842a
      RSP: 0018:ffff8800baa3db48  EFLAGS: 00010292
      RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000
      RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001
      RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011
      R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010
      FS:  00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30)
      Stack:
       ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d
      <0> 0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000
      <0> ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000
      Call Trace:
       [<ffffffff813637c0>] ? netlink_unicast+0x100/0x2d0
       [<ffffffff8136397d>] netlink_unicast+0x2bd/0x2d0
      
      	netlink_unicast_kernel:
      		nlk->netlink_rcv(skb);
      
       [<ffffffff81344adc>] ? memcpy_fromiovec+0x6c/0x90
       [<ffffffff81364263>] netlink_sendmsg+0x1d3/0x2d0
       [<ffffffff8133975b>] sock_sendmsg+0xbb/0xf0
       [<ffffffff8106cdeb>] ? __lock_acquire+0x27b/0xa60
       [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
       [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
       [<ffffffff8106db22>] ? __lock_release+0x82/0x170
       [<ffffffff810a190e>] ? might_fault+0xbe/0xd0
       [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
       [<ffffffff81344c77>] ? verify_iovec+0x47/0xd0
       [<ffffffff8133a509>] sys_sendmsg+0x1a9/0x360
       [<ffffffff813c2be5>] ? _raw_spin_unlock_irqrestore+0x65/0x70
       [<ffffffff8106aced>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff813c2bc2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
       [<ffffffff81197004>] ? __up_read+0x84/0xb0
       [<ffffffff8106ac95>] ? trace_hardirqs_on_caller+0x145/0x190
       [<ffffffff813c207f>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8100262b>] system_call_fastpath+0x16/0x1b
      Code:  Bad RIP value.
      RIP  [<ffffffffa02f842a>] 0xffffffffa02f842a
       RSP <ffff8800baa3db48>
      CR2: ffffffffa02f842a
      
      Return -EPROTONOSUPPORT if module was quickly removed after autoloading.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
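A hypothetical userspace model of the idea behind the fix: socket creation takes a reference on the protocol's module (as try_module_get() does in the kernel) and fails with -EPROTONOSUPPORT if the module already went away. The toy_module type and its helpers are illustrative assumptions, not netlink's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a protocol module that can vanish right after autoload. */
struct toy_module {
    bool loaded;
    int refcnt;
};

/* Analogue of try_module_get(): pin the module only if it is still loaded. */
static bool toy_try_module_get(struct toy_module *m)
{
    if (!m || !m->loaded)
        return false;
    m->refcnt++;
    return true;
}

/* Unloading is refused while anyone still holds a reference. */
static bool toy_rmmod(struct toy_module *m)
{
    if (m->refcnt > 0)
        return false;
    m->loaded = false;
    return true;
}

/* Socket creation: pin the provider, or fail cleanly instead of later
 * calling ->netlink_rcv into unloaded code. */
static int toy_netlink_create(struct toy_module *provider,
                              struct toy_module **owner_out)
{
    /* ...module autoload would happen here... */
    if (!toy_try_module_get(provider))
        return -EPROTONOSUPPORT;  /* module went away right after autoload */
    *owner_out = provider;        /* the socket keeps the module pinned */
    return 0;
}
```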
  14. 26 Nov, 2009 · 1 commit
  15. 17 Nov, 2009 · 1 commit
  16. 11 Nov, 2009 · 1 commit
  17. 06 Nov, 2009 · 1 commit
  18. 07 Oct, 2009 · 1 commit
  19. 01 Oct, 2009 · 1 commit
  20. 27 Sep, 2009 · 1 commit
  21. 25 Sep, 2009 · 1 commit
    • genetlink: fix netns vs. netlink table locking (2) · b8273570
      Committed by Johannes Berg
      Similar to commit d136f1bd,
      there's a bug when unregistering a generic netlink family,
      which is caught by the might_sleep() added in that commit:
      
          BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
          in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
          2 locks held by rmmod/1510:
           #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
           #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
          Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
          Call Trace:
           [<ffffffff81044ff9>] __might_sleep+0x119/0x150
           [<ffffffff81380501>] netlink_table_grab+0x21/0x100
           [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
           [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
           [<ffffffff81382866>] genl_unregister_family+0x56/0x130
           [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
           [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]
      
      Fix in the same way by grabbing the netlink table lock
      before doing rcu_read_lock().
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 22 Sep, 2009 · 1 commit
  23. 15 Sep, 2009 · 1 commit
    • genetlink: fix netns vs. netlink table locking · d136f1bd
      Committed by Johannes Berg
      Since my commits introducing netns awareness into
      genetlink, we can get this problem:
      
      BUG: scheduling while atomic: modprobe/1178/0x00000002
      2 locks held by modprobe/1178:
       #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou
       #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g
      Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
      Call Trace:
       [<ffffffff8103e285>] __schedule_bug+0x85/0x90
       [<ffffffff81403138>] schedule+0x108/0x588
       [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0
       [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100
       [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290
      
      because I overlooked that netlink_table_grab() will
      schedule, thinking it was just the rwlock. However,
      in the contention case, that isn't actually true.
      
      Fix this by letting the code grab the netlink table
      lock first and then the RCU for netns protection.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
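A toy model of the lock-ordering fix, assuming that the RCU read section is treated as atomic context and that a counter stands in for the might_sleep() warning; all names are hypothetical. Taking the possibly sleeping table lock first and entering the RCU read section second never sleeps in atomic context, whereas the reverse order does. The follow-up commit b8273570, listed earlier in this log, applies the same ordering to the unregister path.

```c
#include <assert.h>
#include <stdbool.h>

static int atomic_depth;            /* > 0 inside the toy RCU read section */
static int might_sleep_violations;  /* what might_sleep() would complain about */
static bool table_locked;

static void toy_rcu_read_lock(void)   { atomic_depth++; }
static void toy_rcu_read_unlock(void) { atomic_depth--; }

/* Stand-in for netlink_table_grab(): under contention it can schedule,
 * so calling it from atomic context is a bug. */
static void toy_table_grab(void)
{
    if (atomic_depth > 0)
        might_sleep_violations++;
    assert(!table_locked);
    table_locked = true;
}

static void toy_table_ungrab(void)
{
    assert(table_locked);
    table_locked = false;
}

/* Order that triggered "scheduling while atomic". */
static void change_groups_buggy(void)
{
    toy_rcu_read_lock();
    toy_table_grab();
    /* ... walk per-netns sockets and update group bitmaps ... */
    toy_table_ungrab();
    toy_rcu_read_unlock();
}

/* Order used by the fix: sleeping lock first, then RCU. */
static void change_groups_fixed(void)
{
    toy_table_grab();
    toy_rcu_read_lock();
    /* ... walk per-netns sockets and update group bitmaps ... */
    toy_rcu_read_unlock();
    toy_table_ungrab();
}
```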
  24. 25 Aug, 2009 · 1 commit
  25. 15 Jul, 2009 · 1 commit
    • net/compat/wext: send different messages to compat tasks · 1dacc76d
      Committed by Johannes Berg
      Wireless extensions have the unfortunate problem that events
      are multicast netlink messages, and are not independent of
      pointer size. Thus, currently 32-bit tasks on 64-bit platforms
      cannot properly receive events and fail with all kinds of
      strange problems. For instance, wpa_supplicant never notices
      disassociations: due to the way the 64-bit event looks to a
      32-bit process, the fact that the address is all zeroes is
      lost, and it thinks the address is 00:00:00:00:01:00 instead.
      
      The same problem existed with the ioctls, until David Miller
      fixed those some time ago in a heroic effort.
      
      A different problem caused by this is that we cannot send the
      ASSOCREQIE/ASSOCRESPIE events because sending them causes a
      32-bit wpa_supplicant on a 64-bit system to overwrite its
      internal information, which is worse than it not getting the
      information at all -- so we currently resort to sending a
      custom string event that it then parses. This, however, has a
      severe size limitation we are frequently hitting with modern
      access points; after this patch, that limitation can be lifted
      by sending the correct binary, not custom, event.
      
      A similar problem apparently happens for some other netlink
      users on x86_64 with 32-bit tasks due to the alignment for
      64-bit quantities.
      
      In order to fix these problems, I have implemented a way to
      send compat messages to tasks. When sending an event, we send
      the non-compat event data together with a compat event data in
      skb_shinfo(main_skb)->frag_list. Then, when the event is read
      from the socket, the netlink code makes sure to pass out only
      the skb that is compatible with the task. This approach was
      suggested by David Miller; my original approach required
      always sending two skbs, but that had various small problems.
      
      To determine whether compat is needed or not, I have used the
      MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
      recvfrom to include it, even if those calls do not have a cmsg
      parameter.
      
      I have not solved one small part of the problem, and I don't
      think it is necessary to: if a 32-bit application uses read()
      rather than any form of recvmsg() it will still get the wrong
      (64-bit) event. However, neither do applications actually do
      this, nor would it be a regression.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
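A small sketch of why the event layout depends on pointer size. The two structs are simplified models of struct iw_point (a user pointer followed by two 16-bit fields) as seen by a 64-bit and by a 32-bit task; they are illustrative, not the kernel's definitions. Reading bytes laid out for the 64-bit view through the 32-bit view shifts every field after the pointer. The separate alignment issue mentioned above is also real on x86: a 64-bit integer inside a struct is 4-byte aligned for i386 code but 8-byte aligned on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified iw_point-like layout for a 64-bit task: 8-byte pointer. */
struct point_64 {
    uint64_t pointer;
    uint16_t length;
    uint16_t flags;
};

/* The same fields as a 32-bit task expects them: 4-byte pointer. */
struct point_32 {
    uint32_t pointer;
    uint16_t length;
    uint16_t flags;
};

/* Reinterpret the start of a 64-bit event through the 32-bit layout,
 * which is effectively what a 32-bit reader does with the raw bytes. */
static struct point_32 misread_as_32(const struct point_64 *ev)
{
    struct point_32 out;
    memcpy(&out, ev, sizeof(out));
    return out;
}
```

With a zero pointer, the 32-bit reader finds its length field inside the upper half of the 64-bit pointer and reads 0 instead of the real length.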
  26. 13 Jul, 2009 · 2 commits
  27. 18 Jun, 2009 · 1 commit
  28. 25 Mar, 2009 · 1 commit
    • netlink: add NETLINK_NO_ENOBUFS socket flag · 38938bfe
      Committed by Pablo Neira Ayuso
      This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
      be used by unicast and broadcast listeners to avoid receiving
      ENOBUFS errors.
      
      Generally speaking, ENOBUFS errors are useful to notify the listener
      of two things:
      
      a) You may increase the receiver buffer size via setsockopt().
      b) You have lost messages, you may be out of sync.
      
      In some cases, ignoring ENOBUFS errors can be useful. For example:
      
      a) nfnetlink_queue: this subsystem does not have any sort of resync
      method and you can decide to ignore ENOBUFS once you have set a
      given buffer size.
      
      b) ctnetlink: you can use this together with the socket flag
      NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors, as
      you do not need to resync (packets whose events are not delivered
      are dropped to provide reliable logging and state-synchronization).
      
      Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
      effect in terms of performance which is due to the netlink congestion
      control when the listener cannot back off. The effect is the following:
      
      1) throughput rate goes up and netlink messages are inserted in the
      receiver buffer.
      2) Then, the netlink buffer fills and overruns (nlk->state bit 0 is set).
      3) While the listener empties the receiver buffer, netlink keeps
      dropping messages. Thus, throughput goes dramatically down.
      4) Then, once the listener has emptied the buffer (nlk->state
      bit 0 is cleared), go to step 1.
      
      This effect is easy to trigger with netlink broadcast under heavy
      load, and it is more noticeable when using a big receiver buffer.
      You can find some results in [1] that show this problem.
      
      [1] http://1984.lsi.us.es/linux/netlink/
      
      This patch also uses sk_drop to count the number of netlink
      messages dropped due to overrun. This value is shown in
      /proc/net/netlink.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
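A userspace sketch, assuming a Linux system whose <linux/netlink.h> defines NETLINK_NO_ENOBUFS: disable_enobufs() sets the flag with setsockopt() at the SOL_NETLINK level, and classify_recv() shows what a listener that keeps ENOBUFS reports should do with them, namely treat them as "messages were lost, resync". The helper names and the resync policy are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270  /* value used by the Linux kernel headers */
#endif

/* Opt this listener out of ENOBUFS reports.
 * Returns 0 on success, -1 with errno set otherwise. */
static int disable_enobufs(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS, &one, sizeof(one));
}

enum recv_outcome { RECV_DATA, RECV_RESYNC, RECV_ERROR };

/* Classify a recv() result on a listener that still gets ENOBUFS:
 * an overrun means messages were lost and local state may be stale. */
static enum recv_outcome classify_recv(ssize_t n, int err)
{
    if (n >= 0)
        return RECV_DATA;
    if (err == ENOBUFS)
        return RECV_RESYNC;  /* e.g. re-dump state from the kernel */
    return RECV_ERROR;
}
```

A listener would create its socket with socket(AF_NETLINK, SOCK_RAW, protocol), bind it to its multicast groups, and call disable_enobufs() only when, as in the nfnetlink_queue case above, resyncing is not possible or not needed.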
  29. 23 Mar, 2009 · 1 commit
  30. 04 Mar, 2009 · 1 commit
    • netlink: invert error code in netlink_set_err() · 4843b93c
      Committed by Pablo Neira Ayuso
      The callers of netlink_set_err() currently pass a negative value
      as parameter for the error code. However, sk->sk_err wants a
      positive error value. Without this patch, skb_recv_datagram() called
      by netlink_recvmsg() may return a positive value to report an error.
      
      Another choice to fix this is to change callers to pass a positive
      error value, but this seems a bit inconsistent and error prone
      to me. Indeed, the callers of netlink_set_err() assumed that the
      (usual) negative value for error codes was fine before this patch :).
      
      This patch also includes some documentation in docbook format
      for netlink_set_err() to avoid this sort of confusion.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
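A toy model of the sign convention after this fix, with hypothetical names: callers keep passing negative error codes, the setter stores the positive value in sk_err, and the receive side, modeled loosely on the kernel's sock_error(), hands it back negated, so recvmsg() reports a negative error rather than a positive one. The comment on toy_set_err() is written in kernel-doc style, like the documentation the patch adds.

```c
#include <assert.h>
#include <errno.h>

/* Toy socket: sk_err holds a positive errno, as the socket layer expects. */
struct toy_sock {
    int sk_err;
};

/**
 * toy_set_err - report an error to a listener socket (hypothetical model)
 * @sk: socket to mark
 * @code: error code, negative as callers conventionally pass it (e.g. -ENOBUFS)
 *
 * Stores the positive counterpart of @code, mirroring the sign inversion
 * the patch adds to netlink_set_err().
 */
static void toy_set_err(struct toy_sock *sk, int code)
{
    sk->sk_err = -code;
}

/* Fetch and clear the pending error, returning it as a negative value. */
static int toy_sock_error(struct toy_sock *sk)
{
    int err = sk->sk_err;
    sk->sk_err = 0;
    return -err;
}
```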
  31. 27 Feb, 2009 · 1 commit
  32. 25 Feb, 2009 · 1 commit
    • netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Committed by Pablo Neira Ayuso
      This patch changes the return value of nlmsg_notify() as follows:
      
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way as a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      
      This patch is useful in case the caller of nlmsg_notify()
      wants to know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take action if
      the delivery failed. For example, ctnetlink can drop packets
      if the event delivery failed to provide reliable logging and
      state-synchronization at the cost of dropping packets.
      
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. The function rtnl_notify()
      (before this patch) returned the error of the unicast notification,
      which makes rtnl_set_sk_err() report errors to all listeners. This
      is not of any help since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
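A sketch of the return-value rule as the commit message states it, with hypothetical names and a simplified model in which the caller already holds the multicast and unicast results (the case with no multicast group is not modeled). It is one reading of the described logic, not the kernel's nlmsg_notify() itself.

```c
#include <assert.h>
#include <errno.h>

/* broadcast_err: multicast result (nonzero only if some listener set
 *                NETLINK_BROADCAST_ERROR, or -ESRCH if nobody else listens)
 * report:        the requesting socket asked for an echo
 * unicast_err:   result of that unicast notification */
static int toy_notify_result(int broadcast_err, int report, int unicast_err)
{
    int err = broadcast_err;

    /* A real broadcast delivery error wins; otherwise fall back to the
     * outcome of the unicast echo to the requesting socket. */
    if (report && (err == 0 || err == -ESRCH))
        err = unicast_err;
    return err;
}
```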
  33. 20 Feb, 2009 · 1 commit
    • netlink: add NETLINK_BROADCAST_ERROR socket option · be0c22a4
      Committed by Pablo Neira Ayuso
      This patch adds NETLINK_BROADCAST_ERROR which is a netlink
      socket option that the listener can set to make netlink_broadcast()
      return errors in the delivery to the caller. This option is useful
      if the caller of netlink_broadcast() does something with the result
      of the message delivery, as in ctnetlink, which drops a network
      packet if the event delivery failed; this is used to enable reliable
      logging and state-synchronization. If this socket option is not set,
      netlink_broadcast() only reports ESRCH errors and silently ignores
      ENOBUFS errors, which is what most netlink_broadcast() callers
      should do.
      
      This socket option is based on a suggestion from Patrick McHardy.
      Patrick McHardy can exchange this patch for a beer from me ;).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
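A toy model of how the option gates error reporting in a broadcast, assuming hypothetical per-listener state: a failed delivery only turns the result into -ENOBUFS when that listener has opted in; otherwise the failure is invisible to the caller, which then sees 0 or -ESRCH.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-listener state for a single broadcast. */
struct toy_listener {
    int broadcast_error;  /* listener set NETLINK_BROADCAST_ERROR */
    int can_receive;      /* 0 simulates a full receive buffer */
};

static int toy_broadcast(const struct toy_listener *l, size_t n)
{
    int delivered = 0, delivery_failure = 0;

    for (size_t i = 0; i < n; i++) {
        if (l[i].can_receive)
            delivered = 1;
        else if (l[i].broadcast_error)
            delivery_failure = 1;  /* only opted-in listeners surface failures */
        /* otherwise the overrun is signalled only on that listener's socket */
    }
    if (delivery_failure)
        return -ENOBUFS;
    if (delivered)
        return 0;
    return -ESRCH;
}
```

From userspace, a listener opts in with setsockopt(fd, SOL_NETLINK, NETLINK_BROADCAST_ERROR, &one, sizeof(one)), where one is an int set to 1.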
  34. 06 Feb, 2009 · 1 commit
    • netlink: change return-value logic of netlink_broadcast() · ff491a73
      Committed by Pablo Neira Ayuso
      Currently, netlink_broadcast() reports errors to the caller if no
      messages at all were delivered:
      
      1) If, at least, one message has been delivered correctly, returns 0.
      2) Otherwise, if no messages at all were delivered due to skb_clone()
         failure, return -ENOBUFS.
      3) Otherwise, if there are no listeners, return -ESRCH.
      
      With this patch, the caller knows if the delivery of any of the
      messages to the listeners has failed:
      
      1) If it fails to deliver any message (for whatever reason), return
         -ENOBUFS.
      2) Otherwise, if all messages were delivered OK, returns 0.
      3) Otherwise, if no listeners, return -ESRCH.
      
      In the current ctnetlink code and in Netfilter in general, we can add
      reliable logging and connection tracking event delivery by dropping the
      packets whose events were not successfully delivered over Netlink. Of
      course, this option would be settable via /proc, as this approach reduces
      performance (in terms of filtered connections per second by a stateful
      firewall) but provides reliable logging and event delivery (for
      conntrackd) in return.
      
      This patch also changes some clients of netlink_broadcast() that
      may report ENOBUFS errors via printk. This error handling is not
      of any help. Instead, the userspace daemons that are listening to
      those netlink messages should resync themselves with the kernel-side
      if they hit ENOBUFS.
      
      BTW, netlink_broadcast() clients include those that call
      cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
      internally call netlink_broadcast() and return its error value.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
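The change in rules can be summarized by two small functions, assuming a hypothetical per-broadcast summary of how many deliveries succeeded and how many failed (before the patch, "failed" covers only skb_clone() failures). They encode the two numbered lists in the commit message, not the kernel's actual control flow.

```c
#include <assert.h>
#include <errno.h>

/* Rules before the patch: any successful delivery hides failures. */
static int result_before(int delivered, int failed)
{
    if (delivered > 0)
        return 0;
    if (failed > 0)
        return -ENOBUFS;
    return -ESRCH;
}

/* Rules after the patch: any failed delivery is reported. */
static int result_after(int delivered, int failed)
{
    if (failed > 0)
        return -ENOBUFS;
    if (delivered > 0)
        return 0;
    return -ESRCH;
}
```

The only case that changes is a partial delivery: two listeners served and one missed now yields -ENOBUFS instead of 0, which is what lets a caller such as ctnetlink drop the packet.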
  35. 25 Nov, 2008 · 1 commit
  36. 24 Nov, 2008 · 2 commits
  37. 17 Oct, 2008 · 1 commit
  38. 14 Oct, 2008 · 1 commit