1. 27 Sep, 2009 1 commit
  2. 25 Sep, 2009 1 commit
    • genetlink: fix netns vs. netlink table locking (2) · b8273570
      Committed by Johannes Berg
      Similar to commit d136f1bd,
      there's a bug when unregistering a generic netlink family,
      which is caught by the might_sleep() added in that commit:
      
          BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
          in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
          2 locks held by rmmod/1510:
           #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
           #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
          Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
          Call Trace:
           [<ffffffff81044ff9>] __might_sleep+0x119/0x150
           [<ffffffff81380501>] netlink_table_grab+0x21/0x100
           [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
           [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
           [<ffffffff81382866>] genl_unregister_family+0x56/0x130
           [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
           [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]
      
      Fix in the same way by grabbing the netlink table lock
      before doing rcu_read_lock().
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Sep, 2009 1 commit
  4. 15 Sep, 2009 1 commit
    • genetlink: fix netns vs. netlink table locking · d136f1bd
      Committed by Johannes Berg
      Since my commits introducing netns awareness into
      genetlink we can get this problem:
      
      BUG: scheduling while atomic: modprobe/1178/0x00000002
      2 locks held by modprobe/1178:
       #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou
       #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g
      Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
      Call Trace:
       [<ffffffff8103e285>] __schedule_bug+0x85/0x90
       [<ffffffff81403138>] schedule+0x108/0x588
       [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0
       [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100
       [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290
      
      because I overlooked that netlink_table_grab() will
      schedule, thinking it was just the rwlock. However,
      in the contention case, that isn't actually true.
      
      Fix this by letting the code grab the netlink table
      lock first and then the RCU for netns protection.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
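
The lock-ordering rule behind the two genetlink fixes (d136f1bd and b8273570) can be illustrated with a small userspace model. This is only a sketch: `struct ctx_model` and every `model_*` function are hypothetical stand-ins, not kernel APIs. The one rule it encodes is that a lock which may sleep, such as the one taken by netlink_table_grab(), must be acquired before entering an RCU read-side section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for kernel context state; not a real kernel structure. */
struct ctx_model {
    int rcu_depth;         /* > 0 means atomic context: sleeping is forbidden */
    bool table_held;       /* models holding the netlink table lock */
    int sleep_violations;  /* times a sleeping call ran in atomic context */
};

static void model_rcu_read_lock(struct ctx_model *c)
{
    c->rcu_depth++;
}

static void model_rcu_read_unlock(struct ctx_model *c)
{
    assert(c->rcu_depth > 0);
    c->rcu_depth--;
}

/* Models netlink_table_grab(): it may sleep under contention, so
 * entering it inside an RCU read-side section is a bug. */
static void model_table_grab(struct ctx_model *c)
{
    if (c->rcu_depth > 0)
        c->sleep_violations++;  /* what might_sleep() would report */
    c->table_held = true;
}

static void model_table_ungrab(struct ctx_model *c)
{
    assert(c->table_held);
    c->table_held = false;
}

/* Order before the fix: RCU first, then the sleeping lock. */
static int buggy_order(void)
{
    struct ctx_model c = {0};

    model_rcu_read_lock(&c);
    model_table_grab(&c);
    model_table_ungrab(&c);
    model_rcu_read_unlock(&c);
    return c.sleep_violations;
}

/* Order after the fix: sleeping lock first, then RCU. */
static int fixed_order(void)
{
    struct ctx_model c = {0};

    model_table_grab(&c);
    model_rcu_read_lock(&c);
    model_rcu_read_unlock(&c);
    model_table_ungrab(&c);
    return c.sleep_violations;
}
```

In this model, `buggy_order()` records one violation and `fixed_order()` records none, which mirrors the splat going away once the table lock is taken first.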
  5. 25 Aug, 2009 1 commit
  6. 15 Jul, 2009 1 commit
    • net/compat/wext: send different messages to compat tasks · 1dacc76d
      Committed by Johannes Berg
      Wireless extensions have the unfortunate problem that events
      are multicast netlink messages, and are not independent of
      pointer size. Thus, currently 32-bit tasks on 64-bit platforms
      cannot properly receive events and fail with all kinds of
      strange problems. For instance, wpa_supplicant never notices
      disassociations: because of the way the 64-bit event looks to a
      32-bit process, the fact that the address is all zeroes is
      lost, and it instead reads the address as 00:00:00:00:01:00.
      
      The same problem existed with the ioctls, until David Miller
      fixed those some time ago in a heroic effort.
      
      A different problem caused by this is that we cannot send the
      ASSOCREQIE/ASSOCRESPIE events because sending them causes a
      32-bit wpa_supplicant on a 64-bit system to overwrite its
      internal information, which is worse than it not getting the
      information at all -- so we currently resort to sending a
      custom string event that it then parses. This, however, has a
      severe size limitation we are frequently hitting with modern
      access points; this limitation can be lifted after this
      patch by sending the correct binary, not custom, event.
      
      A similar problem apparently happens for some other netlink
      users on x86_64 with 32-bit tasks due to the alignment for
      64-bit quantities.
      
      In order to fix these problems, I have implemented a way to
      send compat messages to tasks. When sending an event, we send
      the non-compat event data together with a compat event data in
      skb_shinfo(main_skb)->frag_list. Then, when the event is read
      from the socket, the netlink code makes sure to pass out only
      the skb that is compatible with the task. This approach was
      suggested by David Miller; my original approach required
      always sending two skbs, but that had various small problems.
      
      To determine whether compat is needed or not, I have used the
      MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
      recvfrom to include it, even if those calls do not have a cmsg
      parameter.
      
      I have not solved one small part of the problem, and I don't
      think it is necessary to: if a 32-bit application uses read()
      rather than any form of recvmsg() it will still get the wrong
      (64-bit) event. However, neither do applications actually do
      this, nor would it be a regression.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
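
The delivery scheme described in 1dacc76d, where the compat variant of an event travels on the main skb's frag_list and the receive path hands out the variant matching the reader, can be sketched as a toy model. `struct model_skb`, `pick_for_reader()` and the flag value `MODEL_MSG_CMSG_COMPAT` are hypothetical names invented for this illustration; the real kernel structures and flag value differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value for this model only; not the kernel's value. */
#define MODEL_MSG_CMSG_COMPAT 0x1

/* Toy buffer: the native event, with an optional compat variant chained
 * to it, playing the role of skb_shinfo(main_skb)->frag_list. */
struct model_skb {
    const char *data;
    size_t len;
    struct model_skb *frag_list;  /* compat variant, or NULL if none */
};

/* Choose the buffer to copy to the reader: the compat variant when the
 * receive call carries the compat flag and a variant exists, otherwise
 * the native buffer. */
static const struct model_skb *
pick_for_reader(const struct model_skb *main_skb, int flags)
{
    assert(main_skb != NULL);
    if ((flags & MODEL_MSG_CMSG_COMPAT) && main_skb->frag_list)
        return main_skb->frag_list;
    return main_skb;
}
```

A reader that does not pass the compat flag always gets the native buffer, which corresponds to the read() limitation noted at the end of the commit message.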
  7. 13 Jul, 2009 2 commits
  8. 18 Jun, 2009 1 commit
  9. 25 Mar, 2009 1 commit
    • netlink: add NETLINK_NO_ENOBUFS socket flag · 38938bfe
      Committed by Pablo Neira Ayuso
      This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
      be used by unicast and broadcast listeners to avoid receiving
      ENOBUFS errors.
      
      Generally speaking, ENOBUFS errors are useful to notify two things
      to the listener:
      
      a) You may increase the receiver buffer size via setsockopt().
      b) You have lost messages, you may be out of sync.
      
      In some cases, ignoring ENOBUFS errors can be useful. For example:
      
      a) nfnetlink_queue: this subsystem does not have any sort of resync
      method and you can decide to ignore ENOBUFS once you have set a
      given buffer size.
      
      b) ctnetlink: you can use this together with the socket flag
      NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as
      you do not need to resync (packets whose events are not delivered
      are dropped to provide reliable logging and state-synchronization).
      
      Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
      effect in terms of performance which is due to the netlink congestion
      control when the listener cannot back off. The effect is the following:
      
      1) throughput rate goes up and netlink messages are inserted in the
      receiver buffer.
      2) Then, netlink buffer fills and overruns (set on nlk->state bit 0).
      3) While the listener empties the receiver buffer, netlink keeps
      dropping messages. Thus, throughput goes dramatically down.
      4) Then, once the listener has emptied the buffer (nlk->state
      bit 0 is set off), goto step 1.
      
      This effect is easy to trigger with netlink broadcast under heavy
      load, and it is more noticeable when using a big receiver buffer.
      You can find some results in [1] that show this problem.
      
      [1] http://1984.lsi.us.es/linux/netlink/
      
      This patch also uses sk_drop to account for the number of
      netlink messages dropped due to overrun. This value is shown in
      /proc/net/netlink.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
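
The "go up, go down" effect listed in steps 1 to 4 of 38938bfe can be reproduced with a small simulation. It is a simplified model under stated assumptions, not kernel code: `struct rx_model`, `rx_deliver()`, `rx_read()` and `run_scenario()` are hypothetical names, the buffer is counted in whole messages, and the `congested` field stands in for nlk->state bit 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy receiver; sizes are counted in messages rather than bytes. */
struct rx_model {
    int capacity;     /* receive buffer size */
    int queued;       /* messages waiting for the listener */
    bool congested;   /* models nlk->state bit 0 (overrun) */
    bool no_enobufs;  /* models the NETLINK_NO_ENOBUFS flag */
    int drops;        /* models the drop counter */
    int enobufs;      /* ENOBUFS errors the listener would see */
};

/* Kernel side: try to queue one message; returns true if it was queued. */
static bool rx_deliver(struct rx_model *r)
{
    if (r->congested || r->queued >= r->capacity) {
        r->drops++;
        if (!r->no_enobufs && !r->congested) {
            r->congested = true;  /* step 2: overrun is recorded */
            r->enobufs++;
        }
        return false;             /* step 3: message is dropped */
    }
    r->queued++;                  /* step 1: message is inserted */
    return true;
}

/* Listener side: read one message; congestion clears once empty (step 4). */
static void rx_read(struct rx_model *r)
{
    if (r->queued > 0)
        r->queued--;
    if (r->queued == 0)
        r->congested = false;
}

/* Overfill a 4-slot buffer, read two messages, then send two more.
 * Returns the total number of dropped messages. */
static int run_scenario(bool no_enobufs)
{
    struct rx_model r = { .capacity = 4, .no_enobufs = no_enobufs };

    for (int i = 0; i < 5; i++)
        rx_deliver(&r);
    rx_read(&r);
    rx_read(&r);
    rx_deliver(&r);
    rx_deliver(&r);
    return r.drops;
}
```

Without the flag, the model keeps dropping until the listener has drained the whole buffer, so the two late messages are lost even though there is room for them. With the flag set, only the single message that genuinely did not fit is dropped.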
  10. 23 Mar, 2009 1 commit
  11. 04 Mar, 2009 1 commit
    • netlink: invert error code in netlink_set_err() · 4843b93c
      Committed by Pablo Neira Ayuso
      The callers of netlink_set_err() currently pass a negative value
      as parameter for the error code. However, sk->sk_err wants a
      positive error value. Without this patch, skb_recv_datagram() called
      by netlink_recvmsg() may return a positive value to report an error.
      
      Another choice to fix this is to change callers to pass a positive
      error value, but this seems a bit inconsistent and error prone
      to me. Indeed, the callers of netlink_set_err() assumed that the
      (usual) negative value for error codes was fine before this patch :).
      
      This patch also includes some documentation in docbook format
      for netlink_set_err() to avoid this sort of confusion.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 27 Feb, 2009 1 commit
  13. 25 Feb, 2009 1 commit
    • netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Committed by Pablo Neira Ayuso
      This patch changes the return value of nlmsg_notify() as follows:
      
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way as a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      
      This patch is useful in case that the caller of nlmsg_notify()
      wants to know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take any action in case
      that the delivery failed. For example, ctnetlink can drop packets
      if the event delivery failed to provide reliable logging and
      state-synchronization at the cost of dropping packets.
      
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. Before this patch,
      rtnl_notify() returned the error of the unicast notification,
      which made rtnl_set_sk_err() report errors to all listeners. This
      is not of any help, since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
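
One possible reading of the nlmsg_notify() return rule in 1ce85fe4 is sketched below as a pure function. The name `notify_result()` and its parameters are hypothetical; the sketch assumes that a broadcast result of 0 or -ESRCH (no other listeners) lets the unicast result through, and that any other broadcast error takes precedence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Combine the outcome of the broadcast and of the optional unicast echo.
 *
 * has_group:   a multicast group was given, so a broadcast was attempted
 * bcast_err:   result of that broadcast (0 or a negative errno)
 * report:      the requester asked for an echo (unicast notification)
 * unicast_err: result of that unicast (0 or a negative errno)
 */
static int notify_result(bool has_group, int bcast_err,
                         bool report, int unicast_err)
{
    int err = 0;

    assert(bcast_err <= 0 && unicast_err <= 0);

    if (has_group)
        err = bcast_err;

    /* A real broadcast delivery error wins; otherwise the unicast
     * result decides. */
    if (report && (err == 0 || err == -ESRCH))
        err = unicast_err;

    return err;
}
```

Under these assumptions, a failed broadcast delivery is returned even when the echo succeeds, while "no listeners" is replaced by whatever the echo returned.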
  14. 20 Feb, 2009 1 commit
    • netlink: add NETLINK_BROADCAST_ERROR socket option · be0c22a4
      Committed by Pablo Neira Ayuso
      This patch adds NETLINK_BROADCAST_ERROR, a netlink socket
      option that a listener can set to make netlink_broadcast()
      return delivery errors to the caller. This option is useful
      if the caller of netlink_broadcast() does something with the
      result of the message delivery. For example, ctnetlink drops a
      network packet if the event delivery failed, which enables
      reliable logging and state-synchronization. If this socket
      option is not set, netlink_broadcast() only reports ESRCH
      errors and silently ignores ENOBUFS errors, which is what most
      netlink_broadcast() callers should do.
      
      This socket option is based on a suggestion from Patrick McHardy.
      Patrick McHardy can exchange this patch for a beer from me ;).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 06 Feb, 2009 1 commit
    • netlink: change return-value logic of netlink_broadcast() · ff491a73
      Committed by Pablo Neira Ayuso
      Currently, netlink_broadcast() reports errors to the caller if no
      messages at all were delivered:
      
      1) If, at least, one message has been delivered correctly, returns 0.
      2) Otherwise, if no messages at all were delivered due to skb_clone()
         failure, return -ENOBUFS.
      3) Otherwise, if there are no listeners, return -ESRCH.
      
      With this patch, the caller knows if the delivery of any of the
      messages to the listeners has failed:
      
      1) If it fails to deliver any message (for whatever reason), return
         -ENOBUFS.
      2) Otherwise, if all messages were delivered OK, returns 0.
      3) Otherwise, if no listeners, return -ESRCH.
      
      In the current ctnetlink code and in Netfilter in general, we can add
      reliable logging and connection tracking event delivery by dropping the
      packets whose events were not successfully delivered over Netlink. Of
      course, this option would be settable via /proc, as this approach reduces
      performance (in terms of filtered connections per second by a stateful
      firewall) but provides reliable logging and event delivery (for
      conntrackd) in return.
      
      This patch also changes some clients of netlink_broadcast() that
      may report ENOBUFS errors via printk. This error handling is not
      of any help. Instead, the userspace daemons that are listening to
      those netlink messages should resync themselves with the kernel-side
      if they hit ENOBUFS.
      
      BTW, netlink_broadcast() clients include those that call
      cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
      internally call netlink_broadcast() and return its error value.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
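
The two numbered lists in ff491a73 describe the return value of netlink_broadcast() before and after the patch. They can be written as two small functions for side-by-side comparison. `struct bcast_outcome`, `bcast_ret_old()` and `bcast_ret_new()` are hypothetical names for this sketch; the real function works on sockets and skbs rather than on counters.

```c
#include <assert.h>
#include <errno.h>

/* Summary of one broadcast attempt, in a toy form. */
struct bcast_outcome {
    int delivered;  /* listeners that received the message */
    int failed;     /* listeners for which delivery failed */
};

/* Before the patch: any successful delivery hides failures. */
static int bcast_ret_old(struct bcast_outcome o)
{
    assert(o.delivered >= 0 && o.failed >= 0);
    if (o.delivered > 0)
        return 0;
    if (o.failed > 0)
        return -ENOBUFS;
    return -ESRCH;  /* no listeners at all */
}

/* After the patch: any failed delivery is reported to the caller. */
static int bcast_ret_new(struct bcast_outcome o)
{
    assert(o.delivered >= 0 && o.failed >= 0);
    if (o.failed > 0)
        return -ENOBUFS;
    if (o.delivered > 0)
        return 0;
    return -ESRCH;  /* no listeners at all */
}
```

The only case where the two versions disagree in this model is a partial failure: with two successful deliveries and one failure, the old rule returns 0 and the new rule returns -ENOBUFS.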
  16. 25 Nov, 2008 1 commit
  17. 24 Nov, 2008 2 commits
  18. 17 Oct, 2008 1 commit
  19. 14 Oct, 2008 1 commit
  20. 26 Jul, 2008 1 commit
  21. 02 Jul, 2008 1 commit
  22. 06 Jun, 2008 1 commit
  23. 28 Apr, 2008 1 commit
  24. 19 Apr, 2008 1 commit
  25. 26 Mar, 2008 3 commits
  26. 22 Mar, 2008 1 commit
  27. 01 Mar, 2008 2 commits
  28. 02 Feb, 2008 1 commit
  29. 01 Feb, 2008 1 commit
    • [NETNS]: Fix race between put_net() and netlink_kernel_create(). · 23fe1866
      Committed by Pavel Emelyanov
      The comment about "race free view of the set of network
      namespaces" was a bit hasty. Look (there even can be only
      one CPU, as discovered by Alexey Dobriyan and Denis Lunev):
      
      put_net()
        if (atomic_dec_and_test(&net->refcnt))
          /* true */
            __put_net(net);
              queue_work(...);
      
      /*
       * note: the net now has refcnt 0, but still in
       * the global list of net namespaces
       */
      
      == re-schedule ==
      
      register_pernet_subsys(&some_ops);
        register_pernet_operations(&some_ops);
          (*some_ops)->init(net);
            /*
             * we call netlink_kernel_create() here
             * in some places
             */
            netlink_kernel_create();
               sk_alloc();
                  get_net(net); /* refcnt = 1 */
               /*
                * now we drop the net refcount not to
                * block the net namespace exit in the
                * future (or this can be done on the
                * error path)
                */
               put_net(sk->sk_net);
                   if (atomic_dec_and_test(&...))
                         /*
                          * true. BOOOM! The net is
                          * scheduled for release twice
                          */
      
      Thinking about this problem, I decided that getting and
      putting the net in the init callback is wrong. If some init
      callback needs to have a refcount-less reference on the struct
      net, _it_ has to be careful itself, rather than relying on
      the infrastructure to handle this correctly.
      
      In case of netlink_kernel_create(), the problem is that the
      sk_alloc() gets the given namespace, but passing the info
      that we don't want to get it inside this call is too heavy.
      
      Instead, I propose to create the socket inside the init_net
      namespace and then re-attach it to the desired one right
      after the socket is created.
      
      After doing this, we also have to be careful on error paths
      not to drop a reference on the namespace that we never
      took.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Denis Lunev <den@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
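
The double release described in 23fe1866 can be replayed with a toy reference counter. This is a single-threaded sketch of the event sequence only: `struct net_model` and the `model_*` functions are hypothetical, and the `release_scheduled` counter stands in for the queue_work() call made when the count reaches zero.

```c
#include <assert.h>

/* Toy namespace: a reference count plus a counter of scheduled releases. */
struct net_model {
    int refcnt;
    int release_scheduled;  /* models how often queue_work() was called */
};

static void model_get_net(struct net_model *n)
{
    n->refcnt++;
}

static void model_put_net(struct net_model *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0)
        n->release_scheduled++;  /* last reference gone: queue the release */
}

/* Sequence from the commit message: the namespace is already queued for
 * release but still on the global list when an init callback runs. */
static int racy_sequence(void)
{
    struct net_model net = { .refcnt = 1, .release_scheduled = 0 };

    model_put_net(&net);  /* last user drops it: release queued once */
    model_get_net(&net);  /* sk_alloc() inside the init callback: refcnt = 1 */
    model_put_net(&net);  /* reference dropped again: release queued twice */
    return net.release_scheduled;
}

/* With the fix, the socket is created in init_net and re-attached, so the
 * init callback never gets or puts the dying namespace. */
static int fixed_sequence(void)
{
    struct net_model net = { .refcnt = 1, .release_scheduled = 0 };

    model_put_net(&net);  /* last user drops it: release queued once */
    return net.release_scheduled;
}
```

In the racy sequence the release is scheduled twice for the same namespace; in the fixed sequence it is scheduled exactly once.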
  30. 29 Jan, 2008 6 commits