1. 10 7月, 2009 1 次提交
    • J
      net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa 提交于
      Adding memory barrier after the poll_wait function, paired with
      receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, following race can happen.
      The race fires, when following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there was no cache the code would work ok, since the wait_queue and
      rcv_nxt are opposit to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      endup calling schedule and sleep forever if there are no more data on the
      socket.
      
      Calls to poll_wait in following modules were ommited:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a57de0b4
  2. 09 7月, 2009 2 次提交
    • A
      netpoll: Fix carrier detection for drivers that are using phylib · 1b614fb9
      Anton Vorontsov 提交于
      Using early netconsole and gianfar driver this error pops up:
      
        netconsole: timeout waiting for carrier
      
      It appears that net/core/netpoll.c:netpoll_setup() is using
      cond_resched() in a loop waiting for a carrier.
      
      The thing is that cond_resched() is a no-op when system_state !=
      SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
      scheduled, therefore link detection doesn't work.
      
      I belive that the main problem is in cond_resched()[1], but despite
      how the cond_resched() story ends, it might be a good idea to call
      msleep(1) instead of cond_resched(), as suggested by Andrew Morton.
      
      [1] http://lkml.org/lkml/2009/7/7/463Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b614fb9
    • J
      ipv4: Fix fib_trie rebalancing, part 4 (root thresholds) · 345aa031
      Jarek Poplawski 提交于
      Pawel Staszewski wrote:
      <blockquote>
      Some time ago i report this:
      http://bugzilla.kernel.org/show_bug.cgi?id=6648
      
      and now with 2.6.29 / 2.6.29.1 / 2.6.29.3 and 2.6.30 it back
      dmesg output:
      oprofile: using NMI interrupt.
      Fix inflate_threshold_root. Now=15 size=11 bits
      ...
      Fix inflate_threshold_root. Now=15 size=11 bits
      
      cat /proc/net/fib_triestat
      Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes.
      Main:
              Aver depth:     2.28
              Max depth:      6
              Leaves:         276539
              Prefixes:       289922
              Internal nodes: 66762
                1: 35046  2: 13824  3: 9508  4: 4897  5: 2331  6: 1149  7: 5
      9: 1  18: 1
              Pointers: 691228
      Null ptrs: 347928
      Total size: 35709  kB
      </blockquote>
      
      It seems, the current threshold for root resizing is too aggressive,
      and it causes misleading warnings during big updates, but it might be
      also responsible for memory problems, especially with non-preempt
      configs, when RCU freeing is delayed long after call_rcu.
      
      It should be also mentioned that because of non-atomic changes during
      resizing/rebalancing the current lookup algorithm can miss valid leaves
      so it's additional argument to shorten these activities even at a cost
      of a minimally longer searching.
      
      This patch restores values before the patch "[IPV4]: fib_trie root
      node settings", commit: 965ffea4 from
      v2.6.22.
      
      Pawel's report:
      <blockquote>
      I dont see any big change of (cpu load or faster/slower
      routing/propagating routes from bgpd or something else) - in avg there
      is from 2% to 3% more of CPU load i dont know why but it is - i change
      from "preempt" to "no preempt" 3 times and check this my "mpstat -P ALL
      1 30"
      always avg cpu load was from 2 to 3% more compared to "no preempt"
      [...]
      cat /proc/net/fib_triestat
      Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
      Main:
              Aver depth:     2.44
              Max depth:      6
              Leaves:         277814
              Prefixes:       291306
              Internal nodes: 66420
                1: 32737  2: 14850  3: 10332  4: 4871  5: 2313  6: 942  7: 371  8: 3  17: 1
              Pointers: 599098
      Null ptrs: 254865
      Total size: 18067  kB
      </blockquote>
      
      According to this and other similar reports average depth is slightly
      increased (~0.2), and root nodes are shorter (log 17 vs. 18), but
      there is no visible performance decrease. So, until memory handling is
      improved or added parameters for changing this individually, this
      patch resets to safer defaults.
      Reported-by: NPawel Staszewski <pstaszewski@itcare.pl>
      Reported-by: NJorge Boncompte [DTI2] <jorge@dti2.net>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NPawel Staszewski <pstaszewski@itcare.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      345aa031
  3. 08 7月, 2009 4 次提交
    • L
      mac80211: minstrel: avoid accessing negative indices in rix_to_ndx() · 3938b45c
      Luciano Coelho 提交于
      If rix is not found in mi->r[], i will become -1 after the loop.  This value
      is eventually used to access arrays, so we were accessing arrays with a
      negative index, which is obviously not what we want to do.  This patch fixes
      this potential problem.
      Signed-off-by: NLuciano Coelho <luciano.coelho@nokia.com>
      Acked-by: NFelix Fietkau <nbd@openwrt.org>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      3938b45c
    • J
      cfg80211: fix refcount leak · 2dce4c2b
      Johannes Berg 提交于
      The code in cfg80211's cfg80211_bss_update erroneously
      grabs a reference to the BSS, which means that it will
      never be freed.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Cc: stable@kernel.org [2.6.29, 2.6.30]
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      2dce4c2b
    • A
      mac80211: fix allocation in mesh_queue_preq · 59615b5f
      Andrey Yurovsky 提交于
      We allocate a PREQ queue node in mesh_queue_preq, however the allocation
      may cause us to sleep.  Use GFP_ATOMIC to prevent this.
      
      [ 1869.126498] BUG: scheduling while atomic: ping/1859/0x10000100
      [ 1869.127164] Modules linked in: ath5k mac80211 ath
      [ 1869.128310] Pid: 1859, comm: ping Not tainted 2.6.30-wl #1
      [ 1869.128754] Call Trace:
      [ 1869.129293]  [<c1023a2b>] __schedule_bug+0x48/0x4d
      [ 1869.129866]  [<c13b5533>] __schedule+0x77/0x67a
      [ 1869.130544]  [<c1026f2e>] ? release_console_sem+0x17d/0x185
      [ 1869.131568]  [<c807cf47>] ? mesh_queue_preq+0x2b/0x165 [mac80211]
      [ 1869.132318]  [<c13b5b3e>] schedule+0x8/0x1f
      [ 1869.132807]  [<c1023c12>] __cond_resched+0x16/0x2f
      [ 1869.133478]  [<c13b5bf0>] _cond_resched+0x27/0x32
      [ 1869.134191]  [<c108a370>] kmem_cache_alloc+0x1c/0xcf
      [ 1869.134714]  [<c10273ae>] ? printk+0x15/0x17
      [ 1869.135670]  [<c807cf47>] mesh_queue_preq+0x2b/0x165 [mac80211]
      [ 1869.136731]  [<c807d1f8>] mesh_nexthop_lookup+0xee/0x12d [mac80211]
      [ 1869.138130]  [<c807417e>] ieee80211_xmit+0xe6/0x2b2 [mac80211]
      [ 1869.138935]  [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k]
      [ 1869.139831]  [<c80c97bc>] ? ath5k_tasklet_rx+0xba/0x506 [ath5k]
      [ 1869.140863]  [<c8075191>] ieee80211_subif_start_xmit+0x6c9/0x6e4
      [mac80211]
      [ 1869.141665]  [<c105cf1c>] ? handle_level_irq+0x78/0x9d
      [ 1869.142390]  [<c12e3f93>] dev_hard_start_xmit+0x168/0x1c7
      [ 1869.143092]  [<c12f1f17>] __qdisc_run+0xe1/0x1b7
      [ 1869.143612]  [<c12e25ff>] qdisc_run+0x18/0x1a
      [ 1869.144248]  [<c12e62f4>] dev_queue_xmit+0x16a/0x25a
      [ 1869.144785]  [<c13b6dcc>] ? _read_unlock_bh+0xe/0x10
      [ 1869.145465]  [<c12eacdb>] neigh_resolve_output+0x19c/0x1c7
      [ 1869.146182]  [<c130e2da>] ? ip_finish_output+0x0/0x51
      [ 1869.146697]  [<c130e2a0>] ip_finish_output2+0x182/0x1bc
      [ 1869.147358]  [<c130e327>] ip_finish_output+0x4d/0x51
      [ 1869.147863]  [<c130e9d5>] ip_output+0x80/0x85
      [ 1869.148515]  [<c130cc49>] dst_output+0x9/0xb
      [ 1869.149141]  [<c130dec6>] ip_local_out+0x17/0x1a
      [ 1869.149632]  [<c130e0bc>] ip_push_pending_frames+0x1f3/0x255
      [ 1869.150343]  [<c13247ff>] raw_sendmsg+0x5e6/0x667
      [ 1869.150883]  [<c1033c55>] ? insert_work+0x6a/0x73
      [ 1869.151834]  [<c8071e00>] ?
      ieee80211_invoke_rx_handlers+0x17da/0x1ae8 [mac80211]
      [ 1869.152630]  [<c132bd68>] inet_sendmsg+0x3b/0x48
      [ 1869.153232]  [<c12d7deb>] __sock_sendmsg+0x45/0x4e
      [ 1869.153740]  [<c12d8537>] sock_sendmsg+0xb8/0xce
      [ 1869.154519]  [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k]
      [ 1869.155289]  [<c1036b25>] ? autoremove_wake_function+0x0/0x30
      [ 1869.155859]  [<c115992b>] ? __copy_from_user_ll+0x11/0xce
      [ 1869.156573]  [<c1159d99>] ? copy_from_user+0x31/0x54
      [ 1869.157235]  [<c12df646>] ? verify_iovec+0x40/0x6e
      [ 1869.157778]  [<c12d869a>] sys_sendmsg+0x14d/0x1a5
      [ 1869.158714]  [<c8072c40>] ? __ieee80211_rx+0x49e/0x4ee [mac80211]
      [ 1869.159641]  [<c80c83fe>] ? ath5k_rxbuf_setup+0x6d/0x8d [ath5k]
      [ 1869.160543]  [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k]
      [ 1869.161434]  [<c80beba4>] ? ath5k_hw_get_rxdp+0xe/0x10 [ath5k]
      [ 1869.162319]  [<c80c97bc>] ? ath5k_tasklet_rx+0xba/0x506 [ath5k]
      [ 1869.163063]  [<c1005627>] ? enable_8259A_irq+0x40/0x43
      [ 1869.163594]  [<c101edb8>] ? __dequeue_entity+0x23/0x27
      [ 1869.164793]  [<c100187a>] ? __switch_to+0x2b/0x105
      [ 1869.165442]  [<c1021d5f>] ? finish_task_switch+0x5b/0x74
      [ 1869.166129]  [<c12d963a>] sys_socketcall+0x14b/0x17b
      [ 1869.166612]  [<c1002b95>] syscall_call+0x7/0xb
      Signed-off-by: NAndrey Yurovsky <andrey@cozybit.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      59615b5f
    • J
      Wireless: nl80211, fix lock imbalance · 1f5fc70a
      Jiri Slaby 提交于
      Don't forget to unlock cfg80211_mutex in one fail path of
      nl80211_set_wiphy.
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      1f5fc70a
  4. 07 7月, 2009 1 次提交
    • W
      sctp: fix warning at inet_sock_destruct() while release sctp socket · 1bc4ee40
      Wei Yongjun 提交于
      Commit 'net: Move rx skb_orphan call to where needed' broken sctp protocol
      with warning at inet_sock_destruct(). Actually, sctp can do this right with
      sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag() pair.
      
          sctp_sock_rfree_frag(skb);
          sctp_skb_set_owner_r_frag(skb, newsk);
      
      This patch not revert the commit d55d87fd,
      instead remove the sctp_sock_rfree_frag() function.
      
      ------------[ cut here ]------------
      WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142()
      Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath
      scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
      Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40
      Call Trace:
       [<c042dd06>] warn_slowpath_common+0x6a/0x81
       [<c064a39a>] ? inet_sock_destruct+0xe0/0x142
       [<c042dd2f>] warn_slowpath_null+0x12/0x15
       [<c064a39a>] inet_sock_destruct+0xe0/0x142
       [<c05fde44>] __sk_free+0x19/0xcc
       [<c05fdf50>] sk_free+0x18/0x1a
       [<ca0d14ad>] sctp_close+0x192/0x1a1 [sctp]
       [<c0649f7f>] inet_release+0x47/0x4d
       [<c05fba4d>] sock_release+0x19/0x5e
       [<c05fbab3>] sock_close+0x21/0x25
       [<c049c31b>] __fput+0xde/0x189
       [<c049c3de>] fput+0x18/0x1a
       [<c049988f>] filp_close+0x56/0x60
       [<c042f422>] put_files_struct+0x5d/0xa1
       [<c042f49f>] exit_files+0x39/0x3d
       [<c043086a>] do_exit+0x1a5/0x5dd
       [<c04a86c2>] ? d_kill+0x35/0x3b
       [<c0438fa4>] ? dequeue_signal+0xa6/0x115
       [<c0430d05>] do_group_exit+0x63/0x8a
       [<c0439504>] get_signal_to_deliver+0x2e1/0x2f9
       [<c0401d9e>] do_notify_resume+0x7c/0x6b5
       [<c043f601>] ? autoremove_wake_function+0x0/0x34
       [<c04a864e>] ? __d_free+0x3d/0x40
       [<c04a867b>] ? d_free+0x2a/0x3c
       [<c049ba7e>] ? vfs_write+0x103/0x117
       [<c05fc8fa>] ? sys_socketcall+0x178/0x182
       [<c0402a56>] work_notifysig+0x13/0x19
      ---[ end trace 9db92c463e789fba ]---
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bc4ee40
  5. 06 7月, 2009 1 次提交
  6. 04 7月, 2009 3 次提交
    • B
      IPv6: preferred lifetime of address not getting updated · a1ed0526
      Brian Haley 提交于
      There's a bug in addrconf_prefix_rcv() where it won't update the
      preferred lifetime of an IPv6 address if the current valid lifetime
      of the address is less than 2 hours (the minimum value in the RA).
      
      For example, If I send a router advertisement with a prefix that
      has valid lifetime = preferred lifetime = 2 hours we'll build
      this address:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7175sec preferred_lft 7175sec
      
      If I then send the same prefix with valid lifetime = preferred
      lifetime = 0 it will be ignored since the minimum valid lifetime
      is 2 hours:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7161sec preferred_lft 7161sec
      
      But according to RFC 4862 we should always reset the preferred lifetime
      even if the valid lifetime is invalid, which would cause the address
      to immediately get deprecated.  So with this patch we'd see this:
      
      5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
             valid_lft 7163sec preferred_lft 0sec
      
      The comment winds-up being 5x the size of the code to fix the problem.
      
      Update the preferred lifetime of IPv6 addresses derived from a prefix
      info option in a router advertisement even if the valid lifetime in
      the option is invalid, as specified in RFC 4862 Section 5.5.3e.  Fixes
      an issue where an address will not immediately become deprecated.
      Reported by Jens Rosenboom.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1ed0526
    • W
      xfrm6: fix the proto and ports decode of sctp protocol · 59cae009
      Wei Yongjun 提交于
      The SCTP pushed the skb above the sctp chunk header, so the
      check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
      _decode_session6() will never return 0 and the ports decode
      of sctp will always fail. (nh + offset + 1 - skb->data < 0)
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59cae009
    • W
      xfrm4: fix the ports decode of sctp protocol · c615c9f3
      Wei Yongjun 提交于
      The SCTP pushed the skb data above the sctp chunk header, so the check
      of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will
      never return 0 because xprth + 4 - skb->data < 0, the ports decode of
      sctp will always fail.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c615c9f3
  7. 03 7月, 2009 1 次提交
  8. 01 7月, 2009 2 次提交
  9. 30 6月, 2009 4 次提交
  10. 29 6月, 2009 4 次提交
  11. 27 6月, 2009 7 次提交
  12. 26 6月, 2009 2 次提交
  13. 25 6月, 2009 4 次提交
  14. 24 6月, 2009 2 次提交
    • N
      ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off · b6280b47
      Neil Horman 提交于
      When route caching is disabled (rt_caching returns false), We still use route
      cache entries that are created and passed into rt_intern_hash once.  These
      routes need to be made usable for the one call path that holds a reference to
      them, and they need to be reclaimed when they're finished with their use.  To be
      made usable, they need to be associated with a neighbor table entry (which they
      currently are not), otherwise iproute_finish2 just discards the packet, since we
      don't know which L2 peer to send the packet to.  To do this binding, we need to
      follow the path a bit higher up in rt_intern_hash, which calls
      arp_bind_neighbour, but not assign the route entry to the hash table.
      Currently, if caching is off, we simply assign the route to the rp pointer and
      are reutrn success.  This patch associates us with a neighbor entry first.
      
      Secondly, we need to make sure that any single use routes like this are known to
      the garbage collector when caching is off.  If caching is off, and we try to
      hash in a route, it will leak when its refcount reaches zero.  To avoid this,
      this patch calls rt_free on the route cache entry passed into rt_intern_hash.
      This places us on the gc list for the route cache garbage collector, so that
      when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
      suggestion).
      
      I've tested this on a local system here, and with these patches in place, I'm
      able to maintain routed connectivity to remote systems, even if I set
      /proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
      return false.
      Signed-off-by: NNeil Horman <nhorman@redhat.com>
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Reported-by: NMaxime Bizon <mbizon@freebox.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6280b47
    • H
      net: Move rx skb_orphan call to where needed · d55d87fd
      Herbert Xu 提交于
      In order to get the tun driver to account packets, we need to be
      able to receive packets with destructors set.  To be on the safe
      side, I added an skb_orphan call for all protocols by default since
      some of them (IP in particular) cannot handle receiving packets
      destructors properly.
      
      Now it seems that at least one protocol (CAN) expects to be able
      to pass skb->sk through the rx path without getting clobbered.
      
      So this patch attempts to fix this properly by moving the skb_orphan
      call to where it's actually needed.  In particular, I've added it
      to skb_set_owner_[rw] which is what most users of skb->destructor
      call.
      
      This is actually an improvement for tun too since it means that
      we only give back the amount charged to the socket when the skb
      is passed to another socket that will also be charged accordingly.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NOliver Hartkopp <olver@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d55d87fd
  15. 23 6月, 2009 1 次提交
  16. 22 6月, 2009 1 次提交