1. 06 5月, 2010 2 次提交
    • W
      netpoll: add generic support for bridge and bonding devices · 0e34e931
      WANG Cong 提交于
      This whole patchset is for adding netpoll support to bridge and bonding
      devices. I already tested it for bridge, bonding, bridge over bonding,
      and bonding over bridge. It looks fine now.
      
      To make bridge and bonding support netpoll, we need to adjust
      some netpoll generic code. This patch does the following things:
      
      1) introduce two new priv_flags for struct net_device:
         IFF_IN_NETPOLL which identifies we are processing a netpoll;
         IFF_DISABLE_NETPOLL is used to disable netpoll support for a device
         at run-time;
      
      2) introduce one new method for netdev_ops:
         ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
           removed.
      
      3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
         export netpoll_send_skb() and netpoll_poll_dev() which will be used later;
      
      4) hide a pointer to struct netpoll in struct netpoll_info, ditto.
      
      5) introduce ->real_dev for struct netpoll.
      
      6) introduce a new status NETDEV_BONDING_DESLAE, which is used to disable
         netconsole before releasing a slave, to avoid deadlocks.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e34e931
    • J
      mac80211: use fixed channel in ibss join when appropriate · adfba3c7
      Johannes Berg 提交于
      "mac80211: improve IBSS scanning" was missing a hunk.
      This adds that hunk as originally intended.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      adfba3c7
  2. 05 5月, 2010 1 次提交
    • E
      net: __alloc_skb() speedup · ec7d2f2c
      Eric Dumazet 提交于
      With following patch I can reach maximum rate of my pktgen+udpsink
      simulator :
      - 'old' machine : dual quad core E5450  @3.00GHz
      - 64 UDP rx flows (only differ by destination port)
      - RPS enabled, NIC interrupts serviced on cpu0
      - rps dispatched on 7 other cores. (~130.000 IPI per second)
      - SLAB allocator (faster than SLUB in this workload)
      - tg3 NIC
      - 1.080.000 pps without a single drop at NIC level.
      
      Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
      first sk_buff cache line, the second to prefetch the shinfo part.
      
      Also using one memset() to initialize all skb_shared_info fields instead
      of one by one to reduce number of instructions, using long word moves.
      
      All skb_shared_info fields before 'dataref' are cleared in 
      __alloc_skb().
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec7d2f2c
  3. 04 5月, 2010 6 次提交
  4. 03 5月, 2010 1 次提交
    • C
      net: fix softnet_stat · dee42870
      Changli Gao 提交于
      Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
      without any protection. And enqueue_to_backlog should update the netdev_rx_stat
      of the target CPU.
      
      This patch renames softnet_data.total to softnet_data.processed: the number of
      packets processed in uppper levels(IP stacks).
      
      softnet_stat data is moved into softnet_data.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      ----
       include/linux/netdevice.h |   17 +++++++----------
       net/core/dev.c            |   26 ++++++++++++--------------
       net/sched/sch_generic.c   |    2 +-
       3 files changed, 20 insertions(+), 25 deletions(-)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dee42870
  5. 02 5月, 2010 2 次提交
    • D
      net: Inline skb_pull() in eth_type_trans(). · 47d29646
      David S. Miller 提交于
      In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
      we uninlined skb_pull.
      
      But in some critical paths it makes sense to inline this thing
      and it helps performance significantly.
      
      Create an skb_pull_inline() so that we can do this in a way that
      serves also as annotation.
      
      Based upon a patch by Eric Dumazet.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47d29646
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482
  6. 01 5月, 2010 23 次提交
  7. 29 4月, 2010 5 次提交
    • E
      net: ip_queue_rcv_skb() helper · f84af32c
      Eric Dumazet 提交于
      When queueing a skb to socket, we can immediately release its dst if
      target socket do not use IP_CMSG_PKTINFO.
      
      tcp_data_queue() can drop dst too.
      
      This to benefit from a hot cache line and avoid the receiver, possibly
      on another cpu, to dirty this cache line himself.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f84af32c
    • E
      net: speedup udp receive path · 4b0b72f7
      Eric Dumazet 提交于
      Since commit 95766fff ([UDP]: Add memory accounting.), 
      each received packet needs one extra sock_lock()/sock_release() pair.
      
      This added latency because of possible backlog handling. Then later,
      ticket spinlocks added yet another latency source in case of DDOS.
      
      This patch introduces lock_sock_bh() and unlock_sock_bh()
      synchronization primitives, avoiding one atomic operation and backlog
      processing.
      
      skb_free_datagram_locked() uses them instead of full blown
      lock_sock()/release_sock(). skb is orphaned inside locked section for
      proper socket memory reclaim, and finally freed outside of it.
      
      UDP receive path now take the socket spinlock only once.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b0b72f7
    • N
      sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) · 5fa782c2
      Neil Horman 提交于
      Ok, version 4
      
      Change Notes:
      1) Minor cleanups, from Vlads notes
      
      Summary:
      
      Hey-
      	Recently, it was reported to me that the kernel could oops in the
      following way:
      
      <5> kernel BUG at net/core/skbuff.c:91!
      <5> invalid operand: 0000 [#1]
      <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
      ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
      vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
      ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
      snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
      pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
      mptbase sd_mod scsi_mod
      <5> CPU:    0
      <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
      <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL)
      <5> EIP is at skb_over_panic+0x1f/0x2d
      <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
      <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
      <5> ds: 007b   es: 007b   ss: 0068
      <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
      <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
      e0c2947d
      <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
      df653490
      <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
      00000004
      <5> Call Trace:
      <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
      <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
      <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
      <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
      <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
      <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
      <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
      <5>  [<c01555a4>] cache_grow+0x140/0x233
      <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
      <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
      <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
      <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
      <5>  [<c02d005e>] nf_iterate+0x40/0x81
      <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
      <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
      <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
      <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
      <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
      <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
      <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
      <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
      <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
      <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
      <5>  [<c012a7b1>] __do_softirq+0x35/0x79
      <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
      <5>  [<c01094de>] do_softirq+0x46/0x4d
      
      Its an skb_over_panic BUG halt that results from processing an init chunk in
      which too many of its variable length parameters are in some way malformed.
      
      The problem is in sctp_process_unk_param:
      if (NULL == *errp)
      	*errp = sctp_make_op_error_space(asoc, chunk,
      					 ntohs(chunk->chunk_hdr->length));
      
      	if (*errp) {
      		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
      				 WORD_ROUND(ntohs(param.p->length)));
      		sctp_addto_chunk(*errp,
      			WORD_ROUND(ntohs(param.p->length)),
      				  param.v);
      
      When we allocate an error chunk, we assume that the worst case scenario requires
      that we have chunk_hdr->length data allocated, which would be correct nominally,
      given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
      we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
      chunk, so the worst case situation in which all parameters are in violation
      requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
      
      The result of this error is that a deliberately malformed packet sent to a
      listening host can cause a remote DOS, described in CVE-2010-1173:
      http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
      
      I've tested the below fix and confirmed that it fixes the issue.  We move to a
      strategy whereby we allocate a fixed size error chunk and ignore errors we don't
      have space to report.  Tested by me successfully
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fa782c2
    • J
      mac80211: notify driver about IBSS status · 8fc214ba
      Johannes Berg 提交于
      Some drivers (e.g. iwlwifi) need to know and try
      to figure it out based on other things, but making
      it explicit is definitely better.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      8fc214ba
    • S
      mac80211: fix supported rates IE if AP doesn't give us it's rates · 76f27364
      Stanislaw Gruszka 提交于
      If AP do not provide us supported rates before assiociation, send
      all rates we are supporting instead of empty information element.
      
      v1 -> v2: Add comment.
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      76f27364