1. 09 Feb, 2011 1 commit
    • D
      net: Fix lockdep regression caused by initializing netdev queues too early. · 8d3bdbd5
      David S. Miller committed
      In commit aa942104 ("net: init ingress
      queue") we moved the allocation and lock initialization of the queues
      into alloc_netdev_mq() since register_netdevice() is way too late.
      
      The problem is that dev->type is not set up until the setup()
      callback is invoked by alloc_netdev_mq(), and the dev->type is
      what determines the lockdep class to use for the locks in the
      queues.
      
      Fix this by doing the queue allocation after the setup() callback
      runs.
      
      This is safe because the setup() callback is not allowed to make any
      state changes that need to be undone on error (memory allocations,
      etc.).  It may, however, make state changes that are undone by
      free_netdev() (such as netif_napi_add(), which is done by the
      ipoib driver's setup routine).
      
      The previous code also leaked a reference to the &init_net namespace
      object on RX/TX queue allocation failures.
      Signed-off-by: David S. Miller <davem@davemloft.net>
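The ordering described in this commit can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `toy_netdev`, `toy_queue`, `toy_lock_class_for()` and `toy_ether_setup()` are hypothetical stand-ins, and the "lock class" is just an integer derived from `type`. The point it shows is that the queues are allocated after `setup()` has run, so the class is computed from a `type` that is already valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_queue {
    int lock_class;           /* models the lockdep class of the queue lock */
};

struct toy_netdev {
    int type;                 /* set by the setup() callback */
    unsigned int num_tx;
    struct toy_queue *tx;
};

/* Assumption for illustration: the class is a pure function of dev->type. */
static int toy_lock_class_for(int type)
{
    return 1000 + type;
}

void toy_free_netdev(struct toy_netdev *dev)
{
    if (!dev)
        return;
    free(dev->tx);
    free(dev);
}

struct toy_netdev *toy_alloc_netdev(unsigned int num_tx,
                                    void (*setup)(struct toy_netdev *))
{
    struct toy_netdev *dev;
    unsigned int i;

    assert(num_tx > 0 && setup != NULL);

    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->num_tx = num_tx;

    /* Run setup() first so that dev->type is valid ... */
    setup(dev);

    /* ... and only then allocate the queues and pick their lock class. */
    dev->tx = calloc(num_tx, sizeof(*dev->tx));
    if (!dev->tx) {
        toy_free_netdev(dev); /* a single cleanup path undoes everything */
        return NULL;
    }
    for (i = 0; i < num_tx; i++)
        dev->tx[i].lock_class = toy_lock_class_for(dev->type);

    return dev;
}

/* Hypothetical setup callback: only assigns the device type. */
void toy_ether_setup(struct toy_netdev *dev)
{
    dev->type = 1;
}
```

Calling `toy_alloc_netdev(4, toy_ether_setup)` yields four queues whose class reflects type 1. If the queue allocation were moved above the `setup(dev)` call, every queue would be classified with type 0, which is the kind of misclassification the commit message describes.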
  2. 03 Feb, 2011 1 commit
  3. 01 Feb, 2011 1 commit
  4. 30 Jan, 2011 2 commits
    • E
      net: Fix ip link add netns oops · 13ad1774
      Eric W. Biederman committed
      Ed Swierk <eswierk@bigswitch.com> writes:
      > On 2.6.35.7
      >  ip link add link eth0 netns 9999 type macvlan
      > where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
      > [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d
      >  [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
      >  [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0
      >  [10663.821944] Oops: 0000 [#1] SMP
      >  [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
      >  [10663.821959] CPU 3
      >  [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
      >  [10663.822155]
      >  [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
      >  [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
      >  [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286
      >  [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000
      >  [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000
      >  [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041
      >  [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818
      >  [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000
      >  [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000
      >  [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      >  [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0
      >  [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      >  [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      >  [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0)
      >  [10663.822236] Stack:
      >  [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
      >  [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
      >  [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
      >  [10663.822281] Call Trace:
      >  [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0
      >  [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90
      >  [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0
      >  [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570
      >  [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570
      >  [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20
      >  [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290
      >  [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290
      >  [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0
      >  [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40
      >  [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0
      >  [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0
      >  [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120
      >  [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
      >  [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150
      >  [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
      >  [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80
      >  [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110
      >  [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70
      >  [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0
      >  [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0
      > [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560
      >  [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
      >  [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150
      >  [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350
      >  [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
      >  [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
      >  [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
      >  [10663.822627] RSP <ffff88014aebf7b8>
      >  [10663.822631] CR2: 000000000000006d
      >  [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]---
      
      This bug was introduced in:
      commit 81adee47
      Author: Eric W. Biederman <ebiederm@aristanetworks.com>
      Date:   Sun Nov 8 00:53:51 2009 -0800
      
          net: Support specifying the network namespace upon device creation.
      
          There is no good reason to not support userspace specifying the
          network namespace during device creation, and it makes it easier
          to create a network device and pass it to a child network namespace
          with a well known name.
      
          We have to be careful to ensure that the target network namespace
          for the new device exists through the life of the call.  To keep
          that logic clear I have factored out the network namespace grabbing
          logic into rtnl_link_get_net.
      
          In addition we need to continue to pass the source network namespace
          to the rtnl_link_ops.newlink method so that we can find the base
          device source network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      
      Where apparently I forgot to add error handling to the path where we create
      a new network device in a new network namespace, and pass in an invalid pid.
      
      Cc: stable@kernel.org
      Reported-by: Ed Swierk <eswierk@bigswitch.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
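The oops above is consistent with an error value encoded as a pointer being dereferenced: R13 holds fffffffffffffffd, the faulting instruction bytes `<4d> 8b 75 70` decode to a load from 0x70(%r13), and the sum wraps around to the CR2 value 000000000000006d. The sketch below reproduces that arithmetic and shows the kind of check the missing error handling amounts to. It is a userspace model under stated assumptions: the `toy_*` helpers only imitate the kernel's error-pointer convention, and `toy_get_net_by_pid()` is a hypothetical lookup in which only PID 1 exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int toy_is_err(const void *ptr)
{
    /* The top TOY_MAX_ERRNO addresses are reserved for error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Address touched by a load at `offset` from `base` (wraps modulo 2^64). */
uint64_t toy_fault_address(uint64_t base, uint64_t offset)
{
    return base + offset;
}

/* Hypothetical namespace lookup: only PID 1 exists in this model. */
void *toy_get_net_by_pid(int pid)
{
    static int the_only_net;

    if (pid == 1)
        return &the_only_net;
    return toy_err_ptr(-ESRCH);
}

/* Link creation with the error check in place. */
int toy_create_link(int pid)
{
    void *net = toy_get_net_by_pid(pid);

    if (toy_is_err(net))
        return (int)toy_ptr_err(net);  /* propagate instead of dereferencing */

    /* ... a real implementation would create the device in `net` here ... */
    return 0;
}
```

With the check in place, `toy_create_link(9999)` returns `-ESRCH` to the caller instead of passing the encoded error on to code that treats it as a valid namespace pointer.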
    • H
      gro: Reset dev pointer on reuse · 66c46d74
      Herbert Xu committed
      On older kernels the VLAN code may zero skb->dev before dropping
      it and causing it to be reused by GRO.
      
      Unfortunately we didn't reset skb->dev in that case which causes
      the next GRO user to get a bogus skb->dev pointer.
      
      This particular problem no longer happens with the current upstream
      kernel due to changes in VLAN processing.
      
      However, for correctness we should still reset the skb->dev pointer
      in the GRO reuse function in case a future user does the same thing.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 28 Jan, 2011 2 commits
  6. 25 Jan, 2011 3 commits
    • E
      net: clear heap allocation for ethtool_get_regs() · b7c7d01a
      Eugene Teo committed
      There is a conflict between commit b00916b1 and a77f5db3. This patch resolves
      the conflict by clearing the heap allocation in ethtool_get_regs().
      
      Cc: stable@kernel.org
      Signed-off-by: Eugene Teo <eugeneteo@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      GRO: fix merging a paged skb after non-paged skbs · d1dc7abf
      Michal Schmidt committed
      Suppose that several linear skbs of the same flow were received by GRO. They
      were thus merged into one skb with a frag_list. Then a new skb of the same flow
      arrives, but it is a paged skb with data starting in its frags[].
      
      Before adding the skb to the frag_list skb_gro_receive() will of course adjust
      the skb to throw away the headers. It correctly modifies the page_offset and
      size of the frag, but it leaves incorrect information in the skb:
       ->data_len is not decreased at all.
       ->len is decreased only by headlen, as if no change were done to the frag.
      Later in a receiving process this causes skb_copy_datagram_iovec() to return
      -EFAULT and this is seen in userspace as the result of the recv() syscall.
      
      In practice the bug can be reproduced with the sfc driver. By default the
      driver uses an adaptive scheme when it switches between using
      napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
      reproduced when under rx load with enough successful GRO merging the driver
      decides to switch from the former to the latter.
      
      Manual control is also possible, so reproducing this is easy with netcat:
       - on machine1 (with sfc): nc -l 12345 > /dev/null
       - on machine2: nc machine1 12345 < /dev/zero
       - on machine1:
         echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
         echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
       - See that nc has quit suddenly.
      
      [v2: Modified by Eric Dumazet to avoid advancing skb->data past the end
           and to use a temporary variable.]
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
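The length bookkeeping described in this commit can be sketched with a toy model. This is not the kernel implementation: `toy_skb` and `toy_frag` are hypothetical reductions of the real structures to the fields the message mentions, and the linear-area pull is modeled only as a change to `len`. The sketch shows the invariant at stake: when headers extend past the linear area into the first frag, the bytes eaten from the frag must be subtracted from both `data_len` and `len`, in addition to the linear part being pulled.

```c
#include <assert.h>

/* Hypothetical reductions of the real structures. */
struct toy_frag {
    unsigned int page_offset;
    unsigned int size;
};

struct toy_skb {
    unsigned int len;        /* total bytes: linear + paged */
    unsigned int data_len;   /* paged bytes only */
    struct toy_frag frag0;   /* first page fragment */
};

static unsigned int toy_headlen(const struct toy_skb *skb)
{
    return skb->len - skb->data_len;   /* bytes in the linear area */
}

/* Discard the first `offset` bytes (the headers) of the skb. */
void toy_gro_pull_headers(struct toy_skb *skb, unsigned int offset)
{
    unsigned int headlen = toy_headlen(skb);

    assert(offset <= skb->len);

    if (offset > headlen) {
        /* Temporary for the part of the headers that lives in frag0. */
        unsigned int eat = offset - headlen;

        assert(eat <= skb->frag0.size);
        skb->frag0.page_offset += eat;
        skb->frag0.size -= eat;
        skb->data_len -= eat;   /* the step the bug description says was missing */
        skb->len -= eat;
        offset = headlen;       /* never pull past the end of the linear area */
    }

    /* Pull what remains from the linear area. */
    skb->len -= offset;
}
```

For a packet with 20 linear bytes, a 1480-byte frag and 52 bytes of headers, 32 bytes are eaten from the frag and 20 are pulled from the linear area, leaving `len`, `data_len` and the frag size all equal to 1448.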
    • E
      net: arp_ioctl() must hold RTNL · c506653d
      Eric Dumazet committed
      Commit 941666c2 "net: RCU conversion of dev_getbyhwaddr() and
      arp_ioctl()" introduced a regression, reported by Jamie Heilman.
      "arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert
      in pneigh_lookup()
      
      Removing RTNL requirement from arp_ioctl() was a mistake, just revert
      that part.
      Reported-by: Jamie Heilman <jamie@audible.transient.net>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 20 Jan, 2011 2 commits
  8. 19 Jan, 2011 1 commit
  9. 14 Jan, 2011 1 commit
    • E
      net: remove dev_txq_stats_fold() · 1ac9ad13
      Eric Dumazet committed
      After recent changes (percpu stats on vlan/tunnels...), we no longer
      need per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
      
      Only remaining users are ixgbe, sch_teql, gianfar & macvlan :
      
      1) ixgbe can be converted to use existing tx_ring counters.
      
      2) macvlan incremented txq->tx_dropped, it can use the
      dev->stats.tx_dropped counter.
      
      3) sch_teql : almost revert ab35cd4b (Use net_device internal stats)
          Now we have ndo_get_stats64(), use it, even for "unsigned long"
      fields (No need to bring back a struct net_device_stats)
      
      4) gianfar adds a stats structure per tx queue to hold
      tx_bytes/tx_packets
      
      This removes a lockdep warning (and possible lockup) in rndis gadget,
      calling dev_get_stats() from hard IRQ context.
      
      Ref: http://www.spinics.net/lists/netdev/msg149202.html
      Reported-by: Neil Jones <neiljay@gmail.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
      CC: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 13 Jan, 2011 1 commit
  11. 11 Jan, 2011 2 commits
  12. 10 Jan, 2011 8 commits
  13. 07 Jan, 2011 1 commit
  14. 24 Dec, 2010 1 commit
    • D
      Revert "ipv4: Allow configuring subnets as local addresses" · e0584649
      David S. Miller committed
      This reverts commit 4465b469.
      
      Conflicts:
      
      	net/ipv4/fib_frontend.c
      
      As reported by Ben Greear, this causes regressions:
      
      > Change 4465b469 caused rules
      > to stop matching the input device properly because the
      > FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find().
      >
      > This breaks rules such as:
      >
      > ip rule add pref 512 lookup local
      > ip rule del pref 0 lookup local
      > ip link set eth2 up
      > ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2
      > ip rule add to 172.16.0.102 iif eth2 lookup local pref 10
      > ip rule add iif eth2 lookup 10001 pref 20
      > ip route add 172.16.0.0/24 dev eth2 table 10001
      > ip route add unreachable 0/0 table 10001
      >
      > If you had a second interface 'eth0' that was on a different
      > subnet, pinging a system on that interface would fail:
      >
      >   [root@ct503-60 ~]# ping 192.168.100.1
      >   connect: Invalid argument
      Reported-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 22 Dec, 2010 2 commits
    • E
      filter: optimize accesses to ancillary data · 12b16dad
      Eric Dumazet committed
      We can translate pseudo load instructions at filter check time to
      dedicated instructions to speed up filtering and avoid one switch().
      libpcap currently uses SKF_AD_PROTOCOL, but custom filters probably use
      other ancillary accesses.
      
      Note : I made the assertion that ancillary data was always accessed with
      BPF_LD|BPF_?|BPF_ABS instructions, not with BPF_LD|BPF_?|BPF_IND ones
      (offset given by K constant, not by K + X register)
      
      On x86_64, this saves a few bytes of text :
      
      # size net/core/filter.o.*
         text	   data	    bss	    dec	    hex	filename
         4864	      0	      0	   4864	   1300	net/core/filter.o.new
         4944	      0	      0	   4944	   1350	net/core/filter.o.old
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
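The idea of translating pseudo loads at check time can be sketched with a toy interpreter. Everything here is hypothetical: the opcodes, the `TOY_AD_*` constants and the packet structure are invented for illustration and do not reproduce the kernel's instruction set or values. The sketch shows the technique the message describes: an absolute load whose constant falls in a reserved "ancillary" range is rewritten once, when the filter is checked, into a dedicated opcode, so the run loop dispatches on it directly instead of inspecting the constant on every packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes and constants, chosen for illustration only. */
enum toy_op { TOY_LD_ABS, TOY_ANC_PROTOCOL, TOY_ANC_IFINDEX, TOY_RET };

#define TOY_AD_OFF      0xfffff000u  /* start of the reserved ancillary range */
#define TOY_AD_PROTOCOL 0u
#define TOY_AD_IFINDEX  8u

struct toy_insn {
    enum toy_op op;
    uint32_t k;              /* constant operand */
};

struct toy_pkt {
    const uint8_t *data;
    size_t len;
    uint32_t protocol;       /* ancillary data kept outside the payload */
    uint32_t ifindex;
};

/* Check-time pass: rewrite ancillary pseudo loads into dedicated opcodes. */
int toy_check_filter(struct toy_insn *prog, size_t len)
{
    size_t i;

    if (!prog || len == 0)
        return -1;

    for (i = 0; i < len; i++) {
        if (prog[i].op != TOY_LD_ABS)
            continue;
        switch (prog[i].k) {
        case TOY_AD_OFF + TOY_AD_PROTOCOL:
            prog[i].op = TOY_ANC_PROTOCOL;
            break;
        case TOY_AD_OFF + TOY_AD_IFINDEX:
            prog[i].op = TOY_ANC_IFINDEX;
            break;
        default:
            break;           /* ordinary payload load, left unchanged */
        }
    }
    return 0;
}

/* Run loop: one dispatch per instruction, no second look at k. */
uint32_t toy_run_filter(const struct toy_insn *prog, size_t len,
                        const struct toy_pkt *pkt)
{
    uint32_t a = 0;          /* accumulator */
    size_t i;

    assert(prog != NULL && pkt != NULL);

    for (i = 0; i < len; i++) {
        switch (prog[i].op) {
        case TOY_ANC_PROTOCOL:
            a = pkt->protocol;
            break;
        case TOY_ANC_IFINDEX:
            a = pkt->ifindex;
            break;
        case TOY_LD_ABS:     /* load one payload byte, 0 if out of range */
            a = prog[i].k < pkt->len ? pkt->data[prog[i].k] : 0;
            break;
        case TOY_RET:
            return a;
        }
    }
    return 0;
}
```

In this model only absolute loads are translated, which mirrors the assumption stated in the commit message that ancillary data is accessed with absolute rather than indexed loads.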
    • E
      net: timestamp cloned packet in dev_queue_xmit_nit · 70978182
      Eric Dumazet committed
      On Friday 17 December 2010 at 10:26 +0100, Eric Dumazet wrote:
      
      >
      > I think we can add this after latest Changli patch :
      >
      > He does one skb_clone() before calling the sniffers.
      > We could set timestamp on this clone, instead of original skb.
      >
      > Problem solved.
      >
      
      [PATCH net-next-2.6] net: timestamp cloned packet in dev_queue_xmit_nit
      
      Now that we do one clone of the skb if at least one sniffer might take
      the packet, we can also do the skb timestamping on the clone and leave
      the original packet unchanged.
      
      This is a generalization of commit 8caf1539 (net: sch_netem: Fix an
      inconsistency in ingress netem timestamps.)
      
      This way, we can have a good idea when packets are delivered to our
      stack (tcpdump -i ifb0), while a tcpdump on original device gives
      timestamps right before ingressing.
      
      This also speeds up our stack by avoiding timestamps when they are not needed.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Changli Gao <xiaosuo@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jarek Poplawski <jarkao2@gmail.com>
      Acked-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 21 Dec, 2010 1 commit
  17. 20 Dec, 2010 2 commits
  18. 17 Dec, 2010 5 commits
  19. 15 Dec, 2010 1 commit
    • T
      workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync() · afe2c511
      Tejun Heo committed
      cancel_rearming_delayed_work[queue]() was superseded by
      cancel_delayed_work_sync() quite some time ago.  Convert all the
      in-kernel users.  The conversions are completely equivalent and
      trivial.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: netdev@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: xfs-masters@oss.sgi.com
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: netfilter-devel@vger.kernel.org
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: linux-nfs@vger.kernel.org
  20. 11 Dec, 2010 2 commits