1. 27 1月, 2012 3 次提交
    • W
      ipv6: Fix ip_gre lockless xmits. · f2b3ee9e
      Willem de Bruijn 提交于
      Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK.  Sit and
      ipip set this unconditionally in ops->setup, but gre enables it
      conditionally after parameter passing in ops->newlink. This is
      not called during tunnel setup as below, however, so GRE tunnels are
      still taking the lock.
      
      modprobe ip_gre
      ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
      ip link set test0 up
      ip addr add 10.6.0.1 dev test0
       # cat /sys/class/net/test0/features
       # $DIR/test_tunnel_xmit 10 10.5.2.1
      ip route add 10.5.2.0/24 dev test0
      ip tunnel del test0
      
      The newlink callback is only called in rtnl_netlink, and only if
      the device is new, as it calls register_netdevice internally. Gre
      tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
      which calls ipgre_tunnel_locate, which calls register_netdev.
      rtnl_newlink is called at 'ip link set', but skips ops->newlink
      and the device is up with locking still enabled. The equivalent
      ipip tunnel works fine, btw (just substitute 'method gre' for
      'method ipip').
      
      On kernels before /sys/class/net/*/features was removed [1],
      the first commented out line returns 0x6000 with method gre,
      which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
      it reports 0x7000. This test cannot be used on recent kernels where
      the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
      work for tunnel devices, because they lack dev->ethtool_ops).
      
      The second commented out line calls a simple transmission test [2]
      that sends on 24 cores at maximum rate. Results of a single run:
      
      ipip:			19,372,306
      gre before patch:	 4,839,753
      gre after patch:	19,133,873
      
      This patch replicates the condition check in ipgre_newlink to
      ipgre_tunnel_locate. It works for me, both with oseq on and off.
      This is the first time I looked at rtnetlink and iproute2 code,
      though, so someone more knowledgeable should probably check the
      patch. Thanks.
      
      The tail of both functions is now identical, by the way. To avoid
      code duplication, I'll be happy to rework this and merge the two.
      
      [1] http://patchwork.ozlabs.org/patch/104610/
      [2] http://kernel.googlecode.com/files/xmit_udp_parallel.cSigned-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2b3ee9e
    • W
      xen-netfront: correct MAX_TX_TARGET calculation. · 40206dd9
      Wei Liu 提交于
      Signed-off-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40206dd9
    • E
      netns: fix net_alloc_generic() · 073862ba
      Eric Dumazet 提交于
      When a new net namespace is created, we should attach to it a "struct
      net_generic" with enough slots (even empty), or we can hit the following
      BUG_ON() :
      
      [  200.752016] kernel BUG at include/net/netns/generic.h:40!
      ...
      [  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
      [  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
      [  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
      [  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
      [  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
      [  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
      [  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
      [  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
      [  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
      [  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
      [  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
      [  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
      [  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190
      
      net_alloc_generic() should take into account the maximum index into the
      ptr array, as a subsystem might use net_generic() anytime.
      
      This also reduces number of reallocations in net_assign_generic()
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Tested-by: NSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      073862ba
  2. 26 1月, 2012 3 次提交
    • F
      tcp: bind() optimize port allocation · fddb7b57
      Flavio Leitner 提交于
      Port autoselection finds a port and then drop the lock,
      then right after that, gets the hash bucket again and lock it.
      
      Fix it to go direct.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fddb7b57
    • F
      tcp: bind() fix autoselection to share ports · 2b05ad33
      Flavio Leitner 提交于
      The current code checks for conflicts when the application
      requests a specific port.  If there is no conflict, then
      the request is granted.
      
      On the other hand, the port autoselection done by the kernel
      fails when all ports are bound even when there is a port
      with no conflict available.
      
      The fix changes port autoselection to check if there is a
      conflict and use it if not.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b05ad33
    • J
      l2tp: l2tp_ip - fix possible oops on packet receive · 68315801
      James Chapman 提交于
      When a packet is received on an L2TP IP socket (L2TPv3 IP link
      encapsulation), the l2tpip socket's backlog_rcv function calls
      xfrm4_policy_check(). This is not necessary, since it was called
      before the skb was added to the backlog. With CONFIG_NET_NS enabled,
      xfrm4_policy_check() will oops if skb->dev is null, so this trivial
      patch removes the call.
      
      This bug has always been present, but only when CONFIG_NET_NS is
      enabled does it cause problems. Most users are probably using UDP
      encapsulation for L2TP, hence the problem has only recently
      surfaced.
      
      EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0
      EIP is at l2tp_ip_recvmsg+0xd4/0x2a7
      EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246
      ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Call Trace:
       [<c1218568>] sock_common_recvmsg+0x31/0x46
       [<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d
       [<c12163a1>] __sock_recvmsg+0x31/0x3b
       [<c1216828>] sock_recvmsg+0x96/0xab
       [<c10b2693>] ? might_fault+0x47/0x81
       [<c10b2693>] ? might_fault+0x47/0x81
       [<c1167fd0>] ? _copy_from_user+0x31/0x115
       [<c121e8c8>] ? copy_from_user+0x8/0xa
       [<c121ebd6>] ? verify_iovec+0x3e/0x78
       [<c1216604>] __sys_recvmsg+0x10a/0x1aa
       [<c1216792>] ? sock_recvmsg+0x0/0xab
       [<c105a99b>] ? __lock_acquire+0xbdf/0xbee
       [<c12d5a99>] ? do_page_fault+0x193/0x375
       [<c10d1200>] ? fcheck_files+0x9b/0xca
       [<c10d1259>] ? fget_light+0x2a/0x9c
       [<c1216bbb>] sys_recvmsg+0x2b/0x43
       [<c1218145>] sys_socketcall+0x16d/0x1a5
       [<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10
       [<c100305f>] sysenter_do_call+0x12/0x38
      Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c
      Signed-off-by: NJames Chapman <jchapman@katalix.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68315801
  3. 25 1月, 2012 14 次提交
  4. 24 1月, 2012 20 次提交