1. 18 10月, 2011 1 次提交
  2. 17 10月, 2011 1 次提交
    • G
      if_link: Add additional parameter to IFLA_VF_INFO for spoof checking · 5f8444a3
      Greg Rose 提交于
      Add configuration setting for drivers to turn spoof checking on or off
      for discrete VFs.
      
      v2 - Fix indentation problem, wrap the ifla_vf_info structure in
           #ifdef __KERNEL__ to prevent user space from accessing and
           change function paramater for the spoof check setting netdev
           op from u8 to bool.
      v3 - Preset spoof check setting to -1 so that user space tools such
           as ip can detect that the driver didn't report a spoofcheck
           setting.  Prevents incorrect display of spoof check settings
           for drivers that don't report it.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      5f8444a3
  3. 14 10月, 2011 1 次提交
    • E
      net: more accurate skb truesize · 87fb4b7b
      Eric Dumazet 提交于
      skb truesize currently accounts for sk_buff struct and part of skb head.
      kmalloc() roundings are also ignored.
      
      Considering that skb_shared_info is larger than sk_buff, its time to
      take it into account for better memory accounting.
      
      This patch introduces SKB_TRUESIZE(X) macro to centralize various
      assumptions into a single place.
      
      At skb alloc phase, we put skb_shared_info struct at the exact end of
      skb head, to allow a better use of memory (lowering number of
      reallocations), since kmalloc() gives us power-of-two memory blocks.
      
      Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
      aligned to cache lines, as before.
      
      Note: This patch might trigger performance regressions because of
      misconfigured protocol stacks, hitting per socket or global memory
      limits that were previously not reached. But its a necessary step for a
      more accurate memory accounting.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <ak@linux.intel.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87fb4b7b
  4. 08 10月, 2011 1 次提交
  5. 04 10月, 2011 1 次提交
  6. 29 9月, 2011 2 次提交
    • C
      net: rps: fix the support for PPPOE · 5dd17e08
      Changli Gao 提交于
      The upper protocol numbers of PPPOE are different, and should be treated
      specially.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dd17e08
    • E
      af_unix: dont send SCM_CREDENTIALS by default · 16e57262
      Eric Dumazet 提交于
      Since commit 7361c36c (af_unix: Allow credentials to work across
      user and pid namespaces) af_unix performance dropped a lot.
      
      This is because we now take a reference on pid and cred in each write(),
      and release them in read(), usually done from another process,
      eventually from another cpu. This triggers false sharing.
      
      # Events: 154K cycles
      #
      # Overhead  Command       Shared Object        Symbol
      # ........  .......  ..................  .........................
      #
          10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
           8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
           7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
           6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
           4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
           4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
           4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
           2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
           2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
           1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
           1.51%  hackbench  [kernel.kallsyms]   [k]
      __mutex_lock_interruptible_slowpath
           1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb
      
      This patch includes SCM_CREDENTIALS information in a af_unix message/skb
      only if requested by the sender, [man 7 unix for details how to include
      ancillary data using sendmsg() system call]
      
      Note: This might break buggy applications that expected SCM_CREDENTIAL
      from an unaware write() system call, and receiver not using SO_PASSCRED
      socket option.
      
      If SOCK_PASSCRED is set on source or destination socket, we still
      include credentials for mere write() syscalls.
      
      Performance boost in hackbench : more than 50% gain on a 16 thread
      machine (2 quad-core cpus, 2 threads per core)
      
      hackbench 20 thread 2000
      
      4.228 sec instead of 9.102 sec
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16e57262
  7. 22 9月, 2011 1 次提交
  8. 17 9月, 2011 1 次提交
  9. 16 9月, 2011 4 次提交
    • J
      net: consolidate and fix ethtool_ops->get_settings calling · 4bc71cb9
      Jiri Pirko 提交于
      This patch does several things:
      - introduces __ethtool_get_settings which is called from ethtool code and
        from drivers as well. Put ASSERT_RTNL there.
      - dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
      - changes calling in drivers so rtnl locking is respected. In
        iboe_get_rate was previously ->get_settings() called unlocked. This
        fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
        problem. Also fixed by calling __dev_get_by_index() instead of
        dev_get_by_index() and holding rtnl_lock for both calls.
      - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
        so bnx2fc_if_create() and fcoe_if_create() are called locked as they
        are from other places.
      - use __ethtool_get_settings() in bonding code
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      
      v2->v3:
      	-removed dev_ethtool_get_settings()
      	-added ASSERT_RTNL into __ethtool_get_settings()
      	-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
      	 around it and __ethtool_get_settings() call
      v1->v2:
              add missing export_symbol
      Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bc71cb9
    • E
      net: linkwatch: allow vlans to get carrier changes faster · c37e0c99
      Eric Dumazet 提交于
      There is a time-lag of IFF_RUNNING flag consistency between vlan and
      real devices when the real devices are in problem such as link or cable
      broken.
      
      This leads to a degradation of Availability such as a delay of failover
      in HA systems using vlan since the detection of the problem at real
      device is delayed.
      
      We can avoid the linkwatch delay (~1 sec) for devices linked to another
      ones, since delay is already done for the realdev.
      
      Based on a previous patch from Mitsuo Hayasaka
      Reported-by: NMitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Tested-by: NMitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c37e0c99
    • M
      net: copy userspace buffers on device forwarding · 48c83012
      Michael S. Tsirkin 提交于
      dev_forward_skb loops an skb back into host networking
      stack which might hang on the memory indefinitely.
      In particular, this can happen in macvtap in bridged mode.
      Copy the userspace fragments to avoid blocking the
      sender in that case.
      
      As this patch makes skb_copy_ubufs extern now,
      I also added some documentation and made it clear
      the SKBTX_DEV_ZEROCOPY flag automatically instead
      of doing it in all callers. This can be made into a separate
      patch if people feel it's worth it.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48c83012
    • D
      net: Make flow cache namespace-aware · 0542b69e
      dpward 提交于
      flow_cache_lookup will return a cached object (or null pointer) that the
      resolver (i.e. xfrm_policy_lookup) previously found for another namespace
      using the same key/family/dir.  Instead, make the namespace part of what
      identifies entries in the cache.
      
      As before, flow_entry_valid will return 0 for entries where the namespace
      has been deleted, and they will be removed from the cache the next time
      flow_cache_gc_task is run.
      Reported-by: NAndrew Dickinson <whydna@whydna.net>
      Signed-off-by: NDavid Ward <david.ward@ll.mit.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0542b69e
  10. 27 8月, 2011 1 次提交
  11. 25 8月, 2011 3 次提交
    • E
      arp: fix rcu lockdep splat in arp_process() · 20e6074e
      Eric Dumazet 提交于
      Dave Jones reported a lockdep splat triggered by an arp_process() call
      from parp_redo().
      
      Commit faa9dcf7 (arp: RCU changes) is the origin of the bug, since
      it assumed arp_process() was called under rcu_read_lock(), which is not
      true in this particular path.
      
      Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in
      neigh_proxy_process() to take care of IPv6 side too.
      
       ===================================================
       [ INFO: suspicious rcu_dereference_check() usage. ]
       ---------------------------------------------------
       include/linux/inetdevice.h:209 invoked rcu_dereference_check() without
      protection!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       4 locks held by setfiles/2123:
        #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff8114cbc4>]
      walk_component+0x1ef/0x3e8
        #1:  (&isec->lock){+.+.+.}, at: [<ffffffff81204bca>]
      inode_doinit_with_dentry+0x3f/0x41f
        #2:  (&tbl->proxy_timer){+.-...}, at: [<ffffffff8106a803>]
      run_timer_softirq+0x157/0x372
        #3:  (class){+.-...}, at: [<ffffffff8141f256>] neigh_proxy_process
      +0x36/0x103
      
       stack backtrace:
       Pid: 2123, comm: setfiles Tainted: G        W
      3.1.0-0.rc2.git7.2.fc16.x86_64 #1
       Call Trace:
        <IRQ>  [<ffffffff8108ca23>] lockdep_rcu_dereference+0xa7/0xaf
        [<ffffffff8146a0b7>] __in_dev_get_rcu+0x55/0x5d
        [<ffffffff8146a751>] arp_process+0x25/0x4d7
        [<ffffffff8146ac11>] parp_redo+0xe/0x10
        [<ffffffff8141f2ba>] neigh_proxy_process+0x9a/0x103
        [<ffffffff8106a8c4>] run_timer_softirq+0x218/0x372
        [<ffffffff8106a803>] ? run_timer_softirq+0x157/0x372
        [<ffffffff8141f220>] ? neigh_stat_seq_open+0x41/0x41
        [<ffffffff8108f2f0>] ? mark_held_locks+0x6d/0x95
        [<ffffffff81062bb6>] __do_softirq+0x112/0x25a
        [<ffffffff8150d27c>] call_softirq+0x1c/0x30
        [<ffffffff81010bf5>] do_softirq+0x4b/0xa2
        [<ffffffff81062f65>] irq_exit+0x5d/0xcf
        [<ffffffff8150dc11>] smp_apic_timer_interrupt+0x7c/0x8a
        [<ffffffff8150baf3>] apic_timer_interrupt+0x73/0x80
        <EOI>  [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
        [<ffffffff814fc285>] ? __slab_free+0x30/0x24c
        [<ffffffff814fc283>] ? __slab_free+0x2e/0x24c
        [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
        [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
        [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
        [<ffffffff81130cb0>] kfree+0x108/0x131
        [<ffffffff81204e74>] inode_doinit_with_dentry+0x2e9/0x41f
        [<ffffffff81204fc6>] selinux_d_instantiate+0x1c/0x1e
        [<ffffffff81200f4f>] security_d_instantiate+0x21/0x23
        [<ffffffff81154625>] d_instantiate+0x5c/0x61
        [<ffffffff811563ca>] d_splice_alias+0xbc/0xd2
        [<ffffffff811b17ff>] ext4_lookup+0xba/0xeb
        [<ffffffff8114bf1e>] d_alloc_and_lookup+0x45/0x6b
        [<ffffffff8114cbea>] walk_component+0x215/0x3e8
        [<ffffffff8114cdf8>] lookup_last+0x3b/0x3d
        [<ffffffff8114daf3>] path_lookupat+0x82/0x2af
        [<ffffffff8110fc53>] ? might_fault+0xa5/0xac
        [<ffffffff8110fc0a>] ? might_fault+0x5c/0xac
        [<ffffffff8114c564>] ? getname_flags+0x31/0x1ca
        [<ffffffff8114dd48>] do_path_lookup+0x28/0x97
        [<ffffffff8114df2c>] user_path_at+0x59/0x96
        [<ffffffff811467ad>] ? cp_new_stat+0xf7/0x10d
        [<ffffffff811469a6>] vfs_fstatat+0x44/0x6e
        [<ffffffff811469ee>] vfs_lstat+0x1e/0x20
        [<ffffffff81146b3d>] sys_newlstat+0x1a/0x33
        [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
        [<ffffffff812535fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff8150af82>] system_call_fastpath+0x16/0x1b
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20e6074e
    • I
      net: convert core to skb paged frag APIs · ea2ab693
      Ian Campbell 提交于
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea2ab693
    • E
      rps: support IPIP encapsulation · ec5efe79
      Eric Dumazet 提交于
      Skip IPIP header to get proper layer-4 information.
      
      Like GRE tunnels, this only works if rxhash is not already provided by
      the device itself (ethtool -K ethX rxhash off), to allow kernel compute
      a software rxhash.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec5efe79
  12. 23 8月, 2011 1 次提交
  13. 21 8月, 2011 1 次提交
  14. 19 8月, 2011 2 次提交
  15. 18 8月, 2011 6 次提交
  16. 12 8月, 2011 2 次提交
    • E
      net: cleanup some rcu_dereference_raw · 33d480ce
      Eric Dumazet 提交于
      RCU api had been completed and rcu_access_pointer() or
      rcu_dereference_protected() are better than generic
      rcu_dereference_raw()
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33d480ce
    • E
      neigh: reduce arp latency · cd28ca0a
      Eric Dumazet 提交于
      Remove the artificial HZ latency on arp resolution.
      
      Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets
      send the ARP message immediately.
      
      Before patch :
      
      # arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
      64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms
      64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms
      64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms
      
      After patch :
      
      $ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
      64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms
      64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms
      64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd28ca0a
  17. 11 8月, 2011 2 次提交
  18. 10 8月, 2011 1 次提交
  19. 07 8月, 2011 1 次提交
    • D
      net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      David S. Miller 提交于
      Computers have become a lot faster since we compromised on the
      partial MD4 hash which we use currently for performance reasons.
      
      MD5 is a much safer choice, and is inline with both RFC1948 and
      other ISS generators (OpenBSD, Solaris, etc.)
      
      Furthermore, only having 24-bits of the sequence number be truly
      unpredictable is a very serious limitation.  So the periodic
      regeneration and 8-bit counter have been removed.  We compute and
      use a full 32-bit sequence number.
      
      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 43-bits) and that is fixed here as well.
      Reported-by: NDan Kaminsky <dan@doxpara.com>
      Tested-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e5714ea
  20. 02 8月, 2011 2 次提交
  21. 28 7月, 2011 1 次提交
    • N
      net: add IFF_SKB_TX_SHARED flag to priv_flags · d8873315
      Neil Horman 提交于
      Pktgen attempts to transmit shared skbs to net devices, which can't be used by
      some drivers as they keep state information in skbs.  This patch adds a flag
      marking drivers as being able to handle shared skbs in their tx path.  Drivers
      are defaulted to being unable to do so, but calling ether_setup enables this
      flag, as 90% of the drivers calling ether_setup touch real hardware and can
      handle shared skbs.  A subsequent patch will audit drivers to ensure that the
      flag is set properly
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NJiri Pirko <jpirko@redhat.com>
      CC: Robert Olsson <robert.olsson@its.uu.se>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Alexey Dobriyan <adobriyan@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8873315
  22. 27 7月, 2011 1 次提交
  23. 26 7月, 2011 1 次提交
  24. 23 7月, 2011 1 次提交
    • S
      net: allow netif_carrier to be called safely from IRQ · 1821f7cd
      stephen hemminger 提交于
      As reported by Ben Greer and Froncois Romieu. The code path in
      the netif_carrier code leads it to try and disable
      a late workqueue to reenable it immediately
      netif_carrier_on
      -> linkwatch_fire_event
         -> linkwatch_schedule_work
            -> cancel_delayed_work
               -> del_timer_sync
      
      If __cancel_delayed_work is used instead then there is no
      problem of waiting for running linkwatch_event.
      
      There is a race between linkwatch_event running re-scheduling
      but it is harmless to schedule an extra scan of the linkwatch queue.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1821f7cd
  25. 22 7月, 2011 1 次提交