1. 18 May 2010, 2 commits
    • can: sja1000 platform data fixes · 56e6943b
      Wolfgang Grandegger committed
      The member "clock" of struct "sja1000_platform_data" is documented as
      "CAN bus oscillator frequency in Hz" but it's actually used as the CAN
      clock frequency, which is half of it. To avoid further confusion, this
      patch renames the member to "osc_freq", so that non-mainline users
      will also notice the change. The platform code for the
      relevant boards is updated accordingly. Furthermore, pre-defined
      values are now used for the members "ocr" and "cdr".
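      For illustration, a board file using the renamed member might then look
      roughly like this (a sketch only; the OCR/CDR macros are assumed to be
      pre-defined constants from the sja1000 platform header):

        #include <linux/can/platform/sja1000.h>

        /* sketch: board platform data after the rename; values are examples */
        static struct sja1000_platform_data board_sja1000_pdata = {
                .osc_freq = 16000000,         /* 16 MHz oscillator on the board */
                .ocr      = OCR_TX0_PUSHPULL, /* assumed pre-defined OCR value */
                .cdr      = CDR_CBP,          /* assumed pre-defined CDR value */
        };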
      Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
      Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56e6943b
    • net: add a noref bit on skb dst · 7fee226a
      Eric Dumazet committed
      Use the low-order bit of skb->_skb_dst to indicate that the dst is not
      refcounted.

      Change _skb_dst to _skb_refdst to make sure all uses are caught.

      skb_dst() returns the dst regardless of whether the noref bit is set, but
      with a lockdep check to make sure a noref dst is not handed out if the
      current user is not RCU protected.

      New skb_dst_set_noref() helper to set a non-refcounted dst on a skb
      (with a lockdep check).
      
      skb_dst_drop() drops a reference only if skb dst was refcounted.
      
      skb_dst_force() helper is used to force a refcount on the dst, when the
      skb is queued and no longer RCU protected.
      
      Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
      !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
      sock_queue_rcv_skb(), in __nf_queue().
      
      Use skb_dst_force() in dev_requeue_skb().
      
      Note: dst_use_noref() still dirties the dst; we might later change it to
      do only one dirtying per jiffy.
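      A minimal sketch of the pointer-tagging idea described above (simplified;
      the real helpers also carry the lockdep/RCU checks mentioned here):

        #define SKB_DST_NOREF  1UL  /* low-order bit marks a non-refcounted dst */

        /* simplified: the real skb_dst() also verifies RCU protection */
        static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
        {
                return (struct dst_entry *)(skb->_skb_refdst & ~SKB_DST_NOREF);
        }

        /* caller must be inside an rcu_read_lock() section */
        static inline void skb_dst_set_noref(struct sk_buff *skb,
                                             struct dst_entry *dst)
        {
                skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
        }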
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7fee226a
  2. 16 May 2010, 4 commits
    • rtnetlink: make SR-IOV VF interface symmetric · c02db8c6
      Chris Wright committed
      Now we have a set of nested attributes:
      
        IFLA_VFINFO_LIST (NESTED)
          IFLA_VF_INFO (NESTED)
            IFLA_VF_MAC
            IFLA_VF_VLAN
            IFLA_VF_TX_RATE
      
      This allows a single set to operate on multiple attributes if desired.
      Among other things, it means a dump can be replayed to set state.
      
      The current interface has yet to be released, so this seems like
      something to consider for 2.6.34.
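      A rough sketch of how a dump can emit this nesting with the standard
      netlink helpers (error handling omitted; ivm/ivv/ivt stand for the per-VF
      ifla_vf_mac/ifla_vf_vlan/ifla_vf_tx_rate structs and are illustrative):

        struct nlattr *vfinfo_list, *vfinfo;
        int vf;

        vfinfo_list = nla_nest_start(skb, IFLA_VFINFO_LIST);
        for (vf = 0; vf < num_vfs; vf++) {
                vfinfo = nla_nest_start(skb, IFLA_VF_INFO);
                nla_put(skb, IFLA_VF_MAC, sizeof(ivm), &ivm);
                nla_put(skb, IFLA_VF_VLAN, sizeof(ivv), &ivv);
                nla_put(skb, IFLA_VF_TX_RATE, sizeof(ivt), &ivt);
                nla_nest_end(skb, vfinfo);
        }
        nla_nest_end(skb, vfinfo_list);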
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c02db8c6
    • net: Consistent skb timestamping · 3b098e2d
      Eric Dumazet committed
      Since the inclusion of RPS, skb timestamping is not consistent in the RX
      path.

      If netif_receive_skb() is used, it is deferred until after RPS dispatch.

      If netif_rx() is used, it is done before RPS dispatch.

      This can give strange tcpdump timestamp results.

      I think timestamping should be done as soon as possible in the receive
      path, to get meaningful values (i.e. timestamps taken at the time the
      packet was delivered by the NIC driver to our stack), even if NAPI can
      already defer timestamping a bit (RPS can help reduce the gap).

      Tom Herbert prefers to sample timestamps after RPS dispatch. In case
      sampling is expensive (HPET/acpi_pm on x86), this makes sense.

      Let admins switch from one mode to the other, using a new
      sysctl, /proc/sys/net/core/netdev_tstamp_prequeue

      Its default value (1) means timestamps are taken as soon as possible,
      before backlog queueing, giving accurate timestamps.

      Setting it to 0 defers timestamp sampling until backlog processing, after
      RPS dispatch, to lower the load on the pre-RPS CPU.
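      Conceptually, the receive path then samples the stamp early only when the
      sysctl asks for it; a simplified sketch (not the literal patch):

        /* sketch: early timestamping gated by the new sysctl */
        static inline void net_timestamp_check(struct sk_buff *skb)
        {
                if (netdev_tstamp_prequeue && !skb->tstamp.tv64)
                        __net_timestamp(skb);   /* before backlog/RPS queueing */
        }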
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b098e2d
    • net: adjust handle_macvlan to pass port struct to hook · a14462f1
      Jiri Pirko committed
      Currently there is a NULL check here and then again in the hook. The
      bridge bits, which are similar, rcu_dereference() the port structure right
      away in handle_bridge() and pass it to the hook. That looks nicer, so do
      the same for macvlan.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a14462f1
    • sysctl: add proc_do_large_bitmap · 9f977fb7
      Octavian Purdila committed
      The new function can be used to read/write large bitmaps via /proc. A
      comma separated range format is used for compact output and input
      (e.g. 1,3-4,10-10).
      
      Writing into the file will first reset the bitmap then update it
      based on the given input.
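      To illustrate just the range syntax (this is not the kernel parser), the
      string "1,3-4,10-10" expands to bits 1, 3, 4 and 10:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void)
        {
                char buf[] = "1,3-4,10-10";
                char *tok;

                for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
                        unsigned long a = strtoul(tok, NULL, 10), b = a;
                        char *dash = strchr(tok, '-');

                        if (dash)
                                b = strtoul(dash + 1, NULL, 10);
                        while (a <= b)
                                printf("bit %lu set\n", a++);
                }
                return 0;
        }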
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f977fb7
  3. 13 May 2010, 2 commits
  4. 12 May 2010, 5 commits
    • revert "procfs: provide stack information for threads" and its fixup commits · 34441427
      Robin Holt committed
      Originally, commit d899bf7b ("procfs: provide stack information for
      threads") attempted to introduce a new feature for showing where the
      thread stack was located and how many pages are being utilized by the
      stack.
      
      Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
      applied to fix the NO_MMU case.
      
      Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
      64-bit") was applied to fix a bug in ia32 executables being loaded.
      
      Commit 9ebd4eba ("procfs: fix /proc/<pid>/stat stack pointer for kernel
      threads") was applied to fix a bug which had kernel threads printing a
      userland stack address.
      
      Commit 1306d603 ('proc: partially revert "procfs: provide stack
      information for threads"') was then applied to revert the stack pages
      being used to solve a significant performance regression.
      
      This patch nearly undoes the effect of all these patches.
      
      The reason for reverting these is that the feature provides an unusable
      value in field 28.  For x86_64, a fork will result in the task->stack_start
      value being updated to the current user top of stack and not the stack
      start address.  This unpredictability of the stack_start value makes
      it worthless.  That includes the intended use of showing how much stack
      space a thread has.
      
      Other architectures will get different values.  As an example, ia64
      gets 0.  The do_fork() and copy_process() functions appear to treat the
      stack_start and stack_size parameters as architecture specific.
      
      I only partially reverted c44972f1 ("procfs: disable per-task stack usage
      on NOMMU").  If I had completely reverted it, I would have had to change
      mm/Makefile to only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
      configured.  Since I could not test the builds without significant effort,
      I decided not to change mm/Makefile.
      
      I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
      information for threads on 64-bit").  I left the KSTK_ESP() change in
      place as that seemed worthwhile.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34441427
    • netfilter: xtables: change hotdrop pointer to direct modification · b4ba2611
      Jan Engelhardt committed
      Since xt_action_param is writable, let's use it. The pointer to
      'bool hotdrop' was always a worry (8 bytes on 64-bit just to write 1 byte!).
      Surprisingly, this results in a reduction in size:
      
         text    data     bss filename
      5457066  692730  357892 vmlinux.o-prev
      5456554  692730  357892 vmlinux.o
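      With the writable parameter block, a match extension now flags the drop
      directly; a rough sketch (the extension itself is hypothetical):

        static bool example_mt(const struct sk_buff *skb,
                               struct xt_action_param *par)
        {
                const struct iphdr *iph = ip_hdr(skb);

                if (iph->ihl < 5) {             /* malformed header: drop */
                        par->hotdrop = true;    /* was: *par->hotdrop = true */
                        return false;
                }
                return true;
        }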
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      b4ba2611
    • netfilter: xtables: deconstify struct xt_action_param for matches · 62fc8051
      Jan Engelhardt committed
      In the future, layer-3 matches will be an xt module of their own and will
      need to set the fragoff and thoff fields. Adding more pointers would
      needlessly increase memory requirements (especially on 64-bit, where
      pointers are wider).
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      62fc8051
    • (commit not expanded)
    • netfilter: xtables: combine struct xt_match_param and xt_target_param · de74c169
      Jan Engelhardt committed
      The structures carried - besides match/target - almost the same data.
      It is possible to combine them, as extensions are evaluated serially,
      and so, the callers end up a little smaller.
      
        text  data  bss  filename
      -15318   740  104  net/ipv4/netfilter/ip_tables.o
      +15286   740  104  net/ipv4/netfilter/ip_tables.o
      -15333   540  152  net/ipv6/netfilter/ip6_tables.o
      +15269   540  152  net/ipv6/netfilter/ip6_tables.o
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      de74c169
  5. 11 May 2010, 4 commits
    • ipv6: ip6mr: support multiple tables · d1db275d
      Patrick McHardy committed
      This patch adds support for multiple independent multicast routing instances,
      named "tables".
      
      Userspace multicast routing daemons can bind to a specific table instance by
      issuing a setsockopt call using a new option MRT6_TABLE. The table number is
      stored in the raw socket data and affects all following ip6mr setsockopt(),
      getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
      is created with a default routing rule pointing to it. Newly created pim6reg
      devices have the table number appended ("pim6regX"), with the exception of
      devices created in the default table, which are named just "pim6reg" for
      compatibility reasons.
      
      Packets are directed to a specific table instance using routing rules,
      similar to how regular routing rules work. Currently iif, oif and mark are
      supported as keys; source and destination addresses could be supported in
      addition.
      
      Example usage:
      
      - bind pimd/xorp/... to a specific table:
      
      uint32_t table = 123;
      setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));
      
      - create routing rules directing packets to the new table:
      
      # ip -6 mrule add iif eth0 lookup 123
      # ip -6 mrule add oif eth0 lookup 123
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      d1db275d
    • 6bd52143 (commit not expanded)
    • f30a7784 (commit not expanded)
    • ipv6: ip6mr: remove net pointer from struct mfc6_cache · b5aa30b1
      Patrick McHardy committed
      Now that cache entries in unres_queue don't need to be distinguished by their
      network namespace pointer anymore, we can remove it from struct mfc6_cache
      and pass the namespace as a function argument to the functions that need it.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      b5aa30b1
  6. 08 May 2010, 1 commit
    • cfg80211/mac80211: better channel handling · f444de05
      Johannes Berg committed
      Currently (all tested with hwsim) you can do stupid
      things like setting up an AP on a certain channel,
      then adding another virtual interface and making
      that associate on another channel -- this will make
      the beaconing move to the new channel but obviously
      without the necessary IE data updates.
      
      In order to improve this situation, first make the
      configuration APIs (cfg80211 and nl80211) aware of
      multi-channel operation -- we'll eventually need
      that in the future anyway. There's one userland API
      change and one API addition. The API change is that
      now SET_WIPHY must be called with virtual interface
      index rather than only wiphy index in order to take
      effect for that interface -- luckily all current
      users (hostapd) do that. For monitor interfaces, the
      old setting is preserved, but monitors are always
      slaved to other devices anyway so no guarantees.
      
      The userland API addition is the introduction
      of a per-virtual-interface SET_CHANNEL command, which
      hostapd should use going forward to make it easier
      to understand what's going on (it can automatically
      detect a kernel with this command).
      
      Other than mac80211, no existing cfg80211 drivers
      are affected by this change because they only allow
      a single virtual interface.
      
      mac80211, however, now needs to be aware that the
      channel settings are per interface now, and needs
      to disallow (for now) real multi-channel operation,
      which is another important part of this patch.
      
      One of the immediate benefits is that you can now
      start hostapd to operate on a hardware that already
      has a connection on another virtual interface, as
      long as you specify the same channel.
      
      Note that two things are left unhandled (this is an
      improvement -- not a complete fix):
      
       * different HT/no-HT modes
      
         currently you could start an HT AP and then
         connect to a non-HT network on the same channel
         which would configure the hardware for no HT;
         that can be fixed fairly easily
      
       * CSA
      
         An AP we're connected to on a virtual interface
         might indicate switching channels, and in that
         case we would follow it, regardless of how many
         other interfaces are operating; this requires
         more effort to fix but is pretty rare after all
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      f444de05
  7. 07 May 2010, 2 commits
  8. 06 May 2010, 2 commits
    • ffb27362 (commit not expanded)
    • netpoll: add generic support for bridge and bonding devices · 0e34e931
      WANG Cong committed
      This whole patchset is for adding netpoll support to bridge and bonding
      devices. I already tested it for bridge, bonding, bridge over bonding,
      and bonding over bridge. It looks fine now.
      
      To make bridge and bonding support netpoll, we need to adjust
      some netpoll generic code. This patch does the following things:
      
      1) introduce two new priv_flags for struct net_device:
         IFF_IN_NETPOLL, which indicates that we are processing a netpoll;
         IFF_DISABLE_NETPOLL, which is used to disable netpoll support for a
         device at run-time;

      2) introduce one new method for netdev_ops:
         ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
           removed (see the sketch after this list).
      
      3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
         export netpoll_send_skb() and netpoll_poll_dev() which will be used later;
      
      4) hide a pointer to struct netpoll in struct netpoll_info, ditto.
      
      5) introduce ->real_dev for struct netpoll.
      
      6) introduce a new status, NETDEV_BONDING_DESLAVE, which is used to disable
         netconsole before releasing a slave, to avoid deadlocks.
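      As a sketch only, a driver wiring up the new cleanup hook might look like
      this (driver names are hypothetical):

        /* sketch: hypothetical driver exposing the new netpoll cleanup hook */
        static void exdev_netpoll_cleanup(struct net_device *dev)
        {
                /* release per-device netpoll state when netconsole detaches */
        }

        static const struct net_device_ops exdev_netdev_ops = {
                /* ... the usual ndo hooks ... */
                .ndo_netpoll_cleanup = exdev_netpoll_cleanup,
        };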
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e34e931
  9. 05 May 2010, 2 commits
    • net: __alloc_skb() speedup · ec7d2f2c
      Eric Dumazet committed
      With the following patch I can reach the maximum rate of my pktgen+udpsink
      simulator:
      - 'old' machine: dual quad-core E5450 @ 3.00GHz
      - 64 UDP rx flows (differing only by destination port)
      - RPS enabled, NIC interrupts serviced on cpu0
      - RPS dispatched on 7 other cores (~130,000 IPIs per second)
      - SLAB allocator (faster than SLUB in this workload)
      - tg3 NIC
      - 1,080,000 pps without a single drop at NIC level.
      
      The idea is to add two prefetchw() calls in __alloc_skb(): one to prefetch
      the first sk_buff cache line, the second to prefetch the shinfo part.

      Also use a single memset() to initialize all skb_shared_info fields instead
      of clearing them one by one, reducing the instruction count by using
      long-word moves.
      
      All skb_shared_info fields before 'dataref' are cleared in 
      __alloc_skb().
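      The two ideas boil down to something like this simplified sketch (not the
      exact hunks; the prefetch target for shinfo is an assumption about layout):

        prefetchw(skb);                         /* first sk_buff cache line */
        prefetchw(data + SKB_DATA_ALIGN(size)); /* skb_shared_info area */

        /* clear every skb_shared_info field up to 'dataref' in one go */
        shinfo = skb_shinfo(skb);
        memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
        atomic_set(&shinfo->dataref, 1);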
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec7d2f2c
    • cgroup: Check task_lock in task_subsys_state() · 1ce7e4ff
      Li Zefan committed
      Expand task_subsys_state()'s rcu_dereference_check() to include the full
      locking rule as documented in Documentation/cgroups/cgroups.txt by adding
      a check for task->alloc_lock being held.
      
      This fixes an RCU false positive when resuming from suspend. The warning
      comes from freezer cgroup in cgroup_freezing_or_frozen().
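      The expanded check is essentially of this shape (a sketch of the condition
      used in task_subsys_state()):

        /* legal under RCU, under cgroup_mutex, or with task_lock() held */
        css = rcu_dereference_check(task->cgroups->subsys[subsys_id],
                                    rcu_read_lock_held() ||
                                    lockdep_is_held(&task->alloc_lock) ||
                                    cgroup_lock_is_held());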
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1ce7e4ff
  10. 04 May 2010, 2 commits
  11. 03 May 2010, 3 commits
    • tun: add ioctl to modify vnet header size · d9d52b51
      Michael S. Tsirkin committed
      virtio added a mergeable buffers mode where 2 bytes of extra info are put
      after the vnet header but before the actual data (tun does not need this
      data). In hindsight, it would have been better to add the new info *before*
      the packet: as it is, users need a lot of tricky code to skip the extra 2
      bytes in the middle of the iovec, and in fact applications seem to get
      it wrong and only work with a specific iovec layout.  The fact we might
      need to split iovec also means we might in theory overflow iovec max
      size.
      
      This patch adds a simpler way for applications to handle this,
      and future-proofs the interface against further extensions,
      by making the size of the virtio net header configurable
      from userspace. As a result, the tun driver will simply
      skip the extra 2 bytes on both input and output.
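      From userspace this becomes a simple ioctl on the tun fd; assuming the
      ioctl is named TUNSETVNETHDRSZ and tun_fd is an already-opened tun device,
      usage looks roughly like:

        #include <sys/ioctl.h>
        #include <linux/if_tun.h>

        int vnet_hdr_sz = 12;  /* e.g. virtio_net_hdr plus the 2 extra bytes */

        if (ioctl(tun_fd, TUNSETVNETHDRSZ, &vnet_hdr_sz) < 0)
                perror("TUNSETVNETHDRSZ");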
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      d9d52b51
    • (commit not expanded)
    • net: fix softnet_stat · dee42870
      Changli Gao committed
      The per-CPU variable softnet_data.total was shared between IRQ and SoftIRQ
      context without any protection. In addition, enqueue_to_backlog() should
      update the netdev_rx_stat of the target CPU.

      This patch renames softnet_data.total to softnet_data.processed: the number
      of packets processed in upper levels (IP stacks).
      
      softnet_stat data is moved into softnet_data.
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      ----
       include/linux/netdevice.h |   17 +++++++----------
       net/core/dev.c            |   26 ++++++++++++--------------
       net/sched/sch_generic.c   |    2 +-
       3 files changed, 20 insertions(+), 25 deletions(-)
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dee42870
  12. 02 May 2010, 2 commits
    • net: Inline skb_pull() in eth_type_trans(). · 47d29646
      David S. Miller committed
      In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
      we uninlined skb_pull.
      
      But in some critical paths it makes sense to inline this thing
      and it helps performance significantly.
      
      Create an skb_pull_inline() so that we can do this in a way that
      also serves as an annotation.
      
      Based upon a patch by Eric Dumazet.
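      Simplified, the helper is essentially a one-liner of this shape:

        /* sketch of the inlined variant */
        static inline unsigned char *skb_pull_inline(struct sk_buff *skb,
                                                     unsigned int len)
        {
                return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len);
        }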
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47d29646
    • net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet committed
      The sk_callback_lock rwlock actually protects the sk->sk_sleep pointer, so
      we need two atomic operations (and the associated dirtying) per incoming
      packet.

      An RCU conversion is pretty much needed:
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
      "struct socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper(), which must be used inside
      an rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to:
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to wake up tasks when needed.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions:
        macvtap, drivers/net/tun.c and af_unix use an embedded "struct socket_wq"
      instead of dynamically allocated ones. They don't need RCU freeing.
      
      Some cleanups or follow-ups are probably needed (a possible conversion of
      sk_callback_lock to a spinlock, for example...).
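      After the conversion, a wakeup path follows the pattern sketched below
      (simplified; roughly what sock_def_readable() ends up doing):

        static void sock_def_readable(struct sock *sk, int len)
        {
                struct socket_wq *wq;

                rcu_read_lock();        /* instead of read_lock(&sk->sk_callback_lock) */
                wq = rcu_dereference(sk->sk_wq);
                if (wq_has_sleeper(wq))
                        wake_up_interruptible_sync_poll(&wq->wait,
                                                        POLLIN | POLLRDNORM);
                sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
                rcu_read_unlock();
        }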
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43815482
  13. 01 May 2010, 1 commit
  14. 29 Apr 2010, 1 commit
  15. 28 Apr 2010, 5 commits
  16. 27 Apr 2010, 2 commits