1. 20 Feb, 2009 2 commits
  2. 18 Feb, 2009 1 commit
  3. 16 Feb, 2009 10 commits
    • sctp: Inherit all socket options from parent correctly. · 914e1c8b
      Vlad Yasevich committed
      During peeloff/accept() sctp needs to save the parent socket state
      into the new socket so that any options set on the parent are
      inherited by the child socket.  This was found when the
      parent/listener socket had set SO_BINDTODEVICE, yet data on the
      child socket was misrouted after a route cache flush.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Fix the RTO-doubling on idle-link heartbeats · faee47cd
      Vlad Yasevich committed
      SCTP incorrectly doubles the RTO every time a HEARTBEAT chunk
      is generated.  However, RFC 4960 states:
      
         On an idle destination address that is allowed to heartbeat, it is
         recommended that a HEARTBEAT chunk is sent once per RTO of that
         destination address plus the protocol parameter 'HB.interval', with
         jittering of +/- 50% of the RTO value, and exponential backoff of the
         RTO if the previous HEARTBEAT is unanswered.
      
      Essentially, only if the heartbeat is unacknowledged do we double the RTO.
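
      The rule quoted above can be sketched in user-space C. This is a
      minimal model, not the kernel's SCTP code: the struct, its field
      names and the two handler functions are hypothetical, and the cap at
      a maximum RTO is an assumption added so the value stays bounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-destination state; names are illustrative only. */
struct transport {
    uint32_t rto_ms;      /* current retransmission timeout */
    uint32_t rto_max_ms;  /* assumed upper bound for the RTO */
    bool     hb_pending;  /* a HEARTBEAT was sent and not yet acknowledged */
};

/* Heartbeat timer fired: back off only if the previous HEARTBEAT
 * is still unanswered, then mark the new one as outstanding. */
static void on_heartbeat_timer(struct transport *t)
{
    assert(t->rto_ms <= t->rto_max_ms);
    if (t->hb_pending) {
        uint64_t doubled = (uint64_t)t->rto_ms * 2;
        t->rto_ms = doubled > t->rto_max_ms ? t->rto_max_ms
                                            : (uint32_t)doubled;
    }
    t->hb_pending = true;
}

/* HEARTBEAT-ACK received: the outstanding heartbeat is answered. */
static void on_heartbeat_ack(struct transport *t)
{
    t->hb_pending = false;
}
```

      With this shape, an idle link whose heartbeats are all acknowledged
      keeps its RTO unchanged, which is the behaviour the fix restores.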
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Clean up sctp checksumming code · 4458f04c
      Vlad Yasevich committed
      The sctp crc32c checksum is always generated in little endian.
      So, we clean up the code to treat it as little endian and remove
      all the __force casts.
      
      Suggested by Herbert Xu.
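
      To illustrate what "always little endian" means in practice, here is
      a small stand-alone sketch. It is not the kernel implementation: it
      uses a plain bitwise CRC32c and a hypothetical put_le32() helper that
      writes the result byte by byte, so no endianness casts are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical helper: store v least-significant byte first,
 * independent of the host byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

      For the standard check string "123456789" the CRC32c value is
      0xE3069283, so put_le32() yields the bytes 83 92 06 E3.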
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Allow to disable SCTP checksums via module parameter · 06e86806
      Lucas Nussbaum committed
      This is a new version of my patch, now using a module parameter instead
      of a sysctl, so that the option is harder to find. Please note that,
      once the module is loaded, it is still possible to change the value of
      the parameter in /sys/module/sctp/parameters/, which is useful if you
      want to do performance comparisons without rebooting.
      
      Computation of SCTP checksums significantly affects the performance of
      SCTP. For example, with two dual-Opteron 246 machines connected over a
      GbE network, it was not possible to achieve more than ~730 Mbps,
      compared to 941 Mbps after disabling SCTP checksums.
      Unfortunately, SCTP checksum offloading in NICs is not commonly
      available (yet).
      
      By default, checksums are still enabled, of course.
      Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: support for TX timestamps on UDP and RAW sockets · 51f31cab
      Patrick Ohly committed
      Instructions for time stamping outgoing packets are taken from the
      socket layer and later copied into the new skb.
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket infrastructure for SO_TIMESTAMPING · 20d49473
      Patrick Ohly committed
      The overlap with the old SO_TIMESTAMP[NS] options is handled so
      that time stamping in software (net_enable_timestamp()) is
      enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
      is set.  It's disabled if all of these are off.
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: infrastructure for hardware time stamping · ac45f602
      Patrick Ohly committed
      The additional per-packet information (16 bytes for time stamps, 1
      byte for flags) is stored for all packets in the skb_shared_info
      struct. This implementation detail is hidden from users of that
      information via skb_* accessor functions. A separate struct (for the
      time stamps) and union (for the flags) are used for the additional
      information so that it can be stored/copied easily outside of
      skb_shared_info.
      
      Compared to previous implementations (reusing the tstamp field
      depending on the context, optional additional structures) this
      is the simplest solution. It does not extend sk_buff itself.
      
      TX time stamping is implemented in software if the device driver
      doesn't support hardware time stamping.
      
      The new semantic for hardware/software time stamping around
      ndo_start_xmit() is based on two assumptions about existing
      network device drivers which don't support hardware time
      stamping and know nothing about it:
       - they leave the new skb_shared_tx unmodified
       - they keep the connection to the originating socket in skb->sk
         alive, i.e., don't call skb_orphan()
      
      Given that skb_shared_tx is new, the first assumption is safe.
      The second is only true for some drivers. As a result, software
      TX time stamping currently works with the bnx2 driver, but not
      with the unmodified igb driver (the two drivers this patch series
      was tested with).
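
      The per-packet layout described at the top of this message (16 bytes
      of time stamps plus 1 byte of flags) might look roughly like the
      user-space model below. All type, field and function names here are
      assumptions chosen for illustration; the patch itself is the
      authority on the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t ktime_ns;  /* stand-in for the kernel's ktime_t */

/* Two 8-byte time stamps: 16 bytes in total. */
struct shared_hwtstamps {
    ktime_ns hwtstamp;   /* raw hardware time stamp (assumed name) */
    ktime_ns syststamp;  /* same instant mapped to system time (assumed name) */
};

/* One byte of TX flags, readable bit by bit or as a whole. */
union shared_tx {
    struct {
        uint8_t hardware    : 1;  /* hardware time stamp requested */
        uint8_t software    : 1;  /* software fallback requested */
        uint8_t in_progress : 1;  /* driver will deliver the stamp later */
    };
    uint8_t flags;
};

/* Hypothetical accessor hiding where the flags are stored. */
static void tx_request_software(union shared_tx *tx)
{
    assert(tx != NULL);
    tx->software = 1;
}

/* True if any kind of TX time stamp was requested. */
static int tx_any_requested(const union shared_tx *tx)
{
    return tx->flags != 0;
}
```

      Keeping the flags in a one-byte union lets code copy or clear them
      in a single assignment, which matches the "stored/copied easily"
      goal stated above.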
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: new user space API for time stamping of incoming and outgoing packets · cb9eff09
      Patrick Ohly committed
      User space can request hardware and/or software time stamping.
      Reporting of the result(s) via a new control message is enabled
      separately for each field in the message because some of the
      fields may require additional computation and thus cause overhead.
      User space can tell the different kinds of time stamps apart
      and choose what suits its needs.
      
      When a TX timestamp operation is requested, the TX skb will be cloned
      and the clone will be time stamped (in hardware or software) and added
      to the socket error queue of the skb, if the skb has a socket
      associated with it.
      
      The actual TX timestamp will reach userspace as an RX timestamp on the
      cloned packet. If timestamping is requested and no timestamping is
      done in the device driver (which may potentially use hardware
      timestamping), it will be done in software after the device's
      hard_start_xmit routine.
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • timecompare: generic infrastructure to map between two time bases · a75244c3
      Patrick Ohly committed
      Mapping from a struct timecounter to a time returned by functions like
      ktime_get_real() is implemented. This is sufficient to use this code
      in a network device driver which wants to support hardware time
      stamping and transformation of hardware time stamps to system time.
      
      The interface could have been made more versatile by not depending on
      a time counter, but this wasn't done to avoid writing glue code
      elsewhere.
      
      The method implemented here is the one used and analyzed under the name
      "assisted PTP" in the LCI PTP paper:
      http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
      Acked-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • clocksource: allow usage independent of timekeeping.c · a038a353
      Patrick Ohly committed
      So far struct clocksource acted as the interface between time/timekeeping.c
      and hardware. This patch generalizes the concept so that a similar
      interface can also be used in other contexts. For that it introduces
      new structures and related functions *without* touching the existing
      struct clocksource.
      
      The reasons for adding these new structures to clocksource.[ch] are
      * the APIs are clearly related
      * struct clocksource could be cleaned up to use the new structs
      * avoids proliferation of files with similar names (timesource.h?
        timecounter.h?)
      
      As outlined in the discussion with John Stultz, this patch adds
      * struct cyclecounter: stateless API to hardware which counts clock cycles
      * struct timecounter: stateful utility code built on a cyclecounter which
        provides a nanosecond counter
      * only the function to read the nanosecond counter; deltas are used internally
        and not exposed to users of timecounter
      
      The code does no locking of the shared state. It must be called at least
      as often as the cycle counter wraps around to detect these wraparounds.
      Both are the responsibility of the timecounter user.
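
      The split between a stateless cyclecounter and a stateful timecounter
      can be sketched as follows. This is a simplified user-space model
      under assumed field names (read, mask, mult, shift, cycle_last,
      nsec) and a made-up 16-bit fake counter; it is not the code added by
      the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stateless description of a hardware cycle counter. */
struct cyclecounter {
    uint64_t (*read)(const struct cyclecounter *cc);
    uint64_t mask;   /* valid counter bits, e.g. 0xFFFF for 16 bits */
    uint32_t mult;   /* ns = (cycles * mult) >> shift */
    uint32_t shift;
};

/* Stateful nanosecond counter layered on top of a cyclecounter. */
struct timecounter {
    const struct cyclecounter *cc;
    uint64_t cycle_last;  /* counter value at the previous read */
    uint64_t nsec;        /* accumulated nanoseconds */
};

static void timecounter_init(struct timecounter *tc,
                             const struct cyclecounter *cc,
                             uint64_t start_ns)
{
    assert(cc->shift < 64);
    tc->cc = cc;
    tc->cycle_last = cc->read(cc);
    tc->nsec = start_ns;
}

/* Masking the difference handles one wraparound between two reads;
 * the caller must read often enough and provide its own locking.
 * The multiplication may overflow for very large deltas. */
static uint64_t timecounter_read(struct timecounter *tc)
{
    uint64_t now = tc->cc->read(tc->cc);
    uint64_t delta = (now - tc->cycle_last) & tc->cc->mask;

    tc->cycle_last = now;
    tc->nsec += (delta * tc->cc->mult) >> tc->cc->shift;
    return tc->nsec;
}

/* Hypothetical hardware: a free-running register truncated by the mask. */
static uint64_t fake_hw_cycles;
static uint64_t fake_read(const struct cyclecounter *cc)
{
    return fake_hw_cycles & cc->mask;
}
```

      Only the accumulated nanosecond value is returned; the cycle delta
      stays internal, as described in the list above.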
      Acked-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Feb, 2009 2 commits
  5. 14 Feb, 2009 5 commits
  6. 13 Feb, 2009 1 commit
    • net: don't use in_atomic() in gfp_any() · 99709372
      Andrew Morton committed
      The problem is that in_atomic() will return false inside spinlocks if
      CONFIG_PREEMPT=n.  This will lead to deadlockable GFP_KERNEL allocations
      from spinlocked regions.
      
      Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
      will instead use GFP_ATOMIC from this callsite.  Hence we won't get the
      might_sleep() debugging warnings which would have informed us of the buggy
      callsites.
      
      Solve both these problems by switching to in_interrupt().  Now, if someone
      runs a gfp_any() allocation from inside a spinlock we will get the warning
      if CONFIG_PREEMPT=y.
      
      I reviewed all callsites and most of them were too complex for my little
      brain and none of them documented their interface requirements.  I have no
      idea what this patch will do.
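
      The two problems and the fix can be traced with a toy model. It is a
      rough user-space approximation, assuming that spin_lock() raises the
      preempt count only on CONFIG_PREEMPT=y kernels; every name in it
      (cpu_ctx, model_*, gfp_any_old, gfp_any_new) is invented for this
      sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum gfp_model { GFP_MODEL_KERNEL, GFP_MODEL_ATOMIC };

/* Hypothetical snapshot of the execution context. */
struct cpu_ctx {
    bool     config_preempt;  /* models CONFIG_PREEMPT=y / =n */
    unsigned preempt_depth;   /* raised by spin_lock() only if preemptible */
    unsigned irq_depth;       /* hard/soft interrupt nesting */
    unsigned locks_held;
};

static void model_spin_lock(struct cpu_ctx *c)
{
    c->locks_held++;
    if (c->config_preempt)
        c->preempt_depth++;
}

static void model_spin_unlock(struct cpu_ctx *c)
{
    assert(c->locks_held > 0);
    c->locks_held--;
    if (c->config_preempt)
        c->preempt_depth--;
}

static bool model_in_interrupt(const struct cpu_ctx *c)
{
    return c->irq_depth != 0;
}

static bool model_in_atomic(const struct cpu_ctx *c)
{
    return c->preempt_depth != 0 || c->irq_depth != 0;
}

/* Before the patch: the answer depends on the kernel configuration. */
static enum gfp_model gfp_any_old(const struct cpu_ctx *c)
{
    return model_in_atomic(c) ? GFP_MODEL_ATOMIC : GFP_MODEL_KERNEL;
}

/* After the patch: only interrupt context selects the atomic flag. */
static enum gfp_model gfp_any_new(const struct cpu_ctx *c)
{
    return model_in_interrupt(c) ? GFP_MODEL_ATOMIC : GFP_MODEL_KERNEL;
}
```

      Under a held lock, gfp_any_old() gives different answers for the two
      configurations, while gfp_any_new() consistently picks the sleeping
      allocation, which is what allows might_sleep() to flag the caller.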
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 12 Feb, 2009 2 commits
    • syscall define: fix uml compile bug · 6c597963
      Heiko Carstens committed
      With the new system call defines we get this on uml:
      
      arch/um/sys-i386/built-in.o: In function `sys_call_table':
      (.rodata+0x308): undefined reference to `sys_sigprocmask'
      
      Reason for this is that uml passes the preprocessor option
      -Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
      This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
      SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
      call named sys_kernel_sigprocmask.  However sys_sigprocmask is missing
      because of this.
      
      To avoid macro expansion of the system call name, just concatenate the
      name at the first define instead of carrying it through several levels.
      This was pointed out by Al Viro.
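
      The preprocessor behaviour behind this bug can be reproduced outside
      the kernel. The sketch below uses made-up macro names (NAME_LATE,
      NAME_EARLY) rather than the real SYSCALL_DEFINE macros, and a #define
      in place of the -D command line option.

```c
#include <assert.h>
#include <string.h>

/* Stands in for -Dsigprocmask=kernel_sigprocmask on the command line. */
#define sigprocmask kernel_sigprocmask

#define STR_(x) #x
#define STR(x)  STR_(x)

/* Buggy shape: the name passes through one macro level before it is
 * pasted, so it is macro-expanded on the way. */
#define NAME_LATE_X(name) sys_##name
#define NAME_LATE(name)   NAME_LATE_X(name)

/* Fixed shape: the name is pasted at the first define. Operands of ##
 * are not macro-expanded, so the original spelling survives. */
#define NAME_EARLY_X(sname) sname
#define NAME_EARLY(name)    NAME_EARLY_X(sys_##name)

static const char *late_name(void)  { return STR(NAME_LATE(sigprocmask)); }
static const char *early_name(void) { return STR(NAME_EARLY(sigprocmask)); }

/* Documents the expected expansions of both variants. */
static void check_expansions(void)
{
    assert(strcmp(late_name(), "sys_kernel_sigprocmask") == 0);
    assert(strcmp(early_name(), "sys_sigprocmask") == 0);
}
```

      The late variant produces the unwanted sys_kernel_sigprocmask symbol,
      while the early variant keeps sys_sigprocmask.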
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: WANG Cong <wangcong@zeuux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: fix lockdep subclasses overflow · cfebe563
      Li Zefan committed
      I enabled all cgroup subsystems when compiling the kernel, and then:
       # mount -t cgroup -o net_cls xxx /mnt
       # mkdir /mnt/0
      
      This showed up immediately:
       BUG: MAX_LOCKDEP_SUBCLASSES too low!
       turning off the locking correctness validator.
      
      It's caused by the cgroup hierarchy lock:
      	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
      		struct cgroup_subsys *ss = subsys[i];
      		if (ss->root == root)
      			mutex_lock_nested(&ss->hierarchy_mutex, i);
      	}
      
      Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
      MAX_LOCKDEP_SUBCLASSES is 8.
      
      This patch uses different lockdep keys for different subsystems.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 11 Feb, 2009 5 commits
  9. 10 Feb, 2009 7 commits
  10. 09 Feb, 2009 4 commits
  11. 08 Feb, 2009 1 commit