提交 · 4b32d0a4e9ec07808a5c406a416c6576c986b047 · OpenHarmony / kernel_linux

18 7月, 2007 5 次提交

D
[NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability · 12972621
由 Denis Cheng 提交于 7月 18, 2007
```
Signed-off-by: NDenis Cheng <crquan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
12972621

[NET]: merge dev_unicast_discard and dev_mc_discard into one · 26cc2522

由 Denis Cheng 提交于 7月 18, 2007

this two functions could share the dev->_xmit_lock acquired context.
Signed-off-by: NDenis Cheng <crquan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26cc2522

[NET]: move dev_mc_discard from dev_mcast.c to dev.c · 456ad75c

由 Denis Cheng 提交于 7月 18, 2007

Because this function is only called by unregister_netdevice,
this moving could make this non-global function static,
and also remove its declaration in netdevice.h;

Any further, function __dev_addr_discard is also just called by
dev_mc_discard and dev_unicast_discard, keeping this two functions
both in one c file could make __dev_addr_discard also static
and remove its declaration in netdevice.h;

Futhermore, the sequential call to dev_unicast_discard and then
dev_mc_discard in unregister_netdevice have a similar mechanism that:
(netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh),
they should merged into one to eliminate duplicates in acquiring and
releasing the dev->_xmit_lock, this would be done in my following patch.
Signed-off-by: NDenis Cheng <crquan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

456ad75c

[NET]: gen_estimator deadlock fix · 0929c2dd

由 Ranko Zivojnovic 提交于 7月 16, 2007

-Fixes ABBA deadlock noted by Patrick McHardy <kaber@trash.net>:

> There is at least one ABBA deadlock, est_timer() does:
> read_lock(&est_lock)
> spin_lock(e->stats_lock) (which is dev->queue_lock)
>
> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> calls htb_destroy_class, then gen_kill_estimator and this
> write_locks est_lock.

To fix the ABBA deadlock the rate estimators are now kept on an rcu list.

-The est_lock changes the use from protecting the list to protecting
the update to the 'bstat' pointer in order to avoid NULL dereferencing.

-The 'interval' member of the gen_estimator structure removed as it is
not needed.
Signed-off-by: NRanko Zivojnovic <ranko@spidernet.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0929c2dd

Freezer: make kernel threads nonfreezable by default · 83144186

由 Rafael J. Wysocki 提交于 7月 17, 2007

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NNigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83144186

17 7月, 2007 2 次提交

Revert "[NET]: Fix races in net_rx_action vs netpoll." · 2e27afb3

由 Linus Torvalds 提交于 7月 16, 2007

This reverts commit 29578624.

Ingo Molnar reports complete breakage with his e1000 card (no
networking, card reports transmit timeouts), and bisected it down to
this commit.  Let's figure out what went wrong, but not keep breaking
machines until we do.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Olaf Kirch <olaf.kirch@oracle.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e27afb3

O_CLOEXEC for SCM_RIGHTS · 4a19542e

由 Ulrich Drepper 提交于 7月 15, 2007

Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.

The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
is not used otherwise but the networking people will know better.

This new flag is not recognized by recvfrom and recv.  These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.

The patch must be applied on the patch which introduced O_CLOEXEC.  It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.

Here's a test program to make sure the code works.  It's so much longer than
the actual patch...

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif

int
main (int argc, char *argv[])
{
  if (argc > 1)
    {
      int fd = atol (argv[1]);
      printf ("child: fd = %d\n", fd);
      if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
        {
          puts ("file descriptor valid in child");
          return 1;
        }
      return 0;

    }

  struct sockaddr_un sun;
  strcpy (sun.sun_path, "./testsocket");
  sun.sun_family = AF_UNIX;

  char databuf[] = "hello";
  struct iovec iov[1];
  iov[0].iov_base = databuf;
  iov[0].iov_len = sizeof (databuf);

  union
  {
    struct cmsghdr hdr;
    char bytes[CMSG_SPACE (sizeof (int))];
  } buf;
  struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                        .msg_control = buf.bytes,
                        .msg_controllen = sizeof (buf) };
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);

  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN (sizeof (int));

  msg.msg_controllen = cmsg->cmsg_len;

  pid_t child = fork ();
  if (child == -1)
    error (1, errno, "fork");
  if (child == 0)
    {
      int sock = socket (PF_UNIX, SOCK_STREAM, 0);
      if (sock < 0)
        error (1, errno, "socket");

      if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
        error (1, errno, "bind");
      if (listen (sock, SOMAXCONN) < 0)
        error (1, errno, "listen");

      int conn = accept (sock, NULL, NULL);
      if (conn == -1)
        error (1, errno, "accept");

      *(int *) CMSG_DATA (cmsg) = sock;
      if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
        error (1, errno, "sendmsg");

      return 0;
    }

  /* For a test suite this should be more robust like a
     barrier in shared memory.  */
  sleep (1);

  int sock = socket (PF_UNIX, SOCK_STREAM, 0);
  if (sock < 0)
    error (1, errno, "socket");

  if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
    error (1, errno, "connect");
  unlink (sun.sun_path);

  *(int *) CMSG_DATA (cmsg) = -1;

  if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
    error (1, errno, "recvmsg");

  int fd = *(int *) CMSG_DATA (cmsg);
  if (fd == -1)
    error (1, 0, "no descriptor received");

  char fdname[20];
  snprintf (fdname, sizeof (fdname), "%d", fd);
  execl ("/proc/self/exe", argv[0], fdname, NULL);
  puts ("execl failed");
  return 1;
}

[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: NUlrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a19542e

15 7月, 2007 4 次提交

[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices. · 6460d948

由 Michael Chan 提交于 7月 14, 2007

Add ethtool utility function to set or clear IPV6_CSUM feature flag.
Modify tg3.c and bnx2.c to use this function when doing ethtool -K
to change tx checksum.
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6460d948

[NET]: Add macvlan driver · b863ceb7

由 Patrick McHardy 提交于 7月 14, 2007

Add macvlan driver, which allows to create virtual ethernet devices
based on MAC address.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b863ceb7

[NET]: dev_mcast: add multicast list synchronization helpers · a0a400d7

由 Patrick McHardy 提交于 7月 14, 2007

The method drivers currently use to synchronize multicast lists is not
very pretty:

- walk the multicast list
- search each entry on a copy of the previous list
- if new add to lower device
- walk the copy of the previous list
- search each entry on the current list
- if removed delete from lower device
- copy entire list

This patch adds a new field to struct dev_addr_list to store the
synchronization state and adds two helper functions for synchronization
and cleanup.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0a400d7

[NET]: Add net_device change_rx_mode callback · 24023451

由 Patrick McHardy 提交于 7月 14, 2007

Currently the set_multicast_list (and set_rx_mode) callbacks are
responsible for configuring the device according to the IFF_PROMISC,
IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in
case of set_rx_mode).

These callbacks can be invoked from BH context without the rtnl_mutex
by dev_mc_add/dev_mc_delete, which makes reading the device flags and
promiscous/allmulti count racy. For real hardware drivers that just
commit all changes to the hardware this is not a real problem since
the stack guarantees to call them for every change, so at least the
final call will not race and commit the correct configuration to the
hardware.

For software devices that want to synchronize promiscous and multicast
state to an underlying device however this can cause corruption of the
underlying device's flags or promisc/allmulti counts.

When the software device is concurrently put in promiscous or allmulti
mode while set_multicast_list is invoked from bottem half context, the
device might synchronize the change to the underlying device without
holding the rtnl_mutex, which races with concurrent changes to the
underlying device.

Add a dev->change_rx_flags hook that is invoked when any of the flags
that affect rx filtering change (under the rtnl_mutex), which allows
drivers to perform synchronization immediately and only synchronize
the address lists in set_multicast_list/set_rx_mode.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24023451

13 7月, 2007 1 次提交

dmaengine: make clients responsible for managing channels · d379b01e

由 Dan Williams 提交于 7月 09, 2007

The current implementation assumes that a channel will only be used by one
client at a time.  In order to enable channel sharing the dmaengine core is
changed to a model where clients subscribe to channel-available-events.
Instead of tracking how many channels a client wants and how many it has
received the core just broadcasts the available channels and lets the
clients optionally take a reference.  The core learns about the clients'
needs at dma_event_callback time.

In support of multiple operation types, clients can specify a capability
mask to only be notified of channels that satisfy a certain set of
capabilities.

Changelog:
* removed DMA_TX_ARRAY_INIT, no longer needed
* dma_client_chan_free -> dma_chan_release: switch to global reference
  counting only at device unregistration time, before it was also happening
  at client unregistration time
* clients now return dma_state_client to dmaengine (ack, dup, nak)
* checkpatch.pl fixes
* fixup merge with git-ioat

Cc: Chris Leech <christopher.leech@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>

d379b01e

12 7月, 2007 3 次提交

[RTNETLINK]: rtnl_link: allow specifying initial device address · 0e06877c

由 Patrick McHardy 提交于 7月 11, 2007

Drivers need to validate the initial addresses in their netlink attribute
validation function or manually reject them if they can't support this.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e06877c

[RTNETLINK]: rtnl_link API simplification · 2d85cba2

由 Patrick McHardy 提交于 7月 11, 2007

All drivers need to unregister their devices in the module unload function.
While doing so they must hold the rtnl and atomically unregister the
rtnl_link ops as well. This makes the rtnl_link_unregister function that
takes the rtnl itself completely useless.

Provide default newlink/dellink functions, make __rtnl_link_unregister and
rtnl_link_unregister unregister all devices with matching rtnl_link_ops and
change the existing users to take advantage of that.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d85cba2

[NET]: Fix races in net_rx_action vs netpoll. · 29578624

由 Olaf Kirch 提交于 7月 11, 2007

Keep netpoll/poll_napi from messing with the poll_list.
Only net_rx_action is allowed to manipulate the list.
Signed-off-by: NOlaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29578624

11 7月, 2007 21 次提交

[NET]: Fix gen_estimator timer removal race · 6b25d30b

由 Patrick McHardy 提交于 7月 09, 2007

As noticed by Jarek Poplawski <jarkao2@o2.pl>, the timer removal in
gen_kill_estimator races with the timer function rearming the timer.

Check whether the timer list is empty before rearming the timer
in the timer function to fix this.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Acked-by: NJarek Poplawski <jarkao2@o2.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b25d30b

[NETPOLL]: Fix a leak-n-bug in netpoll_cleanup() · 1498b3f1

由 Satyam Sharma 提交于 7月 09, 2007

93ec2c72 applied excessive duct tape to
the netpoll beast's netpoll_cleanup(), thus substituting one leak with
another, and opening up a little buglet :-)

net_device->npinfo (netpoll_info) is a shared and refcounted object and
cannot simply be set NULL the first time netpoll_cleanup() is called.
Otherwise, further netpoll_cleanup()'s see np->dev->npinfo == NULL and
become no-ops, thus leaking. And it's a bug too: the first call to
netpoll_cleanup() would thus (annoyingly) "disable" other (still alive)
netpolls too. Maybe nobody noticed this because netconsole (only user
of netpoll) never supported multiple netpoll objects earlier.

This is a trivial and obvious one-line fixlet.
Signed-off-by: NSatyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1498b3f1

[NET]: "wrong timeout value in sk_wait_data()": cleanups · 6f11df83

由 Andrew Morton 提交于 7月 09, 2007

- save 4 bytes

- it's read-mostly.
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NVasily Averin <vvs@sw.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f11df83

[NET]: Make some network-related proc files use seq_list_xxx helpers · 60f0438a

由 Pavel Emelianov 提交于 7月 09, 2007

This includes /proc/net/protocols, /proc/net/rxrpc_calls and
/proc/net/rxrpc_connections files.

All three need seq_list_start_head to show some header.
Signed-off-by: NPavel Emelianov <xemul@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60f0438a

[NETFILTER]: x_tables: add TRACE target · ba9dda3a

由 Jozsef Kadlecsik 提交于 7月 07, 2007

The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: NPatrick NcHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba9dda3a

[PKTGEN]: IPSEC support · a553e4a6

由 Jamal Hadi Salim 提交于 7月 02, 2007

Added transport mode ESP support for starters.  I will send more of
these modes and types once i have resolved the tunnel mode isses.
Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a553e4a6

[PKTGEN]: Introduce sequential flows · 007a531b

由 Jamal Hadi Salim 提交于 7月 02, 2007

By default all flows in pktgen are randomly selected.
This patch introduces ability to have all defined flows to
be sent sequentially. Robert defined randomness to be the
default behavior.
Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007a531b

[PKTGEN]: Centralize packet overhead tracking · 16dab72f

由 Jamal Hadi Salim 提交于 7月 02, 2007

Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc
Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16dab72f

[NET]: Fix secondary unicast/multicast address count maintenance · 61cbc2fc

由 Patrick McHardy 提交于 6月 30, 2007

When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61cbc2fc

[CORE] Stack changes to add multiqueue hardware support API · f25f4e44

由 Peter P Waskiewicz Jr 提交于 7月 06, 2007

Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f25f4e44

[NET]: Fix TX checksum feature check · a298830c

由 Herbert Xu 提交于 6月 28, 2007

This patch fixes a boolean error in the new TX checksum check
that causes bogus TSO packets to be generated.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a298830c

[NET]: dev: secondary unicast address support · 4417da66

由 Patrick McHardy 提交于 6月 27, 2007

Add support for configuring secondary unicast addresses on network
devices. To support this devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscous
mode when secondary unicast addresses are present.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4417da66

[NET]: dev_mcast: switch to generic net_device address lists · 3fba5a8b

由 Patrick McHardy 提交于 6月 27, 2007

Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fba5a8b

[NET]: dev: introduce generic net_device address lists · bf742482

由 Patrick McHardy 提交于 6月 27, 2007

Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf742482

[NET]: dev_mcast: unexport dev_mc_upload · 75ebe8f7

由 Patrick McHardy 提交于 6月 27, 2007

dev_mc_add/dev_mc_delete take care of uploading the list when
necessary and thats the only interface other code should use.
Also remove two incorrect calls in DECnet.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75ebe8f7

[NET]: IPV6 checksum offloading in network devices · d212f87b

由 Stephen Hemminger 提交于 6月 27, 2007

The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d212f87b

[RTNETLINK]: Fix rtnetlink compat attribute patch · 2371baa4

由 Patrick McHardy 提交于 6月 26, 2007

Sent the wrong patch previously.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2371baa4

[RTNETLINK]: Add nested compat attribute · afdc3238

由 Patrick McHardy 提交于 6月 25, 2007

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
        [ compat contents ]
        struct rtattr {
                .rta_len        = total size,
                .rta_type       = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
                .rta_len        = nest size,
                .rta_type       = type,
        } nest_attr;

        [ optional 0 .. n nested attributes ]
        struct rtattr {
                .rta_len        = private attribute len,
                .rta_type       = private attribute typ,
        } nested_attr;
        struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected old versions will just parse the compat part and
ignore the rest.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afdc3238

[SKBUFF]: Keep track of writable header len of headerless clones · 334a8132

由 Patrick McHardy 提交于 6月 25, 2007

Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases its not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP release the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT:	sys     0m0.388s
- sendfile eth0, NAT:		sys     0m1.835s
- sendfile eth0: NAT + path:	sys     0m0.370s	(~ -80%)

- sendfile lo, no NAT:		sys     0m0.258s
- sendfile lo, NAT:		sys     0m2.609s
- sendfile lo, NAT + patch:	sys     0m0.260s	(~ -90%)

- sendmsg eth0, no NAT:		sys     0m2.508s
- sendmsg eth0, NAT:		sys     0m2.539s
- sendmsg eth0, NAT + patch:	sys     0m2.445s	(no change)

- sendmsg lo, no NAT:		sys	0m2.151s
- sendmsg lo, NAT:		sys     0m3.557s
- sendmsg lo, NAT + patch:	sys     0m2.159s	(~ -40%)

I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

334a8132

[RTNETLINK]: Link creation API · 38f7b870

由 Patrick McHardy 提交于 6月 13, 2007

Add rtnetlink API for creating, changing and deleting software devices.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38f7b870

[RTNETLINK]: Split up rtnl_setlink · 0157f60c

由 Patrick McHardy 提交于 6月 13, 2007

Split up rtnl_setlink into a function performing validation and a function
performing the actual changes. This allows to share the modifcation logic
with rtnl_newlink, which is introduced by the next patch.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0157f60c

06 7月, 2007 3 次提交

[NETPOLL]: Fixups for 'fix soft lockup when removing module' · 25442caf

由 Jarek Poplawski 提交于 7月 05, 2007

>From my recent patch:

> >    #1
> >    Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
> >    required a work function should always (unconditionally) rearm with
> >    delay > 0 - otherwise it would endlessly loop. This patch replaces
> >    this function with cancel_delayed_work(). Later kernel versions don't
> >    require this, so here it's only for uniformity.

But Oleg Nesterov <oleg@tv-sign.ru> found:

> But 2.6.22 doesn't need this change, why it was merged?
> 
> In fact, I suspect this change adds a race,
...

His description was right (thanks), so this patch reverts #1.
Signed-off-by: NJarek Poplawski <jarkao2@o2.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25442caf

[NET]: net/core/netevent.c should #include <net/netevent.h> · 94b83419

由 Adrian Bunk 提交于 7月 05, 2007

Every file should include the headers containing the prototypes for
its global functions.
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94b83419

[NET] skbuff: remove export of static symbol · 2cd052e4

由 Johannes Berg 提交于 7月 05, 2007

skb_clone_fraglist is static so it shouldn't be exported.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2cd052e4

29 6月, 2007 1 次提交

[NETPOLL] netconsole: fix soft lockup when removing module · 17200811

由 Jarek Poplawski 提交于 6月 28, 2007

#1
Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
required a work function should always (unconditionally) rearm with
delay > 0 - otherwise it would endlessly loop. This patch replaces
this function with cancel_delayed_work(). Later kernel versions don't
require this, so here it's only for uniformity.

#2
After deleting a timer in cancel_[rearming_]delayed_work() there could
stay a last skb queued in npinfo->txq causing a memory leak after
kfree(npinfo).

Initial patch & testing by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: NJarek Poplawski <jarkao2@o2.pl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17200811

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多