提交 · 8571a19c4ac140f1a507f3e7eb716892afa27109 · OpenHarmony / kernel_linux

28 1月, 2011 3 次提交

由 Eric Dumazet 提交于 1月 26, 2011

Commit c6d14c84 (net: Introduce for_each_netdev_rcu() iterator)
added a race in dev_seq_next().

The rcu_dereference() call should be done _before_ testing the end of
list, or we might return a wrong net_device if a concurrent thread
changes net_device list under us.

Note : discovered thanks to a sparse warning :

net/core/dev.c:3919:9: error: incompatible types in comparison expression
(different address spaces)
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ccf43438

inetpeer: Mark metrics as "new" in fresh inetpeer entries. · 144001bd

由 David S. Miller 提交于 1月 27, 2011

Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

144001bd

D
inetpeer: Add metrics storage to inetpeer entries. · 60659823
由 David S. Miller 提交于 1月 26, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
60659823

27 1月, 2011 1 次提交

net: Implement read-only protection and COW'ing of metrics. · 62fa8a84

由 David S. Miller 提交于 1月 26, 2011

Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62fa8a84

25 1月, 2011 4 次提交

net: reduce and unify printk level in netdev_fix_features() · acd1130e

由 Michał Mirosław 提交于 1月 24, 2011

Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acd1130e

net: change netdev->features to u32 · 04ed3e74

由 Michał Mirosław 提交于 1月 24, 2011

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04ed3e74

net: RPS: Enable hardware acceleration of RFS · c445477d

由 Ben Hutchings 提交于 1月 19, 2011

Allow drivers for multiqueue hardware with flow filter tables to
accelerate RFS.  The driver must:

1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
IRQs (in queue order).  This will provide a mapping from CPUs to the
queues for which completions are handled nearest to them.

2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
or replaces a filter steering the given flow to the given RX queue, if
possible.

3. Periodically remove filters for which rps_may_expire_flow() returns
true.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c445477d

lib: cpu_rmap: CPU affinity reverse-mapping · c39649c3

由 Ben Hutchings 提交于 1月 19, 2011

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
library functions to support a generic reverse-mapping from CPUs to
objects with affinity and the specific case where the objects are
IRQs.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c39649c3

24 1月, 2011 5 次提交

Remove MAYBE_BUILD_BUG_ON · 1765e3a4

由 Rusty Russell 提交于 1月 24, 2011

Now BUILD_BUG_ON() can handle optimizable constants, we don't need
MAYBE_BUILD_BUG_ON any more.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

1765e3a4

BUILD_BUG_ON: make it handle more cases · 7ef88ad5

由 Rusty Russell 提交于 1月 24, 2011

BUILD_BUG_ON used to use the optimizer to do code elimination or fail
at link time; it was changed to first the size of a negative array (a
nicer compile time error), then (in
8c87df45) to a bitfield.

This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON();
as Jan points out in that commit, it didn't work as intended anyway.

bitfields: needs a literal constant at parse time, and can't be put under
	"if (__builtin_constant_p(x))" for example.
negative array: can handle anything, but if the compiler can't tell it's
	a constant, silently has no effect.
link time: breaks link if the compiler can't determine the value, but the
	linker output is not usually as informative as a compiler error.

If we use the negative-array-size method *and* the link time trick,
we get the ability to use BUILD_BUG_ON() under __builtin_constant_p()
branches, and maximal ability for the compiler to detect errors at
build time.

We also document it thoroughly.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Jan Beulich <JBeulich@novell.com>
Acked-by: NHollis Blanchard <hollisb@us.ibm.com>

7ef88ad5

param: add null statement to compiled-in module params · b75be420

由 Linus Walleij 提交于 1月 05, 2011

Add an unused struct declaration statement requiring a
terminating semicolon to the compile-in case to provoke an
error if __MODULE_INFO() is used without the terminating
semicolon. Previously MODULE_ALIAS("foo") (no semicolon)
compiled fine if MODULE was not selected.

Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: NLinus Walleij <linus.walleij@stericsson.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

b75be420

module: fix linker error for MODULE_VERSION when !MODULE and CONFIG_SYSFS=n · 3b90a5b2

由 Rusty Russell 提交于 1月 24, 2011

lib/built-in.o:(__modver+0x8): undefined reference to `__modver_version_show'
lib/built-in.o:(__modver+0x2c): undefined reference to `__modver_version_show'

Simplest to just not emit anything: if they've disabled SYSFS they probably
want the smallest kernel possible.
Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

3b90a5b2

module: show version information for built-in modules in sysfs · e94965ed

由 Dmitry Torokhov 提交于 12月 15, 2010

Currently only drivers that are built as modules have their versions
shown in /sys/module/<module_name>/version, but this information might
also be useful for built-in drivers as well. This especially important
for drivers that do not define any parameters - such drivers, if
built-in, are completely invisible from userspace.

This patch changes MODULE_VERSION() macro so that in case when we are
compiling built-in module, version information is stored in a separate
section. Kernel then uses this data to create 'version' sysfs attribute
in the same fashion it creates attributes for module parameters.
Signed-off-by: NDmitry Torokhov <dtor@vmware.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

e94965ed

23 1月, 2011 1 次提交

genirq: Add IRQ affinity notifiers · cd7eab44

由 Ben Hutchings 提交于 1月 19, 2011

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Cc: linux-net-drivers@solarflare.com
Cc: Tom Herbert <therbert@google.com>
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1295470904.11126.84.camel@bwh-desktop>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

cd7eab44

22 1月, 2011 4 次提交

cfg80211: Extend channel to frequency mapping for 802.11j · 59eb21a6

由 Bruno Randolf 提交于 1月 17, 2011

Extend channel to frequency mapping for 802.11j Japan 4.9GHz band, according to
IEEE802.11 section 17.3.8.3.2 and Annex J. Because there are now overlapping
channel numbers in the 2GHz and 5GHz band we can't map from channel to
frequency without knowing the band. This is no problem as in most contexts we
know the band. In places where we don't know the band (and WEXT compatibility)
we assume the 2GHz band for channels below 14.

This patch does not implement all channel to frequency mappings defined in
802.11, it's just an extension for 802.11j 20MHz channels. 5MHz and 10MHz
channels as well as 802.11y channels have been omitted.

The following drivers have been updated to reflect the API changes:
iwl-3945, iwl-agn, iwmc3200wifi, libertas, mwl8k, rt2x00, wl1251, wl12xx.
The drivers have been compile-tested only.
Signed-off-by: NBruno Randolf <br1@einfach.org>
Signed-off-by: NBrian Prodoehl <bprodoehl@gmail.com>
Acked-by: NLuciano Coelho <coelho@ti.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

59eb21a6

mm: System without MMU do not need pte_mkwrite · 3dece370

由 Michal Simek 提交于 1月 21, 2011

The patch "thp: export maybe_mkwrite" (commit 14fd403f) breaks
systems without MMU.

Error log:

    CC      arch/microblaze/mm/init.o
  In file included from include/linux/mman.h:14,
                   from arch/microblaze/mm/consistent.c:24:
  include/linux/mm.h: In function 'maybe_mkwrite':
  include/linux/mm.h:482: error: implicit declaration of function 'pte_mkwrite'
  include/linux/mm.h:482: error: incompatible types in assignment
Signed-off-by: NMichal Simek <monstr@monstr.eu>
CC: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dece370

RTC: Propagate error handling via rtc_timer_enqueue properly · aa0be0f4

由 John Stultz 提交于 1月 20, 2011

In cases where RTC hardware does not support alarms, the virtualized
RTC interfaces did not have a way to propagate the error up to userland.

This patch extends rtc_timer_enqueue so it catches errors from the hardware
and returns them upwards to the virtualized interfaces. To simplify error
handling, it also internalizes the management of the timer->enabled bit
into rtc_timer_enqueue and rtc_timer_remove.

Also makes rtc_timer_enqueue and rtc_timer_remove static.
Reported-by: NDavid Daney <ddaney@caviumnetworks.com>
Reported-by: NAndreas Schwab <schwab@linux-m68k.org>
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Diagnosed-by: NDavid Daney <ddaney@caviumnetworks.com>
Tested-by: NDavid Daney <ddaney@caviumnetworks.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
LKML-Reference: <1295565973-14358-1-git-send-email-john.stultz@linaro.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

aa0be0f4

rtc: Cleanup removed UIE emulation declaration · 1daeddd5

由 John Stultz 提交于 1月 13, 2011

rtc_dev_update_irq_enable_emul was removed in commit
042620a0 (UIE emulation is
now handled via hrtimer), but the declaration was missed.
This patch cleans it up.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
LKML-Reference: <1294939849-20608-1-git-send-email-john.stultz@linaro.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1daeddd5

21 1月, 2011 10 次提交

genirq: Remove __do_IRQ · 1c77ff22

由 Thomas Gleixner 提交于 1月 19, 2011

All architectures are finally converted. Remove the cruft.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>

1c77ff22

net_sched: accurate bytes/packets stats/rates · 9190b3b3

由 Eric Dumazet 提交于 1月 20, 2011

In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9190b3b3

D
net: Add safe reverse SKB queue walkers. · 686a2955
由 David S. Miller 提交于 1月 20, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
686a2955

ACPI: Introduce acpi_os_ioremap() · 2d6d9fd3

由 Rafael J. Wysocki 提交于 1月 19, 2011

Commit ca9b600b ("ACPI / PM: Make suspend_nvs_save() use
acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c
from using different ioremap() variants by making the latter use
acpi_os_map_memory() for mapping the NVS pages.  However, that also
requires acpi_os_unmap_memory() to be used for unmapping them, which
causes synchronize_rcu() to be executed many times in a row
unnecessarily and introduces substantial delays during resume on some
systems.

Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c
introduce acpi_os_ioremap() calling ioremap_cache() and make the code in
both osl.c and nvs.c use it.
Reported-by: NJeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d6d9fd3

memcg: fix USED bit handling at uncharge in THP · ca3e0214

由 KAMEZAWA Hiroyuki 提交于 1月 20, 2011

Now, under THP:

at charge:
  - PageCgroupUsed bit is set to all page_cgroup on a hugepage.
    ....set to 512 pages.
at uncharge
  - PageCgroupUsed bit is unset on the head page.

So, some pages will remain with "Used" bit.

This patch fixes that Used bit is set only to the head page.
Used bits for tail pages will be set at splitting if necessary.

This patch adds this lock order:
   compound_lock() -> page_cgroup_move_lock().

[akpm@linux-foundation.org: fix warning]
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ca3e0214

dccp: clean up unused DCCP_STATE_MASK definition · d18046b3

由 Shan Wei 提交于 1月 19, 2011

Remove unused DCCP_STATE_MASK macro.
Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com>
Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d18046b3

net_sched: RCU conversion of stab · a2da570d

由 Eric Dumazet 提交于 1月 20, 2011

This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.

This shortens the lock's held time in __dev_xmit_skb().

This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
cache misses and so reducing latencies.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2da570d

net_sched: move TCQ_F_THROTTLED flag · fd245a4a

由 Eric Dumazet 提交于 1月 20, 2011

In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.

Introduce helpers to hide implementation details.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd245a4a

netfilter: nf_conntrack: fix linker error with NF_CONNTRACK_TIMESTAMP=n · 2f1e3176

由 Patrick McHardy 提交于 1月 20, 2011

net/built-in.o: In function `nf_conntrack_init_net':
net/netfilter/nf_conntrack_core.c:1521:
	undefined reference to `nf_conntrack_tstamp_init'
net/netfilter/nf_conntrack_core.c:1531:
	undefined reference to `nf_conntrack_tstamp_fini'

Add dummy inline functions for the =n case to fix this.
Reported-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

2f1e3176

netfilter: xtables: add missing header inclusions for headers_check · 06988b06

由 Jan Engelhardt 提交于 1月 20, 2011

Resolve these warnings on `make headers_check`:

usr/include/linux/netfilter/xt_CT.h:7: found __[us]{8,16,32,64} type
without #include <linux/types.h>
...
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>

06988b06

20 1月, 2011 12 次提交

netfilter: xtables: remove duplicate member · ba12b130

由 Jan Engelhardt 提交于 1月 20, 2011

Accidentally missed removing the old out-of-union "inverse" member,
which caused the struct size to change which then gives size mismatch
warnings when using an old iptables.

It is interesting to see that gcc did not warn about this before.
(Filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47376 )
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>

ba12b130

lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6

由 Tejun Heo 提交于 1月 20, 2011

During early boot, local IRQ is disabled until IRQ subsystem is
properly initialized.  During this time, no one should enable
local IRQ and some operations which usually are not allowed with
IRQ disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available.  While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2ce802f6

netfilter: xtables: remove extraneous header that slipped in · 5d844928

由 Jan Engelhardt 提交于 1月 20, 2011

Commit 0b8ad876 (netfilter: xtables: add missing header files to export
list) erroneously added this.
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

5d844928

net_sched: implement a root container qdisc sch_mqprio · b8970f0b

由 John Fastabend 提交于 1月 17, 2011

This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters,

struct tc_mqprio_qopt {
	__u8    num_tc;
	__u8    prio_tc_map[TC_BITMASK + 1];
	__u8    hw;
	__u16   count[TC_MAX_QUEUE];
	__u16   offset[TC_MAX_QUEUE];
};

Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8970f0b

net: implement mechanism for HW based QOS · 4f57c087

由 John Fastabend 提交于 1月 17, 2011

This patch provides a mechanism for lower layer devices to
steer traffic using skb->priority to tx queues. This allows
for hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock. While reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.

Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.

By using select_queue for this drivers need to be updated for
each and every traffic type and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally if admins are expected to build a
qdisc and filter rules to steer traffic this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
Also this approach incurs all the overhead of a qdisc with filters.

With the mechanism in this patch users can set skb priority using
expected methods ie setsockopt() or the stack can set the priority
directly. Then the skb will be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case with
single traffic class and all queues in this class everything
works as is until the LLD enables multiple tcs.

To steer the skb we mask out the lower 4 bits of the priority
and allow the hardware to configure upto 15 distinct classes
of traffic. This is expected to be sufficient for most applications
at any rate it is more then the 8021Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.

This in conjunction with a userspace application such as
lldpad can be used to implement 8021Q transmission selection
algorithms one of these algorithms being the extended transmission
selection algorithm currently being used for DCB.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f57c087

net_device: add support for network device groups · cbda10fa

由 Vlad Dogaru 提交于 1月 13, 2011

Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.
Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org>
Acked-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbda10fa

sctp: user perfect name for Delayed SACK Timer option · 4580ccc0

由 Shan Wei 提交于 1月 18, 2011

The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.

Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK,
for making compatibility with existing applications.

Reference:
8.1.19.  Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
（http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com>
Acked-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4580ccc0

netfilter: xtables: connlimit revision 1 · cc4fc022

由 Jan Engelhardt 提交于 1月 18, 2011

This adds destination address-based selection. The old "inverse"
member is overloaded (memory-wise) with a new "flags" variable,
similar to how J.Park did it with xt_string rev 1. Since revision 0
userspace only sets flag 0x1, no great changes are made to explicitly
test for different revisions.
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>

cc4fc022

Bluetooth: Fix race condition with conn->sec_level · 765c2a96

由 Johan Hedberg 提交于 1月 19, 2011

The conn->sec_level value is supposed to represent the current level of
security that the connection has. However, by assigning to it before
requesting authentication it will have the wrong value during the
authentication procedure. To fix this a pending_sec_level variable is
added which is used to track the desired security level while making
sure that sec_level always represents the current level of security.
Signed-off-by: NJohan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: NGustavo F. Padovan <padovan@profusion.mobi>

765c2a96

mac80211: allow advertising correct maximum aggregate size · 5dd36bc9

由 Johannes Berg 提交于 1月 18, 2011

Currently, mac80211 always advertises that it may send
up to 64 subframes in an aggregate. This is fine, since
it's the max, but might as well be set to zero instead
since it doesn't have any information.

However, drivers might have that information, so allow
them to set a variable giving it, which will then be
used. The default of zero will be fine since to the
peer that means we don't know and it will just use its
own limit for the buffer size.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

5dd36bc9

mac80211: track receiver's aggregation reorder buffer size · 0b01f030

由 Johannes Berg 提交于 1月 18, 2011

The aggregation code currently doesn't implement the
buffer size negotiation. It will always request a max
buffer size (which is fine, if a little pointless, as
the mac80211 code doesn't know and might just use 0
instead), but if the peer requests a smaller size it
isn't possible to honour this request.

In order to fix this, look at the buffer size in the
addBA response frame, keep track of it and pass it to
the driver in the ampdu_action callback when called
with the IEEE80211_AMPDU_TX_OPERATIONAL action. That
way the driver can limit the number of subframes in
aggregates appropriately.

Note that this doesn't fix any drivers apart from the
addition of the new argument -- they all need to be
updated separately to use this variable!
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

0b01f030

mac80211: add hw configuration for max ampdu buffer size · df6ba5d8

由 Luciano Coelho 提交于 1月 12, 2011

Some devices don't support the maximum AMDPU buffer size of 64, so we
need to add an option to configure this in the hardware configuration.
This value will be used in the ADDBA response instead of the value
suggested in the request, if the latter is greater than the max
supported.
Signed-off-by: NLuciano Coelho <coelho@ti.com>
Tested-by: NJuuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

df6ba5d8

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多