提交 · 906e073f3e842877b59d669b25aa76f65ba775b3 · openeuler / Kernel

23 1月, 2014 38 次提交

net/ipv4: queue work on power efficient wq · 906e073f

由 viresh kumar 提交于 1月 22, 2014

Workqueue used in ipv4 layer have no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with power efficient versions. This
doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
enabled.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

906e073f

ipv6: enable anycast addresses as source addresses for datagrams · 7c90cc2d

由 FX Le Bail 提交于 1月 22, 2014

This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global.
- Uses it in ip6_datagram_send_ctl().
Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c90cc2d

F
net: update comments of "struct msghdr" with the more accurate RFC3542 ones · eb97768a
由 FX Le Bail 提交于 1月 22, 2014
```
Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
eb97768a

tuntap: Fix for a race in accessing numqueues · fa35864e

由 Dominic Curran 提交于 1月 22, 2014

A patch for fixing a race between queue selection and changing queues
was introduced in commit 92bb73ea("tuntap: fix a possible race between
queue selection and changing queues").

The fix was to prevent the driver from re-reading the tun->numqueues
more than once within tun_select_queue() using ACCESS_ONCE().

We have been experiancing 'Divide-by-zero' errors in tun_net_xmit()
since we moved from 3.6 to 3.10, and believe that they come from a
simular source where the value of tun->numqueues changes to zero
between the first and a subsequent read of tun->numqueues.

The fix is a simular use of ACCESS_ONCE(), as well as a multiply
instead of a divide in the if statement.
Signed-off-by: NDominic Curran <dominic.curran@citrix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMax Krasnyansky <maxk@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa35864e

bridge: Remove unnecessary vlan_put_tag in br_handle_vlan · bdf4351b

由 Toshiaki Makita 提交于 1月 22, 2014

br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing
port is the bridge device.
This is unnecessary because __netif_receive_skb_core() can handle skbs
with HW accelerated vlan tag. In current implementation,
__netif_receive_skb_core() needs to extract the vlan tag embedded in skb
data. This could cause low network performance especially when receiving
frames at a high frame rate on the bridge device.
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdf4351b

tcp: metrics: Fix rcu-race when deleting multiple entries · 00ca9c5b

由 Christoph Paasch 提交于 1月 21, 2014

In bbf852b9 I introduced the tmlist, which allows to delete
multiple entries from the cache that match a specified destination if no
source-IP is specified.

However, as the cache is an RCU-list, we should not create this tmlist, as
it will change the tcpm_next pointer of the element that will be deleted
and so a thread iterating over the cache's entries while holding the
RCU-lock might get "redirected" to this tmlist.

This patch fixes this, by reverting back to the old behavior prior to
bbf852b9, which means that we simply change the tcpm_next
pointer of the previous element (pp) to jump over the one we are
deleting.
The difference is that we call kfree_rcu() directly on the cache entry,
which allows us to delete multiple entries from the list.

Fixes: bbf852b9 (tcp: metrics: Delete all entries matching a certain destination)
Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00ca9c5b

Merge branch 'bonding' · 7705b104

由 David S. Miller 提交于 1月 22, 2014

Ding Tianhong says:

====================
bonding: fix primary problem for bonding

If the slave's name changed, and the bond params primary is exist,
the bond should deal with the situation in two ways:

1) If the slave was the primary slave yet, clean the primary slave
   and reselect active slave.
2) If the slave's new name is as same as bond primary, set the slave
   as primary slave and reselect active slave.

If the new primary is not matching any slave in the bond, the bond should
record it to params, clean the primary slave and select a new active slave.

Update bonding.txt for primary description.

v2.1->v1: Because there are too many indentions and useless verification, so rewrite
	  the logic for updating the primary slave.
	  Modify some comments for to clean the typos.

v3->v2.1: Veaceslav disagree the first patch and modify the logic for it
	  (bonding: update the primary slave when changing slave's name)
	  and resend it himself (bonding: handle slave's name change with primary_slave logic),
	  so remove the first patch and send the last two patches.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7705b104

bonding: update bonding.txt for primary description · e1d206a7

由 dingtianhong 提交于 1月 18, 2014

Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1d206a7

bonding: clean the primary slave if there is no slave matching new primary · c59ab673

由 dingtianhong 提交于 1月 18, 2014

If the new primay is not matching any slave in the bond, the bond should
record it to params, clean the primary slave and select a new active slave.
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c59ab673

net: vxlan: convert to act as a pernet subsystem · 783c1463

由 Daniel Borkmann 提交于 1月 22, 2014

As per suggestion from Eric W. Biederman, vxlan should be using
{un,}register_pernet_subsys() instead of {un,}register_pernet_device()
to ensure the vxlan_net structure is initialized before and cleaned
up after all network devices in a given network namespace i.e. when
dealing with network notifiers. This is similarly handeled already in
commit 91e2ff35 ("net: Teach vlans to cleanup as a pernet subsystem")
and, thus, improves upon fd27e0d4 ("net: vxlan: do not use vxlan_net
before checking event type"). Just as in 91e2ff35, we do not need
to explicitly handle deletion of vxlan devices as network namespace
exit calls dellink on all remaining virtual devices, and
rtnl_link_unregister() calls dellink on all outstanding devices in that
network namespace, so we can entirely drop the pernet exit operation
as well. Moreover, on vxlan module exit, rcu_barrier() is called by
netns since commit 3a765eda ("netns: Add an explicit rcu_barrier
to unregister_pernet_{device|subsys}"), so this may be omitted. Tested
with various scenarios and works well on my side.
Suggested-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

783c1463

sch_htb: let skb->priority refer to non-leaf class · 29824310

由 Harry Mason 提交于 1月 17, 2014

If the class in skb->priority is not a leaf, apply filters from the
selected class, not the qdisc. This lets netfilter or user space
partially classify the packet.
Signed-off-by: NHarry Mason <harry.mason@smoothwall.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29824310

af_packet: Add Queue mapping mode to af_packet fanout operation · 2d36097d

由 Neil Horman 提交于 1月 22, 2014

This patch adds a queue mapping mode to the fanout operation of af_packet
sockets.  This allows user space af_packet users to better filter on flows
ingressing and egressing via a specific hardware queue, and avoids the potential
packet reordering that can occur when FANOUT_CPU is being used and irq affinity
varies.

Tested successfully by myself.  applies to net-next
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d36097d

Merge branch 'bonding_option_api' · b414ac9a

由 David S. Miller 提交于 1月 22, 2014

Nikolay Aleksandrov says:

====================
bonding: introduce new option API

This patchset's goal is to introduce a new option API which should be used
to properly describe the bonding options with their mode dependcies and
requirements. With this patchset applied we get centralized option
manipulation, automatic RTNL acquire per option setting, automatic option
range checking, mode dependcy checking and other various flags which are
described in detail in patch 01's commit message and comments.
Also the parameter passing is changed to use a specialized structure which
is initialized to a value depending on the needs.
The main exported functions are:
 __bond_opt_set() - set an option (RTNL should be acquired prior)
 bond_opt_init(val|str) - init a bond_opt_value struct for value or string
                          parameter passing
 bond_opt_tryset_rtnl() - function which tries to acquire rtnl, mainly used
                          for sysfs
 bond_opt_parse - used to parse or check for valid values
 bond_opt_get - retrieve a pointer to bond_option struct for some option
 bond_opt_get_val - retrieve a pointer to a bond_opt_value struct for
                    some value

The same functions are used to set an option via sysfs and netlink, just
the parameter that's passed is usually initialized in a different way.
The converted options have multiple style fixes, there're some longer
lines but they looked either ugly or were strings/pr_warnings, if you
think some line would be better broken just let me know :-) there're
also a few sscanf false-positive warnings.
I decided to keep the "unsuppmodes" way of mode dep checking since it's
straight forward, if we make a more general way for checking dependencies
it'll be easy to change it.

Future plans for this work include:
 - Automatic sysfs generation from the bond_opts[].
 - Use of the API in bond_check_params() and thus cleaning it up (this has
   actually started, I'll take care of the rest in a separate patch)
 - Clean up all option-unrelated files of option definitions and functions

I've tried to leave as much documentation as possible, if there's anything
unclear please let me know. One more thing, I haven't moved all
option-related functions from bonding.h to the new bond_options.h, this
will be done in a separate patch, it's in my todo list.

This patchset has been tested by setting each converted option via sysfs
and netlink to a couple of wrong values, a couple of correct values and
some random values, also for the opts that have flags they have been
tested as well.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b414ac9a

bonding: convert slaves to use the new option API · 0e2e5b66

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so slaves would use
the new bonding option API. Also move the option to its own set function
in bond_options.c and fix some style errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e2e5b66

bonding: convert lp_interval to use the new option API · 4325b374

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so lp_interval would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4325b374

bonding: convert resend_igmp to use the new option API · 105c8fb6

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so resend_igmp would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

105c8fb6

bonding: convert all_slaves_active to use the new option API · 3df01162

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so all_slaves_active would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3df01162

bonding: convert queue_id to use the new option API · 24089ba1

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so queue_id would use
the new bonding option API. Also move it to its own set function in
bond_options.c and fix some style errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24089ba1

bonding: convert active_slave to use the new option API · d1fbd3ed

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1fbd3ed

bonding: convert use_carrier to use the new option API · 0fff0608

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so use_carrier would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fff0608

bonding: convert primary_reselect to use the new option API · 388d3a6d

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so primary_reselect would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

388d3a6d

bonding: convert primary to use the new option API · 180222f0

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so primary would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

180222f0

bonding: convert miimon to use the new option API · b98d9c66

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so miimon would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b98d9c66

bonding: convert num_peer_notif to use the new option API · ef56becb

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so num_peer_notif would use
the new bonding option API.
When the auto-sysfs generation is done an alias should be added for
this option as there're currently 2 entries in sysfs for it.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef56becb

bonding: convert ad_select to use the new option API · 9e5f5eeb

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so ad_select would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e5f5eeb

bonding: convert min_links to use the new option API · 633ddc9e

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so min_links would use
the new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

633ddc9e

bonding: convert lacp_rate to use the new option API · d3131de7

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so lacp_rate would use
the new bonding option API. Also some trivial/style error fixes.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3131de7

bonding: convert updelay to use the new option API · e4994612

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so updelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4994612

bonding: convert downdelay to use the new option API · 25a9b54a

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so downdelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25a9b54a

bonding: convert arp_ip_target to use the new option API · 4fb0ef58

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so arp_ip_target would use
the new bonding option API. This option is an exception because of
the way it's currently implemented that's why its netlink code is
a bit different from the other options to keep the functionality as
before and at the same time to have a single set function.

This patch also fixes a few stylistic errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fb0ef58

bonding: convert arp_interval to use the new option API · 7bdb04ed

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so arp_interval would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bdb04ed

bonding: convert fail_over_mac to use the new option API · 1df6b6aa

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1df6b6aa

bonding: convert arp_all_targets to use the new option API · edf36b24

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edf36b24

bonding: convert arp_validate to use the new option API · 16228881

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16228881

bonding: convert xmit_hash_policy to use the new option API · a4b32ce7

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4b32ce7

bonding: convert packets_per_slave to use the new option API · aa59d851

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa59d851

bonding: convert mode setting to use the new option API · 2b3798d5

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b3798d5

bonding: add infrastructure for an option API · 09117362

由 Nikolay Aleksandrov 提交于 1月 22, 2014

This patch adds the necessary basic infrastructure to support
centralized and unified option manipulation API for the bonding. The new
structure bond_option will be used to describe each option with its
dependencies on modes which will be checked automatically thus removing a
lot of duplicated code. Also automatic range checking is added for
some options. Currently the option setting function requires RTNL to
be acquired prior to calling it, since many options already rely on RTNL
it seemed like the best choice to protect all against common race
conditions.
In order to add an option the following steps need to be done:
1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id
   and a bit corresponding to the id
2. Add a bond_option entry to the bond_opts[] array in bond_options.c which
   describes the option, its dependencies and its manipulation function
3. Add code to export the option through sysfs and/or as a module parameter
   (the sysfs export will be made automatically in the future)

The options can have different flags set, currently the following are
supported:
BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior
                        to setting the option
BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to
                      setting the option
BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the
                      option to parse

There's a new value structure to describe different types of values
which can have the following flags:
BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias
                       to this option is "default")
BOND_VALFLAG_MIN - the minimum value that this option can have
BOND_VALFLAG_MAX - the maximum value that this option can have

An example would be nice here, so if we have an option which can have
the values "off"(2), "special"(4, default) and supports a range, say
16 - 32, it should be defined as follows:
"off", 2,
"special", 4, BOND_VALFLAG_DEFAULT,
"rangemin", 16, BOND_VALFLAG_MIN,
"rangemax", 32, BOND_VALFLAG_MAX
So we have the valid intervals: [2, 2], [4, 4], [16, 32]
Also the valid strings: "off" = 2, "special" and "default" = 4
                        "rangemin" = 16, "rangemax" = 32

BOND_VALFLAG_(MIN|MAX) can be used to specify a valid range for an
option, if MIN is omitted then 0 is considered as a minimum. If an
exact match is found in the values[] table it will be returned,
otherwise the range is tried (if available).

The option parameter passing is done by using a special structure called
bond_opt_value which can take either a string or a value to parse. One
of the bond_opt_init(val|str) macros should be used depending on which
one does the user want to parse (string or value). Then a call to
__bond_opt_set should be done under RTNL.
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09117362

22 1月, 2014 2 次提交

Merge branch 'reciprocal' · 374d1125

由 David S. Miller 提交于 1月 21, 2014

Hannes Frederic Sowa says:

====================
reciprocal_divide update

This patch is on top of aee636c4 ("bpf: do not use reciprocal
divide") from Eric that sits in net tree. It will not create a merge
conflict, but it depends on this one, so we suggest, if possible, to
merge net into net-next.

We are proposing this change with only small modifications from the
v2 version, namely updating the name of trim to reciprocal_scale
(as commented on by Ben Hutchings and Eric Dumazet, thanks!).

We thought about introducing the reciprocal_divide algorithm in
parallel to the one already used by the kernel but faced organizational
issues, leading us to the conclusion that it is best to just replace
the old one: We could not come up with names for the different
implementations and also with a way to describe the differences to
guide developers which one to choose in which situation. This is
because we cannot specify the correct semantics for the version
which is currently used by the kernel. Altough it seems to not be
causing problems in the kernel, we cannot surely say so in the
case of flex_array for the future. Current usage seems ok, but
future users could run into problems.

Changelog:

v1->v2:
 - changed name to prandom_u32_max in p1
 - changed name to trim in p2
 - reworked code in p3
v2->v3:
 - p1 and p3 stays unchanged, only small update in commit
   message in p3
 - changed name to reciprocal_scale in p2
 - fixed kernel doc format
v3->v4:
 - pseduo -> pseudo (thanks to Tilman Schmidt)
v4->v5:
 - fix pseduo -> pseudo for real now, sorry for the noise
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

374d1125

reciprocal_divide: update/correction of the algorithm · 809fa972

由 Hannes Frederic Sowa 提交于 1月 22, 2014

Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a95 resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zipReported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

809fa972

openeuler / Kernel 11 个月 前同步成功

openeuler / Kernel
11 个月前同步成功