1. 23 January 2015, 4 commits
  2. 07 January 2015, 2 commits
  3. 05 January 2015, 1 commit
  4. 17 December 2014, 1 commit
  5. 12 December 2014, 10 commits
  6. 11 December 2014, 12 commits
    • net: sock: fix access via invalid file descriptor · 198bf1b0
      Committed by Alexei Starovoitov
      0day robot reported the following crash:
      [   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
      [   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2
      
      It's due to bpf_prog_get() returning ERR_PTR.
      Check it properly.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
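      For reference, the general fix pattern: bpf_prog_get() reports failure with an
      ERR_PTR()-encoded errno rather than NULL, so the result must be tested with
      IS_ERR() before it is dereferenced. A minimal sketch under that assumption
      (attach_prog() is an illustrative stand-in, not the actual sk_attach_bpf()):

      #include <linux/err.h>
      #include <linux/bpf.h>
      #include <net/sock.h>

      /* Sketch: bpf_prog_get() returns ERR_PTR(-errno) on failure, never NULL,
       * so a NULL check alone is not sufficient.
       */
      static int attach_prog(struct sock *sk, u32 ufd)
      {
              struct bpf_prog *prog = bpf_prog_get(ufd);

              if (IS_ERR(prog))
                      return PTR_ERR(prog);   /* propagate, do not dereference */

              /* ... attach prog to sk here ... */
              return 0;
      }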
    • net: introduce helper macro for_each_cmsghdr · f95b414e
      Committed by Gu Zheng
      Introduce the helper macro for_each_cmsghdr as a wrapper for enumerating
      the cmsghdr entries of a msghdr; this is just a cleanup.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
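      The macro simply wraps the long-standing CMSG_FIRSTHDR()/CMSG_NXTHDR()
      iteration idiom; a sketch along these lines (the scan_cmsgs() caller is
      illustrative, not from the patch):

      #include <linux/socket.h>

      /* Wrapper around the classic control-message walk. */
      #define for_each_cmsghdr(cmsg, msg) \
              for (cmsg = CMSG_FIRSTHDR(msg); \
                   cmsg; \
                   cmsg = CMSG_NXTHDR(msg, cmsg))

      /* Illustrative caller: walk every control message attached to a msghdr. */
      static void scan_cmsgs(struct msghdr *msg)
      {
              struct cmsghdr *cmsg;

              for_each_cmsghdr(cmsg, msg) {
                      if (!CMSG_OK(msg, cmsg))
                              break;
                      /* inspect cmsg->cmsg_level / cmsg->cmsg_type here */
              }
      }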
    • mm: memcontrol: lockless page counters · 3e32cb2e
      Committed by Johannes Weiner
      Memory is internally accounted in bytes, using spinlock-protected 64-bit
      counters, even though the smallest accounting delta is a page.  The
      counter interface is also convoluted and does too many things.
      
      Introduce a new lockless word-sized page counter API, then change all
      memory accounting over to it.  The translation from and to bytes then only
      happens when interfacing with userspace.
      
      The removed locking overhead is noticeable when scaling beyond the per-cpu
      charge caches - on a 4-socket machine with 144 threads, the following test
      shows the performance differences of 288 memcgs concurrently running a
      page fault benchmark:
      
      vanilla:
      
         18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
               1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
                  24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
           1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
      50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
       1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
           1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )
      
           132.474343877 seconds time elapsed                                          ( +-  0.21% )
      
      lockless:
      
         12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
                 832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
                  15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
           1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
      32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
       2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
           1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )
      
            91.369330729 seconds time elapsed                                          ( +-  0.45% )
      
      On top of improved scalability, this also gets rid of the icky long long
      types in the very heart of memcg, which is great for 32 bit and also makes
      the code a lot more readable.
      
      Notable differences between the old and new API:
      
      - res_counter_charge() and res_counter_charge_nofail() become
        page_counter_try_charge() and page_counter_charge() respectively, to
        match the more common kernel naming scheme of try_do()/do()
      
      - res_counter_uncharge_until() is only ever used to cancel a local
        counter and never to uncharge bigger segments of a hierarchy, so
        it's replaced by the simpler page_counter_cancel()
      
      - res_counter_set_limit() is replaced by page_counter_limit(), which
        expects its callers to serialize against themselves
      
      - res_counter_memparse_write_strategy() is replaced by
        page_counter_limit(), which rounds down to the nearest page size
        rather than up.  This is more reasonable for explicitly requested
        hard upper limits.
      
      - to keep charging light-weight, page_counter_try_charge() charges
        speculatively, only to roll back if the result exceeds the limit.
        Because of this, a failing bigger charge can temporarily lock out
        smaller charges that would otherwise succeed.  The error is bounded
        to the difference between the smallest and the biggest possible
        charge size, so for memcg, this means that a failing THP charge can
        send base page charges into reclaim up to 2MB (4MB) before the limit
        would have been reached.  This should be acceptable.
      
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
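      A heavily simplified sketch of the speculative charge-and-roll-back scheme
      described above (this is not the actual mm/page_counter.c code; the
      hierarchy walk and watermark tracking are omitted, and the names are
      illustrative):

      #include <linux/types.h>
      #include <linux/atomic.h>

      /* One word-sized counter; the real page_counter also chains to a parent. */
      struct page_counter_sketch {
              atomic_long_t count;
              unsigned long limit;
      };

      /* Charge speculatively, then roll back if the limit was exceeded. */
      static bool try_charge_sketch(struct page_counter_sketch *c,
                                    unsigned long nr_pages)
      {
              long new_count = atomic_long_add_return(nr_pages, &c->count);

              if (new_count > (long)c->limit) {
                      atomic_long_sub(nr_pages, &c->count);   /* roll back */
                      return false;   /* caller can fall back to reclaim */
              }
              return true;
      }

      The speculative window is what bounds the transient error described in the
      last bullet: while a large (e.g. THP-sized) charge is pending, smaller
      concurrent charges may observe an inflated count and fail early.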
    • irda: Convert function pointer arrays and uses to const · 785c20a0
      Committed by Joe Perches
      Making things const is a good thing.
      
      (x86-64 defconfig with all irda)
      $ size net/irda/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       109276	   1868	    244	 111388	  1b31c	net/irda/built-in.o.new
       108828	   2316	    244	 111388	  1b31c	net/irda/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
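      The same mechanical change drives this commit and the three llc commits
      below: dispatch tables that are never written at runtime are declared
      const, which moves them out of writable .data and into read-only memory,
      as the size(1) deltas show. A minimal sketch of the pattern (the type and
      handler names are illustrative, not from the irda or llc code):

      /* Illustrative event handler type and handlers. */
      struct demo_event;

      typedef int (*demo_action_t)(struct demo_event *ev);

      static int handle_connect(struct demo_event *ev)    { return 0; }
      static int handle_disconnect(struct demo_event *ev) { return 0; }

      /* "const" lets the table live in .rodata instead of .data; callers that
       * hold a pointer to it must use "const demo_action_t *" as well.
       */
      static const demo_action_t demo_actions[] = {
              handle_connect,
              handle_disconnect,
      };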
    • llc: Make llc_sap_action_t function pointer arrays const · 22bbf5f3
      Committed by Joe Perches
      It's better when function pointer arrays aren't modifiable.
      
      Net change:
      
      $ size net/llc/built-in.o.*
         text	   data	    bss	    dec	    hex	filename
        61193	  12758	   1344	  75295	  1261f	net/llc/built-in.o.new
        47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • llc: Make llc_conn_ev_qfyr_t function pointer arrays const · 9b373069
      Committed by Joe Perches
      It's better when function pointer arrays aren't modifiable.
      
      Net change from original:
      
      $ size net/llc/built-in.o.*
         text	   data	    bss	    dec	    hex	filename
        61065	  12886	   1344	  75295	  1261f	net/llc/built-in.o.new
        47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • llc: Make function pointer arrays const · 14b7d95f
      Committed by Joe Perches
      It's better when function pointer arrays aren't modifiable.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: replace remaining users of arch_fast_hash with jhash · 87545899
      Committed by Daniel Borkmann
      This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32
      accelerated flow hash if available"), and other remaining arch_fast_hash()
      users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out
      delegations for 30 seconds after recalling them.") where it has been used
      as a hash function for bloom filtering.
      
      While we think that these users are actually not much of a concern, it has
      been requested to remove the arch_fast_hash() library bits that arose
      from [1] entirely as per recent discussion [2]. The main argument is that
      using it as a hash may introduce bias due to its linearity (see avalanche
      criterion) and thus makes it less clear (though we tried to document that)
      when this security/performance trade-off is actually acceptable for a
      general purpose library function.
      
      Let's therefore avoid any further confusion on this matter and remove it to
      prevent any future accidental misuse of it. For the time being, this is
      going to make hashing of flow keys a bit more expensive in the ovs case,
      but future work could reevaluate a different hashing discipline.
      
        [1] https://patchwork.ozlabs.org/patch/299369/
        [2] https://patchwork.ozlabs.org/patch/418756/
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: Francesco Fusco <fusco@ntop.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
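      For reference, the replacement is a drop-in change at most call sites,
      since jhash() takes the same (buffer, length, initial value) arguments
      that arch_fast_hash() did. A minimal sketch with an illustrative flow-key
      struct (the real ovs key is struct sw_flow_key):

      #include <linux/types.h>
      #include <linux/jhash.h>

      /* Illustrative flow key, not the real ovs structure. */
      struct demo_flow_key {
              __be32 src, dst;
              __be16 sport, dport;
              u8     proto;
      };

      static u32 demo_flow_hash(const struct demo_flow_key *key, u32 seed)
      {
              /* Was: arch_fast_hash(key, sizeof(*key), seed);
               * jhash() has the same argument shape but far better avalanche
               * behaviour, at some extra CPU cost.
               */
              return jhash(key, sizeof(*key), seed);
      }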
    • netlink: use jhash as hashfn for rhashtable · 7f19fc5e
      Committed by Daniel Borkmann
      For netlink, we shouldn't be using arch_fast_hash() as a hashing
      discipline, but jhash() instead.
      
      Since netlink sockets can be opened by any user, a local attacker
      would be able to easily create collisions with the DPDK-derived
      arch_fast_hash(), which trades hash quality (and thus security) for
      performance by using crc32 CPU instructions on x86_64.
      
      While it might have a legitimate use case in other places, it should
      be avoided in the netlink context. As rhashtable's API is very
      flexible, we could later on still decide on other hashing disciplines,
      if legitimate.
      
      Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
      Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix broadcast wakeup contention after congestion · 340b6e59
      Committed by Richard Alpe
      commit 908344cd ("tipc: fix bug in multicast congestion handling")
      introduced a race in the broadcast link wakeup functionality.
      
      This patch eliminates the broadcast link wakeup race, which was caused
      by operating on the wakeup list without proper locking. If the race hit
      and corrupted the list, all subsequent wakeup messages would be lost,
      resulting in a considerable memory leak.
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
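      A minimal sketch of the general fix pattern, not the actual tipc code (the
      queue, lock, and function names are illustrative): every append to a shared
      wakeup queue has to happen under the lock that guards it, otherwise
      concurrent writers can corrupt the list and strand queued messages.

      #include <linux/skbuff.h>
      #include <linux/spinlock.h>

      /* Append a deferred wakeup message to a shared queue under its lock. */
      static void queue_wakeup_sketch(struct sk_buff_head *wakeupq,
                                      struct sk_buff *skb, spinlock_t *lock)
      {
              spin_lock_bh(lock);
              __skb_queue_tail(wakeupq, skb); /* unlocked variant; lock held here */
              spin_unlock_bh(lock);
      }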
    • net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb · fd11a83d
      Committed by Alexander Duyck
      This change pulls the core functionality out of __netdev_alloc_skb and
      places it in a new function named __alloc_rx_skb.  The reason for doing
      this is to make these bits accessible to a new function __napi_alloc_skb.
      In addition __alloc_rx_skb now has a new flags value that is used to
      determine which page frag pool to allocate from.  If the SKB_ALLOC_NAPI
      flag is set then the NAPI pool is used.  The advantage of this is that we
      do not have to use local_irq_save/restore when accessing the NAPI pool from
      NAPI context.
      
      In my test setup I saw at least 11ns of savings using the napi_alloc_skb
      function versus the netdev_alloc_skb function, most of this being due to
      the fact that we didn't have to call local_irq_save/restore.
      
      The main use case for napi_alloc_skb would be for things such as copybreak
      or page fragment based receive paths where an skb is allocated after the
      data has been received instead of before.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
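      A hedged sketch of the refactoring shape described above: one core helper
      takes a flags argument that selects the page-frag pool, and the public
      entry points become thin wrappers. This is not the actual
      net/core/skbuff.c code; the *_sketch names and the SKB_ALLOC_NAPI_SKETCH
      flag are illustrative stand-ins.

      #include <linux/skbuff.h>
      #include <linux/gfp.h>

      #define SKB_ALLOC_NAPI_SKETCH   0x1     /* stand-in for SKB_ALLOC_NAPI */

      /* Core helper: flags pick the frag pool.  The NAPI pool is only touched
       * from softirq context, so no local_irq_save()/restore() is needed there.
       */
      static struct sk_buff *alloc_rx_skb_sketch(unsigned int len, gfp_t gfp,
                                                 int flags)
      {
              bool use_napi_pool = flags & SKB_ALLOC_NAPI_SKETCH;

              /* The real helper carves 'len' bytes out of either the netdev or
               * the NAPI per-CPU page-frag cache; as a placeholder we just call
               * the generic skb allocator.
               */
              (void)use_napi_pool;
              return __alloc_skb(len, gfp, 0, NUMA_NO_NODE);
      }

      static struct sk_buff *netdev_alloc_skb_sketch(unsigned int len, gfp_t gfp)
      {
              return alloc_rx_skb_sketch(len, gfp, 0);
      }

      static struct sk_buff *napi_alloc_skb_sketch(unsigned int len, gfp_t gfp)
      {
              return alloc_rx_skb_sketch(len, gfp, SKB_ALLOC_NAPI_SKETCH);
      }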
    • net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag · ffde7328
      Committed by Alexander Duyck
      This patch splits the netdev_alloc_frag function up so that it can be used
      on one of two page frag pools instead of being fixed on the
      netdev_alloc_cache.  By doing this we can add a NAPI specific function
      __napi_alloc_frag that accesses a pool that is only used from softirq
      context.  The advantage to this is that we do not need to call
      local_irq_save/restore which can be a significant savings.
      
      I also took the opportunity to refactor the core bits that were placed in
      __alloc_page_frag.  First I updated the allocation to do either a 32K
      allocation or an order 0 page.  This is based on the changes in commit
      d9b2938a where it was found that latencies could be reduced in case of
      failures.  Then I also rewrote the logic to work from the end of the page to
      the start.  By doing this the size value doesn't have to be used unless we
      have run out of space for page fragments.  Finally I cleaned up the atomic
      bits so that we just do an atomic_sub_and_test and if that returns true then
      we set the page->_count via an atomic_set.  This way we can remove the extra
      conditional for the atomic_read since it would have led to an atomic_inc in
      the case of success anyway.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
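      A minimal sketch of the "work from the end of the page" idea (not the real
      __alloc_page_frag(); refill and the page->_count handling are omitted, and
      the names are illustrative):

      /* Simplified page-frag cache: 'offset' counts down from the end of the
       * backing buffer, so the total size only matters once we run out of room.
       */
      struct frag_cache_sketch {
              void            *va;       /* backing buffer, e.g. a 32K block */
              unsigned int    offset;    /* bytes still unused, counts down */
      };

      static void *frag_alloc_sketch(struct frag_cache_sketch *nc,
                                     unsigned int fragsz)
      {
              if (nc->offset < fragsz)
                      return NULL;            /* exhausted: caller refills */

              nc->offset -= fragsz;           /* carve fragment off the end */
              return nc->va + nc->offset;
      }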
  7. 10 December 2014, 10 commits