- 23 Jan 2015 (4 commits)
-
Committed by Johannes Berg
In case userspace attempts to obtain key information for, or delete, a unicast key, this is currently erroneously rejected unless the driver sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do, so it was never noticed. Fix that, and while at it fix a potential memory leak: the error path in the get_key() function was placed after allocating a message but didn't free it - move it to a better place. Luckily admin permissions are needed to call this operation. Cc: stable@vger.kernel.org Fixes: e31b8213 ("cfg80211/mac80211: allow per-station GTKs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Mathy Vanhoef
Fix a regression introduced by commit a5e70697 ("mac80211: add radiotap flag and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by using the CCK flag again. Cc: stable@vger.kernel.org Fixes: a5e70697 ("mac80211: add radiotap flag and handling for 5/10 MHz") Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Fred Chou
The HT Control field may also be present in management frames, as defined in section 8.2.4.1.10 of 802.11-2012. Account for this in the calculation of the header length. Signed-off-by: Fred Chou <fred.chou.nd@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
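A hedged sketch of the kind of check this implies, in linux/ieee80211.h terms (illustrative, not the literal patch): the presence of HT Control is signalled by the Order bit of the Frame Control field, so a header-length helper has to add IEEE80211_HT_CTL_LEN for management frames as well:

    /* sketch: account for the HT Control field in management frames too */
    if (ieee80211_is_mgmt(fc) &&
        (fc & cpu_to_le16(IEEE80211_FCTL_ORDER)))
            hdrlen += IEEE80211_HT_CTL_LEN; /* 4 bytes */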
-
Committed by Luciano Coelho
In normal cases (i.e. when we are fully associated), cfg80211 takes care of removing all the stations before calling suspend in mac80211. But in the corner case when we suspend during authentication or association, mac80211 needs to roll back the station states. We shouldn't roll back the station states in the suspend function, though, because this is taken care of in other parts of the code, except for WDS interfaces. For AP types of interfaces, cfg80211 takes care of disconnecting all stations before calling the driver's suspend code. For station interfaces, this is done in the quiesce code. For WDS interfaces we still need to do it here, so move the code into a new switch case for WDS. Cc: stable@kernel.org [3.15+] Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 07 Jan 2015 (2 commits)
-
Committed by Arik Nemtsov
If a P2P GO is active, the cfg80211_reg_can_beacon function will take the wdev lock in its call to cfg80211_go_permissive_chan. But the wdev lock is already taken by the parent channel-checking function, causing a deadlock. Split the checking code into two parts. The first part checks whether the wdev is active and saves the channel, under the wdev lock. The second part checks actual channel validity according to type. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by John Linville
The return value should be initialized to false so that there's a valid return value when there are no sessions that need work done on them. Luckily, the side effect of using the uninitialized value is an extra harmless driver call. Coverity: CID 1260096 Fixes: 02219b3a ("mac80211: add WMM admission control support") Signed-off-by: John W. Linville <linville@tuxdriver.com> [extend commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 05 Jan 2015 (1 commit)
-
Committed by Johannes Berg
This reverts commit ca34e3b5. It turns out that the p54 and cw1200 drivers assume that there's tailroom even when they don't say they really need it. However, there's currently no way for them to explicitly say they do need it, so for now revert this. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331. Cc: stable@vger.kernel.org Fixes: ca34e3b5 ("mac80211: Fix accounting of the tailroom-needed counter") Reported-by: Christopher Chavez <chrischavez@gmx.us> Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Debugged-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 17 Dec 2014 (1 commit)
-
Committed by Johannes Berg
When writing the code to allow per-station GTKs, I neglected to take into account the management frame keys (indices 4 and 5) when freeing the station, and only added code to free the first four data frame keys. Fix this by iterating the array of keys over the right length. Cc: stable@vger.kernel.org Fixes: e31b8213 ("cfg80211/mac80211: allow per-station GTKs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
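The shape of the fix, as a hedged sketch (free_sta_key() is a hypothetical stand-in for mac80211's actual per-key teardown; NUM_DEFAULT_KEYS is 4 and NUM_DEFAULT_MGMT_KEYS is 2 in mac80211):

    /* iterate all six key slots, not just the four data-frame ones */
    for (i = 0; i < NUM_DEFAULT_KEYS + NUM_DEFAULT_MGMT_KEYS; i++)
            free_sta_key(sta, i);   /* hypothetical helper */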
-
- 12 Dec 2014 (10 commits)
-
Committed by Arik Nemtsov
Ad-hoc requires beaconing for regulatory purposes. Validate that the channel is valid for beaconing, and not merely enabled. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Emmanuel Grumbach
This can happen, and there is no point in adding more detection code lower in the stack. Catching these in one single point (cfg80211) is enough. Stop WARNING about this case. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=89001 Cc: stable@vger.kernel.org Fixes: 2f1c6c57 ("cfg80211: process non country IE conflicting first") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Emmanuel Grumbach
When the channel switch has been made, a vif is now using the channel context that was reserved. When that happens, we need to update the channel context, since its parameters may change. I hit a case in which I switched to a 40 MHz channel but the reserved channel context was still on 20 MHz. The rate control would try to send 40 MHz packets on a 20 MHz channel context, and that made iwlwifi's firmware unhappy. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Luciano Coelho
If userspace passes a malformed sched scan request (or a net-detect wowlan configuration) by adding a NL80211_ATTR_SCHED_SCAN_MATCH attribute without any nested matchsets, a NULL pointer dereference will occur. Fix this by checking that we do have matchsets in our array before trying to access it.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
IP: [<ffffffffa002fd69>] nl80211_parse_sched_scan.part.67+0x6e9/0x900 [cfg80211]
PGD 865c067 PUD 865b067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: iwlmvm(O) iwlwifi(O) mac80211(O) cfg80211(O) compat(O) [last unloaded: compat]
CPU: 2 PID: 2442 Comm: iw Tainted: G O 3.17.2 #31
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880013800790 ti: ffff880008d80000 task.ti: ffff880008d80000
RIP: 0010:[<ffffffffa002fd69>] [<ffffffffa002fd69>] nl80211_parse_sched_scan.part.67+0x6e9/0x900 [cfg80211]
RSP: 0018:ffff880008d838d0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000143c RSI: 0000000000000000 RDI: ffff880008ee8dd0
RBP: ffff880008d83948 R08: 0000000000000002 R09: 0000000000000019
R10: ffff88001d1b3c40 R11: 0000000000000002 R12: ffff880019e85e00
R13: 00000000fffffed4 R14: ffff880009757800 R15: 0000000000001388
FS: 00007fa3b6d13700(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000024 CR3: 0000000008670000 CR4: 00000000000006e0
Stack:
 ffff880009757800 ffff880000000001 0000000000000000 ffff880008ee84e0
 0000000000000000 ffff880009757800 00000000fffffed4 ffff880008d83948
 ffffffff814689c9 ffff880009757800 ffff880008ee8000 0000000000000000
Call Trace:
 [<ffffffff814689c9>] ? nla_parse+0xb9/0x120
 [<ffffffffa00306de>] nl80211_set_wowlan+0x75e/0x960 [cfg80211]
 [<ffffffff810bf3d5>] ? mark_held_locks+0x75/0xa0
 [<ffffffff8161a77b>] genl_family_rcv_msg+0x18b/0x360
 [<ffffffff810bf66d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8161a9d4>] genl_rcv_msg+0x84/0xc0
 [<ffffffff8161a950>] ? genl_family_rcv_msg+0x360/0x360
 [<ffffffff81618e79>] netlink_rcv_skb+0xa9/0xd0
 [<ffffffff81619458>] genl_rcv+0x28/0x40
 [<ffffffff816184a5>] netlink_unicast+0x105/0x180
 [<ffffffff8161886f>] netlink_sendmsg+0x34f/0x7a0
 [<ffffffff8105a097>] ? kvm_clock_read+0x27/0x40
 [<ffffffff815c644d>] sock_sendmsg+0x8d/0xc0
 [<ffffffff811a75c9>] ? might_fault+0xb9/0xc0
 [<ffffffff811a756e>] ? might_fault+0x5e/0xc0
 [<ffffffff815d5d26>] ? verify_iovec+0x56/0xe0
 [<ffffffff815c73e0>] ___sys_sendmsg+0x3d0/0x3e0
 [<ffffffff810a7be8>] ? sched_clock_cpu+0x98/0xd0
 [<ffffffff810611b4>] ? __do_page_fault+0x254/0x580
 [<ffffffff810bb39f>] ? up_read+0x1f/0x40
 [<ffffffff810611b4>] ? __do_page_fault+0x254/0x580
 [<ffffffff812146ed>] ? __fget_light+0x13d/0x160
 [<ffffffff815c7b02>] __sys_sendmsg+0x42/0x80
 [<ffffffff815c7b52>] SyS_sendmsg+0x12/0x20
 [<ffffffff81751f69>] system_call_fastpath+0x16/0x1b

Fixes: ea73cbce ("nl80211: fix scheduled scan RSSI matchset attribute confusion") Cc: stable@vger.kernel.org [3.15+] Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
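A hedged sketch of the guard this fix implies (field names follow cfg80211's struct cfg80211_sched_scan_request loosely, default_match_rssi stands for the default threshold parsed earlier, and the real patch may place the check differently):

    /* only touch match_sets[0] if at least one matchset was parsed */
    if (request->n_match_sets)
            request->match_sets[0].rssi_thold = default_match_rssi;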
-
Committed by Arik Nemtsov
In the already-set-and-intersect case of a driver hint, the previous wiphy regdomain was not freed before being reset with a copy of the cfg80211 regdomain. Cc: stable@vger.kernel.org Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Acked-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Jouni Malinen
The VHT supported channel width field is a two-bit integer, not a bitfield. cfg80211_chandef_usable() was interpreting it incorrectly and ended up rejecting 160 MHz channel width if the driver indicated support for both 160 and 80+80 MHz channels. Cc: stable@vger.kernel.org (3.16+) Fixes: 3d9d1d66 ("nl80211/cfg80211: support VHT channel configuration") (however, no real drivers had 160 MHz support until 3.16) Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
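Sketched in terms of the linux/ieee80211.h constants, the field has to be compared as a value rather than tested bitwise; a hedged illustration:

    /* two-bit integer: 1 = 160 MHz, 2 = 160 and 80+80 MHz */
    switch (vht_cap & IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK) {
    case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ:
            /* 160 MHz supported, 80+80 MHz not */
            break;
    case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160_80PLUS80MHZ:
            /* both 160 MHz and 80+80 MHz supported */
            break;
    default:
            /* only widths up to 80 MHz */
            break;
    }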
-
Committed by Andreas Müller
As multicast frames can't be fragmented, "dot11MulticastReceivedFrameCount" stopped being incremented after the use-after-free fix. Furthermore, the RX LED would be triggered by every multicast frame (which didn't happen before), never letting the LED rest at all. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431, which also had the patch. Cc: stable@vger.kernel.org Fixes: b8fff407 ("mac80211: fix use-after-free in defragmentation") Signed-off-by: Andreas Müller <goo@stapelspeicher.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Jes Sorensen
Avoid a case where we would access uninitialized stack data if the AP advertises HT support without 40 MHz channel support. Cc: stable@vger.kernel.org Fixes: f3000e1b ("mac80211: fix broken use of VHT/20Mhz with some APs") Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Committed by Florian Fainelli
In case we cannot attach to our slave netdevice PHY, error out and propagate that error up to the caller: dsa_slave_create(). Fixes: 0d8bcdd3 ("net: dsa: allow for more complex PHY setups") Signed-off-by: Andrey Volkov <andrey.volkov@nexvision.fr> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Florian Fainelli
In case there is no PHY at the designated address on the internal switch, we would basically dereference a NULL pointer here:

dsa_slave_phy_setup(...)
{
	p->phy = ds->slave_mii_bus->phy_map[p->port];

	phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
	                   ^------

This can be triggered when the platform configuration (platform_data or Device Tree) indicates there should be a PHY device at this address, but the HW is non-responsive, such that we cannot attach a PHY device at this specific location. Fix this by checking p->phy prior to calling phy_connect_direct(). CC: Andrew Lunn <andrew@lunn.ch> Fixes: b31f65fb ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus") Reported-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Andrey Volkov <andrey.volkov@nexvision.fr> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
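The fix amounts to a NULL check before the connect call; a hedged sketch (the -ENODEV choice is illustrative):

    p->phy = ds->slave_mii_bus->phy_map[p->port];
    if (!p->phy)
            return -ENODEV;

    ret = phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
                             p->phy_interface);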
-
- 11 Dec 2014 (12 commits)
-
Committed by Alexei Starovoitov
The 0day robot reported the following crash:

[   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2

It's due to bpf_prog_get() returning ERR_PTR. Check it properly. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
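The standard kernel pattern the fix applies, sketched: bpf_prog_get() reports failure through an ERR_PTR-encoded pointer, never NULL, so the caller must test it with IS_ERR():

    prog = bpf_prog_get(ufd);
    if (IS_ERR(prog))
            return PTR_ERR(prog);   /* propagate the encoded errno */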
-
Committed by Gu Zheng
Introduce the helper macro for_each_cmsghdr as a wrapper for enumerating the cmsghdr entries of a msghdr; just a cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
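The wrapper reduces to the classic CMSG iteration idiom; a sketch of its likely shape:

    #define for_each_cmsghdr(cmsg, msg) \
            for (cmsg = CMSG_FIRSTHDR(msg); \
                 cmsg; \
                 cmsg = CMSG_NXTHDR(msg, cmsg))

Call sites then read for_each_cmsghdr(cmsg, msg) { ... } instead of the open-coded three-clause for loop.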
-
Committed by Johannes Weiner
Memory is internally accounted in bytes, using spinlock-protected 64-bit counters, even though the smallest accounting delta is a page. The counter interface is also convoluted and does too many things. Introduce a new lockless word-sized page counter API, then change all memory accounting over to it. The translation from and to bytes then only happens when interfacing with userspace. The removed locking overhead is noticeable when scaling beyond the per-cpu charge caches - on a 4-socket machine with 144 threads, the following test shows the performance differences of 288 memcgs concurrently running a page fault benchmark:

vanilla:

    18631648.500498      task-clock (msec)    # 140.643 CPUs utilized   ( +- 0.33% )
           1,380,638     context-switches     # 0.074 K/sec             ( +- 0.75% )
              24,390     cpu-migrations       # 0.001 K/sec             ( +- 8.44% )
       1,843,305,768     page-faults          # 0.099 M/sec             ( +- 0.00% )
  50,134,994,088,218     cycles               # 2.691 GHz               ( +- 0.33% )
     <not supported>     stalled-cycles-frontend
     <not supported>     stalled-cycles-backend
   8,049,712,224,651     instructions         # 0.16 insns per cycle    ( +- 0.04% )
   1,586,970,584,979     branches             # 85.176 M/sec            ( +- 0.05% )
       1,724,989,949     branch-misses        # 0.11% of all branches   ( +- 0.48% )

       132.474343877 seconds time elapsed     ( +- 0.21% )

lockless:

    12195979.037525      task-clock (msec)    # 133.480 CPUs utilized   ( +- 0.18% )
             832,850     context-switches     # 0.068 K/sec             ( +- 0.54% )
              15,624     cpu-migrations       # 0.001 K/sec             ( +- 10.17% )
       1,843,304,774     page-faults          # 0.151 M/sec             ( +- 0.00% )
  32,811,216,801,141     cycles               # 2.690 GHz               ( +- 0.18% )
     <not supported>     stalled-cycles-frontend
     <not supported>     stalled-cycles-backend
   9,999,265,091,727     instructions         # 0.30 insns per cycle    ( +- 0.10% )
   2,076,759,325,203     branches             # 170.282 M/sec           ( +- 0.12% )
       1,656,917,214     branch-misses        # 0.08% of all branches   ( +- 0.55% )

        91.369330729 seconds time elapsed     ( +- 0.45% )

On top of improved scalability, this also gets rid of the icky long long types in the very heart of memcg, which is great for 32 bit and also makes the code a lot more readable. Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become page_counter_try_charge() and page_counter_charge() resp. to match the more common kernel naming scheme of try_do()/do()
- res_counter_uncharge_until() is only ever used to cancel a local counter and never to uncharge bigger segments of a hierarchy, so it's replaced by the simpler page_counter_cancel()
- res_counter_set_limit() is replaced by page_counter_limit(), which expects its callers to serialize against themselves
- res_counter_memparse_write_strategy() is replaced by page_counter_memparse(), which rounds down to the nearest page size - rather than up. This is more reasonable for explicitly requested hard upper limits.
- to keep charging light-weight, page_counter_try_charge() charges speculatively, only to roll back if the result exceeds the limit. Because of this, a failing bigger charge can temporarily lock out smaller charges that would otherwise succeed. The error is bounded to the difference between the smallest and the biggest possible charge size, so for memcg, this means that a failing THP charge can send base page charges into reclaim up to 2MB (4MB) before the limit would have been reached. This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
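The speculative try-charge described above is easy to model; a self-contained userspace illustration of the semantics (a sketch, not the kernel's page_counter code):

    #include <stdatomic.h>
    #include <stdbool.h>

    struct counter {
            _Atomic long count;
            long limit;
    };

    /* Charge nr_pages optimistically; undo and fail if the result
     * exceeds the limit.  As noted above, a large failing charge can
     * transiently lock out smaller charges that would have fit. */
    static bool try_charge(struct counter *c, long nr_pages)
    {
            long new = atomic_fetch_add(&c->count, nr_pages) + nr_pages;

            if (new > c->limit) {
                    atomic_fetch_sub(&c->count, nr_pages); /* roll back */
                    return false;
            }
            return true;
    }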
-
Committed by Joe Perches
Making things const is a good thing. (x86-64 defconfig with all irda)

$ size net/irda/built-in.o*
   text    data     bss     dec     hex filename
 109276    1868     244  111388   1b31c net/irda/built-in.o.new
 108828    2316     244  111388   1b31c net/irda/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Joe Perches
It's better when function pointer arrays aren't modifiable. Net change:

$ size net/llc/built-in.o.*
   text    data     bss     dec     hex filename
  61193   12758    1344   75295   1261f net/llc/built-in.o.new
  47113   27030    1344   75487   126df net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Joe Perches
It's better when function pointer arrays aren't modifiable. Net change from original:

$ size net/llc/built-in.o.*
   text    data     bss     dec     hex filename
  61065   12886    1344   75295   1261f net/llc/built-in.o.new
  47113   27030    1344   75487   126df net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Joe Perches
It's better when function pointer arrays aren't modifiable. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
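The general pattern behind this series, as a sketch (the handler names and signature are placeholders): a writable array of function pointers lands in .data, while adding const lets the compiler place it in .rodata, where it cannot be modified at runtime:

    /* before: modifiable, ends up in .data */
    static int (*handlers[])(struct sk_buff *skb) = { h1, h2, h3 };

    /* after: read-only, ends up in .rodata */
    static int (* const handlers[])(struct sk_buff *skb) = { h1, h2, h3 };

This is also why data shrinks and text grows in the size outputs above.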
-
Committed by Daniel Borkmann
This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32 accelerated flow hash if available"), and other remaining arch_fast_hash() users such as nfsd via commit 6282cd56 ("NFSD: Don't hand out delegations for 30 seconds after recalling them."), where it has been used as a hash function for bloom filtering. While we think that these users are actually not much of a concern, it has been requested to remove the arch_fast_hash() library bits that arose from [1] entirely, as per the recent discussion in [2]. The main argument is that using it as a hash may introduce bias due to its linearity (see the avalanche criterion) and thus makes it less clear (though we tried to document that) when this security/performance trade-off is actually acceptable for a general-purpose library function. Let's therefore avoid any further confusion on this matter and remove it to prevent any future accidental misuse. For the time being, this is going to make hashing of flow keys a bit more expensive in the ovs case, but future work could reevaluate a different hashing discipline. [1] https://patchwork.ozlabs.org/patch/299369/ [2] https://patchwork.ozlabs.org/patch/418756/ Cc: Neil Brown <neilb@suse.de> Cc: Francesco Fusco <fusco@ntop.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Borkmann
For netlink, we shouldn't be using arch_fast_hash() as a hashing discipline, but rather jhash() instead. Since netlink sockets can be opened by any user, a local attacker would be able to easily create collisions with the DPDK-derived arch_fast_hash(), which trades off security for performance by using crc32 CPU instructions on x86_64. While it might have a legitimate use case in other places, it should be avoided in netlink context, though. As rhashtable's API is very flexible, we could later on still decide on other hashing disciplines, if legitimate. Reference: http://thread.gmane.org/gmane.linux.kernel/1844123 Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
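A hedged sketch of the replacement discipline (the hash_seed field name is illustrative): jhash() is a non-linear keyed hash, so a local user cannot cheaply construct colliding keys the way it can against a linear CRC32-based arch_fast_hash():

    #include <linux/jhash.h>

    /* key the hash with a per-table random seed */
    u32 hash = jhash(&portid, sizeof(portid), table->hash_seed);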
-
Committed by Richard Alpe
Commit 908344cd ("tipc: fix bug in multicast congestion handling") introduced a race in the broadcast link wakeup functionality. This patch eliminates this broadcast link wakeup race, caused by operating on the wakeup list without proper locking. If this race hit and corrupted the list, all subsequent wakeup messages would be lost, resulting in a considerable memory leak. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Alexander Duyck
This change pulls the core functionality out of __netdev_alloc_skb and places it in a new function named __alloc_rx_skb. The reason for doing this is to make these bits accessible to a new function, __napi_alloc_skb. In addition, __alloc_rx_skb now has a new flags value that is used to determine which page frag pool to allocate from. If the SKB_ALLOC_NAPI flag is set then the NAPI pool is used. The advantage of this is that we do not have to use local_irq_save/restore when accessing the NAPI pool from NAPI context. In my test setup I saw at least 11ns of savings using the napi_alloc_skb function versus the netdev_alloc_skb function, most of this being due to the fact that we didn't have to call local_irq_save/restore. The main use case for napi_alloc_skb would be things such as copybreak or page-fragment-based receive paths, where an skb is allocated after the data has been received instead of before. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
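A hedged usage sketch from a driver's NAPI poll path (pkt_len and the drop handling are placeholders): the napi variant can skip local_irq_save/restore precisely because it is only ever invoked from softirq context:

    skb = napi_alloc_skb(napi, pkt_len);
    if (unlikely(!skb)) {
            /* out of memory: drop the frame and count it */
            return;
    }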
-
Committed by Alexander Duyck
This patch splits the netdev_alloc_frag function up so that it can be used on one of two page frag pools instead of being fixed on the netdev_alloc_cache. By doing this we can add a NAPI-specific function, __napi_alloc_frag, that accesses a pool which is only used from softirq context. The advantage of this is that we do not need to call local_irq_save/restore, which can be a significant savings. I also took the opportunity to refactor the core bits that were placed in __alloc_page_frag. First I updated the allocation to do either a 32K allocation or an order-0 page. This is based on the changes in commit d9b2938a, where it was found that latencies could be reduced in case of failures. Then I also rewrote the logic to work from the end of the page to the start. By doing this the size value doesn't have to be used unless we have run out of space for page fragments. Finally I cleaned up the atomic bits so that we just do an atomic_sub_and_test, and if that returns true then we set page->_count via an atomic_set. This way we can remove the extra conditional for the atomic_read, since it would have led to an atomic_inc in the case of success anyway. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Dec 2014 (10 commits)
-
Committed by Jiri Pirko
For cancelling nesting, this function is more convenient. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Valdis.Kletnieks@vt.edu
Commit 46e5da40 ("net: qdisc: use rcu prefix and silence sparse warnings") triggers a spurious warning: net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage! The code should be using the _bh variant of rcu_dereference. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
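The general RCU rule being applied, sketched: the dereference flavor must match the read-side critical section that protects it, which in softirq-protected paths means the _bh variants:

    rcu_read_lock_bh();
    p = rcu_dereference_bh(gp); /* validated against rcu_read_lock_bh() */
    /* ... use p ... */
    rcu_read_unlock_bh();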
-
Committed by Eric Dumazet
When I cooked commit c3658e8d ("tcp: fix possible NULL dereference in tcp_vX_send_reset()") I missed other spots where we could dereference a NULL skb_dst(skb). Again, if a socket is provided, we do not need skb_dst() to get a pointer to the network namespace: sock_net(sk) is good enough. Reported-by: Dann Frazier <dann.frazier@canonical.com> Bisected-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: ca777eff ("tcp: remove dst refcount false sharing for prequeue mode") Signed-off-by: David S. Miller <davem@davemloft.net>
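The pattern the fix applies in the send_reset paths, as a hedged sketch: prefer the socket for the namespace lookup and fall back to the dst only when no socket is available:

    struct net *net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev);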
-
Committed by Ying Xue
Commit fb9962f3 ("tipc: ensure all name sequences are properly protected with its lock") introduced the following error: net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock' Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Roopa Prabhu
Remove use of 'swdev' mode in rocker. rocker dev offloads can use the BRIDGE_FLAGS_SELF flag to indicate offload to hardware. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Li RongQing
The queue length of sd->input_pkt_queue has already been stored in qlen, and cannot change while we hold the lock, so there is no need to read it again. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Commit 95bd09eb ("tcp: TSO packets automatic sizing") tried to control TSO size, but did this at the wrong place (sendmsg() time). At sendmsg() time, we might have a pessimistic view of the flow rate, and we end up building very small skbs (with 2 MSS per skb). This is bad because:

- It sends small TSO packets even in Slow Start, where the rate quickly increases.
- It tends to make the socket write queue very big, increasing tcp_ack() processing time, but also increasing memory needs, not necessarily accounted for, as fast clones overhead is currently ignored.
- Lower GRO efficiency and more ACK packets.

Servers with a lot of small-lived connections suffer from this. Let's instead fill skbs as much as possible (64KB of payload), but split them at xmit time, when we have a precise idea of the flow rate. skb split is actually quite efficient. The patch looks bigger than necessary, because the TCP Small Queue decision now has to take place after the eventual split. As Neal suggested, introduce a new tcp_tso_autosize() helper, so that tcp_tso_should_defer() can be synchronized on the same goal. Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable contains the number of mss that we can put in a GSO packet, and is not related to the autosizing goal anymore.

Tested: 40 ms rtt link

nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"

Before patch:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380 2000000 2000000  0.36     44.22
IpInReceives     600      0.0
IpOutRequests    599      0.0
TcpOutSegs       1397     0.0
IpExtOutOctets   2033249  0.0

After patch:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000  0.36     44.27
IpInReceives     221      0.0
IpOutRequests    232      0.0
TcpOutSegs       1397     0.0
IpExtOutOctets   2013953  0.0

Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
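A hedged sketch of what an autosizing helper of this kind computes - roughly one millisecond's worth of payload at the current pacing rate, clamped between the minimum TSO burst and the device GSO limits (the real helper's constants and clamps may differ):

    static u32 tso_autosize(const struct sock *sk, unsigned int mss_now)
    {
            u32 bytes, segs;

            /* ~1 ms of payload at the current pacing rate */
            bytes = min(sk->sk_pacing_rate >> 10,
                        sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);

            /* never below the minimum TSO burst, never above GSO max */
            segs = max_t(u32, bytes / mss_now, sysctl_tcp_min_tso_segs);
            return min_t(u32, segs, sk->sk_gso_max_segs);
    }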
-
Committed by Al Viro
No callers other than itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
... making both non-draining. That means that tcp_recvmsg() becomes non-draining. And _that_ would break iscsit_do_rx_data() unless we a) make sure tcp_recvmsg() is uniformly non-draining (it is), b) make sure it copes with an arbitrary (including shifted) iov_iter (it does - all it uses is iov_iter primitives), and c) make iscsit_do_rx_data() initialize ->msg_iter only once. Fortunately, (c) is doable with minimal work, and we are rid of one of the two places where kernel send/recvmsg users would be unhappy with non-draining behaviour. Actually, that makes all but one of the ->recvmsg() instances iov_iter-clean. The exception is skcipher_recvmsg(), and it also isn't hard to convert to primitives (iov_iter_get_pages() is needed there). That'll wait a bit - there's some interplay with the ->sendmsg() path for that one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Just use copy_from_iter(). That's what this method is trying to do in all cases, in a very convoluted fashion. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
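A hedged sketch of the resulting shape (buf and len stand in for the method's real destination and length):

    /* one primitive instead of an open-coded iovec walk */
    if (copy_from_iter(buf, len, &msg->msg_iter) != len)
            return -EFAULT;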
-