- 12 7月, 2017 10 次提交
-
-
由 WANG Cong 提交于
We are not allowed to block on the RCU reader side, so can't just hold the mutex as before. As a quick fix, convert it to a spinlock. Fixes: d9f1f61c ("tap: Extending tap device create/destroy APIs") Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com> Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Cc: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guilherme G. Piccoli 提交于
Since the introduction of ULD (Upper-Layer Drivers), the MSI-X deallocating path changed in cxgb4: the driver frees the interrupts of ULD when unregistering it or on shutdown PCI handler. Problem is that if a MSI-X is not freed before deallocated in the PCI layer, it will trigger a BUG() due to still "alive" interrupt being tentatively quiesced. The below trace was observed when doing a simple unbind of Chelsio's adapter PCI function, like: "echo 001e:80:00.4 > /sys/bus/pci/drivers/cxgb4/unbind" Trace: kernel BUG at drivers/pci/msi.c:352! Oops: Exception in kernel mode, sig: 5 [#1] ... NIP [c0000000005a5e60] free_msi_irqs+0xa0/0x250 LR [c0000000005a5e50] free_msi_irqs+0x90/0x250 Call Trace: [c0000000005a5e50] free_msi_irqs+0x90/0x250 (unreliable) [c0000000005a72c4] pci_disable_msix+0x124/0x180 [d000000011e06708] disable_msi+0x88/0xb0 [cxgb4] [d000000011e06948] free_some_resources+0xa8/0x160 [cxgb4] [d000000011e06d60] remove_one+0x170/0x3c0 [cxgb4] [c00000000058a910] pci_device_remove+0x70/0x110 [c00000000064ef04] device_release_driver_internal+0x1f4/0x2c0 ... This patch fixes the issue by refactoring the shutdown path of ULD on cxgb4 driver, by properly freeing and disabling interrupts on PCI remove handler too. Fixes: 0fbc81b3 ("Allocate resources dynamically for all cxgb4 ULD's") Reported-by: NHarsha Thyagaraja <hathyaga@in.ibm.com> Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kalderon, Michal 提交于
The option "h" (host order ) exists for ipv4 only. Remove the h when printing ipv6 addresses. Lead to the following smatch warning: drivers/net/ethernet/qlogic/qed/qed_iwarp.c:585 qed_iwarp_print_tcp_ramrod() warn: '%pI6' can only be followed by c drivers/net/ethernet/qlogic/qed/qed_iwarp.c:1521 qed_iwarp_print_cm_info() warn: '%pI6' can only be followed by c Fixes commit 456a5849 ("qed: iWARP CM add passive side connect") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christophe Jaillet 提交于
'alloc_dma_[rt]x_desc_resources()' functions look very close. Remove a useless initialization and use the same label name for error handling path in order to get them even closer. Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christophe Jaillet 提交于
If the first 'kmalloc_array' within the loop fails, we should free what as already been allocated, as done in all other error handling path. Fixes: ce736788 ("net: stmmac: adding multiple buffers for TX") Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christophe Jaillet 提交于
If the first 'kmalloc_array' within the loop fails, we should free what as already been allocated, as done in all other error handling path. Fixes: 54139cf3 ("net: stmmac: adding multiple buffers for rx") Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christophe Jaillet 提交于
if 'ioread32()' returns 0xFFFFFFF, we have to go through the error handling path as done everywhere else in this function. Move the 'err_free_wq' label to better match its name and its location and add a new label 'err_disable_wq'. Update the code accordingly. Fixes: 373fb087 ("enic: add devcmd2") Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
The PF driver sets up a list of firmware commands from the VF driver that needs to be forwarded to the PF for approval. This list is a 256-bit bitmap. The code that sets up the bitmap falls apart on big-endian architecture. __set_bit() does not work because it operates on long types whereas the firmware interface is defined in u32 types, causing bits in the wrong 32-bit word to be set. Fix it by setting the proper bits on an array of u32. Fixes: de68f5de ("bnxt_en: Fix bitmap declaration to work on 32-bit arches.") Reported-by: NShannon Nelson <shannon.nelson@oracle.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
When changing channels from combined to rx/tx or vice versa, the code uses the wrong "sh" parameter to determine if we are reserving rings for shared or non-shared mode. It should be using the ethtool requested "sh" parameter instead of the current "sh" parameter. Fix it by passing the "sh" parameter to bnxt_reserve_rings(). For ethtool, we will pass in the requested "sh" parameter. Fixes: 391be5c2 ("bnxt_en: Implement new scheme to reserve tx rings.") Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
.ndo_get_stats64() may not be protected by RTNL and can race with .ndo_stop() or other ethtool operations that can free the statistics memory. Fix it by setting a new flag BNXT_STATE_READ_STATS and then proceeding to read statistics memory only if the state is OPEN. The close path that frees the memory clears the OPEN state and then waits for the BNXT_STATE_READ_STATS to clear before proceeding to free the statistics memory. Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 7月, 2017 1 次提交
-
-
由 Arnd Bergmann 提交于
The new IPSec offload code introduced a build error: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.o: In function `mlx5e_ipsec_build_inverse_table': ipsec_rxtx.c:(.text+0x556): undefined reference Another patch was added on top to fix the build error, but that introduced a new bug, as we now use the remainder of the division rather than the result. This makes it use the correct helper function instead. Fixes: 5dfd87b6 ("net/mlx5: IPSec, Fix 64-bit division on 32-bit builds") Fixes: 2ac9cfe7 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NIlan Tayari <ilant@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2017 7 次提交
-
-
由 Gustavo A. R. Silva 提交于
Remove useless local variables _match_, _soc_ and the code related. Notice that const struct of_device_id of_mtk_match[] = { { .compatible = "mediatek,mt2701-eth" }, {}, }; So match->data is NULL. Suggested-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
As Hongjun/Nicolas summarized in their original patch: " When a device changes from one netns to another, it's first unregistered, then the netns reference is updated and the dev is registered in the new netns. Thus, when a slave moves to another netns, it is first unregistered. This triggers a NETDEV_UNREGISTER event which is caught by the bonding driver. The driver calls bond_release(), which calls dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in the old netns). " This is a very special case, because the device is being unregistered no one should still care about the NETDEV_CHANGEMTU event triggered at this point, we can avoid broadcasting this event on this path, and avoid touching inetdev_event()/addrconf_notify() path. It requires to export __dev_set_mtu() to bonding driver. Reported-by: NHongjun Li <hongjun.li@6wind.com> Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yunsheng Lin 提交于
skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK, which cause hns_nic_net_xmit to use a freed skb. BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940... [17659.112635] alloc_debug_processing+0x18c/0x1a0 [17659.117208] __slab_alloc+0x52c/0x560 [17659.120909] kmem_cache_alloc_node+0xac/0x2c0 [17659.125309] __alloc_skb+0x6c/0x260 [17659.128837] tcp_send_ack+0x8c/0x280 [17659.132449] __tcp_ack_snd_check+0x9c/0xf0 [17659.136587] tcp_rcv_established+0x5a4/0xa70 [17659.140899] tcp_v4_do_rcv+0x27c/0x620 [17659.144687] tcp_prequeue_process+0x108/0x170 [17659.149085] tcp_recvmsg+0x940/0x1020 [17659.152787] inet_recvmsg+0x124/0x180 [17659.156488] sock_recvmsg+0x64/0x80 [17659.160012] SyS_recvfrom+0xd8/0x180 [17659.163626] __sys_trace_return+0x0/0x4 [17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13 [17659.174000] free_debug_processing+0x1d4/0x2c0 [17659.178486] __slab_free+0x240/0x390 [17659.182100] kmem_cache_free+0x24c/0x270 [17659.186062] kfree_skbmem+0xa0/0xb0 [17659.189587] __kfree_skb+0x28/0x40 [17659.193025] napi_gro_receive+0x168/0x1c0 [17659.197074] hns_nic_rx_up_pro+0x58/0x90 [17659.201038] hns_nic_rx_poll_one+0x518/0xbc0 [17659.205352] hns_nic_common_poll+0x94/0x140 [17659.209576] net_rx_action+0x458/0x5e0 [17659.213363] __do_softirq+0x1b8/0x480 [17659.217062] run_ksoftirqd+0x64/0x80 [17659.220679] smpboot_thread_fn+0x224/0x310 [17659.224821] kthread+0x150/0x170 [17659.228084] ret_from_fork+0x10/0x40 BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0... [17751.080490] __slab_alloc+0x52c/0x560 [17751.084188] kmem_cache_alloc+0x244/0x280 [17751.088238] __build_skb+0x40/0x150 [17751.091764] build_skb+0x28/0x100 [17751.095115] __alloc_rx_skb+0x94/0x150 [17751.098900] __napi_alloc_skb+0x34/0x90 [17751.102776] hns_nic_rx_poll_one+0x180/0xbc0 [17751.107097] hns_nic_common_poll+0x94/0x140 [17751.111333] net_rx_action+0x458/0x5e0 [17751.115123] __do_softirq+0x1b8/0x480 [17751.118823] run_ksoftirqd+0x64/0x80 [17751.122437] smpboot_thread_fn+0x224/0x310 [17751.126575] kthread+0x150/0x170 [17751.129838] ret_from_fork+0x10/0x40 [17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43 [17751.139951] free_debug_processing+0x1d4/0x2c0 [17751.144436] __slab_free+0x240/0x390 [17751.148051] kmem_cache_free+0x24c/0x270 [17751.152014] kfree_skbmem+0xa0/0xb0 [17751.155543] __kfree_skb+0x28/0x40 [17751.159022] napi_gro_receive+0x168/0x1c0 [17751.163074] hns_nic_rx_up_pro+0x58/0x90 [17751.167041] hns_nic_rx_poll_one+0x518/0xbc0 [17751.171358] hns_nic_common_poll+0x94/0x140 [17751.175585] net_rx_action+0x458/0x5e0 [17751.179373] __do_softirq+0x1b8/0x480 [17751.183076] run_ksoftirqd+0x64/0x80 [17751.186691] smpboot_thread_fn+0x224/0x310 [17751.190826] kthread+0x150/0x170 [17751.194093] ret_from_fork+0x10/0x40 Fixes: 13ac695e ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem") Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Nlipeng <lipeng321@huawei.com> Reported-by: NJun He <hjat2005@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yunsheng Lin 提交于
As the user manual described, the second step to write to C45 phy by mdio should be data, but not address. Here we should fix this issue. Fixes: 5b904d39 ("net: add Hisilicon Network Subsystem MDIO support") Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Nlipeng <lipeng321@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 vishnuvardhan 提交于
As per the SAMA5D3 device specification it supports Jumbo frames. But the suggested flag and length of bytes it supports was not updated in this driver config_structure. The maximum jumbo frames the device supports : 10240 bytes as per the device spec. While changing the MTU value greater than 1500, it threw error: sudo ifconfig eth1 mtu 9000 SIOCSIFMTU: Invalid argument Add this support to driver so that it works as expected and designed. Signed-off-by: Nvishnuvardhan <vardhanraj4143@gmail.com> [nicolas.ferre@microchip.com: modify slightly commit msg] Signed-off-by: NNicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
Fixes: commit d45b897b ("virtio_net: allow specifying context for rx") Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marc Gonzalez 提交于
This driver is required to work around several hardware bugs in the PCIe controller. The SMP8759 does not support legacy interrupts or IO space. Signed-off-by: NMarc Gonzalez <marc_gonzalez@sigmadesigns.com> [bhelgaas: add CONFIG_BROKEN dependency, various cleanups] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 07 7月, 2017 22 次提交
-
-
由 Christophe Jaillet 提交于
If this memory allocation fails, we should go through the error handling path as done everywhere else in this function before returning. Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
nfp_flower_metadata_cleanup() is defined but never invoked, not calling it will cause us to leak mask and statistics queue memory on the host. Fixes: 43f84b72 ("nfp: add metadata to each flow offload") Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Jurgens 提交于
ib_get_cached_subnet_prefix can technically fail, but the only way it could is not possible based on the loop conditions. Check the return value before using the variable sp to resolve a static analysis warning. -v1: - Fix check to !ret. Paul Moore Fixes: 8f408ab6 ("selinux lsm IB/core: Implement LSM notification system") Signed-off-by: NDaniel Jurgens <danielj@mellanox.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Daniel Jurgens 提交于
Check the return value from get_pkey_and_subnet_prefix to prevent using uninitialized variables. Fixes: d291f1a6 ("IB/core: Enforce PKey security on QPs") Signed-off-by: NDaniel Jurgens <danielj@mellanox.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Enric Balletbo i Serra 提交于
The suspend/resume behavior of the TPM can be controlled by setting "powered-while-suspended" in the DTS. This is useful for the cases when hardware does not power-off the TPM. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Signed-off-by: NEnric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Roberto Sassu 提交于
tpm2_do_selftest() performs a PCR read during the TPM initialization phase. This patch replaces the PCR read code with a call to tpm2_pcr_read(). Signed-off-by: NRoberto Sassu <roberto.sassu@huawei.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkine@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Roberto Sassu 提交于
tpm2_pcr_read() now builds the PCR read command buffer with tpm_buf functions. This solution is preferred to using a tpm2_cmd structure, as tpm_buf functions provide protection against buffer overflow. Signed-off-by: NRoberto Sassu <roberto.sassu@huawei.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Colin Ian King 提交于
The pointer ilb_base_addr does not need to be in global scope, so make it static. Cleans up sparse warning: "symbol 'ilb_base_addr' was not declared. Should it be static?" Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Jarkko Sakkinen 提交于
Consolidated all the "manual" TPM startup code to a single function in order to make code flows a bit cleaner and migrate to tpm_buf. Tested-by: NStefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Azhar Shaikh 提交于
To overcome a hardware limitation on Intel Braswell systems, disable CLKRUN protocol during TPM transactions and re-enable once the transaction is completed. Signed-off-by: NAzhar Shaikh <azhar.shaikh@intel.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Manuel Lauss 提交于
priv->cmd_size is never initialised if the cmd and rsp buffers reside at different addresses. Initialise it in the exit path of the function when rsp buffer has also been successfully allocated. Fixes: aa77ea0e ("tpm/tpm_crb: cache cmd_size register value."). Signed-off-by: NManuel Lauss <manuel.lauss@gmail.com> Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Jarkko Sakkinen 提交于
While cleaning up sysfs callback that prints EK we discovered a kernel memory leak. This commit fixes the issue by zeroing the buffer used for TPM command/response. The leak happen when we use either tpm_vtpm_proxy, tpm_ibmvtpm or xen-tpmfront. Cc: stable@vger.kernel.org Fixes: 08837438 ("TPM: sysfs functions consolidation") Reported-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: NStefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Josh Zimmerman 提交于
If a TPM2 loses power without a TPM2_Shutdown command being issued (a "disorderly reboot"), it may lose some state that has yet to be persisted to NVRam, and will increment the DA counter. After the DA counter gets sufficiently large, the TPM will lock the user out. NOTE: This only changes behavior on TPM2 devices. Since TPM1 uses sysfs, and sysfs relies on implicit locking on chip->ops, it is not safe to allow this code to run in TPM1, or to add sysfs support to TPM2, until that locking is made explicit. Signed-off-by: NJosh Zimmerman <joshz@google.com> Cc: stable@vger.kernel.org Fixes: 74d6b3ce ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Josh Zimmerman 提交于
The TPM class has some common shutdown code that must be executed for all drivers. This adds some needed functionality for that. Signed-off-by: NJosh Zimmerman <joshz@google.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Fixes: 74d6b3ce ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Michal Hocko 提交于
Commit 20b2f52b ("numa: add CONFIG_MOVABLE_NODE for movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a good explanation on why it is actually useful. It makes a lot of sense to make movable node semantic opt in but we already have that because the feature has to be explicitly enabled on the kernel command line. A config option on top only makes the configuration space larger without a good reason. It also adds an additional ifdefery that pollutes the code. Just drop the config option and make it de-facto always enabled. This shouldn't introduce any change to the semantic. Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NReza Arbab <arbab@linux.vnet.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Patch series "mm: per-lruvec slab stats" Josef is working on a new approach to balancing slab caches and the page cache. For this to work, he needs slab cache statistics on the lruvec level. These patches implement that by adding infrastructure that allows updating and reading generic VM stat items per lruvec, then switches some existing VM accounting sites, including the slab accounting ones, to this new cgroup-aware API. I'll follow up with more patches on this, because there is actually substantial simplification that can be done to the memory controller when we replace private memcg accounting with making the existing VM accounting sites cgroup-aware. But this is enough for Josef to base his slab reclaim work on, so here goes. This patch (of 5): To re-implement slab cache vs. page cache balancing, we'll need the slab counters at the lruvec level, which, ever since lru reclaim was moved from the zone to the node, is the intersection of the node, not the zone, and the memcg. We could retain the per-zone counters for when the page allocator dumps its memory information on failures, and have counters on both levels - which on all but NUMA node 0 is usually redundant. But let's keep it simple for now and just move them. If anybody complains we can restore the per-zone counters. [hannes@cmpxchg.org: fix oops] Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Heiko Carstens has noticed that he can generate overlapping zones for ZONE_DMA and ZONE_NORMAL: DMA [mem 0x0000000000000000-0x000000007fffffff] Normal [mem 0x0000000080000000-0x000000017fffffff] $ cat /sys/devices/system/memory/block_size_bytes 10000000 $ cat /sys/devices/system/memory/memory5/valid_zones DMA $ echo 0 > /sys/devices/system/memory/memory5/online $ cat /sys/devices/system/memory/memory5/valid_zones Normal $ echo 1 > /sys/devices/system/memory/memory5/online Normal $ cat /proc/zoneinfo Node 0, zone DMA spanned 524288 <----- present 458752 managed 455078 start_pfn: 0 <----- Node 0, zone Normal spanned 720896 present 589824 managed 571648 start_pfn: 327680 <----- The reason is that we assume that the default zone for kernel onlining is ZONE_NORMAL. This was a simplification introduced by the memory hotplug rework and it is easily fixable by checking the range overlap in the zone order and considering the first matching zone as the default one. If there is no such zone then assume ZONE_NORMAL as we have been doing so far. Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online" Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
The current memory hotplug implementation relies on having all the struct pages associate with a zone/node during the physical hotplug phase (arch_add_memory->__add_pages->__add_section->__add_zone). In the vast majority of cases this means that they are added to ZONE_NORMAL. This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory hotadd without sparsemem") and it wasn't a big deal back then because movable onlining didn't exist yet. Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable memory and portion memory") and then things got more complicated. Rather than reconsidering the zone association which was no longer needed (because the memory hotplug already depended on SPARSEMEM) a convoluted semantic of zone shifting has been developed. Only the currently last memblock or the one adjacent to the zone_movable can be onlined movable. This essentially means that the online type changes as the new memblocks are added. Let's simulate memory hot online manually $ echo 0x100000000 > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory32/valid_zones Normal Movable $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal /sys/devices/system/memory/memory34/valid_zones:Normal Movable $ echo online_movable > /sys/devices/system/memory/memory34/state $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Normal This is an awkward semantic because an udev event is sent as soon as the block is onlined and an udev handler might want to online it based on some policy (e.g. association with a node) but it will inherently race with new blocks showing up. This patch changes the physical online phase to not associate pages with any zone at all. All the pages are just marked reserved and wait for the onlining phase to be associated with the zone as per the online request. There are only two requirements - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses the latter one is not an inherent requirement and can be changed in the future. It preserves the current behavior and made the code slightly simpler. This is subject to change in future. This means that the same physical online steps as above will lead to the following state: Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Normal Movable /sys/devices/system/memory/memory32/valid_zones:Normal Movable /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Implementation: The current move_pfn_range is reimplemented to check the above requirements (allow_online_pfn_range) and then updates the respective zone (move_pfn_range_to_zone), the pgdat and links all the pages in the pfn range with the zone/node. __add_pages is updated to not require the zone and only initializes sections in the range. This allowed to simplify the arch_add_memory code (s390 could get rid of quite some of code). devm_memremap_pages is the only user of arch_add_memory which relies on the zone association because it only hooks into the memory hotplug only half way. It uses it to associate the new memory with ZONE_DEVICE but doesn't allow it to be {on,off}lined via sysfs. This means that this particular code path has to call move_pfn_range_to_zone explicitly. The original zone shifting code is kept in place and will be removed in the follow up patch for an easier review. Please note that this patch also changes the original behavior when offlining a memory block adjacent to another zone (Normal vs. Movable) used to allow to change its movable type. This will be handled later. [richard.weiyang@gmail.com: simplify zone_intersects()] Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com [richard.weiyang@gmail.com: remove duplicate call for set_page_links] Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com [akpm@linux-foundation.org: remove unused local `i'] Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NWei Yang <richard.weiyang@gmail.com> Tested-by: NDan Williams <dan.j.williams@intel.com> Tested-by: NReza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
is_pageblock_removable_nolock() relies on having zone association to examine all the page blocks to check whether they are movable or free. This is just wasting of cycles when the memblock is offline. Later patch in the series will also change the time when the page is associated with a zone so we let's bail out early if the memblock is offline. Link: http://lkml.kernel.org/r/20170515085827.16474-7-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Memory hotplug (add_memory_resource) has to reinitialize node infrastructure if the node is offline (one which went through the complete add_memory(); remove_memory() cycle). That involves node registration to the kobj infrastructure (register_node), the proper association with cpus (register_cpu_under_node) and finally creation of node<->memblock symlinks (link_mem_sections). The last part requires to know node_start_pfn and node_spanned_pages which we currently have but a leter patch will postpone this initialization to the onlining phase which happens later. In fact we do not need to rely on the early pgdat initialization even now because the currently hot added pfn range is currently known. Split register_one_node into core which does all the common work for the boot time NUMA initialization and the hotplug (__register_one_node). register_one_node keeps the full initialization while hotplug calls __register_one_node and manually calls link_mem_sections for the proper range. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-6-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Device memory hotplug hooks into regular memory hotplug only half way. It needs memory sections to track struct pages but there is no need/desire to associate those sections with memory blocks and export them to the userspace via sysfs because they cannot be onlined anyway. This is currently expressed by for_device argument to arch_add_memory which then makes sure to associate the given memory range with ZONE_DEVICE. register_new_memory then relies on is_zone_device_section to distinguish special memory hotplug from the regular one. While this works now, later patches in this series want to move __add_zone outside of arch_add_memory path so we have to come up with something else. Add want_memblock down the __add_pages path and use it to control whether the section->memblock association should be done. arch_add_memory then just trivially want memblock for everything but for_device hotplug. remove_memory_section doesn't need is_zone_device_section either. We can simply skip all the memblock specific cleanup if there is no memblock for the given section. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Tested-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Commit c04fc586 ("mm: show node to memory section relationship with symlinks in sysfs") has added means to export memblock<->node association into the sysfs. It has also introduced get_nid_for_pfn which is a rather confusing counterpart of pfn_to_nid which checks also whether the pfn page is already initialized (page_initialized). This is done by checking page::lru != NULL which doesn't make any sense at all. Nothing in this path really relies on the lru list being used or initialized. Just remove it because this will become a problem with later patches. Thanks to Reza Arbab for testing which revealed this to be a problem (http://lkml.kernel.org/r/20170403202337.GA12482@dhcp22.suse.cz) Link: http://lkml.kernel.org/r/20170515085827.16474-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-