1. 09 Sep 2015: 40 commits
    • Merge tag 'for-linus-4.3' of git://git.code.sf.net/p/openipmi/linux-ipmi · a794b4f3
      Committed by Linus Torvalds
      Pull IPMI updates from Corey Minyard:
       "Most of these have been sitting in linux-next for more than a release,
        particularly commit 0fbcf4af ("ipmi: Convert the IPMI SI ACPI
        handling to a platform device") which is probably the most complex
        patch.
      
        That is also the one that changes drivers/acpi/acpi_pnp.c.  The change
        in that file is only removing IPMI from a "special platform devices"
        list, since I convert it to the standard PNP interface.  I posted this
        one to the ACPI list twice and got no response, and it seems to work
        well in my testing, so I'm hoping it's good.
      
        Hidehiro Kawai posted a set of changes that improves the panic time
        handling in the IPMI driver.
      
        The rest of the changes are minor bug fixes or cleanups and some
        documentation"
      
      * tag 'for-linus-4.3' of git://git.code.sf.net/p/openipmi/linux-ipmi:
        ipmi:ssif: Add a module parm to specify that SMBus alerts don't work
        ipmi: add of_device_id in MODULE_DEVICE_TABLE
        ipmi: Compensate for BMCs that wont set the irq enable bit
        ipmi: Don't call receive handler in the panic context
        ipmi: Avoid touching possible corrupted lists in the panic context
        ipmi: Don't flush messages in sender() in run-to-completion mode
        ipmi: Factor out message flushing procedure
        ipmi: Remove unneeded set_run_to_completion call
        ipmi: Make some data const that was only read
        ipmi: constify SSIF ACPI device ids
        ipmi: Delete an unnecessary check before the function call "cleanup_one_si"
        char:ipmi - Change 1 to true for bool type variables during initialization.
        impi:Remove unneeded setting of module owner to THIS_MODULE in the platform structure, powernv_ipmi_driver
        ipmi: Add a comment in how messages are delivered from the lower layer
        ipmi/powernv: Fix potential invalid pointer dereference
        ipmi: Convert the IPMI SI ACPI handling to a platform device
        ipmi: Add device tree bindings information
    • Merge branch 'akpm' (patches from Andrew) · f6f7a636
      Committed by Linus Torvalds
      Merge second patch-bomb from Andrew Morton:
       "Almost all of the rest of MM.  There was an unusually large amount of
        MM material this time"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
        zpool: remove no-op module init/exit
        mm: zbud: constify the zbud_ops
        mm: zpool: constify the zpool_ops
        mm: swap: zswap: maybe_preload & refactoring
        zram: unify error reporting
        zsmalloc: remove null check from destroy_handle_cache()
        zsmalloc: do not take class lock in zs_shrinker_count()
        zsmalloc: use class->pages_per_zspage
        zsmalloc: consider ZS_ALMOST_FULL as migrate source
        zsmalloc: partial page ordering within a fullness_list
        zsmalloc: use shrinker to trigger auto-compaction
        zsmalloc: account the number of compacted pages
        zsmalloc/zram: introduce zs_pool_stats api
        zsmalloc: cosmetic compaction code adjustments
        zsmalloc: introduce zs_can_compact() function
        zsmalloc: always keep per-class stats
        zsmalloc: drop unused variable `nr_to_migrate'
        mm/memblock.c: fix comment in __next_mem_range()
        mm/page_alloc.c: fix type information of memoryless node
        memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
        ...
    • Merge branch 'parisc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 839fe915
      Committed by Linus Torvalds
      Pull parisc updates from Helge Deller:
       "The most important changes in this patchset are:
      
         - re-enable 64bit PCI bus addresses which were temporarily disabled
           for PA-RISC in kernel 4.2
      
         - fix the 64bit CAS operation in the LWS path, which now allows us
           to enable the 64bit gcc atomic builtins even on 32bit userspace
           with a 64bit kernel

         - fix a long-standing bug which sometimes crashed the kernel at
           bootup when the serial interrupt wasn't registered yet"
      
      * 'parisc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Use platform_device_register_simple("rtc-generic")
        parisc: Drop CONFIG_SMP around update_cr16_clocksource()
        parisc: Use double word condition in 64bit CAS operation
        parisc: Filter out spurious interrupts in PA-RISC irq handler
        parisc: Additionally check for in_atomic() in page fault handler
        PCI,parisc: Enable 64-bit bus addresses on PA-RISC
        parisc: Define ioremap_uc and ioremap_wc
    • Merge tag 'linux-kselftest-4.3-rc1' of... · 54283aed
      Committed by Linus Torvalds
      Merge tag 'linux-kselftest-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest update from Shuah Khan:
       "This update adds a new zram test and fixes for problems found while
        testing this new zram test.  In addition, there are a few bug fixes
        and kselftest improvement patches from Linaro developers.
      
        I will send another update later on this week to fix kselftest
        breakage due to commit 2bf9e0ab ("locking/static_keys: Provide a
        selftest") after the fix soaks in next for a couple of days"
      
      * tag 'linux-kselftest-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/zram: Makefile fix
        selftests/zram: must be run as root
        selftests: breakpoints: fix installing error on the architecture except x86
        selftests: check before install
        selftests/zram: Adding zram tests
    • Merge tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 9a9952bb
      Committed by Linus Torvalds
      Pull iommu updates from Joerg Roedel:
       "This time the IOMMU updates are mostly cleanups or fixes.  No big new
        features or drivers this time.  In particular the changes include:
      
         - Bigger cleanup of the Domain<->IOMMU data structures and the code
           that manages them in the Intel VT-d driver.  This makes the code
           easier to understand and maintain, and also easier to keep the data
           structures in sync.  It is also a preparation step to make use of
           default domains from the IOMMU core in the Intel VT-d driver.
      
         - Fixes for a couple of DMA-API misuses in ARM IOMMU drivers, namely
           in the ARM and Tegra SMMU drivers.
      
         - Fix for a potential buffer overflow in the OMAP iommu driver's
           debug code
      
         - A couple of smaller fixes and cleanups in various drivers
      
         - One small new feature: Report domain-id usage in the Intel VT-d
           driver to make it easier to detect bugs where these are leaked"
      
      * tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (83 commits)
        iommu/vt-d: Really use upper context table when necessary
        x86/vt-d: Fix documentation of DRHD
        iommu/fsl: Really fix init section(s) content
        iommu/io-pgtable-arm: Unmap and free table when overwriting with block
        iommu/io-pgtable-arm: Move init-fn declarations to io-pgtable.h
        iommu/msm: Use BUG_ON instead of if () BUG()
        iommu/vt-d: Access iomem correctly
        iommu/vt-d: Make two functions static
        iommu/vt-d: Use BUG_ON instead of if () BUG()
        iommu/vt-d: Return false instead of 0 in irq_remapping_cap()
        iommu/amd: Use BUG_ON instead of if () BUG()
        iommu/amd: Make a symbol static
        iommu/amd: Simplify allocation in irq_remapping_alloc()
        iommu/tegra-smmu: Parameterize number of TLB lines
        iommu/tegra-smmu: Factor out tegra_smmu_set_pde()
        iommu/tegra-smmu: Extract tegra_smmu_pte_get_use()
        iommu/tegra-smmu: Use __GFP_ZERO to allocate zeroed pages
        iommu/tegra-smmu: Remove PageReserved manipulation
        iommu/tegra-smmu: Convert to use DMA API
        iommu/tegra-smmu: smmu_flush_ptc() wants device addresses
        ...
    • Merge tag 'regmap-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · e81b594c
      Committed by Linus Torvalds
      Pull regmap updates from Mark Brown:
       "This has been a busy release for regmap.
      
        By far the biggest set of changes here are those from Markus Pargmann
        which implement support for block transfers in smbus devices.  This
        required quite a bit of refactoring but leaves us better able to
        handle odd restrictions that controllers may have and with better
        performance on smbus.
      
        Other new features include:
      
         - Fix interactions with lockdep for nested regmaps (eg, when a device
           using regmap is connected to a bus where the bus controller has a
           separate regmap).  Lockdep's default class identification is too
           crude to work without help.
      
         - Support for must-write bitfield operations, useful for operations
           that require writing a bit to trigger them, from Kuninori Morimoto.
      
         - Support for delaying during register patch application from Nariman
           Poushin.
      
         - Support for overriding cache state via the debugfs implementation
           from Richard Fitzgerald"
      
      * tag 'regmap-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (25 commits)
        regmap: fix a NULL pointer dereference in __regmap_init
        regmap: Support bulk reads for devices without raw formatting
        regmap-i2c: Add smbus i2c block support
        regmap: Add raw_write/read checks for max_raw_write/read sizes
        regmap: regmap max_raw_read/write getter functions
        regmap: Introduce max_raw_read/write for regmap_bulk_read/write
        regmap: Add missing comments about struct regmap_bus
        regmap: No multi_write support if bus->write does not exist
        regmap: Split use_single_rw internally into use_single_read/write
        regmap: Fix regmap_bulk_write for bus writes
        regmap: regmap_raw_read return error on !bus->read
        regulator: core: Print at debug level on debugfs creation failure
        regmap: Fix regmap_can_raw_write check
        regmap: fix typos in regmap.c
        regmap: Fix integertypes for register address and value
        regmap: Move documentation to regmap.h
        regmap: Use different lockdep class for each regmap init call
        thermal: sti: Add parentheses around bridge->ops->regmap_init call
        mfd: vexpress: Add parentheses around bridge->ops->regmap_init call
        regmap: debugfs: Fix misuse of IS_ENABLED
        ...
    • Merge tag 'fbdev-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · fa815580
      Committed by Linus Torvalds
      Pull fbdev updates from Tomi Valkeinen:
       "Minor fixes and cleanups"
      
      * tag 'fbdev-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
        video: fbdev: atmel_lcdfb: remove useless include
        video: fbdev: pxa168fb: Use devm_clk_get
        fbdev: ssd1307fb: fix error return code
        fbdev: fix snprintf() limit in show_bl_curve()
        video: fbdev: s3c-fb: Constify platform_device_id
        video: fbdev: atmel: fix warning for const return value
        video: fbdev: Drop owner assignment from platform_driver
        video: fbdev: Drop owner assignment from i2c_driver
        fbdev: remove unnecessary memset in vfb
        framebuffer: disable vgacon on microblaze arch
        fbdev: udlfb: remove unneeded initialization in few places
        fbdev: Allow compile test of GPIO consumers if !GPIOLIB
        fbdev: fix cea_modes array size
    • Merge tag 'mmc-v4.3' of git://git.linaro.org/people/ulf.hansson/mmc · 85579ad7
      Committed by Linus Torvalds
      Pull MMC updates from Ulf Hansson:
       "MMC core:
         - Fix a race condition in the request handling
         - Skip trim commands for some buggy Kingston eMMCs
         - An optimization and a correction for erase groups
         - Set CMD23 quirk for some Sandisk cards
      
        MMC host:
         - sdhci: Give GPIO CD higher precedence and don't poll when it's used
         - sdhci: Fix DMA memory leakage
         - sdhci: Some updates for clock management
         - sdhci-of-at91: introduce driver for the Atmel SDMMC
         - sdhci-of-arasan: Add support for sdhci-5.1
         - sdhci-esdhc-imx: Add support for imx7d which also supports HS400
         - sdhci: A collection of fixes and improvements for various sdhci hosts
         - omap_hsmmc: Modernization of the regulator code
         - dw_mmc: A couple of fixes for DMA and PIO mode
         - usdhi6rol0: A few fixes and support probe deferral for regulators
         - pxamci: Convert to use dmaengine
         - sh_mmcif: Fix the suspend process with a short-term solution
         - tmio: Adjust timeout for commands
         - sunxi: Fix timeout while gating/ungating clock"
      
      * tag 'mmc-v4.3' of git://git.linaro.org/people/ulf.hansson/mmc: (67 commits)
        mmc: android-goldfish: remove incorrect __iomem annotation
        mmc: core: fix race condition in mmc_wait_data_done
        mmc: host: omap_hsmmc: remove CONFIG_REGULATOR check
        mmc: host: omap_hsmmc: use ios->vdd for setting vmmc voltage
        mmc: host: omap_hsmmc: use regulator_is_enabled to find pbias status
        mmc: host: omap_hsmmc: enable/disable vmmc_aux regulator based on previous state
        mmc: host: omap_hsmmc: don't use ->set_power to set initial regulator state
        mmc: host: omap_hsmmc: avoid pbias regulator enable on power off
        mmc: host: omap_hsmmc: add separate function to set pbias
        mmc: host: omap_hsmmc: add separate functions for enable/disable supply
        mmc: host: omap_hsmmc: return error if any of the regulator APIs fail
        mmc: host: omap_hsmmc: remove unnecessary pbias set_voltage
        mmc: host: omap_hsmmc: use mmc_host's vmmc and vqmmc
        mmc: host: omap_hsmmc: use the ocrmask provided by the vmmc regulator
        mmc: host: omap_hsmmc: cleanup omap_hsmmc_reg_get()
        mmc: host: omap_hsmmc: return on fatal errors from omap_hsmmc_reg_get
        mmc: host: omap_hsmmc: use devm_regulator_get_optional() for vmmc
        mmc: sdhci-of-at91: fix platform_no_drv_owner.cocci warnings
        mmc: sh_mmcif: Fix suspend process
        mmc: usdhi6rol0: fix error return code
        ...
    • Merge tag 'platform-drivers-x86-v4.3-1' of... · 3af6e98f
      Committed by Linus Torvalds
      Merge tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull x86 platform driver updates from Darren Hart:
       "Significant work on toshiba_acpi, including new hardware support,
        refactoring, and cleanups.  Extend device support for asus, ideapad,
        and acer systems.  New surface pro 3 buttons driver.  Misc minor
        cleanups for thinkpad and hp-wireless.
      
        acer-wmi:
         - No rfkill on HP Omen 15 wifi
      
        thinkpad_acpi:
         - Remove side effects from vdbg_printk -> no_printk macro
      
        surface pro 3:
         - Add support driver for Surface Pro 3 buttons
      
        hp-wireless:
         - remove unneeded goto/label in hpwl_init
      
        ideapad-laptop:
         - add alternative representation for Yoga 2 to DMI table
         - Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
      
        asus-laptop:
         - Add key found on Asus F3M
      
        MAINTAINERS:
         - Remove Toshiba Linux mailing list address
      
        toshiba_acpi:
         - Bump driver version to 0.23
         - Remove unnecessary checks and returns in HCI/SCI functions
         - Refactor *{get, set} functions return value
         - Remove "*not supported" feature prints
         - Change *available functions return type
         - Add set_fan_status function
         - Change some variables to avoid warnings from ninja-check
         - Reorder toshiba_acpi_alt_keymap entries
         - Remove unused wireless defines
         - Transflective backlight updates
         - Avoid registering input device on WMI event laptops
         - Add /dev/toshiba_acpi device
         - Adapt /proc/acpi/toshiba/keys to TOS1900 devices"
      
      * tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (21 commits)
        acer-wmi: No rfkill on HP Omen 15 wifi
        thinkpad_acpi: Remove side effects from vdbg_printk -> no_printk macro
        surface pro 3: Add support driver for Surface Pro 3 buttons
        hp-wireless: remove unneeded goto/label in hpwl_init
        ideapad-laptop: add alternative representation for Yoga 2 to DMI table
        asus-laptop: Add key found on Asus F3M
        MAINTAINERS: Remove Toshiba Linux mailing list address
        ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
        toshiba_acpi: Bump driver version to 0.23
        toshiba_acpi: Remove unnecessary checks and returns in HCI/SCI functions
        toshiba_acpi: Refactor *{get, set} functions return value
        toshiba_acpi: Remove "*not supported" feature prints
        toshiba_acpi: Change *available functions return type
        toshiba_acpi: Add set_fan_status function
        toshiba_acpi: Change some variables to avoid warnings from ninja-check
        toshiba_acpi: Reorder toshiba_acpi_alt_keymap entries
        toshiba_acpi: Remove unused wireless defines
        toshiba_acpi: Transflective backlight updates
        toshiba_acpi: Avoid registering input device on WMI event laptops
        toshiba_acpi: Add /dev/toshiba_acpi device
        ...
    • Merge branch 'i2c/for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · acceba59
      Committed by Linus Torvalds
      Pull i2c updates from Wolfram Sang:
       "Features:
      
         - new drivers: Renesas EMEV2, register based MUX, NXP LPC2xxx
         - core: scans DT and assigns wakeup interrupts.  No driver changes needed.
         - core: some refcounting issues fixed and better API for that
         - core: new helper function for best effort block read emulation
         - slave framework: proper DT bindings and userspace instantiation
         - some bigger work for xiic, pxa, omap drivers
      
        .. and quite a number of smaller driver fixes, cleanups, improvements"
      
      * 'i2c/for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (65 commits)
        i2c: mux: reg Change ioread endianness for readback
        i2c: mux: reg: fix compilation warnings
        i2c: mux: reg: simplify register size checking
        i2c: muxes: fix leaked i2c adapter device node references
        i2c: allow specifying separate wakeup interrupt in device tree
        of/irq: export of_get_irq_byname()
        i2c: xgene-slimpro: dma_mapping_error() doesn't return an error code
        i2c: Replace I2C_CROS_EC_TUNNEL dependency
        eeprom: at24: use i2c_smbus_read_i2c_block_data_or_emulated
        i2c: core: Add support for best effort block read emulation
        i2c: lpc2k: add driver
        i2c: mux: Add register-based mux i2c-mux-reg
        i2c: dt: describe generic bindings
        i2c: slave: print warning if slave flag not set
        i2c: support 10 bit and slave addresses in sysfs 'new_device'
        i2c: take address space into account when checking for used addresses
        i2c: apply DT flags when probing
        i2c: make address check indpendent from client struct
        i2c: rename address check functions
        i2c: apply address offset for slaves, too
        ...
    • Merge tag 'rtc-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · c1917615
      Committed by Linus Torvalds
      Pull RTC updates from Alexandre Belloni:
       "Core:
         - use is_visible() to control sysfs attributes
         - switch wakealarm attribute to DEVICE_ATTR_RW
         - make rtc_does_wakealarm() return boolean
         - properly manage lifetime of dev and cdev in rtc device
         - remove unnecessary device_get() in rtc_device_unregister
         - fix double free in rtc_register_device() error path
      
        New drivers:
         - NXP LPC24xx
         - Xilinx Zynq MP
         - Dialog DA9062
      
        Subsystem wide cleanups:
         - fix drivers that consider 0 as a valid IRQ in client->irq
         - Drop (un)likely before IS_ERR(_OR_NULL)
         - drop the remaining owner assignment for i2c_driver and
           platform_driver
         - module autoload fixes
      
        Drivers:
         - 88pm80x: add device tree support
         - abx80x: fix RTC write bit
         - ab8500: Add a sentinel to ab85xx_rtc_ids[]
         - armada38x: Align RTC set time procedure with the official errata
         - as3722: correct month value
         - at91sam9: cleanups
         - at91rm9200: get and use slow clock and cleanups
         - bq32k: remove redundant check
         - cmos: century support, proper fix for the spurious wakeup
         - ds1307: cleanups and wakeup irq support
         - ds1374: Remove unused variable
         - ds1685: Use module_platform_driver
         - ds3232: fix WARNING trace in resume function
         - gemini: fix ptr_ret.cocci warnings
         - mt6397: implement suspend/resume
         - omap: support internal and external clock enabling
         - opal: Enable alarms only when opal supports tpo
         - pcf2127: use OFS flag to detect unreliable date and warn the user
         - pl031: fix typo for author email
         - rx8025: huge cleanup and fixes
         - sa1100/pxa: share common code
         - s5m: fix to update ctrl register
         - s3c: fix clocks and wakeup, cleanup
         - sirfsoc: use regmap
         - nvram_read()/nvram_write() functions for cmos, ds1305, ds1307,
           ds1343, ds1511, ds1553, ds1742, m48t59, rp5c01, stk17ta8, tx4939
         - use rtc_valid_tm() error code when reading date/time instead of 0
           for isl12022, pcf2123, pcf2127"
      
      * tag 'rtc-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (90 commits)
        rtc: abx80x: fix RTC write bit
        rtc: ab8500: Add a sentinel to ab85xx_rtc_ids[]
        rtc: ds1374: Remove unused variable
        rtc: Fix module autoload for OF platform drivers
        rtc: Fix module autoload for rtc-{ab8500,max8997,s5m} drivers
        rtc: omap: Add external clock enabling support
        rtc: omap: Add internal clock enabling support
        ARM: dts: AM437x: Add the internal and external clock nodes for rtc
        rtc: s5m: fix to update ctrl register
        rtc: add xilinx zynqmp rtc driver
        devicetree: bindings: rtc: add bindings for xilinx zynqmp rtc
        rtc: as3722: correct month value
        ARM: config: Switch PXA27x platforms to use PXA RTC driver
        ARM: mmp: remove unused RTC register definitions
        ARM: sa1100: remove unused RTC register definitions
        rtc: sa1100/pxa: convert to run-time register mapping
        ARM: pxa: add memory resource to SA1100 RTC device
        rtc: pxa: convert to use shared sa1100 functions
        rtc: sa1100: prepare to share sa1100_rtc_ops
        rtc: ds3232: fix WARNING trace in resume function
        ...
    • zpool: remove no-op module init/exit · df69f52d
      Committed by Dan Streetman
      Remove zpool_init() and zpool_exit(); they do nothing other than print
      "loaded" and "unloaded".
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: zbud: constify the zbud_ops · c83db4f4
      Committed by Krzysztof Kozlowski
      The structure zbud_ops is not modified so make the pointer to it a
      pointer to const.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
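      The pattern behind these two constify patches, shown as a minimal user-space sketch. Every name here (struct pool_ops, struct pool, pool_evict(), demo_ops) is hypothetical and is not the zbud/zpool API. The pool holds a pointer to const, so it can call the callbacks but cannot modify the caller's table through that pointer. Because nothing writes to the table, the table itself can also be declared const.

```c
#include <stddef.h>

/* Hypothetical ops table with a single eviction callback,
 * loosely modeled on zbud_ops/zpool_ops. */
struct pool_ops {
	int (*evict)(void *pool, unsigned long handle);
};

struct pool {
	/* Pointer to const: the pool only reads the callbacks. */
	const struct pool_ops *ops;
};

static int demo_evict(void *pool, unsigned long handle)
{
	(void)pool;
	return handle == 0 ? -1 : 0;	/* pretend handle 0 is invalid */
}

/* Never modified, so it can be const (and live in read-only data). */
static const struct pool_ops demo_ops = {
	.evict = demo_evict,
};

static void pool_init(struct pool *p, const struct pool_ops *ops)
{
	p->ops = ops;
}

/* Dispatch through the const table; -1 if no callback is provided. */
static int pool_evict(struct pool *p, unsigned long handle)
{
	if (p->ops == NULL || p->ops->evict == NULL)
		return -1;
	return p->ops->evict(p, handle);
}
```

      With the pointer declared const, an accidental assignment such as `p->ops->evict = NULL;` inside the pool code fails to compile.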
    • mm: zpool: constify the zpool_ops · 78672779
      Committed by Krzysztof Kozlowski
      The structure zpool_ops is not modified so make the pointer to it a
      pointer to const.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: swap: zswap: maybe_preload & refactoring · 5b999aad
      Committed by Dmitry Safonov
      zswap_get_swap_cache_page and read_swap_cache_async have pretty much the
      same code, the only significant differences being the return value and
      the use of swap_readpage.

      I added a helper, __read_swap_cache_async(), with the common code.
      Behavior change: zswap_get_swap_cache_page now uses
      radix_tree_maybe_preload instead of radix_tree_preload.  It looks like
      it was left unchanged only because of the code duplication.
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: unify error reporting · 70864969
      Committed by Sergey Senozhatsky
      Make zram syslog error reporting more consistent. We have random
      error levels in some places. For example, critical errors like
        "Error allocating memory for compressed page"
      and
        "Unable to allocate temp memory"
      are reported as KERN_INFO messages.
      
      a) Reassign error levels
      
      Error messages that directly affect zram
      functionality -- pr_err():
      
       Error allocating zram address table
       Error creating memory pool
       Decompression failed! err=%d, page=%u
       Unable to allocate temp memory
       Compression failed! err=%d
       Error allocating memory for compressed page: %u, size=%zu
       Cannot initialise %s compressing backend
       Error allocating disk queue for device %d
       Error allocating disk structure for device %d
       Error creating sysfs group for device %d
       Unable to register zram-control class
       Unable to get major number
      
      Messages that do not affect functionality, but the user
      must be warned (because sysfs attrs will be removed in
      this particular case) -- pr_warn():
      
       %d (%s) Attribute %s (and others) will be removed. %s
      
      Messages that do not affect functionality and mostly are
      informative -- pr_info():
      
       Cannot change max compression streams
       Can't change algorithm for initialized device
       Cannot change disksize for initialized device
       Added device: %s
       Removed device: %s
      
      b) Update sysfs_create_group() error message
      
      First, it lacks a trailing new line; add it.  Second, every error message
      in zram_add() has a "for device %d" part, which makes errors more
      informative.  Add missing part to "Error creating sysfs group" message.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
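      Change (b) in a minimal sketch. The helper format_sysfs_group_error() is hypothetical: it writes into a caller-supplied buffer with snprintf() instead of calling pr_err(), which keeps the sketch testable in user space. The two points it shows are the "for device %d" suffix, which matches the other zram_add() errors, and the trailing newline.

```c
#include <stdio.h>

/* Hypothetical helper: build the reworded sysfs-group error message.
 * Returns the length snprintf() reports (excluding the terminator). */
static int format_sysfs_group_error(char *buf, size_t len, int device_id)
{
	/* "for device %d" matches the other zram_add() error messages;
	 * the trailing '\n' was missing before the change. */
	return snprintf(buf, len,
			"Error creating sysfs group for device %d\n",
			device_id);
}
```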
    • zsmalloc: remove null check from destroy_handle_cache() · cd10add0
      Committed by Sergey Senozhatsky
      We can pass a NULL cache pointer to kmem_cache_destroy(), because it
      now NULL-checks its argument.  Remove the redundant test from
      destroy_handle_cache().
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
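      The idea in a user-space sketch, built on assumptions. cache_destroy(), struct obj_cache and struct demo_pool are hypothetical stand-ins for kmem_cache_destroy(), struct kmem_cache and the zsmalloc pool. Once the destroy function tolerates NULL, as free() does, the caller's own guard adds nothing.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct obj_cache {
	size_t obj_size;
};

static int cache_destroy_calls;	/* counts real teardowns, for the demo */

/* NULL-tolerant destroy, like free(NULL) or kmem_cache_destroy(NULL). */
static void cache_destroy(struct obj_cache *c)
{
	if (c == NULL)
		return;
	cache_destroy_calls++;
	free(c);
}

struct demo_pool {
	struct obj_cache *handle_cachep;	/* may be NULL */
};

/* No "if (pool->handle_cachep)" needed: cache_destroy() checks. */
static void destroy_handle_cache(struct demo_pool *pool)
{
	cache_destroy(pool->handle_cachep);
}
```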
    • zsmalloc: do not take class lock in zs_shrinker_count() · b3e237f1
      Committed by Sergey Senozhatsky
      We can avoid taking the class ->lock around zs_can_compact() in
      zs_shrinker_count(), because the number we return is, by design,
      generally outdated anyway.  Several sources can change a class's state
      right after we return from zs_can_compact(): ongoing I/O operations,
      manually triggered compaction, or both happening simultaneously.

      We redo these calculations on a per-class basis during compaction
      anyway.

      zs_unregister_shrinker() will not return while the shrinker is still
      active, so classes won't unexpectedly disappear while
      zs_shrinker_count() iterates over them.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: use class->pages_per_zspage · 6cbf16b3
      Committed by Minchan Kim
      There is no need to recalculate pages_per_zspage at runtime.  Just use
      class->pages_per_zspage to avoid unnecessary runtime overhead.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
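      A simplified sketch of computing the value once, when the class is set up, and only reading it afterwards. The constants (4 KiB pages, at most 4 pages per zspage) and all demo_* names are assumptions. The selection rule keeps the zspage length with the highest percentage of usable bytes. We modeled it on our understanding of zsmalloc's get_pages_per_zspage(); it is not a copy of that code.

```c
/* Assumed constants for the sketch. */
#define DEMO_PAGE_SIZE            4096
#define DEMO_MAX_PAGES_PER_ZSPAGE 4

struct demo_size_class {
	int size;		/* object size served by this class */
	int pages_per_zspage;	/* computed once at class creation */
};

/* Pick the zspage length (1..4 pages) that leaves the smallest share of
 * bytes too short to hold one more object. */
static int calc_pages_per_zspage(int class_size)
{
	int i, best = 1, best_usedpc = 0;

	for (i = 1; i <= DEMO_MAX_PAGES_PER_ZSPAGE; i++) {
		int zspage_size = i * DEMO_PAGE_SIZE;
		int waste = zspage_size % class_size;
		int usedpc = (zspage_size - waste) * 100 / zspage_size;

		if (usedpc > best_usedpc) {
			best_usedpc = usedpc;
			best = i;
		}
	}
	return best;
}

static void demo_class_init(struct demo_size_class *c, int size)
{
	c->size = size;
	c->pages_per_zspage = calc_pages_per_zspage(size);
}

/* Hot path: use the cached value instead of recomputing it. */
static int demo_zspage_bytes(const struct demo_size_class *c)
{
	return c->pages_per_zspage * DEMO_PAGE_SIZE;
}
```

      For 3072-byte objects, three pages (12288 bytes) hold exactly four objects with no waste, so that length wins over one, two or four pages.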
    • zsmalloc: consider ZS_ALMOST_FULL as migrate source · ad9d5e17
      Committed by Minchan Kim
      There is no reason to prevent selecting ZS_ALMOST_FULL as the migration
      source if we cannot find a source in ZS_ALMOST_EMPTY.

      With this patch, zs_can_compact will return a more accurate result.
      Signed-off-by: Minchan Kim <minchan.kim@lge.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
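      A hedged sketch of the selection order. It tries the almost-empty list first, which has the fewest objects to move, and falls back to the almost-full list instead of giving up. The demo_* types are hypothetical simplifications. The sketch only returns the list head; a real implementation would also unlink the chosen zspage.

```c
#include <stddef.h>

enum demo_fullness {
	DEMO_ZS_ALMOST_FULL,
	DEMO_ZS_ALMOST_EMPTY,
	DEMO_NR_FULLNESS,
};

struct demo_zspage {
	int inuse;	/* objects currently allocated */
};

struct demo_class {
	/* Head of each fullness list; NULL when the list is empty. */
	struct demo_zspage *fullness_list[DEMO_NR_FULLNESS];
};

/* Return a migration source: almost-empty first, then almost-full. */
static struct demo_zspage *demo_isolate_source(struct demo_class *class)
{
	static const enum demo_fullness order[] = {
		DEMO_ZS_ALMOST_EMPTY,
		DEMO_ZS_ALMOST_FULL,	/* the fallback this change enables */
	};
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		struct demo_zspage *page = class->fullness_list[order[i]];

		if (page != NULL)
			return page;
	}
	return NULL;
}
```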
    • zsmalloc: partial page ordering within a fullness_list · 58f17117
      Committed by Sergey Senozhatsky
      We want to see more ZS_FULL pages and less ZS_ALMOST_{FULL, EMPTY}
      pages.  Put a page with higher ->inuse count first within its
      ->fullness_list, which will give us better chances to fill up this page
      with new objects (find_get_zspage() return ->fullness_list head for new
      object allocation), so some zspages will become ZS_ALMOST_FULL/ZS_FULL
      quicker.
      
      It performs a trivial and cheap ->inuse compare which does not slow down
      zsmalloc and in the worst case keeps the list pages in no particular
      order.
      
      A more expensive solution could sort fullness_list by ->inuse count.
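      The cheap ordering can be sketched in Python as follows.  A deque
      stands in for the circular fullness list, and its left end stands in
      for the head that find_get_zspage() hands out.  insert_by_inuse() is
      a hypothetical helper.  The sketch does one ->inuse comparison
      against the current head and does not sort the list.  Sending ties
      to the head is a choice made for this sketch.

```python
from collections import deque


class Zspage:
    def __init__(self, name, inuse):
        self.name = name
        self.inuse = inuse  # objects currently allocated in this zspage


def insert_by_inuse(fullness_list, page):
    """One comparison against the head: busier pages go first, others last."""
    if fullness_list and page.inuse < fullness_list[0].inuse:
        fullness_list.append(page)      # less used than the head: go to the tail
    else:
        fullness_list.appendleft(page)  # at least as used: become the new head


lst = deque()
for name, inuse in [("a", 3), ("b", 5), ("c", 1), ("d", 4)]:
    insert_by_inuse(lst, Zspage(name, inuse))
print([p.name for p in lst])  # ['b', 'a', 'c', 'd']
```

      The head ends up being the busiest page.  The rest of the list is
      only partially ordered ("d" with 4 objects sits behind "a" with 3),
      which is the trade-off the text describes.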
      
      [minchan@kernel.org: code adjustments]
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: use shrinker to trigger auto-compaction · ab9d306d
      Sergey Senozhatsky committed
      Perform automatic pool compaction by a shrinker when the system is
      getting tight on memory.
      
      User space has very little knowledge of zsmalloc fragmentation and
      basically has no mechanism to tell whether compaction will result in
      any memory gain.  Another issue is that user space is not always aware
      that the system is getting tight on memory.  This leads to very
      uncomfortable scenarios in which user space may start issuing
      compaction 'randomly' or, for example, from crontab.  Fragmentation is
      not necessarily bad: allocated but unused objects may, after all, be
      filled with data later, without the need to allocate a new zspage.  On
      the other hand, we obviously don't want to waste memory when the
      system needs it.
      
      Compaction now has a relatively quick pool scan, so we can easily
      estimate the number of pages that will be freed, which makes it
      possible to call this function from a shrinker->count_objects()
      callback.  We also abort compaction as soon as we detect that we can't
      free any more pages, preventing wasteful object migrations.
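      The count/scan split can be illustrated with a toy Python model.
      ToyPool is hypothetical.  Its count_objects() and scan_objects()
      methods only mirror the roles of the shrinker callbacks, and
      SHRINK_STOP here is a Python sentinel standing in for the kernel
      value.  Measuring freeable space in whole zspages is an assumption
      of the sketch: unused object slots divided by objects per zspage.
      Each scan of a class stops as soon as no further zspage can be
      emptied.

```python
# Toy model of shrinker-driven compaction; not the kernel implementation.
SHRINK_STOP = object()  # stand-in for the kernel's "nothing freed, stop" value


class ToyPool:
    def __init__(self, classes):
        # Each class: [allocated_objs, used_objs, maxobj_per_zspage, pages_per_zspage]
        self.classes = [list(c) for c in classes]
        self.pages_compacted = 0

    @staticmethod
    def freeable_pages(cls):
        allocated, used, maxobj, pages_per_zspage = cls
        # Whole zspages' worth of unused slots, converted to pages.
        return (allocated - used) // maxobj * pages_per_zspage

    def count_objects(self):
        # Cheap pass over the stats: how many pages could compaction free?
        return sum(self.freeable_pages(c) for c in self.classes)

    def scan_objects(self):
        before = self.pages_compacted
        for cls in self.classes:
            # Abort work on a class as soon as no further page can be freed.
            while self.freeable_pages(cls) > 0:
                cls[0] -= cls[2]  # one zspage emptied and released
                self.pages_compacted += cls[3]
        freed = self.pages_compacted - before
        return freed if freed else SHRINK_STOP


pool = ToyPool([(660, 570, 33, 4), (544, 540, 8, 1)])
print(pool.count_objects())  # 8
print(pool.scan_objects())   # 8
print(pool.count_objects())  # 0
```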
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: account the number of compacted pages · 860c707d
      Sergey Senozhatsky committed
      Compaction returns to zram the number of migrated objects, which is
      quite uninformative: we have objects of different sizes, so user space
      cannot obtain any valuable data from that number.  Change compaction to
      operate in terms of pages and return to the compaction issuer the
      number of pages that were freed during compaction.  From now on we
      will export a more meaningful value in zram<id>/mm_stat: the number of
      freed (compacted) pages.
      
      This requires:
       (a) a rename of `num_migrated' to `pages_compacted'
       (b) an internal API change: return first_page's fullness_group from
           putback_zspage(), so we know when putback_zspage() did
           free_zspage().  This helps us account compaction stats correctly.
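      Why returning the fullness group is enough can be shown with a
      simplified Python sketch.  All functions here are hypothetical
      models.  A ZS_EMPTY result from the putback step means the source
      zspage was freed, so its pages count towards `pages_compacted'.
      Splitting the two partial groups at 3/4 full is an assumption made
      for the sketch.

```python
# Simplified model of counting freed pages instead of migrated objects.
ZS_EMPTY, ZS_ALMOST_EMPTY, ZS_ALMOST_FULL, ZS_FULL = range(4)


def putback_zspage(inuse, maxobj):
    """Return the fullness group a source zspage lands in after migration.

    ZS_EMPTY means it had no objects left and was freed.  Splitting the
    partial groups at 3/4 full is an assumption of this sketch.
    """
    if inuse == 0:
        return ZS_EMPTY
    if inuse == maxobj:
        return ZS_FULL
    return ZS_ALMOST_EMPTY if inuse <= 3 * maxobj // 4 else ZS_ALMOST_FULL


def compact_class(sources, maxobj, pages_per_zspage):
    """sources: (objects_migrated_out, objects_left) for each source zspage.

    Returns (num_migrated, pages_compacted).
    """
    num_migrated = 0
    pages_compacted = 0
    for migrated, left in sources:
        num_migrated += migrated
        if putback_zspage(left, maxobj) == ZS_EMPTY:
            pages_compacted += pages_per_zspage  # the whole zspage went away
    return num_migrated, pages_compacted


# Three source zspages; the last one could not be emptied completely.
print(compact_class([(5, 0), (3, 0), (2, 4)], maxobj=33, pages_per_zspage=4))  # (10, 8)
```

      "10 objects migrated" says little on its own.  "8 pages freed" is
      what user space actually wants to know.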
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc/zram: introduce zs_pool_stats api · 7d3f3938
      Sergey Senozhatsky committed
      `zs_compact_control' accounts the number of migrated objects, but it
      has a limited lifespan: we lose it as soon as zs_compact() returns to
      zram.  This worked fine because (a) zram had its own counter of
      migrated objects and (b) only zram could trigger compaction.  However,
      it does not work for automatic pool compaction (not issued by zram).
      To account objects migrated during auto-compaction (issued by the
      shrinker) we need to store this number in zs_pool.
      
      Define a new `struct zs_pool_stats' structure to keep zs_pool's stats.
      As of this writing it provides only `num_migrated', but it can easily
      be extended.
      
      A new zsmalloc symbol, zs_pool_stats(), exports zs_pool's stats back
      to the caller.
      
      Use zs_pool_stats() in zram and remove `num_migrated' from zram_stats.
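      The point of moving the counter into the pool can be sketched in
      Python.  Pool, PoolStats and get_pool_stats() are hypothetical
      stand-ins and not the zsmalloc API.  The stats object lives as long
      as the pool, so both zram-issued and shrinker-issued compactions
      accumulate into it.  The caller receives a copy, much as a caller
      would fill its own `struct zs_pool_stats'.

```python
from dataclasses import dataclass, replace


@dataclass
class PoolStats:
    # Only num_migrated for now, as in the commit text; more fields can be added.
    num_migrated: int = 0


class Pool:
    def __init__(self):
        # Stats live as long as the pool, not just for one compaction run.
        self.stats = PoolStats()

    def compact(self, migrated):
        # Same bookkeeping whether zram or a shrinker triggered compaction.
        self.stats.num_migrated += migrated


def get_pool_stats(pool):
    # Hand the caller a copy, so later compactions don't change its snapshot.
    return replace(pool.stats)


pool = Pool()
pool.compact(7)  # triggered via zram
pool.compact(3)  # triggered by a shrinker
print(get_pool_stats(pool).num_migrated)  # 10
```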
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: cosmetic compaction code adjustments · 0dc63d48
      Sergey Senozhatsky committed
      Change zs_object_copy() argument order to be (DST, SRC) rather than
      (SRC, DST).  Copy/move functions usually take their arguments in
      (to, from) order.
      
      Rename alloc_target_page() to isolate_target_page().  This function
      doesn't allocate anything; it isolates a target page, much like
      isolate_source_page().
      
      Tweak __zs_compact() comment.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: introduce zs_can_compact() function · 04f05909
      Sergey Senozhatsky committed
      This function checks whether class compaction will free any pages.
      Rephrased: do we have enough unused objects to form at least one
      ZS_EMPTY zspage and free it?  Compaction is aborted if class
      compaction will not result in any (further) savings.
      
      EXAMPLE (this debug output is not part of this patch set):
      
       - class size
       - number of allocated objects
       - number of used objects
       - max objects per zspage
       - pages per zspage
       - estimated number of pages that will be freed
      
      [..]
      class-512 objs:544 inuse:540 maxobj-per-zspage:8  pages-per-zspage:1 zspages-to-free:0
       ... class-512 compaction is useless. break
      class-496 objs:660 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:2
      class-496 objs:627 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:1
      class-496 objs:594 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:0
       ... class-496 compaction is useless. break
      class-448 objs:657 inuse:617 maxobj-per-zspage:9  pages-per-zspage:1 zspages-to-free:4
      class-448 objs:648 inuse:617 maxobj-per-zspage:9  pages-per-zspage:1 zspages-to-free:3
      class-448 objs:639 inuse:617 maxobj-per-zspage:9  pages-per-zspage:1 zspages-to-free:2
      class-448 objs:630 inuse:617 maxobj-per-zspage:9  pages-per-zspage:1 zspages-to-free:1
      class-448 objs:621 inuse:617 maxobj-per-zspage:9  pages-per-zspage:1 zspages-to-free:0
       ... class-448 compaction is useless. break
      class-432 objs:728 inuse:685 maxobj-per-zspage:28 pages-per-zspage:3 zspages-to-free:1
      class-432 objs:700 inuse:685 maxobj-per-zspage:28 pages-per-zspage:3 zspages-to-free:0
       ... class-432 compaction is useless. break
      class-416 objs:819 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:2
      class-416 objs:780 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:1
      class-416 objs:741 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:0
       ... class-416 compaction is useless. break
      class-400 objs:690 inuse:674 maxobj-per-zspage:10 pages-per-zspage:1 zspages-to-free:1
      class-400 objs:680 inuse:674 maxobj-per-zspage:10 pages-per-zspage:1 zspages-to-free:0
       ... class-400 compaction is useless. break
      class-384 objs:736 inuse:709 maxobj-per-zspage:32 pages-per-zspage:3 zspages-to-free:0
       ... class-384 compaction is useless. break
      [..]
      
      Every "compaction is useless" indicates that we saved CPU cycles.
      
      class-512 has
      	544	objects allocated
      	540	objects used
      	8	objects per zspage
      
      Even if we have an ALMOST_EMPTY zspage, we still don't have enough room
      to migrate all of its objects and free this zspage: in total only
      544 - 540 = 4 object slots are unused, fewer than the 8 objects one
      zspage holds.  So compaction would not make much sense here; it's
      better to just leave the class as is.
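      The zspages-to-free column above is consistent with integer division
      of the unused objects (objs - inuse) by maxobj-per-zspage.  The
      Python sketch below recomputes it on that assumption and treats each
      compaction step as releasing one zspage, that is, maxobj-per-zspage
      slots.  zspages_to_free() and compaction_trace() are hypothetical
      helpers, not zsmalloc functions.  The sketch reproduces, for
      example, the class-496 and class-448 rows of the log.

```python
def zspages_to_free(allocated, used, maxobj_per_zspage):
    # Unused object slots, expressed in whole zspages (integer division).
    return (allocated - used) // maxobj_per_zspage


def compaction_trace(allocated, used, maxobj_per_zspage):
    """Return (objs, zspages-to-free) before each step until the estimate hits 0."""
    trace = []
    while True:
        n = zspages_to_free(allocated, used, maxobj_per_zspage)
        trace.append((allocated, n))
        if n == 0:
            break  # "compaction is useless. break"
        allocated -= maxobj_per_zspage  # one zspage emptied and freed
    return trace


print(compaction_trace(660, 570, 33))  # [(660, 2), (627, 1), (594, 0)]
print(compaction_trace(544, 540, 8))   # [(544, 0)]
```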
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: always keep per-class stats · 57244594
      Sergey Senozhatsky committed
      Always account per-class `zs_size_stat' stats.  This data will help us
      make better decisions during compaction.  We are especially interested
      in OBJ_ALLOCATED and OBJ_USED, which can tell us whether class
      compaction will result in any memory gain.
      
      For instance, we know the number of allocated objects in the class,
      the number of objects being used (so we also know how many objects are
      not used) and the number of objects per zspage.  So we can determine
      whether we have enough unused objects to form at least one ZS_EMPTY
      zspage during compaction.
      
      We calculate this value on a per-class basis, so we can calculate the
      total number of zspages that can be released, which is exactly what a
      shrinker wants to know.
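      A Python sketch of the always-on counters, under simplifying
      assumptions.  ClassStats and total_zspages_freeable() are
      hypothetical.  Allocation adds a new zspage only when no free slot
      is left.  Freeing just marks a slot unused; the model does not
      release a zspage whose objects are all freed, as zsmalloc would.
      Summing the per-class estimates gives the pool-wide figure a
      shrinker needs.

```python
class ClassStats:
    """Always-on per-class counters, in the spirit of OBJ_ALLOCATED/OBJ_USED."""

    def __init__(self, maxobj_per_zspage):
        self.maxobj = maxobj_per_zspage
        self.obj_allocated = 0  # object slots in all zspages of this class
        self.obj_used = 0       # slots that hold a live object

    def alloc(self):
        if self.obj_used == self.obj_allocated:
            self.obj_allocated += self.maxobj  # no free slot: add a zspage
        self.obj_used += 1

    def free(self):
        self.obj_used -= 1  # the slot stays allocated but becomes unused

    def zspages_freeable(self):
        return (self.obj_allocated - self.obj_used) // self.maxobj


def total_zspages_freeable(classes):
    # What a shrinker's count callback would want: the sum over all classes.
    return sum(c.zspages_freeable() for c in classes)
```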
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: drop unused variable `nr_to_migrate' · b430d1fd
      Sergey Senozhatsky committed
      This patchset tweaks compaction and makes it possible to trigger pool
      compaction automatically when the system is getting low on memory.
      
      zsmalloc can in some cases suffer from notable fragmentation, and
      compaction can release a considerable amount of memory.  The problem
      here is that we currently rely fully on user space to perform
      compaction when needed.  However, performing zsmalloc compaction is not
      always an obvious thing to do.  For example, suppose we have an `idle'
      fragmented zram device (compaction was never performed) and the system
      is getting low on memory due to some 3rd-party user processes (gcc
      LTO, firefox, etc.).  It's quite unlikely that user space will issue
      zpool compaction in this case.  Besides, user space cannot tell for
      sure how badly the pool is fragmented; however, this info is known to
      zsmalloc and, hence, to a shrinker.
      
      This patch (of 7):
      
      __zs_compact() does not use `nr_to_migrate'; drop it.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: fix type information of memoryless node · 4ada0c5a
      Zhen Lei committed
      For a memoryless node, the output of get_pfn_range_for_nid() is all
      zeros, so the node's memory range is displayed as running from 0 to -1.
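      The "-1" comes from computing an inclusive end address as
      (end_pfn << PAGE_SHIFT) - 1 when end_pfn is 0.  The Python sketch
      below emulates that unsigned 64-bit arithmetic.  It assumes 4 KiB
      pages, and the helper names are hypothetical.  The guard shown is
      one way to avoid the wraparound and is not necessarily the exact
      change made by this patch.

```python
PAGE_SHIFT = 12           # assumption: 4 KiB pages
U64_MASK = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def node_range_unguarded(start_pfn, end_pfn):
    # End address as "(end_pfn << PAGE_SHIFT) - 1": wraps when end_pfn == 0.
    return start_pfn << PAGE_SHIFT, ((end_pfn << PAGE_SHIFT) - 1) & U64_MASK


def node_range_guarded(start_pfn, end_pfn):
    # Only subtract 1 when the node actually spans some pages.
    end = (end_pfn << PAGE_SHIFT) - 1 if end_pfn else 0
    return start_pfn << PAGE_SHIFT, end


# A memoryless node: get_pfn_range_for_nid() yields start = end = 0.
print([hex(x) for x in node_range_unguarded(0, 0)])  # ['0x0', '0xffffffffffffffff']
print([hex(x) for x in node_range_guarded(0, 0)])    # ['0x0', '0x0']
```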
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node() · b5685e92
      Xishi Qiu committed
      When hot-adding a node from add_memory(), we add the memblock first, so
      the node is not empty.  But when called from cpu_up(), the node should
      be empty.
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: change sysctl_lower_zone_reserve_ratio to sysctl_lowmem_reserve_ratio in comments · 34b10060
      Yaowei Bai committed
      We use sysctl_lowmem_reserve_ratio rather than
      sysctl_lower_zone_reserve_ratio to determine how aggressive the kernel
      is in defending lowmem from the possibility of being captured into
      pinned user memory.  To avoid misleading readers, correct the name in
      the affected comments.
      Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: fix a misleading comment · 013110a7
      Yaowei Bai committed
      The comment says that the per-cpu batchsize and zone watermarks are
      determined by present_pages, which is wrong: they are both calculated
      from managed_pages.  Fix it.
      Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmap.c:insert_vm_struct(): check for failure before setting values · c9d13f5f
      Chen Gang committed
      There's no point in initializing vma->vm_pgoff if the insertion attempt
      will fail anyway.  Run the checks before performing the
      initialization.
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/khugepaged: allow interruption of allocation sleep again · bde43c6c
      Petr Mladek committed
      Commit 1dfb059b ("thp: reduce khugepaged freezing latency") fixed
      khugepaged so that it does not block a system suspend.  But as a
      result it could not be interrupted before the given timeout, because
      the condition for the wait event is "false".
      
      This patch restores the original approach but uses
      freezable_schedule_timeout_interruptible() instead of
      schedule_timeout_interruptible().  It does the right thing.  I am pretty
      sure that the freezable variant was not used in the original fix only
      because it was not available at that time.
      
      The regression has been there for ages.  It was not critical; it just
      did the allocation throttling a little more aggressively.
      
      I found this problem when converting the kthread to the kthread worker
      API and trying to understand the code.
      
      This bug is thought to have minimal userspace-visible impact.  Somebody
      could set a high alloc_sleep value by mistake and then try to fix it,
      but khugepaged would keep sleeping until the high value expires.
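      A user-space analogy in Python may make the difference concrete.  It
      is only an analogy, built on these assumptions: threading.Condition
      stands in for the kernel wait queue.  wait_for() with an
      always-false predicate mimics a wait event whose condition is
      "false", while a plain timed wait() mimics a sleep that a wakeup can
      cut short.  sleep_with_wakeup() is a hypothetical helper.

```python
import threading
import time


def sleep_with_wakeup(timeout_s, wake_after_s, interruptible):
    """Sleep for up to timeout_s while another thread wakes us after wake_after_s.

    Returns how long the sleeper actually slept, in seconds.
    """
    cond = threading.Condition()
    waiting = threading.Event()
    slept = []

    def sleeper():
        with cond:
            waiting.set()  # the waker cannot take the lock until wait() releases it
            start = time.monotonic()
            if interruptible:
                # Like the fix: a wakeup ends the sleep early.
                cond.wait(timeout=timeout_s)
            else:
                # Like a wait event whose condition is always false: every
                # wakeup re-checks the predicate, fails, and keeps sleeping.
                cond.wait_for(lambda: False, timeout=timeout_s)
            slept.append(time.monotonic() - start)

    t = threading.Thread(target=sleeper)
    t.start()
    waiting.wait()
    time.sleep(wake_after_s)
    with cond:
        cond.notify_all()  # e.g. the administrator has just lowered alloc_sleep
    t.join()
    return slept[0]
```

      With interruptible=True the sleeper returns shortly after the
      wakeup.  With interruptible=False it sleeps for the full timeout no
      matter when it is woken, which matches the behaviour the commit
      reports.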
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memblock.c: fiy typos in comments · c1153931
      Alexander Kuleshov committed
      s/succees/success/
      Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: correct to flush migrated pages if pageblock skip happens · 1a16718c
      Joonsoo Kim committed
      We cache isolate_start_pfn before entering isolate_migratepages().  If
      a pageblock is skipped in isolate_migratepages() for whatever reason,
      cc->migrate_pfn can be far from isolate_start_pfn, and hence we
      needlessly flush the pages that were freed.  For example, the
      following scenario is possible:
      
      - assume order-9 compaction, pageblock order is 9
      - isolate_start_pfn is 0x200
      - isolate_migratepages()
        - skip a number of pageblocks
        - start to isolate from pfn 0x600
        - cc->migrate_pfn = 0x620
        - return
      - last_migrated_pfn is set to 0x200
      - check flushing condition
        - current_block_start is set to 0x600
        - last_migrated_pfn < current_block_start, so do a useless flush
      
      This wrong flush helps neither performance nor the success rate, so
      this patch fixes it.  One simple way to know the exact position where
      we start to isolate migratable pages is to cache it in
      isolate_migratepages() before entering the actual isolation.  This
      patch implements that and fixes the problem.
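      The flush condition in the scenario reduces to a small bit of
      arithmetic, sketched here in Python.  block_start() and
      should_flush() are hypothetical helpers.  Treating a zero
      last_migrated_pfn as "nothing migrated yet" is an assumption of the
      sketch.  With order 9, the block containing pfn 0x620 starts at
      0x600.  A last_migrated_pfn of 0x200 therefore triggers a flush,
      while the correctly cached 0x600 does not.

```python
def block_start(pfn, order):
    # Align pfn down to the start of its (1 << order)-page block.
    return pfn & ~((1 << order) - 1)


def should_flush(last_migrated_pfn, migrate_pfn, order):
    """Flush once the migration scanner has left the block we last migrated from.

    A last_migrated_pfn of 0 means "nothing migrated yet" in this sketch.
    """
    return last_migrated_pfn != 0 and last_migrated_pfn < block_start(migrate_pfn, order)


print(should_flush(0x200, 0x620, 9))  # True  (cached too early: useless flush)
print(should_flush(0x600, 0x620, 9))  # False (cached where isolation began)
```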
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use numa_mem_id() in alloc_pages_node() · 82c1fc71
      Vlastimil Babka committed
      alloc_pages_node() might fail when called with NUMA_NO_NODE and
      __GFP_THISNODE on a CPU belonging to a memoryless node.  To make the
      local-node fallback more robust and prevent such situations, use
      numa_mem_id(), which was introduced for similar scenarios in the slab
      context.
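      The failure mode can be modelled with a hypothetical two-node
      topology in which node 1 has CPUs but no memory.  In this Python
      sketch CPU_NODE plays the role of numa_node_id() and
      NEAREST_MEM_NODE plays the role of numa_mem_id().  The thisnode flag
      plays __GFP_THISNODE, under which the node is required rather than
      merely preferred.  alloc_on() and toy_alloc_pages_node() are
      illustrative stand-ins, not kernel functions.

```python
# Hypothetical topology: node 1 has CPUs but no memory (a memoryless node).
NUMA_NO_NODE = -1
NODE_HAS_MEMORY = {0: True, 1: False}
CPU_NODE = 1          # what numa_node_id() would return on this CPU
NEAREST_MEM_NODE = 0  # what numa_mem_id() would return on this CPU


def alloc_on(nid, thisnode):
    """Return the node the page came from, or None if the allocation fails.

    Without thisnode the node is only preferred and other nodes are tried
    next; with thisnode only nid is allowed.
    """
    fallback = [] if thisnode else [n for n in NODE_HAS_MEMORY if n != nid]
    for n in [nid] + fallback:
        if NODE_HAS_MEMORY[n]:
            return n
    return None


def toy_alloc_pages_node(nid, thisnode, use_numa_mem_id):
    if nid == NUMA_NO_NODE:
        nid = NEAREST_MEM_NODE if use_numa_mem_id else CPU_NODE
    return alloc_on(nid, thisnode)


print(toy_alloc_pages_node(NUMA_NO_NODE, thisnode=True, use_numa_mem_id=False))  # None
print(toy_alloc_pages_node(NUMA_NO_NODE, thisnode=True, use_numa_mem_id=True))   # 0
```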
      Suggested-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: unify checks in alloc_pages_node() and __alloc_pages_node() · 0bc35a97
      Vlastimil Babka committed
      Perform the same debug checks in alloc_pages_node() as are done in
      __alloc_pages_node(), by making the former function a wrapper of the
      latter one.
      
      In addition to better diagnostics in DEBUG_VM builds for situations
      which were already fatal (e.g. an out-of-bounds node id), there are
      two visible changes for potential existing buggy callers of
      alloc_pages_node():
      
      - calling alloc_pages_node() with any negative nid (e.g. due to arithmetic
        overflow) was treated as passing NUMA_NO_NODE, and the fallback to the
        local node was applied. This will now be fatal.
      - calling alloc_pages_node() with an offline node will now be checked in
        DEBUG_VM builds. Since this is not fatal if the node was previously
        online, and this patch may expose some existing buggy callers, change
        the VM_BUG_ON in __alloc_pages_node() to VM_WARN_ON.
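      The two behaviour changes can be sketched in Python with a
      hypothetical configuration of 4 possible nodes, of which 0 and 1 are
      online.  VMBugOn stands in for a fatal VM_BUG_ON(), and
      warnings_logged stands in for the VM_WARN_ON() output.
      checked_alloc() plays the role of __alloc_pages_node() with its
      debug checks.  None of these names are kernel APIs, and each helper
      returns only the node that would be used.

```python
# Hypothetical configuration for the sketch.
MAX_NUMNODES = 4
ONLINE_NODES = {0, 1}
LOCAL_NODE = 0
NUMA_NO_NODE = -1

warnings_logged = []


class VMBugOn(Exception):
    """Stands in for a VM_BUG_ON() firing in a DEBUG_VM kernel."""


def checked_alloc(nid):
    # Plays __alloc_pages_node() with its debug checks; returns the node used.
    if nid < 0 or nid >= MAX_NUMNODES:
        raise VMBugOn(f"invalid nid {nid}")           # fatal
    if nid not in ONLINE_NODES:
        warnings_logged.append(f"offline nid {nid}")  # VM_WARN_ON: logged only
    return nid


def old_alloc_pages_node(nid):
    if nid < 0:  # any negative nid silently meant "local node"
        nid = LOCAL_NODE
    return nid


def new_alloc_pages_node(nid):
    if nid == NUMA_NO_NODE:  # only NUMA_NO_NODE means "local node" now
        nid = LOCAL_NODE
    return checked_alloc(nid)
```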
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f
      Vlastimil Babka committed
      alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
      allocator: do not check NUMA node ID when the caller knows the node is
      valid") as an optimized variant of alloc_pages_node() that doesn't
      fall back to the current node for nid == NUMA_NO_NODE.  Unfortunately,
      the name of the function can easily suggest that the allocation is
      restricted to the given node and fails otherwise.  In truth, the node is
      only preferred, unless __GFP_THISNODE is passed among the gfp flags.
      
      The misleading name has led to mistakes in the past; see for example
      commits 5265047a ("mm, thp: really limit transparent hugepage
      allocation to local node") and b360edb4 ("mm, mempolicy:
      migrate_to_node should only migrate to node").
      
      Another issue with the name is that there's a family of
      alloc_pages_exact*() functions where 'exact' means exact size (instead
      of page order), which leads to more confusion.
      
      To prevent further mistakes, this patch effectively renames
      alloc_pages_exact_node() to __alloc_pages_node() to better convey that
      it's an optimized variant of alloc_pages_node() not intended for general
      usage.  Both functions get described in comments.
      
      Providing a real convenience function for allocations restricted to a
      node was also considered, but the majority opinion seems to be that
      __GFP_THISNODE already provides that functionality and we shouldn't
      duplicate the API needlessly.  The number of users would be small
      anyway.
      
      Existing callers of alloc_pages_exact_node() are simply converted to
      call __alloc_pages_node(), with the exception of sba_alloc_coherent()
      which open-codes the check for NUMA_NO_NODE, so it is converted to use
      alloc_pages_node() instead.  This means it no longer performs some
      VM_BUG_ON checks, and since the current check for nid in
      alloc_pages_node() uses a 'nid < 0' comparison (which includes
      NUMA_NO_NODE), it may hide wrong values which would be previously
      exposed.
      
      Both differences will be rectified by the next patch.
      
      To sum up, this patch makes no functional changes, except temporarily
      hiding potentially buggy callers.  Restricting the checks in
      alloc_pages_node() is left for the next patch which can in turn expose
      more existing buggy callers.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Robin Holt <robinmholt@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Cliff Whickman <cpw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>